@@ -4,32 +4,31 @@ comments: true
# PaddleX High-Performance Inference Guide

-In actual production environments, many applications have stringent standards for the performance metrics of deployment strategies (especially response speed) to ensure efficient system operation and smooth user experiences. To this end, PaddleX provides a high-performance inference plugin that significantly improves model inference speed for users without requiring them to focus on complex configurations and low-level details, through automatic configuration and multi-backend inference capabilities.
+In real production environments, many applications impose strict requirements on the performance of deployment strategies, especially response time, to ensure efficient system operation and a smooth user experience. To address this, PaddleX offers a high-performance inference plugin that, through automatic configuration and multi-backend inference capabilities, enables users to significantly accelerate model inference without concerning themselves with complex configurations and low-level details.

## Table of Contents

-- [1. Basic Usage](#1.-basic-usage)
- - [1.1 Installing the High-Performance Inference Plugin](#11-installing-the-high-performance-inference-plugin)
- - [1.2 Enabling High-Performance Inference](#12-enabling-high-performance-inference)
-- [2. Advanced Usage](#2-advanced-usage)
- - [2.1 High-Performance Inference Modes](#21-high-performance-inference-modes)
- - [2.2 High-Performance Inference Configuration](#22-high-performance-inference-configuration)
- - [2.3 Modifying High-Performance Inference Configuration](#23-modifying-high-performance-inference-configuration)
- - [2.4 Example of Modifying High-Performance Inference Configuration](#24-example-of-modifying-high-performance-inference-configuration)
- - [2.5 Enabling/Disabling High-Performance Inference in Sub-pipelines/Sub-modules](#25-enablingdisabling-high-performance-inference-in-sub-pipelinessub-modules)
- - [2.6 Model Caching Instructions](#26-model-caching-instructions)
- - [2.7 Customizing Model Inference Libraries](#27-customizing-model-inference-libraries)
-- [3. Frequently Asked Questions](#3.-frequently-asked-questions)
+- [1. Basic Usage](#1-basic-usage)
+ - [1.1 Installing the High-Performance Inference Plugin](#11-installing-the-high-performance-inference-plugin)
+ - [1.2 Enabling the High-Performance Inference Plugin](#12-enabling-the-high-performance-inference-plugin)
+- [2. Advanced Usage](#2-advanced-usage)
+ - [2.1 Working Modes of High-Performance Inference](#21-working-modes-of-high-performance-inference)
+ - [2.2 High-Performance Inference Configuration](#22-high-performance-inference-configuration)
+ - [2.3 Modifying the High-Performance Inference Configuration](#23-modifying-the-high-performance-inference-configuration)
+ - [2.4 Enabling/Disabling the High-Performance Inference Plugin on Sub-pipelines/Submodules](#24-enablingdisabling-the-high-performance-inference-plugin-on-sub-pipelinessubmodules)
+ - [2.5 Model Cache Description](#25-model-cache-description)
+ - [2.6 Customizing the Model Inference Library](#26-customizing-the-model-inference-library)
+- [3. Frequently Asked Questions](#3-frequently-asked-questions)

## 1. Basic Usage

-Before using the high-performance inference plugin, ensure you have completed the installation of PaddleX according to the [PaddleX Local Installation Tutorial](../installation/installation.en.md) and successfully run the quick inference using the PaddleX pipeline command-line instructions or Python script instructions.
+Before using the high-performance inference plugin, please ensure that you have completed the PaddleX installation according to the [PaddleX Local Installation Tutorial](../installation/installation.en.md) and have successfully run quick inference using the PaddleX pipeline CLI or Python API as described in the usage instructions.

-High-performance inference supports processing PaddlePaddle static models( `.pdmodel`, `.json` ) and ONNX format models( `.onnx` )**. For ONNX format models, it is recommended to use the [Paddle2ONNX plugin](./paddle2onnx.en.md) for conversion. If multiple format models exist in the model directory, PaddleX will automatically select them as needed.
+High-performance inference supports handling **PaddlePaddle static graph models (`.pdmodel`, `.json`)** and **ONNX format models (`.onnx`)**. For ONNX format models, it is recommended to convert them using the [Paddle2ONNX plugin](./paddle2onnx.en.md), as sketched below. If multiple model formats are present in the model directory, PaddleX will automatically choose the appropriate one as needed.

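+For reference, a conversion with the Paddle2ONNX plugin can be run from the command line. The following sketch assumes the plugin has been installed via `paddlex --install paddle2onnx`, and that the flags match the plugin documentation linked above; the paths are placeholders:
+
+```bash
+# Convert a PaddlePaddle static graph model to ONNX format
+paddlex \
+    --paddle2onnx \
+    --paddle_model_dir ./your_paddle_model_dir \
+    --onnx_model_dir ./your_onnx_model_dir
+```
+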
### 1.1 Installing the High-Performance Inference Plugin

-The processor architectures, operating systems, device types, and Python versions currently supported by high-performance inference are shown in the table below:
+Currently, the supported processor architectures, operating systems, device types, and Python versions for high-performance inference are as follows:

<table>
<tr>
@@ -47,7 +46,7 @@ The processor architectures, operating systems, device types, and Python version
<td>3.8–3.12</td>
</tr>
<tr>
- <td>GPU (CUDA 11.8 + cuDNN 8.9)</td>
+ <td>GPU (CUDA 11.8 + cuDNN 8.9)</td>
<td>3.8–3.12</td>
</tr>
<tr>
@@ -55,91 +54,95 @@ The processor architectures, operating systems, device types, and Python version
<td>3.10</td>
</tr>
<tr>
- <td>aarch64</td>
+ <td>AArch64</td>
<td>NPU</td>
<td>3.10</td>
</tr>
</table>

-#### (1) Installing the High-Performance Inference Plugin Based on Docker (Highly Recommended):
+#### (1) Installing the High-Performance Inference Plugin in a Docker Container (Highly Recommended):

-Refer to [Get PaddleX based on Docker](../installation/installation.en.md#21-obtaining-paddlex-based-on-docker) to use Docker to start the PaddleX container. After starting the container, execute the following commands according to the device type to install the high-performance inference plugin:
+Refer to [Get PaddleX based on Docker](../installation/installation.en.md#21-obtaining-paddlex-based-on-docker) to start a PaddleX container using Docker. After starting the container, execute the following commands according to your device type to install the high-performance inference plugin:

<table>
- <thead>
- <tr>
- <th>Device Type</th>
- <th>Installation Command</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>CPU</td>
- <td><code>paddlex --install hpi-cpu</code></td>
- <td>Installs the CPU version of high-performance inference.</td>
- </tr>
- <tr>
- <td>GPU</td>
- <td><code>paddlex --install hpi-gpu</code></td>
- <td>Installs the GPU version of high-performance inference.<br />Includes all features of the CPU version.</td>
- </tr>
- <tr>
- <td>NPU</td>
- <td><code>paddlex --install hpi-npu</code></td>
- <td>Installs the NPU version of high-performance inference.<br />For usage instructions, please refer to the <a href="../practical_tutorials/high_performance_npu_tutorial.en.md">Ascend NPU High-Performance Inference Tutorial</a>.</td>
- </tr>
- </tbody>
+ <thead>
+ <tr>
+ <th>Device Type</th>
+ <th>Installation Command</th>
+ <th>Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>CPU</td>
+ <td><code>paddlex --install hpi-cpu</code></td>
+ <td>Installs the CPU version of the high-performance inference functionality.</td>
+ </tr>
+ <tr>
+ <td>GPU</td>
+ <td><code>paddlex --install hpi-gpu</code></td>
+ <td>Installs the GPU version of the high-performance inference functionality.<br />Includes all functionalities of the CPU version.</td>
+ </tr>
+ </tbody>
</table>

-#### (2) Local Installation of High-Performance Inference Plugin:
+In the official PaddleX Docker image, TensorRT is installed by default. The high-performance inference plugin can therefore accelerate inference using the Paddle Inference TensorRT subgraph engine.
+
+**Please note that the aforementioned Docker image refers to the official PaddleX image described in [Get PaddleX based on Docker](../installation/installation.en.md#21-obtaining-paddlex-based-on-docker), rather than the official PaddlePaddle image described in the [PaddlePaddle Local Installation Tutorial](../installation/paddlepaddle_install.en.md#installing-paddlepaddle-via-docker). For the latter, please refer to the local installation instructions for the high-performance inference plugin.**

-##### Installing the High-Performance Inference Plugin for CPU:
+#### (2) Installing the High-Performance Inference Plugin Locally (Not Recommended):

-Execute:
+##### To install the CPU version of the high-performance inference plugin:
+
+Run:

```bash
paddlex --install hpi-cpu
```


-##### Installing the High-Performance Inference Plugin for GPU:
+##### To install the GPU version of the high-performance inference plugin:

-Refer to the [NVIDIA official website](https://developer.nvidia.com/) to install CUDA and cuDNN locally, then execute:
+Before installation, please ensure that CUDA and cuDNN are installed in your environment. PaddleX currently provides precompiled packages only for CUDA 11.8 + cuDNN 8.9, so please make sure that the installed CUDA and cuDNN versions are compatible with the versions used for compilation. Below are the installation documentation links for CUDA 11.8 and cuDNN 8.9, followed by commands to check the locally installed versions:

-```bash
-paddlex --install hpi-gpu
-```
+- [Install CUDA 11.8](https://developer.nvidia.com/cuda-11-8-0-download-archive)
+- [Install cuDNN 8.9](https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-890/install-guide/index.html)

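+To check which CUDA and cuDNN versions are installed locally, the following commands can help (the cuDNN header path is a typical default and may differ on your system):
+
+```bash
+# CUDA toolkit version reported by the NVCC compiler
+nvcc --version
+# cuDNN version, assuming the headers are installed under /usr/include
+grep -A 2 "#define CUDNN_MAJOR" /usr/include/cudnn_version.h
+```
+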
-The required CUDA and cuDNN versions can be obtained through the following commands:
+If you are using the official PaddlePaddle image, the CUDA and cuDNN versions in the image already meet the requirements, so there is no need for a separate installation.
+
+If PaddlePaddle is installed via pip, the relevant CUDA and cuDNN Python packages are usually installed automatically. In this case, **you still need to install the non-Python-specific CUDA and cuDNN**. It is also advisable to install CUDA and cuDNN versions that match the versions of the Python packages in your environment, to avoid potential issues caused by coexisting libraries of different versions. You can check the versions of the CUDA- and cuDNN-related Python packages as follows:

```bash
-# CUDA version
+# For CUDA related Python packages
pip list | grep nvidia-cuda
-# cuDNN version
+# For cuDNN related Python packages
pip list | grep nvidia-cudnn
```

-Reference documents for installing CUDA 11.8 and cuDNN 8.9:
-- [Install CUDA 11.8](https://developer.nvidia.com/cuda-11-8-0-download-archive)
-- [Install cuDNN 8.9](https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-890/install-guide/index.html)
+If you wish to use the Paddle Inference TensorRT subgraph engine, you also need to install TensorRT. Please refer to the related instructions in the [PaddlePaddle Local Installation Tutorial](../installation/paddlepaddle_install.en.md). Note that because the underlying inference library of the high-performance inference plugin also integrates TensorRT, it is recommended to install the same TensorRT version to avoid version conflicts. Currently, the TensorRT version integrated into the plugin's underlying inference library is 8.6.1.6. If you are using the official PaddlePaddle image, you do not need to worry about version conflicts.
+
+After confirming that the correct versions of CUDA, cuDNN, and (optionally) TensorRT are installed, run:

-**Notes**:
+```bash
+paddlex --install hpi-gpu
+```

-1. **GPUs only support CUDA 11.8 + cuDNN 8.9**, and support for CUDA 12.6 is under development.
+##### To install the NPU version of the high-performance inference plugin:

-2. Only one version of the high-performance inference plugin should exist in the same environment.
+Please refer to the [Ascend NPU High-Performance Inference Tutorial](../practical_tutorials/high_performance_npu_tutorial.en.md).

-3. For instructions on high-performance inference using NPU devices, refer to the [Ascend NPU High-Performance Inference Tutorial](../practical_tutorials/high_performance_npu_tutorial.md).
+**Notes:**

-4. Windows only supports installing and using the high-performance inference plugin via Docker.
+1. **Currently, PaddleX officially provides precompiled packages only for CUDA 11.8 + cuDNN 8.9**; support for CUDA 12.6 is in progress.
+2. Only one version of the high-performance inference plugin should exist in the same environment.
+3. On Windows, it is currently recommended to install and use the high-performance inference plugin within a Docker container.

-### 1.2 Enabling High-Performance Inference
+### 1.2 Enabling the High-Performance Inference Plugin

-Below are examples of enabling high-performance inference in the general image classification pipeline and image classification module using PaddleX CLI and Python API.
+Below are examples of enabling the high-performance inference plugin via the PaddleX CLI and Python API, for the general image classification pipeline and the image classification module.

-For PaddleX CLI, specify `--use_hpip` to enable high-performance inference.
+For the PaddleX CLI, specify `--use_hpip` to enable the high-performance inference plugin.

-General Image Classification Pipeline:
+**General Image Classification Pipeline:**

```bash
paddlex \
@@ -149,7 +152,7 @@ paddlex \
--use_hpip
```

-Image Classification Module:
+**Image Classification Module:**

```bash
python main.py \
@@ -161,9 +164,9 @@ python main.py \
-o Predict.use_hpip=True
```

-For the PaddleX Python API, the method to enable high-performance inference is similar. Taking the General Image Classification Pipeline and Image Classification Module as examples:
+For the PaddleX Python API, enabling the high-performance inference plugin is similar. For example:

-General Image Classification Pipeline:
+**General Image Classification Pipeline:**

```python
from paddlex import create_pipeline
@@ -177,7 +180,7 @@ pipeline = create_pipeline(
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
```

-Image Classification Module:
+**Image Classification Module:**

```python
from paddlex import create_model
@@ -191,29 +194,29 @@ model = create_model(
output = model.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
```

-The inference results obtained with the high-performance inference plugin enabled are consistent with those without the plugin. For some models, **it may take a longer time to complete the construction of the inference engine when enabling the high-performance inference plugin for the first time**. PaddleX will cache relevant information in the model directory after the first construction of the inference engine and reuse the cached content in subsequent runs to improve initialization speed.
+The inference results obtained with the high-performance inference plugin enabled are identical to those without it. For some models, **the first time the high-performance inference plugin is enabled, building the inference engine may take a relatively long time**. PaddleX caches the relevant information in the model directory after the engine is built for the first time, and subsequently reuses the cached content to improve initialization speed.

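+As an illustration of this caching behavior, the following sketch times two consecutive initializations of the same model; the second one can reuse the cache built by the first (model name and device are placeholders from the examples above):
+
+```python
+import time
+
+from paddlex import create_model
+
+# First initialization may be slow: the inference engine is built and
+# the result is cached under the model directory's .cache folder.
+start = time.perf_counter()
+model = create_model(model_name="ResNet18", device="gpu", use_hpip=True)
+print(f"First initialization took {time.perf_counter() - start:.1f} s")
+
+# A second initialization reuses the cache and is typically much faster.
+start = time.perf_counter()
+model = create_model(model_name="ResNet18", device="gpu", use_hpip=True)
+print(f"Cached initialization took {time.perf_counter() - start:.1f} s")
+```
+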
-**Enabling high-performance inference by default affects the entire pipeline/module**. If you want to control the scope of application with finer granularity, such as enabling the high-performance inference plugin for only a specific sub-pipeline or sub-module within the pipeline, you can set `use_hpip` at different levels of configuration in the pipeline configuration file. Please refer to [2.5 Enabling/Disabling High-Performance Inference in Sub-pipelines/Sub-modules](#25-enablingdisabling-high-performance-inference-in-sub-pipelinessub-modules).
+**By default, enabling the high-performance inference plugin applies to the entire pipeline/module.** If you want to control the scope in a more granular way (e.g., enabling the high-performance inference plugin for only a sub-pipeline or a submodule), you can set the `use_hpip` parameter at different configuration levels in the pipeline configuration file. Please refer to [2.4 Enabling/Disabling the High-Performance Inference Plugin on Sub-pipelines/Submodules](#24-enablingdisabling-the-high-performance-inference-plugin-on-sub-pipelinessubmodules) for more details.

## 2. Advanced Usage

-This section introduces the advanced usage of high-performance inference, suitable for users who have some understanding of model deployment or wish to manually configure and optimize. Users can customize the use of high-performance inference based on their own needs by referring to the configuration instructions and examples. Next, the advanced usage methods will be introduced in detail.
+This section introduces the advanced usage of the high-performance inference plugin, which is suitable for users who have a good understanding of model deployment or wish to adjust the configuration manually. By referring to the configuration instructions and examples below, users can customize how the high-performance inference plugin is used according to their requirements. The following sections describe the advanced usage in detail.

-### 2.1 High-Performance Inference Modes
+### 2.1 Working Modes of High-Performance Inference

-High-performance inference is divided into two modes:
+The high-performance inference plugin supports two working modes. The working mode can be switched by modifying the high-performance inference configuration.

#### (1) Safe Auto-Configuration Mode

-The safe auto-configuration mode has a protection mechanism and **automatically selects the configuration with better performance for the current environment by default**. In this mode, users can override the default configuration, but the provided configuration will be checked, and PaddleX will reject unavailable configurations based on prior knowledge. This is the default mode.
+In safe auto-configuration mode, a protective mechanism is enabled and, by default, **the configuration with the best performance for the current environment is selected automatically**. In this mode, users can override the default configuration, but the provided configuration is checked, and PaddleX rejects configurations that are known to be unavailable based on prior knowledge. This is the default working mode.

#### (2) Unrestricted Manual Configuration Mode

-The unrestricted manual configuration mode provides complete configuration freedom, allowing **free selection of the inference backend and modification of backend configurations**, but cannot guarantee successful inference. This mode is suitable for experienced users with specific needs for the inference backend and its configurations and is recommended for use after familiarizing with high-performance inference.
+In unrestricted manual configuration mode, users have complete configuration freedom: they can **freely choose the inference backend and modify its configuration**, but successful inference is not guaranteed. This mode is suitable for experienced users with specific requirements for the inference backend and its configuration, and it is advisable to use it only after becoming familiar with high-performance inference.

### 2.2 High-Performance Inference Configuration

-Common high-performance inference configurations include the following fields:
+Common configuration fields for high-performance inference include:

<table>
<thead>
@@ -227,32 +230,32 @@ Common high-performance inference configurations include the following fields:
<tbody>
<tr>
<td><code>auto_config</code></td>
-<td>Whether to enable the safe auto-configuration mode.<br /><code>True</code> to enable, <code>False</code> to enable the unrestricted manual configuration mode.</td>
+<td>Whether to enable the safe auto-configuration mode.<br /><code>True</code> enables safe auto-configuration mode, <code>False</code> enables the unrestricted manual configuration mode.</td>
<td><code>bool</code></td>
<td><code>True</code></td>
</tr>
<tr>
<td><code>backend</code></td>
- <td>Specifies the inference backend to use. Cannot be <code>None</code> in unrestricted manual configuration mode.</td>
+ <td>Specifies the inference backend to use. In unrestricted manual configuration mode, it cannot be <code>None</code>.</td>
<td><code>str | None</code></td>
<td><code>None</code></td>
</tr>
<tr>
<td><code>backend_config</code></td>
- <td>The configuration of the inference backend, which can override the default configuration items of the backend if it is not <code>None</code>.</td>
+ <td>The configuration for the inference backend. If not <code>None</code>, it can override the default backend configuration options.</td>
<td><code>dict | None</code></td>
<td><code>None</code></td>
</tr>
<tr>
<td><code>auto_paddle2onnx</code></td>
- <td>Whether to enable the <a href="./paddle2onnx.en.md">Paddle2ONNX plugin</a> to automatically convert Paddle models to ONNX models.</td>
+ <td>Whether to enable the <a href="./paddle2onnx.en.md">Paddle2ONNX plugin</a> to automatically convert a Paddle model to an ONNX model.</td>
<td><code>bool</code></td>
<td><code>True</code></td>
</tr>
</tbody>
</table>

-The available options for `backend` are shown in the following table:
+The optional values for `backend` are as follows:

<table>
<tr>
@@ -262,12 +265,12 @@ The available options for `backend` are shown in the following table:
</tr>
<tr>
<td><code>paddle</code></td>
- <td>Paddle Inference engine, supporting the Paddle Inference TensorRT subgraph engine to improve GPU inference performance of models.</td>
+ <td>Paddle Inference engine; supports enhancing GPU inference performance using the Paddle Inference TensorRT subgraph engine.</td>
<td>CPU, GPU</td>
</tr>
<tr>
<td><code>openvino</code></td>
- <td><a href="https://github.com/openvinotoolkit/openvino">OpenVINO</a>, a deep learning inference tool provided by Intel, optimized for model inference performance on various Intel hardware.</td>
+ <td><a href="https://github.com/openvinotoolkit/openvino">OpenVINO</a>, a deep learning inference tool provided by Intel, optimized for inference performance on various Intel hardware.</td>
<td>CPU</td>
</tr>
<tr>
@@ -277,126 +280,116 @@ The available options for `backend` are shown in the following table:
</tr>
<tr>
<td><code>tensorrt</code></td>
- <td><a href="https://developer.nvidia.com/tensorrt">TensorRT</a>, a high-performance deep learning inference library provided by NVIDIA, optimized for NVIDIA GPUs to improve speed.</td>
+ <td><a href="https://developer.nvidia.com/tensorrt">TensorRT</a>, a high-performance deep learning inference library provided by NVIDIA, optimized for NVIDIA GPUs to enhance speed.</td>
<td>GPU</td>
</tr>
<tr>
<td><code>om</code></td>
- <td>a inference engine of offline model format customized for Huawei Ascend NPU, deeply optimized for hardware to reduce operator computation time and scheduling time, effectively improving inference performance.</td>
+ <td>The inference engine for the offline model format customized for the Huawei Ascend NPU, deeply optimized for the hardware to reduce operator computation and scheduling time, effectively enhancing inference performance.</td>
<td>NPU</td>
</tr>
</table>

-The available values for `backend_config` vary depending on the backend, as shown in the following table:
+The available options for `backend_config` vary for different backends, as shown in the following table:

<table>
<tr>
<th>Backend</th>
- <th>Available Values</th>
+ <th>Options</th>
</tr>
<tr>
<td><code>paddle</code></td>
- <td>Refer to <a href="../module_usage/instructions/model_python_API.en.md">PaddleX Single Model Python Usage Instructions: 4. Inference Backend Configuration</a>.</td>
+ <td>Refer to <a href="../module_usage/instructions/model_python_API.en.md">PaddleX Single-Model Python Script Usage: 4. Inference Backend Settings</a>.</td>
</tr>
<tr>
<td><code>openvino</code></td>
- <td><code>cpu_num_threads</code>: The number of logical processors used for CPU inference. Default is <code>8</code>.</td>
+ <td><code>cpu_num_threads</code>: The number of logical processors used for CPU inference. The default is <code>8</code>.</td>
</tr>
<tr>
<td><code>onnxruntime</code></td>
- <td><code>cpu_num_threads</code>: The number of parallel computing threads within operators for CPU inference. Default is <code>8</code>.</td>
+ <td><code>cpu_num_threads</code>: The number of parallel computation threads within an operator during CPU inference. The default is <code>8</code>.</td>
</tr>
<tr>
<td><code>tensorrt</code></td>
<td>
- <code>precision</code>: The precision used, <code>fp16</code> or <code>fp32</code>. Default is <code>fp32</code>.
+ <code>precision</code>: The precision used, either <code>fp16</code> or <code>fp32</code>. The default is <code>fp32</code>.
<br />
- <code>dynamic_shapes</code>: Dynamic shapes. Dynamic shapes include minimum shape, optimal shape, and maximum shape, which represent TensorRT’s ability to defer specifying some or all tensor dimensions until runtime. The format is:<code>{input tensor name}: [{minimum shape}, [{optimal shape}], [{maximum shape}]]</code>. For more information, please refer to the <a href="https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes">TensorRT official documentation.</a>。
+ <code>dynamic_shapes</code>: Dynamic shapes, consisting of the minimum shape, optimal shape, and maximum shape; they represent TensorRT’s ability to delay specifying some or all tensor dimensions until runtime. The format is <code>{input tensor name}: [[minimum shape], [optimal shape], [maximum shape]]</code>. For more details, please refer to the <a href="https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes">TensorRT official documentation</a>.
+ </td>
+ </tr>
<tr>
<td><code>om</code></td>
- <td>None</td>
+ <td>None at the moment</td>
</tr>
</table>
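
+For instance, to run in unrestricted manual configuration mode with an explicitly chosen backend and backend configuration, the fields above can be combined as in the following sketch (the backend choice and precision here are illustrative, not recommendations):
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(
+    pipeline="OCR",
+    device="gpu",
+    use_hpip=True,
+    hpi_config={
+        # Disable safe auto-configuration: the settings below are not checked.
+        "auto_config": False,
+        # The backend must be set explicitly in this mode.
+        "backend": "tensorrt",
+        # Override the backend's default options.
+        "backend_config": {"precision": "fp16"},
+    },
+)
+```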

-### 2.3 How to Modify High-Performance Inference Configuration
-
-Due to the diversity of actual deployment environments and requirements, the default configuration may not meet all needs. In such cases, manual adjustments to the high-performance inference configuration may be necessary. Here are two common scenarios:
-
-- Needing to change the inference backend.
- - For example, in an OCR pipeline, specifying the `text_detection` module to use the `onnxruntime` backend and the `text_recognition` module to use the `tensorrt` backend.
+### 2.3 Modifying the High-Performance Inference Configuration

-- Needing to modify the dynamic shape configuration for TensorRT:
- - When the default dynamic shape configuration cannot meet requirements (e.g., the model may require input shapes outside the specified range), dynamic shapes need to be specified for each input tensor. After modification, the model's `.cache` directory should be cleaned up.
-
-In these scenarios, users can modify the configuration by altering the `hpi_config` field in the **pipeline/module configuration file**, **CLI** parameters, or **Python API** parameters. **Parameters passed through CLI or Python API will override settings in the pipeline/module configuration file**.
-
-### 2.4 Examples of Modifying High-Performance Inference Configuration
+Due to the diversity of actual deployment environments and requirements, the default configuration might not meet all needs. In such cases, manual adjustment of the high-performance inference configuration may be necessary. Users can modify the configuration by editing the **pipeline/module configuration file**, or by passing the `hpi_config` field as a parameter via the **CLI** or **Python API**. **Parameters passed via the CLI or Python API override the settings in the pipeline/module configuration file.** The following examples illustrate how to modify the configuration.

#### (1) Changing the Inference Backend

-##### Using the `onnxruntime` backend for all models in a general OCR pipeline:
+##### For the general OCR pipeline, use the `onnxruntime` backend for all models:

-<details><summary>👉 1. Modifying the pipeline configuration file (click to expand)</summary>
+<details><summary>👉 1. Modify via Pipeline Configuration File (click to expand)</summary>

```yaml
pipeline_name: OCR

-use_hpip: True
hpi_config:
  backend: onnxruntime

...
```

-</details>
-<details><summary>👉 2. CLI parameter passing method (click to expand)</summary>
+</details>
+<details><summary>👉 2. CLI Parameter Method (click to expand)</summary>

```bash
paddlex \
    --pipeline OCR \
    --input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png \
    --device gpu:0 \
    --use_hpip \
    --hpi_config '{"backend": "onnxruntime"}'
```

-</details>
-<details><summary>👉 3. Python API parameter passing method (click to expand)</summary>
+</details>
+<details><summary>👉 3. Python API Parameter Method (click to expand)</summary>

```python
from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="OCR",
    device="gpu",
    use_hpip=True,
    hpi_config={"backend": "onnxruntime"}
)
```

</details>

-##### Using the `onnxruntime` backend for the image classification module:
+##### For the image classification module, use the `onnxruntime` backend:

-<details><summary>👉 1. Modifying the module configuration file (click to expand)</summary>
+<details><summary>👉 1. Modify via Module Configuration File (click to expand)</summary>

```yaml
# paddlex/configs/modules/image_classification/ResNet18.yaml
...
Predict:
  ...
-  use_hpip: True
  hpi_config:
    backend: onnxruntime
  ...
...
```

-</details>
-<details><summary>👉 2. CLI parameter passing method (click to expand)</summary>
+</details>
+<details><summary>👉 2. CLI Parameter Method (click to expand)</summary>

```bash
python main.py \
    -c paddlex/configs/modules/image_classification/ResNet18.yaml \
    -o Global.mode=predict \
    -o Predict.model_dir=None \
@@ -404,34 +397,34 @@ python main.py \
    -o Global.device=gpu:0 \
    -o Predict.use_hpip=True \
    -o Predict.hpi_config='{"backend": "onnxruntime"}'
```

-</details>
-<details><summary>👉 3. Python API parameter passing method (click to expand)</summary>
+</details>
+<details><summary>👉 3. Python API Parameter Method (click to expand)</summary>

```python
from paddlex import create_model

model = create_model(
    model_name="ResNet18",
    device="gpu",
    use_hpip=True,
    hpi_config={"backend": "onnxruntime"}
)
```

</details>

-##### Using the `onnxruntime` backend for the `text_detection` module and the `tensorrt` backend for the `text_recognition` module in a general OCR pipeline:
+##### For the general OCR pipeline, use the `onnxruntime` backend for the `text_detection` module and the `tensorrt` backend for the `text_recognition` module:

-<details><summary>👉 1. Modifying the pipeline configuration file (click to expand)</summary>
+<details><summary>👉 1. Modify via Pipeline Configuration File (click to expand)</summary>

```yaml
pipeline_name: OCR

...

SubModules:
  TextDetection:
    module_name: text_detection
    model_name: PP-OCRv4_mobile_det
@@ -441,11 +434,8 @@ SubModules:
    thresh: 0.3
    box_thresh: 0.6
    unclip_ratio: 2.0
-    # Enable high-performance inference for the current submodule
-    use_hpip: True
-    # High-performance inference configuration for the current submodule
    hpi_config:
      backend: onnxruntime
  TextLineOrientation:
    module_name: textline_orientation
    model_name: PP-LCNet_x0_25_textline_ori
@@ -457,22 +447,19 @@ SubModules:
    model_dir: null
    batch_size: 6
    score_thresh: 0.0
-    # Enable high-performance inference for the current submodule
-    use_hpip: True
-    # High-performance inference configuration for the current submodule
    hpi_config:
      backend: tensorrt
```

</details>

-#### (2) Modify TensorRT's Dynamic Shape Configuration
+#### (2) Modifying the TensorRT Dynamic Shape Configuration

-##### Modifying dynamic shape configuration for general image classification pipeline:
+##### For the general image classification pipeline, modify the dynamic shape configuration:

-<details><summary>👉 Click to Expand</summary>
+<details><summary>👉 Click to expand</summary>

```yaml
...
SubModules:
  ImageClassification:
@@ -480,7 +467,6 @@ SubModules:
    hpi_config:
      backend: tensorrt
      backend_config:
-        precision: fp32
        dynamic_shapes:
          x:
            - [1, 3, 300, 300]
@@ -488,48 +474,46 @@ SubModules:
            - [32, 3, 1200, 1200]
    ...
...
```

</details>

-##### Modifying dynamic shape configuration for image classification module:
+##### For the image classification module, modify the dynamic shape configuration:

-<details><summary>👉 Click to Expand</summary>
+<details><summary>👉 Click to expand</summary>

```yaml
...
Predict:
  ...
-  use_hpip: True
  hpi_config:
    backend: tensorrt
    backend_config:
-      precision: fp32
      dynamic_shapes:
        x:
          - [1, 3, 300, 300]
          - [4, 3, 300, 300]
          - [32, 3, 1200, 1200]
  ...
...
```

</details>

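+The dynamic shape configuration can also be passed without editing any configuration file, by supplying the same structure through the CLI parameter introduced earlier (the values here are illustrative):
+
+```bash
+paddlex \
+    --pipeline image_classification \
+    --input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
+    --device gpu:0 \
+    --use_hpip \
+    --hpi_config '{"backend": "tensorrt", "backend_config": {"dynamic_shapes": {"x": [[1, 3, 300, 300], [4, 3, 300, 300], [32, 3, 1200, 1200]]}}}'
+```
+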
-### 2.5 Enabling/Disabling High-Performance Inference in Sub-pipelines/Sub-modules
+### 2.4 Enabling/Disabling the High-Performance Inference Plugin on Sub-pipelines/Submodules

-High-performance inference support allows **only specific sub-pipelines/sub-modules within a pipeline to use high-performance inference** by utilizing `use_hpip` at the sub-pipeline/sub-module level. Examples are as follows:
+High-performance inference supports enabling the high-performance inference plugin for only specific sub-pipelines/submodules by configuring `use_hpip` at the sub-pipeline or submodule level. For example:

-##### Enabling High-Performance Inference for the `text_detection` module in general OCR pipeline, while disabling it for the `text_recognition` module:
+##### In the general OCR pipeline, enable high-performance inference for the `text_detection` module, but not for the `text_recognition` module:

-<details><summary>👉 Click to Expand</summary>
+<details><summary>👉 Click to expand</summary>

```yaml
pipeline_name: OCR

...

SubModules:
  TextDetection:
    module_name: text_detection
    model_name: PP-OCRv4_mobile_det
@@ -539,40 +523,43 @@ SubModules:
    thresh: 0.3
    box_thresh: 0.6
    unclip_ratio: 2.0
-    use_hpip: True # Enable high-performance inference for the current sub-module
+    use_hpip: True # This submodule uses high-performance inference
  TextLineOrientation:
    module_name: textline_orientation
    model_name: PP-LCNet_x0_25_textline_ori
    model_dir: null
    batch_size: 6
+    # This submodule does not have a specific configuration; it defaults to the global configuration
+    # (if neither the configuration file nor CLI/API parameters set it, high-performance inference will not be used)
  TextRecognition:
    module_name: text_recognition
    model_name: PP-OCRv4_mobile_rec
    model_dir: null
    batch_size: 6
    score_thresh: 0.0
-    use_hpip: False # Disable high-performance inference for the current sub-module
+    use_hpip: False # This submodule does not use high-performance inference
```

</details>

-**Notes**:
+**Notes:**

-1. When setting `use_hpip` in a sub-pipeline or sub-module, the deepest-level configuration takes precedence.
+1. When setting `use_hpip` in sub-pipelines or submodules, the configuration at the deepest level takes precedence.
+2. **When enabling or disabling the high-performance inference plugin by modifying the pipeline configuration file, it is not recommended to also configure it via the CLI or Python API.** Setting `use_hpip` through the CLI or Python API is equivalent to modifying the top-level `use_hpip` in the configuration file.

-2. **It is strongly recommended to enable high-performance inference by modifying the pipeline configuration file**, rather than using CLI or Python API settings. Enabling `use_hpip` through CLI or Python API is equivalent to setting `use_hpip` at the top level of the configuration file.
+### 2.5 Model Cache Description

-### 2.6 Model Cache Description
+The model cache is stored in the `.cache` directory under the model directory, including files such as `shape_range_info.pbtxt` and files whose names start with `trt_serialized`, which are generated when the `tensorrt` or `paddle` backend is used.

-The model cache will be stored in the `.cache` directory under the model directory, including files such as `shape_range_info.pbtxt` and those prefixed with `trt_serialized` generated when using the `tensorrt` or `paddle` backend.
+**After modifying TensorRT-related configurations, it is recommended to clear the cache to prevent the new configuration from being overridden by the cached content.**

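+A minimal way to clear the cache is to delete the `.cache` directory under the model directory (the path below is a placeholder):
+
+```bash
+rm -rf /path/to/your_model_dir/.cache
+```
+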
When the `auto_paddle2onnx` option is enabled, an `inference.onnx` file may be automatically generated in the model directory.

-### 2.7 Custom Model Inference Library
+### 2.6 Customizing the Model Inference Library

-`ultra-infer` is the underlying model inference library for high-performance inference, located in the `PaddleX/libs/ultra-infer` directory. The compilation script is located at `PaddleX/libs/ultra-infer/scripts/linux/set_up_docker_and_build_py.sh`. The default compilation builds the GPU version and includes OpenVINO, TensorRT, and ONNX Runtime as inference backends for `ultra-infer`.
+`ultra-infer` is the model inference library that the high-performance inference plugin depends on. It is maintained as a sub-project in the `PaddleX/libs/ultra-infer` directory. PaddleX provides a build script for `ultra-infer` at `PaddleX/libs/ultra-infer/scripts/linux/set_up_docker_and_build_py.sh`. By default, the script builds the GPU version of `ultra-infer` and integrates three inference backends: OpenVINO, TensorRT, and ONNX Runtime.

-When compiling customized versions, you can modify the following options as needed:
+If you need a customized build of `ultra-infer`, you can modify the following options in the build script according to your requirements:

<table>
<thead>
@@ -584,69 +571,66 @@ When compiling customized versions, you can modify the following options as need
<tbody>
<tr>
<td>http_proxy</td>
- <td>Use a specific HTTP proxy when downloading third-party libraries, default is empty</td>
+ <td>The HTTP proxy used when downloading third-party libraries; default is empty.</td>
</tr>
<tr>
<td>PYTHON_VERSION</td>
- <td>Python version, default is <code>3.10.0</code></td>
+ <td>Python version, default is <code>3.10.0</code>.</td>
</tr>
<tr>
<td>WITH_GPU</td>
- <td>Whether to compile support for Nvidia-GPU, default is <code>ON</code></td>
+ <td>Whether to enable NVIDIA GPU support, default is <code>ON</code>.</td>
</tr>
<tr>
<td>ENABLE_ORT_BACKEND</td>
- <td>Whether to compile and integrate the ONNX Runtime backend, default is <code>ON</code></td>
+ <td>Whether to integrate the ONNX Runtime backend, default is <code>ON</code>.</td>
</tr>
<tr>
<td>ENABLE_TRT_BACKEND</td>
- <td>Whether to compile and integrate the TensorRT backend (GPU only), default is <code>ON</code></td>
+ <td>Whether to integrate the TensorRT backend (GPU-only), default is <code>ON</code>.</td>
</tr>
<tr>
<td>ENABLE_OPENVINO_BACKEND</td>
- <td>Whether to compile and integrate the OpenVINO backend (CPU only), default is <code>ON</code></td>
+ <td>Whether to integrate the OpenVINO backend (CPU-only), default is <code>ON</code>.</td>
</tr>
</tbody>
</table>

-Compilation Example:
+Example:

```shell
-# Compilation
+# Build
+cd PaddleX/libs/ultra-infer/scripts/linux
# export PYTHON_VERSION=...
# export WITH_GPU=...
# export ENABLE_ORT_BACKEND=...
# export ...
-
-cd PaddleX/libs/ultra-infer/scripts/linux
bash set_up_docker_and_build_py.sh

-# Installation
+# Install
python -m pip install ../../python/dist/ultra_infer*.whl
```

## 3. Frequently Asked Questions

-**1. Why is the inference speed similar to regular inference after using the high-performance inference feature?**
-
-High-performance inference accelerates inference by intelligently selecting backends, but due to factors such as model complexity or unsupported operators, some models may not be able to use accelerated backends (like OpenVINO, TensorRT, etc.). In such cases, relevant information will be prompted in the logs, and the **fastest available backend** known will be selected, potentially reverting to regular inference.
+**1. Why does the inference speed not appear to improve noticeably after enabling the high-performance inference plugin?**

The high-performance inference plugin accelerates inference by intelligently selecting the backend.

-For modules, due to model complexity or unsupported operators, some models may not be able to use accelerated backends (such as OpenVINO, TensorRT, etc.). In such cases, relevant information will be prompted in the logs, and the **fastest available backend** known will be selected, potentially falling back to regular inference.
+For modules, due to model complexity or unsupported operators, some models may not be able to use acceleration backends (such as OpenVINO, TensorRT, etc.). In such cases, corresponding messages are logged, and the **fastest available backend** known is chosen, which may fall back to standard inference.

-For pipelines, the performance bottleneck may not be in the model inference stage.
+For model pipelines, the performance bottleneck may not lie in the model inference stage.

-You can use the [PaddleX benchmark](../module_usage/instructions/benchmark.md) tool to conduct actual speed tests for a more accurate performance assessment.
+You can use the [PaddleX benchmark](../module_usage/instructions/benchmark.en.md) tool to conduct actual speed tests for a more accurate performance evaluation.

-**2. Does the high-performance inference feature support all model pipelines and modules?**
+**2. Does the high-performance inference functionality support all model pipelines and modules?**

-The high-performance inference feature supports all model pipelines and modules, but some models may not experience accelerated inference. Specific reasons can be referred to in Question 1.
+The high-performance inference functionality supports all model pipelines and modules, but some models may not see an acceleration effect, for the reasons described in FAQ 1.

-**3. Why does the installation of the high-performance inference plugin fail, with the log displaying: "Currently, the CUDA version must be 11.x for GPU devices."?**
+**3. Why does the installation of the high-performance inference plugin fail with a log message stating: “Currently, the CUDA version must be 11.x for GPU devices.”?**

-The environments supported by the high-performance inference feature are shown in [the table in Section 1.1](#11-installing-the-high-performance-inference-plugin). If the installation fails, it may be due to the high-performance inference feature not supporting the current environment. Additionally, CUDA 12.6 is already under support.
+The high-performance inference functionality currently supports only a limited set of environments; please refer to the table in [Section 1.1](#11-installing-the-high-performance-inference-plugin). If installation fails, the current environment may not be supported by the high-performance inference functionality. Note that support for CUDA 12.6 is in progress.

-**4. Why does the program get stuck or display WARNING and ERROR messages when using the high-performance inference feature? How should this be handled?**
+**4. Why does the program freeze during runtime or display some WARNING and ERROR messages after enabling the high-performance inference functionality? What should be done in such cases?**

-During engine construction, due to subgraph optimization and operator processing, the program may take longer and generate WARNING and ERROR messages. However, as long as the program does not exit automatically, it is recommended to wait patiently as the program will usually continue to run until completion.
+When the model is initialized, operations such as subgraph optimization may take a long time and may generate some WARNING and ERROR messages. However, as long as the program does not exit automatically, it is recommended to wait patiently: the program usually continues to run to completion.