update hpi docs (#3807)

* update hpi docs

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* OCR API reference add max_num_input_imgs

* update

* update hpi en doc, add paddle2onnx en doc

* update OCR pipelines API reference

* update

* update

* update paddle en doc

* update

* add high_performance_npu_tutorial.en.md
zhang-prog, 7 months ago
Parent commit: 581ae29e00
29 changed files with 1219 additions and 1039 deletions
  1. docs/installation/paddlepaddle_install.en.md (+16 -0)
  2. docs/installation/paddlepaddle_install.md (+16 -0)
  3. docs/pipeline_deploy/high_performance_inference.en.md (+488 -288)
  4. docs/pipeline_deploy/high_performance_inference.md (+433 -709)
  5. docs/pipeline_deploy/paddle2onnx.en.md (+61 -0)
  6. docs/pipeline_deploy/paddle2onnx.md (+62 -0)
  7. docs/pipeline_usage/pipeline_develop_guide.md (+2 -2)
  8. docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.en.md (+7 -2)
  9. docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.md (+7 -2)
  10. docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.en.md (+7 -2)
  11. docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.md (+7 -2)
  12. docs/pipeline_usage/tutorials/ocr_pipelines/OCR.en.md (+7 -2)
  13. docs/pipeline_usage/tutorials/ocr_pipelines/OCR.md (+7 -2)
  14. docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.en.md (+7 -2)
  15. docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.md (+7 -2)
  16. docs/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.en.md (+7 -2)
  17. docs/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.md (+7 -2)
  18. docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.en.md (+7 -2)
  19. docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.md (+7 -2)
  20. docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.en.md (+7 -2)
  21. docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.md (+7 -2)
  22. docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.en.md (+7 -2)
  23. docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md (+7 -2)
  24. docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.en.md (+7 -2)
  25. docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.md (+7 -2)
  26. docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md (+7 -2)
  27. docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md (+7 -2)
  28. docs/practical_tutorials/high_performance_npu_tutorial.en.md (+0 -0)
  29. mkdocs.yml (+1 -0)

+ 16 - 0
docs/installation/paddlepaddle_install.en.md

@@ -56,6 +56,9 @@ python -m pip install paddlepaddle-gpu==3.0.0rc0 -i https://www.paddlepaddle.org
 # GPU, this command is only suitable for machines with CUDA version 12.3
 python -m pip install paddlepaddle-gpu==3.0.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
 ```
+
+The Docker images support the [Paddle Inference TensorRT Subgraph Engine](https://www.paddlepaddle.org.cn/documentation/docs/en/guides/paddle_v3_features/paddle_trt_en.html) by default.
+
 Note: For more PaddlePaddle Wheel versions, please refer to the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip.html).
 
 <b>For installing PaddlePaddle on other hardware, please refer to</b> [PaddleX Multi-hardware Usage Guide](../other_devices_support/multi_devices_use_guide.en.md).
@@ -71,4 +74,17 @@ If the installation is successful, the following content will be output:
 3.0.0-rc0
 ```
 
+If you want to use the [Paddle Inference TensorRT Subgraph Engine](https://www.paddlepaddle.org.cn/documentation/docs/en/guides/paddle_v3_features/paddle_trt_en.html), after installing Paddle, you need to install the corresponding version of TensorRT by referring to the [TensorRT documentation](https://docs.nvidia.com/deeplearning/tensorrt/archives/index.html). Below is an example of installing TensorRT-8.6.1.6 using the "Tar File Installation" method in a CUDA 11.8 environment:
+
+```bash
+# Download TensorRT tar file
+wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/8.6.1/tars/TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
+# Extract TensorRT tar file
+tar xvf TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
+# Install TensorRT wheel package
+python -m pip install TensorRT-8.6.1.6/python/tensorrt-8.6.1-cp310-none-linux_x86_64.whl
+# Add the absolute path of TensorRT's `lib` directory to LD_LIBRARY_PATH
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/TensorRT-8.6.1.6/lib
+```
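+
+As an optional sanity check (assuming the wheel above installed into the current Python environment), you can confirm that the TensorRT Python bindings are importable:
+
+```python
+# Quick check that the TensorRT Python bindings are available.
+import tensorrt
+
+print(tensorrt.__version__)  # expected to print something like 8.6.1
+```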
+
 > ❗ <b>Note</b>: If you encounter any issues during the installation process, feel free to [submit an issue](https://github.com/PaddlePaddle/Paddle/issues) in the Paddle repository.

+ 16 - 0
docs/installation/paddlepaddle_install.md

@@ -57,6 +57,9 @@ python -m pip install paddlepaddle-gpu==3.0.0rc0 -i https://www.paddlepaddle.org
 # gpu,该命令仅适用于 CUDA 版本为 12.3 的机器环境
 python -m pip install paddlepaddle-gpu==3.0.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
 ```
+
+Docker 镜像默认支持 [Paddle Inference TensorRT 子图引擎](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/paddle_v3_features/paddle_trt_cn.html)。
+
 > ❗ <b>注</b>:无需关注物理机上的 CUDA 版本,只需关注显卡驱动程序版本。更多飞桨 Wheel 版本请参考[飞桨官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)。
 
 <b>关于其他硬件安装飞桨,请参考</b>[PaddleX多硬件使用指南](../other_devices_support/multi_devices_use_guide.md)<b>。</b>
@@ -72,4 +75,17 @@ python -c "import paddle; print(paddle.__version__)"
 3.0.0-rc0
 ```
 
+如果想要使用 [Paddle Inference TensorRT 子图引擎](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/paddle_v3_features/paddle_trt_cn.html),在安装paddle后需参考 [TensorRT 文档](https://docs.nvidia.com/deeplearning/tensorrt/archives/index.html)安装相应版本的 TensorRT,下面是在 CUDA 11.8 环境下使用 "Tar File Installation" 方式安装 TensorRT-8.6.1.6 的例子:
+
+```bash
+# 下载 TensorRT tar 文件
+wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/8.6.1/tars/TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
+# 解压 TensorRT tar 文件
+tar xvf TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
+# 安装 TensorRT wheel 包
+python -m pip install TensorRT-8.6.1.6/python/tensorrt-8.6.1-cp310-none-linux_x86_64.whl
+# 添加 TensorRT 的 `lib` 目录的绝对路径到 LD_LIBRARY_PATH 中
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/TensorRT-8.6.1.6/lib
+```
+
 > ❗ <b>注</b>:如果在安装的过程中,出现任何问题,欢迎在Paddle仓库中[提Issue](https://github.com/PaddlePaddle/Paddle/issues)。

+ 488 - 288
docs/pipeline_deploy/high_performance_inference.en.md

@@ -4,422 +4,622 @@ comments: true
 
 # PaddleX High-Performance Inference Guide
 
-In real-world production environments, many applications have stringent standards for deployment strategy performance metrics, particularly response speed, to ensure efficient system operation and smooth user experience. To this end, PaddleX provides high-performance inference plugins designed to deeply optimize model inference and pre/post-processing, achieving significant speedups in the end-to-end process. This document will first introduce the installation and usage of the high-performance inference plugins, followed by a list of pipelines and models currently supporting the use of these plugins.
+In actual production environments, many applications impose strict standards on the performance metrics of deployment strategies (especially response speed) to ensure efficient system operation and a smooth user experience. To this end, PaddleX provides a high-performance inference plugin that, through automatic configuration and multi-backend inference, significantly improves model inference speed without requiring users to deal with complex configuration or low-level details.
 
-## 1. Installation and Usage of High-Performance Inference Plugins
+## Table of Contents
 
-Before using the high-performance inference plugins, ensure you have completed the installation of PaddleX according to the [PaddleX Local Installation Tutorial](../installation/installation.en.md), and have successfully run the quick inference of the pipeline using either the PaddleX pipeline command line instructions or the Python script instructions.
+- [1. Basic Usage](#1-basic-usage)
+  - [1.1 Installing the High-Performance Inference Plugin](#11-installing-the-high-performance-inference-plugin)
+  - [1.2 Enabling High-Performance Inference](#12-enabling-high-performance-inference)
+- [2. Advanced Usage](#2-advanced-usage)
+  - [2.1 High-Performance Inference Modes](#21-high-performance-inference-modes)
+  - [2.2 High-Performance Inference Configuration](#22-high-performance-inference-configuration)
+  - [2.3 How to Modify High-Performance Inference Configuration](#23-how-to-modify-high-performance-inference-configuration)
+  - [2.4 Examples of Modifying High-Performance Inference Configuration](#24-examples-of-modifying-high-performance-inference-configuration)
+  - [2.5 Enabling/Disabling High-Performance Inference in Sub-pipelines/Sub-modules](#25-enablingdisabling-high-performance-inference-in-sub-pipelinessub-modules)
+  - [2.6 Model Cache Description](#26-model-cache-description)
+  - [2.7 Custom Model Inference Library](#27-custom-model-inference-library)
+- [3. Frequently Asked Questions](#3-frequently-asked-questions)
 
-### 1.1 Installing High-Performance Inference Plugins
+## 1. Basic Usage
 
-Find the corresponding installation command based on your processor architecture, operating system, device type, and Python version in the table below and execute it in your deployment environment. Please replace `{paddlex version number}` with the actual paddlex version number, such as the current latest stable version `3.0.0b2`. If you need to use the version corresponding to the development branch, replace `{paddlex version number}` with `0.0.0.dev0`.
+Before using the high-performance inference plugin, ensure you have completed the installation of PaddleX according to the [PaddleX Local Installation Tutorial](../installation/installation.en.md) and successfully run the quick inference using the PaddleX pipeline command-line instructions or Python script instructions.
+
+High-performance inference supports both PaddlePaddle-format and ONNX-format models. ONNX-format models are best obtained with the [Paddle2ONNX plugin](./paddle2onnx.en.md). If a model directory contains models in multiple formats, PaddleX automatically selects the appropriate one as needed.
+
+### 1.1 Installing the High-Performance Inference Plugin
+
+The processor architectures, operating systems, device types, and Python versions currently supported by high-performance inference are shown in the table below:
 
 <table>
   <tr>
-    <th>Processor Architecture</th>
     <th>Operating System</th>
+    <th>Processor Architecture</th>
     <th>Device Type</th>
     <th>Python Version</th>
-    <th>Installation Command</th>
   </tr>
   <tr>
-    <td rowspan="7">x86-64</td>
-    <td rowspan="7">Linux</td>
-    <td rowspan="4">CPU</td>
+    <td rowspan="5">Linux</td>
+    <td rowspan="4">x86-64</td>
   </tr>
   <tr>
-    <td>3.8</td>
-    <td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.8 - --arch x86_64 --os linux --device cpu --py 38</td>
+    <td>CPU</td>
+    <td>3.8–3.12</td>
   </tr>
   <tr>
-    <td>3.9</td>
-    <td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.9 - --arch x86_64 --os linux --device cpu --py 39</td>
+    <td>GPU (CUDA 11.8 + cuDNN 8.6)</td>
+    <td>3.8–3.12</td>
   </tr>
   <tr>
+    <td>NPU</td>
     <td>3.10</td>
-    <td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.10 - --arch x86_64 --os linux --device cpu --py 310</td>
-  </tr>
-  <tr>
-    <td rowspan="3">GPU&nbsp;(CUDA&nbsp;11.8&nbsp;+&nbsp;cuDNN&nbsp;8.6)</td>
-    <td>3.8</td>
-    <td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.8 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 38</td>
-  </tr>
-  <tr>
-    <td>3.9</td>
-    <td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.9 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 39</td>
   </tr>
   <tr>
+    <td>aarch64</td>
+    <td>NPU</td>
     <td>3.10</td>
-    <td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.10 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 310</td>
   </tr>
 </table>
 
-* For Linux systems, execute the installation instructions using Bash.
-* When using NVIDIA GPUs, please use the installation instructions corresponding to the CUDA and cuDNN versions that match your environment. Otherwise, you will not be able to use the high-performance inference plugin properly.
-* When the device type is CPU, the installed high-performance inference plugin only supports inference using the CPU; for other device types, the installed high-performance inference plugin supports inference using the CPU or other devices.
+#### (1) Installing the High-Performance Inference Plugin Based on Docker (Highly Recommended):
+
+Refer to [Get PaddleX based on Docker](../installation/installation.en.md#21-obtaining-paddlex-based-on-docker) to use Docker to start the PaddleX container. After starting the container, execute the following commands according to the device type to install the high-performance inference plugin:
+
+<table>
+    <thead>
+        <tr>
+            <th>Device Type</th>
+            <th>Installation Command</th>
+            <th>Description</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>CPU</td>
+            <td><code>paddlex --install hpi-cpu</code></td>
+            <td>Installs the CPU version of high-performance inference.</td>
+        </tr>
+        <tr>
+            <td>GPU</td>
+            <td><code>paddlex --install hpi-gpu</code></td>
+            <td>Installs the GPU version of high-performance inference.<br />It includes all features of the CPU version, so there is no need to install the CPU version separately.</td>
+        </tr>
+        <tr>
+            <td>NPU</td>
+            <td><code>paddlex --install hpi-npu</code></td>
+            <td>Installs the NPU version of high-performance inference.<br />For usage instructions, please refer to the <a href="../practical_tutorials/high_performance_npu_tutorial.en.md">Ascend NPU High-Performance Inference Tutorial</a>.</td>
+        </tr>
+    </tbody>
+</table>
 
-### 1.2 Obtaining Serial Numbers and Activation
+#### (2) Local Installation of High-Performance Inference Plugin:
 
-On the [Baidu AIStudio Community - AI Learning and Training Platform](https://aistudio.baidu.com/paddlex/commercialization) page, under the "Open-source Pipeline Deployment Serial Number Inquiry and Acquisition" section, select "Acquire Now" as shown in the following image:
+Install [CUDA 11.8](https://developer.nvidia.com/cuda-11-8-0-download-archive) and [cuDNN 8.6](https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-860/install-guide/index.html) locally, then execute the installation commands above.
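+
+Before installing the GPU plugin locally, it can help to confirm that the local PaddlePaddle installation actually sees CUDA. A minimal check (assuming PaddlePaddle is already installed as described in the installation tutorial):
+
+```python
+# Verify that the installed PaddlePaddle wheel was built with CUDA support
+# and that the GPU environment is functional.
+import paddle
+
+print(paddle.device.is_compiled_with_cuda())  # True for a CUDA-enabled wheel
+paddle.utils.run_check()  # runs a small program to validate the setup
+```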
 
-<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipeline_deploy/image-1.png">
+**Notes**:
 
-Select the pipeline you wish to deploy and click "Acquire". Afterwards, you can find the acquired serial number in the "Open-source Pipeline Deployment SDK Serial Number Management" section at the bottom of the page:
+1. **GPU support currently requires CUDA 11.8 + cuDNN 8.6**; support for CUDA 12.6 is in progress.
 
-<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipeline_deploy/image-2.png">
+2. Only one version of the high-performance inference plugin can exist in the same environment.
 
-After using the serial number to complete activation, you can utilize high-performance inference plugins. PaddleX provides both online and offline activation methods (both only support Linux systems):
+3. For NPU device usage instructions, refer to the [Ascend NPU High-Performance Inference Tutorial](../practical_tutorials/high_performance_npu_tutorial.en.md).
 
-* Online Activation: When using the inference API or CLI, specify the serial number and enable online activation to automatically complete the process.
-* Offline Activation: Follow the instructions in the serial number management interface (click "Offline Activation" under "Operations") to obtain the device fingerprint of your machine. Bind the serial number with the device fingerprint to obtain a certificate and complete the activation. For this activation method, you need to manually store the certificate in the `${HOME}/.baidu/paddlex/licenses` directory on the machine (create the directory if it does not exist) and specify the serial number when using the inference API or CLI.
+4. Windows only supports installing and using the high-performance inference plugin based on Docker.
 
-Please note: Each serial number can only be bound to a unique device fingerprint and can only be bound once. This means that if users deploy models on different machines, they must prepare separate serial numbers for each machine.
+### 1.2 Enabling High-Performance Inference
 
-### 1.3 Enabling High-Performance Inference Plugins
+Below are examples of enabling high-performance inference in the general image classification pipeline and image classification module using PaddleX CLI and Python API.
 
-For Linux systems, if using the high-performance inference plugin in a Docker container, please mount the host machine's `/dev/disk/by-uuid` and `${HOME}/.baidu/paddlex/licenses` directories to the container.
+For PaddleX CLI, specify `--use_hpip` to enable high-performance inference.
 
-For PaddleX CLI, specify `--use_hpip` and set the serial number to enable the high-performance inference plugin. If you wish to activate the license online, specify `--update_license` when using the serial number for the first time. Taking the general image classification pipeline as an example:
+General Image Classification Pipeline:
 
 ```bash
 paddlex \
     --pipeline image_classification \
     --input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
     --device gpu:0 \
-    --use_hpip \
-    --serial_number {serial number}
+    --use_hpip
+```
 
-# If you wish to perform online activation
-paddlex \
-    --pipeline image_classification \
-    --input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
-    --device gpu:0 \
-    --use_hpip \
-    --serial_number {serial number} \
-    --update_license
+Image Classification Module:
+
+```bash
+python main.py \
+    -c paddlex/configs/modules/image_classification/ResNet18.yaml \
+    -o Global.mode=predict \
+    -o Predict.model_dir=None \
+    -o Predict.input=https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
+    -o Global.device=gpu:0 \
+    -o Predict.use_hpip=True
 ```
 
-For PaddleX Python API, enabling the high-performance inference plugin is similar. Still taking the general image classification pipeline as an example:
+For the PaddleX Python API, the method to enable high-performance inference is similar. Taking the General Image Classification Pipeline and Image Classification Module as examples:
+
+General Image Classification Pipeline:
 
 ```python
 from paddlex import create_pipeline
 
 pipeline = create_pipeline(
     pipeline="image_classification",
-    use_hpip=True,
-    hpi_params={"serial_number": "{serial number}"},
+    device="gpu",
+    use_hpip=True
 )
 
 output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
 ```
 
-The inference results obtained with the high-performance inference plugin enabled are consistent with those without the plugin enabled. For some models, enabling the high-performance inference plugin for the first time may take a longer time to complete the construction of the inference engine. PaddleX will cache the relevant information in the model directory after the first construction of the inference engine and reuse the cached content in subsequent runs to improve initialization speed.
+Image Classification Module:
 
-### 1.4 Modifying High-Performance Inference Configurations
+```python
+from paddlex import create_model
 
-PaddleX combines model information and runtime environment information to provide default high-performance inference configurations for each model. These default configurations are carefully prepared to be applicable in several common scenarios and achieve relatively optimal performance. Therefore, users typically may not need to be concerned with the specific details of these configurations. However, due to the diversity of actual deployment environments and requirements, the default configuration may not yield ideal performance in certain scenarios and could even result in inference failures. In cases where the default configuration does not meet the requirements, users can manually adjust the configuration by modifying the Hpi field in the inference.yml file within the model directory (if this field does not exist, it needs to be added). The following are two common situations:
+model = create_model(
+    model_name="ResNet18",
+    device="gpu",
+    use_hpip=True
+)
 
-- Switching inference backends:
+output = model.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
+```
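+
+In both cases, the returned `output` is iterable. Continuing from either example above, a short usage sketch (assuming the standard PaddleX result helpers; the output directory is a placeholder):
+
+```python
+# Iterate over prediction results and persist them.
+for res in output:
+    res.print()                    # print the prediction to stdout
+    res.save_to_json("./output/")  # save the structured result to the placeholder directory
+```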
 
-    When the default inference backend is not available, the inference backend needs to be switched manually. Users should modify the `selected_backends` field (if it does not exist, it needs to be added).
+The inference results obtained with the high-performance inference plugin enabled are consistent with those without the plugin. For some models, **it may take a longer time to complete the construction of the inference engine when enabling the high-performance inference plugin for the first time**. PaddleX will cache relevant information in the model directory after the first construction of the inference engine and reuse the cached content in subsequent runs to improve initialization speed.
 
-    ```yaml
-    Hpi:
-      ...
-      selected_backends:
-        cpu: paddle_infer
-        gpu: onnx_runtime
-      ...
-    ```
+**Enabling high-performance inference affects the entire pipeline/module by default**. If you want finer-grained control, such as enabling the high-performance inference plugin for only a specific sub-pipeline or sub-module within a pipeline, you can set `use_hpip` at different levels of the pipeline configuration file. Please refer to [2.5 Enabling/Disabling High-Performance Inference in Sub-pipelines/Sub-modules](#25-enablingdisabling-high-performance-inference-in-sub-pipelinessub-modules).
 
-    Each entry should follow the format `{device type}: {inference backend name}`.
+## 2. Advanced Usage
 
-    The currently available inference backends are:
+This section introduces advanced usage of high-performance inference. It is intended for users who have some understanding of model deployment or who wish to tune the configuration manually. Refer to the configuration descriptions and examples below to adapt high-performance inference to your own needs.
 
-    * `paddle_infer`: The Paddle Inference engine. Supports CPU and GPU. Compared to the PaddleX quick inference, TensorRT subgraphs can be integrated to enhance inference performance on GPUs.
-    * `openvino`: [OpenVINO](https://github.com/openvinotoolkit/openvino), a deep learning inference tool provided by Intel, optimized for model inference performance on various Intel hardware. Supports CPU only. The high-performance inference plugin automatically converts the model to the ONNX format and uses this engine for inference.
-    * `onnx_runtime`: [ONNX Runtime](https://onnxruntime.ai/), a cross-platform, high-performance inference engine. Supports CPU and GPU. The high-performance inference plugin automatically converts the model to the ONNX format and uses this engine for inference.
-    * `tensorrt`: [TensorRT](https://developer.nvidia.com/tensorrt), a high-performance deep learning inference library provided by NVIDIA, optimized for NVIDIA GPUs to improve speed. Supports GPU only. The high-performance inference plugin automatically converts the model to the ONNX format and uses this engine for inference.
+### 2.1 High-Performance Inference Modes
 
-- Modifying dynamic shape configurations for Paddle Inference or TensorRT:
+High-performance inference is divided into two modes:
 
-    Dynamic shape is the ability of TensorRT to defer specifying parts or all of a tensor’s dimensions until runtime. If the default dynamic shape configuration does not meet requirements (e.g., the model may require input shapes beyond the default range), users need to modify the `trt_dynamic_shapes` or `dynamic_shapes` field in the inference backend configuration:
+#### (1) Safe Auto-Configuration Mode
 
-    ```yaml
-    Hpi:
-      ...
-      backend_configs:
-        # Configuration for the Paddle Inference backend
-        paddle_infer:
-          ...
-          trt_dynamic_shapes:
-            x:
-              - [1, 3, 300, 300]
-              - [4, 3, 300, 300]
-              - [32, 3, 1200, 1200]
-          ...
-        # Configuration for the TensorRT backend
-        tensorrt:
-          ...
-          dynamic_shapes:
-            x:
-              - [1, 3, 300, 300]
-              - [4, 3, 300, 300]
-              - [32, 3, 1200, 1200]
-          ...
-    ```
+The safe auto-configuration mode has a protection mechanism and, by default, **automatically selects a configuration with good performance for the current environment**. In this mode, users can override the default configuration, but any configuration they provide is checked, and PaddleX rejects unavailable configurations based on prior knowledge. This is the default mode.
 
-    In `trt_dynamic_shapes` or `dynamic_shapes`, each input tensor requires a specified dynamic shape in the format: `{input tensor name}: [{minimum shape}, [{optimal shape}], [{maximum shape}]]`. For details on minimum, optimal, and maximum shapes and further information, please refer to the official TensorRT documentation.
+#### (2) Unrestricted Manual Configuration Mode
 
-    After completing the modifications, please delete the cache files in the model directory (`shape_range_info.pbtxt` and files starting with `trt_serialized`).
+The unrestricted manual configuration mode provides complete configuration freedom, allowing **free selection of the inference backend and modification of backend configurations**, but it cannot guarantee that inference will succeed. This mode is suitable for experienced users with specific requirements for the inference backend and its configuration, and it is recommended only once you are familiar with high-performance inference.
 
-## 2. Pipelines and Models Supporting High-Performance Inference Plugins
+### 2.2 High-Performance Inference Configuration
+
+Common high-performance inference configurations include the following fields:
 
 <table>
-  <tr>
-    <th>Pipeline</th>
-    <th>Module</th>
-    <th>Model Support List</th>
-  </tr>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Description</th>
+<th>Type</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>auto_config</code></td>
+<td>Whether to enable the safe auto-configuration mode.<br /><code>True</code> to enable, <code>False</code> to enable the unrestricted manual configuration mode.</td>
+<td><code>bool</code></td>
+<td><code>True</code></td>
+</tr>
+<tr>
+  <td><code>backend</code></td>
+  <td>Specifies the inference backend to use. Cannot be <code>None</code> in unrestricted manual configuration mode.</td>
+  <td><code>str | None</code></td>
+  <td><code>None</code></td>
+</tr>
+<tr>
+  <td><code>backend_config</code></td>
+  <td>The configuration of the inference backend, which can override the default configuration items of the backend if it is not <code>None</code>.</td>
+  <td><code>dict | None</code></td>
+  <td><code>None</code></td>
+</tr>
+<tr>
+  <td><code>auto_paddle2onnx</code></td>
+  <td>Whether to enable the <a href="./paddle2onnx.en.md">Paddle2ONNX plugin</a> to automatically convert Paddle models to ONNX models.</td>
+  <td><code>bool</code></td>
+  <td><code>True</code></td>
+</tr>
+</tbody>
+</table>
 
-  <tr>
-    <td rowspan="2">OCR</td>
-    <td>Text Detection</td>
-    <td>✅</td>
-  </tr>
+The available options for `backend` are shown in the following table:
 
+<table>
   <tr>
-    <td>Text Recognition</td>
-    <td>✅</td>
+    <th>Option</th>
+    <th>Description</th>
+    <th>Supported Devices</th>
   </tr>
-
   <tr>
-    <td rowspan="7">PP-ChatOCRv3-doc</td>
-    <td>Table Recognition</td>
-    <td>✅</td>
+    <td><code>paddle</code></td>
+    <td>Paddle Inference engine, supporting the Paddle Inference TensorRT subgraph engine to improve GPU inference performance of models.</td>
+    <td>CPU, GPU</td>
   </tr>
-
   <tr>
-    <td>Layout Detection</td>
-    <td>✅</td>
+    <td><code>openvino</code></td>
+    <td><a href="https://github.com/openvinotoolkit/openvino">OpenVINO</a>, a deep learning inference tool provided by Intel, optimized for model inference performance on various Intel hardware.</td>
+    <td>CPU</td>
   </tr>
-
   <tr>
-    <td>Text Detection</td>
-    <td>✅</td>
+    <td><code>onnxruntime</code></td>
+    <td><a href="https://onnxruntime.ai/">ONNX Runtime</a>, a cross-platform, high-performance inference engine.</td>
+    <td>CPU, GPU</td>
   </tr>
-
   <tr>
-    <td>Text Recognition</td>
-    <td>✅</td>
+    <td><code>tensorrt</code></td>
+    <td><a href="https://developer.nvidia.com/tensorrt">TensorRT</a>, a high-performance deep learning inference library provided by NVIDIA, optimized for NVIDIA GPUs to improve speed.</td>
+    <td>GPU</td>
   </tr>
-
   <tr>
-    <td>Seal Text Detection</td>
-    <td>✅</td>
+    <td><code>om</code></td>
+    <td>OM, an offline model format and inference engine customized for Huawei Ascend NPUs; deeply optimized for the hardware to reduce operator computation and scheduling time, effectively improving inference performance.</td>
+    <td>NPU</td>
   </tr>
+</table>
 
-  <tr>
-    <td>Text Image Unwarping</td>
-    <td>✅</td>
-  </tr>
+The available values for `backend_config` vary depending on the backend, as shown in the following table:
 
+<table>
   <tr>
-    <td>Document Image Orientation Classification</td>
-    <td>✅</td>
+    <th>Backend</th>
+    <th>Available Values</th>
   </tr>
-
   <tr>
-    <td rowspan="4">Table Recognition</td>
-    <td>Layout Detection</td>
-    <td>✅</td>
+    <td><code>paddle</code></td>
+    <td>Refer to <a href="../module_usage/instructions/model_python_API.en.md">PaddleX Single Model Python Usage Instructions: 4. Inference Backend Configuration</a>.</td>
   </tr>
-
   <tr>
-    <td>Table Recognition</td>
-    <td></td>
+    <td><code>openvino</code></td>
+    <td><code>cpu_num_threads</code>: The number of logical processors used for CPU inference. Default is <code>8</code>.</td>
   </tr>
-
   <tr>
-    <td>Text Detection</td>
-    <td></td>
+    <td><code>onnxruntime</code></td>
+    <td><code>cpu_num_threads</code>: The number of parallel computing threads within operators for CPU inference. Default is <code>8</code>.</td>
   </tr>
-
   <tr>
-    <td>Text Recognition</td>
-    <td>✅</td>
-  </tr>
-
+    <td><code>tensorrt</code></td>
+    <td>
+      <code>precision</code>: The precision to use, <code>fp16</code> or <code>fp32</code>. Default is <code>fp32</code>.
+      <br />
+      <code>dynamic_shapes</code>: Dynamic shapes, i.e., TensorRT's ability to defer specifying some or all tensor dimensions until runtime. Each entry has the format <code>{input tensor name}: [{minimum shape}, {optimal shape}, {maximum shape}]</code>. For more information, please refer to the <a href="https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes">TensorRT official documentation</a>.
+    </td>
+  </tr>
   <tr>
-    <td>Object Detection</td>
-    <td>Object Detection</td>
-    <td>FasterRCNN-Swin-Tiny-FPN ❌<br>CenterNet-DLA-34 ❌ <br>CenterNet-ResNet50 ❌</td>
+    <td><code>om</code></td>
+    <td>None</td>
   </tr>
+</table>
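+
+For illustration only, the sketch below shows how these fields fit together when passed through the Python API in unrestricted manual configuration mode; the chosen backend and values are assumptions for demonstration, not recommendations:
+
+```python
+from paddlex import create_pipeline
+
+# Manually pick the TensorRT backend and override part of its configuration.
+pipeline = create_pipeline(
+    pipeline="image_classification",
+    device="gpu",
+    use_hpip=True,
+    hpi_config={
+        "auto_config": False,        # switch to unrestricted manual configuration mode
+        "backend": "tensorrt",       # one of the backends listed in the tables above
+        "backend_config": {"precision": "fp16"},
+        "auto_paddle2onnx": True,    # convert the Paddle model to ONNX if needed
+    },
+)
+```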
 
-  <tr>
-    <td>Instance Segmentation</td>
-    <td>Instance Segmentation</td>
-    <td>Mask-RT-DETR-S ❌</td>
-  </tr>
+### 2.3 How to Modify High-Performance Inference Configuration
 
-  <tr>
-    <td>Image Classification</td>
-    <td>Image Classification</td>
-    <td>✅</td>
-  </tr>
+Due to the diversity of actual deployment environments and requirements, the default configuration may not meet all needs. In such cases, manual adjustments to the high-performance inference configuration may be necessary. Here are two common scenarios:
 
-  <tr>
-    <td>Semantic Segmentation</td>
-    <td>Semantic Segmentation</td>
-    <td>✅</td>
-  </tr>
+- Needing to change the inference backend.
+  - For example, in an OCR pipeline, specifying the `text_detection` module to use the `onnxruntime` backend and the `text_recognition` module to use the `tensorrt` backend.
 
-  <tr>
-    <td>Time Series Forecasting</td>
-    <td>Time Series Forecasting</td>
-    <td>❌</td>
-  </tr>
+- Needing to modify the dynamic shape configuration for TensorRT:
+  - When the default dynamic shape configuration cannot meet requirements (e.g., the model may require input shapes outside the specified range), dynamic shapes need to be specified for each input tensor. After modification, the model's `.cache` directory should be cleaned up.
 
-  <tr>
-    <td>Time Series Anomaly Detection</td>
-    <td>Time Series Anomaly Forecasting</td>
-    <td>❌</td>
-  </tr>
+In these scenarios, users can modify the configuration by altering the `hpi_config` field in the **pipeline/module configuration file**, **CLI** parameters, or **Python API** parameters. **Parameters passed through CLI or Python API will override settings in the pipeline/module configuration file**.
 
-  <tr>
-    <td>Time Series Classification</td>
-    <td>Time Series Classification</td>
-    <td>❌</td>
-  </tr>
+### 2.4 Examples of Modifying High-Performance Inference Configuration
 
-  <tr>
-    <td>Small Object Detection</td>
-    <td>Small Object Detection</td>
-    <td>✅</td>
-  </tr>
+#### (1) Changing the Inference Backend
 
-  <tr>
-    <td>Multi-Label Image Classification</td>
-    <td>Multi-Label Image  Classification</td>
-    <td>✅</td>
-  </tr>
+##### Using the `onnxruntime` backend for all models in a general OCR pipeline:
 
-  <tr>
-    <td>Image Anomaly Detection</td>
-    <td>Unsupervised Anomaly Detection</td>
-    <td>✅</td>
-  </tr>
+<details><summary>👉 1. Modifying the pipeline configuration file (click to expand)</summary>
 
-  <tr>
-    <td rowspan="8">Layout Parsing</td>
-    <td>Table Structure Recognition</td>
-    <td>✅</td>
-  </tr>
+```yaml
+pipeline_name: OCR
 
-  <tr>
-    <td>Layout Region Analysis</td>
-    <td>✅</td>
-  </tr>
+use_hpip: True
+hpi_config:
+    backend: onnxruntime
 
-  <tr>
-    <td>Text Detection</td>
-    <td>✅</td>
-  </tr>
+...
+```
 
-  <tr>
-    <td>Text Recognition</td>
-    <td>✅</td>
-  </tr>
+</details>
+<details><summary>👉 2. CLI parameter passing method (click to expand)</summary>
 
-  <tr>
-    <td>Formula Recognition</td>
-    <td>✅</td>
-  </tr>
+```bash
+paddlex \
+      --pipeline OCR \
+      --input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
+      --device gpu:0 \
+      --use_hpip \
+      --hpi_config '{"backend": "onnxruntime"}'
+```
 
-  <tr>
-    <td>Seal Text Detection</td>
-    <td>✅</td>
-  </tr>
+</details>
+<details><summary>👉 3. Python API parameter passing method (click to expand)</summary>
 
-  <tr>
-    <td>Text Image Unwarping</td>
-    <td>✅</td>
-  </tr>
+```python
+from paddlex import create_pipeline
 
-  <tr>
-    <td>Document Image Orientation Classification</td>
-    <td>✅</td>
-  </tr>
+pipeline = create_pipeline(
+      pipeline="OCR",
+      device="gpu",
+      use_hpip=True,
+      hpi_config={"backend": "onnxruntime"}
+)
+```
 
-  <tr>
-    <td rowspan="2">Formula Recognition</td>
-    <td>Layout Detection</td>
-    <td>✅</td>
-  </tr>
+</details>
 
-  <tr>
-    <td>Formula Recognition</td>
-    <td>✅</td>
-  </tr>
+##### Using the `onnxruntime` backend for the image classification module:
 
-  <tr>
-    <td rowspan="3">Seal Recognition</td>
-    <td>Layout Region Analysis</td>
-    <td>✅</td>
-  </tr>
+<details><summary>👉 1. Modifying the module configuration file (click to expand)</summary>
 
-  <tr>
-    <td>Seal Text Detection</td>
-    <td>✅</td>
-  </tr>
+```yaml
+# paddlex/configs/modules/image_classification/ResNet18.yaml
+...
+Predict:
+    ...
+    use_hpip: True
+    hpi_config:
+        backend: onnxruntime
+    ...
+...
+```
 
-  <tr>
-    <td>Text Recognition</td>
-    <td>✅</td>
-  </tr>
+</details>
+<details><summary>👉 2. CLI parameter passing method (click to expand)</summary>
 
-  <tr>
-    <td rowspan="2">Image Recognition</td>
-    <td>Subject Detection</td>
-    <td>✅</td>
-  </tr>
+```bash
+python main.py \
+      -c paddlex/configs/modules/image_classification/ResNet18.yaml \
+      -o Global.mode=predict \
+      -o Predict.model_dir=None \
+      -o Predict.input=https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
+      -o Global.device=gpu:0 \
+      -o Predict.use_hpip=True \
+      -o Predict.hpi_config='{"backend": "onnxruntime"}'
+```
 
-  <tr>
-    <td>Image Feature</td>
-    <td>✅</td>
-  </tr>
+</details>
+<details><summary>👉 3. Python API parameter passing method (click to expand)</summary>
 
-  <tr>
-    <td rowspan="2">Pedestrian Attribute Recognition</td>
-    <td>Pedestrian Detection</td>
-    <td>❌</td>
-  </tr>
+```python
+from paddlex import create_model
 
-  <tr>
-    <td>Pedestrian Attribute Recognition</td>
-    <td>❌</td>
-  </tr>
+model = create_model(
+      model_name="ResNet18",
+      device="gpu",
+      use_hpip=True,
+      hpi_config={"backend": "onnxruntime"}
+)
+```
 
-  <tr>
-    <td rowspan="2">Vehicle Attribute Recognition</td>
-    <td>Vehicle Detection</td>
-    <td>❌</td>
-  </tr>
+</details>
+
+##### Using the `onnxruntime` backend for the `text_detection` module and the `tensorrt` backend for the `text_recognition` module in a general OCR pipeline:
+
+<details><summary>👉 1. Modifying the pipeline configuration file (click to expand)</summary>
+
+```yaml
+pipeline_name: OCR
+
+...
+
+SubModules:
+    TextDetection:
+      module_name: text_detection
+      model_name: PP-OCRv4_mobile_det
+      model_dir: null
+      limit_side_len: 960
+      limit_type: max
+      thresh: 0.3
+      box_thresh: 0.6
+      unclip_ratio: 2.0
+      # Enable high-performance inference for the current submodule
+      use_hpip: True
+      # High-performance inference configuration for the current submodule
+      hpi_config:
+          backend: onnxruntime
+    TextLineOrientation:
+      module_name: textline_orientation
+      model_name: PP-LCNet_x0_25_textline_ori
+      model_dir: null
+      batch_size: 6
+    TextRecognition:
+      module_name: text_recognition
+      model_name: PP-OCRv4_mobile_rec
+      model_dir: null
+      batch_size: 6
+      score_thresh: 0.0
+      # Enable high-performance inference for the current submodule
+      use_hpip: True
+      # High-performance inference configuration for the current submodule
+      hpi_config:
+          backend: tensorrt
+```
 
-  <tr>
-    <td>Vehicle Attribute Recognition</td>
-    <td>❌</td>
-  </tr>
+</details>
+
+#### (2) Modifying TensorRT's Dynamic Shape Configuration
+
+##### Modifying the dynamic shape configuration for the general image classification pipeline:
+
+<details><summary>👉 Click to Expand</summary>
+
+```yaml
+    ...
+    SubModules:
+      ImageClassification:
+        ...
+        hpi_config:
+          backend: tensorrt
+          backend_config:
+            precision: fp32
+            dynamic_shapes:
+              x:
+                - [1, 3, 300, 300]
+                - [4, 3, 300, 300]
+                - [32, 3, 1200, 1200]
+              ...
+    ...
+```
 
-  <tr>
-    <td rowspan="2">Face Recognition</td>
-    <td>Face Detection</td>
-    <td>✅</td>
-  </tr>
+</details>
 
-  <tr>
-    <td>Face Feature</td>
-    <td>✅</td>
-  </tr>
+##### Modifying the dynamic shape configuration for the image classification module:
+
+<details><summary>👉 Click to Expand</summary>
+
+```yaml
+...
+Predict:
+    ...
+    use_hpip: True
+    hpi_config:
+        backend: tensorrt
+        backend_config:
+          precision: fp32
+          dynamic_shapes:
+            x:
+              - [1, 3, 300, 300]
+              - [4, 3, 300, 300]
+              - [32, 3, 1200, 1200]
+    ...
+...
+```
+
+</details>
+
+### 2.5 Enabling/Disabling High-Performance Inference in Sub-pipelines/Sub-modules
+
+High-performance inference can be enabled for **only specific sub-pipelines/sub-modules within a pipeline** by setting `use_hpip` at the sub-pipeline/sub-module level. Examples are as follows:
+
+##### Enabling high-performance inference for the `text_detection` module in the general OCR pipeline, while disabling it for the `text_recognition` module:
+
+<details><summary>👉 Click to Expand</summary>
+
+```yaml
+pipeline_name: OCR
+
+...
+
+SubModules:
+    TextDetection:
+      module_name: text_detection
+      model_name: PP-OCRv4_mobile_det
+      model_dir: null
+      limit_side_len: 960
+      limit_type: max
+      thresh: 0.3
+      box_thresh: 0.6
+      unclip_ratio: 2.0
+      use_hpip: True # Enable high-performance inference for the current sub-module
+    TextLineOrientation:
+      module_name: textline_orientation
+      model_name: PP-LCNet_x0_25_textline_ori
+      model_dir: null
+      batch_size: 6
+    TextRecognition:
+      module_name: text_recognition
+      model_name: PP-OCRv4_mobile_rec
+      model_dir: null
+      batch_size: 6
+      score_thresh: 0.0
+      use_hpip: False # Disable high-performance inference for the current sub-module
+```
+
+</details>
+
+**Notes**:
+
+1. When setting `use_hpip` in a sub-pipeline or sub-module, the deepest-level configuration takes precedence.
+
+2. **It is strongly recommended to enable high-performance inference by modifying the pipeline configuration file**, rather than using CLI or Python API settings. Enabling `use_hpip` through CLI or Python API is equivalent to setting `use_hpip` at the top level of the configuration file.
+
+### 2.6 Model Cache Description
+
+The model cache is stored in the `.cache` directory under the model directory. It includes files such as `shape_range_info.pbtxt` and files prefixed with `trt_serialized`, which are generated when the `tensorrt` or `paddle` backend is used.
+
+When the `auto_paddle2onnx` option is enabled, an `inference.onnx` file may be automatically generated in the model directory.
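+
+If you modify the TensorRT dynamic shape configuration (see Section 2.4), the cached engine files become stale and should be removed. A minimal sketch, where the model directory path is a placeholder:
+
+```python
+# Remove the cached inference-engine files under a model directory.
+import shutil
+from pathlib import Path
+
+model_dir = Path("/path/to/your/model_dir")  # placeholder path
+cache_dir = model_dir / ".cache"
+if cache_dir.exists():
+    shutil.rmtree(cache_dir)
+```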
+
+### 2.7 Custom Model Inference Library
 
+`ultra-infer` is the underlying model inference library for high-performance inference, located in the `PaddleX/libs/ultra-infer` directory. The compilation script is located at `PaddleX/libs/ultra-infer/scripts/linux/set_up_docker_and_build_py.sh`. The default compilation builds the GPU version and includes OpenVINO, TensorRT, and ONNX Runtime as inference backends for `ultra-infer`.
+
+When compiling customized versions, you can modify the following options as needed:
+
+<table>
+    <thead>
+        <tr>
+            <th>Option</th>
+            <th>Description</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>http_proxy</td>
+            <td>Use a specific HTTP proxy when downloading third-party libraries, default is empty</td>
+        </tr>
+        <tr>
+            <td>PYTHON_VERSION</td>
+            <td>Python version, default is <code>3.10.0</code></td>
+        </tr>
+        <tr>
+            <td>WITH_GPU</td>
+            <td>Whether to compile with NVIDIA GPU support, default is <code>ON</code></td>
+        </tr>
+        <tr>
+            <td>ENABLE_ORT_BACKEND</td>
+            <td>Whether to compile and integrate the ONNX Runtime backend, default is <code>ON</code></td>
+        </tr>
+        <tr>
+            <td>ENABLE_TRT_BACKEND</td>
+            <td>Whether to compile and integrate the TensorRT backend (GPU only), default is <code>ON</code></td>
+        </tr>
+        <tr>
+            <td>ENABLE_OPENVINO_BACKEND</td>
+            <td>Whether to compile and integrate the OpenVINO backend (CPU only), default is <code>ON</code></td>
+        </tr>
+    </tbody>
 </table>
+
+Compilation Example:
+
+```shell
+# Compilation
+# export PYTHON_VERSION=...
+# export WITH_GPU=...
+# export ENABLE_ORT_BACKEND=...
+# export ...
+
+cd PaddleX/libs/ultra-infer/scripts/linux
+bash set_up_docker_and_build_py.sh
+
+# Installation
+python -m pip install ../../python/dist/ultra_infer*.whl
+```
+
+## 3. Frequently Asked Questions
+
+**1. Why is the inference speed similar to regular inference after using the high-performance inference feature?**
+
+The high-performance inference plugin accelerates inference by intelligently selecting the backend.
+
+For modules, due to model complexity or unsupported operators, some models may not be able to use accelerated backends (such as OpenVINO or TensorRT). In such cases, relevant information is reported in the logs, and the **fastest known available backend** is selected, which may fall back to regular inference.
+
+For pipelines, the performance bottleneck may not be in the model inference stage.
+
+You can use the [PaddleX benchmark](../module_usage/instructions/benchmark.md) tool to conduct actual speed tests for a more accurate performance assessment.
+
+**2. Does the high-performance inference feature support all model pipelines and modules?**
+
+The high-performance inference feature supports all pipelines and modules, but not every model will see accelerated inference; see Question 1 for the specific reasons.
+
+**3. Why does the installation of the high-performance inference plugin fail, with the log displaying: "Currently, the CUDA version must be 11.x for GPU devices."?**
+
+The environments supported by the high-performance inference feature are listed in [the table in Section 1.1](#11-installing-the-high-performance-inference-plugin). If installation fails, the high-performance inference feature may simply not support the current environment. Note that only CUDA 11.x is currently supported for GPU devices; support for CUDA 12.6 is in progress.
+
+**4. Why does the program get stuck or display WARNING and ERROR messages when using the high-performance inference feature? How should this be handled?**
+
+During engine construction, subgraph optimization and operator processing can make the program take longer and print WARNING and ERROR messages. As long as the program does not exit on its own, it is recommended to wait patiently; it will usually continue running to completion.

+ 433 - 709
docs/pipeline_deploy/high_performance_inference.md

@@ -4,52 +4,43 @@ comments: true
 
 # PaddleX 高性能推理指南
 
-在实际生产环境中,许多应用对部署策略的性能指标(尤其是响应速度)有着较严苛的标准,以确保系统的高效运行与用户体验的流畅性。为此,PaddleX 提供高性能推理插件,旨在对模型推理及前后处理进行深度性能优化,实现端到端流程的显著提速。本文档将首先介绍高性能推理插件的安装和使用方式,然后列举目前支持使用高性能推理插件的产线与模型
+在实际生产环境中,许多应用对部署策略的性能指标(尤其是响应速度)有着较严苛的标准,以确保系统的高效运行与用户体验的流畅性。为此,PaddleX 提供高性能推理插件,通过自动配置和多后端推理功能,让用户无需关注复杂的配置和底层细节,即可显著提升模型的推理速度
 
 ## 目录
 
 - [1. 基础使用方法](#1.-基础使用方法)
   - [1.1 安装高性能推理插件](#1.1-安装高性能推理插件)
   - [1.2 启用高性能推理插件](#1.2-启用高性能推理插件)
-- [2. 进阶使用方法](#2.-进阶使用方法)
-  - [2.1 修改高性能推理配置](#2.1-修改高性能推理配置)
-  - [2.2 二次开发高性能推理插件](#2.2-二次开发高性能推理插件)
-- [3. 支持使用高性能推理插件的产线与模型](#3.-支持使用高性能推理插件的产线与模型)
+- [2. 进阶使用方法](#2-进阶使用方法)
+  - [2.1 高性能推理工作模式](#21-高性能推理工作模式)
+  - [2.2 高性能推理配置](#22-高性能推理配置)
+  - [2.3 如何修改高性能推理配置](#23-如何修改高性能推理配置)
+  - [2.4 修改高性能推理配置示例](#24-修改高性能推理配置示例)
+  - [2.5 高性能推理在子产线/子模块中的启用/禁用](#25-高性能推理在子产线子模块中的启用禁用)
+  - [2.6 模型缓存说明](#26-模型缓存说明)
+  - [2.7 定制模型推理库](#27-定制模型推理库)
+- [3. 常见问题](#3.-常见问题)
 
 ## 1. 基础使用方法
 
 使用高性能推理插件前,请确保您已经按照[PaddleX本地安装教程](../installation/installation.md) 完成了PaddleX的安装,且按照PaddleX产线命令行使用说明或PaddleX产线Python脚本使用说明跑通了产线的快速推理。
 
-### 1.1 安装高性能推理插件
-
-* 注意:若您使用的是 Windows 系统,请参考[PaddleX本地安装教程——2.1基于Docker获取PaddleX](../installation/installation.md#21-基于docker获取paddlex) 使用 Docker 启动 PaddleX 容器。启动容器后,您可以继续阅读本指南以使用高性能推理。
-
-根据设备类型,执行如下指令,安装高性能推理插件:
-
-如果你的设备是 CPU,请使用以下命令安装 PaddleX 的 CPU 版本:
-
-```bash
-paddlex --install hpi-cpu
-```
-
-如果你的设备是 GPU,请使用以下命令安装 PaddleX 的 GPU 版本。请注意,GPU 版本包含了 CPU 版本的所有功能,因此无需单独安装 CPU 版本:
+高性能推理支持处理 PaddlePaddle 格式模型和 ONNX 格式模型,对于 ONNX 格式模型建议使用[Paddle2ONNX 插件](./paddle2onnx.md)转换得到。如果模型目录中存在多种格式的模型,会根据需要自动选择。
 
-```bash
-paddlex --install hpi-gpu
-```
+### 1.1 安装高性能推理插件
 
 目前高性能推理支持的处理器架构、操作系统、设备类型和 Python 版本如下表所示:
 
 <table>
   <tr>
-    <th>处理器架构</th>
     <th>操作系统</th>
+    <th>处理器架构</th>
     <th>设备类型</th>
     <th>Python 版本</th>
   </tr>
   <tr>
+    <td rowspan="5">Linux</td>
     <td rowspan="4">x86-64</td>
-    <td rowspan="4">Linux</td>
   </tr>
   <tr>
     <td>CPU</td>
@@ -59,11 +50,69 @@ paddlex --install hpi-gpu
     <td>GPU&nbsp;(CUDA&nbsp;11.8&nbsp;+&nbsp;cuDNN&nbsp;8.6)</td>
     <td>3.8–3.12</td>
   </tr>
+  <tr>
+    <td>NPU</td>
+    <td>3.10</td>
+  </tr>
+  <tr>
+    <td>aarch64</td>
+    <td>NPU</td>
+    <td>3.10</td>
+  </tr>
 </table>
 
+#### (1) 基于 Docker 安装高性能推理插件(强烈推荐):
+
+参考 [基于Docker获取PaddleX](../installation/installation.md#21-基于docker获取paddlex) 使用 Docker 启动 PaddleX 容器。启动容器后,根据设备类型,执行如下指令,安装高性能推理插件:
+
+  <table>
+      <thead>
+          <tr>
+              <th>设备类型</th>
+              <th>安装指令</th>
+              <th>说明</th>
+          </tr>
+      </thead>
+      <tbody>
+          <tr>
+              <td>CPU</td>
+              <td><code>paddlex --install hpi-cpu</code></td>
+              <td>安装 CPU 版本的高性能推理功能。</td>
+          </tr>
+          <tr>
+              <td>GPU</td>
+              <td><code>paddlex --install hpi-gpu</code></td>
+              <td>安装 GPU 版本的高性能推理功能。<br />包含了 CPU 版本的所有功能,无需再单独安装 CPU 版本。</td>
+          </tr>
+          <tr>
+              <td>NPU</td>
+              <td><code>paddlex --install hpi-npu</code></td>
+              <td>安装 NPU 版本的高性能推理功能。<br />使用说明请参考<a href="../practical_tutorials/high_performance_npu_tutorial.md">昇腾 NPU 高性能推理教程</a>。</td>
+          </tr>
+      </tbody>
+  </table>
+
+#### (2) 本地安装高性能推理插件:
+
+需要本地 [安装CUDA 11.8](https://developer.nvidia.com/cuda-11-8-0-download-archive) 和 [安装cuDNN 8.6](https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-860/install-guide/index.html) 后执行上面的安装指令。
+
+**注意:**
+
+1. **GPU 只支持 CUDA 11.8 + cuDNN 8.6**,对 CUDA 12.6 的支持正在进行中。
+
+2. 同一环境下只能存在一个高性能推理插件版本。
+
+3. NPU 设备的使用说明参考 [昇腾 NPU 高性能推理教程](../practical_tutorials/high_performance_npu_tutorial.md)。
+
+4. Windows 只支持基于 Docker 安装和使用高性能推理插件。
+
 ### 1.2 启用高性能推理插件
 
-对于 PaddleX CLI,指定 `--use_hpip`,即可启用高性能推理插件。以通用图像分类产线为例:
+以下是使用 PaddleX CLI 和 Python API 在通用图像分类产线和图像分类模块中启用高性能推理功能的示例。
+
+对于 PaddleX CLI,指定 `--use_hpip`,即可启用高性能推理。
+
+通用图像分类产线:
 
 ```bash
 paddlex \
@@ -73,7 +122,19 @@ paddlex \
     --use_hpip
 ```
 
-对于 PaddleX Python API,启用高性能推理插件的方法类似。以通用图像分类产线和图像分类模块为例:
+图像分类模块:
+
+```bash
+python main.py \
+    -c paddlex/configs/modules/image_classification/ResNet18.yaml \
+    -o Global.mode=predict \
+    -o Predict.model_dir=None \
+    -o Predict.input=https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
+    -o Global.device=gpu:0 \
+    -o Predict.use_hpip=True
+```
+
+对于 PaddleX Python API,启用高性能推理的方法类似。以通用图像分类产线和图像分类模块为例:
 
 通用图像分类产线:
 
@@ -103,799 +164,462 @@ model = create_model(
 output = model.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
 ```
 
-启用高性能推理插件得到的推理结果与未启用插件时一致。对于部分模型,在首次启用高性能推理插件时,可能需要花费较长时间完成推理引擎的构建。PaddleX 将在推理引擎的第一次构建完成后将相关信息缓存在模型目录,并在后续复用缓存中的内容以提升初始化速度。
-
-## 2. 进阶使用方法
-
-### 2.1 修改高性能推理配置
-
-PaddleX 结合模型信息与运行环境信息为每个模型提供默认的高性能推理配置,其中包括推理后端和推理后端的配置。这些默认配置经过精心准备,以便在数个常见场景中可用,且能够取得较优的性能。因此,通常用户可能并不用关心如何这些配置的具体细节。
-
-然而,由于实际部署环境与需求的多样性,使用默认配置可能无法在特定场景获取理想的性能,甚至可能出现推理失败的情况。对于默认配置无法满足要求的情形,用户可以手动调整配置。以下列举两种常见的情形:
-
-- 更换推理后端:
-
-    对于模型产线,通过在产线 yaml 中增加 `hpi_params` 字段,即可更换推理后端,以通用图像分类产线的 `image_classification.yaml` 为例:
-
-    ```yaml
-      ...
-      SubModules:
-        ImageClassification:
-          ...
-          hpi_params:
-            config:
-              selected_backends:
-                cpu: openvino # 可选:paddle_infer, openvino, onnx_runtime
-                gpu: paddle_infer # 可选:paddle_infer, onnx_runtime, tensorrt
-              backend_config:
-                # Paddle Inference 后端配置
-                paddle_infer:
-                  enable_trt: True # 可选:True, False
-                  trt_precision: FP16 # 当 enable_trt 为 True 时,可选:FP32, FP16
-                # TensorRT 后端配置
-                tensorrt:
-                  precision: FP32 # 可选:FP32, FP16
-      ...
-    ```
-
-    对于单功能模块,通过传入 `hpi_params` 参数,即可更换推理后端,以图像分类模块为例:
-
-    ```python
-    from paddlex import create_model
-
-    model = create_model(
-        "ResNet18",
-        device="gpu",
-        use_hpip=True,
-        hpi_params={
-            "config": {
-                "selected_backends": {"cpu": "openvino", "gpu": "paddle_infer"},
-                "backend_config": {"paddle_infer": {"enable_trt": True, "trt_precision": "FP16"}, "tensorrt": {"precision": "FP32"}}
-            }
-        }
-    )
+启用高性能推理插件得到的推理结果与未启用插件时一致。对于部分模型,**在首次启用高性能推理插件时,可能需要花费较长时间完成推理引擎的构建**。PaddleX 将在推理引擎的第一次构建完成后将相关信息缓存在模型目录,并在后续复用缓存中的内容以提升初始化速度。
 
-    output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
-    ```
+**启用高性能推理默认作用于整条产线/整个模块**,若想细粒度控制作用范围,如只对产线中某条子产线或某个子模块启用高性能推理插件,可以在产线配置文件中不同层级的配置里设置`use_hpip`,请参考 [2.5 高性能推理在子产线/子模块中的启用/禁用](#25-高性能推理在子产线子模块中的启用禁用)。
 
-    目前所有可选的推理后端如下:
-
-    * `paddle_infer`:Paddle Inference 推理引擎。支持 CPU 和 GPU。相比 PaddleX 快速推理,高性能推理插件支持以集成 TensorRT 子图的方式提升模型的 GPU 推理性能。
-    * `openvino`:[OpenVINO](https://github.com/openvinotoolkit/openvino),Intel 提供的深度学习推理工具,优化了多种 Intel 硬件上的模型推理性能。仅支持 CPU。高性能推理插件自动将模型转换为 ONNX 格式后用该引擎推理。
-    * `onnx_runtime`:[ONNX Runtime](https://onnxruntime.ai/),跨平台、高性能的推理引擎。支持 CPU 和 GPU。高性能推理插件自动将模型转换为 ONNX 格式后用该引擎推理。
-    * `tensorrt`:[TensorRT](https://developer.nvidia.com/tensorrt),NVIDIA 提供的高性能深度学习推理库,针对 NVIDIA GPU 进行优化以提升速度。仅支持 GPU。高性能推理插件自动将模型转换为 ONNX 格式后用该引擎推理。
-
-- 修改 Paddle Inference 或 TensorRT 的动态形状配置:
-
-  动态形状是 TensorRT 延迟指定部分或全部张量维度直到运行时的能力。当默认的动态形状配置无法满足需求(例如,模型可能需要范围外的输入形状),用户需要修改相应的配置:
-
-  对于模型产线,在产线 yaml 中的 `hpi_params` 字段中新增`trt_dynamic_shapes` 或 `dynamic_shapes` 字段,以通用图像分类产线的 `image_classification.yaml` 为例:
-
-  ```yaml
-    ...
-    SubModules:
-      ImageClassification:
-        ...
-        hpi_params:
-          config:
-            selected_backends:
-              cpu: openvino
-              gpu: paddle_infer
-            backend_config:
-              # Paddle Inference 后端配置
-              paddle_infer:
-                enable_trt: True
-                trt_precision: FP16
-                trt_dynamic_shapes:
-                  x:
-                    - [1, 3, 300, 300]
-                    - [4, 3, 300, 300]
-                    - [32, 3, 1200, 1200]
-              # TensorRT 后端配置
-              tensorrt:
-                precision: FP32
-                dynamic_shapes:
-                  x:
-                    - [1, 3, 300, 300]
-                    - [4, 3, 300, 300]
-                    - [32, 3, 1200, 1200]
-                ...
-    ...
-  ```
-
-  对于单功能模块,在 `hpi_params` 参数中新增 `trt_dynamic_shapes` 或 `dynamic_shapes` 字段,以图像分类模块为例:
-
-  ```python
-  from paddlex import create_model
-
-  model = create_model(
-        "ResNet18",
-        device="gpu",
-        use_hpip=True,
-        hpi_params={
-            "config": {
-                "selected_backends": {"cpu": "openvino", "gpu": "paddle_infer"},
-                "backend_config": {
-                    # Paddle Inference 后端配置
-                    "paddle_infer": {
-                        "enable_trt": True,
-                        "trt_precision": "FP16",
-                        "trt_dynamic_shapes": {
-                            "x": [
-                                [1, 3, 300, 300],
-                                [4, 3, 300, 300],
-                                [32, 3, 1200, 1200]
-                            ]
-                        }
-                    },
-                    # TensorRT 后端配置
-                    "tensorrt": {
-                        "precision": "FP32",
-                        "dynamic_shapes": {
-                            "x": [
-                                [1, 3, 300, 300],
-                                [4, 3, 300, 300],
-                                [32, 3, 1200, 1200]
-                            ]
-                        }
-                    }
-                }
-            }
-        }
-    )
-
-  output = model.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
-  ```
-
-  在 `trt_dynamic_shapes` 或 `dynamic_shapes` 中,需要为每一个输入张量指定动态形状,格式为:`{输入张量名称}: [{最小形状}, [{最优形状}], [{最大形状}]]`。有关最小形状、最优形状以及最大形状的相关介绍及更多细节,请参考 [TensorRT 官方文档](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes)。
-
-  在完成修改后,请删除模型目录中的缓存文件(`shape_range_info.pbtxt` 与 `trt_serialized` 开头的文件)。
-
-### 2.2 二次开发高性能推理插件
-
-我们已经提供了完善的配置,通常情况下不建议进行二次开发。如果有以下需求,确实需要进行二次开发,请务必在充分评估后再进行。如以下场景:
-
-- 自定义数据预处理或后处理逻辑。
-- 实现特定算子的优化。
-- 支持特殊的输入/输出格式。
-- 集成第三方加速库。
-- ......
+## 2. 进阶使用方法
 
-二次开发高性能推理插件流程如下:
+本节介绍高性能推理的进阶使用方法,适合对模型部署有一定了解、或希望手动调整配置以获得更优性能的用户。用户可以参照下文的配置说明和示例,根据自身需求自定义使用高性能推理。
 
-#### a. 按需修改 `ultra-infer` 代码
+### 2.1 高性能推理工作模式
 
-`ultra-infer`,是高性能推理功能的底层依赖,包含前后处理加速和多后端推理。位于 `libs` 目录下。
+高性能推理分为两种工作模式:
 
-#### b. 安装 `ultra-infer`
+#### (1) 安全自动配置模式
 
-对 `ultra-infer` 修改完成后,通过如下方式安装 `ultra-infer`
+安全自动配置模式,具有保护机制,默认**自动选用当前环境性能较优的配置**。在这种模式下,用户可以覆盖默认配置,但用户提供的配置将受到检查,PaddleX将根据先验知识拒绝不可用的配置。这是默认的工作模式。
 
-`ultra-infer` 需要编译whl包,编译脚本位于 `PaddleX/libs/ultra-infer/scripts/linux/set_up_docker_and_build_py.sh` ,编译默认编译GPU版本和包含 `Paddle Inference`、`OpenVINO`、`TensorRT`、`ONNX Runtime` 四种推理后端的 `ultra-infer`。
+#### (2) 无限制手动配置模式
 
-```shell
-# 编译
-# export PYTHON_VERSION=...
-# export WITH_GPU=...
-# export ENABLE_ORT_BACKEND=...
-# export ...
-
-cd PaddleX/libs/ultra-infer/scripts/linux
-bash set_up_docker_and_build_py.sh
+无限制手动配置模式,提供完全的配置自由,可以**自由选择推理后端、修改后端配置等**,但无法保证推理一定成功。此模式适合有经验和对推理后端及其配置有明确需求的用户,建议在熟悉高性能推理的情况下使用。
 
-# 安装
-python -m pip install ../../python/dist/ultra_infer*.whl
-```
+### 2.2 高性能推理配置
 
-编译时可根据需求修改如下选项
+常用高性能推理配置包含以下字段:
 
 <table>
-    <thead>
-        <tr>
-            <th>选项</th>
-            <th>说明</th>
-        </tr>
-    </thead>
-    <tbody>
-        <tr>
-            <td>http_proxy</td>
-            <td>在下载三方库时使用具体的http代理,默认空</td>
-        </tr>
-        <tr>
-            <td>PYTHON_VERSION</td>
-            <td>Python版本,默认 <code>3.10.0</code></td>
-        </tr>
-        <tr>
-            <td>WITH_GPU</td>
-            <td>是否编译支持Nvidia-GPU,默认 <code>ON</code></td>
-        </tr>
-        <tr>
-            <td>ENABLE_ORT_BACKEND</td>
-            <td>是否编译集成ONNX Runtime后端,默认 <code>ON</code></td>
-        </tr>
-        <tr>
-            <td>ENABLE_PADDLE_BACKEND</td>
-            <td>是否编译集成Paddle Inference后端,默认 <code>ON</code></td>
-        </tr>
-        <tr>
-            <td>ENABLE_TRT_BACKEND</td>
-            <td>是否编译集成TensorRT后端,默认 <code>ON</code></td>
-        </tr>
-        <tr>
-            <td>ENABLE_OPENVINO_BACKEND</td>
-            <td>是否编译集成OpenVINO后端(仅支持CPU),默认 <code>ON</code></td>
-        </tr>
-        <tr>
-            <td>ENABLE_VISION</td>
-            <td>是否编译集成视觉模型的部署模块,默认 <code>ON</code></td>
-        </tr>
-        <tr>
-            <td>ENABLE_TEXT</td>
-            <td>是否编译集成文本NLP模型的部署模块,默认 <code>ON</code></td>
-        </tr>
-    </tbody>
+<thead>
+<tr>
+<th>参数</th>
+<th>参数说明</th>
+<th>参数类型</th>
+<th>默认值</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>auto_config</code></td>
+<td>是否启用安全自动配置模式。<br /><code>True</code>为启用安全自动配置模式,<code>False</code>为启用无限制手动配置模式。</td>
+<td><code>bool</code></td>
+<td><code>True</code></td>
+</tr>
+<tr>
+  <td><code>backend</code></td>
+  <td>用于指定要使用的推理后端。在无限制手动配置模式下不能为<code>None</code>。</td>
+  <td><code>str | None</code></td>
+  <td><code>None</code></td>
+</tr>
+<tr>
+  <td><code>backend_config</code></td>
+  <td>推理后端的配置,若不为<code>None</code>则可以覆盖推理后端的默认配置项。</td>
+  <td><code>dict | None</code></td>
+  <td><code>None</code></td>
+</tr>
+<tr>
+  <td><code>auto_paddle2onnx</code></td>
+  <td>是否启用<a href="./paddle2onnx.md">Paddle2ONNX插件</a>将Paddle模型自动转换为ONNX模型。</td>
+  <td><code>bool</code></td>
+  <td><code>True</code></td>
+</tr>
+</tbody>
 </table>
 
-## 3. 支持使用高性能推理插件的产线与模型
+`backend` 可选值如下表所示:
 
 <table>
   <tr>
-    <th>模型产线</th>
-    <th>单功能模块</th>
-    <th>支持数量/模型总数</th>
-    <th>不支持模型</th>
+    <th>选项</th>
+    <th>描述</th>
+    <th>支持设备</th>
   </tr>
-
   <tr>
-    <td rowspan="6">通用OCR</td>
-    <tr>
-      <td>文档图像方向分类(可选)</td>
-      <td><b>1</b> / 1 </td>
-      <td>无 </td>
-    </tr>
+    <td><code>paddle</code></td>
+    <td>Paddle Inference 推理引擎,支持以 Paddle Inference TensorRT 子图引擎的方式提升模型的 GPU 推理性能。</td>
+    <td>CPU, GPU</td>
   </tr>
-
   <tr>
-    <td>文本图像矫正(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
-
-  <tr>
-    <td>文本检测</td>
-    <td><b>4</b> / 4 </td>
-    <td>无 </td>
+    <td><code>openvino</code></td>
+    <td><a href="https://github.com/openvinotoolkit/openvino">OpenVINO</a>,Intel 提供的深度学习推理工具,优化了多种 Intel 硬件上的模型推理性能。</td>
+    <td>CPU</td>
   </tr>
-
   <tr>
-    <td>文本识别</td>
-    <td><b>18</b> / 18 </td>
-    <td></td>
+    <td><code>onnxruntime</code></td>
+    <td><a href="https://onnxruntime.ai/">ONNX Runtime</a>,跨平台、高性能的推理引擎。</td>
+    <td>CPU, GPU</td>
   </tr>
-
   <tr>
-    <td>文本行方向分类(可选)</td>
-    <td><b>0</b> / 1 </td>
-    <td>
-        <details>
-        <summary>查看详情</summary>
-        PP-LCNet_x0_25_textline_ori</br>
-      </details>
-    </td>
+    <td><code>tensorrt</code></td>
+    <td><a href="https://developer.nvidia.com/tensorrt">TensorRT</a>,NVIDIA 提供的高性能深度学习推理库,针对 NVIDIA GPU 进行优化以提升速度。</td>
+    <td>GPU</td>
   </tr>
-
   <tr>
-    <td rowspan="9">文档场景信息抽取v4</td>
-    <td>文档图像方向分类(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
+    <td><code>om</code></td>
+    <td>OM,华为昇腾 NPU 定制的离线模型格式,其配套推理引擎针对昇腾硬件进行了深度优化,可减少算子计算时间和调度时间,有效提升推理性能。</td>
+    <td>NPU</td>
   </tr>
+</table>
 
-  <tr>
-    <td>文本图像矫正(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+`backend_config` 根据不同后端有不同的可选值,如下表所示:
 
+<table>
   <tr>
-    <td>版面区域检测</td>
-    <td><b>11</b> / 11 </td>
-    <td>无</td>
+    <th>后端</th>
+    <th>可选值</th>
   </tr>
-
   <tr>
-    <td>表格结构识别(可选)</td>
-    <td><b>2</b> / 2 </td>
-    <td>无</td>
+    <td><code>paddle</code></td>
+    <td>参考<a href="../module_usage/instructions/model_python_API.md">PaddleX单模型Python脚本使用说明: 4. 推理后端设置</a>。</td>
   </tr>
-
   <tr>
-    <td>文本检测</td>
-    <td><b>4</b> / 4 </td>
-    <td>无 </td>
+    <td><code>openvino</code></td>
+    <td><code>cpu_num_threads</code>:CPU推理使用的逻辑处理器数量。默认为<code>8</code>。</td>
   </tr>
-
   <tr>
-    <td>文本识别</td>
-    <td><b>18</b> / 18 </td>
-    <td>无 </td>
+    <td><code>onnxruntime</code></td>
+    <td><code>cpu_num_threads</code>:CPU推理时算子内部的并行计算线程数。默认为<code>8</code>。</td>
   </tr>
-
   <tr>
-    <td>文本行方向分类(可选)</td>
-    <td><b>0</b> / 1 </td>
+    <td><code>tensorrt</code></td>
     <td>
-        <details>
-        <summary>查看详情</summary>
-        PP-LCNet_x0_25_textline_ori</br>
-      </details>
+      <code>precision</code>:使用的精度,<code>fp16</code>或<code>fp32</code>。默认为<code>fp32</code>。
+      <br />
+      <code>dynamic_shapes</code>:动态形状,即 TensorRT 延迟指定部分或全部张量维度直到运行时的能力。需要为每个输入张量给出最小形状、最优形状和最大形状,格式为:<code>{输入张量名称}: [{最小形状}, {最优形状}, {最大形状}]</code>。更多介绍请参考 <a href="https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes">TensorRT 官方文档</a>。
     </td>
   </tr>
-
   <tr>
-    <td>公式识别(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
+    <td><code>om</code></td>
+    <td>暂无</td>
   </tr>
+</table>
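+
+例如,在 Python API 中组合使用上述字段的一个示意片段如下(字段取值仅作示例,请根据实际环境调整):
+
+```python
+from paddlex import create_model
+
+# 示例:切换到无限制手动配置模式,手动指定推理后端及其配置
+model = create_model(
+    model_name="ResNet18",
+    device="cpu",
+    use_hpip=True,
+    hpi_config={
+        "auto_config": False,                      # 无限制手动配置模式
+        "backend": "onnxruntime",                  # 手动指定推理后端
+        "backend_config": {"cpu_num_threads": 8},  # 覆盖后端默认配置
+        "auto_paddle2onnx": True,                  # 允许自动将 Paddle 模型转换为 ONNX 模型
+    },
+)
+```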
 
-  <tr>
-    <td>印章文本检测(可选)</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+### 2.3 如何修改高性能推理配置
 
-  <tr>
-    <td rowspan="7">文档场景信息抽取v3</td>
-    <td>表格结构识别</td>
-    <td><b>2</b> / 2 </td>
-    <td>无</td>
-  </tr>
+由于实际部署环境和需求的多样性,默认配置可能无法满足所有要求。这时,可能需要手动调整高性能推理配置。以下是两种常见的情况:
 
-  <tr>
-    <td>版面区域检测</td>
-    <td><b>11</b> / 11 </td>
-    <td>无</td>
-  </tr>
+- 需要更换推理后端。
+  - 例如在OCR产线中,指定`text_detection`模块使用`onnxruntime`后端,`text_recognition`模块使用`tensorrt`后端。
 
-  <tr>
-    <td>文本检测</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+- 需要修改 TensorRT 的动态形状配置。
+  - 当默认的动态形状配置无法满足需求(例如,模型可能需要范围外的输入形状),就需要为每一个输入张量指定动态形状。修改完成后,需要清理模型的`.cache`缓存目录。
 
-  <tr>
-    <td>文本识别</td>
-    <td><b>4</b> / 4 </td>
-    <td>无 </td>
-  </tr>
+在这些情况下,用户可以通过修改**产线/模块配置文件**中的 `hpi_config` 字段,或在 **CLI**、**Python API** 所传递的参数中设置 `hpi_config`,来调整高性能推理配置。**通过 CLI 或 Python API 传递的参数将覆盖产线/模块配置文件中的设置**。
 
-  <tr>
-    <td>印章文本检测</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+### 2.4 修改高性能推理配置示例
 
-  <tr>
-    <td>文本图像矫正</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+#### (1) 更换推理后端
 
-  <tr>
-    <td>文档图像方向分类</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+  ##### 通用OCR产线的所有模型使用`onnxruntime`后端:
 
-  <tr>
-    <td rowspan="7">通用表格识别v2</td>
-    <td>表格结构识别</td>
-    <td><b>0</b> / 2 </td>
-    <td>
-      <details>
-        <summary>查看详情</summary>
-        SLANeXt_wired</br>
-        SLANeXt_wireless</br>
-      </details>
-    </td>
-  </tr>
+  <details><summary>👉 1. 修改产线配置文件方式(点击展开)</summary>
 
-  <tr>
-    <td>表格分类</td>
-    <td><b>1</b> / 1 </td>
-    <td>无</td>
-  </tr>
+  ```yaml
+  pipeline_name: OCR
 
-  <tr>
-    <td>表格单元格检测</td>
-    <td><b>2</b> / 2 </td>
-    <td>无</td>
-  </tr>
+  use_hpip: True
+  hpi_config:
+    backend: onnxruntime
 
-  <tr>
-    <td>文本检测</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+  ...
+  ```
 
-  <tr>
-    <td>文本识别</td>
-    <td><b>18</b> / 18 </td>
-    <td>无</td>
-  </tr>
+  </details>
+  <details><summary>👉 2. CLI传参方式(点击展开)</summary>
 
-  <tr>
-    <td>版面区域检测</td>
-    <td><b>11</b> / 11 </td>
-    <td>无</td>
-  </tr>
+  ```bash
+  paddlex \
+      --pipeline OCR \
+      --input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png \
+      --device gpu:0 \
+      --use_hpip \
+      --hpi_config '{"backend": "onnxruntime"}'
+  ```
 
-  <tr>
-    <td>文档图像方向分类</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+  </details>
+  <details><summary>👉 3. Python API传参方式(点击展开)</summary>
 
-  <tr>
-    <td rowspan="6">通用表格识别</td>
-    <td>表格结构识别</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-    </td>
-  </tr>
+  ```python
+  from paddlex import create_pipeline
+
+  pipeline = create_pipeline(
+      pipeline="OCR",
+      device="gpu",
+      use_hpip=True,
+      hpi_config={"backend": "onnxruntime"}
+  )
+  ```
 
-  <tr>
-    <td>文本检测</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+  </details>
 
-  <tr>
-    <td>文本识别</td>
-    <td><b>18</b> / 18 </td>
-    <td>无</td>
-  </tr>
+  ##### 图像分类模块的模型使用`onnxruntime`后端:
 
-  <tr>
-    <td>版面区域检测(可选)</td>
-    <td><b>11</b> / 11 </td>
-    <td>无</td>
-  </tr>
+  <details><summary>👉 1. 修改模块配置文件方式(点击展开)</summary>
 
-  <tr>
-    <td>文本图像矫正(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+  ```yaml
+  # paddlex/configs/modules/image_classification/ResNet18.yaml
+  ...
+  Predict:
+    ...
+    use_hpip: True
+    hpi_config:
+        backend: onnxruntime
+    ...
+  ...
+  ```
 
-  <tr>
-    <td>文档图像方向分类(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+  </details>
+  <details><summary>👉 2. CLI传参方式(点击展开)</summary>
+
+  ```bash
+  python main.py \
+      -c paddlex/configs/modules/image_classification/ResNet18.yaml \
+      -o Global.mode=predict \
+      -o Predict.model_dir=None \
+      -o Predict.input=https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
+      -o Global.device=gpu:0 \
+      -o Predict.use_hpip=True \
+      -o Predict.hpi_config='{"backend": "onnxruntime"}'
+  ```
 
-  <tr>
-    <td>通用目标检测</td>
-    <td>目标检测</td>
-    <td><b>32</b> / 37</td>
-    <td>
-      <details>
-        <summary>查看详情</summary>
-        FasterRCNN-Swin-Tiny-FPN<br>
-        CenterNet-DLA-34<br>
-        CenterNet-ResNet50<br>
-        Co-Deformable-DETR-R50<br>
-        Co-Deformable-DETR-Swin-T<br>
-      </details>
-    </td>
-  </tr>
+  </details>
+  <details><summary>👉 3. Python API传参方式(点击展开)</summary>
 
-  <tr>
-    <td>通用实例分割</td>
-    <td>实例分割</td>
-    <td><b>12</b> / 15</td>
-    <td>
-      <details>
-        <summary>查看详情</summary>
-        Mask-RT-DETR-S</br>
-        PP-YOLOE_seg-S</br>
-        SOLOv2
-      </details>
-    </td>
-  </tr>
+  ```python
+  from paddlex import create_model
 
-  <tr>
-    <td>通用图像分类</td>
-    <td>图像分类</td>
-    <td><b>80</b> / 80 </td>
-    <td>无</td>
-  </tr>
+  model = create_model(
+      model_name="ResNet18",
+      device="gpu",
+      use_hpip=True,
+      hpi_config={"backend": "onnxruntime"}
+  )
+  ```
 
-  <tr>
-    <td>通用语义分割</td>
-    <td>语义分割</td>
-    <td><b>18</b> / 18 </td>
-    <td>无</td>
-  </tr>
+  </details>
 
-  <tr>
-    <td>时序预测</td>
-    <td>时序预测</td>
-    <td><b>7</b> / 7 </td>
-    <td>无</td>
-  </tr>
+  ##### 通用OCR产线的`text_detection`模块使用`onnxruntime`后端,`text_recognition`模块使用`tensorrt`后端:
 
-  <tr>
-    <td>时序异常检测</td>
-    <td>时序异常预测</td>
-    <td><b>4</b> / 5</td>
-    <td>
-      <details>
-        <summary>查看详情</summary>
-        TimesNet_ad</br>
-      </details>
-    </td>
-  </tr>
+  <details><summary>👉 1. 修改产线配置文件方式(点击展开)</summary>
 
-  <tr>
-    <td>时序分类</td>
-    <td>时序分类</td>
-    <td><b>1</b> / 1 </td>
-    <td>无</td>
-  </tr>
+  ```yaml
+  pipeline_name: OCR
+
+  ...
+
+  SubModules:
+    TextDetection:
+      module_name: text_detection
+      model_name: PP-OCRv4_mobile_det
+      model_dir: null
+      limit_side_len: 960
+      limit_type: max
+      thresh: 0.3
+      box_thresh: 0.6
+      unclip_ratio: 2.0
+      # 当前子模块启用高性能推理
+      use_hpip: True
+      # 当前子模块使用如下高性能推理配置
+      hpi_config:
+          backend: onnxruntime
+    TextLineOrientation:
+      module_name: textline_orientation
+      model_name: PP-LCNet_x0_25_textline_ori
+      model_dir: null
+      batch_size: 6
+    TextRecognition:
+      module_name: text_recognition
+      model_name: PP-OCRv4_mobile_rec
+      model_dir: null
+      batch_size: 6
+      score_thresh: 0.0
+      # 当前子模块启用高性能推理
+      use_hpip: True
+      # 当前子模块使用如下高性能推理配置
+      hpi_config:
+          backend: tensorrt
+  ```
 
-  <tr>
-    <td>小目标检测</td>
-    <td>小目标检测</td>
-    <td><b>3</b> / 3 </td>
-    <td>无</td>
-  </tr>
+  </details>
 
-  <tr>
-    <td>图像多标签分类</td>
-    <td>图像多标签分类</td>
-    <td><b>6</b> / 6 </td>
-    <td>无</td>
-  </tr>
+#### (2) 修改 TensorRT 的动态形状配置
 
-  <tr>
-    <td>图像异常检测</td>
-    <td>无监督异常检测</td>
-    <td><b>1</b> / 1 </td>
-    <td>无</td>
-  </tr>
+  ##### 通用图像分类产线修改动态形状配置:
 
-  <tr>
-    <td rowspan="9">通用版面解析v3</td>
-    <td>文档图像方向分类(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+  <details><summary>👉 点击展开</summary>
 
-  <tr>
-    <td>文本图像矫正(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+  ```yaml
+    ...
+    SubModules:
+      ImageClassification:
+        ...
+        hpi_config:
+          backend: tensorrt
+          backend_config:
+            precision: fp32
+            dynamic_shapes:
+              x:
+                - [1, 3, 300, 300]
+                - [4, 3, 300, 300]
+                - [32, 3, 1200, 1200]
+              ...
+    ...
+  ```
 
-  <tr>
-    <td>版面区域检测</td>
-    <td><b>3</b> / 3 </td>
-    <td>无</td>
-  </tr>
+  </details>
 
-  <td>表格结构识别(可选)</td>
-    <td><b>2</b> / 2 </td>
-    <td>无</td>
-  </tr>
+  ##### 图像分类模块修改动态形状配置:
 
-  <tr>
-    <td>文本检测</td>
-    <td><b>4</b> / 4 </td>
-    <td>无 </td>
-  </tr>
+  <details><summary>👉 点击展开</summary>
 
-  <tr>
-    <td>文本识别</td>
-    <td><b>18</b> / 18 </td>
-    <td>无 </td>
-  </tr>
+  ```yaml
+  ...
+  Predict:
+    ...
+    use_hpip: True
+    hpi_config:
+        backend: tensorrt
+        backend_config:
+          precision: fp32
+          dynamic_shapes:
+            x:
+              - [1, 3, 300, 300]
+              - [4, 3, 300, 300]
+              - [32, 3, 1200, 1200]
+    ...
+  ...
+  ```
 
-  <tr>
-    <td>文本行方向分类(可选)</td>
-    <td><b>0</b> / 1 </td>
-    <td>
-        <details>
-        <summary>查看详情</summary>
-        PP-LCNet_x0_25_textline_ori</br>
-      </details>
-    </td>
-  </tr>
+  </details>
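+
+  若希望通过 Python API 传入同样的动态形状配置,可参考如下示意代码(模型与形状取值仅为示例):
+
+  ```python
+  from paddlex import create_model
+
+  # 以 TensorRT 后端为例,为输入张量 x 指定最小/最优/最大形状
+  model = create_model(
+      model_name="ResNet18",
+      device="gpu",
+      use_hpip=True,
+      hpi_config={
+          "backend": "tensorrt",
+          "backend_config": {
+              "precision": "fp32",
+              "dynamic_shapes": {
+                  "x": [
+                      [1, 3, 300, 300],
+                      [4, 3, 300, 300],
+                      [32, 3, 1200, 1200],
+                  ]
+              },
+          },
+      },
+  )
+  ```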
 
-  <tr>
-    <td>公式识别(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+### 2.5 高性能推理在子产线/子模块中的启用/禁用
 
-  <tr>
-    <td>印章文本检测(可选)</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+高性能推理支持在子产线/子模块级别单独设置 `use_hpip`,从而**仅对产线中的某个子产线/子模块启用高性能推理**。示例如下:
 
-  <tr>
-    <td rowspan="9">通用版面解析</td>
-    <td>文档图像方向分类(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+##### 通用OCR产线的`text_detection`模块使用高性能推理,`text_recognition`模块不使用高性能推理:
 
-  <tr>
-    <td>文本图像矫正(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+  <details><summary>👉 点击展开</summary>
 
-  <tr>
-    <td>版面区域检测</td>
-    <td><b>11</b> / 11 </td>
-    <td>无</td>
-  </tr>
+  ```yaml
+  pipeline_name: OCR
+
+  ...
+
+  SubModules:
+    TextDetection:
+      module_name: text_detection
+      model_name: PP-OCRv4_mobile_det
+      model_dir: null
+      limit_side_len: 960
+      limit_type: max
+      thresh: 0.3
+      box_thresh: 0.6
+      unclip_ratio: 2.0
+      use_hpip: True # 当前子模块启用高性能推理
+    TextLineOrientation:
+      module_name: textline_orientation
+      model_name: PP-LCNet_x0_25_textline_ori
+      model_dir: null
+      batch_size: 6
+    TextRecognition:
+      module_name: text_recognition
+      model_name: PP-OCRv4_mobile_rec
+      model_dir: null
+      batch_size: 6
+      score_thresh: 0.0
+      use_hpip: False # 当前子模块不启用高性能推理
+  ```
 
-  <td>表格结构识别(可选)</td>
-    <td><b>2</b> / 2 </td>
-    <td>无</td>
-  </tr>
+  </details>
 
-  <tr>
-    <td>文本检测</td>
-    <td><b>4</b> / 4 </td>
-    <td>无 </td>
-  </tr>
+**注意:**
 
-  <tr>
-    <td>文本识别</td>
-    <td><b>18</b> / 18 </td>
-    <td>无 </td>
-  </tr>
+1. 在子产线或子模块中设置 `use_hpip` 时,将以最深层的配置为准。
 
-  <tr>
-    <td>文本行方向分类(可选)</td>
-    <td><b>0</b> / 1 </td>
-    <td>
-        <details>
-        <summary>查看详情</summary>
-        PP-LCNet_x0_25_textline_ori</br>
-      </details>
-    </td>
-  </tr>
+2. **强烈建议通过修改产线配置文件的方式开启高性能推理**,不建议使用CLI或Python API的方式进行设置。如果通过CLI或Python API启用 `use_hpip`,等同于在配置文件的最上层设置 `use_hpip`。
 
-  <tr>
-    <td>公式识别(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+### 2.6 模型缓存说明
 
-  <tr>
-    <td>印章文本检测(可选)</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+模型缓存存放在模型目录下的 `.cache` 目录中,包括使用 `tensorrt` 或 `paddle` 后端时产生的 `shape_range_info.pbtxt` 文件与以 `trt_serialized` 开头的文件。
 
-  <tr>
-    <td rowspan="4">公式识别</td>
-    <td>文档图像方向分类(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+当启用`auto_paddle2onnx`选项时,可能会在模型目录下自动生成`inference.onnx`文件。
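+
+修改动态形状等配置后需要清理缓存时,可参考如下示意 Python 片段(其中的模型目录路径为假设值,请替换为实际路径):
+
+```python
+import shutil
+from pathlib import Path
+
+# 假设模型目录为 ./PP-OCRv4_mobile_det
+cache_dir = Path("./PP-OCRv4_mobile_det") / ".cache"
+
+# 删除缓存目录后,下次启用高性能推理时会重新构建推理引擎并重新生成缓存
+if cache_dir.exists():
+    shutil.rmtree(cache_dir)
+```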
 
-  <tr>
-    <td>文本图像矫正(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+### 2.7 定制模型推理库
 
-  <tr>
-    <td>版面区域检测(可选)</td>
-    <td><b>6</b> / 6 </td>
-    <td>无</td>
-  </tr>
+`ultra-infer` 是高性能推理底层依赖的模型推理库,位于 `PaddleX/libs/ultra-infer` 目录。编译脚本位于 `PaddleX/libs/ultra-infer/scripts/linux/set_up_docker_and_build_py.sh`,默认编译 GPU 版本、包含 OpenVINO、TensorRT、ONNX Runtime 三种推理后端的 `ultra-infer`。
 
-  <tr>
-    <td>公式识别</td>
-    <td><b>1</b> / 4 </td>
-    <td>
-      <details>
-        <summary>查看详情</summary>
-        UnimerNet</br>
-        PP-FormulaNet-L</br>
-        PP-FormulaNet-S</br>
-      </details>
-    </td>
-  </tr>
+自定义编译时可根据需求修改如下选项:
 
-  <tr>
-    <td rowspan="5">印章文本识别</td>
-    <td>版面区域检测(可选)</td>
-    <td><b>11</b> / 11 </td>
-    <td>无</td>
-  </tr>
+<table>
+    <thead>
+        <tr>
+            <th>选项</th>
+            <th>说明</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>http_proxy</td>
+            <td>在下载三方库时使用具体的http代理,默认空</td>
+        </tr>
+        <tr>
+            <td>PYTHON_VERSION</td>
+            <td>Python版本,默认 <code>3.10.0</code></td>
+        </tr>
+        <tr>
+            <td>WITH_GPU</td>
+            <td>是否编译支持Nvidia-GPU,默认 <code>ON</code></td>
+        </tr>
+        <tr>
+            <td>ENABLE_ORT_BACKEND</td>
+            <td>是否编译集成ONNX Runtime后端,默认 <code>ON</code></td>
+        </tr>
+        <tr>
+            <td>ENABLE_TRT_BACKEND</td>
+            <td>是否编译集成TensorRT后端(仅支持GPU),默认 <code>ON</code></td>
+        </tr>
+        <tr>
+            <td>ENABLE_OPENVINO_BACKEND</td>
+            <td>是否编译集成OpenVINO后端(仅支持CPU),默认 <code>ON</code></td>
+        </tr>
+    </tbody>
+</table>
 
-  <tr>
-    <td>文档图像方向分类(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+编译示例:
 
-  <tr>
-    <td>文本图像矫正(可选)</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+```shell
+# 编译
+# export PYTHON_VERSION=...
+# export WITH_GPU=...
+# export ENABLE_ORT_BACKEND=...
+# export ...
 
-  <tr>
-    <td>印章文本检测</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+cd PaddleX/libs/ultra-infer/scripts/linux
+bash set_up_docker_and_build_py.sh
 
-  <tr>
-    <td>文本识别</td>
-    <td><b>18</b> / 18 </td>
-    <td>无 </td>
-  </tr>
+# 安装
+python -m pip install ../../python/dist/ultra_infer*.whl
+```
 
-  <tr>
-    <td rowspan="2">通用图像识别</td>
-    <td>主体检测</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+## 3. 常见问题
 
-  <tr>
-    <td>图像特征</td>
-    <td><b>3</b> / 3 </td>
-    <td>无 </td>
-  </tr>
+**1. 为什么开启高性能推理插件前后,感觉推理速度没有明显提升?**
 
-  <tr>
-    <td rowspan="2">行人属性识别</td>
-    <td>行人检测</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+高性能推理插件通过智能选择后端来加速推理。
 
-  <tr>
-    <td>行人属性识别</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+对于单功能模块,由于模型结构复杂或存在不支持的算子等原因,部分模型可能无法使用加速后端(如 OpenVINO、TensorRT 等)。此时日志中会提示相关内容,并选择已知**最快的可用后端**,因此可能回退到普通推理。
 
-  <tr>
-    <td rowspan="2">车辆属性识别</td>
-    <td>车辆检测</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+对于模型产线,性能瓶颈可能不在模型推理阶段。
 
-  <tr>
-    <td>车辆属性识别</td>
-    <td><b>1</b> / 1 </td>
-    <td>无 </td>
-  </tr>
+可以使用 [PaddleX benchmark](../module_usage/instructions/benchmark.md) 工具进行实际速度测试,以便更准确地评估性能。
 
-  <tr>
-    <td rowspan="2">人脸识别</td>
-    <td>人脸检测</td>
-    <td><b>4</b> / 4 </td>
-    <td>无 </td>
-  </tr>
+**2. 高性能推理功能是否支持所有模型产线与单功能模块?**
 
-  <tr>
-    <td>人脸特征</td>
-    <td><b>2</b> / 2 </td>
-    <td>无 </td>
-  </tr>
+高性能推理功能支持所有模型产线与单功能模块,但部分模型可能无法加速推理,具体原因可以参考问题1。
 
-</table>
+**3. 为什么安装高性能推理插件会失败,日志显示 “Currently, the CUDA version must be 11.x for GPU devices.”?**
+
+高性能推理功能目前支持的环境如 [1.1节的表](#11-安装高性能推理插件) 所示。如果安装失败,可能是高性能推理功能不支持当前环境。另外,对 CUDA 12.6 的支持正在进行中。
+
+**4. 为什么使用高性能推理功能后,程序在运行过程中会卡住或者显示一些 WARNING 和 ERROR 信息?这种情况下应该如何处理?**
+
+在引擎构建过程中,子图优化和算子处理可能使程序耗时较长,并产生一些 WARNING 和 ERROR 信息。只要程序没有自动退出,建议耐心等待,程序通常会继续运行至完成。

+ 61 - 0
docs/pipeline_deploy/paddle2onnx.en.md

@@ -0,0 +1,61 @@
+# Installation and Usage of the Paddle2ONNX Plugin
+
+The Paddle2ONNX plugin for PaddleX provides the ability to convert PaddlePaddle-format models to ONNX-format models, building on the underlying [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) library.
+
+## 1. Installation
+
+```bash
+paddlex --install paddle2onnx
+```
+
+## 2. Usage
+
+### 2.1 Parameter Introduction
+
+<table>
+    <thead>
+        <tr>
+            <th>Parameter</th>
+            <th>Type</th>
+            <th>Description</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>paddle_model_dir</td>
+            <td>str</td>
+            <td>Directory containing the Paddle model.</td>
+        </tr>
+        <tr>
+            <td>onnx_model_dir</td>
+            <td>str</td>
+            <td>Output directory for the ONNX model, which can be the same as the Paddle model directory. Defaults to <code>onnx</code>.</td>
+        </tr>
+        <tr>
+            <td>opset_version</td>
+            <td>int</td>
+            <td>The ONNX opset version to use. Defaults to <code>19</code> for <code>.json</code> format Paddle models and <code>7</code> for <code>.pdmodel</code> format Paddle models.</td>
+        </tr>
+    </tbody>
+</table>
+
+### 2.2 Usage Method
+
+Usage (the `--paddle2onnx` flag selects the model conversion function; the other options are described in the parameter table above):
+
+```bash
+paddlex \
+    --paddle2onnx \
+    --paddle_model_dir /your/paddle_model/dir \
+    --onnx_model_dir /your/onnx_model/output/dir \
+    --opset_version 7
+```
+
+Taking the ResNet18 model from the image_classification module as an example:
+
+```bash
+paddlex \
+    --paddle2onnx \
+    --paddle_model_dir ./ResNet18 \
+    --onnx_model_dir ./ResNet18
+```

+ 62 - 0
docs/pipeline_deploy/paddle2onnx.md

@@ -0,0 +1,62 @@
+
+# Paddle2ONNX 插件的安装与使用
+
+PaddleX 的 Paddle2ONNX 插件提供了将 PaddlePaddle 格式模型转换为 ONNX 格式模型的能力,底层使用 [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX)。
+
+## 1. 安装
+
+```bash
+paddlex --install paddle2onnx
+```
+
+## 2. 使用
+
+### 2.1 参数介绍
+
+<table>
+    <thead>
+        <tr>
+            <th>参数</th>
+            <th>类型</th>
+            <th>描述</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>paddle_model_dir</td>
+            <td>str</td>
+            <td>包含Paddle模型的目录。</td>
+        </tr>
+        <tr>
+            <td>onnx_model_dir</td>
+            <td>str</td>
+            <td>ONNX模型的输出目录,可以与Paddle模型目录相同。默认为<code>onnx</code>。</td>
+        </tr>
+        <tr>
+            <td>opset_version</td>
+            <td>int</td>
+            <td>使用的ONNX opset版本。<code>.json</code>格式的Paddle模型默认为<code>19</code>,<code>.pdmodel</code>格式的Paddle模型默认为<code>7</code>。</td>
+        </tr>
+    </tbody>
+</table>
+
+### 2.2 使用方式
+
+使用方式如下(`--paddle2onnx` 用于选择模型转换功能,其余参数含义见上方参数表):
+
+```bash
+paddlex \
+    --paddle2onnx \
+    --paddle_model_dir /your/paddle_model/dir \
+    --onnx_model_dir /your/onnx_model/output/dir \
+    --opset_version 7
+```
+
+以 image_classification 模块中的 ResNet18 模型为例:
+
+```bash
+paddlex \
+    --paddle2onnx \
+    --paddle_model_dir ./ResNet18 \
+    --onnx_model_dir ./ResNet18
+```
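+
+转换得到的 ONNX 模型可以配合 PaddleX 高性能推理插件的 `onnxruntime` 后端使用。以下是一个示意性的 Python 片段(假设 `./ResNet18` 目录中已包含转换生成的 ONNX 模型,接口参数请以实际版本为准):
+
+```python
+from paddlex import create_model
+
+# 假设 ./ResNet18 目录下已包含由 Paddle2ONNX 转换得到的模型文件
+model = create_model(
+    model_name="ResNet18",
+    model_dir="./ResNet18",
+    device="gpu",
+    use_hpip=True,
+    hpi_config={"backend": "onnxruntime"},
+)
+
+output = model.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
+```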

+ 2 - 2
docs/pipeline_usage/pipeline_develop_guide.md

@@ -164,7 +164,7 @@ Pipeline:
 
 若您需要将产线直接应用在您的Python项目中,可以参考[PaddleX产线Python脚本使用说明](./instructions/pipeline_python_API.md)及[快速体验](#2快速体验)中的Python示例代码。
 
-此外,PaddleX 也提供了其他三种部署方式,详细说明如下:
+PaddleX 也提供了其他三种部署方式,详细说明如下:
 
 
 🚀 <b>高性能推理</b>:在实际生产环境中,许多应用对部署策略的性能指标(尤其是响应速度)有着较严苛的标准,以确保系统的高效运行与用户体验的流畅性。为此,PaddleX 提供高性能推理插件,旨在对模型推理及前后处理进行深度性能优化,实现端到端流程的显著提速,详细的高性能部署流程请参考[PaddleX高性能部署指南](../pipeline_deploy/high_performance_inference.md)。
@@ -174,7 +174,7 @@ Pipeline:
 📱 <b>端侧部署</b>:端侧部署是一种将计算和数据处理功能放在用户设备本身上的方式,设备可以直接处理数据,而不需要依赖远程的服务器。PaddleX 支持将模型部署在 Android 等端侧设备上,详细的端侧部署流程请参考[PaddleX端侧部署指南](../pipeline_deploy/edge_deploy.md)。
 您可以根据需要选择合适的方式部署模型产线,进而进行后续的 AI 应用集成。
 
-
+PaddleX 提供了将 Paddle 模型转换为 ONNX 模型的能力,详细说明请参考[Paddle2ONNX 插件的安装与使用](../pipeline_deploy/paddle2onnx.md)。
 
 > ❗ PaddleX为每个产线都提供了详细的使用说明,您可以根据需要进行选择,所有产线对应的使用说明如下:
 

+ 7 - 2
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.en.md

@@ -1274,7 +1274,12 @@ Below are the API references for basic serving and multi-language service invoca
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>URL of an image file or PDF file accessible to the server, or Base64 encoded result of the content of the above file types. For PDF files exceeding 10 pages, only the content of the first 10 pages will be used.</td>
+<td>URL of an image file or PDF file accessible to the server, or Base64 encoded result of the content of the above file types. By default, for PDF files exceeding 10 pages, only the content of the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -1426,7 +1431,7 @@ Below are the API references for basic serving and multi-language service invoca
 <tr>
 <td><code>layoutParsingResults</code></td>
 <td><code>array</code></td>
-<td>Analysis results obtained using computer vision models. The array length is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file in sequence.</td>
+<td>Analysis results obtained using computer vision models. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>visualInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.md

@@ -1279,7 +1279,12 @@ for res in visual_predict_res:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -1431,7 +1436,7 @@ for res in visual_predict_res:
 <tr>
 <td><code>layoutParsingResults</code></td>
 <td><code>array</code></td>
-<td>使用计算机视觉模型得到的分析结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>使用计算机视觉模型得到的分析结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>visualInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.en.md

@@ -1411,7 +1411,12 @@ Below are the API references for basic serving and multi-language service invoca
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>URL of an image file or PDF file accessible to the server, or Base64 encoded result of the content of the above file types. For PDF files exceeding 10 pages, only the content of the first 10 pages will be used.</td>
+<td>URL of an image file or PDF file accessible to the server, or Base64 encoded result of the content of the above file types. By default, for PDF files exceeding 10 pages, only the content of the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -1563,7 +1568,7 @@ Below are the API references for basic serving and multi-language service invoca
 <tr>
 <td><code>layoutParsingResults</code></td>
 <td><code>array</code></td>
-<td>Analysis results obtained using computer vision models. The array length is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file in sequence.</td>
+<td>Analysis results obtained using computer vision models. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>visualInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.md

@@ -1614,7 +1614,12 @@ for res in visual_predict_res:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -1766,7 +1771,7 @@ for res in visual_predict_res:
 <tr>
 <td><code>layoutParsingResults</code></td>
 <td><code>array</code></td>
-<td>使用计算机视觉模型得到的分析结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>使用计算机视觉模型得到的分析结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>visualInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/OCR.en.md

@@ -966,7 +966,12 @@ Below are the API reference and multi-language service invocation examples for t
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. For PDF files exceeding 10 pages, only the first 10 pages will be used.</td>
+<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. By default, for PDF files exceeding 10 pages, only the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -1046,7 +1051,7 @@ Below are the API reference and multi-language service invocation examples for t
 <tr>
 <td><code>ocrResults</code></td>
 <td><code>object</code></td>
-<td>OCR results. The array length is 1 (for image input) or the smaller of the document page count and 10 (for PDF input). For PDF input, each element in the array represents the processing result for each page of the PDF file.</td>
+<td>OCR results. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/OCR.md

@@ -960,7 +960,12 @@ for res in output:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -1041,7 +1046,7 @@ for res in output:
 <tr>
 <td><code>ocrResults</code></td>
 <td><code>object</code></td>
-<td>OCR结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>OCR结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.en.md

@@ -1458,7 +1458,12 @@ Below is the API reference for basic serving deployment and examples of service
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the above file types. For PDF files with more than 10 pages, only the content of the first 10 pages will be used.</td>
+<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the above file types. By default, for PDF files exceeding 10 pages, only the content of the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -1640,7 +1645,7 @@ Below is the API reference for basic serving deployment and examples of service
 <tr>
 <td><code>layoutParsingResults</code></td>
 <td><code>array</code></td>
-<td>The layout parsing results. The length of the array is 1 (for image input) or the minimum of the document page count and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file.</td>
+<td>The layout parsing results. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.md

@@ -1403,7 +1403,12 @@ for res in output:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -1586,7 +1591,7 @@ for res in output:
 <tr>
 <td><code>layoutParsingResults</code></td>
 <td><code>array</code></td>
-<td>版面解析结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>版面解析结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.en.md

@@ -494,7 +494,12 @@ Additionally, PaddleX offers three other deployment methods, detailed as follows
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. For PDF files exceeding 10 pages, only the first 10 pages will be used.</td>
+<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. By default, for PDF files exceeding 10 pages, only the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -532,7 +537,7 @@ Additionally, PaddleX offers three other deployment methods, detailed as follows
 <tr>
 <td><code>docPreprocessingResults</code></td>
 <td><code>object</code></td>
-<td>Document image preprocessing results. The array length is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file.</td>
+<td>Document image preprocessing results. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.md

@@ -496,7 +496,12 @@ for res in output:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -534,7 +539,7 @@ for res in output:
 <tr>
 <td><code>docPreprocessingResults</code></td>
 <td><code>object</code></td>
-<td>文档图像预处理结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>文档图像预处理结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.en.md

@@ -782,7 +782,12 @@ Below are the API references for basic service-based deployment and multi-langua
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. For PDF files exceeding 10 pages, only the first 10 pages will be used.</td>
+<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. By default, for PDF files exceeding 10 pages, only the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -850,7 +855,7 @@ Below are the API references for basic service-based deployment and multi-langua
 <tr>
 <td><code>formulaRecResults</code></td>
 <td><code>object</code></td>
-<td>The formula recognition results. The array length is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file.</td>
+<td>The formula recognition results. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.md

@@ -778,7 +778,12 @@ for res in output:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -846,7 +851,7 @@ for res in output:
 <tr>
 <td><code>formulaRecResults</code></td>
 <td><code>object</code></td>
-<td>公式识别结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>公式识别结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.en.md

@@ -1254,7 +1254,12 @@ Below are the API reference and multi-language service invocation examples for t
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the above file types. For PDF files with more than 10 pages, only the content of the first 10 pages will be used.</td>
+<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the above file types. By default, for PDF files exceeding 10 pages, only the content of the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -1418,7 +1423,7 @@ Below are the API reference and multi-language service invocation examples for t
 <tr>
 <td><code>layoutParsingResults</code></td>
 <td><code>array</code></td>
-<td>The layout parsing results. The length of the array is 1 (for image input) or the smaller of the document page count and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file.</td>
+<td>The layout parsing results. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.md

@@ -1306,7 +1306,12 @@ for res in output:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -1471,7 +1476,7 @@ for res in output:
 <tr>
 <td><code>layoutParsingResults</code></td>
 <td><code>array</code></td>
-<td>版面解析结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>版面解析结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.en.md

@@ -1122,7 +1122,12 @@ Below are the API references for basic serving deployment and multi-language ser
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. For PDF files with more than 10 pages, only the content of the first 10 pages will be used.</td>
+<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. By default, for PDF files exceeding 10 pages, only the content of the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -1226,7 +1231,7 @@ Below are the API references for basic serving deployment and multi-language ser
 <tr>
 <td><code>sealRecResults</code></td>
 <td><code>object</code></td>
-<td>The seal text recognition result. The array length is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file.</td>
+<td>The seal text recognition result. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md

@@ -1140,7 +1140,12 @@ for res in output:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -1244,7 +1249,7 @@ for res in output:
 <tr>
 <td><code>sealRecResults</code></td>
 <td><code>object</code></td>
-<td>印章文本识别结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>印章文本识别结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.en.md

@@ -1186,7 +1186,12 @@ Below are the API references for basic serving deployment and multi-language ser
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the above file types. For PDF files exceeding 10 pages, only the content of the first 10 pages will be used.</td>
+<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the above file types. By default, for PDF files exceeding 10 pages, only the content of the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -1303,7 +1308,7 @@ Below are the API references for basic serving deployment and multi-language ser
 <tr>
 <td><code>tableRecResults</code></td>
 <td><code>object</code></td>
-<td>The table recognition results. The array length is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file.</td>
+<td>The table recognition results. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.md

@@ -1132,7 +1132,12 @@ for res in output:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -1248,7 +1253,7 @@ for res in output:
 <tr>
 <td><code>tableRecResults</code></td>
 <td><code>object</code></td>
-<td>表格识别结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>表格识别结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md

@@ -1258,7 +1258,12 @@ Below are the API references for basic serving deployment and multi-language ser
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>The URL of a server-accessible image or PDF file, or the Base64-encoded content of such files. For PDF files exceeding 10 pages, only the first 10 pages will be used.</td>
+<td>The URL of a server-accessible image or PDF file, or the Base64-encoded content of such files. By default, for PDF files exceeding 10 pages, only the first 10 pages will be processed.<br />
+To remove the page limit, please add the following configuration to the pipeline configuration file:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre></td>
 <td>Yes</td>
 </tr>
 <tr>
@@ -1386,7 +1391,7 @@ Below are the API references for basic serving deployment and multi-language ser
 <tr>
 <td><code>tableRecResults</code></td>
 <td><code>object</code></td>
-<td>The table recognition results. The array length is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file.</td>
+<td>The table recognition results. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of each page actually processed in the PDF file.</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 7 - 2
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md

@@ -1264,7 +1264,12 @@ for res in output:
 <tr>
 <td><code>file</code></td>
 <td><code>string</code></td>
-<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。对于超过10页的PDF文件,只有前10页的内容会被使用。</td>
+<td>服务器可访问的图像文件或PDF文件的URL,或上述类型文件内容的Base64编码结果。默认对于超过10页的PDF文件,只有前10页的内容会被处理。<br /> 要解除页数限制,请在产线配置文件中添加以下配置:
+<pre><code>Serving:
+  extra:
+    max_num_input_imgs: null
+</code></pre>
+</td>
 <td>是</td>
 </tr>
 <tr>
@@ -1392,7 +1397,7 @@ for res in output:
 <tr>
 <td><code>tableRecResults</code></td>
 <td><code>object</code></td>
-<td>表格识别结果。数组长度为1(对于图像输入)或文档页数与10中的较小者(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中每一页的处理结果。</td>
+<td>表格识别结果。数组长度为1(对于图像输入)或实际处理的文档页数(对于PDF输入)。对于PDF输入,数组中的每个元素依次表示PDF文件中实际处理的每一页的结果。</td>
 </tr>
 <tr>
 <td><code>dataInfo</code></td>

+ 0 - 0
docs/practical_tutorials/high_performance_npu_tutorial.en.md


+ 1 - 0
mkdocs.yml

@@ -415,6 +415,7 @@ nav:
        - 高性能推理: pipeline_deploy/high_performance_inference.md
        - 服务化部署: pipeline_deploy/serving.md
        - 端侧部署: pipeline_deploy/edge_deploy.md
+       - 获取 ONNX 模型: pipeline_deploy/paddle2onnx.md
   - 多硬件使用:
        - 多硬件使用指南: other_devices_support/multi_devices_use_guide.md
        - 飞桨多硬件安装: