In real-world production environments, many applications impose stringent performance requirements on deployment, particularly for response speed, to ensure efficient system operation and a smooth user experience. To this end, PaddleX provides high-performance inference plugins that deeply optimize model inference and pre/post-processing, achieving significant end-to-end speedups. This document first introduces the installation and usage of the high-performance inference plugins, followed by a list of the pipelines and models that currently support them.
Before using the high-performance inference plugins, ensure that you have completed the installation of PaddleX according to the PaddleX Local Installation Tutorial, and have successfully run basic pipeline inference using either the PaddleX pipeline command line instructions or the Python script instructions.
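If you want a quick sanity check that the base installation works before adding the plugin, a minimal run of the general image classification pipeline (the same pipeline used in the examples below) looks like this:

```python
from paddlex import create_pipeline

# Basic pipeline inference without the high-performance plugin; if this
# runs end to end, the base PaddleX installation is working.
pipeline = create_pipeline(pipeline="image_classification")
output = pipeline.predict(
    "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg"
)
for res in output:
    res.print()
```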
Find the corresponding installation command based on your processor architecture, operating system, device type, and Python version in the table below and execute it in your deployment environment:
| Processor Architecture | Operating System | Device Type | Python Version | Installation Command |
|---|---|---|---|---|
| x86-64 | Linux | CPU | 3.8 | `curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py \| python3.8 - --arch x86_64 --os linux --device cpu --py 38` |
| x86-64 | Linux | CPU | 3.9 | `curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py \| python3.9 - --arch x86_64 --os linux --device cpu --py 39` |
| x86-64 | Linux | CPU | 3.10 | `curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py \| python3.10 - --arch x86_64 --os linux --device cpu --py 310` |
| x86-64 | Linux | GPU (CUDA 11.8 + cuDNN 8.6) | 3.8 | `curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py \| python3.8 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 38` |
| x86-64 | Linux | GPU (CUDA 11.8 + cuDNN 8.6) | 3.9 | `curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py \| python3.9 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 39` |
| x86-64 | Linux | GPU (CUDA 11.8 + cuDNN 8.6) | 3.10 | `curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py \| python3.10 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 310` |
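After the installation command completes, you can verify that the plugin package is present in the current environment. This is a minimal sketch assuming the plugin is importable as `paddlex_hpi` (a name inferred from the installer script; adjust if your installation differs):

```python
# Verify the high-performance inference plugin package is installed.
# The importable name `paddlex_hpi` is an assumption based on the
# installer script name (install_paddlex_hpi.py).
import paddlex_hpi  # noqa: F401

print("paddlex_hpi imported successfully")
```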
To obtain a serial number, visit the Baidu AIStudio Community - AI Learning and Training Platform page and, under the "Open-source Pipeline Deployment Serial Number Inquiry and Acquisition" section, select "Acquire Now". Then select the pipeline you wish to deploy and click "Acquire". Afterwards, you can find the acquired serial number in the "Open-source Pipeline Deployment SDK Serial Number Management" section at the bottom of the page.
After using the serial number to complete activation, you can utilize the high-performance inference plugins. PaddleX provides both online and offline activation methods (both support Linux systems only):

- Online activation: When using the serial number for the first time, enable online activation in the inference API or CLI, and the program will complete the activation automatically.
- Offline activation: Follow the instructions on the serial number management page to obtain your machine's device fingerprint, bind the serial number to the device fingerprint to obtain a certificate, and activate. For this method, you need to manually store the certificate in the `${HOME}/.baidu/paddlex/licenses` directory on the machine (create the directory if it does not exist) and specify the serial number when using the inference API or CLI; a sketch of this file layout follows the note below.

Please note: each serial number can be bound to only one unique device fingerprint and can be bound only once. This means that users deploying models on different machines must prepare a separate serial number for each machine.
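As a sketch of the offline-activation file layout described above (the certificate file name `my_pipeline.lic` is hypothetical; use whatever file the serial number management page issues for your machine):

```python
import shutil
from pathlib import Path

# Create the license directory expected by PaddleX if it does not exist.
license_dir = Path.home() / ".baidu" / "paddlex" / "licenses"
license_dir.mkdir(parents=True, exist_ok=True)

# Copy the certificate obtained from the serial number management page.
# "my_pipeline.lic" is a hypothetical file name.
shutil.copy("my_pipeline.lic", license_dir)
```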
Before enabling the high-performance inference plugins, ensure that the `LD_LIBRARY_PATH` of the current environment does not specify a TensorRT directory: the plugins already integrate TensorRT, and a conflicting TensorRT version may prevent the plugins from functioning properly.
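A quick way to check this, sketched in Python (you can equally inspect the variable in your shell):

```python
import os

# Warn if LD_LIBRARY_PATH points at a separate TensorRT installation,
# which could conflict with the TensorRT bundled in the plugin.
ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
for entry in ld_library_path.split(":"):
    if "tensorrt" in entry.lower():
        print(f"warning: LD_LIBRARY_PATH references TensorRT: {entry}")
```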
For the PaddleX CLI, specify `--use_hpip` and pass the serial number to enable the high-performance inference plugin. To activate the license online, also specify `--update_license` the first time you use the serial number. Taking the general image classification pipeline as an example:
paddlex \
--pipeline image_classification \
--input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
--device gpu:0 \
+ --use_hpip \
+ --serial_number {serial_number}
# If you wish to activate the license online
paddlex \
--pipeline image_classification \
--input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
--device gpu:0 \
+ --use_hpip \
+ --serial_number {serial_number} \
+ --update_license
For PaddleX Python API, enabling the high-performance inference plugin is similar. Still taking the general image classification pipeline as an example:
from paddlex import create_pipeline
pipeline = create_pipeline(
pipeline="image_classification",
+ use_hpip=True,
+ serial_number="{serial_number}",
)
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
for res in output:
    res.print()  # print each prediction result
The inference results obtained with the high-performance inference plugin enabled are consistent with those obtained without it. For some models, the first run with the plugin enabled may take noticeably longer, because the inference engine must be built. PaddleX caches the relevant information in the model directory after the engine is first built and reuses the cached content in subsequent runs to improve initialization speed.
PaddleX provides default high-performance inference configurations for each model and stores them in the model's configuration file. Due to the diversity of actual deployment environments, using the default configurations may not achieve ideal performance in specific environments or may even result in inference failures. For situations where the default configurations cannot meet requirements, you can try changing the model's inference backend as follows:
1. Locate the `inference.yml` file in the model directory and find the `Hpi` field.
2. Modify the value of `selected_backends`. Specifically, `selected_backends` may be set as follows:
selected_backends:
cpu: paddle_infer
gpu: onnx_runtime
Each entry is formatted as `{device_type}: {inference_backend_name}`. The default selects the backend with the shortest inference time in the official test environment. `supported_backends` lists the inference backends supported by the model in the official test environment, for reference.
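If you prefer to script the change, the following sketch uses PyYAML and assumes the `Hpi`/`selected_backends` layout shown above; the model directory path is a placeholder, and `paddle_tensorrt` is one of the backend names listed below:

```python
from pathlib import Path

import yaml

# Path to the model directory is a placeholder; point it at your model.
config_path = Path("path/to/model_dir") / "inference.yml"
config = yaml.safe_load(config_path.read_text())

hpi = config["Hpi"]
print("supported backends:", hpi.get("supported_backends"))  # for reference

# Switch the GPU backend, e.g. to Paddle-TensorRT.
hpi["selected_backends"]["gpu"] = "paddle_tensorrt"

config_path.write_text(yaml.safe_dump(config))
```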
The currently available inference backends are:
- `paddle_infer`: The standard Paddle Inference engine. Supports CPU and GPU.
- `paddle_tensorrt`: Paddle-TensorRT, a high-performance deep learning inference library produced by Paddle, which integrates TensorRT in the form of subgraphs for further optimization and acceleration. Supports GPU only.
- `openvino`: OpenVINO, a deep learning inference tool provided by Intel, optimized for model inference performance on various Intel hardware. Supports CPU only.
- `onnx_runtime`: ONNX Runtime, a cross-platform, high-performance inference engine. Supports CPU and GPU.
- `tensorrt`: TensorRT, a high-performance deep learning inference library provided by NVIDIA, optimized for NVIDIA GPUs to improve inference speed. Supports GPU only.

Here are some key details of the current official test environment: CUDA 11.8 and cuDNN 8.6, consistent with the GPU installation packages above. The pipelines and models that currently support the high-performance inference plugins are listed below:
| Pipeline | Pipeline Module | Specific Models |
|---|---|---|
| General Image Classification | Image Classification | ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, ResNet18_vd, ResNet34_vd, ResNet50_vd, ResNet101_vd, ResNet152_vd, ResNet200_vd, PP-LCNet_x0_25, PP-LCNet_x0_35, PP-LCNet_x0_5, PP-LCNet_x0_75, PP-LCNet_x1_0, PP-LCNet_x1_5, PP-LCNet_x2_0, PP-LCNet_x2_5, PP-LCNetV2_small, PP-LCNetV2_base, PP-LCNetV2_large, MobileNetV3_large_x0_35, MobileNetV3_large_x0_5, MobileNetV3_large_x0_75, MobileNetV3_large_x1_0, MobileNetV3_large_x1_25, MobileNetV3_small_x0_35, MobileNetV3_small_x0_5, MobileNetV3_small_x0_75, MobileNetV3_small_x1_0, MobileNetV3_small_x1_25, ConvNeXt_tiny, ConvNeXt_small, ConvNeXt_base_224, ConvNeXt_base_384, ConvNeXt_large_224, ConvNeXt_large_384, MobileNetV1_x0_25, MobileNetV1_x0_5, MobileNetV1_x0_75, MobileNetV1_x1_0, MobileNetV2_x0_25, MobileNetV2_x0_5, MobileNetV2_x1_0, MobileNetV2_x1_5, MobileNetV2_x2_0, SwinTransformer_tiny_patch4_window7_224, SwinTransformer_small_patch4_window7_224, SwinTransformer_base_patch4_window7_224, SwinTransformer_base_patch4_window12_384, SwinTransformer_large_patch4_window7_224, SwinTransformer_large_patch4_window12_384, PP-HGNet_small, PP-HGNet_tiny, PP-HGNet_base, PP-HGNetV2-B0, PP-HGNetV2-B1, PP-HGNetV2-B2, PP-HGNetV2-B3, PP-HGNetV2-B4, PP-HGNetV2-B5, PP-HGNetV2-B6, CLIP_vit_base_patch16_224, CLIP_vit_large_patch14_224 |
| General Object Detection | Object Detection | PP-YOLOE_plus-S, PP-YOLOE_plus-M, PP-YOLOE_plus-L, PP-YOLOE_plus-X, YOLOX-N, YOLOX-T, YOLOX-S, YOLOX-M, YOLOX-L, YOLOX-X, YOLOv3-DarkNet53, YOLOv3-ResNet50_vd_DCN, YOLOv3-MobileNetV3, RT-DETR-R18, RT-DETR-R50, RT-DETR-L, RT-DETR-H, RT-DETR-X, PicoDet-S, PicoDet-L |
| General Semantic Segmentation | Semantic Segmentation | Deeplabv3-R50, Deeplabv3-R101, Deeplabv3_Plus-R50, Deeplabv3_Plus-R101, PP-LiteSeg-T, OCRNet_HRNet-W48, OCRNet_HRNet-W18, SeaFormer_tiny, SeaFormer_small, SeaFormer_base, SeaFormer_large, SegFormer-B0, SegFormer-B1, SegFormer-B2, SegFormer-B3, SegFormer-B4, SegFormer-B5 |
| General Instance Segmentation | Instance Segmentation | Mask-RT-DETR-L, Mask-RT-DETR-H |
| General OCR | Text Detection | PP-OCRv4_server_det, PP-OCRv4_mobile_det |
| General OCR | Text Recognition | PP-OCRv4_server_rec, PP-OCRv4_mobile_rec, LaTeX_OCR_rec, ch_RepSVTR_rec, ch_SVTRv2_rec |
| General Table Recognition | Layout Detection | PicoDet_layout_1x |
| General Table Recognition | Table Recognition | SLANet, SLANet_plus |
| General Table Recognition | Text Detection | PP-OCRv4_server_det, PP-OCRv4_mobile_det |
| General Table Recognition | Text Recognition | PP-OCRv4_server_rec, PP-OCRv4_mobile_rec |
| Document Scene Information Extraction v3 | Table Recognition | SLANet, SLANet_plus |
| Document Scene Information Extraction v3 | Layout Detection | PicoDet_layout_1x |
| Document Scene Information Extraction v3 | Text Detection | PP-OCRv4_server_det, PP-OCRv4_mobile_det |
| Document Scene Information Extraction v3 | Text Recognition | PP-OCRv4_server_rec, PP-OCRv4_mobile_rec, ch_RepSVTR_rec, ch_SVTRv2_rec |
| Document Scene Information Extraction v3 | Seal Text Detection | PP-OCRv4_server_seal_det, PP-OCRv4_mobile_seal_det |
| Document Scene Information Extraction v3 | Text Image Rectification | UVDoc |
| Document Scene Information Extraction v3 | Document Image Orientation Classification | PP-LCNet_x1_0_doc_ori |