---
comments: true
---

# PaddleX Model List (CPU/GPU)

PaddleX includes multiple pipelines, each containing several modules, and each module includes several models. You can choose which models to use based on the benchmark data below. If you prioritize model accuracy, choose models with higher accuracy. If you prioritize model inference speed, choose models with faster inference speed. If you prioritize model storage size, choose models with smaller storage size.
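
Every model in the tables below can be loaded by name through the PaddleX Python API, so each table doubles as a lookup for the `model_name` argument; the yaml File column is the model's training/evaluation configuration, and the [Normal Mode / High-Performance Mode] latency columns correspond to inference without and with PaddleX's high-performance inference plugin. The following is a minimal inference sketch, assuming PaddleX 3.x is installed; `PP-LCNet_x1_0` and `demo.jpg` are placeholder choices, not requirements.

```python
# Minimal sketch: load a model by the name shown in the tables and run it.
# Assumes PaddleX 3.x; "PP-LCNet_x1_0" and "demo.jpg" are placeholders --
# substitute any model name from the tables and your own input image.
from paddlex import create_model

model = create_model(model_name="PP-LCNet_x1_0")

# predict() yields one result object per input image.
for res in model.predict("demo.jpg", batch_size=1):
    res.print()                            # print the prediction
    res.save_to_img("./output/")           # save a visualization
    res.save_to_json("./output/res.json")  # save the raw result
```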

## Image Classification Module

| Model Name | Top1 Acc (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_224 | 85.36 | 12.03 / 2.49 | 60.86 / 42.69 | 331 | CLIP_vit_base_patch16_224.yaml | Inference Model/Training Model |
| CLIP_vit_large_patch14_224 | 88.1 | 49.15 / 9.75 | 223.16 / 206.49 | 1040 | CLIP_vit_large_patch14_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_224 | 83.84 | 11.37 / 5.65 | 143.98 / 52.31 | 313.9 | ConvNeXt_base_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_384 | 84.90 | 29.48 / 11.17 | 293.76 / 134.27 | 313.9 | ConvNeXt_base_384.yaml | Inference Model/Training Model |
| ConvNeXt_large_224 | 84.26 | 22.99 / 12.73 | 220.79 / 113.24 | 700.7 | ConvNeXt_large_224.yaml | Inference Model/Training Model |
| ConvNeXt_large_384 | 85.27 | 58.90 / 24.63 | 509.48 / 260.27 | 700.7 | ConvNeXt_large_384.yaml | Inference Model/Training Model |
| ConvNeXt_small | 83.13 | 7.72 / 4.35 | 95.92 / 33.34 | 178.0 | ConvNeXt_small.yaml | Inference Model/Training Model |
| ConvNeXt_tiny | 82.03 | 6.00 / 2.47 | 63.59 / 18.23 | 101.4 | ConvNeXt_tiny.yaml | Inference Model/Training Model |
| FasterNet-L | 83.5 | 11.96 / 2.68 | 51.93 / 35.33 | 357.1 | FasterNet-L.yaml | Inference Model/Training Model |
| FasterNet-M | 83.0 | 11.17 / 2.16 | 38.49 / 21.17 | 204.6 | FasterNet-M.yaml | Inference Model/Training Model |
| FasterNet-S | 81.3 | 7.70 / 1.24 | 19.51 / 11.22 | 119.3 | FasterNet-S.yaml | Inference Model/Training Model |
| FasterNet-T0 | 71.9 | 4.73 / 0.82 | 6.40 / 1.96 | 15.1 | FasterNet-T0.yaml | Inference Model/Training Model |
| FasterNet-T1 | 75.9 | 4.80 / 0.80 | 8.14 / 3.13 | 29.2 | FasterNet-T1.yaml | Inference Model/Training Model |
| FasterNet-T2 | 79.1 | 6.10 / 0.88 | 12.71 / 5.35 | 57.4 | FasterNet-T2.yaml | Inference Model/Training Model |
| MobileNetV1_x0_5 | 63.5 | 1.98 / 0.51 | 2.50 / 1.04 | 4.8 | MobileNetV1_x0_5.yaml | Inference Model/Training Model |
| MobileNetV1_x0_25 | 51.4 | 1.99 / 0.45 | 1.82 / 0.73 | 1.8 | MobileNetV1_x0_25.yaml | Inference Model/Training Model |
| MobileNetV1_x0_75 | 68.8 | 2.33 / 0.41 | 3.33 / 1.34 | 9.3 | MobileNetV1_x0_75.yaml | Inference Model/Training Model |
| MobileNetV1_x1_0 | 71.0 | 2.31 / 0.45 | 3.91 / 1.89 | 15.2 | MobileNetV1_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x0_5 | 65.0 | 3.58 / 0.62 | 3.86 / 1.23 | 7.1 | MobileNetV2_x0_5.yaml | Inference Model/Training Model |
| MobileNetV2_x0_25 | 53.2 | 3.05 / 0.66 | 3.30 / 0.98 | 5.5 | MobileNetV2_x0_25.yaml | Inference Model/Training Model |
| MobileNetV2_x1_0 | 72.2 | 3.85 / 0.63 | 5.50 / 1.87 | 12.6 | MobileNetV2_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x1_5 | 74.1 | 3.93 / 0.73 | 8.84 / 3.12 | 25.0 | MobileNetV2_x1_5.yaml | Inference Model/Training Model |
| MobileNetV2_x2_0 | 75.2 | 3.89 / 0.79 | 10.36 / 4.50 | 41.2 | MobileNetV2_x2_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_5 | 69.2 | 4.60 / 0.77 | 5.32 / 1.58 | 9.6 | MobileNetV3_large_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_35 | 64.3 | 4.44 / 0.75 | 5.20 / 1.50 | 7.5 | MobileNetV3_large_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_75 | 73.1 | 5.30 / 0.85 | 6.02 / 1.93 | 14.0 | MobileNetV3_large_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_0 | 75.3 | 5.38 / 0.81 | 7.16 / 2.19 | 19.5 | MobileNetV3_large_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_25 | 76.4 | 5.54 / 0.84 | 7.06 / 2.84 | 26.5 | MobileNetV3_large_x1_25.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_5 | 59.2 | 3.87 / 0.77 | 4.90 / 1.32 | 6.8 | MobileNetV3_small_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_35 | 53.0 | 3.68 / 0.77 | 3.94 / 1.27 | 6.0 | MobileNetV3_small_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_75 | 66.0 | 3.92 / 0.77 | 4.68 / 1.39 | 8.5 | MobileNetV3_small_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_0 | 68.2 | 4.23 / 0.78 | 5.24 / 1.48 | 10.5 | MobileNetV3_small_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_25 | 70.7 | 4.59 / 0.79 | 5.36 / 1.63 | 13.0 | MobileNetV3_small_x1_25.yaml | Inference Model/Training Model |
| MobileNetV4_conv_large | 83.4 | 9.04 / 2.28 | 34.34 / 22.01 | 125.2 | MobileNetV4_conv_large.yaml | Inference Model/Training Model |
| MobileNetV4_conv_medium | 79.9 | 5.70 / 1.05 | 13.78 / 5.64 | 37.6 | MobileNetV4_conv_medium.yaml | Inference Model/Training Model |
| MobileNetV4_conv_small | 74.6 | 3.81 / 0.55 | 5.24 / 1.50 | 14.7 | MobileNetV4_conv_small.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_large | 83.8 | 13.43 / 4.28 | 61.16 / 31.06 | 145.1 | MobileNetV4_hybrid_large.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_medium | 80.5 | 11.82 / 1.30 | 22.01 / 6.06 | 42.9 | MobileNetV4_hybrid_medium.yaml | Inference Model/Training Model |
| PP-HGNet_base | 85.0 | 13.43 / 3.81 | 71.24 / 51.48 | 249.4 | PP-HGNet_base.yaml | Inference Model/Training Model |
| PP-HGNet_small | 81.51 | 5.87 / 1.68 | 25.58 / 18.50 | 86.5 | PP-HGNet_small.yaml | Inference Model/Training Model |
| PP-HGNet_tiny | 79.83 | 5.84 / 1.38 | 17.03 / 10.58 | 52.4 | PP-HGNet_tiny.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0 | 77.77 | 4.41 / 0.87 | 10.58 / 1.87 | 21.4 | PP-HGNetV2-B0.yaml | Inference Model/Training Model |
| PP-HGNetV2-B1 | 79.18 | 4.52 / 0.73 | 11.98 / 2.28 | 22.6 | PP-HGNetV2-B1.yaml | Inference Model/Training Model |
| PP-HGNetV2-B2 | 81.74 | 6.67 / 0.96 | 14.22 / 4.04 | 39.9 | PP-HGNetV2-B2.yaml | Inference Model/Training Model |
| PP-HGNetV2-B3 | 82.98 | 7.47 / 1.94 | 17.73 / 5.63 | 57.9 | PP-HGNetV2-B3.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4 | 83.57 | 7.05 / 1.16 | 16.23 / 7.55 | 70.4 | PP-HGNetV2-B4.yaml | Inference Model/Training Model |
| PP-HGNetV2-B5 | 84.75 | 10.38 / 1.95 | 31.53 / 18.02 | 140.8 | PP-HGNetV2-B5.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6 | 86.30 | 13.86 / 3.28 | 67.25 / 56.70 | 268.4 | PP-HGNetV2-B6.yaml | Inference Model/Training Model |
| PP-LCNet_x0_5 | 63.14 | 2.41 / 0.60 | 2.54 / 0.90 | 6.7 | PP-LCNet_x0_5.yaml | Inference Model/Training Model |
| PP-LCNet_x0_25 | 51.86 | 2.16 / 0.60 | 2.73 / 0.77 | 5.5 | PP-LCNet_x0_25.yaml | Inference Model/Training Model |
| PP-LCNet_x0_35 | 58.09 | 2.18 / 0.60 | 2.32 / 0.89 | 5.9 | PP-LCNet_x0_35.yaml | Inference Model/Training Model |
| PP-LCNet_x0_75 | 68.18 | 2.61 / 0.58 | 3.00 / 1.09 | 8.4 | PP-LCNet_x0_75.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0 | 71.32 | 2.59 / 0.68 | 3.18 / 1.19 | 10.5 | PP-LCNet_x1_0.yaml | Inference Model/Training Model |
| PP-LCNet_x1_5 | 73.71 | 2.60 / 0.68 | 3.98 / 1.66 | 16.0 | PP-LCNet_x1_5.yaml | Inference Model/Training Model |
| PP-LCNet_x2_0 | 75.18 | 2.53 / 0.68 | 5.21 / 2.24 | 23.2 | PP-LCNet_x2_0.yaml | Inference Model/Training Model |
| PP-LCNet_x2_5 | 76.60 | 2.76 / 0.67 | 6.78 / 3.20 | 32.1 | PP-LCNet_x2_5.yaml | Inference Model/Training Model |
| PP-LCNetV2_base | 77.05 | 4.04 / 0.62 | 6.80 / 2.67 | 23.7 | PP-LCNetV2_base.yaml | Inference Model/Training Model |
| PP-LCNetV2_large | 78.51 | 4.91 / 0.85 | 10.30 / 5.38 | 37.3 | PP-LCNetV2_large.yaml | Inference Model/Training Model |
| PP-LCNetV2_small | 73.97 | 3.07 / 0.60 | 4.28 / 1.58 | 14.6 | PP-LCNetV2_small.yaml | Inference Model/Training Model |
| ResNet18_vd | 72.3 | 2.87 / 0.77 | 7.91 / 4.64 | 41.5 | ResNet18_vd.yaml | Inference Model/Training Model |
| ResNet18 | 71.0 | 2.63 / 0.74 | 6.30 / 4.16 | 41.5 | ResNet18.yaml | Inference Model/Training Model |
| ResNet34_vd | 76.0 | 4.47 / 1.09 | 14.30 / 8.33 | 77.3 | ResNet34_vd.yaml | Inference Model/Training Model |
| ResNet34 | 74.6 | 4.20 / 1.07 | 12.53 / 7.83 | 77.3 | ResNet34.yaml | Inference Model/Training Model |
| ResNet50_vd | 79.1 | 6.66 / 1.23 | 16.34 / 10.00 | 90.8 | ResNet50_vd.yaml | Inference Model/Training Model |
| ResNet50 | 76.5 | 6.25 / 1.17 | 15.93 / 9.72 | 90.8 | ResNet50.yaml | Inference Model/Training Model |
| ResNet101_vd | 80.2 | 11.93 / 2.07 | 32.47 / 23.62 | 158.4 | ResNet101_vd.yaml | Inference Model/Training Model |
| ResNet101 | 77.6 | 13.73 / 2.06 | 29.69 / 17.72 | 158.7 | ResNet101.yaml | Inference Model/Training Model |
| ResNet152_vd | 80.6 | 20.70 / 2.82 | 43.90 / 27.91 | 214.3 | ResNet152_vd.yaml | Inference Model/Training Model |
| ResNet152 | 78.3 | 17.86 / 2.79 | 46.19 / 26.00 | 214.2 | ResNet152.yaml | Inference Model/Training Model |
| ResNet200_vd | 80.9 | 22.55 / 3.54 | 58.54 / 35.70 | 266.0 | ResNet200_vd.yaml | Inference Model/Training Model |
| StarNet-S1 | 73.6 | 6.24 / 0.96 | 8.78 / 2.44 | 11.2 | StarNet-S1.yaml | Inference Model/Training Model |
| StarNet-S2 | 74.8 | 4.78 / 0.85 | 7.24 / 2.48 | 14.3 | StarNet-S2.yaml | Inference Model/Training Model |
| StarNet-S3 | 77.0 | 6.77 / 1.07 | 9.69 / 3.35 | 22.2 | StarNet-S3.yaml | Inference Model/Training Model |
| StarNet-S4 | 79.0 | 9.01 / 1.48 | 14.79 / 4.58 | 28.9 | StarNet-S4.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window7_224 | 83.37 | 13.04 / 10.77 | 133.79 / 118.45 | 340 | SwinTransformer_base_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window12_384 | 84.17 | 33.99 / 28.42 | 400.19 / 317.36 | 311.4 | SwinTransformer_base_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window7_224 | 86.19 | 23.69 / 6.18 | 198.60 / 177.18 | 694.8 | SwinTransformer_large_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window12_384 | 87.06 | 68.07 / 14.84 | 609.07 / 525.72 | 696.1 | SwinTransformer_large_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_small_patch4_window7_224 | 83.21 | 12.17 / 3.51 | 111.03 / 92.51 | 175.6 | SwinTransformer_small_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_tiny_patch4_window7_224 | 81.10 | 7.11 / 2.01 | 62.72 / 47.35 | 100.1 | SwinTransformer_tiny_patch4_window7_224.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [ImageNet-1k](https://www.image-net.org/index.php) validation set Top1 Acc.

## [Image Multi-label Classification Module](../module_usage/tutorials/cv_modules/image_multilabel_classification.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_448_ML | 89.15 | 48.87 / 8.10 | 275.33 / 188.48 | 325.6 | CLIP_vit_base_patch16_448_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0_ML | 80.98 | 7.15 / 1.77 | 21.35 / 8.19 | 39.6 | PP-HGNetV2-B0_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4_ML | 87.96 | 8.11 / 2.82 | 44.76 / 29.38 | 88.5 | PP-HGNetV2-B4_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6_ML | 91.06 | 34.54 / 8.22 | 189.17 / 189.17 | 286.5 | PP-HGNetV2-B6_ML.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0_ML | 77.96 | 5.28 / 1.62 | 13.16 / 5.61 | 29.4 | PP-LCNet_x1_0_ML.yaml | Inference Model/Training Model |
| ResNet50_ML | 83.42 | 10.54 / 2.97 | 55.39 / 35.52 | 108.9 | ResNet50_ML.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are for the multi-label classification task mAP on [COCO2017](https://cocodataset.org/#home).

## [Pedestrian Attribute Module](../module_usage/tutorials/cv_modules/pedestrian_attribute_recognition.en.md)

| Model Name | mA (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_pedestrian_attribute | 92.2 | 2.52 / 0.66 | 2.60 / 1.07 | 6.7 | PP-LCNet_x1_0_pedestrian_attribute.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are mA on an internal PaddleX dataset.

## [Vehicle Attribute Module](../module_usage/tutorials/cv_modules/vehicle_attribute_recognition.en.md)

| Model Name | mA (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_vehicle_attribute | 91.7 | 2.53 / 0.67 | 2.73 / 1.10 | 6.7 | PP-LCNet_x1_0_vehicle_attribute.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the VeRi dataset mA.

## [Image Feature Module](../module_usage/tutorials/cv_modules/image_feature.en.md)

| Model Name | recall@1 (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_rec | 84.2 | 3.91 / 1.06 | 6.82 / 2.89 | 16.3 | PP-ShiTuV2_rec.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 12.57 / 11.62 | 67.09 / 67.09 | 306.6 | PP-ShiTuV2_rec_CLIP_vit_base.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 49.85 / 49.85 | 229.14 / 229.14 | 1050 | PP-ShiTuV2_rec_CLIP_vit_large.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the AliProducts recall@1.

## [Face Feature Module](../module_usage/tutorials/cv_modules/face_feature.en.md)

| Model Name | Output Feature Dimension | Acc (%)<br>AgeDB-30/CFP-FP/LFW | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| MobileFaceNet | 128 | 96.28/96.71/99.58 | 3.31 / 0.73 | 5.93 / 1.30 | 4.1 | MobileFaceNet.yaml | Inference Model/Training Model |
| ResNet50_face | 512 | 98.12/98.56/99.77 | 6.12 / 3.11 | 15.85 / 9.44 | 87.2 | ResNet50_face.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured on the AgeDB-30, CFP-FP, and LFW datasets.

## [Main Body Detection Module](../module_usage/tutorials/cv_modules/mainbody_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_det | 41.5 | 11.81 / 4.53 | 43.03 / 25.31 | 27.54 | PP-ShiTuV2_det.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [PaddleClas Main Body Detection Dataset](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md) mAP(0.5:0.95).

## [Object Detection Module](../module_usage/tutorials/cv_modules/object_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Cascade-FasterRCNN-ResNet50-FPN | 41.1 | 120.28 / 120.28 | - / 6514.61 | 245.4 | Cascade-FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN | 45.0 | 124.10 / 124.10 | - / 6709.52 | 246.2 | Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| CenterNet-DLA-34 | 37.6 | 67.19 / 67.19 | 6622.61 / 6622.61 | 75.4 | CenterNet-DLA-34.yaml | Inference Model/Training Model |
| CenterNet-ResNet50 | 38.9 | 216.06 / 216.06 | 2545.79 / 2545.79 | 319.7 | CenterNet-ResNet50.yaml | Inference Model/Training Model |
| DETR-R50 | 42.3 | 58.80 / 26.90 | 370.96 / 208.77 | 159.3 | DETR-R50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet34-FPN | 37.8 | 76.90 / 76.90 | - / 4136.79 | 137.5 | FasterRCNN-ResNet34-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-FPN | 38.4 | 95.48 / 95.48 | - / 3693.90 | 148.1 | FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-FPN | 39.5 | 98.03 / 98.03 | - / 4278.36 | 148.1 | FasterRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-SSLDv2-FPN | 41.4 | 99.23 / 99.23 | - / 4415.68 | 148.1 | FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50 | 36.7 | 129.10 / 129.10 | - / 3868.44 | 120.2 | FasterRCNN-ResNet50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101-FPN | 41.4 | 131.48 / 131.48 | - / 4380.00 | 216.3 | FasterRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101 | 39.0 | 216.71 / 216.71 | - / 5376.45 | 188.1 | FasterRCNN-ResNet101.yaml | Inference Model/Training Model |
| FasterRCNN-ResNeXt101-vd-FPN | 43.4 | 234.38 / 234.38 | - / 6154.61 | 360.6 | FasterRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-Swin-Tiny-FPN | 42.6 | 65.92 / 65.92 | - / 2468.98 | 159.8 | FasterRCNN-Swin-Tiny-FPN.yaml | Inference Model/Training Model |
| FCOS-ResNet50 | 39.6 | 101.02 / 34.42 | 752.15 / 752.15 | 124.2 | FCOS-ResNet50.yaml | Inference Model/Training Model |
| PicoDet-L | 42.6 | 14.31 / 11.06 | 45.95 / 25.06 | 20.9 | PicoDet-L.yaml | Inference Model/Training Model |
| PicoDet-M | 37.5 | 10.48 / 5.00 | 22.88 / 9.03 | 16.8 | PicoDet-M.yaml | Inference Model/Training Model |
| PicoDet-S | 29.1 | 9.15 / 3.26 | 16.06 / 4.04 | 4.4 | PicoDet-S.yaml | Inference Model/Training Model |
| PicoDet-XS | 26.2 | 9.54 / 3.52 | 17.96 / 5.38 | 5.7 | PicoDet-XS.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-L | 52.9 | 32.06 / 28.00 | 185.32 / 116.21 | 185.3 | PP-YOLOE_plus-L.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-M | 49.8 | 18.37 / 15.04 | 108.77 / 63.48 | 83.2 | PP-YOLOE_plus-M.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S | 43.7 | 11.43 / 7.52 | 60.16 / 26.94 | 28.3 | PP-YOLOE_plus-S.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-X | 54.7 | 56.28 / 50.60 | 292.08 / 212.24 | 349.4 | PP-YOLOE_plus-X.yaml | Inference Model/Training Model |
| RT-DETR-H | 56.3 | 114.57 / 101.56 | 938.20 / 938.20 | 435.8 | RT-DETR-H.yaml | Inference Model/Training Model |
| RT-DETR-L | 53.0 | 34.76 / 27.60 | 495.39 / 247.68 | 113.7 | RT-DETR-L.yaml | Inference Model/Training Model |
| RT-DETR-R18 | 46.5 | 19.11 / 14.82 | 263.13 / 143.05 | 70.7 | RT-DETR-R18.yaml | Inference Model/Training Model |
| RT-DETR-R50 | 53.1 | 41.11 / 10.12 | 536.20 / 482.86 | 149.1 | RT-DETR-R50.yaml | Inference Model/Training Model |
| RT-DETR-X | 54.8 | 61.91 / 51.41 | 639.79 / 639.79 | 232.9 | RT-DETR-X.yaml | Inference Model/Training Model |
| YOLOv3-DarkNet53 | 39.1 | 39.62 / 35.54 | 166.57 / 136.34 | 219.7 | YOLOv3-DarkNet53.yaml | Inference Model/Training Model |
| YOLOv3-MobileNetV3 | 31.4 | 16.54 / 6.21 | 64.37 / 45.55 | 83.8 | YOLOv3-MobileNetV3.yaml | Inference Model/Training Model |
| YOLOv3-ResNet50_vd_DCN | 40.6 | 31.64 / 26.72 | 226.75 / 226.75 | 163.0 | YOLOv3-ResNet50_vd_DCN.yaml | Inference Model/Training Model |
| YOLOX-L | 50.1 | 49.68 / 45.03 | 232.52 / 156.24 | 192.5 | YOLOX-L.yaml | Inference Model/Training Model |
| YOLOX-M | 46.9 | 43.46 / 29.52 | 147.64 / 80.06 | 90.0 | YOLOX-M.yaml | Inference Model/Training Model |
| YOLOX-N | 26.1 | 42.94 / 17.79 | 64.15 / 7.19 | 3.4 | YOLOX-N.yaml | Inference Model/Training Model |
| YOLOX-S | 40.4 | 46.53 / 29.34 | 98.37 / 35.02 | 32.0 | YOLOX-S.yaml | Inference Model/Training Model |
| YOLOX-T | 32.9 | 31.81 / 18.91 | 55.34 / 11.63 | 18.1 | YOLOX-T.yaml | Inference Model/Training Model |
| YOLOX-X | 51.8 | 84.06 / 77.28 | 390.38 / 272.88 | 351.5 | YOLOX-X.yaml | Inference Model/Training Model |
| Co-Deformable-DETR-R50 | 49.7 | 259.62 / 259.62 | 32413.76 / 32413.76 | 184 | Co-Deformable-DETR-R50.yaml | Inference Model/Training Model |
| Co-Deformable-DETR-Swin-T | 48.0 (@640x640 input shape) | 120.17 / 120.17 | - / 15620.29 | 187 | Co-Deformable-DETR-Swin-T.yaml | Inference Model/Training Model |
| Co-DINO-R50 | 52.0 | 1123.23 / 1123.23 | - / - | 186 | Co-DINO-R50.yaml | Inference Model/Training Model |
| Co-DINO-Swin-L | 55.9 (@640x640 input shape) | - / - | - / - | 840 | Co-DINO-Swin-L.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the COCO2017 validation set mAP(0.5:0.95).

## [Small Object Detection Module](../module_usage/tutorials/cv_modules/small_object_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE_plus_SOD-S | 25.1 | 116.07 / 20.10 | 176.44 / 40.21 | 77.3 | PP-YOLOE_plus_SOD-S.yaml | Inference Model/Training Model |
| PP-YOLOE_plus_SOD-L | 31.9 | 100.02 / 48.33 | 271.29 / 151.20 | 325.0 | PP-YOLOE_plus_SOD-L.yaml | Inference Model/Training Model |
| PP-YOLOE_plus_SOD-largesize-L | 42.7 | 515.69 / 460.17 | 2816.08 / 1736.00 | 340.5 | PP-YOLOE_plus_SOD-largesize-L.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of VisDrone-DET.

## Open-Vocabulary Object Detection

| Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| GroundingDINO-T | 49.4 | 64.4 | - / - | - / - | 658.3 | GroundingDINO-T.yaml | Inference Model |
| YOLO-Worldv2-L | 44.4 | 59.8 | - / - | 292.14 / 292.14 | 421.4 | YOLO-Worldv2-L.yaml | Inference Model |

Note: The above accuracy metrics are based on the COCO val2017 validation set mAP(0.5:0.95).

## [Open Vocabulary Segmentation](../module_usage/tutorials/cv_modules/open_vocabulary_segmentation.en.md)

| Model | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|
| SAM-H_box | - / - | - / - | 2433.7 | SAM-H_box.yaml | Inference Model |
| SAM-H_point | - / - | - / - | 2433.7 | SAM-H_point.yaml | Inference Model |

## Rotated Object Detection

| Model | mAP (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-R-L | 78.14 | 67.50 / 61.15 | 414.79 / 414.79 | 211.0 | PP-YOLOE-R-L.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the DOTA validation set mAP(0.5:0.95).

## [Pedestrian Detection Module](../module_usage/tutorials/cv_modules/human_detection.en.md)
| Model Name | mAP (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_human | 48.0 | 30.59 / 26.64 | 180.05 / 112.70 | 196.1 | PP-YOLOE-L_human.yaml | Inference Model/Training Model |
| PP-YOLOE-S_human | 42.5 | 10.26 / 6.66 | 54.01 / 23.48 | 28.8 | PP-YOLOE-S_human.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of [CrowdHuman](https://bj.bcebos.com/v1/paddledet/data/crowdhuman.zip).

## [Vehicle Detection Module](../module_usage/tutorials/cv_modules/vehicle_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_vehicle | 63.9 | 30.30 / 26.27 | 169.28 / 111.88 | 196.1 | PP-YOLOE-L_vehicle.yaml | Inference Model/Training Model |
| PP-YOLOE-S_vehicle | 61.3 | 10.54 / 6.69 | 52.73 / 23.58 | 28.8 | PP-YOLOE-S_vehicle.yaml | Inference Model/Training Model |

Note: The above precision metrics are based on the validation set mAP(0.5:0.95) of [PPVehicle](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/modules/ppvehicle).

## [Face Detection Module](../module_usage/tutorials/cv_modules/face_detection.en.md)

| Model Name | AP (%)<br>Easy/Medium/Hard | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| BlazeFace | 77.7/73.4/49.5 | 50.90 / 45.74 | 71.92 / 71.92 | 0.447 | BlazeFace.yaml | Inference Model/Training Model |
| BlazeFace-FPN-SSH | 83.2/80.5/60.5 | 58.99 / 51.75 | 87.39 / 87.39 | 0.606 | BlazeFace-FPN-SSH.yaml | Inference Model/Training Model |
| PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | 33.91 / 26.53 | 153.56 / 79.21 | 28.9 | PicoDet_LCNet_x2_5_face.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | 21.28 / 11.09 | 137.26 / 72.09 | 26.5 | PP-YOLOE_plus-S_face.yaml | Inference Model/Training Model |

Note: The above precision metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640.

## Anomaly Detection Module

| Model Name | mIoU | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| STFPM | 0.9901 | 4.94 / 1.63 | 34.88 / 34.88 | 22.5 | STFPM.yaml | Inference Model/Training Model |

Note: The above precision metrics are the average anomaly scores on the validation set of [MVTec AD](https://www.mvtec.com/company/research/datasets/mvtec-ad).

## [Human Keypoint Detection Module](../module_usage/tutorials/cv_modules/human_keypoint_detection.en.md)

| Model | Scheme | Input Size | AP(0.5:0.95) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|---|
| PP-TinyPose_128x96 | Top-Down | 128x96 | 58.4 | 24.22 / 4.34 | - / 6.19 | 4.9 | PP-TinyPose_128x96.yaml | Inference Model/Training Model |
| PP-TinyPose_256x192 | Top-Down | 256x192 | 68.3 | 21.73 / 3.59 | - / 10.18 | 4.9 | PP-TinyPose_256x192.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the COCO dataset AP(0.5:0.95), with detection boxes obtained from ground truth annotations.

## 3D Multi-modal Fusion Detection Module

| Model | mAP (%) | NDS | yaml File | Model Download Link |
|---|---|---|---|---|
| BEVFusion | 53.9 | 60.9 | BEVFusion.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the nuScenes validation set, reporting mAP(0.5:0.95) and NDS; the precision type is FP32.

## [Semantic Segmentation Module](../module_usage/tutorials/cv_modules/semantic_segmentation.en.md)
| Model Name | mIoU (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Deeplabv3_Plus-R50 | 80.36 | 481.33 / 446.18 | 2952.95 / 1907.07 | 94.9 | Deeplabv3_Plus-R50.yaml | Inference Model/Training Model |
| Deeplabv3_Plus-R101 | 81.10 | 766.70 / 194.42 | 4441.56 / 2984.19 | 162.5 | Deeplabv3_Plus-R101.yaml | Inference Model/Training Model |
| Deeplabv3-R50 | 79.90 | 681.65 / 602.10 | 3786.41 / 3093.10 | 138.3 | Deeplabv3-R50.yaml | Inference Model/Training Model |
| Deeplabv3-R101 | 80.85 | 974.62 / 896.99 | 5222.60 / 4230.79 | 205.9 | Deeplabv3-R101.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W18 | 80.67 | 271.02 / 221.38 | 1791.52 / 1061.62 | 43.1 | OCRNet_HRNet-W18.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W48 | 82.15 | 582.92 / 536.28 | 3513.72 / 2543.10 | 270 | OCRNet_HRNet-W48.yaml | Inference Model/Training Model |
| PP-LiteSeg-T | 73.10 | 28.12 / 23.84 | 398.31 / 398.31 | 28.5 | PP-LiteSeg-T.yaml | Inference Model/Training Model |
| PP-LiteSeg-B | 75.25 | 35.69 / 35.69 | 485.10 / 485.10 | 47.0 | PP-LiteSeg-B.yaml | Inference Model/Training Model |
| SegFormer-B0 (slice) | 76.73 | 11.1946 | 268.929 | 13.2 | SegFormer-B0.yaml | Inference Model/Training Model |
| SegFormer-B1 (slice) | 78.35 | 17.9998 | 403.393 | 48.5 | SegFormer-B1.yaml | Inference Model/Training Model |
| SegFormer-B2 (slice) | 81.60 | 48.0371 | 1248.52 | 96.9 | SegFormer-B2.yaml | Inference Model/Training Model |
| SegFormer-B3 (slice) | 82.47 | 64.341 | 1666.35 | 167.3 | SegFormer-B3.yaml | Inference Model/Training Model |
| SegFormer-B4 (slice) | 82.38 | 82.4336 | 1995.42 | 226.7 | SegFormer-B4.yaml | Inference Model/Training Model |
| SegFormer-B5 (slice) | 82.58 | 97.3717 | 2420.19 | 229.7 | SegFormer-B5.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [Cityscapes](https://www.cityscapes-dataset.com/) dataset mIoU.

| Model Name | mIoU (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SeaFormer_base (slice) | 40.92 | 24.4073 | 397.574 | 30.8 | SeaFormer_base.yaml | Inference Model/Training Model |
| SeaFormer_large (slice) | 43.66 | 27.8123 | 550.464 | 49.8 | SeaFormer_large.yaml | Inference Model/Training Model |
| SeaFormer_small (slice) | 38.73 | 19.2295 | 358.343 | 14.3 | SeaFormer_small.yaml | Inference Model/Training Model |
| SeaFormer_tiny (slice) | 34.58 | 13.9496 | 330.132 | 6.1 | SeaFormer_tiny.yaml | Inference Model/Training Model |
| MaskFormer_small | 49.70 | 65.21 / 65.21 | - / 629.85 | 242.5 | MaskFormer_small.yaml | Inference Model/Training Model |
| MaskFormer_tiny | 46.69 | 47.95 / 47.95 | - / 492.67 | 160.5 | MaskFormer_tiny.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [ADE20k](https://groups.csail.mit.edu/vision/datasets/ADE20K/) dataset. "Slice" indicates that the input images have been cropped.

## [Instance Segmentation Module](../module_usage/tutorials/cv_modules/instance_segmentation.en.md)

| Model Name | Mask AP | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Mask-RT-DETR-H | 50.6 | 180.83 / 180.83 | 1711.24 / 1711.24 | 449.9 | Mask-RT-DETR-H.yaml | Inference Model/Training Model |
| Mask-RT-DETR-L | 45.7 | 113.20 / 113.20 | 1179.56 / 1179.56 | 113.6 | Mask-RT-DETR-L.yaml | Inference Model/Training Model |
| Mask-RT-DETR-M | 42.7 | 87.08 / 87.08 | - / 2090.73 | 66.6 | Mask-RT-DETR-M.yaml | Inference Model/Training Model |
| Mask-RT-DETR-S | 41.0 | 120.86 / 120.86 | - / 2163.07 | 51.8 | Mask-RT-DETR-S.yaml | Inference Model/Training Model |
| Mask-RT-DETR-X | 47.5 | 141.43 / 141.43 | 1379.14 / 1379.14 | 237.5 | Mask-RT-DETR-X.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-FPN | 36.3 | 136.79 / 136.79 | - / 5935.41 | 254.8 | Cascade-MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN | 39.1 | 137.40 / 137.40 | - / 6816.68 | 254.7 | Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-FPN | 35.6 | 112.79 / 112.79 | - / 4912.37 | 157.5 | MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-vd-FPN | 36.4 | 112.88 / 112.88 | - / 5204.97 | 157.5 | MaskRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50 | 32.8 | 181.60 / 181.60 | - / 5523.45 | 127.8 | MaskRCNN-ResNet50.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-FPN | 36.6 | 138.84 / 138.84 | - / 5107.74 | 225.4 | MaskRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-vd-FPN | 38.1 | 141.73 / 141.73 | - / 5592.76 | 225.1 | MaskRCNN-ResNet101-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNeXt101-vd-FPN | 39.5 | 220.83 / 220.83 | - / 5932.59 | 370.0 | MaskRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| PP-YOLOE_seg-S | 32.5 | 243.41 / 222.30 | 2507.70 / 1282.35 | 31.5 | PP-YOLOE_seg-S.yaml | Inference Model/Training Model |
| SOLOv2 | 35.5 | 131.99 / 131.99 | - / 2369.98 | 179.1 | SOLOv2.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the Mask AP(0.5:0.95) on the [COCO2017](https://cocodataset.org/#home) validation set.

## [Text Detection Module](../module_usage/tutorials/ocr_modules/text_detection.en.md)

| Model | Detection Hmean (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv5_server_det | 83.8 | 89.55 / 70.19 | 383.15 / 383.15 | 84.3 | PP-OCRv5_server_det.yaml | Inference Model/Training Model |
| PP-OCRv5_mobile_det | 79.0 | 10.67 / 6.36 | 57.77 / 28.15 | 4.7 | PP-OCRv5_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv4_server_det | 82.56 | 127.82 / 98.87 | 585.95 / 489.77 | 109 | PP-OCRv4_server_det.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_det | 63.8 | 9.87 / 4.17 | 56.60 / 20.79 | 4.7 | PP-OCRv4_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_det | Accuracy comparable to PP-OCRv4_mobile_det | 9.90 / 3.60 | 41.93 / 20.76 | 2.1 | PP-OCRv3_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_server_det | 80.11 | 119.50 / 75.00 | 379.35 / 318.35 | 102.1 | PP-OCRv3_server_det.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is PaddleOCR's self-built Chinese and English dataset, covering multiple scenarios such as street view, web images, documents, and handwriting, with 593 images for text detection.

## [Seal Text Detection Module](../module_usage/tutorials/ocr_modules/seal_text_detection.en.md)

| Model Name | Detection Hmean (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_mobile_seal_det | 96.36 | 9.70 / 3.56 | 50.38 / 19.64 | 4.7 | PP-OCRv4_mobile_seal_det.yaml | Inference Model/Training Model |
| PP-OCRv4_server_seal_det | 98.40 | 124.64 / 91.57 | 545.68 / 439.86 | 109 | PP-OCRv4_server_seal_det.yaml | Inference Model/Training Model |

Note: The evaluation set for the above precision metrics is a seal dataset built by PaddleX, which includes 500 seal images.

## [Text Recognition Module](../module_usage/tutorials/ocr_modules/text_recognition.en.md)

* Chinese Text Recognition Models

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv5_server_rec | 86.38 | 8.46 / 2.36 | 31.21 / 31.21 | 81 | PP-OCRv5_server_rec.yaml | Inference Model/Training Model |
| PP-OCRv5_mobile_rec | 81.29 | 5.43 / 1.46 | 21.20 / 5.32 | 16 | PP-OCRv5_mobile_rec.yaml | Inference Model/Training Model |
| PP-OCRv4_server_rec_doc | 86.58 | 8.69 / 2.78 | 37.93 / 37.93 | 182 | PP-OCRv4_server_rec_doc.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_rec | 78.74 | 5.26 / 1.12 | 17.48 / 3.61 | 10.5 | PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| PP-OCRv4_server_rec | 85.19 | 8.75 / 2.49 | 36.93 / 36.93 | 173 | PP-OCRv4_server_rec.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_rec | 72.96 | 3.89 / 1.16 | 8.72 / 3.56 | 10.3 | PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 8367 images for text recognition.

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_SVTRv2_rec | 68.81 | 10.38 / 8.31 | 66.52 / 30.83 | 80.5 | ch_SVTRv2_rec.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard A.

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_RepSVTR_rec | 65.07 | 6.29 / 1.57 | 20.64 / 5.40 | 48.8 | ch_RepSVTR_rec.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard B.

* English Recognition Models

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| en_PP-OCRv4_mobile_rec | 70.39 | 4.81 / 1.23 | 17.20 / 4.18 | 7.5 | en_PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| en_PP-OCRv3_mobile_rec | 70.69 | 3.56 / 0.78 | 8.44 / 5.78 | 17.3 | en_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is an English dataset built by PaddleX.

* Multilingual Recognition Models

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| korean_PP-OCRv5_mobile_rec | 90.45 | 5.43 / 1.46 | 21.20 / 5.32 | 14 | korean_PP-OCRv5_mobile_rec.yaml | Inference Model/Pre-trained Model |
| latin_PP-OCRv5_mobile_rec | 84.7 | 5.43 / 1.46 | 21.20 / 5.32 | 14 | latin_PP-OCRv5_mobile_rec.yaml | Inference Model/Pre-trained Model |
| eslav_PP-OCRv5_mobile_rec | 85.8 | 5.43 / 1.46 | 21.20 / 5.32 | 14 | eslav_PP-OCRv5_mobile_rec.yaml | Inference Model/Pre-trained Model |
| korean_PP-OCRv3_mobile_rec | 60.21 | 3.73 / 0.98 | 8.76 / 2.91 | 9.6 | korean_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| japan_PP-OCRv3_mobile_rec | 45.69 | 3.86 / 1.01 | 8.62 / 2.92 | 8.8 | japan_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| chinese_cht_PP-OCRv3_mobile_rec | 82.06 | 3.90 / 1.16 | 9.24 / 3.18 | 9.7 | chinese_cht_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| te_PP-OCRv3_mobile_rec | 95.88 | 3.59 / 0.81 | 8.28 / 6.21 | 7.8 | te_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ka_PP-OCRv3_mobile_rec | 96.96 | 3.49 / 0.89 | 8.63 / 2.77 | 17.4 | ka_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ta_PP-OCRv3_mobile_rec | 76.83 | 3.49 / 0.86 | 8.35 / 3.41 | 8.7 | ta_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| latin_PP-OCRv3_mobile_rec | 76.93 | 3.53 / 0.78 | 8.50 / 6.83 | 8.7 | latin_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| arabic_PP-OCRv3_mobile_rec | 73.55 | 3.60 / 0.83 | 8.44 / 4.69 | 17.3 | arabic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| cyrillic_PP-OCRv3_mobile_rec | 94.28 | 3.56 / 0.79 | 8.22 / 2.76 | 8.7 | cyrillic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| devanagari_PP-OCRv3_mobile_rec | 96.44 | 3.60 / 0.78 | 6.95 / 2.87 | 8.7 | devanagari_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is a multi-language dataset built by PaddleX.

## Formula Recognition Module

| Model | En-BLEU (%) | Zh-BLEU (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| UniMERNet | 85.91 | 43.50 | 1311.84 / 1311.84 | - / 8288.07 | 1530 | UniMERNet.yaml | Inference Model/Training Model |
| PP-FormulaNet-S | 87.00 | 45.71 | 182.25 / 182.25 | - / 254.39 | 224 | PP-FormulaNet-S.yaml | Inference Model/Training Model |
| PP-FormulaNet-L | 90.36 | 45.78 | 1482.03 / 1482.03 | - / 3131.54 | 695 | PP-FormulaNet-L.yaml | Inference Model/Training Model |
| PP-FormulaNet_plus-S | 88.71 | 53.32 | 179.20 / 179.20 | - / 260.99 | 248 | PP-FormulaNet_plus-S.yaml | Inference Model/Training Model |
| PP-FormulaNet_plus-M | 91.45 | 89.76 | 1040.27 / 1040.27 | - / 1615.80 | 592 | PP-FormulaNet_plus-M.yaml | Inference Model/Training Model |
| PP-FormulaNet_plus-L | 92.22 | 90.64 | 1476.07 / 1476.07 | - / 3125.58 | 698 | PP-FormulaNet_plus-L.yaml | Inference Model/Training Model |
| LaTeX_OCR_rec | 74.55 | 39.96 | 1088.89 / 1088.89 | - / - | 99 | LaTeX_OCR_rec.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured on an internal PaddleX formula recognition test set. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.

## [Table Structure Recognition Module](../module_usage/tutorials/ocr_modules/table_structure_recognition.en.md)

| Model | Accuracy (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SLANet | 59.52 | 23.96 / 21.75 | - / 43.12 | 6.9 | SLANet.yaml | Inference Model/Training Model |
| SLANet_plus | 63.69 | 23.43 / 22.16 | - / 41.80 | 6.9 | SLANet_plus.yaml | Inference Model/Training Model |
| SLANeXt_wired | 69.65 | 85.92 / 85.92 | - / 501.66 | 351 | SLANeXt_wired.yaml | Inference Model/Training Model |
| SLANeXt_wireless |  |  |  |  | SLANeXt_wireless.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured on a high-difficulty Chinese table recognition dataset built internally by PaddleX.

## [Table Cell Detection Module](../module_usage/tutorials/ocr_modules/table_cells_detection.en.md)

| Model | mAP (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| RT-DETR-L_wired_table_cell_det | 82.7 | 33.47 / 27.02 | 402.55 / 256.56 | 124 | RT-DETR-L_wired_table_cell_det.yaml | Inference Model/Training Model |
| RT-DETR-L_wireless_table_cell_det |  |  |  |  | RT-DETR-L_wireless_table_cell_det.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the internal table cell detection dataset of PaddleX.

## [Table Classification Module](../module_usage/tutorials/ocr_modules/table_classification.en.md)
| Model | Top1 Acc (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_table_cls | 94.2 | 2.62 / 0.60 | 3.17 / 1.14 | 6.6 | PP-LCNet_x1_0_table_cls.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the internal table classification dataset built by PaddleX.

## [Text Image Unwarping Module](../module_usage/tutorials/ocr_modules/text_image_unwarping.en.md)
| Model Name | CER | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| UVDoc | 0.179 | 19.05 / 19.05 | - / 869.82 | 30.3 | UVDoc.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured on an image unwarping dataset built by PaddleX.

## [Layout Detection Module](../module_usage/tutorials/ocr_modules/layout_detection.en.md)

* Layout detection model, including 20 common categories: document title, section title, text, page number, abstract, table of contents, references, footnote, header, footer, algorithm, formula, formula number, image, table, figure and table captions (figure caption, table caption, and chart caption), stamp, chart, sidebar text, and reference content.

| Model Name | mAP(0.5) (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-DocLayout_plus-L | 83.2 | 53.03 / 17.23 | 634.62 / 378.32 | 126.01 | PP-DocLayout_plus-L.yaml | Inference Model/Training Model |

Note: The evaluation set for the accuracy metrics mentioned above is a custom-built layout detection dataset, which includes 1,300 document-type images such as Chinese and English papers, magazines, newspapers, research reports, PPTs, exam papers, and textbooks.
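
Because each detection carries one of the category labels listed above, a common next step is filtering the layout result by label before downstream parsing. Below is a minimal sketch, assuming the per-image result exposes its detections as a `boxes` list of dicts with `label`, `score`, and `coordinate` fields (the usual shape of PaddleX detection output; inspect `res.print()` to confirm the schema in your installed version):

```python
# Minimal sketch: keep only "table" regions from a layout detection result.
# Assumes PaddleX 3.x; the "boxes"/"label"/"score"/"coordinate" field names
# are assumptions based on typical PaddleX detection output.
from paddlex import create_model

model = create_model(model_name="PP-DocLayout_plus-L")

for res in model.predict("document_page.jpg", batch_size=1):
    tables = [
        box for box in res["boxes"]
        if box["label"] == "table" and box["score"] > 0.5
    ]
    for box in tables:
        print(box["coordinate"])  # [x1, y1, x2, y2] region for downstream parsing
```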

* Layout detection model, including 1 category: block.

| Model Name | mAP(0.5) (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-DocBlockLayout | 95.9 | 34.60 / 28.54 | 506.43 / 256.83 | 123.92 | PP-DocBlockLayout.yaml | Inference Model/Training Model |

Note: The evaluation set for the accuracy metrics mentioned above is a custom-built layout block detection dataset, which includes 1,000 document-type images such as Chinese and English papers, magazines, newspapers, research reports, PPTs, exam papers, and textbooks.

* Layout detection model, including 23 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, figure caption, table, table caption, seal, figure title, figure, header image, footer image, and sidebar text.

| Model Name | mAP(0.5) (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-DocLayout-L | 90.4 | 33.59 / 33.59 | 503.01 / 251.08 | 123.76 | PP-DocLayout-L.yaml | Inference Model/Training Model |
| PP-DocLayout-M | 75.2 | 13.03 / 4.72 | 43.39 / 24.44 | 22.578 | PP-DocLayout-M.yaml | Inference Model/Training Model |
| PP-DocLayout-S | 70.9 | 11.54 / 3.86 | 18.53 / 6.29 | 4.834 | PP-DocLayout-S.yaml | Inference Model/Training Model |

Note: The evaluation set for the accuracy metrics mentioned above is a custom-built layout region detection dataset, which includes 500 common document-type images such as Chinese and English papers, magazines, and research reports.