PaddleX includes multiple pipelines; each pipeline contains several modules, and each module provides several models. You can choose which models to use based on the benchmark data below: if you prioritize accuracy, choose the models with higher accuracy; if you prioritize inference speed, choose the faster models; if you prioritize storage footprint, choose the models with smaller storage size.
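The model names in the tables below can be passed to the PaddleX Python API to run single-model inference. The following is a minimal sketch, assuming PaddleX 3.x is installed; the image path is a placeholder and `PP-LCNet_x1_0` is just an example name taken from the image classification table.

```python
# Minimal sketch of single-model inference with a model name taken from the
# tables below. Assumes PaddleX 3.x is installed; "demo_image.jpg" is a
# placeholder input path.
from paddlex import create_model

model = create_model(model_name="PP-LCNet_x1_0")  # any model name from the tables
output = model.predict("demo_image.jpg", batch_size=1)
for res in output:
    res.print()                              # print the prediction
    res.save_to_json(save_path="./output/")  # save the structured result
```

The yaml File column refers to the corresponding configuration file in the PaddleX repository, which is used for training, evaluation, and export of that model.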
| Model Name | Top1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_224 | 85.36 | 12.03 / 2.49 | 60.86 / 42.69 | 331 | CLIP_vit_base_patch16_224.yaml | Inference Model/Training Model |
| CLIP_vit_large_patch14_224 | 88.1 | 49.15 / 9.75 | 223.16 / 206.49 | 1040 | CLIP_vit_large_patch14_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_224 | 83.84 | 11.37 / 5.65 | 143.98 / 52.31 | 313.9 | ConvNeXt_base_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_384 | 84.90 | 29.48 / 11.17 | 293.76 / 134.27 | 313.9 | ConvNeXt_base_384.yaml | Inference Model/Training Model |
| ConvNeXt_large_224 | 84.26 | 22.99 / 12.73 | 220.79 / 113.24 | 700.7 | ConvNeXt_large_224.yaml | Inference Model/Training Model |
| ConvNeXt_large_384 | 85.27 | 58.90 / 24.63 | 509.48 / 260.27 | 700.7 | ConvNeXt_large_384.yaml | Inference Model/Training Model |
| ConvNeXt_small | 83.13 | 7.72 / 4.35 | 95.92 / 33.34 | 178.0 | ConvNeXt_small.yaml | Inference Model/Training Model |
| ConvNeXt_tiny | 82.03 | 6.00 / 2.47 | 63.59 / 18.23 | 101.4 | ConvNeXt_tiny.yaml | Inference Model/Training Model |
| FasterNet-L | 83.5 | 11.96 / 2.68 | 51.93 / 35.33 | 357.1 | FasterNet-L.yaml | Inference Model/Training Model |
| FasterNet-M | 83.0 | 11.17 / 2.16 | 38.49 / 21.17 | 204.6 | FasterNet-M.yaml | Inference Model/Training Model |
| FasterNet-S | 81.3 | 7.70 / 1.24 | 19.51 / 11.22 | 119.3 | FasterNet-S.yaml | Inference Model/Training Model |
| FasterNet-T0 | 71.9 | 4.73 / 0.82 | 6.40 / 1.96 | 15.1 | FasterNet-T0.yaml | Inference Model/Training Model |
| FasterNet-T1 | 75.9 | 4.80 / 0.80 | 8.14 / 3.13 | 29.2 | FasterNet-T1.yaml | Inference Model/Training Model |
| FasterNet-T2 | 79.1 | 6.10 / 0.88 | 12.71 / 5.35 | 57.4 | FasterNet-T2.yaml | Inference Model/Training Model |
| MobileNetV1_x0_5 | 63.5 | 1.98 / 0.51 | 2.50 / 1.04 | 4.8 | MobileNetV1_x0_5.yaml | Inference Model/Training Model |
| MobileNetV1_x0_25 | 51.4 | 1.99 / 0.45 | 1.82 / 0.73 | 1.8 | MobileNetV1_x0_25.yaml | Inference Model/Training Model |
| MobileNetV1_x0_75 | 68.8 | 2.33 / 0.41 | 3.33 / 1.34 | 9.3 | MobileNetV1_x0_75.yaml | Inference Model/Training Model |
| MobileNetV1_x1_0 | 71.0 | 2.31 / 0.45 | 3.91 / 1.89 | 15.2 | MobileNetV1_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x0_5 | 65.0 | 3.58 / 0.62 | 3.86 / 1.23 | 7.1 | MobileNetV2_x0_5.yaml | Inference Model/Training Model |
| MobileNetV2_x0_25 | 53.2 | 3.05 / 0.66 | 3.30 / 0.98 | 5.5 | MobileNetV2_x0_25.yaml | Inference Model/Training Model |
| MobileNetV2_x1_0 | 72.2 | 3.85 / 0.63 | 5.50 / 1.87 | 12.6 | MobileNetV2_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x1_5 | 74.1 | 3.93 / 0.73 | 8.84 / 3.12 | 25.0 | MobileNetV2_x1_5.yaml | Inference Model/Training Model |
| MobileNetV2_x2_0 | 75.2 | 3.89 / 0.79 | 10.36 / 4.50 | 41.2 | MobileNetV2_x2_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_5 | 69.2 | 4.60 / 0.77 | 5.32 / 1.58 | 9.6 | MobileNetV3_large_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_35 | 64.3 | 4.44 / 0.75 | 5.20 / 1.50 | 7.5 | MobileNetV3_large_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_75 | 73.1 | 5.30 / 0.85 | 6.02 / 1.93 | 14.0 | MobileNetV3_large_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_0 | 75.3 | 5.38 / 0.81 | 7.16 / 2.19 | 19.5 | MobileNetV3_large_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_25 | 76.4 | 5.54 / 0.84 | 7.06 / 2.84 | 26.5 | MobileNetV3_large_x1_25.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_5 | 59.2 | 3.87 / 0.77 | 4.90 / 1.32 | 6.8 | MobileNetV3_small_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_35 | 53.0 | 3.68 / 0.77 | 3.94 / 1.27 | 6.0 | MobileNetV3_small_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_75 | 66.0 | 3.92 / 0.77 | 4.68 / 1.39 | 8.5 | MobileNetV3_small_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_0 | 68.2 | 4.23 / 0.78 | 5.24 / 1.48 | 10.5 | MobileNetV3_small_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_25 | 70.7 | 4.59 / 0.79 | 5.36 / 1.63 | 13.0 | MobileNetV3_small_x1_25.yaml | Inference Model/Training Model |
| MobileNetV4_conv_large | 83.4 | 9.04 / 2.28 | 34.34 / 22.01 | 125.2 | MobileNetV4_conv_large.yaml | Inference Model/Training Model |
| MobileNetV4_conv_medium | 79.9 | 5.70 / 1.05 | 13.78 / 5.64 | 37.6 | MobileNetV4_conv_medium.yaml | Inference Model/Training Model |
| MobileNetV4_conv_small | 74.6 | 3.81 / 0.55 | 5.24 / 1.50 | 14.7 | MobileNetV4_conv_small.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_large | 83.8 | 13.43 / 4.28 | 61.16 / 31.06 | 145.1 | MobileNetV4_hybrid_large.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_medium | 80.5 | 11.82 / 1.30 | 22.01 / 6.06 | 42.9 | MobileNetV4_hybrid_medium.yaml | Inference Model/Training Model |
| PP-HGNet_base | 85.0 | 13.43 / 3.81 | 71.24 / 51.48 | 249.4 | PP-HGNet_base.yaml | Inference Model/Training Model |
| PP-HGNet_small | 81.51 | 5.87 / 1.68 | 25.58 / 18.50 | 86.5 | PP-HGNet_small.yaml | Inference Model/Training Model |
| PP-HGNet_tiny | 79.83 | 5.84 / 1.38 | 17.03 / 10.58 | 52.4 | PP-HGNet_tiny.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0 | 77.77 | 4.41 / 0.87 | 10.58 / 1.87 | 21.4 | PP-HGNetV2-B0.yaml | Inference Model/Training Model |
| PP-HGNetV2-B1 | 79.18 | 4.52 / 0.73 | 11.98 / 2.28 | 22.6 | PP-HGNetV2-B1.yaml | Inference Model/Training Model |
| PP-HGNetV2-B2 | 81.74 | 6.67 / 0.96 | 14.22 / 4.04 | 39.9 | PP-HGNetV2-B2.yaml | Inference Model/Training Model |
| PP-HGNetV2-B3 | 82.98 | 7.47 / 1.94 | 17.73 / 5.63 | 57.9 | PP-HGNetV2-B3.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4 | 83.57 | 7.05 / 1.16 | 16.23 / 7.55 | 70.4 | PP-HGNetV2-B4.yaml | Inference Model/Training Model |
| PP-HGNetV2-B5 | 84.75 | 10.38 / 1.95 | 31.53 / 18.02 | 140.8 | PP-HGNetV2-B5.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6 | 86.30 | 13.86 / 3.28 | 67.25 / 56.70 | 268.4 | PP-HGNetV2-B6.yaml | Inference Model/Training Model |
| PP-LCNet_x0_5 | 63.14 | 2.41 / 0.60 | 2.54 / 0.90 | 6.7 | PP-LCNet_x0_5.yaml | Inference Model/Training Model |
| PP-LCNet_x0_25 | 51.86 | 2.16 / 0.60 | 2.73 / 0.77 | 5.5 | PP-LCNet_x0_25.yaml | Inference Model/Training Model |
| PP-LCNet_x0_35 | 58.09 | 2.18 / 0.60 | 2.32 / 0.89 | 5.9 | PP-LCNet_x0_35.yaml | Inference Model/Training Model |
| PP-LCNet_x0_75 | 68.18 | 2.61 / 0.58 | 3.00 / 1.09 | 8.4 | PP-LCNet_x0_75.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0 | 71.32 | 2.59 / 0.68 | 3.18 / 1.19 | 10.5 | PP-LCNet_x1_0.yaml | Inference Model/Training Model |
| PP-LCNet_x1_5 | 73.71 | 2.60 / 0.68 | 3.98 / 1.66 | 16.0 | PP-LCNet_x1_5.yaml | Inference Model/Training Model |
| PP-LCNet_x2_0 | 75.18 | 2.53 / 0.68 | 5.21 / 2.24 | 23.2 | PP-LCNet_x2_0.yaml | Inference Model/Training Model |
| PP-LCNet_x2_5 | 76.60 | 2.76 / 0.67 | 6.78 / 3.20 | 32.1 | PP-LCNet_x2_5.yaml | Inference Model/Training Model |
| PP-LCNetV2_base | 77.05 | 4.04 / 0.62 | 6.80 / 2.67 | 23.7 | PP-LCNetV2_base.yaml | Inference Model/Training Model |
| PP-LCNetV2_large | 78.51 | 4.91 / 0.85 | 10.30 / 5.38 | 37.3 | PP-LCNetV2_large.yaml | Inference Model/Training Model |
| PP-LCNetV2_small | 73.97 | 3.07 / 0.60 | 4.28 / 1.58 | 14.6 | PP-LCNetV2_small.yaml | Inference Model/Training Model |
| ResNet18_vd | 72.3 | 2.87 / 0.77 | 7.91 / 4.64 | 41.5 | ResNet18_vd.yaml | Inference Model/Training Model |
| ResNet18 | 71.0 | 2.63 / 0.74 | 6.30 / 4.16 | 41.5 | ResNet18.yaml | Inference Model/Training Model |
| ResNet34_vd | 76.0 | 4.47 / 1.09 | 14.30 / 8.33 | 77.3 | ResNet34_vd.yaml | Inference Model/Training Model |
| ResNet34 | 74.6 | 4.20 / 1.07 | 12.53 / 7.83 | 77.3 | ResNet34.yaml | Inference Model/Training Model |
| ResNet50_vd | 79.1 | 6.66 / 1.23 | 16.34 / 10.00 | 90.8 | ResNet50_vd.yaml | Inference Model/Training Model |
| ResNet50 | 76.5 | 6.25 / 1.17 | 15.93 / 9.72 | 90.8 | ResNet50.yaml | Inference Model/Training Model |
| ResNet101_vd | 80.2 | 11.93 / 2.07 | 32.47 / 23.62 | 158.4 | ResNet101_vd.yaml | Inference Model/Training Model |
| ResNet101 | 77.6 | 13.73 / 2.06 | 29.69 / 17.72 | 158.7 | ResNet101.yaml | Inference Model/Training Model |
| ResNet152_vd | 80.6 | 20.70 / 2.82 | 43.90 / 27.91 | 214.3 | ResNet152_vd.yaml | Inference Model/Training Model |
| ResNet152 | 78.3 | 17.86 / 2.79 | 46.19 / 26.00 | 214.2 | ResNet152.yaml | Inference Model/Training Model |
| ResNet200_vd | 80.9 | 22.55 / 3.54 | 58.54 / 35.70 | 266.0 | ResNet200_vd.yaml | Inference Model/Training Model |
| StarNet-S1 | 73.6 | 6.24 / 0.96 | 8.78 / 2.44 | 11.2 | StarNet-S1.yaml | Inference Model/Training Model |
| StarNet-S2 | 74.8 | 4.78 / 0.85 | 7.24 / 2.48 | 14.3 | StarNet-S2.yaml | Inference Model/Training Model |
| StarNet-S3 | 77.0 | 6.77 / 1.07 | 9.69 / 3.35 | 22.2 | StarNet-S3.yaml | Inference Model/Training Model |
| StarNet-S4 | 79.0 | 9.01 / 1.48 | 14.79 / 4.58 | 28.9 | StarNet-S4.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window7_224 | 83.37 | 13.04 / 10.77 | 133.79 / 118.45 | 340 | SwinTransformer_base_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window12_384 | 84.17 | 33.99 / 28.42 | 400.19 / 317.36 | 311.4 | SwinTransformer_base_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window7_224 | 86.19 | 23.69 / 6.18 | 198.60 / 177.18 | 694.8 | SwinTransformer_large_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window12_384 | 87.06 | 68.07 / 14.84 | 609.07 / 525.72 | 696.1 | SwinTransformer_large_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_small_patch4_window7_224 | 83.21 | 12.17 / 3.51 | 111.03 / 92.51 | 175.6 | SwinTransformer_small_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_tiny_patch4_window7_224 | 81.10 | 7.11 / 2.01 | 62.72 / 47.35 | 100.1 | SwinTransformer_tiny_patch4_window7_224.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_448_ML | 89.15 | 48.87 / 8.10 | 275.33 / 188.48 | 325.6 | CLIP_vit_base_patch16_448_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0_ML | 80.98 | 7.15 / 1.77 | 21.35 / 8.19 | 39.6 | PP-HGNetV2-B0_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4_ML | 87.96 | 8.11 / 2.82 | 44.76 / 29.38 | 88.5 | PP-HGNetV2-B4_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6_ML | 91.06 | 34.54 / 8.22 | 189.17 / 189.17 | 286.5 | PP-HGNetV2-B6_ML.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0_ML | 77.96 | 5.28 / 1.62 | 13.16 / 5.61 | 29.4 | PP-LCNet_x1_0_ML.yaml | Inference Model/Training Model |
| ResNet50_ML | 83.42 | 10.54 / 2.97 | 55.39 / 35.52 | 108.9 | ResNet50_ML.yaml | Inference Model/Training Model |
| Model Name | mA (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_pedestrian_attribute | 92.2 | 2.52 / 0.66 | 2.60 / 1.07 | 6.7 | PP-LCNet_x1_0_pedestrian_attribute.yaml | Inference Model/Training Model |
| Model Name | mA (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_vehicle_attribute | 91.7 | 2.53 / 0.67 | 2.73 / 1.10 | 6.7 | PP-LCNet_x1_0_vehicle_attribute.yaml | Inference Model/Training Model |
| Model Name | recall@1 (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_rec | 84.2 | 3.91 / 1.06 | 6.82 / 2.89 | 16.3 | PP-ShiTuV2_rec.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 12.57 / 11.62 | 67.09 / 67.09 | 306.6 | PP-ShiTuV2_rec_CLIP_vit_base.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 49.85 / 49.85 | 229.14 / 229.14 | 1050 | PP-ShiTuV2_rec_CLIP_vit_large.yaml | Inference Model/Training Model |
| Model Name | Output Feature Dimension | Acc (%) AgeDB-30/CFP-FP/LFW | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| MobileFaceNet | 128 | 96.28/96.71/99.58 | 3.31 / 0.73 | 5.93 / 1.30 | 4.1 | MobileFaceNet.yaml | Inference Model/Training Model |
| ResNet50_face | 512 | 98.12/98.56/99.77 | 6.12 / 3.11 | 15.85 / 9.44 | 87.2 | ResNet50_face.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_det | 41.5 | 11.81 / 4.53 | 43.03 / 25.31 | 27.54 | PP-ShiTuV2_det.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Cascade-FasterRCNN-ResNet50-FPN | 41.1 | 120.28 / 120.28 | - / 6514.61 | 245.4 | Cascade-FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN | 45.0 | 124.10 / 124.10 | - / 6709.52 | 246.2 | Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| CenterNet-DLA-34 | 37.6 | 67.19 / 67.19 | 6622.61 / 6622.61 | 75.4 | CenterNet-DLA-34.yaml | Inference Model/Training Model |
| CenterNet-ResNet50 | 38.9 | 216.06 / 216.06 | 2545.79 / 2545.79 | 319.7 | CenterNet-ResNet50.yaml | Inference Model/Training Model |
| DETR-R50 | 42.3 | 58.80 / 26.90 | 370.96 / 208.77 | 159.3 | DETR-R50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet34-FPN | 37.8 | 76.90 / 76.90 | - / 4136.79 | 137.5 | FasterRCNN-ResNet34-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-FPN | 38.4 | 95.48 / 95.48 | - / 3693.90 | 148.1 | FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-FPN | 39.5 | 98.03 / 98.03 | - / 4278.36 | 148.1 | FasterRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-SSLDv2-FPN | 41.4 | 99.23 / 99.23 | - / 4415.68 | 148.1 | FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50 | 36.7 | 129.10 / 129.10 | - / 3868.44 | 120.2 | FasterRCNN-ResNet50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101-FPN | 41.4 | 131.48 / 131.48 | - / 4380.00 | 216.3 | FasterRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101 | 39.0 | 216.71 / 216.71 | - / 5376.45 | 188.1 | FasterRCNN-ResNet101.yaml | Inference Model/Training Model |
| FasterRCNN-ResNeXt101-vd-FPN | 43.4 | 234.38 / 234.38 | - / 6154.61 | 360.6 | FasterRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-Swin-Tiny-FPN | 42.6 | 65.92 / 65.92 | - / 2468.98 | 159.8 | FasterRCNN-Swin-Tiny-FPN.yaml | Inference Model/Training Model |
| FCOS-ResNet50 | 39.6 | 101.02 / 34.42 | 752.15 / 752.15 | 124.2 | FCOS-ResNet50.yaml | Inference Model/Training Model |
| PicoDet-L | 42.6 | 14.31 / 11.06 | 45.95 / 25.06 | 20.9 | PicoDet-L.yaml | Inference Model/Training Model |
| PicoDet-M | 37.5 | 10.48 / 5.00 | 22.88 / 9.03 | 16.8 | PicoDet-M.yaml | Inference Model/Training Model |
| PicoDet-S | 29.1 | 9.15 / 3.26 | 16.06 / 4.04 | 4.4 | PicoDet-S.yaml | Inference Model/Training Model |
| PicoDet-XS | 26.2 | 9.54 / 3.52 | 17.96 / 5.38 | 5.7 | PicoDet-XS.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-L | 52.9 | 32.06 / 28.00 | 185.32 / 116.21 | 185.3 | PP-YOLOE_plus-L.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-M | 49.8 | 18.37 / 15.04 | 108.77 / 63.48 | 83.2 | PP-YOLOE_plus-M.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S | 43.7 | 11.43 / 7.52 | 60.16 / 26.94 | 28.3 | PP-YOLOE_plus-S.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-X | 54.7 | 56.28 / 50.60 | 292.08 / 212.24 | 349.4 | PP-YOLOE_plus-X.yaml | Inference Model/Training Model |
| RT-DETR-H | 56.3 | 114.57 / 101.56 | 938.20 / 938.20 | 435.8 | RT-DETR-H.yaml | Inference Model/Training Model |
| RT-DETR-L | 53.0 | 34.76 / 27.60 | 495.39 / 247.68 | 113.7 | RT-DETR-L.yaml | Inference Model/Training Model |
| RT-DETR-R18 | 46.5 | 19.11 / 14.82 | 263.13 / 143.05 | 70.7 | RT-DETR-R18.yaml | Inference Model/Training Model |
| RT-DETR-R50 | 53.1 | 41.11 / 10.12 | 536.20 / 482.86 | 149.1 | RT-DETR-R50.yaml | Inference Model/Training Model |
| RT-DETR-X | 54.8 | 61.91 / 51.41 | 639.79 / 639.79 | 232.9 | RT-DETR-X.yaml | Inference Model/Training Model |
| YOLOv3-DarkNet53 | 39.1 | 39.62 / 35.54 | 166.57 / 136.34 | 219.7 | YOLOv3-DarkNet53.yaml | Inference Model/Training Model |
| YOLOv3-MobileNetV3 | 31.4 | 16.54 / 6.21 | 64.37 / 45.55 | 83.8 | YOLOv3-MobileNetV3.yaml | Inference Model/Training Model |
| YOLOv3-ResNet50_vd_DCN | 40.6 | 31.64 / 26.72 | 226.75 / 226.75 | 163.0 | YOLOv3-ResNet50_vd_DCN.yaml | Inference Model/Training Model |
| YOLOX-L | 50.1 | 49.68 / 45.03 | 232.52 / 156.24 | 192.5 | YOLOX-L.yaml | Inference Model/Training Model |
| YOLOX-M | 46.9 | 43.46 / 29.52 | 147.64 / 80.06 | 90.0 | YOLOX-M.yaml | Inference Model/Training Model |
| YOLOX-N | 26.1 | 42.94 / 17.79 | 64.15 / 7.19 | 3.4 | YOLOX-N.yaml | Inference Model/Training Model |
| YOLOX-S | 40.4 | 46.53 / 29.34 | 98.37 / 35.02 | 32.0 | YOLOX-S.yaml | Inference Model/Training Model |
| YOLOX-T | 32.9 | 31.81 / 18.91 | 55.34 / 11.63 | 18.1 | YOLOX-T.yaml | Inference Model/Training Model |
| YOLOX-X | 51.8 | 84.06 / 77.28 | 390.38 / 272.88 | 351.5 | YOLOX-X.yaml | Inference Model/Training Model |
| Co-Deformable-DETR-R50 | 49.7 | 259.62 / 259.62 | 32413.76 / 32413.76 | 184 | Co-Deformable-DETR-R50.yaml | Inference Model/Training Model |
| Co-Deformable-DETR-Swin-T | 48.0 (@640x640 input shape) | 120.17 / 120.17 | - / 15620.29 | 187 | Co-Deformable-DETR-Swin-T.yaml | Inference Model/Training Model |
| Co-DINO-R50 | 52.0 | 1123.23 / 1123.23 | - / - | 186 | Co-DINO-R50.yaml | Inference Model/Training Model |
| Co-DINO-Swin-L | 55.9 (@640x640 input shape) | - / - | - / - | 840 | Co-DINO-Swin-L.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE_plus_SOD-S | 25.1 | 116.07 / 20.10 | 176.44 / 40.21 | 77.3 | PP-YOLOE_plus_SOD-S.yaml | Inference Model/Training Model |
| PP-YOLOE_plus_SOD-L | 31.9 | 100.02 / 48.33 | 271.29 / 151.20 | 325.0 | PP-YOLOE_plus_SOD-L.yaml | Inference Model/Training Model |
| PP-YOLOE_plus_SOD-largesize-L | 42.7 | 515.69 / 460.17 | 2816.08 / 1736.00 | 340.5 | PP-YOLOE_plus_SOD-largesize-L.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of VisDrone-DET.
| Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| GroundingDINO-T | 49.4 | 64.4 | - / - | - / - | 658.3 | GroundingDINO-T.yaml | Inference Model |
| YOLO-Worldv2-L | 44.4 | 59.8 | - / - | 292.14 / 292.14 | 421.4 | YOLO-Worldv2-L.yaml | Inference Model |
| Model | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|
| SAM-H_box | - / - | - / - | 2433.7 | SAM-H_box.yaml | Inference Model |
| SAM-H_point | - / - | - / - | 2433.7 | SAM-H_point.yaml | Inference Model |
| Model | mAP(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-R-L | 78.14 | 67.50 / 61.15 | 414.79 / 414.79 | 211.0 | PP-YOLOE-R-L.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the DOTA validation set mAP(0.5:0.95).
## [Pedestrian Detection Module](../module_usage/tutorials/cv_modules/human_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_human | 48.0 | 30.59 / 26.64 | 180.05 / 112.70 | 196.1 | PP-YOLOE-L_human.yaml | Inference Model/Training Model |
| PP-YOLOE-S_human | 42.5 | 10.26 / 6.66 | 54.01 / 23.48 | 28.8 | PP-YOLOE-S_human.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_vehicle | 63.9 | 30.30 / 26.27 | 169.28 / 111.88 | 196.1 | PP-YOLOE-L_vehicle.yaml | Inference Model/Training Model |
| PP-YOLOE-S_vehicle | 61.3 | 10.54 / 6.69 | 52.73 / 23.58 | 28.8 | PP-YOLOE-S_vehicle.yaml | Inference Model/Training Model |
| Model Name | AP (%) Easy/Medium/Hard | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| BlazeFace | 77.7/73.4/49.5 | 50.90 / 45.74 | 71.92 / 71.92 | 0.447 | BlazeFace.yaml | Inference Model/Training Model |
| BlazeFace-FPN-SSH | 83.2/80.5/60.5 | 58.99 / 51.75 | 87.39 / 87.39 | 0.606 | BlazeFace-FPN-SSH.yaml | Inference Model/Training Model |
| PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | 33.91 / 26.53 | 153.56 / 79.21 | 28.9 | PicoDet_LCNet_x2_5_face.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | 21.28 / 11.09 | 137.26 / 72.09 | 26.5 | PP-YOLOE_plus-S_face.yaml | Inference Model/Training Model |
Note: The above precision metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640.
| Model Name | mIoU | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| STFPM | 0.9901 | 4.94 / 1.63 | 34.88 / 34.88 | 22.5 | STFPM.yaml | Inference Model/Training Model |
| Model | Scheme | Input Size | AP(0.5:0.95) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|---|
| PP-TinyPose_128x96 | Top-Down | 128x96 | 58.4 | 24.22 / 4.34 | - / 6.19 | 4.9 | PP-TinyPose_128x96.yaml | Inference Model/Training Model |
| PP-TinyPose_256x192 | Top-Down | 256x192 | 68.3 | 21.73 / 3.59 | - / 10.18 | 4.9 | PP-TinyPose_256x192.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the COCO dataset AP(0.5:0.95), with detection boxes obtained from ground truth annotations.
| Model | mAP(%) | NDS | yaml File | Model Download Link |
|---|---|---|---|---|
| BEVFusion | 53.9 | 60.9 | BEVFusion.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the nuScenes validation set, reported as mAP(0.5:0.95) and NDS; the precision type is FP32.
## [Semantic Segmentation Module](../module_usage/tutorials/cv_modules/semantic_segmentation.en.md)

| Model Name | mIoU (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Deeplabv3_Plus-R50 | 80.36 | 481.33 / 446.18 | 2952.95 / 1907.07 | 94.9 | Deeplabv3_Plus-R50.yaml | Inference Model/Training Model |
| Deeplabv3_Plus-R101 | 81.10 | 766.70 / 194.42 | 4441.56 / 2984.19 | 162.5 | Deeplabv3_Plus-R101.yaml | Inference Model/Training Model |
| Deeplabv3-R50 | 79.90 | 681.65 / 602.10 | 3786.41 / 3093.10 | 138.3 | Deeplabv3-R50.yaml | Inference Model/Training Model |
| Deeplabv3-R101 | 80.85 | 974.62 / 896.99 | 5222.60 / 4230.79 | 205.9 | Deeplabv3-R101.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W18 | 80.67 | 271.02 / 221.38 | 1791.52 / 1061.62 | 43.1 | OCRNet_HRNet-W18.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W48 | 82.15 | 582.92 / 536.28 | 3513.72 / 2543.10 | 270 | OCRNet_HRNet-W48.yaml | Inference Model/Training Model |
| PP-LiteSeg-T | 73.10 | 28.12 / 23.84 | 398.31 / 398.31 | 28.5 | PP-LiteSeg-T.yaml | Inference Model/Training Model |
| PP-LiteSeg-B | 75.25 | 35.69 / 35.69 | 485.10 / 485.10 | 47.0 | PP-LiteSeg-B.yaml | Inference Model/Training Model |
| SegFormer-B0 (slice) | 76.73 | 11.1946 | 268.929 | 13.2 | SegFormer-B0.yaml | Inference Model/Training Model |
| SegFormer-B1 (slice) | 78.35 | 17.9998 | 403.393 | 48.5 | SegFormer-B1.yaml | Inference Model/Training Model |
| SegFormer-B2 (slice) | 81.60 | 48.0371 | 1248.52 | 96.9 | SegFormer-B2.yaml | Inference Model/Training Model |
| SegFormer-B3 (slice) | 82.47 | 64.341 | 1666.35 | 167.3 | SegFormer-B3.yaml | Inference Model/Training Model |
| SegFormer-B4 (slice) | 82.38 | 82.4336 | 1995.42 | 226.7 | SegFormer-B4.yaml | Inference Model/Training Model |
| SegFormer-B5 (slice) | 82.58 | 97.3717 | 2420.19 | 229.7 | SegFormer-B5.yaml | Inference Model/Training Model |
| Model Name | mIoU (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SeaFormer_base (slice) | 40.92 | 24.4073 | 397.574 | 30.8 | SeaFormer_base.yaml | Inference Model/Training Model |
| SeaFormer_large (slice) | 43.66 | 27.8123 | 550.464 | 49.8 | SeaFormer_large.yaml | Inference Model/Training Model |
| SeaFormer_small (slice) | 38.73 | 19.2295 | 358.343 | 14.3 | SeaFormer_small.yaml | Inference Model/Training Model |
| SeaFormer_tiny (slice) | 34.58 | 13.9496 | 330.132 | 6.1 | SeaFormer_tiny.yaml | Inference Model/Training Model |
| MaskFormer_small | 49.70 | 65.21 / 65.21 | - / 629.85 | 242.5 | MaskFormer_small.yaml | Inference Model/Training Model |
| MaskFormer_tiny | 46.69 | 47.95 / 47.95 | - / 492.67 | 160.5 | MaskFormer_tiny.yaml | Inference Model/Training Model |
| Model Name | Mask AP | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Mask-RT-DETR-H | 50.6 | 180.83 / 180.83 | 1711.24 / 1711.24 | 449.9 | Mask-RT-DETR-H.yaml | Inference Model/Training Model |
| Mask-RT-DETR-L | 45.7 | 113.20 / 113.20 | 1179.56 / 1179.56 | 113.6 | Mask-RT-DETR-L.yaml | Inference Model/Training Model |
| Mask-RT-DETR-M | 42.7 | 87.08 / 87.08 | - / 2090.73 | 66.6 | Mask-RT-DETR-M.yaml | Inference Model/Training Model |
| Mask-RT-DETR-S | 41.0 | 120.86 / 120.86 | - / 2163.07 | 51.8 | Mask-RT-DETR-S.yaml | Inference Model/Training Model |
| Mask-RT-DETR-X | 47.5 | 141.43 / 141.43 | 1379.14 / 1379.14 | 237.5 | Mask-RT-DETR-X.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-FPN | 36.3 | 136.79 / 136.79 | - / 5935.41 | 254.8 | Cascade-MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN | 39.1 | 137.40 / 137.40 | - / 6816.68 | 254.7 | Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-FPN | 35.6 | 112.79 / 112.79 | - / 4912.37 | 157.5 | MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-vd-FPN | 36.4 | 112.88 / 112.88 | - / 5204.97 | 157.5 | MaskRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50 | 32.8 | 181.60 / 181.60 | - / 5523.45 | 127.8 | MaskRCNN-ResNet50.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-FPN | 36.6 | 138.84 / 138.84 | - / 5107.74 | 225.4 | MaskRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-vd-FPN | 38.1 | 141.73 / 141.73 | - / 5592.76 | 225.1 | MaskRCNN-ResNet101-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNeXt101-vd-FPN | 39.5 | 220.83 / 220.83 | - / 5932.59 | 370.0 | MaskRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| PP-YOLOE_seg-S | 32.5 | 243.41 / 222.30 | 2507.70 / 1282.35 | 31.5 | PP-YOLOE_seg-S.yaml | Inference Model/Training Model |
| SOLOv2 | 35.5 | 131.99 / 131.99 | - / 2369.98 | 179.1 | SOLOv2.yaml | Inference Model/Training Model |
| Model | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv5_server_det | 83.8 | 89.55 / 70.19 | 383.15 / 383.15 | 84.3 | PP-OCRv5_server_det.yaml | Inference Model/Training Model |
| PP-OCRv5_mobile_det | 79.0 | 10.67 / 6.36 | 57.77 / 28.15 | 4.7 | PP-OCRv5_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv4_server_det | 82.56 | 127.82 / 98.87 | 585.95 / 489.77 | 109 | PP-OCRv4_server_det.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_det | 63.8 | 9.87 / 4.17 | 56.60 / 20.79 | 4.7 | PP-OCRv4_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_det | Accuracy comparable to PP-OCRv4_mobile_det | 9.90 / 3.60 | 41.93 / 20.76 | 2.1 | PP-OCRv3_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_server_det | 80.11 | 119.50 / 75.00 | 379.35 / 318.35 | 102.1 | PP-OCRv3_server_det.yaml | Inference Model/Training Model |
| Model Name | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_mobile_seal_det | 96.36 | 9.70 / 3.56 | 50.38 / 19.64 | 4.7 | PP-OCRv4_mobile_seal_det.yaml | Inference Model/Training Model |
| PP-OCRv4_server_seal_det | 98.40 | 124.64 / 91.57 | 545.68 / 439.86 | 109 | PP-OCRv4_server_seal_det.yaml | Inference Model/Training Model |
| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv5_server_rec | 86.38 | 8.46 / 2.36 | 31.21 / 31.21 | 81 | PP-OCRv5_server_rec.yaml | Inference Model/Training Model |
| PP-OCRv5_mobile_rec | 81.29 | 5.43 / 1.46 | 21.20 / 5.32 | 16 | PP-OCRv5_mobile_rec.yaml | Inference Model/Training Model |
| PP-OCRv4_server_rec_doc | 86.58 | 8.69 / 2.78 | 37.93 / 37.93 | 182 | PP-OCRv4_server_rec_doc.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_rec | 78.74 | 5.26 / 1.12 | 17.48 / 3.61 | 10.5 | PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| PP-OCRv4_server_rec | 85.19 | 8.75 / 2.49 | 36.93 / 36.93 | 173 | PP-OCRv4_server_rec.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_rec | 72.96 | 3.89 / 1.16 | 8.72 / 3.56 | 10.3 | PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 8367 images for text recognition.
| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_SVTRv2_rec | 68.81 | 10.38 / 8.31 | 66.52 / 30.83 | 80.5 | ch_SVTRv2_rec.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard A.
| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_RepSVTR_rec | 65.07 | 6.29 / 1.57 | 20.64 / 5.40 | 48.8 | ch_RepSVTR_rec.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard B.
English Recognition Model
| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| en_PP-OCRv4_mobile_rec | 70.39 | 4.81 / 1.23 | 17.20 / 4.18 | 7.5 | en_PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| en_PP-OCRv3_mobile_rec | 70.69 | 3.56 / 0.78 | 8.44 / 5.78 | 17.3 | en_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is an English dataset built by PaddleX.
Multilingual Recognition Model
| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| korean_PP-OCRv5_mobile_rec | 90.45 | 5.43 / 1.46 | 21.20 / 5.32 | 14 | korean_PP-OCRv5_mobile_rec.yaml | Inference Model/Pre-trained Model |
| latin_PP-OCRv5_mobile_rec | 84.7 | 5.43 / 1.46 | 21.20 / 5.32 | 14 | latin_PP-OCRv5_mobile_rec.yaml | Inference Model/Pre-trained Model |
| eslav_PP-OCRv5_mobile_rec | 85.8 | 5.43 / 1.46 | 21.20 / 5.32 | 14 | eslav_PP-OCRv5_mobile_rec.yaml | Inference Model/Pre-trained Model |
| korean_PP-OCRv3_mobile_rec | 60.21 | 3.73 / 0.98 | 8.76 / 2.91 | 9.6 | korean_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| japan_PP-OCRv3_mobile_rec | 45.69 | 3.86 / 1.01 | 8.62 / 2.92 | 8.8 | japan_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| chinese_cht_PP-OCRv3_mobile_rec | 82.06 | 3.90 / 1.16 | 9.24 / 3.18 | 9.7 | chinese_cht_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| te_PP-OCRv3_mobile_rec | 95.88 | 3.59 / 0.81 | 8.28 / 6.21 | 7.8 | te_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ka_PP-OCRv3_mobile_rec | 96.96 | 3.49 / 0.89 | 8.63 / 2.77 | 17.4 | ka_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ta_PP-OCRv3_mobile_rec | 76.83 | 3.49 / 0.86 | 8.35 / 3.41 | 8.7 | ta_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| latin_PP-OCRv3_mobile_rec | 76.93 | 3.53 / 0.78 | 8.50 / 6.83 | 8.7 | latin_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| arabic_PP-OCRv3_mobile_rec | 73.55 | 3.60 / 0.83 | 8.44 / 4.69 | 17.3 | arabic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| cyrillic_PP-OCRv3_mobile_rec | 94.28 | 3.56 / 0.79 | 8.22 / 2.76 | 8.7 | cyrillic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| devanagari_PP-OCRv3_mobile_rec | 96.44 | 3.60 / 0.78 | 6.95 / 2.87 | 8.7 | devanagari_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a multi-language dataset built by PaddleX.
| Model | En-BLEU(%) | Zh-BLEU(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| UniMERNet | 85.91 | 43.50 | 1311.84 / 1311.84 | - / 8288.07 | 1530 | UniMERNet.yaml | Inference Model/Training Model |
| PP-FormulaNet-S | 87.00 | 45.71 | 182.25 / 182.25 | - / 254.39 | 224 | PP-FormulaNet-S.yaml | Inference Model/Training Model |
| PP-FormulaNet-L | 90.36 | 45.78 | 1482.03 / 1482.03 | - / 3131.54 | 695 | PP-FormulaNet-L.yaml | Inference Model/Training Model |
| PP-FormulaNet_plus-S | 88.71 | 53.32 | 179.20 / 179.20 | - / 260.99 | 248 | PP-FormulaNet_plus-S.yaml | Inference Model/Training Model |
| PP-FormulaNet_plus-M | 91.45 | 89.76 | 1040.27 / 1040.27 | - / 1615.80 | 592 | PP-FormulaNet_plus-M.yaml | Inference Model/Training Model |
| PP-FormulaNet_plus-L | 92.22 | 90.64 | 1476.07 / 1476.07 | - / 3125.58 | 698 | PP-FormulaNet_plus-L.yaml | Inference Model/Training Model |
| LaTeX_OCR_rec | 74.55 | 39.96 | 1088.89 / 1088.89 | - / - | 99 | LaTeX_OCR_rec.yaml | Inference Model/Training Model |
| Model | Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SLANet | 59.52 | 23.96 / 21.75 | - / 43.12 | 6.9 | SLANet.yaml | Inference Model/Training Model |
| SLANet_plus | 63.69 | 23.43 / 22.16 | - / 41.80 | 6.9 | SLANet_plus.yaml | Inference Model/Training Model |
| SLANeXt_wired | 69.65 | 85.92 / 85.92 | - / 501.66 | 351 | SLANeXt_wired.yaml | Inference Model/Training Model |
| SLANeXt_wireless | - | - | - | - | SLANeXt_wireless.yaml | Inference Model/Training Model |
| Model | mAP(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| RT-DETR-L_wired_table_cell_det | 82.7 | 33.47 / 27.02 | 402.55 / 256.56 | 124 | RT-DETR-L_wired_table_cell_det.yaml | Inference Model/Training Model |
| RT-DETR-L_wireless_table_cell_det | - | - | - | - | RT-DETR-L_wireless_table_cell_det.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the internal table cell detection dataset of PaddleX.
## [Table Classification Module](../module_usage/tutorials/ocr_modules/table_classification.en.md)

| Model | Top1 Acc(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_table_cls | 94.2 | 2.62 / 0.60 | 3.17 / 1.14 | 6.6 | PP-LCNet_x1_0_table_cls.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the internal table classification dataset built by PaddleX.
## [Text Image Unwarping Module](../module_usage/tutorials/ocr_modules/text_image_unwarping.en.md)

| Model Name | CER | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| UVDoc | 0.179 | 19.05 / 19.05 | - / 869.82 | 30.3 | UVDoc.yaml | Inference Model/Training Model |
| Model Name | mAP(0.5)(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-DocLayout_plus-L | 83.2 | 53.03 / 17.23 | 634.62 / 378.32 | 126.01 | PP-DocLayout_plus-L.yaml | Inference Model/Training Model |
Note: The evaluation set for the accuracy metrics mentioned above is a custom-built layout detection dataset, which includes 1,300 document-type images such as Chinese and English papers, magazines, newspapers, research reports, PPTs, exam papers, and textbooks.
| Model Name | mAP(0.5)(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-DocBlockLayout | 95.9 | 34.60 / 28.54 | 506.43 / 256.83 | 123.92 | PP-DocBlockLayout.yaml | Inference Model/Training Model |
Note: The evaluation set for the accuracy metrics mentioned above is a custom-built layout block detection dataset, which includes 1,000 document-type images such as Chinese and English papers, magazines, newspapers, research reports, PPTs, exam papers, and textbooks.
| Model Name | mAP(0.5)(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-DocLayout-L | 90.4 | 33.59 / 33.59 | 503.01 / 251.08 | 123.76 | PP-DocLayout-L.yaml | Inference Model/Training Model |
| PP-DocLayout-M | 75.2 | 13.03 / 4.72 | 43.39 / 24.44 | 22.578 | PP-DocLayout-M.yaml | Inference Model/Training Model |
| PP-DocLayout-S | 70.9 | 11.54 / 3.86 | 18.53 / 6.29 | 4.834 | PP-DocLayout-S.yaml | Inference Model/Training Model |
Note: The evaluation set for the accuracy metrics mentioned above is a custom-built layout region detection dataset, which includes 500 common document-type images such as Chinese and English papers, magazines, and research reports.
Table Layout Detection Model
| Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x_table | 97.5 | 9.57 / 6.63 | 27.66 / 16.75 | 7.4 | PicoDet_layout_1x_table.yaml | Inference Model/Training Model |
3-class layout detection model, including table, image, and seal
| Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_3cls | 88.2 | 8.43 / 3.44 | 17.60 / 6.51 | 4.8 | PicoDet-S_layout_3cls.yaml | Inference Model/Training Model |
| PicoDet-L_layout_3cls | 89.0 | 12.80 / 9.57 | 45.04 / 23.86 | 22.6 | PicoDet-L_layout_3cls.yaml | Inference Model/Training Model |
| RT-DETR-H_layout_3cls | 95.8 | 114.80 / 25.65 | 924.38 / 924.38 | 470.1 | RT-DETR-H_layout_3cls.yaml | Inference Model/Training Model |
5-class English document layout detection model, including text, title, table, image, and list
| Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x | 97.8 | 9.62 / 6.75 | 26.96 / 12.77 | 7.4 | PicoDet_layout_1x.yaml | Inference Model/Training Model |
17-class layout detection model, including 17 common layout categories: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, and seal
| Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_17cls | 87.4 | 8.80 / 3.62 | 17.51 / 6.35 | 4.8 | PicoDet-S_layout_17cls.yaml | Inference Model/Training Model |
| PicoDet-L_layout_17cls | 89.0 | 12.60 / 10.27 | 43.70 / 24.42 | 22.6 | PicoDet-L_layout_17cls.yaml | Inference Model/Training Model |
| RT-DETR-H_layout_17cls | 98.3 | 115.29 / 101.18 | 964.75 / 964.75 | 470.2 | RT-DETR-H_layout_17cls.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 892 images of common document types such as Chinese and English papers, magazines, and research reports.
| Model | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | 99.06 | 2.62 / 0.59 | 3.24 / 1.19 | 7 | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |
| Model | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x0_25_textline_ori | 99.06 | 2.16 / 0.41 | 2.37 / 0.73 | 7 | PP-LCNet_x0_25_textline_ori.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0_textline_ori | 99.42 | - / - | 2.98 / 2.98 | 7 | PP-LCNet_x1_0_textline_ori.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is a self-built dataset covering multiple scenarios such as certificates and documents, with 1,000 images.
| Model Name | mse | mae | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| DLinear | 0.382 | 0.394 | 0.34 / 0.12 | 0.64 / 0.06 | 0.072 | DLinear.yaml | Inference Model/Training Model |
| NLinear | 0.386 | 0.392 | 0.27 / 0.10 | 0.49 / 0.08 | 0.04 | NLinear.yaml | Inference Model/Training Model |
| Nonstationary | 0.600 | 0.515 | 3.92 / 2.59 | 18.09 / 13.36 | 55.5 | Nonstationary.yaml | Inference Model/Training Model |
| PatchTST | 0.379 | 0.391 | 1.81 / 0.45 | 5.79 / 0.77 | 2.0 | PatchTST.yaml | Inference Model/Training Model |
| RLinear | 0.385 | 0.392 | 0.39 / 0.18 | 0.82 / 0.08 | 0.04 | RLinear.yaml | Inference Model/Training Model |
| TiDE | 0.407 | 0.414 | - / - | 4.54 / 1.09 | 31.7 | TiDE.yaml | Inference Model/Training Model |
| TimesNet | 0.416 | 0.429 | 15.19 / 13.77 | 23.14 / 12.42 | 4.9 | TimesNet.yaml | Inference Model/Training Model |
| Model Name | Precision | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| AutoEncoder_ad | 99.36 | 0.24 / 0.13 | 0.41 / 0.05 | 0.052 | AutoEncoder_ad.yaml | Inference Model/Training Model |
| DLinear_ad | 98.98 | 0.39 / 0.16 | 0.69 / 0.08 | 0.112 | DLinear_ad.yaml | Inference Model/Training Model |
| Nonstationary_ad | 98.55 | 1.94 / 1.16 | 5.31 / 1.66 | 1.8 | Nonstationary_ad.yaml | Inference Model/Training Model |
| PatchTST_ad | 98.78 | 2.10 / 0.55 | 6.98 / 0.63 | 0.32 | PatchTST_ad.yaml | Inference Model/Training Model |
| Model Name | acc(%) | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|
| TimesNet_cls | 87.5 | 0.792 | TimesNet_cls.yaml | Inference Model/Training Model |
| Model | Training Data | Model Storage Size (MB) | Word Error Rate | YAML File | Model Download Link |
|---|---|---|---|---|---|
| whisper_large | 680kh | 5800 | 2.7 (Librispeech) | whisper_large.yaml | Inference Model |
| whisper_medium | 680kh | 2900 | - | whisper_medium.yaml | Inference Model |
| whisper_small | 680kh | 923 | - | whisper_small.yaml | Inference Model |
| whisper_base | 680kh | 277 | - | whisper_base.yaml | Inference Model |
| whisper_tiny | 680kh | 145 | - | whisper_tiny.yaml | Inference Model |
| Model | Top1 Acc(%) | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|
| PP-TSM-R50_8frames_uniform | 74.36 | 93.4 | PP-TSM-R50_8frames_uniform.yaml | Inference Model/Training Model |
| PP-TSMv2-LCNetV2_8frames_uniform | 71.71 | 22.5 | PP-TSMv2-LCNetV2_8frames_uniform.yaml | Inference Model/Training Model |
| PP-TSMv2-LCNetV2_16frames_uniform | 73.11 | 22.5 | PP-TSMv2-LCNetV2_16frames_uniform.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the K400 validation set Top1 Acc.
## [Video Detection Module](../module_usage/tutorials/video_modules/video_detection.en.md)

| Model | Frame-mAP(@ IoU 0.5) | Model Storage Size (MB) | yaml File | Model Download Link |
|---|---|---|---|---|
| YOWO | 80.94 | 462.891 | YOWO.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the test dataset UCF101-24, using the Frame-mAP (@ IoU 0.5) metric.
## [Document Vision-Language Model Module](../module_usage/tutorials/vlm_modules/doc_vlm.en.md)

| Model | Model Parameter Size (B) | Model Storage Size (GB) | yaml File | Model Download Link |
|---|---|---|---|---|
| PP-DocBee-2B | 2 | 4.2 | PP-DocBee-2B.yaml | Inference Model |
| PP-DocBee-7B | 7 | 15.8 | PP-DocBee-7B.yaml | Inference Model |
| PP-DocBee2-3B | 3 | 7.6 | PP-DocBee2-3B.yaml | Inference Model |
| Model | Model Parameter Size (B) | Model Storage Size (GB) | yaml File | Model Download Link |
|---|---|---|---|---|
| PP-Chart2Table | 0.58 | 1.4 | PP-Chart2Table.yaml | Inference Model |
Test Environment Description:
Performance Test Environment
Inference Mode Description
| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
|---|---|---|---|
| Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference |
| High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 Precision / 8 Threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |
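The high-performance mode in the tables corresponds to PaddleX's high-performance inference plugin. Below is a hedged sketch of switching between the two modes through the pipeline API; it assumes the plugin is installed, and the pipeline name and input path are placeholders.

```python
# Sketch: normal vs. high-performance inference for a PaddleX pipeline.
# Assumes the high-performance inference plugin is installed; the pipeline
# name and input path are placeholders.
from paddlex import create_pipeline

# Normal mode: plain Paddle Inference (FP32, no TensorRT).
pipeline = create_pipeline(pipeline="image_classification", device="gpu:0")

# High-performance mode: PaddleX selects the optimal backend and precision.
pipeline_hp = create_pipeline(
    pipeline="image_classification",
    device="gpu:0",
    use_hpip=True,  # requires the high-performance inference plugin
)

for res in pipeline_hp.predict("demo_image.jpg"):
    res.print()
```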