---
comments: true
---

# PaddleX Model List (CPU/GPU)

PaddleX includes multiple production lines, each containing several modules, and each module includes several models. You can choose which models to use based on the benchmark data below: pick a higher-accuracy model if accuracy matters most, a faster model if inference speed matters most, or a smaller model if storage size matters most.

## [Image Classification Module](../module_usage/tutorials/cv_modules/image_classification.en.md)
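
All of the models listed on this page share the same high-level inference interface. As a quick orientation before the tables, here is a minimal sketch, assuming a PaddleX 3.x installation and its `create_model` API; the model name is taken from the table below, and the image path is a placeholder to replace with your own file.

```python
# Minimal single-model inference sketch (assumes PaddleX 3.x is installed;
# the image path below is a placeholder).
from paddlex import create_model

# Any model name from the tables on this page can be passed here.
model = create_model("PP-LCNet_x1_0")

# predict() yields one result object per input image.
for res in model.predict("path/to/your_image.jpg", batch_size=1):
    res.print()                            # print the predicted label and score
    res.save_to_img("./output/")           # save a visualization of the prediction
    res.save_to_json("./output/res.json")  # save the raw result as JSON
```

The yaml File column in each table names the configuration file shipped with the PaddleX repository for training and evaluating that model.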

| Model Name | Top1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_224 | 85.36 | 13.1957 | 285.493 | 306.5 M | CLIP_vit_base_patch16_224.yaml | Inference Model/Training Model |
| CLIP_vit_large_patch14_224 | 88.1 | 51.1284 | 1131.28 | 1.04 G | CLIP_vit_large_patch14_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_224 | 83.84 | 12.8473 | 1513.87 | 313.9 M | ConvNeXt_base_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_384 | 84.90 | 31.7607 | 3967.05 | 313.9 M | ConvNeXt_base_384.yaml | Inference Model/Training Model |
| ConvNeXt_large_224 | 84.26 | 26.8103 | 2463.56 | 700.7 M | ConvNeXt_large_224.yaml | Inference Model/Training Model |
| ConvNeXt_large_384 | 85.27 | 66.4058 | 6598.92 | 700.7 M | ConvNeXt_large_384.yaml | Inference Model/Training Model |
| ConvNeXt_small | 83.13 | 9.74075 | 1127.6 | 178.0 M | ConvNeXt_small.yaml | Inference Model/Training Model |
| ConvNeXt_tiny | 82.03 | 5.48923 | 672.559 | 101.4 M | ConvNeXt_tiny.yaml | Inference Model/Training Model |
| FasterNet-L | 83.5 | 23.4415 | - | 357.1 M | FasterNet-L.yaml | Inference Model/Training Model |
| FasterNet-M | 83.0 | 21.8936 | - | 204.6 M | FasterNet-M.yaml | Inference Model/Training Model |
| FasterNet-S | 81.3 | 13.0409 | - | 119.3 M | FasterNet-S.yaml | Inference Model/Training Model |
| FasterNet-T0 | 71.9 | 12.2432 | - | 15.1 M | FasterNet-T0.yaml | Inference Model/Training Model |
| FasterNet-T1 | 75.9 | 11.3562 | - | 29.2 M | FasterNet-T1.yaml | Inference Model/Training Model |
| FasterNet-T2 | 79.1 | 10.703 | - | 57.4 M | FasterNet-T2.yaml | Inference Model/Training Model |
| MobileNetV1_x0_5 | 63.5 | 1.86754 | 7.48297 | 4.8 M | MobileNetV1_x0_5.yaml | Inference Model/Training Model |
| MobileNetV1_x0_25 | 51.4 | 1.83478 | 4.83674 | 1.8 M | MobileNetV1_x0_25.yaml | Inference Model/Training Model |
| MobileNetV1_x0_75 | 68.8 | 2.57903 | 10.6343 | 9.3 M | MobileNetV1_x0_75.yaml | Inference Model/Training Model |
| MobileNetV1_x1_0 | 71.0 | 2.78781 | 13.98 | 15.2 M | MobileNetV1_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x0_5 | 65.0 | 4.94234 | 11.1629 | 7.1 M | MobileNetV2_x0_5.yaml | Inference Model/Training Model |
| MobileNetV2_x0_25 | 53.2 | 4.50856 | 9.40991 | 5.5 M | MobileNetV2_x0_25.yaml | Inference Model/Training Model |
| MobileNetV2_x1_0 | 72.2 | 6.12159 | 16.0442 | 12.6 M | MobileNetV2_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x1_5 | 74.1 | 6.28385 | 22.5129 | 25.0 M | MobileNetV2_x1_5.yaml | Inference Model/Training Model |
| MobileNetV2_x2_0 | 75.2 | 6.12888 | 30.8612 | 41.2 M | MobileNetV2_x2_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_5 | 69.2 | 6.31302 | 14.5588 | 9.6 M | MobileNetV3_large_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_35 | 64.3 | 5.76207 | 13.9041 | 7.5 M | MobileNetV3_large_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_75 | 73.1 | 8.41737 | 16.9506 | 14.0 M | MobileNetV3_large_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_0 | 75.3 | 8.64112 | 19.1614 | 19.5 M | MobileNetV3_large_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_25 | 76.4 | 8.73358 | 22.1296 | 26.5 M | MobileNetV3_large_x1_25.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_5 | 59.2 | 5.16721 | 11.2688 | 6.8 M | MobileNetV3_small_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_35 | 53.0 | 5.22053 | 11.0055 | 6.0 M | MobileNetV3_small_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_75 | 66.0 | 5.39831 | 12.8313 | 8.5 M | MobileNetV3_small_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_0 | 68.2 | 6.00993 | 12.9598 | 10.5 M | MobileNetV3_small_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_25 | 70.7 | 6.9589 | 14.3995 | 13.0 M | MobileNetV3_small_x1_25.yaml | Inference Model/Training Model |
| MobileNetV4_conv_large | 83.4 | 12.5485 | 51.6453 | 125.2 M | MobileNetV4_conv_large.yaml | Inference Model/Training Model |
| MobileNetV4_conv_medium | 79.9 | 9.65509 | 26.6157 | 37.6 M | MobileNetV4_conv_medium.yaml | Inference Model/Training Model |
| MobileNetV4_conv_small | 74.6 | 5.24172 | 11.0893 | 14.7 M | MobileNetV4_conv_small.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_large | 83.8 | 20.0726 | 213.769 | 145.1 M | MobileNetV4_hybrid_large.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_medium | 80.5 | 19.7543 | 62.2624 | 42.9 M | MobileNetV4_hybrid_medium.yaml | Inference Model/Training Model |
| PP-HGNet_base | 85.0 | 14.2969 | 327.114 | 249.4 M | PP-HGNet_base.yaml | Inference Model/Training Model |
| PP-HGNet_small | 81.51 | 5.50661 | 119.041 | 86.5 M | PP-HGNet_small.yaml | Inference Model/Training Model |
| PP-HGNet_tiny | 79.83 | 5.22006 | 69.396 | 52.4 M | PP-HGNet_tiny.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0 | 77.77 | 6.53694 | 23.352 | 21.4 M | PP-HGNetV2-B0.yaml | Inference Model/Training Model |
| PP-HGNetV2-B1 | 79.18 | 6.56034 | 27.3099 | 22.6 M | PP-HGNetV2-B1.yaml | Inference Model/Training Model |
| PP-HGNetV2-B2 | 81.74 | 9.60494 | 43.1219 | 39.9 M | PP-HGNetV2-B2.yaml | Inference Model/Training Model |
| PP-HGNetV2-B3 | 82.98 | 11.0042 | 55.1367 | 57.9 M | PP-HGNetV2-B3.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4 | 83.57 | 9.66407 | 54.2462 | 70.4 M | PP-HGNetV2-B4.yaml | Inference Model/Training Model |
| PP-HGNetV2-B5 | 84.75 | 15.7091 | 115.926 | 140.8 M | PP-HGNetV2-B5.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6 | 86.30 | 21.226 | 255.279 | 268.4 M | PP-HGNetV2-B6.yaml | Inference Model/Training Model |
| PP-LCNet_x0_5 | 63.14 | 3.67722 | 6.66857 | 6.7 M | PP-LCNet_x0_5.yaml | Inference Model/Training Model |
| PP-LCNet_x0_25 | 51.86 | 2.65341 | 5.81357 | 5.5 M | PP-LCNet_x0_25.yaml | Inference Model/Training Model |
| PP-LCNet_x0_35 | 58.09 | 2.7212 | 6.28944 | 5.9 M | PP-LCNet_x0_35.yaml | Inference Model/Training Model |
| PP-LCNet_x0_75 | 68.18 | 3.91032 | 8.06953 | 8.4 M | PP-LCNet_x0_75.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0 | 71.32 | 3.84845 | 9.23735 | 10.5 M | PP-LCNet_x1_0.yaml | Inference Model/Training Model |
| PP-LCNet_x1_5 | 73.71 | 3.97666 | 12.3457 | 16.0 M | PP-LCNet_x1_5.yaml | Inference Model/Training Model |
| PP-LCNet_x2_0 | 75.18 | 4.07556 | 16.2752 | 23.2 M | PP-LCNet_x2_0.yaml | Inference Model/Training Model |
| PP-LCNet_x2_5 | 76.60 | 4.06028 | 21.5063 | 32.1 M | PP-LCNet_x2_5.yaml | Inference Model/Training Model |
| PP-LCNetV2_base | 77.05 | 5.23428 | 19.6005 | 23.7 M | PP-LCNetV2_base.yaml | Inference Model/Training Model |
| PP-LCNetV2_large | 78.51 | 6.78335 | 30.4378 | 37.3 M | PP-LCNetV2_large.yaml | Inference Model/Training Model |
| PP-LCNetV2_small | 73.97 | 3.89762 | 13.0273 | 14.6 M | PP-LCNetV2_small.yaml | Inference Model/Training Model |
| ResNet18_vd | 72.3 | 3.53048 | 31.3014 | 41.5 M | ResNet18_vd.yaml | Inference Model/Training Model |
| ResNet18 | 71.0 | 2.4868 | 27.4601 | 41.5 M | ResNet18.yaml | Inference Model/Training Model |
| ResNet34_vd | 76.0 | 5.60675 | 56.0653 | 77.3 M | ResNet34_vd.yaml | Inference Model/Training Model |
| ResNet34 | 74.6 | 4.16902 | 51.925 | 77.3 M | ResNet34.yaml | Inference Model/Training Model |
| ResNet50_vd | 79.1 | 10.1885 | 68.446 | 90.8 M | ResNet50_vd.yaml | Inference Model/Training Model |
| ResNet50 | 76.5 | 9.62383 | 64.8135 | 90.8 M | ResNet50.yaml | Inference Model/Training Model |
| ResNet101_vd | 80.2 | 20.0563 | 124.85 | 158.4 M | ResNet101_vd.yaml | Inference Model/Training Model |
| ResNet101 | 77.6 | 19.2297 | 121.006 | 158.7 M | ResNet101.yaml | Inference Model/Training Model |
| ResNet152_vd | 80.6 | 29.6439 | 181.678 | 214.3 M | ResNet152_vd.yaml | Inference Model/Training Model |
| ResNet152 | 78.3 | 30.0461 | 177.707 | 214.2 M | ResNet152.yaml | Inference Model/Training Model |
| ResNet200_vd | 80.9 | 39.1628 | 235.185 | 266.0 M | ResNet200_vd.yaml | Inference Model/Training Model |
| StarNet-S1 | 73.6 | 9.895 | 23.0465 | 11.2 M | StarNet-S1.yaml | Inference Model/Training Model |
| StarNet-S2 | 74.8 | 7.91279 | 21.9571 | 14.3 M | StarNet-S2.yaml | Inference Model/Training Model |
| StarNet-S3 | 77.0 | 10.7531 | 30.7656 | 22.2 M | StarNet-S3.yaml | Inference Model/Training Model |
| StarNet-S4 | 79.0 | 15.2868 | 43.2497 | 28.9 M | StarNet-S4.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window7_224 | 83.37 | 16.9848 | 383.83 | 310.5 M | SwinTransformer_base_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window12_384 | 84.17 | 37.2855 | 1178.63 | 311.4 M | SwinTransformer_base_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window7_224 | 86.19 | 27.5498 | 689.729 | 694.8 M | SwinTransformer_large_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window12_384 | 87.06 | 74.1768 | 2105.22 | 696.1 M | SwinTransformer_large_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_small_patch4_window7_224 | 83.21 | 16.3982 | 285.56 | 175.6 M | SwinTransformer_small_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_tiny_patch4_window7_224 | 81.10 | 8.54846 | 156.306 | 100.1 M | SwinTransformer_tiny_patch4_window7_224.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [ImageNet-1k](https://www.image-net.org/index.php) validation set Top1 Acc.

## [Image Multi-label Classification Module](../module_usage/tutorials/cv_modules/image_multilabel_classification.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_448_ML | 89.15 | - | - | 325.6 M | CLIP_vit_base_patch16_448_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0_ML | 80.98 | - | - | 39.6 M | PP-HGNetV2-B0_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4_ML | 87.96 | - | - | 88.5 M | PP-HGNetV2-B4_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6_ML | 91.06 | - | - | 286.5 M | PP-HGNetV2-B6_ML.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0_ML | 77.96 | - | - | 29.4 M | PP-LCNet_x1_0_ML.yaml | Inference Model/Training Model |
| ResNet50_ML | 83.42 | - | - | 108.9 M | ResNet50_ML.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are for the multi-label classification task mAP on [COCO2017](https://cocodataset.org/#home).

## [Pedestrian Attribute Module](../module_usage/tutorials/cv_modules/pedestrian_attribute_recognition.en.md)

| Model Name | mA (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_pedestrian_attribute | 92.2 | 3.84845 | 9.23735 | 6.7 M | PP-LCNet_x1_0_pedestrian_attribute.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are the mA on PaddleX's internal dataset.

## [Vehicle Attribute Module](../module_usage/tutorials/cv_modules/vehicle_attribute_recognition.en.md)

| Model Name | mA (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_vehicle_attribute | 91.7 | 3.84845 | 9.23735 | 6.7 M | PP-LCNet_x1_0_vehicle_attribute.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the VeRi dataset mA.

## [Image Feature Module](../module_usage/tutorials/cv_modules/image_feature.en.md)

| Model Name | recall@1 (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_rec | 84.2 | 5.23428 | 19.6005 | 16.3 M | PP-ShiTuV2_rec.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 13.1957 | 285.493 | 306.6 M | PP-ShiTuV2_rec_CLIP_vit_base.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 51.1284 | 1131.28 | 1.05 G | PP-ShiTuV2_rec_CLIP_vit_large.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the AliProducts recall@1.

## [Document Orientation Classification Module](../module_usage/tutorials/ocr_modules/doc_img_orientation_classification.en.md)

| Model Name | Top-1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | 99.06 | 3.84845 | 9.23735 | 7 M | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are the Top-1 Acc on PaddleX's internal dataset.

## [Face Feature Module](../module_usage/tutorials/cv_modules/face_feature.en.md)

| Model Name | Output Feature Dimension | Acc (%) (AgeDB-30/CFP-FP/LFW) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| MobileFaceNet | 128 | 96.28/96.71/99.58 | 5.7 | 101.6 | 4.1 | MobileFaceNet.yaml | Inference Model/Training Model |
| ResNet50_face | 512 | 98.12/98.56/99.77 | 8.7 | 200.7 | 87.2 | ResNet50_face.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured on the AgeDB-30, CFP-FP, and LFW datasets.

## [Main Body Detection Module](../module_usage/tutorials/cv_modules/mainbody_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_det | 41.5 | 33.7 | 537.0 | 27.54 | PP-ShiTuV2_det.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [PaddleClas Main Body Detection Dataset](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md) mAP(0.5:0.95).

## [Object Detection Module](../module_usage/tutorials/cv_modules/object_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Cascade-FasterRCNN-ResNet50-FPN | 41.1 | - | - | 245.4 M | Cascade-FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN | 45.0 | - | - | 246.2 M | Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| CenterNet-DLA-34 | 37.6 | - | - | 75.4 M | CenterNet-DLA-34.yaml | Inference Model/Training Model |
| CenterNet-ResNet50 | 38.9 | - | - | 319.7 M | CenterNet-ResNet50.yaml | Inference Model/Training Model |
| DETR-R50 | 42.3 | 59.2132 | 5334.52 | 159.3 M | DETR-R50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet34-FPN | 37.8 | - | - | 137.5 M | FasterRCNN-ResNet34-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-FPN | 38.4 | - | - | 148.1 M | FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-FPN | 39.5 | - | - | 148.1 M | FasterRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-SSLDv2-FPN | 41.4 | - | - | 148.1 M | FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50 | 36.7 | - | - | 120.2 M | FasterRCNN-ResNet50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101-FPN | 41.4 | - | - | 216.3 M | FasterRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101 | 39.0 | - | - | 188.1 M | FasterRCNN-ResNet101.yaml | Inference Model/Training Model |
| FasterRCNN-ResNeXt101-vd-FPN | 43.4 | - | - | 360.6 M | FasterRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-Swin-Tiny-FPN | 42.6 | - | - | 159.8 M | FasterRCNN-Swin-Tiny-FPN.yaml | Inference Model/Training Model |
| FCOS-ResNet50 | 39.6 | 103.367 | 3424.91 | 124.2 M | FCOS-ResNet50.yaml | Inference Model/Training Model |
| PicoDet-L | 42.6 | 16.6715 | 169.904 | 20.9 M | PicoDet-L.yaml | Inference Model/Training Model |
| PicoDet-M | 37.5 | 16.2311 | 71.7257 | 16.8 M | PicoDet-M.yaml | Inference Model/Training Model |
| PicoDet-S | 29.1 | 14.097 | 37.6563 | 4.4 M | PicoDet-S.yaml | Inference Model/Training Model |
| PicoDet-XS | 26.2 | 13.8102 | 48.3139 | 5.7 M | PicoDet-XS.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-L | 52.9 | 33.5644 | 814.825 | 185.3 M | PP-YOLOE_plus-L.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-M | 49.8 | 19.843 | 449.261 | 83.2 M | PP-YOLOE_plus-M.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S | 43.7 | 16.8884 | 223.059 | 28.3 M | PP-YOLOE_plus-S.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-X | 54.7 | 57.8995 | 1439.93 | 349.4 M | PP-YOLOE_plus-X.yaml | Inference Model/Training Model |
| RT-DETR-H | 56.3 | 114.814 | 3933.39 | 435.8 M | RT-DETR-H.yaml | Inference Model/Training Model |
| RT-DETR-L | 53.0 | 34.5252 | 1454.27 | 113.7 M | RT-DETR-L.yaml | Inference Model/Training Model |
| RT-DETR-R18 | 46.5 | 19.89 | 784.824 | 70.7 M | RT-DETR-R18.yaml | Inference Model/Training Model |
| RT-DETR-R50 | 53.1 | 41.9327 | 1625.95 | 149.1 M | RT-DETR-R50.yaml | Inference Model/Training Model |
| RT-DETR-X | 54.8 | 61.8042 | 2246.64 | 232.9 M | RT-DETR-X.yaml | Inference Model/Training Model |
| YOLOv3-DarkNet53 | 39.1 | 40.1055 | 883.041 | 219.7 M | YOLOv3-DarkNet53.yaml | Inference Model/Training Model |
| YOLOv3-MobileNetV3 | 31.4 | 18.6692 | 267.214 | 83.8 M | YOLOv3-MobileNetV3.yaml | Inference Model/Training Model |
| YOLOv3-ResNet50_vd_DCN | 40.6 | 31.6276 | 856.047 | 163.0 M | YOLOv3-ResNet50_vd_DCN.yaml | Inference Model/Training Model |
| YOLOX-L | 50.1 | 185.691 | 1250.58 | 192.5 M | YOLOX-L.yaml | Inference Model/Training Model |
| YOLOX-M | 46.9 | 123.324 | 688.071 | 90.0 M | YOLOX-M.yaml | Inference Model/Training Model |
| YOLOX-N | 26.1 | 79.1665 | 155.59 | 3.4 M | YOLOX-N.yaml | Inference Model/Training Model |
| YOLOX-S | 40.4 | 184.828 | 474.446 | 32.0 M | YOLOX-S.yaml | Inference Model/Training Model |
| YOLOX-T | 32.9 | 102.748 | 212.52 | 18.1 M | YOLOX-T.yaml | Inference Model/Training Model |
| YOLOX-X | 51.8 | 227.361 | 2067.84 | 351.5 M | YOLOX-X.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the COCO2017 validation set mAP(0.5:0.95).

## [Small Object Detection Module](../module_usage/tutorials/cv_modules/small_object_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE_plus_SOD-S | 25.1 | 65.4608 | 324.37 | 77.3 M | PP-YOLOE_plus_SOD-S.yaml | Inference Model/Training Model |
| PP-YOLOE_plus_SOD-L | 31.9 | 57.1448 | 1006.98 | 325.0 M | PP-YOLOE_plus_SOD-L.yaml | Inference Model/Training Model |
| PP-YOLOE_plus_SOD-largesize-L | 42.7 | 458.521 | 11172.7 | 340.5 M | PP-YOLOE_plus_SOD-largesize-L.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of [VisDrone-DET](https://github.com/VisDrone/VisDrone-Dataset).

## [Open-Vocabulary Object Detection](../module_usage/tutorials/cv_modules/open_vocabulary_detection.en.md)

| Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Model Download Link |
|---|---|---|---|---|---|---|
| GroundingDINO-T | 49.4 | 64.4 | 253.72 | 1807.4 | 658.3 | Inference Model |

Note: The above accuracy metrics are based on the COCO val2017 validation set mAP(0.5:0.95). All models' GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Open Vocabulary Segmentation](../module_usage/tutorials/cv_modules/open_vocabulary_segmentation.en.md)

| Model | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Model Download Link |
|---|---|---|---|---|
| SAM-H_box | 144.9 | 33920.7 | 2433.7 | Inference Model |
| SAM-H_point | 144.9 | 33920.7 | 2433.7 | Inference Model |

Note: All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Rotated Object Detection](../module_usage/tutorials/cv_modules/rotated_object_detection.en.md)

| Model | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-R-L | 78.14 | 20.7039 | 157.942 | 211.0 M | PP-YOLOE-R.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the DOTA validation set mAP(0.5:0.95). All model GPU inference times are based on an NVIDIA RTX 2080 Ti with FP16 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Pedestrian Detection Module](../module_usage/tutorials/cv_modules/human_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_human | 48.0 | 32.7754 | 777.691 | 196.1 M | PP-YOLOE-L_human.yaml | Inference Model/Training Model |
| PP-YOLOE-S_human | 42.5 | 15.0118 | 179.317 | 28.8 M | PP-YOLOE-S_human.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of [CrowdHuman](https://bj.bcebos.com/v1/paddledet/data/crowdhuman.zip).

## [Vehicle Detection Module](../module_usage/tutorials/cv_modules/vehicle_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_vehicle | 63.9 | 32.5619 | 775.633 | 196.1 M | PP-YOLOE-L_vehicle.yaml | Inference Model/Training Model |
| PP-YOLOE-S_vehicle | 61.3 | 15.3787 | 178.441 | 28.8 M | PP-YOLOE-S_vehicle.yaml | Inference Model/Training Model |

Note: The above precision metrics are based on the validation set mAP(0.5:0.95) of [PPVehicle](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/modules/ppvehicle).

## [Face Detection Module](../module_usage/tutorials/cv_modules/face_detection.en.md)

| Model Name | AP (%) (Easy/Medium/Hard) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| BlazeFace | 77.7/73.4/49.5 | 49.9 | 68.2 | 0.447 M | BlazeFace.yaml | Inference Model/Training Model |
| BlazeFace-FPN-SSH | 83.2/80.5/60.5 | 52.4 | 73.2 | 0.606 M | BlazeFace-FPN-SSH.yaml | Inference Model/Training Model |
| PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | 33.7 | 185.1 | 28.9 M | PicoDet_LCNet_x2_5_face.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | 25.8 | 159.9 | 26.5 M | PP-YOLOE_plus-S_face.yaml | Inference Model/Training Model |

**Note: The above precision metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640.**

## [Anomaly Detection Module](../module_usage/tutorials/cv_modules/anomaly_detection.en.md)

| Model Name | mIoU | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| STFPM | 0.9901 | - | - | 22.5 M | STFPM.yaml | Inference Model/Training Model |

Note: The above precision metrics are the average anomaly scores on the validation set of [MVTec AD](https://www.mvtec.com/company/research/datasets/mvtec-ad).

## [Human Keypoint Detection Module](../module_usage/tutorials/cv_modules/human_keypoint_detection.en.md)

| Model | Scheme | Input Size | AP(0.5:0.95) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|---|
| PP-TinyPose_128x96 | Top-Down | 128*96 | 58.4 | | | 4.9 | PP-TinyPose_128x96.yaml | Inference Model/Training Model |
| PP-TinyPose_256x192 | Top-Down | 256*192 | 68.3 | | | 4.9 | PP-TinyPose_256x192.yaml | Inference Model/Training Model |

**Note: The above accuracy metrics are based on the COCO dataset AP(0.5:0.95), with detection boxes obtained from ground truth annotations. All GPU inference times are based on an NVIDIA Tesla T4 machine, with precision type FP32. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and precision type FP32.**

## [3D Multi-modal Fusion Detection Module](../module_usage/tutorials/cv_modules/3d_bev_detection.en.md)

| Model | mAP (%) | NDS | yaml File | Model Download Link |
|---|---|---|---|---|
| BEVFusion | 53.9 | 60.9 | BEVFusion.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are the mAP(0.5:0.95) and NDS on the nuScenes validation set, and the precision type is FP32.

## [Semantic Segmentation Module](../module_usage/tutorials/cv_modules/semantic_segmentation.en.md)

| Model Name | mIoU (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Deeplabv3_Plus-R50 | 80.36 | 61.0531 | 1513.58 | 94.9 M | Deeplabv3_Plus-R50.yaml | Inference Model/Training Model |
| Deeplabv3_Plus-R101 | 81.10 | 100.026 | 2460.71 | 162.5 M | Deeplabv3_Plus-R101.yaml | Inference Model/Training Model |
| Deeplabv3-R50 | 79.90 | 82.2631 | 1735.83 | 138.3 M | Deeplabv3-R50.yaml | Inference Model/Training Model |
| Deeplabv3-R101 | 80.85 | 121.492 | 2685.51 | 205.9 M | Deeplabv3-R101.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W18 | 80.67 | 48.2335 | 906.385 | 43.1 M | OCRNet_HRNet-W18.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W48 | 82.15 | 78.9976 | 2226.95 | 249.8 M | OCRNet_HRNet-W48.yaml | Inference Model/Training Model |
| PP-LiteSeg-T | 73.10 | 7.6827 | 138.683 | 28.5 M | PP-LiteSeg-T.yaml | Inference Model/Training Model |
| PP-LiteSeg-B | 75.25 | 10.9935 | 194.727 | 47.0 M | PP-LiteSeg-B.yaml | Inference Model/Training Model |
| SegFormer-B0 (slice) | 76.73 | 11.1946 | 268.929 | 13.2 M | SegFormer-B0.yaml | Inference Model/Training Model |
| SegFormer-B1 (slice) | 78.35 | 17.9998 | 403.393 | 48.5 M | SegFormer-B1.yaml | Inference Model/Training Model |
| SegFormer-B2 (slice) | 81.60 | 48.0371 | 1248.52 | 96.9 M | SegFormer-B2.yaml | Inference Model/Training Model |
| SegFormer-B3 (slice) | 82.47 | 64.341 | 1666.35 | 167.3 M | SegFormer-B3.yaml | Inference Model/Training Model |
| SegFormer-B4 (slice) | 82.38 | 82.4336 | 1995.42 | 226.7 M | SegFormer-B4.yaml | Inference Model/Training Model |
| SegFormer-B5 (slice) | 82.58 | 97.3717 | 2420.19 | 229.7 M | SegFormer-B5.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [Cityscapes](https://www.cityscapes-dataset.com/) dataset mIoU.

| Model Name | mIoU (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SeaFormer_base (slice) | 40.92 | 24.4073 | 397.574 | 30.8 M | SeaFormer_base.yaml | Inference Model/Training Model |
| SeaFormer_large (slice) | 43.66 | 27.8123 | 550.464 | 49.8 M | SeaFormer_large.yaml | Inference Model/Training Model |
| SeaFormer_small (slice) | 38.73 | 19.2295 | 358.343 | 14.3 M | SeaFormer_small.yaml | Inference Model/Training Model |
| SeaFormer_tiny (slice) | 34.58 | 13.9496 | 330.132 | 6.1 M | SeaFormer_tiny.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [ADE20k](https://groups.csail.mit.edu/vision/datasets/ADE20K/) dataset. "Slice" indicates that the input images have been cropped.

## [Instance Segmentation Module](../module_usage/tutorials/cv_modules/instance_segmentation.en.md)

| Model Name | Mask AP | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Mask-RT-DETR-H | 50.6 | 132.693 | 4896.17 | 449.9 M | Mask-RT-DETR-H.yaml | Inference Model/Training Model |
| Mask-RT-DETR-L | 45.7 | 46.5059 | 2575.92 | 113.6 M | Mask-RT-DETR-L.yaml | Inference Model/Training Model |
| Mask-RT-DETR-M | 42.7 | 36.8329 | - | 66.6 M | Mask-RT-DETR-M.yaml | Inference Model/Training Model |
| Mask-RT-DETR-S | 41.0 | 33.5007 | - | 51.8 M | Mask-RT-DETR-S.yaml | Inference Model/Training Model |
| Mask-RT-DETR-X | 47.5 | 75.755 | 3358.04 | 237.5 M | Mask-RT-DETR-X.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-FPN | 36.3 | - | - | 254.8 M | Cascade-MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN | 39.1 | - | - | 254.7 M | Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-FPN | 35.6 | - | - | 157.5 M | MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-vd-FPN | 36.4 | - | - | 157.5 M | MaskRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50 | 32.8 | - | - | 127.8 M | MaskRCNN-ResNet50.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-FPN | 36.6 | - | - | 225.4 M | MaskRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-vd-FPN | 38.1 | - | - | 225.1 M | MaskRCNN-ResNet101-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNeXt101-vd-FPN | 39.5 | - | - | 370.0 M | MaskRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| PP-YOLOE_seg-S | 32.5 | - | - | 31.5 M | PP-YOLOE_seg-S.yaml | Inference Model/Training Model |
| SOLOv2 | 35.5 | - | - | 179.1 M | SOLOv2.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the Mask AP(0.5:0.95) on the [COCO2017](https://cocodataset.org/#home) validation set.

## [Text Detection Module](../module_usage/tutorials/ocr_modules/text_detection.en.md)
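
The detection models below use the same interface as the earlier sketch. As a hedged example (again assuming the PaddleX 3.x `create_model` API, with a placeholder input path):

```python
# Text detection sketch: the result contains detected text regions
# (quadrilateral boxes), not recognized text.
from paddlex import create_model

model = create_model("PP-OCRv4_mobile_det")
for res in model.predict("path/to/document.png", batch_size=1):
    res.print()                   # polygon coordinates of detected text
    res.save_to_img("./output/")  # input image with detected boxes drawn
```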

| Model | Detection Hmean (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_server_det | 82.56 | 83.3501 | 2434.01 | 109 | PP-OCRv4_server_det.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_det | 77.35 | 10.6923 | 120.177 | 4.7 | PP-OCRv4_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_det | 78.68 | | | 2.1 | PP-OCRv3_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_server_det | 80.11 | | | 102.1 | PP-OCRv3_server_det.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the self-built Chinese and English dataset of PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 593 images for text recognition. The GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision type, while the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.

## [Seal Text Detection Module](../module_usage/tutorials/ocr_modules/seal_text_detection.en.md)

| Model Name | Detection Hmean (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_mobile_seal_det | 96.47 | 10.5878 | 131.813 | 4.7 M | PP-OCRv4_mobile_seal_det.yaml | Inference Model/Training Model |
| PP-OCRv4_server_seal_det | 98.21 | 84.341 | 2425.06 | 108.3 M | PP-OCRv4_server_seal_det.yaml | Inference Model/Training Model |

Note: The evaluation set for the above precision metrics is the seal dataset built by PaddleX, which includes 500 seal images.

## [Text Recognition Module](../module_usage/tutorials/ocr_modules/text_recognition.en.md)

* Chinese Text Recognition Models

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_server_rec_doc | 81.53 | | | 74.7 M | PP-OCRv4_server_rec_doc.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_rec | 78.74 | 7.95018 | 46.7868 | 10.6 M | PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| PP-OCRv4_server_rec | 80.61 | 7.19439 | 140.179 | 71.2 M | PP-OCRv4_server_rec.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_rec | 72.96 | | | 9.2 M | PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 8367 images for text recognition. All models' GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision, while CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_SVTRv2_rec | 68.81 | 8.36801 | 165.706 | 73.9 M | ch_SVTRv2_rec.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard A. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_RepSVTR_rec | 65.07 | 10.5047 | 51.5647 | 22.1 M | ch_RepSVTR_rec.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard B. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

* English Recognition Models

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| en_PP-OCRv4_mobile_rec | 70.39 | | | 6.8 M | en_PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| en_PP-OCRv3_mobile_rec | 70.69 | | | 7.8 M | en_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |

* Multilingual Recognition Models

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| korean_PP-OCRv3_mobile_rec | 60.21 | | | 8.6 M | korean_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| japan_PP-OCRv3_mobile_rec | 45.69 | | | 8.8 M | japan_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| chinese_cht_PP-OCRv3_mobile_rec | 82.06 | | | 9.7 M | chinese_cht_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| te_PP-OCRv3_mobile_rec | 95.88 | | | 7.8 M | te_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ka_PP-OCRv3_mobile_rec | 96.96 | | | 8.0 M | ka_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ta_PP-OCRv3_mobile_rec | 76.83 | | | 8.0 M | ta_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| latin_PP-OCRv3_mobile_rec | 76.93 | | | 7.8 M | latin_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| arabic_PP-OCRv3_mobile_rec | 73.55 | | | 7.8 M | arabic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| cyrillic_PP-OCRv3_mobile_rec | 94.28 | | | 7.9 M | cyrillic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| devanagari_PP-OCRv3_mobile_rec | 96.44 | | | 7.9 M | devanagari_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is a multi-language dataset built by PaddleX. All model GPU inference times are based on NVIDIA Tesla T4 machines, with precision type FP32. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.

## [Formula Recognition Module](../module_usage/tutorials/ocr_modules/formula_recognition.en.md)

| Model | Avg-BLEU | GPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|
| UniMERNet | 0.8613 | 2266.96 | 1.4 G | UniMERNet.yaml | Inference Model/Training Model |
| PP-FormulaNet-S | 0.8712 | 202.25 | 167.9 M | PP-FormulaNet-S.yaml | Inference Model/Training Model |
| PP-FormulaNet-L | 0.9213 | 1976.52 | 535.2 M | PP-FormulaNet-L.yaml | Inference Model/Training Model |
| LaTeX_OCR_rec | 0.7163 | - | 89.7 M | LaTeX_OCR_rec.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the internal formula recognition test set of PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.

## [Table Structure Recognition Module](../module_usage/tutorials/ocr_modules/table_structure_recognition.en.md)

| Model | Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SLANet | 59.52 | 522.536 | 1845.37 | 6.9 M | SLANet.yaml | Inference Model/Training Model |
| SLANet_plus | 63.69 | 522.536 | 1845.37 | 6.9 M | SLANet_plus.yaml | Inference Model/Training Model |
| SLANeXt_wired | 69.65 | - | - | - | SLANeXt_wired.yaml | Inference Model/Training Model |
| SLANeXt_wireless | | | | | SLANeXt_wireless.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the high-difficulty Chinese table recognition dataset built internally by PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision type. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.

## [Table Cell Detection Module](../module_usage/tutorials/ocr_modules/table_cells_detection.en.md)

| Model | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| RT-DETR-L_wired_table_cell_det | - | - | - | - | RT-DETR-L_wired_table_cell_det.yaml | Inference Model/Training Model |
| RT-DETR-L_wireless_table_cell_det | | | | | RT-DETR-L_wireless_table_cell_det.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the internal table cell detection dataset of PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine, with precision type FP32. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.

## [Table Classification Module](../module_usage/tutorials/ocr_modules/table_classification.en.md)

| Model | Top1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_table_cls | - | - | - | - | PP-LCNet_x1_0_table_cls.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the internal table classification dataset built by PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Text Image Unwarping Module](../module_usage/tutorials/ocr_modules/text_image_unwarping.en.md)

| Model Name | MS-SSIM (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| UVDoc | 54.40 | - | - | 30.3 M | UVDoc.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the image unwarping dataset built by PaddleX.

## [Layout Detection Module](../module_usage/tutorials/ocr_modules/layout_detection.en.md)

* Table Layout Detection Model

| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x_table | 97.5 | 12.623 | 90.8934 | 7.4 M | PicoDet_layout_1x_table.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is the layout table area detection dataset built by PaddleOCR, which contains 7835 images of document types with tables in both Chinese and English. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

* 3-class layout detection model, including tables, images, and seals

| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_3cls | 88.2 | 13.5 | 45.8 | 4.8 | PicoDet-S_layout_3cls.yaml | Inference Model/Training Model |
| PicoDet-L_layout_3cls | 89.0 | 15.7 | 159.8 | 22.6 | PicoDet-L_layout_3cls.yaml | Inference Model/Training Model |
| RT-DETR-H_layout_3cls | 95.8 | 114.6 | 3832.6 | 470.1 | RT-DETR-H_layout_3cls.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 1,154 common types of document images such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

* 5-class English document layout detection model, including text, title, table, image, and list

| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x | 97.8 | 13.0 | 91.3 | 7.4 | PicoDet_layout_1x.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) evaluation dataset, which contains 11,245 images of English documents. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

* 17-class layout detection model, including 17 common layout categories: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, and seal

| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_17cls | 87.4 | 13.6 | 46.2 | 4.8 | PicoDet-S_layout_17cls.yaml | Inference Model/Training Model |
| PicoDet-L_layout_17cls | 89.0 | 17.2 | 160.2 | 22.6 | PicoDet-L_layout_17cls.yaml | Inference Model/Training Model |
| RT-DETR-H_layout_17cls | 98.3 | 115.1 | 3827.2 | 470.2 | RT-DETR-H_layout_17cls.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 892 images of common document types such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Document Image Orientation Classification Module](../module_usage/tutorials/ocr_modules/doc_img_orientation_classification.en.md)

| Model | Top-1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | 99.06 | 3.84845 | 9.23735 | 7 | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is a self-built dataset covering multiple scenarios such as documents and certificates, with 1000 images. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Time Series Forecasting Module](../module_usage/tutorials/time_series_modules/time_series_forecasting.en.md)
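
The time series modules reuse the same `create_model`/`predict` interface, but take CSV files as input. A sketch under that assumption (paths are placeholders, and `save_to_csv` is assumed to be the result-saving helper for time series outputs):

```python
# Time series forecasting sketch: the input is a CSV with a time column
# and target column(s), as expected by the PaddleX time series modules.
from paddlex import create_model

model = create_model("DLinear")
for res in model.predict("path/to/series.csv", batch_size=1):
    res.print()                   # forecasted future values
    res.save_to_csv("./output/")  # assumed helper: write the forecast to CSV
```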

| Model Name | MSE | MAE | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|
| DLinear | 0.382 | 0.394 | 72 K | DLinear.yaml | Inference Model/Training Model |
| NLinear | 0.386 | 0.392 | 40 K | NLinear.yaml | Inference Model/Training Model |
| Nonstationary | 0.600 | 0.515 | 55.5 M | Nonstationary.yaml | Inference Model/Training Model |
| PatchTST | 0.379 | 0.391 | 2.0 M | PatchTST.yaml | Inference Model/Training Model |
| RLinear | 0.385 | 0.392 | 40 K | RLinear.yaml | Inference Model/Training Model |
| TiDE | 0.407 | 0.414 | 31.7 M | TiDE.yaml | Inference Model/Training Model |
| TimesNet | 0.416 | 0.429 | 4.9 M | TimesNet.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the [ETTH1](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/Etth1.tar) dataset (evaluation results on the test.csv test set).

## [Time Series Anomaly Detection Module](../module_usage/tutorials/time_series_modules/time_series_anomaly_detection.en.md)

| Model Name | Precision | Recall | F1 Score | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| AutoEncoder_ad | 99.36 | 84.36 | 91.25 | 52 K | AutoEncoder_ad.yaml | Inference Model/Training Model |
| DLinear_ad | 98.98 | 93.96 | 96.41 | 112 K | DLinear_ad.yaml | Inference Model/Training Model |
| Nonstationary_ad | 98.55 | 88.95 | 93.51 | 1.8 M | Nonstationary_ad.yaml | Inference Model/Training Model |
| PatchTST_ad | 98.78 | 90.70 | 94.57 | 320 K | PatchTST_ad.yaml | Inference Model/Training Model |

Note: The above precision metrics are measured from the [PSM](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/ts_anomaly_examples.tar) dataset.

## [Time Series Classification Module](../module_usage/tutorials/time_series_modules/time_series_classification.en.md)

| Model Name | Acc (%) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|
| TimesNet_cls | 87.5 | 792 K | TimesNet_cls.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the [UWaveGestureLibrary](https://paddlets.bj.bcebos.com/classification/UWaveGestureLibrary_TEST.csv) dataset.

> Note: The GPU inference time for all models above is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Multilingual Speech Recognition Module](../module_usage/tutorials/speech_modules/multilingual_speech_recognition.en.md)

| Model | Training Data | Model Storage Size | Word Error Rate | yaml File | Model Download Link |
|---|---|---|---|---|---|
| whisper_large | 680kh | 5.8 G | 2.7 (LibriSpeech) | whisper_large.yaml | Inference Model |
| whisper_medium | 680kh | 2.9 G | - | whisper_medium.yaml | Inference Model |
| whisper_small | 680kh | 923 M | - | whisper_small.yaml | Inference Model |
| whisper_base | 680kh | 277 M | - | whisper_base.yaml | Inference Model |
| whisper_tiny | 680kh | 145 M | - | whisper_tiny.yaml | Inference Model |

## [Video Classification Module](../module_usage/tutorials/video_modules/video_classification.en.md)

| Model | Top1 Acc (%) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|
| PP-TSM-R50_8frames_uniform | 74.36 | 93.4 M | PP-TSM-R50_8frames_uniform.yaml | Inference Model/Training Model |
| PP-TSMv2-LCNetV2_8frames_uniform | 71.71 | 22.5 M | PP-TSMv2-LCNetV2_8frames_uniform.yaml | Inference Model/Training Model |
| PP-TSMv2-LCNetV2_16frames_uniform | 73.11 | 22.5 M | PP-TSMv2-LCNetV2_16frames_uniform.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the K400 validation set Top1 Acc.

## [Video Detection Module](../module_usage/tutorials/video_modules/video_detection.en.md)
| Model | Frame-mAP (@ IoU 0.5) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|
| YOWO | 80.94 | 462.891 M | YOWO.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the test dataset UCF101-24, using the Frame-mAP (@ IoU 0.5) metric. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.