PaddleX includes multiple pipelines, each composed of several modules, and each module offers several models. Use the benchmark data below to choose a model: pick a higher-accuracy model if accuracy matters most, a faster model if inference speed matters most, or a smaller model if storage size matters most.
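Every model in the tables below comes with a yaml configuration file and downloadable inference/training weights. As a rough illustration of how a model picked from these tables can be tried out, the following sketch uses the PaddleX Python API; the model name is taken from the tables, while the input image path is a placeholder and the exact API surface may vary between PaddleX versions.

```python
# Minimal sketch: run single-model inference with a model chosen from the tables below.
# Assumes PaddleX 3.x is installed; "test_image.jpg" is a placeholder input path.
from paddlex import create_model

# Any model name from the benchmark tables can be passed here, e.g. a lightweight
# image classification model chosen for its small storage size and fast inference.
model = create_model(model_name="PP-LCNet_x1_0")

# predict() returns an iterable of per-image results.
output = model.predict("test_image.jpg", batch_size=1)
for res in output:
    res.print()                    # print the prediction to stdout
    res.save_to_img("./output/")   # save the visualized result
    res.save_to_json("./output/result.json")
```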
| Model Name | Top1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_224 | 85.36 | 12.84 / 2.82 | 60.52 / 60.52 | 306.5 M | CLIP_vit_base_patch16_224.yaml | Inference Model/Training Model |
| CLIP_vit_large_patch14_224 | 88.1 | 51.72 / 11.13 | 238.07 / 238.07 | 1.04 G | CLIP_vit_large_patch14_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_224 | 83.84 | 13.18 / 12.14 | 128.39 / 81.78 | 313.9 M | ConvNeXt_base_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_384 | 84.90 | 32.15 / 30.52 | 279.36 / 220.35 | 313.9 M | ConvNeXt_base_384.yaml | Inference Model/Training Model |
| ConvNeXt_large_224 | 84.26 | 26.51 / 7.21 | 213.32 / 157.22 | 700.7 M | ConvNeXt_large_224.yaml | Inference Model/Training Model |
| ConvNeXt_large_384 | 85.27 | 67.07 / 65.26 | 494.04 / 438.97 | 700.7 M | ConvNeXt_large_384.yaml | Inference Model/Training Model |
| ConvNeXt_small | 83.13 | 9.05 / 8.21 | 97.94 / 55.29 | 178.0 M | ConvNeXt_small.yaml | Inference Model/Training Model |
| ConvNeXt_tiny | 82.03 | 5.12 / 2.06 | 63.96 / 29.77 | 101.4 M | ConvNeXt_tiny.yaml | Inference Model/Training Model |
| FasterNet-L | 83.5 | 15.67 / 3.10 | 52.24 / 52.24 | 357.1 M | FasterNet-L.yaml | Inference Model/Training Model |
| FasterNet-M | 83.0 | 9.72 / 2.30 | 35.29 / 35.29 | 204.6 M | FasterNet-M.yaml | Inference Model/Training Model |
| FasterNet-S | 81.3 | 5.46 / 1.27 | 20.46 / 18.03 | 119.3 M | FasterNet-S.yaml | Inference Model/Training Model |
| FasterNet-T0 | 71.9 | 4.18 / 0.60 | 6.34 / 3.44 | 15.1 M | FasterNet-T0.yaml | Inference Model/Training Model |
| FasterNet-T1 | 75.9 | 4.24 / 0.64 | 9.57 / 5.20 | 29.2 M | FasterNet-T1.yaml | Inference Model/Training Model |
| FasterNet-T2 | 79.1 | 3.87 / 0.78 | 11.14 / 9.98 | 57.4 M | FasterNet-T2.yaml | Inference Model/Training Model |
| MobileNetV1_x0_5 | 63.5 | 1.39 / 0.28 | 2.74 / 1.02 | 4.8 M | MobileNetV1_x0_5.yaml | Inference Model/Training Model |
| MobileNetV1_x0_25 | 51.4 | 1.32 / 0.30 | 2.04 / 0.58 | 1.8 M | MobileNetV1_x0_25.yaml | Inference Model/Training Model |
| MobileNetV1_x0_75 | 68.8 | 1.75 / 0.33 | 3.41 / 1.57 | 9.3 M | MobileNetV1_x0_75.yaml | Inference Model/Training Model |
| MobileNetV1_x1_0 | 71.0 | 1.89 / 0.34 | 4.01 / 2.17 | 15.2 M | MobileNetV1_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x0_5 | 65.0 | 3.17 / 0.48 | 4.52 / 1.35 | 7.1 M | MobileNetV2_x0_5.yaml | Inference Model/Training Model |
| MobileNetV2_x0_25 | 53.2 | 2.80 / 0.46 | 3.92 / 0.98 | 5.5 M | MobileNetV2_x0_25.yaml | Inference Model/Training Model |
| MobileNetV2_x1_0 | 72.2 | 3.57 / 0.49 | 5.63 / 2.51 | 12.6 M | MobileNetV2_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x1_5 | 74.1 | 3.58 / 0.62 | 8.02 / 4.49 | 25.0 M | MobileNetV2_x1_5.yaml | Inference Model/Training Model |
| MobileNetV2_x2_0 | 75.2 | 3.56 / 0.74 | 10.24 / 6.83 | 41.2 M | MobileNetV2_x2_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_5 | 69.2 | 3.79 / 0.62 | 6.76 / 1.61 | 9.6 M | MobileNetV3_large_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_35 | 64.3 | 3.70 / 0.60 | 5.54 / 1.41 | 7.5 M | MobileNetV3_large_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_75 | 73.1 | 4.82 / 0.66 | 7.45 / 2.00 | 14.0 M | MobileNetV3_large_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_0 | 75.3 | 4.86 / 0.68 | 6.88 / 2.61 | 19.5 M | MobileNetV3_large_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_25 | 76.4 | 5.08 / 0.71 | 7.37 / 3.58 | 26.5 M | MobileNetV3_large_x1_25.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_5 | 59.2 | 3.41 / 0.57 | 5.60 / 1.14 | 6.8 M | MobileNetV3_small_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_35 | 53.0 | 3.49 / 0.60 | 4.63 / 1.07 | 6.0 M | MobileNetV3_small_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_75 | 66.0 | 3.49 / 0.60 | 5.19 / 1.28 | 8.5 M | MobileNetV3_small_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_0 | 68.2 | 3.76 / 0.53 | 5.11 / 1.43 | 10.5 M | MobileNetV3_small_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_25 | 70.7 | 4.23 / 0.58 | 6.48 / 1.68 | 13.0 M | MobileNetV3_small_x1_25.yaml | Inference Model/Training Model |
| MobileNetV4_conv_large | 83.4 | 8.33 / 2.24 | 33.56 / 23.70 | 125.2 M | MobileNetV4_conv_large.yaml | Inference Model/Training Model |
| MobileNetV4_conv_medium | 79.9 | 6.81 / 0.92 | 12.47 / 6.27 | 37.6 M | MobileNetV4_conv_medium.yaml | Inference Model/Training Model |
| MobileNetV4_conv_small | 74.6 | 3.25 / 0.46 | 4.42 / 1.54 | 14.7 M | MobileNetV4_conv_small.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_large | 83.8 | 12.27 / 4.18 | 58.64 / 58.64 | 145.1 M | MobileNetV4_hybrid_large.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_medium | 80.5 | 12.08 / 1.34 | 24.69 / 8.10 | 42.9 M | MobileNetV4_hybrid_medium.yaml | Inference Model/Training Model |
| PP-HGNet_base | 85.0 | 14.10 / 4.19 | 68.92 / 68.92 | 249.4 M | PP-HGNet_base.yaml | Inference Model/Training Model |
| PP-HGNet_small | 81.51 | 5.12 / 1.73 | 25.01 / 25.01 | 86.5 M | PP-HGNet_small.yaml | Inference Model/Training Model |
| PP-HGNet_tiny | 79.83 | 3.28 / 1.29 | 16.40 / 15.97 | 52.4 M | PP-HGNet_tiny.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0 | 77.77 | 3.83 / 0.57 | 9.95 / 2.37 | 21.4 M | PP-HGNetV2-B0.yaml | Inference Model/Training Model |
| PP-HGNetV2-B1 | 79.18 | 3.87 / 0.62 | 8.77 / 3.79 | 22.6 M | PP-HGNetV2-B1.yaml | Inference Model/Training Model |
| PP-HGNetV2-B2 | 81.74 | 5.73 / 0.86 | 15.11 / 7.05 | 39.9 M | PP-HGNetV2-B2.yaml | Inference Model/Training Model |
| PP-HGNetV2-B3 | 82.98 | 6.26 / 1.01 | 18.47 / 10.34 | 57.9 M | PP-HGNetV2-B3.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4 | 83.57 | 5.47 / 1.10 | 14.42 / 9.89 | 70.4 M | PP-HGNetV2-B4.yaml | Inference Model/Training Model |
| PP-HGNetV2-B5 | 84.75 | 10.24 / 1.96 | 29.71 / 29.71 | 140.8 M | PP-HGNetV2-B5.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6 | 86.30 | 12.25 / 3.76 | 62.29 / 62.29 | 268.4 M | PP-HGNetV2-B6.yaml | Inference Model/Training Model |
| PP-LCNet_x0_5 | 63.14 | 2.28 / 0.42 | 2.86 / 0.83 | 6.7 M | PP-LCNet_x0_5.yaml | Inference Model/Training Model |
| PP-LCNet_x0_25 | 51.86 | 1.89 / 0.45 | 2.49 / 0.68 | 5.5 M | PP-LCNet_x0_25.yaml | Inference Model/Training Model |
| PP-LCNet_x0_35 | 58.09 | 1.94 / 0.41 | 2.73 / 0.77 | 5.9 M | PP-LCNet_x0_35.yaml | Inference Model/Training Model |
| PP-LCNet_x0_75 | 68.18 | 2.30 / 0.41 | 2.95 / 1.07 | 8.4 M | PP-LCNet_x0_75.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0 | 71.32 | 2.35 / 0.47 | 4.03 / 1.35 | 10.5 M | PP-LCNet_x1_0.yaml | Inference Model/Training Model |
| PP-LCNet_x1_5 | 73.71 | 2.33 / 0.53 | 4.17 / 2.29 | 16.0 M | PP-LCNet_x1_5.yaml | Inference Model/Training Model |
| PP-LCNet_x2_0 | 75.18 | 2.40 / 0.51 | 5.37 / 3.46 | 23.2 M | PP-LCNet_x2_0.yaml | Inference Model/Training Model |
| PP-LCNet_x2_5 | 76.60 | 2.36 / 0.61 | 6.29 / 5.05 | 32.1 M | PP-LCNet_x2_5.yaml | Inference Model/Training Model |
| PP-LCNetV2_base | 77.05 | 3.33 / 0.55 | 6.86 / 3.77 | 23.7 M | PP-LCNetV2_base.yaml | Inference Model/Training Model |
| PP-LCNetV2_large | 78.51 | 4.37 / 0.71 | 9.43 / 8.07 | 37.3 M | PP-LCNetV2_large.yaml | Inference Model/Training Model |
| PP-LCNetV2_small | 73.97 | 2.53 / 0.41 | 5.14 / 1.98 | 14.6 M | PP-LCNetV2_small.yaml | Inference Model/Training Model |
| ResNet18_vd | 72.3 | 2.47 / 0.61 | 6.97 / 5.15 | 41.5 M | ResNet18_vd.yaml | Inference Model/Training Model |
| ResNet18 | 71.0 | 2.35 / 0.67 | 6.35 / 4.61 | 41.5 M | ResNet18.yaml | Inference Model/Training Model |
| ResNet34_vd | 76.0 | 4.01 / 1.03 | 11.99 / 9.86 | 77.3 M | ResNet34_vd.yaml | Inference Model/Training Model |
| ResNet34 | 74.6 | 3.99 / 1.02 | 12.42 / 9.81 | 77.3 M | ResNet34.yaml | Inference Model/Training Model |
| ResNet50_vd | 79.1 | 6.04 / 1.16 | 16.08 / 12.07 | 90.8 M | ResNet50_vd.yaml | Inference Model/Training Model |
| ResNet50 | 76.5 | 6.44 / 1.16 | 15.04 / 11.63 | 90.8 M | ResNet50.yaml | Inference Model/Training Model |
| ResNet101_vd | 80.2 | 11.16 / 2.07 | 32.14 / 32.14 | 158.4 M | ResNet101_vd.yaml | Inference Model/Training Model |
| ResNet101 | 77.6 | 10.91 / 2.06 | 31.14 / 22.93 | 158.7 M | ResNet101.yaml | Inference Model/Training Model |
| ResNet152_vd | 80.6 | 15.96 / 2.99 | 49.33 / 49.33 | 214.3 M | ResNet152_vd.yaml | Inference Model/Training Model |
| ResNet152 | 78.3 | 15.61 / 2.90 | 47.33 / 36.60 | 214.2 M | ResNet152.yaml | Inference Model/Training Model |
| ResNet200_vd | 80.9 | 24.20 / 3.69 | 62.62 / 62.62 | 266.0 M | ResNet200_vd.yaml | Inference Model/Training Model |
| StarNet-S1 | 73.6 | 6.33 / 1.98 | 7.56 / 3.26 | 11.2 M | StarNet-S1.yaml | Inference Model/Training Model |
| StarNet-S2 | 74.8 | 4.49 / 1.55 | 7.38 / 3.38 | 14.3 M | StarNet-S2.yaml | Inference Model/Training Model |
| StarNet-S3 | 77.0 | 6.70 / 1.62 | 11.05 / 4.76 | 22.2 M | StarNet-S3.yaml | Inference Model/Training Model |
| StarNet-S4 | 79.0 | 8.50 / 2.86 | 15.40 / 6.76 | 28.9 M | StarNet-S4.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window7_224 | 83.37 | 14.29 / 5.13 | 130.89 / 130.89 | 310.5 M | SwinTransformer_base_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window12_384 | 84.17 | 37.74 / 10.10 | 362.56 / 362.56 | 311.4 M | SwinTransformer_base_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window7_224 | 86.19 | 26.48 / 7.94 | 228.23 / 228.23 | 694.8 M | SwinTransformer_large_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window12_384 | 87.06 | 74.72 / 18.16 | 652.04 / 652.04 | 696.1 M | SwinTransformer_large_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_small_patch4_window7_224 | 83.21 | 10.37 / 3.90 | 94.20 / 94.20 | 175.6 M | SwinTransformer_small_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_tiny_patch4_window7_224 | 81.10 | 6.66 / 2.15 | 60.45 / 60.45 | 100.1 M | SwinTransformer_tiny_patch4_window7_224.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_448_ML | 89.15 | 54.75 / 14.30 | 280.23 / 280.23 | 325.6 M | CLIP_vit_base_patch16_448_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0_ML | 80.98 | 6.47 / 1.38 | 21.56 / 13.69 | 39.6 M | PP-HGNetV2-B0_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4_ML | 87.96 | 9.63 / 2.79 | 43.98 / 36.63 | 88.5 M | PP-HGNetV2-B4_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6_ML | 91.06 | 37.07 / 9.43 | 188.58 / 188.58 | 286.5 M | PP-HGNetV2-B6_ML.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0_ML | 77.96 | 4.04 / 1.15 | 11.76 / 8.32 | 29.4 M | PP-LCNet_x1_0_ML.yaml | Inference Model/Training Model |
| ResNet50_ML | 83.42 | 12.12 / 3.27 | 51.79 / 44.36 | 108.9 M | ResNet50_ML.yaml | Inference Model/Training Model |
| Model Name | mA (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_pedestrian_attribute | 92.2 | 2.35 / 0.49 | 3.17 / 1.25 | 6.7 M | PP-LCNet_x1_0_pedestrian_attribute.yaml | Inference Model/Training Model |
| Model Name | mA (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_vehicle_attribute | 91.7 | 2.32 / 2.32 | 3.22 / 1.26 | 6.7 M | PP-LCNet_x1_0_vehicle_attribute.yaml | Inference Model/Training Model |
| Model Name | recall@1 (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_rec | 84.2 | 3.48 / 0.55 | 8.04 / 4.04 | 16.3 M | PP-ShiTuV2_rec.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 12.94 / 2.88 | 58.36 / 58.36 | 306.6 M | PP-ShiTuV2_rec_CLIP_vit_base.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 51.65 / 11.18 | 255.78 / 255.78 | 1.05 G | PP-ShiTuV2_rec_CLIP_vit_large.yaml | Inference Model/Training Model |
| Model Name | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |
| Model Name | Output Feature Dimension | Acc (%) AgeDB-30/CFP-FP/LFW | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| MobileFaceNet | 128 | 96.28/96.71/99.58 | 3.16 / 0.48 | 6.49 / 6.49 | 4.1 | MobileFaceNet.yaml | Inference Model/Training Model |
| ResNet50_face | 512 | 98.12/98.56/99.77 | 5.68 / 1.09 | 14.96 / 11.90 | 87.2 | ResNet50_face.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_det | 41.5 | 12.79 / 4.51 | 44.14 / 44.14 | 27.54 | PP-ShiTuV2_det.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Cascade-FasterRCNN-ResNet50-FPN | 41.1 | 135.92 / 135.92 | - | 245.4 M | Cascade-FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN | 45.0 | 138.23 / 138.23 | - | 246.2 M | Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| CenterNet-DLA-34 | 37.6 | - | - | 75.4 M | CenterNet-DLA-34.yaml | Inference Model/Training Model |
| CenterNet-ResNet50 | 38.9 | - | - | 319.7 M | CenterNet-ResNet50.yaml | Inference Model/Training Model |
| DETR-R50 | 42.3 | 62.91 / 17.33 | 392.63 / 392.63 | 159.3 M | DETR-R50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet34-FPN | 37.8 | 83.33 / 31.64 | - | 137.5 M | FasterRCNN-ResNet34-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-FPN | 38.4 | 107.08 / 35.40 | - | 148.1 M | FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-FPN | 39.5 | 109.36 / 36.00 | - | 148.1 M | FasterRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-SSLDv2-FPN | 41.4 | 109.06 / 36.19 | - | 148.1 M | FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50 | 36.7 | 496.33 / 109.12 | - | 120.2 M | FasterRCNN-ResNet50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101-FPN | 41.4 | 148.21 / 42.21 | - | 216.3 M | FasterRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101 | 39.0 | 538.58 / 120.88 | - | 188.1 M | FasterRCNN-ResNet101.yaml | Inference Model/Training Model |
| FasterRCNN-ResNeXt101-vd-FPN | 43.4 | 258.01 / 58.25 | - | 360.6 M | FasterRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-Swin-Tiny-FPN | 42.6 | - | - | 159.8 M | FasterRCNN-Swin-Tiny-FPN.yaml | Inference Model/Training Model |
| FCOS-ResNet50 | 39.6 | 106.13 / 28.32 | 721.79 / 721.79 | 124.2 M | FCOS-ResNet50.yaml | Inference Model/Training Model |
| PicoDet-L | 42.6 | 14.68 / 5.81 | 47.32 / 47.32 | 20.9 M | PicoDet-L.yaml | Inference Model/Training Model |
| PicoDet-M | 37.5 | 9.62 / 3.23 | 23.75 / 14.88 | 16.8 M | PicoDet-M.yaml | Inference Model/Training Model |
| PicoDet-S | 29.1 | 7.98 / 2.33 | 14.82 / 5.60 | 4.4 M | PicoDet-S.yaml | Inference Model/Training Model |
| PicoDet-XS | 26.2 | 9.66 / 2.75 | 19.15 / 7.24 | 5.7 M | PicoDet-XS.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-L | 52.9 | 33.55 / 10.46 | 189.05 / 189.05 | 185.3 M | PP-YOLOE_plus-L.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-M | 49.8 | 19.52 / 7.46 | 113.36 / 113.36 | 83.2 M | PP-YOLOE_plus-M.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S | 43.7 | 12.16 / 4.58 | 73.86 / 52.90 | 28.3 M | PP-YOLOE_plus-S.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-X | 54.7 | 58.87 / 15.84 | 292.93 / 292.93 | 349.4 M | PP-YOLOE_plus-X.yaml | Inference Model/Training Model |
| RT-DETR-H | 56.3 | 115.92 / 28.16 | 971.32 / 971.32 | 435.8 M | RT-DETR-H.yaml | Inference Model/Training Model |
| RT-DETR-L | 53.0 | 35.00 / 10.45 | 495.51 / 495.51 | 113.7 M | RT-DETR-L.yaml | Inference Model/Training Model |
| RT-DETR-R18 | 46.5 | 20.21 / 6.23 | 266.01 / 266.01 | 70.7 M | RT-DETR-R18.yaml | Inference Model/Training Model |
| RT-DETR-R50 | 53.1 | 42.14 / 11.31 | 523.97 / 523.97 | 149.1 M | RT-DETR-R50.yaml | Inference Model/Training Model |
| RT-DETR-X | 54.8 | 61.24 / 15.83 | 647.08 / 647.08 | 232.9 M | RT-DETR-X.yaml | Inference Model/Training Model |
| YOLOv3-DarkNet53 | 39.1 | 41.58 / 10.10 | 158.78 / 158.78 | 219.7 M | YOLOv3-DarkNet53.yaml | Inference Model/Training Model |
| YOLOv3-MobileNetV3 | 31.4 | 16.53 / 5.70 | 60.44 / 60.44 | 83.8 M | YOLOv3-MobileNetV3.yaml | Inference Model/Training Model |
| YOLOv3-ResNet50_vd_DCN | 40.6 | 32.91 / 10.07 | 225.72 / 224.32 | 163.0 M | YOLOv3-ResNet50_vd_DCN.yaml | Inference Model/Training Model |
| YOLOX-L | 50.1 | 121.19 / 13.55 | 295.38 / 274.15 | 192.5 M | YOLOX-L.yaml | Inference Model/Training Model |
| YOLOX-M | 46.9 | 87.19 / 10.09 | 183.95 / 172.67 | 90.0 M | YOLOX-M.yaml | Inference Model/Training Model |
| YOLOX-N | 26.1 | 53.31 / 45.02 | 69.69 / 59.18 | 3.4M | YOLOX-N.yaml | Inference Model/Training Model |
| YOLOX-S | 40.4 | 129.52 / 13.19 | 181.39 / 179.01 | 32.0 M | YOLOX-S.yaml | Inference Model/Training Model |
| YOLOX-T | 32.9 | 66.81 / 61.31 | 92.30 / 83.90 | 18.1 M | YOLOX-T.yaml | Inference Model/Training Model |
| YOLOX-X | 51.8 | 156.40 / 20.17 | 480.14 / 454.35 | 351.5 M | YOLOX-X.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE_plus_SOD-S | 25.1 | 135.68 / 122.94 | 188.09 / 107.74 | 77.3 M | PP-YOLOE_plus_SOD-S.yaml | Inference Model/Training Model |
| Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | Model Download Link |
|---|---|---|---|---|---|---|
| GroundingDINO-T | 49.4 | 64.4 | 253.72 | 1807.4 | 658.3 | Inference Model |
| YOLO-Worldv2-L | 44.4 | 59.8 | 24.32 | 374.89 | 421.4 | Inference Model |
| Model | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Model Download Link |
|---|---|---|---|---|
| SAM-H_box | 144.9 | 33920.7 | 2433.7 | Inference Model |
| SAM-H_point | 144.9 | 33920.7 | 2433.7 | Inference Model |
| Model | mAP(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-R-L | 78.14 | 20.7039 | 157.942 | 211.0 M | PP-YOLOE-R.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the DOTA validation set mAP(0.5:0.95).
## [Pedestrian Detection Module](../module_usage/tutorials/cv_modules/human_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_human | 48.0 | 33.27 / 9.19 | 173.72 / 173.72 | 196.1 M | PP-YOLOE-L_human.yaml | Inference Model/Training Model |
| PP-YOLOE-S_human | 42.5 | 9.94 / 3.42 | 54.48 / 46.52 | 28.8 M | PP-YOLOE-S_human.yaml | Inference Model/Training Model |
| Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_vehicle | 63.9 | 32.84 / 9.03 | 176.60 / 176.60 | 196.1 M | PP-YOLOE-L_vehicle.yaml | Inference Model/Training Model |
| PP-YOLOE-S_vehicle | 61.3 | 9.79 / 3.48 | 54.14 / 46.69 | 28.8 M | PP-YOLOE-S_vehicle.yaml | Inference Model/Training Model |
| Model Name | AP (%) Easy/Medium/Hard | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| BlazeFace | 77.7/73.4/49.5 | 60.34 / 54.76 | 84.18 / 84.18 | 0.447 M | BlazeFace.yaml | Inference Model/Training Model |
| BlazeFace-FPN-SSH | 83.2/80.5/60.5 | 69.29 / 63.42 | 86.96 / 86.96 | 0.606 M | BlazeFace-FPN-SSH.yaml | Inference Model/Training Model |
| PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | 35.37 / 12.88 | 126.24 / 126.24 | 28.9 M | PicoDet_LCNet_x2_5_face.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | 22.54 / 8.33 | 138.67 / 138.67 | 26.5 M | PP-YOLOE_plus-S_face | Inference Model/Training Model |
Note: The above accuracy metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640.
| Model Name | mIoU | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| STFPM | 0.9901 | 2.97 / 1.57 | 38.86 / 13.24 | 22.5 M | STFPM.yaml | Inference Model/Training Model |
| Model | Scheme | Input Size | AP(0.5:0.95) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|---|
| PP-TinyPose_128x96 | Top-Down | 128*96 | 58.4 | - | - | 4.9 | PP-TinyPose_128x96.yaml | Inference Model/Training Model |
| PP-TinyPose_256x192 | Top-Down | 256*192 | 68.3 | - | - | 4.9 | PP-TinyPose_256x192.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the COCO dataset AP(0.5:0.95), with detection boxes obtained from ground truth annotations.
| Model | mAP(%) | NDS | yaml File | Model Download Link |
|---|---|---|---|---|
| BEVFusion | 53.9 | 60.9 | BEVFusion.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the nuScenes validation set, reporting mAP(0.5:0.95) and NDS; the precision type is FP32.
## [Semantic Segmentation Module](../module_usage/tutorials/cv_modules/semantic_segmentation.en.md)

| Model Name | mIoU (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Deeplabv3_Plus-R50 | 80.36 | 503.51 / 122.30 | 3543.91 / 3543.91 | 94.9 M | Deeplabv3_Plus-R50.yaml | Inference Model/Training Model |
| Deeplabv3_Plus-R101 | 81.10 | 803.79 / 175.45 | 5136.21 / 5136.21 | 162.5 M | Deeplabv3_Plus-R101.yaml | Inference Model/Training Model |
| Deeplabv3-R50 | 79.90 | 647.56 / 121.67 | 3803.09 / 3803.09 | 138.3 M | Deeplabv3-R50.yaml | Inference Model/Training Model |
| Deeplabv3-R101 | 80.85 | 950.43 / 178.50 | 5517.14 / 5517.14 | 205.9 M | Deeplabv3-R101.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W18 | 80.67 | 286.12 / 80.76 | 1794.03 / 1794.03 | 43.1 M | OCRNet_HRNet-W18.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W48 | 82.15 | 627.36 / 170.76 | 3531.61 / 3531.61 | 249.8 M | OCRNet_HRNet-W48.yaml | Inference Model/Training Model |
| PP-LiteSeg-T | 73.10 | 30.16 / 14.03 | 420.07 / 235.01 | 28.5 M | PP-LiteSeg-T.yaml | Inference Model/Training Model |
| PP-LiteSeg-B | 75.25 | 40.92 / 20.18 | 494.32 / 310.34 | 47.0 M | PP-LiteSeg-B.yaml | Inference Model/Training Model |
| SegFormer-B0 (slice) | 76.73 | 11.1946 | 268.929 | 13.2 M | SegFormer-B0.yaml | Inference Model/Training Model |
| SegFormer-B1 (slice) | 78.35 | 17.9998 | 403.393 | 48.5 M | SegFormer-B1.yaml | Inference Model/Training Model |
| SegFormer-B2 (slice) | 81.60 | 48.0371 | 1248.52 | 96.9 M | SegFormer-B2.yaml | Inference Model/Training Model |
| SegFormer-B3 (slice) | 82.47 | 64.341 | 1666.35 | 167.3 M | SegFormer-B3.yaml | Inference Model/Training Model |
| SegFormer-B4 (slice) | 82.38 | 82.4336 | 1995.42 | 226.7 M | SegFormer-B4.yaml | Inference Model/Training Model |
| SegFormer-B5 (slice) | 82.58 | 97.3717 | 2420.19 | 229.7 M | SegFormer-B5.yaml | Inference Model/Training Model |
| Model Name | mIoU (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SeaFormer_base (slice) | 40.92 | 24.4073 | 397.574 | 30.8 M | SeaFormer_base.yaml | Inference Model/Training Model |
| SeaFormer_large (slice) | 43.66 | 27.8123 | 550.464 | 49.8 M | SeaFormer_large.yaml | Inference Model/Training Model |
| SeaFormer_small (slice) | 38.73 | 19.2295 | 358.343 | 14.3 M | SeaFormer_small.yaml | Inference Model/Training Model |
| SeaFormer_tiny (slice) | 34.58 | 13.9496 | 330.132 | 6.1 M | SeaFormer_tiny.yaml | Inference Model/Training Model |
| Model Name | Mask AP | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Mask-RT-DETR-H | 50.6 | 172.36 / 172.36 | 1615.75 / 1615.75 | 449.9 M | Mask-RT-DETR-H.yaml | Inference Model/Training Model |
| Mask-RT-DETR-L | 45.7 | 88.18 / 88.18 | 1090.84 / 1090.84 | 113.6 M | Mask-RT-DETR-L.yaml | Inference Model/Training Model |
| Mask-RT-DETR-M | 42.7 | 78.69 / 78.69 | - | 66.6 M | Mask-RT-DETR-M.yaml | Inference Model/Training Model |
| Mask-RT-DETR-S | 41.0 | 33.5007 | - | 51.8 M | Mask-RT-DETR-S.yaml | Inference Model/Training Model |
| Mask-RT-DETR-X | 47.5 | 114.16 / 114.16 | 1240.92 / 1240.92 | 237.5 M | Mask-RT-DETR-X.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-FPN | 36.3 | 141.69 / 141.69 | - | 254.8 M | Cascade-MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN | 39.1 | 147.62 / 147.62 | - | 254.7 M | Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-FPN | 35.6 | 118.30 / 118.30 | - | 157.5 M | MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-vd-FPN | 36.4 | 118.34 / 118.34 | - | 157.5 M | MaskRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50 | 32.8 | 228.83 / 228.83 | - | 127.8 M | MaskRCNN-ResNet50.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-FPN | 36.6 | 148.14 / 148.14 | - | 225.4 M | MaskRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-vd-FPN | 38.1 | 151.12 / 151.12 | - | 225.1 M | MaskRCNN-ResNet101-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNeXt101-vd-FPN | 39.5 | 237.55 / 237.55 | - | 370.0 M | MaskRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| PP-YOLOE_seg-S | 32.5 | - | - | 31.5 M | PP-YOLOE_seg-S.yaml | Inference Model/Training Model |
| SOLOv2 | 35.5 | - | - | 179.1 M | SOLOv2.yaml | Inference Model/Training Model |
| Model | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_server_det | 82.56 | 83.34 / 80.91 | 442.58 / 442.58 | 109 | PP-OCRv4_server_det.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_det | 77.35 | 8.79 / 3.13 | 51.00 / 28.58 | 4.7 | PP-OCRv4_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_det | 78.68 | 8.44 / 2.91 | 27.87 / 27.87 | 2.1 | PP-OCRv3_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_server_det | 80.11 | 65.41 / 13.67 | 305.07 / 305.07 | 102.1 | PP-OCRv3_server_det.yaml | Inference Model/Training Model |
| Model Name | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_mobile_seal_det | 96.47 | 7.82 / 3.09 | 48.28 / 23.97 | 4.7M | PP-OCRv4_mobile_seal_det.yaml | Inference Model/Training Model |
| PP-OCRv4_server_seal_det | 98.21 | 74.75 / 67.72 | 382.55 / 382.55 | 108.3 M | PP-OCRv4_server_seal_det.yaml | Inference Model/Training Model |
| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_server_rec_doc | 81.53 | 6.65 / 2.38 | 32.92 / 32.92 | 74.7 M | PP-OCRv4_server_rec_doc.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_rec | 78.74 | 4.82 / 1.20 | 16.74 / 4.64 | 10.6 M | PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| PP-OCRv4_server_rec | 80.61 | 6.58 / 2.43 | 33.17 / 33.17 | 71.2 M | PP-OCRv4_server_rec.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_rec | 72.96 | 5.87 / 1.19 | 9.07 / 4.28 | 9.2 M | PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 8367 images for text recognition.
| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_SVTRv2_rec | 68.81 | 8.08 / 2.74 | 50.17 / 42.50 | 73.9 M | ch_SVTRv2_rec.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard A.
| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_RepSVTR_rec | 65.07 | 5.93 / 1.62 | 20.73 / 7.32 | 22.1 M | ch_RepSVTR_rec.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard B.
English Recognition Model

| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| en_PP-OCRv4_mobile_rec | 70.39 | 4.81 / 0.75 | 16.10 / 5.31 | 6.8 M | en_PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| en_PP-OCRv3_mobile_rec | 70.69 | 5.44 / 0.75 | 8.65 / 5.57 | 7.8 M | en_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is an English dataset built by PaddleX.
Multilingual Recognition Model
| Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| korean_PP-OCRv3_mobile_rec | 60.21 | 5.40 / 0.97 | 9.11 / 4.05 | 8.6 M | korean_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| japan_PP-OCRv3_mobile_rec | 45.69 | 5.70 / 1.02 | 8.48 / 4.07 | 8.8 M | japan_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| chinese_cht_PP-OCRv3_mobile_rec | 82.06 | 5.90 / 1.28 | 9.28 / 4.34 | 9.7 M | chinese_cht_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| te_PP-OCRv3_mobile_rec | 95.88 | 5.42 / 0.82 | 8.10 / 6.91 | 7.8 M | te_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ka_PP-OCRv3_mobile_rec | 96.96 | 5.25 / 0.79 | 9.09 / 3.86 | 8.0 M | ka_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ta_PP-OCRv3_mobile_rec | 76.83 | 5.23 / 0.75 | 10.13 / 4.30 | 8.0 M | ta_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| latin_PP-OCRv3_mobile_rec | 76.93 | 5.20 / 0.79 | 8.83 / 7.15 | 7.8 M | latin_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| arabic_PP-OCRv3_mobile_rec | 73.55 | 5.35 / 0.79 | 8.80 / 4.56 | 7.8 M | arabic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| cyrillic_PP-OCRv3_mobile_rec | 94.28 | 5.23 / 0.76 | 8.89 / 3.88 | 7.9 M | cyrillic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| devanagari_PP-OCRv3_mobile_rec | 96.44 | 5.22 / 0.79 | 8.56 / 4.06 | 7.9 M | devanagari_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a multi-language dataset built by PaddleX.
| Model | Avg-BLEU(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| UniMERNet | 86.13 | 2266.96/- | -/- | 1.4 G | UniMERNet.yaml | Inference Model/Training Model |
| PP-FormulaNet-S | 87.12 | 202.25/- | -/- | 167.9 M | PP-FormulaNet-S.yaml | Inference Model/Training Model |
| PP-FormulaNet-L | 92.13 | 1976.52/- | -/- | 535.2 M | PP-FormulaNet-L.yaml | Inference Model/Training Model |
| LaTeX_OCR_rec | 71.63 | -/- | -/- | 89.7 M | LaTeX_OCR_rec.yaml | Inference Model/Training Model |
| Model | Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SLANet | 59.52 | 103.08 / 103.08 | 197.99 / 197.99 | 6.9 M | SLANet.yaml | Inference Model/Training Model |
| SLANet_plus | 63.69 | 140.29 / 140.29 | 195.39 / 195.39 | 6.9 M | SLANet_plus.yaml | Inference Model/Training Model |
| SLANeXt_wired | 69.65 | -- | -- | -- | SLANeXt_wired.yaml | Inference Model/Training Model |
| SLANeXt_wireless | -- | -- | -- | -- | SLANeXt_wireless.yaml | Inference Model/Training Model |
| Model | Model Download Link | mAP(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| RT-DETR-L_wired_table_cell_det | Inference Model/Training Model | 82.7 | 35.00 / 10.45 | 495.51 / 495.51 | 124M | RT-DETR is the first real-time end-to-end object detection model. The Baidu PaddlePaddle Vision Team, based on RT-DETR-L as the base model, has completed pretraining on a self-built table cell detection dataset, achieving good performance for both wired and wireless table cell detection. |
| RT-DETR-L_wireless_table_cell_det | Inference Model/Training Model | 82.7 | 35.00 / 10.45 | 495.51 / 495.51 | 124M | Same as above: the metrics and description are reported jointly for the wired and wireless table cell detection models. |
Note: The above accuracy metrics are measured from the internal table cell detection dataset of PaddleX.
## [Table Classification Module](../module_usage/tutorials/ocr_modules/table_classification.en.md)

| Model | Top1 Acc(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_table_cls | -- | -- | -- | -- | PP-LCNet_x1_0_table_cls.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the internal table classification dataset built by PaddleX.
## [Text Image Unwarping Module](../module_usage/tutorials/ocr_modules/text_image_unwarping.en.md)

| Model Name | MS-SSIM (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| UVDoc | 54.40 | 16.27 / 7.76 | 176.97 / 80.60 | 30.3 M | UVDoc.yaml | Inference Model/Training Model |
| Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x_table | 97.5 | 8.02 / 3.09 | 23.70 / 20.41 | 7.4 M | PicoDet_layout_1x_table.yaml | Inference Model/Training Model |
| Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_3cls | 88.2 | 8.99 / 2.22 | 16.11 / 8.73 | 4.8 | PicoDet-S_layout_3cls.yaml | Inference Model/Training Model |
| PicoDet-L_layout_3cls | 89.0 | 13.05 / 4.50 | 41.30 / 41.30 | 22.6 | PicoDet-L_layout_3cls.yaml | Inference Model/Training Model |
| RT-DETR-H_layout_3cls | 95.8 | 114.93 / 27.71 | 947.56 / 947.56 | 470.1 | RT-DETR-H_layout_3cls.yaml | Inference Model/Training Model |
| Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x | 97.8 | 9.03 / 3.10 | 25.82 / 20.70 | 7.4 | PicoDet_layout_1x.yaml | Inference Model/Training Model |
| Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_17cls | 87.4 | 9.11 / 2.12 | 15.42 / 9.12 | 4.8 | PicoDet-S_layout_17cls.yaml | Inference Model/Training Model |
| Model | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |
| Model | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | YAML File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | PP-LCNet_x0_25_textline_ori.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is a self-built dataset covering multiple scenarios such as certificates and documents, with 1,000 images.
| Model Name | MSE | MAE | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|
| DLinear | 0.382 | 0.394 | 72 K | DLinear.yaml | Inference Model/Training Model |
| NLinear | 0.386 | 0.392 | 40 K | NLinear.yaml | Inference Model/Training Model |
| Nonstationary | 0.600 | 0.515 | 55.5 M | Nonstationary.yaml | Inference Model/Training Model |
| PatchTST | 0.379 | 0.391 | 2.0 M | PatchTST.yaml | Inference Model/Training Model |
| RLinear | 0.385 | 0.392 | 40 K | RLinear.yaml | Inference Model/Training Model |
| TiDE | 0.407 | 0.414 | 31.7 M | TiDE.yaml | Inference Model/Training Model |
| TimesNet | 0.416 | 0.429 | 4.9 M | TimesNet.yaml | Inference Model/Training Model |
| Model Name | Precision | Recall | F1 Score | Model Storage Size | YAML File | Model Download Link |
|---|---|---|---|---|---|---|
| AutoEncoder_ad | 99.36 | 84.36 | 91.25 | 52 K | AutoEncoder_ad.yaml | Inference Model/Training Model |
| DLinear_ad | 98.98 | 93.96 | 96.41 | 112 K | DLinear_ad.yaml | Inference Model/Training Model |
| Nonstationary_ad | 98.55 | 88.95 | 93.51 | 1.8 M | Nonstationary_ad.yaml | Inference Model/Training Model |
| PatchTST_ad | 98.78 | 90.70 | 94.57 | 320 K | PatchTST_ad.yaml | Inference Model/Training Model |
| Model Name | acc(%) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|
| TimesNet_cls | 87.5 | 792 K | TimesNet_cls.yaml | Inference Model/Training Model |
| Model | Training Data | Model Size | Word Error Rate | YAML File | Model Download Link |
|---|---|---|---|---|---|
| whisper_large | 680kh | 5.8G | 2.7 (Librispeech) | whisper_large.yaml | Inference Model |
| whisper_medium | 680kh | 2.9G | - | whisper_medium.yaml | Inference Model |
| whisper_small | 680kh | 923M | - | whisper_small.yaml | Inference Model |
| whisper_base | 680kh | 277M | - | whisper_base.yaml | Inference Model |
| whisper_tiny | 680kh | 145M | - | whisper_tiny.yaml | Inference Model |
| Model | Top1 Acc(%) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|
| PP-TSM-R50_8frames_uniform | 74.36 | 93.4 M | PP-TSM-R50_8frames_uniform.yaml | Inference Model/Training Model |
| PP-TSMv2-LCNetV2_8frames_uniform | 71.71 | 22.5 M | PP-TSMv2-LCNetV2_8frames_uniform.yaml | Inference Model/Training Model |
| PP-TSMv2-LCNetV2_16frames_uniform | 73.11 | 22.5 M | PP-TSMv2-LCNetV2_16frames_uniform.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the K400 validation set Top1 Acc.
## [Video Detection Module](../module_usage/tutorials/video_modules/video_detection.en.md)

| Model | Frame-mAP(@ IoU 0.5) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|
| YOWO | 80.94 | 462.891 M | YOWO.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the test dataset UCF101-24, using the Frame-mAP (@ IoU 0.5) metric.
## [Document Vision-Language Model Module](../module_usage/tutorials/vlm_modules/doc_vlm.en.md)

| Model | Model Storage Size (GB) | Model Download Link |
|---|---|---|
| PP-DocBee-2B | 4.2 | Inference Model |
| PP-DocBee-7B | 15.8 | Inference Model |
**Test Environment Description:**

**Performance Test Environment**

**Inference Mode Description**
| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
|---|---|---|---|
| Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference |
| High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 Precision / 8 Threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |
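The High-Performance Mode referenced throughout the tables corresponds to PaddleX's high-performance inference plugin. As a minimal sketch only, assuming the plugin is installed and the pipeline name is valid for your PaddleX 3.x installation (parameter names may differ by version), it is typically enabled with a single flag when creating a pipeline:

```python
# Sketch: switching a pipeline from Normal Mode to High-Performance Mode.
# Assumes the PaddleX high-performance inference plugin (hpip) is installed;
# the pipeline name and the use_hpip flag follow the PaddleX 3.x API.
from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="image_classification",  # any pipeline built from the modules above
    use_hpip=True,                    # enable the high-performance inference backend
)

# "test_image.jpg" is a placeholder input path.
for res in pipeline.predict("test_image.jpg"):
    res.print()
```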