---
comments: true
---

# PaddleX Model List (CPU/GPU)

PaddleX includes multiple production lines, each containing several modules, and each module includes several models. You can choose which models to use based on the benchmark data below: pick a higher-accuracy model if accuracy matters most, a faster model if inference speed matters most, or a smaller model if storage size matters most.

## [Image Classification Module](../module_usage/tutorials/cv_modules/image_classification.en.md)
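
All of the models listed on this page share the same high-level inference interface. As a quick orientation before the tables, here is a minimal sketch, assuming a PaddleX 3.x installation and its `create_model` API; the model name is taken from the table below, and the image path is a placeholder to replace with your own file.

```python
# Minimal single-model inference sketch (assumes PaddleX 3.x is installed;
# the image path below is a placeholder).
from paddlex import create_model

# Any model name from the tables on this page can be passed here.
model = create_model("PP-LCNet_x1_0")

# predict() yields one result object per input image.
for res in model.predict("path/to/your_image.jpg", batch_size=1):
    res.print()                            # print the predicted label and score
    res.save_to_img("./output/")           # save a visualization of the prediction
    res.save_to_json("./output/res.json")  # save the raw result as JSON
```

The yaml File column in each table names the configuration file shipped with the PaddleX repository for training and evaluating that model.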

| Model Name | Top1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_224 | 85.36 | 13.1957 | 285.493 | 306.5 M | CLIP_vit_base_patch16_224.yaml | Inference Model/Training Model |
| CLIP_vit_large_patch14_224 | 88.1 | 51.1284 | 1131.28 | 1.04 G | CLIP_vit_large_patch14_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_224 | 83.84 | 12.8473 | 1513.87 | 313.9 M | ConvNeXt_base_224.yaml | Inference Model/Training Model |
| ConvNeXt_base_384 | 84.90 | 31.7607 | 3967.05 | 313.9 M | ConvNeXt_base_384.yaml | Inference Model/Training Model |
| ConvNeXt_large_224 | 84.26 | 26.8103 | 2463.56 | 700.7 M | ConvNeXt_large_224.yaml | Inference Model/Training Model |
| ConvNeXt_large_384 | 85.27 | 66.4058 | 6598.92 | 700.7 M | ConvNeXt_large_384.yaml | Inference Model/Training Model |
| ConvNeXt_small | 83.13 | 9.74075 | 1127.6 | 178.0 M | ConvNeXt_small.yaml | Inference Model/Training Model |
| ConvNeXt_tiny | 82.03 | 5.48923 | 672.559 | 101.4 M | ConvNeXt_tiny.yaml | Inference Model/Training Model |
| FasterNet-L | 83.5 | 23.4415 | - | 357.1 M | FasterNet-L.yaml | Inference Model/Training Model |
| FasterNet-M | 83.0 | 21.8936 | - | 204.6 M | FasterNet-M.yaml | Inference Model/Training Model |
| FasterNet-S | 81.3 | 13.0409 | - | 119.3 M | FasterNet-S.yaml | Inference Model/Training Model |
| FasterNet-T0 | 71.9 | 12.2432 | - | 15.1 M | FasterNet-T0.yaml | Inference Model/Training Model |
| FasterNet-T1 | 75.9 | 11.3562 | - | 29.2 M | FasterNet-T1.yaml | Inference Model/Training Model |
| FasterNet-T2 | 79.1 | 10.703 | - | 57.4 M | FasterNet-T2.yaml | Inference Model/Training Model |
| MobileNetV1_x0_5 | 63.5 | 1.86754 | 7.48297 | 4.8 M | MobileNetV1_x0_5.yaml | Inference Model/Training Model |
| MobileNetV1_x0_25 | 51.4 | 1.83478 | 4.83674 | 1.8 M | MobileNetV1_x0_25.yaml | Inference Model/Training Model |
| MobileNetV1_x0_75 | 68.8 | 2.57903 | 10.6343 | 9.3 M | MobileNetV1_x0_75.yaml | Inference Model/Training Model |
| MobileNetV1_x1_0 | 71.0 | 2.78781 | 13.98 | 15.2 M | MobileNetV1_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x0_5 | 65.0 | 4.94234 | 11.1629 | 7.1 M | MobileNetV2_x0_5.yaml | Inference Model/Training Model |
| MobileNetV2_x0_25 | 53.2 | 4.50856 | 9.40991 | 5.5 M | MobileNetV2_x0_25.yaml | Inference Model/Training Model |
| MobileNetV2_x1_0 | 72.2 | 6.12159 | 16.0442 | 12.6 M | MobileNetV2_x1_0.yaml | Inference Model/Training Model |
| MobileNetV2_x1_5 | 74.1 | 6.28385 | 22.5129 | 25.0 M | MobileNetV2_x1_5.yaml | Inference Model/Training Model |
| MobileNetV2_x2_0 | 75.2 | 6.12888 | 30.8612 | 41.2 M | MobileNetV2_x2_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_5 | 69.2 | 6.31302 | 14.5588 | 9.6 M | MobileNetV3_large_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_35 | 64.3 | 5.76207 | 13.9041 | 7.5 M | MobileNetV3_large_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_large_x0_75 | 73.1 | 8.41737 | 16.9506 | 14.0 M | MobileNetV3_large_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_0 | 75.3 | 8.64112 | 19.1614 | 19.5 M | MobileNetV3_large_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_large_x1_25 | 76.4 | 8.73358 | 22.1296 | 26.5 M | MobileNetV3_large_x1_25.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_5 | 59.2 | 5.16721 | 11.2688 | 6.8 M | MobileNetV3_small_x0_5.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_35 | 53.0 | 5.22053 | 11.0055 | 6.0 M | MobileNetV3_small_x0_35.yaml | Inference Model/Training Model |
| MobileNetV3_small_x0_75 | 66.0 | 5.39831 | 12.8313 | 8.5 M | MobileNetV3_small_x0_75.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_0 | 68.2 | 6.00993 | 12.9598 | 10.5 M | MobileNetV3_small_x1_0.yaml | Inference Model/Training Model |
| MobileNetV3_small_x1_25 | 70.7 | 6.9589 | 14.3995 | 13.0 M | MobileNetV3_small_x1_25.yaml | Inference Model/Training Model |
| MobileNetV4_conv_large | 83.4 | 12.5485 | 51.6453 | 125.2 M | MobileNetV4_conv_large.yaml | Inference Model/Training Model |
| MobileNetV4_conv_medium | 79.9 | 9.65509 | 26.6157 | 37.6 M | MobileNetV4_conv_medium.yaml | Inference Model/Training Model |
| MobileNetV4_conv_small | 74.6 | 5.24172 | 11.0893 | 14.7 M | MobileNetV4_conv_small.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_large | 83.8 | 20.0726 | 213.769 | 145.1 M | MobileNetV4_hybrid_large.yaml | Inference Model/Training Model |
| MobileNetV4_hybrid_medium | 80.5 | 19.7543 | 62.2624 | 42.9 M | MobileNetV4_hybrid_medium.yaml | Inference Model/Training Model |
| PP-HGNet_base | 85.0 | 14.2969 | 327.114 | 249.4 M | PP-HGNet_base.yaml | Inference Model/Training Model |
| PP-HGNet_small | 81.51 | 5.50661 | 119.041 | 86.5 M | PP-HGNet_small.yaml | Inference Model/Training Model |
| PP-HGNet_tiny | 79.83 | 5.22006 | 69.396 | 52.4 M | PP-HGNet_tiny.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0 | 77.77 | 6.53694 | 23.352 | 21.4 M | PP-HGNetV2-B0.yaml | Inference Model/Training Model |
| PP-HGNetV2-B1 | 79.18 | 6.56034 | 27.3099 | 22.6 M | PP-HGNetV2-B1.yaml | Inference Model/Training Model |
| PP-HGNetV2-B2 | 81.74 | 9.60494 | 43.1219 | 39.9 M | PP-HGNetV2-B2.yaml | Inference Model/Training Model |
| PP-HGNetV2-B3 | 82.98 | 11.0042 | 55.1367 | 57.9 M | PP-HGNetV2-B3.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4 | 83.57 | 9.66407 | 54.2462 | 70.4 M | PP-HGNetV2-B4.yaml | Inference Model/Training Model |
| PP-HGNetV2-B5 | 84.75 | 15.7091 | 115.926 | 140.8 M | PP-HGNetV2-B5.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6 | 86.30 | 21.226 | 255.279 | 268.4 M | PP-HGNetV2-B6.yaml | Inference Model/Training Model |
| PP-LCNet_x0_5 | 63.14 | 3.67722 | 6.66857 | 6.7 M | PP-LCNet_x0_5.yaml | Inference Model/Training Model |
| PP-LCNet_x0_25 | 51.86 | 2.65341 | 5.81357 | 5.5 M | PP-LCNet_x0_25.yaml | Inference Model/Training Model |
| PP-LCNet_x0_35 | 58.09 | 2.7212 | 6.28944 | 5.9 M | PP-LCNet_x0_35.yaml | Inference Model/Training Model |
| PP-LCNet_x0_75 | 68.18 | 3.91032 | 8.06953 | 8.4 M | PP-LCNet_x0_75.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0 | 71.32 | 3.84845 | 9.23735 | 10.5 M | PP-LCNet_x1_0.yaml | Inference Model/Training Model |
| PP-LCNet_x1_5 | 73.71 | 3.97666 | 12.3457 | 16.0 M | PP-LCNet_x1_5.yaml | Inference Model/Training Model |
| PP-LCNet_x2_0 | 75.18 | 4.07556 | 16.2752 | 23.2 M | PP-LCNet_x2_0.yaml | Inference Model/Training Model |
| PP-LCNet_x2_5 | 76.60 | 4.06028 | 21.5063 | 32.1 M | PP-LCNet_x2_5.yaml | Inference Model/Training Model |
| PP-LCNetV2_base | 77.05 | 5.23428 | 19.6005 | 23.7 M | PP-LCNetV2_base.yaml | Inference Model/Training Model |
| PP-LCNetV2_large | 78.51 | 6.78335 | 30.4378 | 37.3 M | PP-LCNetV2_large.yaml | Inference Model/Training Model |
| PP-LCNetV2_small | 73.97 | 3.89762 | 13.0273 | 14.6 M | PP-LCNetV2_small.yaml | Inference Model/Training Model |
| ResNet18_vd | 72.3 | 3.53048 | 31.3014 | 41.5 M | ResNet18_vd.yaml | Inference Model/Training Model |
| ResNet18 | 71.0 | 2.4868 | 27.4601 | 41.5 M | ResNet18.yaml | Inference Model/Training Model |
| ResNet34_vd | 76.0 | 5.60675 | 56.0653 | 77.3 M | ResNet34_vd.yaml | Inference Model/Training Model |
| ResNet34 | 74.6 | 4.16902 | 51.925 | 77.3 M | ResNet34.yaml | Inference Model/Training Model |
| ResNet50_vd | 79.1 | 10.1885 | 68.446 | 90.8 M | ResNet50_vd.yaml | Inference Model/Training Model |
| ResNet50 | 76.5 | 9.62383 | 64.8135 | 90.8 M | ResNet50.yaml | Inference Model/Training Model |
| ResNet101_vd | 80.2 | 20.0563 | 124.85 | 158.4 M | ResNet101_vd.yaml | Inference Model/Training Model |
| ResNet101 | 77.6 | 19.2297 | 121.006 | 158.7 M | ResNet101.yaml | Inference Model/Training Model |
| ResNet152_vd | 80.6 | 29.6439 | 181.678 | 214.3 M | ResNet152_vd.yaml | Inference Model/Training Model |
| ResNet152 | 78.3 | 30.0461 | 177.707 | 214.2 M | ResNet152.yaml | Inference Model/Training Model |
| ResNet200_vd | 80.9 | 39.1628 | 235.185 | 266.0 M | ResNet200_vd.yaml | Inference Model/Training Model |
| StarNet-S1 | 73.6 | 9.895 | 23.0465 | 11.2 M | StarNet-S1.yaml | Inference Model/Training Model |
| StarNet-S2 | 74.8 | 7.91279 | 21.9571 | 14.3 M | StarNet-S2.yaml | Inference Model/Training Model |
| StarNet-S3 | 77.0 | 10.7531 | 30.7656 | 22.2 M | StarNet-S3.yaml | Inference Model/Training Model |
| StarNet-S4 | 79.0 | 15.2868 | 43.2497 | 28.9 M | StarNet-S4.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window7_224 | 83.37 | 16.9848 | 383.83 | 310.5 M | SwinTransformer_base_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_base_patch4_window12_384 | 84.17 | 37.2855 | 1178.63 | 311.4 M | SwinTransformer_base_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window7_224 | 86.19 | 27.5498 | 689.729 | 694.8 M | SwinTransformer_large_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_large_patch4_window12_384 | 87.06 | 74.1768 | 2105.22 | 696.1 M | SwinTransformer_large_patch4_window12_384.yaml | Inference Model/Training Model |
| SwinTransformer_small_patch4_window7_224 | 83.21 | 16.3982 | 285.56 | 175.6 M | SwinTransformer_small_patch4_window7_224.yaml | Inference Model/Training Model |
| SwinTransformer_tiny_patch4_window7_224 | 81.10 | 8.54846 | 156.306 | 100.1 M | SwinTransformer_tiny_patch4_window7_224.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [ImageNet-1k](https://www.image-net.org/index.php) validation set Top1 Acc.

## [Image Multi-label Classification Module](../module_usage/tutorials/cv_modules/image_multilabel_classification.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_448_ML | 89.15 | - | - | 325.6 M | CLIP_vit_base_patch16_448_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B0_ML | 80.98 | - | - | 39.6 M | PP-HGNetV2-B0_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B4_ML | 87.96 | - | - | 88.5 M | PP-HGNetV2-B4_ML.yaml | Inference Model/Training Model |
| PP-HGNetV2-B6_ML | 91.06 | - | - | 286.5 M | PP-HGNetV2-B6_ML.yaml | Inference Model/Training Model |
| PP-LCNet_x1_0_ML | 77.96 | - | - | 29.4 M | PP-LCNet_x1_0_ML.yaml | Inference Model/Training Model |
| ResNet50_ML | 83.42 | - | - | 108.9 M | ResNet50_ML.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are for the multi-label classification task mAP on [COCO2017](https://cocodataset.org/#home).

## [Pedestrian Attribute Module](../module_usage/tutorials/cv_modules/pedestrian_attribute_recognition.en.md)

| Model Name | mA (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_pedestrian_attribute | 92.2 | 3.84845 | 9.23735 | 6.7 M | PP-LCNet_x1_0_pedestrian_attribute.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are the mA on PaddleX's internal dataset.

## [Vehicle Attribute Module](../module_usage/tutorials/cv_modules/vehicle_attribute_recognition.en.md)

| Model Name | mA (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_vehicle_attribute | 91.7 | 3.84845 | 9.23735 | 6.7 M | PP-LCNet_x1_0_vehicle_attribute.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the VeRi dataset mA.

## [Image Feature Module](../module_usage/tutorials/cv_modules/image_feature.en.md)

| Model Name | recall@1 (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_rec | 84.2 | 5.23428 | 19.6005 | 16.3 M | PP-ShiTuV2_rec.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 13.1957 | 285.493 | 306.6 M | PP-ShiTuV2_rec_CLIP_vit_base.yaml | Inference Model/Training Model |
| PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 51.1284 | 1131.28 | 1.05 G | PP-ShiTuV2_rec_CLIP_vit_large.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the AliProducts recall@1.

## [Document Orientation Classification Module](../module_usage/tutorials/ocr_modules/doc_img_orientation_classification.en.md)

| Model Name | Top-1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | 99.06 | 3.84845 | 9.23735 | 7 M | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are the Top-1 Acc on PaddleX's internal dataset.

## [Face Feature Module](../module_usage/tutorials/cv_modules/face_feature.en.md)

| Model Name | Output Feature Dimension | Acc (%) (AgeDB-30/CFP-FP/LFW) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|
| MobileFaceNet | 128 | 96.28/96.71/99.58 | 5.7 | 101.6 | 4.1 | MobileFaceNet.yaml | Inference Model/Training Model |
| ResNet50_face | 512 | 98.12/98.56/99.77 | 8.7 | 200.7 | 87.2 | ResNet50_face.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured on the AgeDB-30, CFP-FP, and LFW datasets.

## [Main Body Detection Module](../module_usage/tutorials/cv_modules/mainbody_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_det | 41.5 | 33.7 | 537.0 | 27.54 | PP-ShiTuV2_det.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [PaddleClas Main Body Detection Dataset](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md) mAP(0.5:0.95).

## [Object Detection Module](../module_usage/tutorials/cv_modules/object_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Cascade-FasterRCNN-ResNet50-FPN | 41.1 | - | - | 245.4 M | Cascade-FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN | 45.0 | - | - | 246.2 M | Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| CenterNet-DLA-34 | 37.6 | - | - | 75.4 M | CenterNet-DLA-34.yaml | Inference Model/Training Model |
| CenterNet-ResNet50 | 38.9 | - | - | 319.7 M | CenterNet-ResNet50.yaml | Inference Model/Training Model |
| DETR-R50 | 42.3 | 59.2132 | 5334.52 | 159.3 M | DETR-R50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet34-FPN | 37.8 | - | - | 137.5 M | FasterRCNN-ResNet34-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-FPN | 38.4 | - | - | 148.1 M | FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-FPN | 39.5 | - | - | 148.1 M | FasterRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50-vd-SSLDv2-FPN | 41.4 | - | - | 148.1 M | FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet50 | 36.7 | - | - | 120.2 M | FasterRCNN-ResNet50.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101-FPN | 41.4 | - | - | 216.3 M | FasterRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-ResNet101 | 39.0 | - | - | 188.1 M | FasterRCNN-ResNet101.yaml | Inference Model/Training Model |
| FasterRCNN-ResNeXt101-vd-FPN | 43.4 | - | - | 360.6 M | FasterRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| FasterRCNN-Swin-Tiny-FPN | 42.6 | - | - | 159.8 M | FasterRCNN-Swin-Tiny-FPN.yaml | Inference Model/Training Model |
| FCOS-ResNet50 | 39.6 | 103.367 | 3424.91 | 124.2 M | FCOS-ResNet50.yaml | Inference Model/Training Model |
| PicoDet-L | 42.6 | 16.6715 | 169.904 | 20.9 M | PicoDet-L.yaml | Inference Model/Training Model |
| PicoDet-M | 37.5 | 16.2311 | 71.7257 | 16.8 M | PicoDet-M.yaml | Inference Model/Training Model |
| PicoDet-S | 29.1 | 14.097 | 37.6563 | 4.4 M | PicoDet-S.yaml | Inference Model/Training Model |
| PicoDet-XS | 26.2 | 13.8102 | 48.3139 | 5.7 M | PicoDet-XS.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-L | 52.9 | 33.5644 | 814.825 | 185.3 M | PP-YOLOE_plus-L.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-M | 49.8 | 19.843 | 449.261 | 83.2 M | PP-YOLOE_plus-M.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S | 43.7 | 16.8884 | 223.059 | 28.3 M | PP-YOLOE_plus-S.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-X | 54.7 | 57.8995 | 1439.93 | 349.4 M | PP-YOLOE_plus-X.yaml | Inference Model/Training Model |
| RT-DETR-H | 56.3 | 114.814 | 3933.39 | 435.8 M | RT-DETR-H.yaml | Inference Model/Training Model |
| RT-DETR-L | 53.0 | 34.5252 | 1454.27 | 113.7 M | RT-DETR-L.yaml | Inference Model/Training Model |
| RT-DETR-R18 | 46.5 | 19.89 | 784.824 | 70.7 M | RT-DETR-R18.yaml | Inference Model/Training Model |
| RT-DETR-R50 | 53.1 | 41.9327 | 1625.95 | 149.1 M | RT-DETR-R50.yaml | Inference Model/Training Model |
| RT-DETR-X | 54.8 | 61.8042 | 2246.64 | 232.9 M | RT-DETR-X.yaml | Inference Model/Training Model |
| YOLOv3-DarkNet53 | 39.1 | 40.1055 | 883.041 | 219.7 M | YOLOv3-DarkNet53.yaml | Inference Model/Training Model |
| YOLOv3-MobileNetV3 | 31.4 | 18.6692 | 267.214 | 83.8 M | YOLOv3-MobileNetV3.yaml | Inference Model/Training Model |
| YOLOv3-ResNet50_vd_DCN | 40.6 | 31.6276 | 856.047 | 163.0 M | YOLOv3-ResNet50_vd_DCN.yaml | Inference Model/Training Model |
| YOLOX-L | 50.1 | 185.691 | 1250.58 | 192.5 M | YOLOX-L.yaml | Inference Model/Training Model |
| YOLOX-M | 46.9 | 123.324 | 688.071 | 90.0 M | YOLOX-M.yaml | Inference Model/Training Model |
| YOLOX-N | 26.1 | 79.1665 | 155.59 | 3.4 M | YOLOX-N.yaml | Inference Model/Training Model |
| YOLOX-S | 40.4 | 184.828 | 474.446 | 32.0 M | YOLOX-S.yaml | Inference Model/Training Model |
| YOLOX-T | 32.9 | 102.748 | 212.52 | 18.1 M | YOLOX-T.yaml | Inference Model/Training Model |
| YOLOX-X | 51.8 | 227.361 | 2067.84 | 351.5 M | YOLOX-X.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the COCO2017 validation set mAP(0.5:0.95).

## [Small Object Detection Module](../module_usage/tutorials/cv_modules/small_object_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE_plus_SOD-S | 25.1 | 65.4608 | 324.37 | 77.3 M | PP-YOLOE_plus_SOD-S.yaml | Inference Model/Training Model |
| PP-YOLOE_plus_SOD-L | 31.9 | 57.1448 | 1006.98 | 325.0 M | PP-YOLOE_plus_SOD-L.yaml | Inference Model/Training Model |
| PP-YOLOE_plus_SOD-largesize-L | 42.7 | 458.521 | 11172.7 | 340.5 M | PP-YOLOE_plus_SOD-largesize-L.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of [VisDrone-DET](https://github.com/VisDrone/VisDrone-Dataset).

## [Open-Vocabulary Object Detection](../module_usage/tutorials/cv_modules/open_vocabulary_detection.en.md)

| Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Model Download Link |
|---|---|---|---|---|---|---|
| GroundingDINO-T | 49.4 | 64.4 | 253.72 | 1807.4 | 658.3 | Inference Model |

Note: The above accuracy metrics are based on the COCO val2017 validation set mAP(0.5:0.95). All models' GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Open Vocabulary Segmentation](../module_usage/tutorials/cv_modules/open_vocabulary_segmentation.en.md)

| Model | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Model Download Link |
|---|---|---|---|---|
| SAM-H_box | 144.9 | 33920.7 | 2433.7 | Inference Model |
| SAM-H_point | 144.9 | 33920.7 | 2433.7 | Inference Model |

Note: All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Rotated Object Detection](../module_usage/tutorials/cv_modules/rotated_object_detection.en.md)

| Model | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-R-L | 78.14 | 20.7039 | 157.942 | 211.0 M | PP-YOLOE-R.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the DOTA validation set mAP(0.5:0.95). All model GPU inference times are based on an NVIDIA RTX 2080 Ti with FP16 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Pedestrian Detection Module](../module_usage/tutorials/cv_modules/human_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_human | 48.0 | 32.7754 | 777.691 | 196.1 M | PP-YOLOE-L_human.yaml | Inference Model/Training Model |
| PP-YOLOE-S_human | 42.5 | 15.0118 | 179.317 | 28.8 M | PP-YOLOE-S_human.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of [CrowdHuman](https://bj.bcebos.com/v1/paddledet/data/crowdhuman.zip).

## [Vehicle Detection Module](../module_usage/tutorials/cv_modules/vehicle_detection.en.md)

| Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-YOLOE-L_vehicle | 63.9 | 32.5619 | 775.633 | 196.1 M | PP-YOLOE-L_vehicle.yaml | Inference Model/Training Model |
| PP-YOLOE-S_vehicle | 61.3 | 15.3787 | 178.441 | 28.8 M | PP-YOLOE-S_vehicle.yaml | Inference Model/Training Model |

Note: The above precision metrics are based on the validation set mAP(0.5:0.95) of [PPVehicle](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/modules/ppvehicle).

## [Face Detection Module](../module_usage/tutorials/cv_modules/face_detection.en.md)

| Model Name | AP (%) (Easy/Medium/Hard) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| BlazeFace | 77.7/73.4/49.5 | 49.9 | 68.2 | 0.447 M | BlazeFace.yaml | Inference Model/Training Model |
| BlazeFace-FPN-SSH | 83.2/80.5/60.5 | 52.4 | 73.2 | 0.606 M | BlazeFace-FPN-SSH.yaml | Inference Model/Training Model |
| PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | 33.7 | 185.1 | 28.9 M | PicoDet_LCNet_x2_5_face.yaml | Inference Model/Training Model |
| PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | 25.8 | 159.9 | 26.5 M | PP-YOLOE_plus-S_face.yaml | Inference Model/Training Model |

**Note: The above precision metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640.**

## [Anomaly Detection Module](../module_usage/tutorials/cv_modules/anomaly_detection.en.md)

| Model Name | mIoU | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| STFPM | 0.9901 | - | - | 22.5 M | STFPM.yaml | Inference Model/Training Model |

Note: The above precision metrics are the average anomaly scores on the validation set of [MVTec AD](https://www.mvtec.com/company/research/datasets/mvtec-ad).

## [Human Keypoint Detection Module](../module_usage/tutorials/cv_modules/human_keypoint_detection.en.md)

| Model | Scheme | Input Size | AP(0.5:0.95) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|---|---|
| PP-TinyPose_128x96 | Top-Down | 128*96 | 58.4 | | | 4.9 | PP-TinyPose_128x96.yaml | Inference Model/Training Model |
| PP-TinyPose_256x192 | Top-Down | 256*192 | 68.3 | | | 4.9 | PP-TinyPose_256x192.yaml | Inference Model/Training Model |

**Note: The above accuracy metrics are based on the COCO dataset AP(0.5:0.95), with detection boxes obtained from ground truth annotations. All GPU inference times are based on an NVIDIA Tesla T4 machine, with precision type FP32. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and precision type FP32.**

## [3D Multi-modal Fusion Detection Module](../module_usage/tutorials/cv_modules/3d_bev_detection.en.md)

| Model | mAP (%) | NDS | yaml File | Model Download Link |
|---|---|---|---|---|
| BEVFusion | 53.9 | 60.9 | BEVFusion.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are the mAP(0.5:0.95) and NDS on the nuScenes validation set, and the precision type is FP32.

## [Semantic Segmentation Module](../module_usage/tutorials/cv_modules/semantic_segmentation.en.md)

| Model Name | mIoU (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Deeplabv3_Plus-R50 | 80.36 | 61.0531 | 1513.58 | 94.9 M | Deeplabv3_Plus-R50.yaml | Inference Model/Training Model |
| Deeplabv3_Plus-R101 | 81.10 | 100.026 | 2460.71 | 162.5 M | Deeplabv3_Plus-R101.yaml | Inference Model/Training Model |
| Deeplabv3-R50 | 79.90 | 82.2631 | 1735.83 | 138.3 M | Deeplabv3-R50.yaml | Inference Model/Training Model |
| Deeplabv3-R101 | 80.85 | 121.492 | 2685.51 | 205.9 M | Deeplabv3-R101.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W18 | 80.67 | 48.2335 | 906.385 | 43.1 M | OCRNet_HRNet-W18.yaml | Inference Model/Training Model |
| OCRNet_HRNet-W48 | 82.15 | 78.9976 | 2226.95 | 249.8 M | OCRNet_HRNet-W48.yaml | Inference Model/Training Model |
| PP-LiteSeg-T | 73.10 | 7.6827 | 138.683 | 28.5 M | PP-LiteSeg-T.yaml | Inference Model/Training Model |
| PP-LiteSeg-B | 75.25 | 10.9935 | 194.727 | 47.0 M | PP-LiteSeg-B.yaml | Inference Model/Training Model |
| SegFormer-B0 (slice) | 76.73 | 11.1946 | 268.929 | 13.2 M | SegFormer-B0.yaml | Inference Model/Training Model |
| SegFormer-B1 (slice) | 78.35 | 17.9998 | 403.393 | 48.5 M | SegFormer-B1.yaml | Inference Model/Training Model |
| SegFormer-B2 (slice) | 81.60 | 48.0371 | 1248.52 | 96.9 M | SegFormer-B2.yaml | Inference Model/Training Model |
| SegFormer-B3 (slice) | 82.47 | 64.341 | 1666.35 | 167.3 M | SegFormer-B3.yaml | Inference Model/Training Model |
| SegFormer-B4 (slice) | 82.38 | 82.4336 | 1995.42 | 226.7 M | SegFormer-B4.yaml | Inference Model/Training Model |
| SegFormer-B5 (slice) | 82.58 | 97.3717 | 2420.19 | 229.7 M | SegFormer-B5.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [Cityscapes](https://www.cityscapes-dataset.com/) dataset mIoU.

| Model Name | mIoU (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SeaFormer_base (slice) | 40.92 | 24.4073 | 397.574 | 30.8 M | SeaFormer_base.yaml | Inference Model/Training Model |
| SeaFormer_large (slice) | 43.66 | 27.8123 | 550.464 | 49.8 M | SeaFormer_large.yaml | Inference Model/Training Model |
| SeaFormer_small (slice) | 38.73 | 19.2295 | 358.343 | 14.3 M | SeaFormer_small.yaml | Inference Model/Training Model |
| SeaFormer_tiny (slice) | 34.58 | 13.9496 | 330.132 | 6.1 M | SeaFormer_tiny.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the [ADE20k](https://groups.csail.mit.edu/vision/datasets/ADE20K/) dataset. "Slice" indicates that the input images have been cropped.

## [Instance Segmentation Module](../module_usage/tutorials/cv_modules/instance_segmentation.en.md)

| Model Name | Mask AP | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| Mask-RT-DETR-H | 50.6 | 132.693 | 4896.17 | 449.9 M | Mask-RT-DETR-H.yaml | Inference Model/Training Model |
| Mask-RT-DETR-L | 45.7 | 46.5059 | 2575.92 | 113.6 M | Mask-RT-DETR-L.yaml | Inference Model/Training Model |
| Mask-RT-DETR-M | 42.7 | 36.8329 | - | 66.6 M | Mask-RT-DETR-M.yaml | Inference Model/Training Model |
| Mask-RT-DETR-S | 41.0 | 33.5007 | - | 51.8 M | Mask-RT-DETR-S.yaml | Inference Model/Training Model |
| Mask-RT-DETR-X | 47.5 | 75.755 | 3358.04 | 237.5 M | Mask-RT-DETR-X.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-FPN | 36.3 | - | - | 254.8 M | Cascade-MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN | 39.1 | - | - | 254.7 M | Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-FPN | 35.6 | - | - | 157.5 M | MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50-vd-FPN | 36.4 | - | - | 157.5 M | MaskRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet50 | 32.8 | - | - | 127.8 M | MaskRCNN-ResNet50.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-FPN | 36.6 | - | - | 225.4 M | MaskRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNet101-vd-FPN | 38.1 | - | - | 225.1 M | MaskRCNN-ResNet101-vd-FPN.yaml | Inference Model/Training Model |
| MaskRCNN-ResNeXt101-vd-FPN | 39.5 | - | - | 370.0 M | MaskRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
| PP-YOLOE_seg-S | 32.5 | - | - | 31.5 M | PP-YOLOE_seg-S.yaml | Inference Model/Training Model |
| SOLOv2 | 35.5 | - | - | 179.1 M | SOLOv2.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the Mask AP(0.5:0.95) on the [COCO2017](https://cocodataset.org/#home) validation set.

## [Text Detection Module](../module_usage/tutorials/ocr_modules/text_detection.en.md)
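
The detection models below use the same interface as the earlier sketch. As a hedged example (again assuming the PaddleX 3.x `create_model` API, with a placeholder input path):

```python
# Text detection sketch: the result contains detected text regions
# (quadrilateral boxes), not recognized text.
from paddlex import create_model

model = create_model("PP-OCRv4_mobile_det")
for res in model.predict("path/to/document.png", batch_size=1):
    res.print()                   # polygon coordinates of detected text
    res.save_to_img("./output/")  # input image with detected boxes drawn
```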

| Model | Detection Hmean (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_server_det | 82.56 | 83.3501 | 2434.01 | 109 | PP-OCRv4_server_det.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_det | 77.35 | 10.6923 | 120.177 | 4.7 | PP-OCRv4_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_det | 78.68 | | | 2.1 | PP-OCRv3_mobile_det.yaml | Inference Model/Training Model |
| PP-OCRv3_server_det | 80.11 | | | 102.1 | PP-OCRv3_server_det.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the self-built Chinese and English dataset of PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 593 images for text recognition. The GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision type, while the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.

## [Seal Text Detection Module](../module_usage/tutorials/ocr_modules/seal_text_detection.en.md)

| Model Name | Detection Hmean (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_mobile_seal_det | 96.47 | 10.5878 | 131.813 | 4.7 M | PP-OCRv4_mobile_seal_det.yaml | Inference Model/Training Model |
| PP-OCRv4_server_seal_det | 98.21 | 84.341 | 2425.06 | 108.3 M | PP-OCRv4_server_seal_det.yaml | Inference Model/Training Model |

Note: The evaluation set for the above precision metrics is the seal dataset built by PaddleX, which includes 500 seal images.

## [Text Recognition Module](../module_usage/tutorials/ocr_modules/text_recognition.en.md)

* Chinese Text Recognition Models

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-OCRv4_server_rec_doc | 81.53 | | | 74.7 M | PP-OCRv4_server_rec_doc.yaml | Inference Model/Training Model |
| PP-OCRv4_mobile_rec | 78.74 | 7.95018 | 46.7868 | 10.6 M | PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| PP-OCRv4_server_rec | 80.61 | 7.19439 | 140.179 | 71.2 M | PP-OCRv4_server_rec.yaml | Inference Model/Training Model |
| PP-OCRv3_mobile_rec | 72.96 | | | 9.2 M | PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 8367 images for text recognition. All models' GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision, while CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_SVTRv2_rec | 68.81 | 8.36801 | 165.706 | 73.9 M | ch_SVTRv2_rec.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard A. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| ch_RepSVTR_rec | 65.07 | 10.5047 | 51.5647 | 22.1 M | ch_RepSVTR_rec.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard B. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

* English Recognition Models

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| en_PP-OCRv4_mobile_rec | 70.39 | | | 6.8 M | en_PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
| en_PP-OCRv3_mobile_rec | 70.69 | | | 7.8 M | en_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |

* Multilingual Recognition Models

| Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| korean_PP-OCRv3_mobile_rec | 60.21 | | | 8.6 M | korean_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| japan_PP-OCRv3_mobile_rec | 45.69 | | | 8.8 M | japan_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| chinese_cht_PP-OCRv3_mobile_rec | 82.06 | | | 9.7 M | chinese_cht_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| te_PP-OCRv3_mobile_rec | 95.88 | | | 7.8 M | te_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ka_PP-OCRv3_mobile_rec | 96.96 | | | 8.0 M | ka_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| ta_PP-OCRv3_mobile_rec | 76.83 | | | 8.0 M | ta_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| latin_PP-OCRv3_mobile_rec | 76.93 | | | 7.8 M | latin_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| arabic_PP-OCRv3_mobile_rec | 73.55 | | | 7.8 M | arabic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| cyrillic_PP-OCRv3_mobile_rec | 94.28 | | | 7.9 M | cyrillic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
| devanagari_PP-OCRv3_mobile_rec | 96.44 | | | 7.9 M | devanagari_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is a multi-language dataset built by PaddleX. All model GPU inference times are based on NVIDIA Tesla T4 machines, with precision type FP32. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.

## [Formula Recognition Module](../module_usage/tutorials/ocr_modules/formula_recognition.en.md)

| Model | Avg-BLEU | GPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|
| UniMERNet | 0.8613 | 2266.96 | 1.4 G | UniMERNet.yaml | Inference Model/Training Model |
| PP-FormulaNet-S | 0.8712 | 202.25 | 167.9 M | PP-FormulaNet-S.yaml | Inference Model/Training Model |
| PP-FormulaNet-L | 0.9213 | 1976.52 | 535.2 M | PP-FormulaNet-L.yaml | Inference Model/Training Model |
| LaTeX_OCR_rec | 0.7163 | - | 89.7 M | LaTeX_OCR_rec.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the internal formula recognition test set of PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.

## [Table Structure Recognition Module](../module_usage/tutorials/ocr_modules/table_structure_recognition.en.md)

| Model | Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| SLANet | 59.52 | 522.536 | 1845.37 | 6.9 M | SLANet.yaml | Inference Model/Training Model |
| SLANet_plus | 63.69 | 522.536 | 1845.37 | 6.9 M | SLANet_plus.yaml | Inference Model/Training Model |
| SLANeXt_wired | 69.65 | - | - | - | SLANeXt_wired.yaml | Inference Model/Training Model |
| SLANeXt_wireless | | | | | SLANeXt_wireless.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the high-difficulty Chinese table recognition dataset built internally by PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision type. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.

## [Table Cell Detection Module](../module_usage/tutorials/ocr_modules/table_cells_detection.en.md)

| Model | mAP (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| RT-DETR-L_wired_table_cell_det | - | - | - | - | RT-DETR-L_wired_table_cell_det.yaml | Inference Model/Training Model |
| RT-DETR-L_wireless_table_cell_det | | | | | RT-DETR-L_wireless_table_cell_det.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the internal table cell detection dataset of PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine, with precision type FP32. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.

## [Table Classification Module](../module_usage/tutorials/ocr_modules/table_classification.en.md)

| Model | Top1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_table_cls | - | - | - | - | PP-LCNet_x1_0_table_cls.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the internal table classification dataset built by PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Text Image Unwarping Module](../module_usage/tutorials/ocr_modules/text_image_unwarping.en.md)

| Model Name | MS-SSIM (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| UVDoc | 54.40 | - | - | 30.3 M | UVDoc.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the image unwarping dataset built by PaddleX.

## [Layout Detection Module](../module_usage/tutorials/ocr_modules/layout_detection.en.md)

* Table Layout Detection Model

| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x_table | 97.5 | 12.623 | 90.8934 | 7.4 M | PicoDet_layout_1x_table.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is the layout table area detection dataset built by PaddleOCR, which contains 7835 images of document types with tables in both Chinese and English. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

* 3-class layout detection model, including tables, images, and seals

| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_3cls | 88.2 | 13.5 | 45.8 | 4.8 | PicoDet-S_layout_3cls.yaml | Inference Model/Training Model |
| PicoDet-L_layout_3cls | 89.0 | 15.7 | 159.8 | 22.6 | PicoDet-L_layout_3cls.yaml | Inference Model/Training Model |
| RT-DETR-H_layout_3cls | 95.8 | 114.6 | 3832.6 | 470.1 | RT-DETR-H_layout_3cls.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 1,154 common types of document images such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

* 5-class English document layout detection model, including text, title, table, image, and list

| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x | 97.8 | 13.0 | 91.3 | 7.4 | PicoDet_layout_1x.yaml | Inference Model/Training Model |

Note: The evaluation dataset for the above accuracy metrics is the [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) evaluation dataset, which contains 11,245 images of English documents. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

* 17-class layout detection model, including 17 common layout categories: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, and seal

| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_17cls | 87.4 | 13.6 | 46.2 | 4.8 | PicoDet-S_layout_17cls.yaml | Inference Model/Training Model |
| PicoDet-L_layout_17cls | 89.0 | 17.2 | 160.2 | 22.6 | PicoDet-L_layout_17cls.yaml | Inference Model/Training Model |
| RT-DETR-H_layout_17cls | 98.3 | 115.1 | 3827.2 | 470.2 | RT-DETR-H_layout_17cls.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 892 images of common document types such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Document Image Orientation Classification Module](../module_usage/tutorials/ocr_modules/doc_img_orientation_classification.en.md)

| Model | Top-1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | 99.06 | 3.84845 | 9.23735 | 7 | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |

Note: The evaluation set for the above accuracy metrics is a self-built dataset covering multiple scenarios such as documents and certificates, with 1000 images. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Time Series Forecasting Module](../module_usage/tutorials/time_series_modules/time_series_forecasting.en.md)
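
The time series modules reuse the same `create_model`/`predict` interface, but take CSV files as input. A sketch under that assumption (paths are placeholders, and `save_to_csv` is assumed to be the result-saving helper for time series outputs):

```python
# Time series forecasting sketch: the input is a CSV with a time column
# and target column(s), as expected by the PaddleX time series modules.
from paddlex import create_model

model = create_model("DLinear")
for res in model.predict("path/to/series.csv", batch_size=1):
    res.print()                   # forecasted future values
    res.save_to_csv("./output/")  # assumed helper: write the forecast to CSV
```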

| Model Name | MSE | MAE | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|
| DLinear | 0.382 | 0.394 | 72 K | DLinear.yaml | Inference Model/Training Model |
| NLinear | 0.386 | 0.392 | 40 K | NLinear.yaml | Inference Model/Training Model |
| Nonstationary | 0.600 | 0.515 | 55.5 M | Nonstationary.yaml | Inference Model/Training Model |
| PatchTST | 0.379 | 0.391 | 2.0 M | PatchTST.yaml | Inference Model/Training Model |
| RLinear | 0.385 | 0.392 | 40 K | RLinear.yaml | Inference Model/Training Model |
| TiDE | 0.407 | 0.414 | 31.7 M | TiDE.yaml | Inference Model/Training Model |
| TimesNet | 0.416 | 0.429 | 4.9 M | TimesNet.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the [ETTH1](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/Etth1.tar) dataset (evaluation results on the test.csv test set).

## [Time Series Anomaly Detection Module](../module_usage/tutorials/time_series_modules/time_series_anomaly_detection.en.md)

| Model Name | Precision | Recall | F1 Score | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|---|---|
| AutoEncoder_ad | 99.36 | 84.36 | 91.25 | 52 K | AutoEncoder_ad.yaml | Inference Model/Training Model |
| DLinear_ad | 98.98 | 93.96 | 96.41 | 112 K | DLinear_ad.yaml | Inference Model/Training Model |
| Nonstationary_ad | 98.55 | 88.95 | 93.51 | 1.8 M | Nonstationary_ad.yaml | Inference Model/Training Model |
| PatchTST_ad | 98.78 | 90.70 | 94.57 | 320 K | PatchTST_ad.yaml | Inference Model/Training Model |

Note: The above precision metrics are measured from the [PSM](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/ts_anomaly_examples.tar) dataset.

## [Time Series Classification Module](../module_usage/tutorials/time_series_modules/time_series_classification.en.md)

| Model Name | Acc (%) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|
| TimesNet_cls | 87.5 | 792 K | TimesNet_cls.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are measured from the [UWaveGestureLibrary](https://paddlets.bj.bcebos.com/classification/UWaveGestureLibrary_TEST.csv) dataset.

> Note: The GPU inference time for all models above is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## [Multilingual Speech Recognition Module](../module_usage/tutorials/speech_modules/multilingual_speech_recognition.en.md)

| Model | Training Data | Model Storage Size | Word Error Rate | yaml File | Model Download Link |
|---|---|---|---|---|---|
| whisper_large | 680kh | 5.8 G | 2.7 (LibriSpeech) | whisper_large.yaml | Inference Model |
| whisper_medium | 680kh | 2.9 G | - | whisper_medium.yaml | Inference Model |
| whisper_small | 680kh | 923 M | - | whisper_small.yaml | Inference Model |
| whisper_base | 680kh | 277 M | - | whisper_base.yaml | Inference Model |
| whisper_tiny | 680kh | 145 M | - | whisper_tiny.yaml | Inference Model |

## [Video Classification Module](../module_usage/tutorials/video_modules/video_classification.en.md)

| Model | Top1 Acc (%) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|
| PP-TSM-R50_8frames_uniform | 74.36 | 93.4 M | PP-TSM-R50_8frames_uniform.yaml | Inference Model/Training Model |
| PP-TSMv2-LCNetV2_8frames_uniform | 71.71 | 22.5 M | PP-TSMv2-LCNetV2_8frames_uniform.yaml | Inference Model/Training Model |
| PP-TSMv2-LCNetV2_16frames_uniform | 73.11 | 22.5 M | PP-TSMv2-LCNetV2_16frames_uniform.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the K400 validation set Top1 Acc.

## [Video Detection Module](../module_usage/tutorials/video_modules/video_detection.en.md)
| Model | Frame-mAP (@ IoU 0.5) | Model Storage Size | yaml File | Model Download Link |
|---|---|---|---|---|
| YOWO | 80.94 | 462.891 M | YOWO.yaml | Inference Model/Training Model |

Note: The above accuracy metrics are based on the test dataset UCF101-24, using the Frame-mAP (@ IoU 0.5) metric. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.