


PaddleX Model List (Enflame GCU)

PaddleX provides multiple pipelines. Each pipeline contains several modules, and each module offers several models. Use the benchmark data below to choose a model: if accuracy matters most, pick a model with higher accuracy; if storage footprint matters most, pick a model with a smaller size.

Image Classification Module

| Model Name | Top-1 Accuracy (%) | Model Size (M) | Model Download Link |
|---|---|---|---|
| ResNet50 | 76.96 | 90.8 | Inference Model/Trained Model |

Note: The above accuracy metrics refer to Top-1 Accuracy on the ImageNet-1k validation set.

Object Detection Module

| Model Name | mAP (%) | Model Size (M) | Model Download Link |
|---|---|---|---|
| PP-YOLOE_plus-L | 52.8 | 185.3 | Inference Model/Trained Model |
| PP-YOLOE_plus-M | 49.7 | 83.2 | Inference Model/Trained Model |
| PP-YOLOE_plus-S | 43.6 | 28.3 | Inference Model/Trained Model |
| PP-YOLOE_plus-X | 54.7 | 349.4 | Inference Model/Trained Model |
| RT-DETR-H | 56.3 | 435.8 | Inference Model/Trained Model |
| RT-DETR-L | 53.0 | 113.7 | Inference Model/Trained Model |
| RT-DETR-R18 | 46.5 | 70.7 | Inference Model/Trained Model |
| RT-DETR-R50 | 53.1 | 149.1 | Inference Model/Trained Model |
| RT-DETR-X | 54.8 | 232.9 | Inference Model/Trained Model |

Note: The above accuracy metrics are mAP(0.5:0.95) on the COCO2017 validation set.
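
The accuracy-versus-size guidance given at the top of this page can be sketched in a few lines of Python. The snippet below is only an illustration: the helper `pick_most_accurate` and the size budgets are hypothetical, not part of PaddleX, and the figures are copied from the Object Detection table.

```python
from typing import List, Optional, Tuple

# (model name, mAP %, model size in M), copied from the Object Detection table.
DETECTION_MODELS: List[Tuple[str, float, float]] = [
    ("PP-YOLOE_plus-L", 52.8, 185.3),
    ("PP-YOLOE_plus-M", 49.7, 83.2),
    ("PP-YOLOE_plus-S", 43.6, 28.3),
    ("PP-YOLOE_plus-X", 54.7, 349.4),
    ("RT-DETR-H", 56.3, 435.8),
    ("RT-DETR-L", 53.0, 113.7),
    ("RT-DETR-R18", 46.5, 70.7),
    ("RT-DETR-R50", 53.1, 149.1),
    ("RT-DETR-X", 54.8, 232.9),
]


def pick_most_accurate(
    models: List[Tuple[str, float, float]], max_size_m: float
) -> Optional[str]:
    """Hypothetical helper: highest-mAP model whose size fits the budget."""
    candidates = [m for m in models if m[2] <= max_size_m]
    if not candidates:
        return None  # nothing fits the size budget
    return max(candidates, key=lambda m: m[1])[0]


# Illustrative budgets; choose a value that matches your deployment limit.
print(pick_most_accurate(DETECTION_MODELS, 150.0))
print(pick_most_accurate(DETECTION_MODELS, 100.0))
```

The same pattern applies to the other modules: swap in the rows of the relevant table and the metric of interest.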

Text Detection Module

| Model Name | Detection Hmean (%) | Model Size (M) | Model Download Link |
|---|---|---|---|
| PP-OCRv4_mobile_det | 77.79 | 4.2 | Inference Model/Trained Model |
| PP-OCRv4_server_det | 82.69 | 100.1 | Inference Model/Trained Model |

Note: The above accuracy metrics are evaluated on PaddleOCR's self-built Chinese dataset, covering street scenes, web images, documents, and handwritten scenarios, with 500 images for detection.
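
This page does not define Hmean. In text-detection benchmarks it usually denotes the harmonic mean of precision and recall (the F1 score), and the sketch below assumes that definition. The function name and the input values are illustrative only and are not taken from the benchmark above.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of Hmean)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Made-up precision and recall, chosen only to show the arithmetic.
print(round(hmean(0.80, 0.60), 4))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector needs both high precision and high recall to score well.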

Text Recognition Module

| Model Name | Recognition Avg Accuracy (%) | Model Size (M) | Model Download Link |
|---|---|---|---|
| PP-OCRv4_mobile_rec | 78.20 | 10.6 | Inference Model/Trained Model |
| PP-OCRv4_server_rec | 79.20 | 71.2 | Inference Model/Trained Model |

Note: The above accuracy metrics are evaluated on PaddleOCR's self-built Chinese dataset, covering street scenes, web images, documents, and handwritten scenarios, with 11,000 images for text recognition.