
Add layout (#2303)

* add layout

* add layout
Sunflower7788 1 year ago
parent
commit
a9ebee2b6d
27 changed files with 779 additions and 21 deletions
  1. docs/module_usage/tutorials/ocr_modules/layout_detection.md (+3 -0)
  2. docs/module_usage/tutorials/ocr_modules/layout_detection_en.md (+4 -1)
  3. docs/pipeline_deploy/high_performance_inference.md (+2 -1)
  4. docs/pipeline_deploy/high_performance_inference_en.md (+2 -1)
  5. docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction.md (+3 -0)
  6. docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_en.md (+4 -1)
  7. docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.md (+5 -0)
  8. docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_en.md (+5 -0)
  9. docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md (+4 -0)
  10. docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition_en.md (+5 -1)
  11. docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.md (+9 -6)
  12. docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_en.md (+9 -6)
  13. docs/practical_tutorials/document_scene_information_extraction(layout_detection)_tutorial.md (+4 -1)
  14. docs/practical_tutorials/document_scene_information_extraction(layout_detection)_tutorial_en.md (+4 -1)
  15. docs/support_list/models_list.md (+4 -1)
  16. docs/support_list/models_list_en.md (+3 -0)
  17. paddlex/configs/structure_analysis/PicoDet-L_layout_17cls.yaml (+40 -0)
  18. paddlex/configs/structure_analysis/PicoDet-S_layout_17cls.yaml (+40 -0)
  19. paddlex/configs/structure_analysis/PicoDet-S_layout_3cls.yaml (+40 -0)
  20. paddlex/inference/utils/official_models.py (+3 -0)
  21. paddlex/modules/object_detection/model_list.py (+3 -0)
  22. paddlex/repo_apis/PaddleDetection_api/configs/PicoDet-L_layout_17cls.yaml (+165 -0)
  23. paddlex/repo_apis/PaddleDetection_api/configs/PicoDet-S_layout_17cls.yaml (+165 -0)
  24. paddlex/repo_apis/PaddleDetection_api/configs/PicoDet-S_layout_3cls.yaml (+165 -0)
  25. paddlex/repo_apis/PaddleDetection_api/object_det/model.py (+1 -1)
  26. paddlex/repo_apis/PaddleDetection_api/object_det/official_categories.py (+43 -0)
  27. paddlex/repo_apis/PaddleDetection_api/object_det/register.py (+44 -0)

+ 3 - 0
docs/module_usage/tutorials/ocr_modules/layout_detection.md

@@ -11,7 +11,10 @@
 |模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M)|介绍|
 |-|-|-|-|-|-|
 |PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x在PubLayNet数据集训练的高效率版面区域定位模型,可定位包含文字、标题、表格、图片以及列表这5类区域|
+|PicoDet-S_layout_3cls|87.1|13.5 |45.8 |4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-S_layout_17cls|70.3|13.6|46.2|4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 |PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-L_layout_17cls|79.9|17.2 |160.2|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 |RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
 |RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 

+ 4 - 1
docs/module_usage/tutorials/ocr_modules/layout_detection_en.md

@@ -11,7 +11,10 @@ The core task of structure analysis is to parse and segment the content of input
 | Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
 |-|-|-|-|-|-|
 | PicoDet_layout_1x | 86.8 | 13.0 | 91.3 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate five types of areas, including text, titles, tables, images, and lists. |
-| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-S_layout_3cls | 87.1 | 13.5 | 45.8 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-S_layout_17cls | 70.3 | 13.6 | 46.2 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
+| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-L_layout_17cls | 79.9 | 17.2 | 160.2 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 | RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
 | RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
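
For readers trying out the newly added models, here is a minimal usage sketch, assuming the PaddleX 3.0 `create_model` module API described in this tutorial; the demo image URL is the one referenced in the new configs, and the output paths are placeholders.

```python
# Minimal sketch: run one of the new layout detection models with the
# assumed PaddleX 3.0 module API.
from paddlex import create_model

model = create_model("PicoDet-S_layout_3cls")
output = model.predict(
    "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/layout.jpg",
    batch_size=1,
)
for res in output:
    res.print()                            # detected layout regions
    res.save_to_img("./output/")           # visualization with boxes and labels
    res.save_to_json("./output/res.json")  # structured results
```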
 

+ 2 - 1
docs/pipeline_deploy/high_performance_inference.md

@@ -192,7 +192,8 @@ PaddleX 为每个模型提供默认的高性能推理配置,并将其存储在
   <tr>
     <td rowspan="3">印章文本识别</td>
     <td>版面区域分析</td>
-    <td>PicoDet-L_layout_3cls<br/>RT-DETR-H_layout_3cls<br/>RT-DETR-H_layout_17cls</td>
+    <td>PicoDet-S_layout_3cls<br/>PicoDet-S_layout_17cls<details>
+    <summary><b>more</b></summary>PicoDet-L_layout_3cls<br/>PicoDet-L_layout_17cls<br/>RT-DETR-H_layout_3cls<br/>RT-DETR-H_layout_17cls</details></td>
   </tr>
 
   <tr>

+ 2 - 1
docs/pipeline_deploy/high_performance_inference_en.md

@@ -183,7 +183,8 @@ PaddleX provides default high-performance inference configurations for each mode
   <tr>
     <td rowspan="3">Seal Text Recognition</td>
     <td>Layout Analysis</td>
-    <td>PicoDet-L_layout_3cls<br/>RT-DETR-H_layout_3cls<br/>RT-DETR-H_layout_17cls</td>
+    <td>PicoDet-S_layout_3cls<br/>PicoDet-S_layout_17cls<details>
+    <summary><b>more</b></summary>PicoDet-L_layout_3cls<br/>PicoDet-L_layout_17cls<br/>RT-DETR-H_layout_3cls<br/>RT-DETR-H_layout_17cls</details></td>
   </tr>
 
   <tr>

+ 3 - 0
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction.md

@@ -51,7 +51,10 @@
 |模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M)|介绍|
 |-|-|-|-|-|-|
 |PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x在PubLayNet数据集训练的高效率版面区域定位模型,可定位包含文字、标题、表格、图片以及列表这5类区域|
+|PicoDet-S_layout_3cls|87.1|13.5 |45.8 |4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-S_layout_17cls|70.3|13.6|46.2|4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 |PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-L_layout_17cls|79.9|17.2 |160.2|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 |RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
 |RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 

+ 4 - 1
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_en.md

@@ -50,7 +50,10 @@ The **PP-ChatOCRv3-doc** pipeline includes modules for **Table Structure Recogni
 | Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
 |-|-|-|-|-|-|
 | PicoDet_layout_1x | 86.8 | 13.0 | 91.3 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate five types of areas, including text, titles, tables, images, and lists. |
-| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-S_layout_3cls | 87.1 | 13.5 | 45.8 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-S_layout_17cls | 70.3 | 13.6 | 46.2 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
+| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-L_layout_17cls | 79.9 | 17.2 | 160.2 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 | RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
 | RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 

+ 5 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.md

@@ -49,6 +49,11 @@
 |模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M)|介绍|
 |-|-|-|-|-|-|
 |PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x在PubLayNet数据集训练的高效率版面区域定位模型,可定位包含文字、标题、表格、图片以及列表这5类区域|
+|PicoDet-S_layout_3cls|87.1|13.5 |45.8 |4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-S_layout_17cls|70.3|13.6|46.2|4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
+|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-L_layout_17cls|79.9|17.2 |160.2|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
+|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
 |RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 
 

+ 5 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_en.md

@@ -48,6 +48,11 @@ The **General Layout Parsing Pipeline** includes modules for table structure rec
 | Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
 |-|-|-|-|-|-|
 | PicoDet_layout_1x | 86.8 | 13.0 | 91.3 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate five types of areas, including text, titles, tables, images, and lists. |
+| PicoDet-S_layout_3cls | 87.1 | 13.5 | 45.8 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-S_layout_17cls | 70.3 | 13.6 | 46.2 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
+| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-L_layout_17cls | 79.9 | 17.2 | 160.2 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
+| RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
 | RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 
 **Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built layout region analysis dataset, containing 10,000 images of common document types, including English and Chinese papers, magazines, research reports, etc. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**

+ 4 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md

@@ -22,7 +22,11 @@
 
 |模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M)|介绍|
 |-|-|-|-|-|-|
+|PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x在PubLayNet数据集训练的高效率版面区域定位模型,可定位包含文字、标题、表格、图片以及列表这5类区域|
+|PicoDet-S_layout_3cls|87.1|13.5 |45.8 |4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-S_layout_17cls|70.3|13.6|46.2|4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 |PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-L_layout_17cls|79.9|17.2 |160.2|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 |RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
 |RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 

+ 5 - 1
docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition_en.md

@@ -19,7 +19,11 @@ The **Seal Recognition** pipeline includes a layout area analysis module, a seal
 
 | Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
 |-|-|-|-|-|-|
-| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet_layout_1x | 86.8 | 13.0 | 91.3 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate five types of areas, including text, titles, tables, images, and lists. |
+| PicoDet-S_layout_3cls | 87.1 | 13.5 | 45.8 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-S_layout_17cls | 70.3 | 13.6 | 46.2 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
+| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-L_layout_17cls | 79.9 | 17.2 | 160.2 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 | RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
 | RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
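
A hedged sketch of running the seal recognition pipeline that consumes these layout models follows, assuming the PaddleX 3.0 `create_pipeline` API and the pipeline name `seal_recognition` used by this tutorial; the input file name is a placeholder.

```python
# Illustrative sketch only: pipeline-level usage under the assumptions above.
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="seal_recognition")
output = pipeline.predict("your_seal_image.png")  # placeholder input image
for res in output:
    res.print()                   # recognized seal text per detected region
    res.save_to_img("./output/")  # visualization of layout regions and seal OCR
```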
 

+ 9 - 6
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.md

@@ -47,12 +47,15 @@
 
 **版面区域分析模块模型:**
 
-|模型名称|mAP(%)|GPU推理耗时(ms)|CPU推理耗时(ms)|模型存储大小(M)|
-|-|-|-|-|-|
-|PicoDet_layout_1x|86.8|13.036|91.2634|7.4M |
-|PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|
-|RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1M|
-|RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2M|
+|模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M)|介绍|
+|-|-|-|-|-|-|
+|PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x在PubLayNet数据集训练的高效率版面区域定位模型,可定位包含文字、标题、表格、图片以及列表这5类区域|
+|PicoDet-S_layout_3cls|87.1|13.5 |45.8 |4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-S_layout_17cls|70.3|13.6|46.2|4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
+|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-L_layout_17cls|79.9|17.2 |160.2|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
+|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
+|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 
 **注:以上精度指标的评估集是 PaddleX 自建的版面区域分析数据集,包含 1w 张图片。以上所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。**
 

+ 9 - 6
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_en.md

@@ -51,12 +51,15 @@ SLANet_plus is an enhanced version of SLANet, a table structure recognition mode
 
 **Layout Analysis Module Models**:
 
-|Model Name|mAP (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|
-|-|-|-|-|-|
-|PicoDet_layout_1x|86.8|13.036|91.2634|7.4M|
-|PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|
-|RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1M|
-|RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2M|
+| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|-|-|-|-|-|-|
+| PicoDet_layout_1x | 86.8 | 13.0 | 91.3 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate five types of areas, including text, titles, tables, images, and lists. |
+| PicoDet-S_layout_3cls | 87.1 | 13.5 | 45.8 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-S_layout_17cls | 70.3 | 13.6 | 46.2 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
+| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-L_layout_17cls | 79.9 | 17.2 | 160.2 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
+| RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 
 **Note: The above accuracy metrics are evaluated on PaddleX's self-built layout analysis dataset containing 10,000 images. All GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 

+ 4 - 1
docs/practical_tutorials/document_scene_information_extraction(layout_detection)_tutorial.md

@@ -94,9 +94,12 @@ PaddleX 提供了 4 个端到端的版面区域定位模型,具体可参考 [
 |模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M)|介绍|
 |-|-|-|-|-|-|
 |PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x在PubLayNet数据集训练的高效率版面区域定位模型,可定位包含文字、标题、表格、图片以及列表这5类区域|
+|PicoDet-S_layout_3cls|87.1|13.5 |45.8 |4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-S_layout_17cls|70.3|13.6|46.2|4.8|基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 |PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet-L_layout_17cls|79.9|17.2 |160.2|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 |RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
-|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
+|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 
 **注:以上精度指标的评估集是 PaddleOCR 自建的版面区域分析数据集,包含中英文论文、杂志和研报等常见的 1w 张文档类型图片。GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为 8,精度类型为 FP32。**
 

+ 4 - 1
docs/practical_tutorials/document_scene_information_extraction(layout_detection)_tutorial_en.md

@@ -89,7 +89,10 @@ PaddleX provides 4 end-to-end layout detection models, which can be referenced i
 | Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
 |-|-|-|-|-|-|
 | PicoDet_layout_1x | 86.8 | 13.0 | 91.3 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate five types of areas, including text, titles, tables, images, and lists. |
-| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-S_layout_3cls | 87.1 | 13.5 | 45.8 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-S_layout_17cls | 70.3 | 13.6 | 46.2 | 4.8 | A highly efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
+| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| PicoDet-L_layout_17cls | 79.9 | 17.2 | 160.2 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 | RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
 | RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 

+ 4 - 1
docs/support_list/models_list.md

@@ -330,10 +330,13 @@ PaddleX 内置了多条产线,每条产线都包含了若干模块,每个模
 **注:以上精度指标测量自 ****PaddleX自建的图像矫正数据集****。**
 
 ## [版面区域检测模块](../module_usage/tutorials/ocr_modules/layout_detection.md)
-|模型名称|mAP(%)|GPU推理耗时(ms)|CPU推理耗时(ms)|模型存储大小|yaml 文件|
+|模型名称|mAP@(0.50:0.95)(%)|GPU推理耗时(ms)|CPU推理耗时(ms)|模型存储大小|yaml 文件|
 |-|-|-|-|-|-|
 |PicoDet_layout_1x|86.8|13.036|91.2634|7.4 M |[PicoDet_layout_1x.yaml](../../paddlex/configs/structure_analysis/PicoDet_layout_1x.yaml)|
+|PicoDet-S_layout_3cls|87.1|13.521|45.7633|4.8 M|[PicoDet-S_layout_3cls.yaml](../../paddlex/configs/structure_analysis/PicoDet-S_layout_3cls.yaml)|
+|PicoDet-S_layout_17cls|70.3|13.5632|46.2059|4.8 M|[PicoDet-S_layout_17cls.yaml](../../paddlex/configs/structure_analysis/PicoDet-S_layout_17cls.yaml)|
 |PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|[PicoDet-L_layout_3cls.yaml](../../paddlex/configs/structure_analysis/PicoDet-L_layout_3cls.yaml)|
+|PicoDet-L_layout_17cls|79.9|17.1901|160.262|22.6 M|[PicoDet-L_layout_17cls.yaml](../../paddlex/configs/structure_analysis/PicoDet-L_layout_17cls.yaml)|
 |RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1 M|[RT-DETR-H_layout_3cls.yaml](../../paddlex/configs/structure_analysis/RT-DETR-H_layout_3cls.yaml)|
 |RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2 M|[RT-DETR-H_layout_17cls.yaml](../../paddlex/configs/structure_analysis/RT-DETR-H_layout_17cls.yaml)|
 

+ 3 - 0
docs/support_list/models_list_en.md

@@ -337,7 +337,10 @@ PaddleX incorporates multiple pipelines, each containing several modules, and ea
 |Model Name|mAP (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size|YAML File|
 |-|-|-|-|-|-|
 |PicoDet_layout_1x|86.8|13.036|91.2634|7.4 M |[PicoDet_layout_1x.yaml](../../paddlex/configs/structure_analysis/PicoDet_layout_1x.yaml)|
+|PicoDet-S_layout_3cls|87.1|13.521|45.7633|4.8 M|[PicoDet-S_layout_3cls.yaml](../../paddlex/configs/structure_analysis/PicoDet-S_layout_3cls.yaml)|
+|PicoDet-S_layout_17cls|70.3|13.5632|46.2059|4.8 M|[PicoDet-S_layout_17cls.yaml](../../paddlex/configs/structure_analysis/PicoDet-S_layout_17cls.yaml)|
 |PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|[PicoDet-L_layout_3cls.yaml](../../paddlex/configs/structure_analysis/PicoDet-L_layout_3cls.yaml)|
+|PicoDet-L_layout_17cls|79.9|17.1901|160.262|22.6 M|[PicoDet-L_layout_17cls.yaml](../../paddlex/configs/structure_analysis/PicoDet-L_layout_17cls.yaml)|
 |RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1 M|[RT-DETR-H_layout_3cls.yaml](../../paddlex/configs/structure_analysis/RT-DETR-H_layout_3cls.yaml)|
 |RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2 M|[RT-DETR-H_layout_17cls.yaml](../../paddlex/configs/structure_analysis/RT-DETR-H_layout_17cls.yaml)|
 

+ 40 - 0
paddlex/configs/structure_analysis/PicoDet-L_layout_17cls.yaml

@@ -0,0 +1,40 @@
+Global:
+  model: PicoDet-L_layout_17cls
+  mode: check_dataset # check_dataset/train/evaluate/predict
+  dataset_dir: "/paddle/dataset/paddlex/layout/det_layout_examples"
+  device: gpu:0,1,2,3
+  output: "output"
+
+CheckDataset:
+  convert:
+    enable: False
+    src_dataset_type: null
+  split:
+    enable: False
+    train_percent: null
+    val_percent: null
+
+Train:
+  num_classes: 11
+  epochs_iters: 50
+  batch_size: 2
+  learning_rate: 0.06
+  pretrain_weight_path: null
+  warmup_steps: 100
+  resume_path: null
+  log_interval: 10
+  eval_interval: 1
+
+Evaluate:
+  weight_path: "output/best_model/best_model.pdparams"
+  log_interval: 10
+
+Export:
+  weight_path: https://paddle-model-ecology.bj.bcebos.com/paddlex/pretrained/PicoDet-L_layout_pretrained_17cls.pdparams
+
+Predict:
+  batch_size: 1
+  model_dir: "output/best_model/inference"
+  input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/layout.jpg"
+  kernel_option:
+    run_mode: paddle

+ 40 - 0
paddlex/configs/structure_analysis/PicoDet-S_layout_17cls.yaml

@@ -0,0 +1,40 @@
+Global:
+  model: PicoDet-S_layout_17cls
+  mode: check_dataset # check_dataset/train/evaluate/predict
+  dataset_dir: "/paddle/dataset/paddlex/layout/det_layout_examples"
+  device: gpu:0,1,2,3
+  output: "output"
+
+CheckDataset:
+  convert:
+    enable: False
+    src_dataset_type: null
+  split:
+    enable: False
+    train_percent: null
+    val_percent: null
+
+Train:
+  num_classes: 11
+  epochs_iters: 50
+  batch_size: 2
+  learning_rate: 0.06
+  pretrain_weight_path: null
+  warmup_steps: 100
+  resume_path: null
+  log_interval: 10
+  eval_interval: 1
+
+Evaluate:
+  weight_path: "output/best_model/best_model.pdparams"
+  log_interval: 10
+
+Export:
+  weight_path: https://paddle-model-ecology.bj.bcebos.com/paddlex/pretrained/PicoDet-S_layout_pretrained_17cls.pdparams
+
+Predict:
+  batch_size: 1
+  model_dir: "output/best_model/inference"
+  input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/layout.jpg"
+  kernel_option:
+    run_mode: paddle

+ 40 - 0
paddlex/configs/structure_analysis/PicoDet-S_layout_3cls.yaml

@@ -0,0 +1,40 @@
+Global:
+  model: PicoDet-S_layout_3cls
+  mode: check_dataset # check_dataset/train/evaluate/predict
+  dataset_dir: "/paddle/dataset/paddlex/layout/det_layout_examples"
+  device: gpu:0,1,2,3
+  output: "output"
+
+CheckDataset:
+  convert:
+    enable: False
+    src_dataset_type: null
+  split:
+    enable: False
+    train_percent: null
+    val_percent: null
+
+Train:
+  num_classes: 11
+  epochs_iters: 50
+  batch_size: 2
+  learning_rate: 0.06
+  pretrain_weight_path: null
+  warmup_steps: 100
+  resume_path: null
+  log_interval: 10
+  eval_interval: 1
+
+Evaluate:
+  weight_path: "output/best_model/best_model.pdparams"
+  log_interval: 10
+
+Export:
+  weight_path: https://paddle-model-ecology.bj.bcebos.com/paddlex/pretrained/PicoDet-S_layout_pretrained_3cls.pdparams
+
+Predict:
+  batch_size: 1
+  model_dir: "output/best_model/inference"
+  input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/layout.jpg"
+  kernel_option:
+    run_mode: paddle
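
As a quick sanity check of the new config (not part of the PR), the sketch below loads it with PyYAML and prints the fields users typically override; it assumes PyYAML is installed and that paths are relative to the repository root.

```python
# Inspect the knobs exposed by the new PaddleX config.
import yaml

with open("paddlex/configs/structure_analysis/PicoDet-S_layout_3cls.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["Global"]["model"])   # PicoDet-S_layout_3cls
print(cfg["Global"]["mode"])    # check_dataset / train / evaluate / predict
print(cfg["Train"]["epochs_iters"], cfg["Train"]["learning_rate"])  # 50 0.06
```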

+ 3 - 0
paddlex/inference/utils/official_models.py

@@ -254,7 +254,10 @@ PP-LCNet_x1_0_vehicle_attribute_infer.tar",
     "PP-YOLOE_plus_SOD-largesize-L": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PP-YOLOE_plus_SOD-largesize-L_infer.tar",
     "CenterNet-DLA-34": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/CenterNet-DLA-34_infer.tar",
     "CenterNet-ResNet50": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/CenterNet-ResNet50_infer.tar",
+    "PicoDet-S_layout_3cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PicoDet-S_layout_3cls_infer.tar",
+    "PicoDet-S_layout_17cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PicoDet-S_layout_17cls_infer.tar",
     "PicoDet-L_layout_3cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PicoDet-L_layout_3cls_infer.tar",
+    "PicoDet-L_layout_17cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PicoDet-L_layout_17cls_infer.tar",
     "RT-DETR-H_layout_3cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/RT-DETR-H_layout_3cls_infer.tar",
     "RT-DETR-H_layout_17cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/RT-DETR-H_layout_17cls_infer.tar",
     "PicoDet_LCNet_x2_5_face": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PicoDet_LCNet_x2_5_face_infer.tar",

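An illustrative sketch of how a model name resolves to its hosted inference archive through this mapping; the two entries below are copied from the diff above, and the helper function is hypothetical rather than the actual PaddleX lookup code.

```python
# Hypothetical lookup over a copy of two entries from official_models.py.
OFFICIAL_MODELS = {
    "PicoDet-S_layout_3cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PicoDet-S_layout_3cls_infer.tar",
    "PicoDet-L_layout_17cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PicoDet-L_layout_17cls_infer.tar",
}

def official_model_url(name: str) -> str:
    """Return the inference tarball URL for a registered model name."""
    if name not in OFFICIAL_MODELS:
        raise KeyError(f"{name} has no official inference model entry")
    return OFFICIAL_MODELS[name]

print(official_model_url("PicoDet-S_layout_3cls"))
```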
+ 3 - 0
paddlex/modules/object_detection/model_list.py

@@ -26,7 +26,10 @@ MODELS = [
     "RT-DETR-R50",
     "RT-DETR-X",
     "PicoDet_layout_1x",
+    "PicoDet-S_layout_3cls",
+    "PicoDet-S_layout_17cls",
     "PicoDet-L_layout_3cls",
+    "PicoDet-L_layout_17cls",
     "RT-DETR-H_layout_3cls",
     "RT-DETR-H_layout_17cls",
     "YOLOv3-DarkNet53",

+ 165 - 0
paddlex/repo_apis/PaddleDetection_api/configs/PicoDet-L_layout_17cls.yaml

@@ -0,0 +1,165 @@
+# Runtime
+epoch: 100
+log_iter: 10
+find_unused_parameters: true
+use_gpu: true
+use_xpu: false
+use_mlu: false
+use_npu: false
+use_ema: true
+save_dir: output
+snapshot_epoch: 10
+print_flops: false
+print_params: false
+
+# Dataset
+metric: COCO
+num_classes: 17
+
+worker_num: 6
+eval_height: &eval_height 640
+eval_width: &eval_width 640
+eval_size: &eval_size [*eval_height, *eval_width]
+
+TrainDataset:
+  name: COCODetDataset
+  image_dir: images
+  anno_path: annotations/instance_train.json
+  dataset_dir: datasets/COCO
+  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  name: COCODetDataset
+  image_dir: images
+  anno_path: annotations/instance_val.json
+  dataset_dir: datasets/COCO
+  allow_empty: true
+
+TestDataset:
+  name: ImageFolder
+  anno_path: annotations/instance_val.json
+  dataset_dir: datasets/COCO
+
+TrainReader:
+  sample_transforms:
+  - Decode: {}
+  - RandomCrop: {}
+  - RandomFlip: {prob: 0.5}
+  - RandomDistort: {}
+  batch_transforms:
+  - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  - PadGT: {}
+  batch_size: 16
+  shuffle: true
+  drop_last: true
+
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 8
+  shuffle: false
+
+TestReader:
+  inputs_def:
+    image_shape: [1, 3, *eval_height, *eval_width]
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_size: 1
+
+# Model
+architecture: PicoDet
+pretrain_weights: https://paddle-model-ecology.bj.bcebos.com/paddlex/pretrained/PicoDet-L_layout_pretrained_v1.pdparams
+
+PicoDet:
+  backbone: LCNet
+  neck: LCPAN
+  head: PicoHeadV2
+
+LCNet:
+  scale: 2.0
+  feature_maps: [3, 4, 5]
+
+LCPAN:
+  out_channels: 160
+  use_depthwise: true
+  num_features: 4
+
+PicoHeadV2:
+  conv_feat:
+    name: PicoFeat
+    feat_in: 160
+    feat_out: 160
+    num_convs: 4
+    num_fpn_stride: 4
+    norm_type: bn
+    share_cls_reg: true
+    use_se: true
+  fpn_stride: [8, 16, 32, 64]
+  feat_in_chan: 160
+  prior_prob: 0.01
+  reg_max: 7
+  cell_offset: 0.5
+  grid_cell_scale: 5.0
+  static_assigner_epoch: 100
+  use_align_head: true
+  static_assigner:
+    name: ATSSAssigner
+    topk: 9
+    force_gt_matching: false
+  assigner:
+    name: TaskAlignedAssigner
+    topk: 13
+    alpha: 1.0
+    beta: 6.0
+  loss_class:
+    name: VarifocalLoss
+    use_sigmoid: false
+    iou_weighted: true
+    loss_weight: 1.0
+  loss_dfl:
+    name: DistributionFocalLoss
+    loss_weight: 0.5
+  loss_bbox:
+    name: GIoULoss
+    loss_weight: 2.5
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.025
+    nms_threshold: 0.6
+
+# Optimizer
+LearningRate:
+  base_lr: 0.06
+  schedulers:
+  - name: CosineDecay
+    max_epochs: 150
+  - name: LinearWarmup
+    start_factor: 0.1
+    steps: 300
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.00004
+    type: L2
+
+# Export
+export:
+  post_process: true
+  nms: true
+  benchmark: false
+  fuse_conv_bn: false

+ 165 - 0
paddlex/repo_apis/PaddleDetection_api/configs/PicoDet-S_layout_17cls.yaml

@@ -0,0 +1,165 @@
+# Runtime
+epoch: 100
+log_iter: 10
+find_unused_parameters: true
+use_gpu: true
+use_xpu: false
+use_mlu: false
+use_npu: false
+use_ema: true
+save_dir: output
+snapshot_epoch: 10
+print_flops: false
+print_params: false
+
+# Dataset
+metric: COCO
+num_classes: 17
+
+worker_num: 6
+eval_height: &eval_height 480
+eval_width: &eval_width 480
+eval_size: &eval_size [*eval_height, *eval_width]
+
+TrainDataset:
+  name: COCODetDataset
+  image_dir: images
+  anno_path: annotations/instance_train.json
+  dataset_dir: datasets/COCO
+  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  name: COCODetDataset
+  image_dir: images
+  anno_path: annotations/instance_val.json
+  dataset_dir: datasets/COCO
+  allow_empty: true
+
+TestDataset:
+  name: ImageFolder
+  anno_path: annotations/instance_val.json
+  dataset_dir: datasets/COCO
+
+TrainReader:
+  sample_transforms:
+  - Decode: {}
+  - RandomCrop: {}
+  - RandomFlip: {prob: 0.5}
+  - RandomDistort: {}
+  batch_transforms:
+  - BatchRandomResize: {target_size: [416, 448, 480, 512, 544], random_size: True, random_interp: True, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  - PadGT: {}
+  batch_size: 16
+  shuffle: true
+  drop_last: true
+
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 8
+  shuffle: false
+
+TestReader:
+  inputs_def:
+    image_shape: [1, 3, *eval_height, *eval_width]
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_size: 1
+
+# Model
+architecture: PicoDet
+pretrain_weights: https://paddle-model-ecology.bj.bcebos.com/paddlex/pretrained/PicoDet-S_layout_pretrained_17cls.pdparams
+
+PicoDet:
+  backbone: LCNet
+  neck: LCPAN
+  head: PicoHeadV2
+
+LCNet:
+  scale: 0.75
+  feature_maps: [3, 4, 5]
+
+LCPAN:
+  out_channels: 96
+  use_depthwise: true
+  num_features: 4
+
+PicoHeadV2:
+  conv_feat:
+    name: PicoFeat
+    feat_in: 96
+    feat_out: 96
+    num_convs: 2
+    num_fpn_stride: 4
+    norm_type: bn
+    share_cls_reg: true
+    use_se: true
+  fpn_stride: [8, 16, 32, 64]
+  feat_in_chan: 96
+  prior_prob: 0.01
+  reg_max: 7
+  cell_offset: 0.5
+  grid_cell_scale: 5.0
+  static_assigner_epoch: 100
+  use_align_head: true
+  static_assigner:
+    name: ATSSAssigner
+    topk: 9
+    force_gt_matching: false
+  assigner:
+    name: TaskAlignedAssigner
+    topk: 13
+    alpha: 1.0
+    beta: 6.0
+  loss_class:
+    name: VarifocalLoss
+    use_sigmoid: false
+    iou_weighted: true
+    loss_weight: 1.0
+  loss_dfl:
+    name: DistributionFocalLoss
+    loss_weight: 0.5
+  loss_bbox:
+    name: GIoULoss
+    loss_weight: 2.5
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.025
+    nms_threshold: 0.6
+
+# Optimizer
+LearningRate:
+  base_lr: 0.08
+  schedulers:
+  - name: CosineDecay
+    max_epochs: 300
+  - name: LinearWarmup
+    start_factor: 0.1
+    steps: 100
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.00004
+    type: L2
+
+# Export
+export:
+  post_process: true
+  nms: true
+  benchmark: false
+  fuse_conv_bn: false

+ 165 - 0
paddlex/repo_apis/PaddleDetection_api/configs/PicoDet-S_layout_3cls.yaml

@@ -0,0 +1,165 @@
+# Runtime
+epoch: 100
+log_iter: 10
+find_unused_parameters: true
+use_gpu: true
+use_xpu: false
+use_mlu: false
+use_npu: false
+use_ema: true
+save_dir: output
+snapshot_epoch: 10
+print_flops: false
+print_params: false
+
+# Dataset
+metric: COCO
+num_classes: 3
+
+worker_num: 6
+eval_height: &eval_height 480
+eval_width: &eval_width 480
+eval_size: &eval_size [*eval_height, *eval_width]
+
+TrainDataset:
+  name: COCODetDataset
+  image_dir: images
+  anno_path: annotations/instance_train.json
+  dataset_dir: datasets/COCO
+  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  name: COCODetDataset
+  image_dir: images
+  anno_path: annotations/instance_val.json
+  dataset_dir: datasets/COCO
+  allow_empty: true
+
+TestDataset:
+  name: ImageFolder
+  anno_path: annotations/instance_val.json
+  dataset_dir: datasets/COCO
+
+TrainReader:
+  sample_transforms:
+  - Decode: {}
+  - RandomCrop: {}
+  - RandomFlip: {prob: 0.5}
+  - RandomDistort: {}
+  batch_transforms:
+  - BatchRandomResize: {target_size: [416, 448, 480, 512, 544], random_size: True, random_interp: True, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  - PadGT: {}
+  batch_size: 16
+  shuffle: true
+  drop_last: true
+
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 8
+  shuffle: false
+
+TestReader:
+  inputs_def:
+    image_shape: [1, 3, *eval_height, *eval_width]
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_size: 1
+
+# Model
+architecture: PicoDet
+pretrain_weights: https://paddle-model-ecology.bj.bcebos.com/paddlex/pretrained/PicoDet-S_layout_pretrained_3cls.pdparams
+
+PicoDet:
+  backbone: LCNet
+  neck: LCPAN
+  head: PicoHeadV2
+
+LCNet:
+  scale: 0.75
+  feature_maps: [3, 4, 5]
+
+LCPAN:
+  out_channels: 96
+  use_depthwise: true
+  num_features: 4
+
+PicoHeadV2:
+  conv_feat:
+    name: PicoFeat
+    feat_in: 96
+    feat_out: 96
+    num_convs: 2
+    num_fpn_stride: 4
+    norm_type: bn
+    share_cls_reg: true
+    use_se: true
+  fpn_stride: [8, 16, 32, 64]
+  feat_in_chan: 96
+  prior_prob: 0.01
+  reg_max: 7
+  cell_offset: 0.5
+  grid_cell_scale: 5.0
+  static_assigner_epoch: 100
+  use_align_head: true
+  static_assigner:
+    name: ATSSAssigner
+    topk: 9
+    force_gt_matching: false
+  assigner:
+    name: TaskAlignedAssigner
+    topk: 13
+    alpha: 1.0
+    beta: 6.0
+  loss_class:
+    name: VarifocalLoss
+    use_sigmoid: false
+    iou_weighted: true
+    loss_weight: 1.0
+  loss_dfl:
+    name: DistributionFocalLoss
+    loss_weight: 0.5
+  loss_bbox:
+    name: GIoULoss
+    loss_weight: 2.5
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 100
+    score_threshold: 0.025
+    nms_threshold: 0.6
+
+# Optimizer
+LearningRate:
+  base_lr: 0.08
+  schedulers:
+  - name: CosineDecay
+    max_epochs: 300
+  - name: LinearWarmup
+    start_factor: 0.1
+    steps: 100
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.00004
+    type: L2
+
+# Export
+export:
+  post_process: true
+  nms: true
+  benchmark: false
+  fuse_conv_bn: false
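
A hedged sketch of the inference-time preprocessing implied by the `TestReader` section above (Decode, Resize to 480x480 with `interp: 2` i.e. cubic interpolation, NormalizeImage with ImageNet statistics, Permute to CHW); it uses OpenCV and NumPy for illustration, while the real transforms are implemented inside PaddleDetection.

```python
# Approximate the TestReader transforms for the PicoDet-S layout configs.
import cv2
import numpy as np

def preprocess(image_path: str, eval_size=(480, 480)) -> np.ndarray:
    img = cv2.imread(image_path)[:, :, ::-1]                          # Decode (BGR -> RGB)
    img = cv2.resize(img, eval_size, interpolation=cv2.INTER_CUBIC)   # Resize, interp: 2
    img = img.astype("float32") / 255.0                               # is_scale: true
    mean = np.array([0.485, 0.456, 0.406], dtype="float32")
    std = np.array([0.229, 0.224, 0.225], dtype="float32")
    img = (img - mean) / std                                          # NormalizeImage
    return img.transpose(2, 0, 1)[None]                               # Permute + batch dim -> [1, 3, 480, 480]
```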

+ 1 - 1
paddlex/repo_apis/PaddleDetection_api/object_det/model.py

@@ -177,7 +177,7 @@ class DetModel(BaseModel):
         if batch_size is not None:
             config.update_batch_size(batch_size, "eval")
         device_type, device_ids = parse_device(device)
-        if len(device_ids) > 1:
+        if device_ids is not None and len(device_ids) > 1:
             raise ValueError(
                 f"multi-{device_type} evaluation is not supported. Please use a single {device_type}."
             )
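
The one-line guard above matters because `parse_device` can return `device_ids = None` for a device string without explicit ids (for example `"cpu"`); a small sketch of the behaviour, with that return convention stated as an assumption:

```python
# Sketch: the guarded check no longer crashes when device_ids is None.
def check_single_device(device_type, device_ids):
    if device_ids is not None and len(device_ids) > 1:
        raise ValueError(
            f"multi-{device_type} evaluation is not supported. "
            f"Please use a single {device_type}."
        )

check_single_device("gpu", [0])    # single device: fine
check_single_device("cpu", None)   # previously raised TypeError via len(None)
```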

+ 43 - 0
paddlex/repo_apis/PaddleDetection_api/object_det/official_categories.py

@@ -11,11 +11,54 @@ official_categories = {
         {"name": "Table", "id": 3},
         {"name": "Figure", "id": 4},
     ],
+    "PicoDet-S_layout_3cls": [
+        {"name": "image", "id": 0},
+        {"name": "table", "id": 1},
+        {"name": "seal", "id": 2},
+    ],
+    "PicoDet-S_layout_17cls": [
+        {"name": "paragraph_title", "id": 0},
+        {"name": "image", "id": 1},
+        {"name": "text", "id": 2},
+        {"name": "number", "id": 3},
+        {"name": "abstract", "id": 4},
+        {"name": "content", "id": 5},
+        {"name": "figure_title", "id": 6},
+        {"name": "formula", "id": 7},
+        {"name": "table", "id": 8},
+        {"name": "table_title", "id": 9},
+        {"name": "reference", "id": 10},
+        {"name": "doc_title", "id": 11},
+        {"name": "footnote", "id": 12},
+        {"name": "header", "id": 13},
+        {"name": "algorithm", "id": 14},
+        {"name": "footer", "id": 15},
+        {"name": "seal", "id": 16},
+    ],
     "PicoDet-L_layout_3cls": [
         {"name": "image", "id": 0},
         {"name": "table", "id": 1},
         {"name": "seal", "id": 2},
     ],
+    "PicoDet-L_layout_17cls": [
+        {"name": "paragraph_title", "id": 0},
+        {"name": "image", "id": 1},
+        {"name": "text", "id": 2},
+        {"name": "number", "id": 3},
+        {"name": "abstract", "id": 4},
+        {"name": "content", "id": 5},
+        {"name": "figure_title", "id": 6},
+        {"name": "formula", "id": 7},
+        {"name": "table", "id": 8},
+        {"name": "table_title", "id": 9},
+        {"name": "reference", "id": 10},
+        {"name": "doc_title", "id": 11},
+        {"name": "footnote", "id": 12},
+        {"name": "header", "id": 13},
+        {"name": "algorithm", "id": 14},
+        {"name": "footer", "id": 15},
+        {"name": "seal", "id": 16},
+    ],
     "RT-DETR-H_layout_3cls": [
         {"name": "image", "id": 0},
         {"name": "table", "id": 1},
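
A short sketch of turning one of the category lists added above into an id-to-label mapping for post-processing detections; how `official_categories` is actually consumed inside PaddleX is not shown in this diff, so the usage is illustrative.

```python
# Build an id -> name lookup from the 17-class category list in the diff.
PICODET_S_LAYOUT_17CLS = [
    {"name": "paragraph_title", "id": 0},
    {"name": "image", "id": 1},
    {"name": "text", "id": 2},
    # ... remaining entries exactly as listed in the diff ...
    {"name": "seal", "id": 16},
]

id_to_name = {cat["id"]: cat["name"] for cat in PICODET_S_LAYOUT_17CLS}
print(id_to_name[0], id_to_name[16])  # paragraph_title seal
```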

+ 44 - 0
paddlex/repo_apis/PaddleDetection_api/object_det/register.py

@@ -790,6 +790,35 @@ register_model_info(
     }
 )
 
+register_model_info(
+    {
+        "model_name": "PicoDet-S_layout_3cls",
+        "suite": "Det",
+        "config_path": osp.join(PDX_CONFIG_DIR, "PicoDet-S_layout_3cls.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "supported_dataset_types": ["COCODetDataset"],
+        "supported_train_opts": {
+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
+            "dy2st": False,
+            "amp": ["OFF"],
+        },
+    }
+)
+
+register_model_info(
+    {
+        "model_name": "PicoDet-S_layout_17cls",
+        "suite": "Det",
+        "config_path": osp.join(PDX_CONFIG_DIR, "PicoDet-S_layout_17cls.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "supported_dataset_types": ["COCODetDataset"],
+        "supported_train_opts": {
+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
+            "dy2st": False,
+            "amp": ["OFF"],
+        },
+    }
+)
 
 register_model_info(
     {
@@ -806,6 +835,21 @@ register_model_info(
     }
 )
 
+register_model_info(
+    {
+        "model_name": "PicoDet-L_layout_17cls",
+        "suite": "Det",
+        "config_path": osp.join(PDX_CONFIG_DIR, "PicoDet-L_layout_17cls.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "supported_dataset_types": ["COCODetDataset"],
+        "supported_train_opts": {
+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
+            "dy2st": False,
+            "amp": ["OFF"],
+        },
+    }
+)
+
 
 register_model_info(
     {