6 months ago · 3f070231aa
--- a/docs/module_usage/tutorials/ocr_modules/layout_detection.en.md
+++ b/docs/module_usage/tutorials/ocr_modules/layout_detection.en.md
@@ -9,6 +9,31 @@ The core task of structure analysis is to parse and segment the content of input
 
				 
			
 
				 ## II. Supported Model List
			
 
				 
			
 
				+* <b>The layout detection model includes 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, figure and table titles (figure title, table title, and chart title), seal, chart, sidebar text, and reference content</b>
			
 
				+<table>
			
 
				+<thead>
			
 
				+<tr>
			
 
				+<th>Model</th><th>Model Download Link</th>
			
 
				+<th>mAP(0.5) (%)</th>
			
 
				+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
			
 
				+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
			
 
				+<th>Model Storage Size (M)</th>
			
 
				+<th>Introduction</th>
			
 
				+</tr>
			
 
				+</thead>
			
 
				+<tbody>
			
 
				+<tr>
			
 
				+<td>PP-DocLayout_plus-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout_plus-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout_plus-L_pretrained.pdparams">Training Model</a></td>
			
 
				+<td>83.2</td>
			
 
				+<td>34.6244 / 10.3945</td>
			
 
				+<td>510.57 / - </td>
			
 
				+<td>126.01 M</td>
			
 
				+<td>A higher-precision layout region localization model based on RT-DETR-L, trained on a self-built dataset containing Chinese and English papers, PPT, multi-layout magazines, contracts, books, exams, ancient books and research reports</td>
			
 
				+</tr>
			
 
				+<tr>
			
 
				+</tbody>
			
 
				+</table>
			
 
				+
			
 
				 * <b>The layout detection model includes 23 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, figure caption, table, table caption, seal, figure title, figure, header image, footer image, and sidebar text</b>
			
 
				 <table>
			
 
				 <thead>
			
@@ -49,7 +74,7 @@ The core task of structure analysis is to parse and segment the content of input
 
				 </tbody>
			
 
				 </table>
			
 
				 
			
 
				-> ❗ The above list includes the <b>3 core models</b> that are key supported by the text recognition module. The module actually supports a total of <b>11 full models</b>, including several predefined models with different categories. The complete model list is as follows:
			
 
				+> ❗ The above list covers the <b>4 core models</b> that the layout detection module primarily supports. The module supports <b>12 models</b> in total, including several predefined models with different categories. The complete model list is as follows:
			
 
				 
			
 
				 <details><summary> 👉 Details of Model List</summary>
			
 
				 
			
--- a/docs/module_usage/tutorials/ocr_modules/layout_detection.md
+++ b/docs/module_usage/tutorials/ocr_modules/layout_detection.md
@@ -9,6 +9,33 @@ comments: true
 
				 
			
 
				 ## II. Supported Model List
			
 
				 
			
 
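				+* <b>The layout detection model includes 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, figure and table titles (figure title, table title, and chart title), seal, chart, sidebar text, and reference content</b>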
			
 
				+<table>
			
 
				+<thead>
			
 
				+<tr>
			
 
				+<th>Model</th><th>Model Download Link</th>
			
 
				+<th>mAP(0.5) (%)</th>
			
 
				+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
			
 
				+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
			
 
				+<th>Model Storage Size (M)</th>
			
 
				+<th>Introduction</th>
			
 
				+</tr>
			
 
				+</thead>
			
 
				+<tbody>
			
 
				+<tr>
			
 
				+<td>PP-DocLayout_plus-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout_plus-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout_plus-L_pretrained.pdparams">Training Model</a></td>
			
 
				+<td>83.2</td>
			
 
				+<td>34.6244 / 10.3945</td>
			
 
				+<td>510.57 / - </td>
			
 
				+<td>126.01 M</td>
			
 
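				+<td>A higher-precision layout region localization model based on RT-DETR-L, trained on a self-built dataset covering Chinese and English papers, multi-column magazines, newspapers, PPT, contracts, books, exam papers, research reports, ancient books, Japanese documents, and documents with vertical text</td>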
			
 
				+</tr>
			
 
				+<tr>
			
 
				+</tbody>
			
 
				+</table>
			
 
				+
			
 
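				+<b>Note: The evaluation set for the above accuracy metrics is a self-built layout region detection dataset containing 1300 document images, including Chinese and English papers, magazines, newspapers, research reports, PPT, exam papers, and textbooks.</b>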
			
 
				+
			
 
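				 * <b>The layout detection model includes 23 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, figure caption, table, table caption, seal, figure title, figure, header image, footer image, and sidebar text</b>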
			
 
				 <table>
			
 
				 <thead>
			
@@ -50,7 +77,7 @@ comments: true
 
				 </table>
			
 
				 
			
 
				 
			
 
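				-> ❗ The above list covers the <b>3 core models</b> that the layout detection module primarily supports. The module supports <b>11 models</b> in total, including several predefined models with different categories. The complete model list is as follows: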
			
 
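				+> ❗ The above list covers the <b>4 core models</b> that the layout detection module primarily supports. The module supports <b>12 models</b> in total, including several predefined models with different categories. The complete model list is as follows: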
			
 
				 
			
 
				 <details><summary> 👉 Details of Model List</summary>
			
 
				 
			
@@ -186,7 +213,8 @@ comments: true
 
				           <ul>
			
 
				               <li><strong>Test Dataset:</strong>
			
 
				                  <ul>
			
 
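				-                    <li>Layout detection model: PaddleOCR's self-built layout region detection dataset, containing 500 common document images such as Chinese and English papers, magazines, contracts, books, exam papers, and research reports.</li>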
			
 
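				+                    <li>20-class layout detection model: PaddleOCR's self-built layout region detection dataset, containing 1300 document images, including Chinese and English papers, magazines, newspapers, research reports, PPT, exam papers, and textbooks.</li>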
			
 
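				+                    <li>23-class layout detection model: PaddleOCR's self-built layout region detection dataset, containing 500 common document images such as Chinese and English papers, magazines, contracts, books, exam papers, and research reports.</li>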
			
 
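				                     <li>Table layout detection model: PaddleOCR's self-built layout table region detection dataset, containing 7835 Chinese and English paper document images with tables.</li>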
			
 
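				                     <li>3-class layout detection model: PaddleOCR's self-built layout region detection dataset, containing 1154 common document images such as Chinese and English papers, magazines, and research reports.</li>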
			
 
				                     <li>5-class English document region detection model: the evaluation dataset of <a href="https://developer.ibm.com/exchanges/data/all/publaynet" target="_blank">PubLayNet</a>, containing 11245 images of English documents.</li>
			
--- a/docs/support_list/models_list.md
+++ b/docs/support_list/models_list.md
@@ -2341,6 +2341,80 @@ devanagari_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模
 
				 
			
 
				 ## [Layout Detection Module](../module_usage/tutorials/ocr_modules/layout_detection.md)
			
 
				 
			
 
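				+* <b>The layout detection model includes 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, figure and table titles (figure title, table title, and chart title), seal, chart, sidebar text, and reference content</b>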
			
 
				+<table>
			
 
				+<thead>
			
 
				+<tr>
			
 
				+<th>Model</th>
			
 
				+<th>mAP(0.5) (%)</th>
			
 
				+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
			
 
				+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
			
 
				+<th>Model Storage Size (M)</th>
			
 
				+<th>YAML File</th>
			
 
				+<th>Model Download Link</th>
			
 
				+</tr>
			
 
				+</thead>
			
 
				+<tbody>
			
 
				+<tr>
			
 
				+<td>PP-DocLayout_plus-L</td>
			
 
				+<td>83.2</td>
			
 
				+<td>34.6244 / 10.3945</td>
			
 
				+<td>510.57 / - </td>
			
 
				+<td>126.01 </td>
			
 
				+<td><a href="https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/layout_detection/PP-DocLayout_plus-L.yaml">PP-DocLayout_plus-L.yaml</a></td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout_plus-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout_plus-L_pretrained.pdparams">Training Model</a></td>
			
 
				+</tr>
			
 
				+</tbody>
			
 
				+</table>
			
 
				+
			
 
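				+<b>Note: The evaluation set for the above accuracy metrics is a self-built layout region detection dataset containing 1300 document images, including Chinese and English papers, magazines, newspapers, research reports, PPT, exam papers, and textbooks.</b>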
			
 
				+
			
 
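				+* <b>The layout detection model includes 23 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, figure caption, table, table caption, seal, figure title, figure, header image, footer image, and sidebar text</b>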
			
 
				+<table>
			
 
				+<thead>
			
 
				+<tr>
			
 
				+<th>Model</th>
			
 
				+<th>mAP(0.5) (%)</th>
			
 
				+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
			
 
				+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
			
 
				+<th>Model Storage Size (M)</th>
			
 
				+<th>YAML File</th>
			
 
				+<th>Model Download Link</th>
			
 
				+</tr>
			
 
				+</thead>
			
 
				+<tbody>
			
 
				+<tr>
			
 
				+<td>PP-DocLayout-L</td>
			
 
				+<td>90.4</td>
			
 
				+<td>34.6244 / 10.3945</td>
			
 
				+<td>510.57 / - </td>
			
 
				+<td>123.76 </td>
			
 
				+<td><a href="https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/layout_detection/PP-DocLayout-L.yaml">PP-DocLayout-L.yaml</a></td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-L_pretrained.pdparams">Training Model</a></td>
			
 
				+</tr>
			
 
				+<tr>
			
 
				+<td>PP-DocLayout-M</td>
			
 
				+<td>75.2</td>
			
 
				+<td>13.3259 / 4.8685</td>
			
 
				+<td>44.0680 / 44.0680</td>
			
 
				+<td>22.578</td>
			
 
				+<td><a href="https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/layout_detection/PP-DocLayout-M.yaml">PP-DocLayout-M.yaml</a></td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout-M_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-M_pretrained.pdparams">Training Model</a></td>
			
 
				+</tr>
			
 
				+<tr>
			
 
				+<td>PP-DocLayout-S</td>
			
 
				+<td>70.9</td>
			
 
				+<td>8.3008 / 2.3794</td>
			
 
				+<td>10.0623 / 9.9296</td>
			
 
				+<td>4.834</td>
			
 
				+<td><a href="https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/layout_detection/PP-DocLayout-S.yaml">PP-DocLayout-S.yaml</a></td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-S_pretrained.pdparams">Training Model</a></td>
			
 
				+</tr>
			
 
				+</tbody>
			
 
				+</table>
			
 
				+
			
 
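				+<b>Note: The evaluation set for the above accuracy metrics is a self-built layout region detection dataset containing 500 common document images such as Chinese and English papers, magazines, and research reports.</b>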
			
 
				+
			
 
				 * <b>Table layout detection model</b>
			
 
				 <table>
			
 
				 <thead>
			
--- a/paddlex/configs/modules/layout_detection/PP-DocBlockLayout.yaml
+++ b/paddlex/configs/modules/layout_detection/PP-DocBlockLayout.yaml
@@ -0,0 +1,40 @@
 
				+Global:
			
 
				+  model: PP-DocBlockLayout
			
 
				+  mode: check_dataset # check_dataset/train/evaluate/predict
			
 
				+  dataset_dir: "/paddle/dataset/paddlex/layout/det_layout_examples"
			
 
				+  device: gpu:0,1,2,3
			
 
				+  output: "output"
			
 
				+
			
 
				+CheckDataset:
			
 
				+  convert:
			
 
				+    enable: False
			
 
				+    src_dataset_type: null
			
 
				+  split:
			
 
				+    enable: False
			
 
				+    train_percent: null
			
 
				+    val_percent: null
			
 
				+
			
 
				+Train:
			
 
				+  num_classes: 11
			
 
				+  epochs_iters: 100
			
 
				+  batch_size: 1
			
 
				+  learning_rate: 0.0001
			
 
				+  pretrain_weight_path: https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocBlockLayout_pretrain.pdparams
			
 
				+  warmup_steps: 100
			
 
				+  resume_path: null
			
 
				+  log_interval: 10
			
 
				+  eval_interval: 1
			
 
				+
			
 
				+Evaluate:
			
 
				+  weight_path: "output/best_model/best_model.pdparams"
			
 
				+  log_interval: 10
			
 
				+
			
 
				+Export:
			
 
				+  weight_path: https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocBlockLayout_pretrain.pdparams
			
 
				+
			
 
				+Predict:
			
 
				+  batch_size: 1
			
 
				+  model_dir: "output/best_model/inference"
			
 
				+  input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/layout.jpg"
			
 
				+  kernel_option:
			
 
				+    run_mode: paddle
			
--- a/paddlex/configs/modules/layout_detection/PP-DocLayout_plus-L.yaml
+++ b/paddlex/configs/modules/layout_detection/PP-DocLayout_plus-L.yaml
@@ -0,0 +1,40 @@
 
				+Global:
			
 
				+  model: PP-DocLayout_plus-L
			
 
				+  mode: check_dataset # check_dataset/train/evaluate/predict
			
 
				+  dataset_dir: "/paddle/dataset/paddlex/layout/det_layout_examples"
			
 
				+  device: gpu:0,1,2,3
			
 
				+  output: "output"
			
 
				+
			
 
				+CheckDataset:
			
 
				+  convert:
			
 
				+    enable: False
			
 
				+    src_dataset_type: null
			
 
				+  split:
			
 
				+    enable: False
			
 
				+    train_percent: null
			
 
				+    val_percent: null
			
 
				+
			
 
				+Train:
			
 
				+  num_classes: 11
			
 
				+  epochs_iters: 100
			
 
				+  batch_size: 1
			
 
				+  learning_rate: 0.0001
			
 
				+  pretrain_weight_path: https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout_plus-L_pretrain.pdparams
			
 
				+  warmup_steps: 100
			
 
				+  resume_path: null
			
 
				+  log_interval: 10
			
 
				+  eval_interval: 1
			
 
				+
			
 
				+Evaluate:
			
 
				+  weight_path: "output/best_model/best_model.pdparams"
			
 
				+  log_interval: 10
			
 
				+
			
 
				+Export:
			
 
				+  weight_path: https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout_plus-L_pretrain.pdparams
			
 
				+
			
 
				+Predict:
			
 
				+  batch_size: 1
			
 
				+  model_dir: "output/best_model/inference"
			
 
				+  input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/layout.jpg"
			
 
				+  kernel_option:
			
 
				+    run_mode: paddle
			
--- a/paddlex/inference/models/object_detection/predictor.py
+++ b/paddlex/inference/models/object_detection/predictor.py
@@ -316,6 +316,8 @@ class DetPredictor(BasePredictor):
 
				             "BlazeFace",
			
 
				             "BlazeFace-FPN-SSH",
			
 
				             "PP-DocLayout-L",
			
 
				+            "PP-DocLayout_plus-L",
			
 
				+            "PP-DocBlockLayout",
			
 
				         ]
			
 
				         if any(name in self.model_name for name in models_required_imgsize):
			
 
				             ordered_required_keys = (
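Review note on the hunk above: the `any(name in self.model_name ...)` check is a substring match, which is presumably why `PP-DocLayout_plus-L` needs its own entry. The existing `PP-DocLayout-L` entry does not occur inside the new name, because the underscore breaks the match. The sketch below reproduces only that check in isolation; the abbreviated list and the helper name `requires_imgsize` are illustrative assumptions, not code from `predictor.py`.

```python
# Abbreviated copy of the list from the hunk above (the real list has more entries).
models_required_imgsize = [
    "PP-DocLayout-L",
    "PP-DocLayout_plus-L",
    "PP-DocBlockLayout",
]


def requires_imgsize(model_name, required=models_required_imgsize):
    """Hypothetical helper: True if any listed name is a substring of model_name."""
    return any(name in model_name for name in required)


# With only the pre-existing entry, the new model name would not match.
print(requires_imgsize("PP-DocLayout_plus-L", ["PP-DocLayout-L"]))
# With the entries added in this change, it does.
print(requires_imgsize("PP-DocLayout_plus-L"))
```

Because the match is by substring, a variant whose name merely contains a listed name would be picked up without a further list change.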
			
--- a/paddlex/inference/models/object_detection/utils.py
+++ b/paddlex/inference/models/object_detection/utils.py
@@ -65,4 +65,6 @@ STATIC_SHAPE_MODEL_LIST = [
 
				     "PP-DocLayout-L",
			
 
				     "PP-DocLayout-M",
			
 
				     "PP-DocLayout-S",
			
 
				+    "PP-DocLayout_plus-L",
			
 
				+    "PP-DocBlockLayout",
			
 
				 ]
			
--- a/paddlex/inference/utils/hpi_model_info_collection.json
+++ b/paddlex/inference/utils/hpi_model_info_collection.json
@@ -445,6 +445,14 @@
 
				       "onnxruntime",
			
 
				       "paddle"
			
 
				     ],
			
 
				+    "PP-DocLayout_plus-L": [
			
 
				+      "onnxruntime",
			
 
				+      "paddle"
			
 
				+    ],
			
 
				+    "PP-DocBlockLayout": [
			
 
				+      "onnxruntime",
			
 
				+      "paddle"
			
 
				+    ],
			
 
				     "RT-DETR-H_layout_17cls": [
			
 
				       "onnxruntime",
			
 
				       "paddle"
			
--- a/paddlex/inference/utils/official_models.py
+++ b/paddlex/inference/utils/official_models.py
@@ -331,6 +331,8 @@ PP-LCNet_x1_0_vehicle_attribute_infer.tar",
 
				     "PP-DocLayout-L": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout-L_infer.tar",
			
 
				     "PP-DocLayout-M": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout-M_infer.tar",
			
 
				     "PP-DocLayout-S": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout-S_infer.tar",
			
 
				+    "PP-DocLayout_plus-L": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout_plus-L_infer.tar",
			
 
				+    "PP-DocBlockLayout": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBlockLayout_infer.tar",
			
 
				     "BEVFusion": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/BEVFusion_infer.tar",
			
 
				     "YOLO-Worldv2-L": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/YOLO-Worldv2-L_infer.tar",
			
 
				     "PP-DocBee-2B": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-2B_infer.tar",
			
--- a/paddlex/modules/object_detection/model_list.py
+++ b/paddlex/modules/object_detection/model_list.py
@@ -81,4 +81,6 @@ MODELS = [
 
				     "PP-DocLayout-L",
			
 
				     "PP-DocLayout-M",
			
 
				     "PP-DocLayout-S",
			
 
				+    "PP-DocLayout_plus-L",
			
 
				+    "PP-DocBlockLayout",
			
 
				 ]
			
--- a/paddlex/repo_apis/PaddleDetection_api/configs/PP-DocBlockLayout.yaml
+++ b/paddlex/repo_apis/PaddleDetection_api/configs/PP-DocBlockLayout.yaml
@@ -0,0 +1,173 @@
 
				+# Runtime
			
 
				+epoch: 40
			
 
				+log_iter: 10
			
 
				+find_unused_parameters: true
			
 
				+use_gpu: true
			
 
				+use_xpu: false
			
 
				+use_mlu: false
			
 
				+use_npu: false
			
 
				+use_ema: true
			
 
				+ema_decay: 0.9999
			
 
				+ema_decay_type: "exponential"
			
 
				+ema_filter_no_grad: true
			
 
				+save_dir: output
			
 
				+snapshot_epoch: 1
			
 
				+print_flops: false
			
 
				+print_params: false
			
 
				+eval_size: [640, 640]
			
 
				+
			
 
				+# Dataset
			
 
				+metric: COCO
			
 
				+num_classes: 1
			
 
				+
			
 
				+worker_num: 4
			
 
				+
			
 
				+TrainDataset:
			
 
				+  name: COCODetDataset
			
 
				+  image_dir: images
			
 
				+  anno_path: annotations/instance_train.json
			
 
				+  dataset_dir: datasets/COCO
			
 
				+  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
			
 
				+
			
 
				+EvalDataset:
			
 
				+  name: COCODetDataset
			
 
				+  image_dir: images
			
 
				+  anno_path: annotations/instance_val.json
			
 
				+  dataset_dir: datasets/COCO
			
 
				+  allow_empty: true
			
 
				+
			
 
				+TestDataset:
			
 
				+  name: ImageFolder
			
 
				+  anno_path: annotations/instance_val.json
			
 
				+  dataset_dir: datasets/COCO
			
 
				+
			
 
				+TrainReader:
			
 
				+  sample_transforms:
			
 
				+    - Decode: {}
			
 
				+    - RandomDistort: {prob: 0.8}
			
 
				+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
			
 
				+    - RandomCrop: {prob: 0.8}
			
 
				+    - RandomFlip: {}
			
 
				+  batch_transforms:
			
 
				+    - BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
			
 
				+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
			
 
				+    - NormalizeBox: {}
			
 
				+    - BboxXYXY2XYWH: {}
			
 
				+    - Permute: {}
			
 
				+  batch_size: 8
			
 
				+  shuffle: true
			
 
				+  drop_last: true
			
 
				+  collate_batch: false
			
 
				+  use_shared_memory: true
			
 
				+
			
 
				+EvalReader:
			
 
				+  sample_transforms:
			
 
				+    - Decode: {}
			
 
				+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
			
 
				+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
			
 
				+    - Permute: {}
			
 
				+  batch_size: 4
			
 
				+  shuffle: false
			
 
				+  drop_last: false
			
 
				+
			
 
				+TestReader:
			
 
				+  inputs_def:
			
 
				+    image_shape: [3, 640, 640]
			
 
				+  sample_transforms:
			
 
				+    - Decode: {}
			
 
				+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
			
 
				+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
			
 
				+    - Permute: {}
			
 
				+  batch_size: 1
			
 
				+  shuffle: false
			
 
				+  drop_last: false
			
 
				+
			
 
				+# Model
			
 
				+architecture: DETR
			
 
				+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/PPHGNetV2_L_ssld_pretrained.pdparams
			
 
				+
			
 
				+norm_type: sync_bn
			
 
				+hidden_dim: 256
			
 
				+use_focal_loss: True
			
 
				+
			
 
				+DETR:
			
 
				+  backbone: PPHGNetV2
			
 
				+  neck: HybridEncoder
			
 
				+  transformer: RTDETRTransformer
			
 
				+  detr_head: DINOHead
			
 
				+  post_process: DETRPostProcess
			
 
				+
			
 
				+PPHGNetV2:
			
 
				+  arch: 'L'
			
 
				+  return_idx: [1, 2, 3]
			
 
				+  freeze_stem_only: true
			
 
				+  freeze_at: 0
			
 
				+  freeze_norm: true
			
 
				+  lr_mult_list: [0., 0.05, 0.05, 0.05, 0.05]
			
 
				+
			
 
				+HybridEncoder:
			
 
				+  hidden_dim: 256
			
 
				+  use_encoder_idx: [2]
			
 
				+  num_encoder_layers: 1
			
 
				+  encoder_layer:
			
 
				+    name: TransformerLayer
			
 
				+    d_model: 256
			
 
				+    nhead: 8
			
 
				+    dim_feedforward: 1024
			
 
				+    dropout: 0.
			
 
				+    activation: 'gelu'
			
 
				+  expansion: 1.0
			
 
				+
			
 
				+RTDETRTransformer:
			
 
				+  num_queries: 300
			
 
				+  position_embed_type: sine
			
 
				+  feat_strides: [8, 16, 32]
			
 
				+  num_levels: 3
			
 
				+  nhead: 8
			
 
				+  num_decoder_layers: 6
			
 
				+  dim_feedforward: 1024
			
 
				+  dropout: 0.0
			
 
				+  activation: relu
			
 
				+  num_denoising: 100
			
 
				+  label_noise_ratio: 0.5
			
 
				+  box_noise_scale: 1.0
			
 
				+  learnt_init_query: false
			
 
				+
			
 
				+DINOHead:
			
 
				+  loss:
			
 
				+    name: DINOLoss
			
 
				+    loss_coeff: {class: 1, bbox: 5, giou: 2}
			
 
				+    aux_loss: true
			
 
				+    use_vfl: true
			
 
				+    matcher:
			
 
				+      name: HungarianMatcher
			
 
				+      matcher_coeff: {class: 2, bbox: 5, giou: 2}
			
 
				+
			
 
				+DETRPostProcess:
			
 
				+  num_top_queries: 300
			
 
				+
			
 
				+# Optimizer
			
 
				+LearningRate:
			
 
				+  base_lr: 0.0001
			
 
				+  schedulers:
			
 
				+  - !PiecewiseDecay
			
 
				+    gamma: 1.0
			
 
				+    milestones: [100]
			
 
				+    use_warmup: true
			
 
				+  - !LinearWarmup
			
 
				+    start_factor: 0.001
			
 
				+    steps: 100
			
 
				+
			
 
				+OptimizerBuilder:
			
 
				+  clip_grad_by_norm: 0.1
			
 
				+  regularizer: false
			
 
				+  optimizer:
			
 
				+    type: AdamW
			
 
				+    weight_decay: 0.0001
			
 
				+
			
 
				+# Export
			
 
				+export:
			
 
				+  post_process: true
			
 
				+  nms: true
			
 
				+  benchmark: false
			
 
				+  fuse_conv_bn: false
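Review note on the optimizer section of the config above: `LinearWarmup` is set to `start_factor: 0.001` over `steps: 100`, and `PiecewiseDecay` uses `gamma: 1.0`, so the decay step leaves the rate unchanged. The sketch below shows what that schedule would look like if the warmup is a plain linear interpolation from `start_factor * base_lr` to `base_lr`. That interpolation is an assumption about the scheduler, and `warmup_lr` is a hypothetical helper, not a PaddleDetection function.

```python
def warmup_lr(step, base_lr=0.0001, start_factor=0.001, warmup_steps=100):
    """Learning rate at a given step, assuming linear warmup then a constant rate.

    The constant phase reflects gamma: 1.0 in the config above, which
    multiplies the rate by 1 at every milestone.
    """
    if step >= warmup_steps:
        return base_lr
    alpha = step / warmup_steps
    # Interpolate the scaling factor from start_factor (step 0) to 1.0.
    return base_lr * (start_factor * (1 - alpha) + alpha)


for step in (0, 50, 100, 1000):
    print(step, warmup_lr(step))
```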
			
--- a/paddlex/repo_apis/PaddleDetection_api/configs/PP-DocLayout_plus-L.yaml
+++ b/paddlex/repo_apis/PaddleDetection_api/configs/PP-DocLayout_plus-L.yaml
@@ -0,0 +1,173 @@
 
				+# Runtime
			
 
				+epoch: 40
			
 
				+log_iter: 10
			
 
				+find_unused_parameters: true
			
 
				+use_gpu: true
			
 
				+use_xpu: false
			
 
				+use_mlu: false
			
 
				+use_npu: false
			
 
				+use_ema: true
			
 
				+ema_decay: 0.9999
			
 
				+ema_decay_type: "exponential"
			
 
				+ema_filter_no_grad: true
			
 
				+save_dir: output
			
 
				+snapshot_epoch: 1
			
 
				+print_flops: false
			
 
				+print_params: false
			
 
				+eval_size: [800, 800]
			
 
				+
			
 
				+# Dataset
			
 
				+metric: COCO
			
 
				+num_classes: 23
			
 
				+
			
 
				+worker_num: 4
			
 
				+
			
 
				+TrainDataset:
			
 
				+  name: COCODetDataset
			
 
				+  image_dir: images
			
 
				+  anno_path: annotations/instance_train.json
			
 
				+  dataset_dir: datasets/COCO
			
 
				+  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
			
 
				+
			
 
				+EvalDataset:
			
 
				+  name: COCODetDataset
			
 
				+  image_dir: images
			
 
				+  anno_path: annotations/instance_val.json
			
 
				+  dataset_dir: datasets/COCO
			
 
				+  allow_empty: true
			
 
				+
			
 
				+TestDataset:
			
 
				+  name: ImageFolder
			
 
				+  anno_path: annotations/instance_val.json
			
 
				+  dataset_dir: datasets/COCO
			
 
				+
			
 
				+TrainReader:
			
 
				+  sample_transforms:
			
 
				+    - Decode: {}
			
 
				+    - RandomDistort: {prob: 0.8}
			
 
				+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
			
 
				+    - RandomCrop: {prob: 0.8}
			
 
				+    - RandomFlip: {}
			
 
				+  batch_transforms:
			
 
				+    - BatchRandomResize: {target_size: [672, 704, 736, 768, 800, 800, 800, 800, 832, 864, 896, 928], random_size: True, random_interp: True, keep_ratio: False}
			
 
				+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
			
 
				+    - NormalizeBox: {}
			
 
				+    - BboxXYXY2XYWH: {}
			
 
				+    - Permute: {}
			
 
				+  batch_size: 8
			
 
				+  shuffle: true
			
 
				+  drop_last: true
			
 
				+  collate_batch: false
			
 
				+  use_shared_memory: true
			
 
				+
			
 
				+EvalReader:
			
 
				+  sample_transforms:
			
 
				+    - Decode: {}
			
 
				+    - Resize: {target_size: [800, 800], keep_ratio: False, interp: 2}
			
 
				+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
			
 
				+    - Permute: {}
			
 
				+  batch_size: 4
			
 
				+  shuffle: false
			
 
				+  drop_last: false
			
 
				+
			
 
				+TestReader:
			
 
				+  inputs_def:
			
 
				+    image_shape: [3, 800, 800]
			
 
				+  sample_transforms:
			
 
				+    - Decode: {}
			
 
				+    - Resize: {target_size: [800, 800], keep_ratio: False, interp: 2}
			
 
				+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
			
 
				+    - Permute: {}
			
 
				+  batch_size: 1
			
 
				+  shuffle: false
			
 
				+  drop_last: false
			
 
				+
			
 
				+# Model
			
 
				+architecture: DETR
			
 
				+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/PPHGNetV2_L_ssld_pretrained.pdparams
			
 
				+
			
 
				+norm_type: sync_bn
			
 
				+hidden_dim: 256
			
 
				+use_focal_loss: True
			
 
				+
			
 
				+DETR:
			
 
				+  backbone: PPHGNetV2
			
 
				+  neck: HybridEncoder
			
 
				+  transformer: RTDETRTransformer
			
 
				+  detr_head: DINOHead
			
 
				+  post_process: DETRPostProcess
			
 
				+
			
 
				+PPHGNetV2:
			
 
				+  arch: 'L'
			
 
				+  return_idx: [1, 2, 3]
			
 
				+  freeze_stem_only: true
			
 
				+  freeze_at: 0
			
 
				+  freeze_norm: true
			
 
				+  lr_mult_list: [0., 0.05, 0.05, 0.05, 0.05]
			
 
				+
			
 
				+HybridEncoder:
			
 
				+  hidden_dim: 256
			
 
				+  use_encoder_idx: [2]
			
 
				+  num_encoder_layers: 1
			
 
				+  encoder_layer:
			
 
				+    name: TransformerLayer
			
 
				+    d_model: 256
			
 
				+    nhead: 8
			
 
				+    dim_feedforward: 1024
			
 
				+    dropout: 0.
			
 
				+    activation: 'gelu'
			
 
				+  expansion: 1.0
			
 
				+
			
 
				+RTDETRTransformer:
			
 
				+  num_queries: 300
			
 
				+  position_embed_type: sine
			
 
				+  feat_strides: [8, 16, 32]
			
 
				+  num_levels: 3
			
 
				+  nhead: 8
			
 
				+  num_decoder_layers: 6
			
 
				+  dim_feedforward: 1024
			
 
				+  dropout: 0.0
			
 
				+  activation: relu
			
 
				+  num_denoising: 100
			
 
				+  label_noise_ratio: 0.5
			
 
				+  box_noise_scale: 1.0
			
 
				+  learnt_init_query: false
			
 
				+
			
 
				+DINOHead:
			
 
				+  loss:
			
 
				+    name: DINOLoss
			
 
				+    loss_coeff: {class: 1, bbox: 5, giou: 2}
			
 
				+    aux_loss: true
			
 
				+    use_vfl: true
			
 
				+    matcher:
			
 
				+      name: HungarianMatcher
			
 
				+      matcher_coeff: {class: 2, bbox: 5, giou: 2}
			
 
				+
			
 
				+DETRPostProcess:
			
 
				+  num_top_queries: 300
			
 
				+
			
 
				+# Optimizer
			
 
				+LearningRate:
			
 
				+  base_lr: 0.0001
			
 
				+  schedulers:
			
 
				+  - !PiecewiseDecay
			
 
				+    gamma: 1.0
			
 
				+    milestones: [100]
			
 
				+    use_warmup: true
			
 
				+  - !LinearWarmup
			
 
				+    start_factor: 0.001
			
 
				+    steps: 100
			
 
				+
			
 
				+OptimizerBuilder:
			
 
				+  clip_grad_by_norm: 0.1
			
 
				+  regularizer: false
			
 
				+  optimizer:
			
 
				+    type: AdamW
			
 
				+    weight_decay: 0.0001
			
 
				+
			
 
				+# Export
			
 
				+export:
			
 
				+  post_process: true
			
 
				+  nms: true
			
 
				+  benchmark: false
			
 
				+  fuse_conv_bn: false
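Review note on `BatchRandomResize` in the config above: the `target_size` list repeats 800 four times. If the transform picks one entry uniformly at random per batch (an assumption about its behavior with `random_size: True`), the repetition weights training toward the 800x800 evaluation size. The sketch below only computes those implied probabilities; `size_probabilities` is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction

# target_size list copied from the PP-DocLayout_plus-L config above.
target_size = [672, 704, 736, 768, 800, 800, 800, 800, 832, 864, 896, 928]


def size_probabilities(sizes):
    """Probability of each size under a uniform draw over the list entries."""
    counts = Counter(sizes)
    return {size: Fraction(n, len(sizes)) for size, n in sorted(counts.items())}


probs = size_probabilities(target_size)
for size, p in probs.items():
    print(size, p)
```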
			
--- a/paddlex/repo_apis/PaddleDetection_api/object_det/official_categories.py
+++ b/paddlex/repo_apis/PaddleDetection_api/object_det/official_categories.py
@@ -217,4 +217,29 @@ official_categories = {
 
				         {"name": "footer_image", "id": 21},
			
 
				         {"name": "aside_text", "id": 22},
			
 
				     ],
			
 
				+    "PP-DocLayout_plus-L": [
			
 
				+        {"name": "paragraph_title", "id": 0},
			
 
				+        {"name": "image", "id": 1},
			
 
				+        {"name": "text", "id": 2},
			
 
				+        {"name": "number", "id": 3},
			
 
				+        {"name": "abstract", "id": 4},
			
 
				+        {"name": "content", "id": 5},
			
 
				+        {"name": "figure_title", "id": 6},
			
 
				+        {"name": "formula", "id": 7},
			
 
				+        {"name": "table", "id": 8},
			
 
				+        {"name": "reference", "id": 9},
			
 
				+        {"name": "doc_title", "id": 10},
			
 
				+        {"name": "footnote", "id": 11},
			
 
				+        {"name": "header", "id": 12},
			
 
				+        {"name": "algorithm", "id": 13},
			
 
				+        {"name": "footer", "id": 14},
			
 
				+        {"name": "seal", "id": 15},
			
 
				+        {"name": "chart", "id": 16},
			
 
				+        {"name": "formula_number", "id": 17},
			
 
				+        {"name": "aside_text", "id": 18},
			
 
				+        {"name": "reference_content", "id": 19},
			
 
				+    ],
			
 
				+    "PP-DocBlockLayout": [
			
 
				+        {"name": "Region", "id": 0},
			
 
				+    ],
			
 
				 }
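Review note on the category list above: it has 20 entries with ids 0 to 19, which matches the "20 common categories" wording in the docs. The PaddleDetection config for `PP-DocLayout_plus-L` earlier in this diff sets `num_classes: 23`. This diff does not show whether that value is overridden at train time, so the sketch below only makes the comparison explicit. `build_id_map` and the constant names are illustrative assumptions.

```python
# Category names in id order, copied from the PP-DocLayout_plus-L entry above.
PLUS_L_NAMES = [
    "paragraph_title", "image", "text", "number", "abstract",
    "content", "figure_title", "formula", "table", "reference",
    "doc_title", "footnote", "header", "algorithm", "footer",
    "seal", "chart", "formula_number", "aside_text", "reference_content",
]
PLUS_L_CATEGORIES = [{"name": n, "id": i} for i, n in enumerate(PLUS_L_NAMES)]

# Value of num_classes in the PP-DocLayout_plus-L PaddleDetection config in this diff.
CONFIG_NUM_CLASSES = 23


def build_id_map(categories):
    """Map class id -> name, checking that ids run contiguously from 0."""
    ids = sorted(c["id"] for c in categories)
    if ids != list(range(len(ids))):
        raise ValueError("category ids must be contiguous starting at 0")
    return {c["id"]: c["name"] for c in categories}


id_to_name = build_id_map(PLUS_L_CATEGORIES)
print(len(id_to_name), id_to_name[5], id_to_name[19])
print("matches config num_classes:", len(id_to_name) == CONFIG_NUM_CLASSES)
```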
			
--- a/paddlex/repo_apis/PaddleDetection_api/object_det/register.py
+++ b/paddlex/repo_apis/PaddleDetection_api/object_det/register.py
@@ -1103,3 +1103,33 @@ register_model_info(
 
				         },
			
 
				     }
			
 
				 )
			
 
				+
			
 
				+register_model_info(
			
 
				+    {
			
 
				+        "model_name": "PP-DocLayout_plus-L",
			
 
				+        "suite": "Det",
			
 
				+        "config_path": osp.join(PDX_CONFIG_DIR, "PP-DocLayout_plus-L.yaml"),
			
 
				+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
			
 
				+        "supported_dataset_types": ["COCODetDataset"],
			
 
				+        "supported_train_opts": {
			
 
				+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
			
 
				+            "dy2st": False,
			
 
				+            "amp": ["OFF"],
			
 
				+        },
			
 
				+    }
			
 
				+)
			
 
				+
			
 
				+register_model_info(
			
 
				+    {
			
 
				+        "model_name": "PP-DocBlockLayout",
			
 
				+        "suite": "Det",
			
 
				+        "config_path": osp.join(PDX_CONFIG_DIR, "PP-DocBlockLayout.yaml"),
			
 
				+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
			
 
				+        "supported_dataset_types": ["COCODetDataset"],
			
 
				+        "supported_train_opts": {
			
 
				+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
			
 
				+            "dy2st": False,
			
 
				+            "amp": ["OFF"],
			
 
				+        },
			
 
				+    }
			
 
				+)