|
|
@@ -24,82 +24,67 @@ The seal text recognition pipeline is used to recognize the text content of seal
|
|
|
<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
|
|
|
<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
|
|
|
<th>Model Storage Size (M)</th>
|
|
|
-<th>Description</th>
|
|
|
+<th>Introduction</th>
|
|
|
</tr>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
-<td>PicoDet_layout_1x</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>86.8</td>
|
|
|
-<td>9.03 / 3.10</td>
|
|
|
-<td>25.82 / 20.70</td>
|
|
|
-<td>7.4</td>
|
|
|
-<td>An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate five types of areas, including text, titles, tables, images, and lists.</td>
|
|
|
-</tr>
|
|
|
-<tr>
|
|
|
-<td>PicoDet_layout_1x_table</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_table_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_table_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>95.7</td>
|
|
|
-<td>8.02 / 3.09</td>
|
|
|
-<td>23.70 / 20.41</td>
|
|
|
-<td>7.4 M</td>
|
|
|
-<td>An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate one type of tables.</td>
|
|
|
-</tr>
|
|
|
-<tr>
|
|
|
-<td>PicoDet-S_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_3cls_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>87.1</td>
|
|
|
-<td>8.99 / 2.22</td>
|
|
|
-<td>16.11 / 8.73</td>
|
|
|
-<td>4.8</td>
|
|
|
-<td>An high-efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals.</td>
|
|
|
-</tr>
|
|
|
-<tr>
|
|
|
-<td>PicoDet-S_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_17cls_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>70.3</td>
|
|
|
-<td>9.11 / 2.12</td>
|
|
|
-<td>15.42 / 9.12</td>
|
|
|
-<td>4.8</td>
|
|
|
-<td>A high-efficient layout area localization model trained on a self-constructed dataset based on PicoDet-S_layout_17cls for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals.</td>
|
|
|
-</tr>
|
|
|
-<tr>
|
|
|
-<td>PicoDet-L_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_3cls_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>89.3</td>
|
|
|
-<td>13.05 / 4.50</td>
|
|
|
-<td>41.30 / 41.30</td>
|
|
|
-<td>22.6</td>
|
|
|
-<td>An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals.</td>
|
|
|
+<td>PP-DocLayout-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-L_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>90.4</td>
|
|
|
+<td>34.6244 / 10.3945</td>
|
|
|
+<td>510.57 / -</td>
|
|
|
+<td>123.76</td>
|
|
|
+<td>A high-precision layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using RT-DETR-L.</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
-<td>PicoDet-L_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_17cls_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>79.9</td>
|
|
|
-<td>13.50 / 4.69</td>
|
|
|
-<td>43.32 / 43.32</td>
|
|
|
-<td>22.6</td>
|
|
|
-<td>A efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L_layout_17cls for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals.</td>
|
|
|
+<td>PP-DocLayout-M</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-M_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-M_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>75.2</td>
|
|
|
+<td>13.3259 / 4.8685</td>
|
|
|
+<td>44.0680 / 44.0680</td>
|
|
|
+<td>22.578</td>
|
|
|
+<td>A layout area localization model with balanced precision and efficiency, trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-L.</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
-<td>RT-DETR-H_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_3cls_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>95.9</td>
|
|
|
-<td>114.93 / 27.71</td>
|
|
|
-<td>947.56 / 947.56</td>
|
|
|
-<td>470.1</td>
|
|
|
-<td>A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals.</td>
|
|
|
-</tr>
|
|
|
-<tr>
|
|
|
-<td>RT-DETR-H_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_17cls_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>92.6</td>
|
|
|
-<td>115.29 / 104.09</td>
|
|
|
-<td>995.27 / 995.27</td>
|
|
|
-<td>470.2</td>
|
|
|
-<td>A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals.</td>
|
|
|
+<td>PP-DocLayout-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-S_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>70.9</td>
|
|
|
+<td>8.3008 / 2.3794</td>
|
|
|
+<td>10.0623 / 9.9296</td>
|
|
|
+<td>4.834</td>
|
|
|
+<td>A high-efficiency layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-S.</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
|
|
|
-> ❗ The above listed are the <b>3 core models</b> that the layout detection module mainly supports. This module supports a total of <b>11 full models</b>, including multiple models predefined with different categories. Among them, there are 9 models that include the seal category. In addition to the above 3 core models, the remaining model list is as follows:
|
|
|
|
|
|
-<details><summary> 👉Model List Details</summary>
|
|
|
+> ❗ The list above covers the <b>3 core models</b> that the layout detection module primarily supports. The module supports <b>11 models</b> in total, including several models predefined with different categories. The complete model list is as follows:
|
|
|
|
|
|
-* <b>3-category Layout Detection Models, including table, image, and seal</b>
|
|
|
+<details><summary> 👉 Details of Model List</summary>
|
|
|
+
|
|
|
+* <b>Table Layout Detection Model</b>
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>mAP(0.5) (%)</th>
|
|
|
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
|
|
|
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tbody>
|
|
|
+<tr>
|
|
|
+<td>PicoDet_layout_1x_table</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_table_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_table_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>97.5</td>
|
|
|
+<td>8.02 / 3.09</td>
|
|
|
+<td>23.70 / 20.41</td>
|
|
|
+<td>7.4</td>
|
|
|
+<td>A high-efficiency layout area localization model trained on a self-built dataset using PicoDet-1x, capable of detecting table regions.</td>
|
|
|
+</tr>
|
|
|
+</tbody></table>
|
|
|
+<b>Note: The evaluation dataset for the above precision metrics is a table-area layout detection dataset built by PaddleOCR, containing 7835 Chinese and English document images with tables. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference time is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
+
|
|
|
+* <b>3-Class Layout Detection Model, including Table, Image, and Seal</b>
|
|
|
<table>
|
|
|
<thead>
|
|
|
<tr>
|
|
|
@@ -118,7 +103,7 @@ The seal text recognition pipeline is used to recognize the text content of seal
|
|
|
<td>8.99 / 2.22</td>
|
|
|
<td>16.11 / 8.73</td>
|
|
|
<td>4.8</td>
|
|
|
-<td>A high-efficiency layout area localization model trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on the lightweight PicoDet-S model</td>
|
|
|
+<td>A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S.</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>PicoDet-L_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_3cls_pretrained.pdparams">Training Model</a></td>
|
|
|
@@ -126,7 +111,7 @@ The seal text recognition pipeline is used to recognize the text content of seal
|
|
|
<td>13.05 / 4.50</td>
|
|
|
<td>41.30 / 41.30</td>
|
|
|
<td>22.6</td>
|
|
|
-<td>A layout area localization model with balanced efficiency and accuracy, trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on PicoDet-L</td>
|
|
|
+<td>A layout area localization model with balanced efficiency and precision, trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L.</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>RT-DETR-H_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_3cls_pretrained.pdparams">Training Model</a></td>
|
|
|
@@ -134,12 +119,35 @@ The seal text recognition pipeline is used to recognize the text content of seal
|
|
|
<td>114.93 / 27.71</td>
|
|
|
<td>947.56 / 947.56</td>
|
|
|
<td>470.1</td>
|
|
|
-<td>A high-precision layout area localization model trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on RT-DETR-H</td>
|
|
|
+<td>A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H.</td>
|
|
|
</tr>
|
|
|
</tbody></table>
|
|
|
|
|
|
+* <b>5-Class English Document Area Detection Model, including Text, Title, Table, Image, and List</b>
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>mAP(0.5) (%)</th>
|
|
|
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
|
|
|
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tbody>
|
|
|
+<tr>
|
|
|
+<td>PicoDet_layout_1x</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>97.8</td>
|
|
|
+<td>9.03 / 3.10</td>
|
|
|
+<td>25.82 / 20.70</td>
|
|
|
+<td>7.4</td>
|
|
|
+<td>A high-efficiency English document layout area localization model trained on the PubLayNet dataset using PicoDet-1x.</td>
|
|
|
+</tr>
|
|
|
+</tbody></table>
|
|
|
+<b>Note: The evaluation dataset for the above precision metrics is the [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) dataset, containing 11245 English document images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference time is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
|
|
|
-* <b>17-category Layout Detection Models, including 17 common layout categories: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, seal</b>
|
|
|
+* <b>17-Class Area Detection Model, including 17 common layout categories: Paragraph Title, Image, Text, Number, Abstract, Content, Figure Caption, Formula, Table, Table Caption, References, Document Title, Footnote, Header, Algorithm, Footer, and Seal</b>
|
|
|
<table>
|
|
|
<thead>
|
|
|
<tr>
|
|
|
@@ -158,7 +166,7 @@ The seal text recognition pipeline is used to recognize the text content of seal
|
|
|
<td>9.11 / 2.12</td>
|
|
|
<td>15.42 / 9.12</td>
|
|
|
<td>4.8</td>
|
|
|
-<td>A high-efficiency layout area localization model trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on the lightweight PicoDet-S model</td>
|
|
|
+<td>A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S.</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>PicoDet-L_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_17cls_pretrained.pdparams">Training Model</a></td>
|
|
|
@@ -166,7 +174,7 @@ The seal text recognition pipeline is used to recognize the text content of seal
|
|
|
<td>13.50 / 4.69</td>
|
|
|
<td>43.32 / 43.32</td>
|
|
|
<td>22.6</td>
|
|
|
-<td>A layout area localization model with balanced efficiency and accuracy, trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on PicoDet-L</td>
|
|
|
+<td>A layout area localization model with balanced efficiency and precision, trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L.</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>RT-DETR-H_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_17cls_pretrained.pdparams">Training Model</a></td>
|
|
|
@@ -174,11 +182,12 @@ The seal text recognition pipeline is used to recognize the text content of seal
|
|
|
<td>115.29 / 104.09</td>
|
|
|
<td>995.27 / 995.27</td>
|
|
|
<td>470.2</td>
|
|
|
-<td>A high-precision layout area localization model trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on RT-DETR-H</td>
|
|
|
+<td>A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H.</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
|
|
|
+
|
|
|
<p><b>Document Image Orientation Classification Module (Optional):</b></p>
|
|
|
<table>
|
|
|
<thead>
|