Переглянути джерело

new layout docs (#2218)

* new layout docs

* new layout docs
Sunflower7788 1 рік тому
батько
коміт
e2deec1cf6

+ 5 - 5
docs/module_usage/tutorials/ocr_modules/layout_detection.md

@@ -10,12 +10,12 @@
 <details>
    <summary> 👉模型列表详情</summary>
 
-|模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M)|介绍|
+|模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M|介绍|
 |-|-|-|-|-|-|
-|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
-|PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x的高效率版面区域定位模型,包含文字、标题、表格、图片、列表名|
-|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H的的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
-|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H的的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x在PubLayNet数据集训练的高效率版面区域定位模型,可定位包含文字、标题、表格、图片以及列表这5类区域|
+|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
+|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 
 **注:以上精度指标的评估集是 PaddleOCR 自建的版面区域分析数据集,包含中英文论文、杂志和研报等常见的 1w 张文档类型图片。GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为 8,精度类型为 FP32。**
 </details>

+ 4 - 4
docs/module_usage/tutorials/ocr_modules/layout_detection_en.md

@@ -12,10 +12,10 @@ The core task of structure analysis is to parse and segment the content of input
 
 | Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
 |-|-|-|-|-|-|
-| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | High-efficiency structure analysis model based on PicoDet-L, including 3 classes: table, image, and seal |
-| PicoDet_layout_1x | 86.8 | 13.0 | 91.3 | 7.4 | High-efficiency structure analysis model based on PicoDet-1x, including text, title, table, image, and list |
-| RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | High-precision structure analysis model based on RT-DETR-H, containing 17 common layout categories, namely: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, and seal. |
-| RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | High-precision structure analysis model based on RT-DETR-H, including 3 classes: table, image, and seal |
+| PicoDet_layout_1x | 86.8 | 13.0 | 91.3 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate five types of areas, including text, titles, tables, images, and lists. |
+| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 
 **Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built layout region analysis dataset, containing 10,000 images of common document types, including English and Chinese papers, magazines, research reports, etc. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 </details>

+ 7 - 6
docs/pipeline_usage/tutorials/information_extration_pipelines/document_scene_information_extraction.md

@@ -48,14 +48,15 @@
 
 **版面区域检测模块模型:**
 
-|模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M)|介绍|
+|模型|mAP(0.5)(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M|介绍|
 |-|-|-|-|-|-|
-|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
-|PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x的高效率版面区域定位模型,包含文字、标题、表格、图片、列表|
-|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H的的高精度版面区域定位模型,包含17个版面常见类别。|
-|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H的的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
+|PicoDet_layout_1x|86.8|13.0|91.3|7.4|基于PicoDet-1x在PubLayNet数据集训练的高效率版面区域定位模型,可定位包含文字、标题、表格、图片以及列表这5类区域|
+|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章|
+|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含3个类别:表格,图像和印章|
+|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
 
-**注:以上精度指标的评估集是 PaddleOCR 自建的版面区域分析数据集,包含 1w 张图片。GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为 8,精度类型为 FP32。**
+
+**注:以上精度指标的评估集是 PaddleOCR 自建的版面区域分析数据集,包含中英文论文、杂志和研报等常见的 1w 张文档类型图片。GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为 8,精度类型为 FP32。**
 
 **文本检测模块模型:**
 

+ 6 - 6
docs/pipeline_usage/tutorials/information_extration_pipelines/document_scene_information_extraction_en.md

@@ -47,14 +47,14 @@ The **PP-ChatOCRv3-doc** pipeline includes modules for **Table Structure Recogni
 
 **Layout Detection Module Models**:
 
-|Model|mAP(0.5) (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|Description|
+| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
 |-|-|-|-|-|-|
-|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|A high-efficiency layout detection model based on PicoDet-L, including 3 categories: table, image, and seal.|
-|PicoDet_layout_1x|86.8|13.0|91.3|7.4|A high-efficiency layout detection model based on PicoDet-1x, including text, title, table, image, and list.|
-|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|A high-precision layout detection model based on RT-DETR-H, including 17 common layout categories.|
-|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|A high-precision layout detection model based on RT-DETR-H, including 3 categories: table, image, and seal.|
+| PicoDet_layout_1x | 86.8 | 13.0 | 91.3 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x can locate five types of areas, including text, titles, tables, images, and lists. |
+| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 
-**Note: The above accuracy metrics are evaluated on PaddleOCR's self-built layout analysis dataset, containing 10,000 images. GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+**Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built layout region analysis dataset, containing 10,000 images of common document types, including English and Chinese papers, magazines, research reports, etc. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
 **Text Detection Module Models**: