
add chatocrv3 practical tutorial (#2241)

* add chatocrv3 practical tutorial

* add seal
Sunflower7788 1 year ago
parent
commit
25fb64714c

+ 344 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md

@@ -0,0 +1,344 @@
+简体中文 | [English](seal_recognition_en.md)
+
+# Seal Text Recognition Pipeline Usage Tutorial
+
+## 1. Introduction to the Seal Text Recognition Pipeline
+Seal text recognition is a technology that automatically extracts and recognizes seal content from documents or images. It is part of document processing and is useful in many scenarios, such as contract comparison, warehouse entry/exit auditing, and invoice reimbursement review.
+
+
+![](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_seal/01.png)
+
+
+The **seal text recognition** pipeline includes a layout analysis module, a seal text detection module, and a text recognition module.
+
+**If you prioritize model accuracy, choose a model with higher accuracy; if you prioritize inference speed, choose a model with faster inference; if you prioritize model storage size, choose a model with a smaller footprint.**
+
+<details>
+   <summary> 👉 Detailed Model List</summary>
+
+
+
+**Layout analysis module models:**
+
+|Model Name|mAP (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|
+|-|-|-|-|-|
+|PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|
+|RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1 M|
+|RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2 M|
+
+**Note: The evaluation set for the above accuracy metrics is a PaddleX self-built layout analysis dataset containing 10,000 images. GPU inference times are measured on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speeds are measured on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+
+**Seal text detection module models:**
+
+|Model|Detection Hmean (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|Description|
+|-|-|-|-|-|-|
+|PP-OCRv4_server_seal_det|98.21|84.341|2425.06|109|The server-side seal text detection model of PP-OCRv4, with higher accuracy, suitable for deployment on well-equipped servers|
+|PP-OCRv4_mobile_seal_det|96.47|10.5878|131.813|4.6|The mobile seal text detection model of PP-OCRv4, more efficient, suitable for edge-side deployment|
+
+**Note: The evaluation set for the above accuracy metrics is a self-built dataset containing 500 circular seal images. GPU inference times are measured on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speeds are measured on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+
+**Text recognition module models:**
+
+|Model Name|Recognition Avg Accuracy (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|
+|-|-|-|-|-|
+|PP-OCRv4_mobile_rec |78.20|7.95018|46.7868|10.6 M|
+|PP-OCRv4_server_rec |79.20|7.19439|140.179|71.2 M|
+
+**Note: The evaluation set for the above accuracy metrics is a PaddleOCR self-built Chinese dataset covering street views, web images, documents, and handwriting, with 11,000 images for text recognition. GPU inference times are measured on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speeds are measured on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+
+</details>
+
+## 2. Quick Start
+All pretrained model pipelines provided by PaddleX can be experienced quickly. You can try the seal text recognition pipeline online, or experience it locally via the command line or Python.
+
+### 2.1 Online Experience
+You can [experience online](https://aistudio.baidu.com/community/app/182491/webUI) the seal text recognition in the Document Scene Information Extraction v3 pipeline, using the official demo image for recognition, for example:
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/seal_recognition/02.png)
+
+If you are satisfied with the pipeline's performance, you can directly integrate and deploy it; if not, you can also use private data to **fine-tune the models in the pipeline online**.
+
+### 2.2 Local Experience
+Before using the seal text recognition pipeline locally, please make sure you have installed the PaddleX wheel package following the [PaddleX Local Installation Guide](../../../installation/installation.md).
+
+### 2.3 Command Line Experience
+A single command lets you quickly experience the seal text recognition pipeline. Use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png) and replace `--input` with its local path to run prediction:
+
+```
+paddlex --pipeline seal_recognition --input seal_text_det.png --device gpu:0 --save_path ./output
+```
+Parameter description:
+
+```
+--pipeline: pipeline name, here the seal text recognition pipeline
+--input: local path or URL of the input image to be processed
+--device: GPU index to use (e.g. gpu:0 means GPU 0, gpu:1,2 means GPUs 1 and 2); CPU can also be selected (--device cpu)
+--save_path: path to save the output results
+```
+
+When the above command is executed, the default seal text recognition pipeline configuration file is loaded. If you need to customize the configuration file, run the following command to obtain it:
+
+<details>
+   <summary> 👉 Click to expand</summary>
+
+```
+paddlex --get_pipeline_config seal_recognition
+```
+After execution, the seal text recognition pipeline configuration file will be saved in the current directory. To customize the save location, run the following command (assuming the custom location is `./my_path`):
+
+```
+paddlex --get_pipeline_config seal_recognition --save_path ./my_path
+```
+
+After obtaining the pipeline configuration file, replace `--pipeline` with the configuration file's save path to make the configuration take effect. For example, if the configuration file is saved at `./seal_recognition.yaml`, simply run:
+
+```
+paddlex --pipeline ./seal_recognition.yaml --input seal_text_det.png --save_path ./output
+```
+Parameters such as `--model` and `--device` no longer need to be specified; the values in the configuration file will be used. If they are still specified, the specified values take precedence.
+
+</details>
+
+After running, the result is:
+
+<details>
+   <summary> 👉 Click to expand</summary>
+
+```
+{'input_path': 'seal_text_det.png', 'layout_result': {'input_path': 'seal_text_det.png', 'boxes': [{'cls_id': 2, 'label': 'seal', 'score': 0.9813116192817688, 'coordinate': [0, 5.2238655, 639.59766, 637.6985]}]}, 'ocr_result': [{'input_path': PosixPath('/root/.paddlex/temp/tmp19fn93y5.png'), 'dt_polys': [array([[468, 469],
+       [472, 469],
+       [477, 471],
+       [507, 501],
+       [509, 505],
+       [509, 509],
+       [508, 513],
+       [506, 514],
+       [456, 553],
+       [454, 555],
+       [391, 581],
+       [388, 581],
+       [309, 590],
+       [306, 590],
+       [234, 577],
+       [232, 577],
+       [172, 548],
+       [170, 546],
+       [121, 504],
+       [118, 501],
+       [118, 496],
+       [119, 492],
+       [121, 490],
+       [152, 463],
+       [156, 461],
+       [160, 461],
+       [164, 463],
+       [202, 495],
+       [252, 518],
+       [311, 530],
+       [371, 522],
+       [425, 501],
+       [464, 471]]), array([[442, 439],
+       [445, 442],
+       [447, 447],
+       [449, 490],
+       [448, 494],
+       [446, 497],
+       [440, 499],
+       [197, 500],
+       [193, 499],
+       [190, 496],
+       [188, 491],
+       [188, 448],
+       [189, 444],
+       [192, 441],
+       [197, 439],
+       [438, 438]]), array([[465, 341],
+       [470, 344],
+       [472, 346],
+       [476, 356],
+       [476, 419],
+       [475, 424],
+       [472, 428],
+       [467, 431],
+       [462, 433],
+       [175, 434],
+       [170, 433],
+       [166, 430],
+       [163, 426],
+       [161, 420],
+       [161, 354],
+       [162, 349],
+       [165, 345],
+       [170, 342],
+       [175, 340],
+       [460, 340]]), array([[326,  34],
+       [481,  85],
+       [485,  88],
+       [488,  90],
+       [584, 220],
+       [586, 225],
+       [587, 229],
+       [589, 378],
+       [588, 383],
+       [585, 388],
+       [581, 391],
+       [576, 393],
+       [570, 392],
+       [507, 373],
+       [502, 371],
+       [498, 367],
+       [496, 359],
+       [494, 255],
+       [423, 162],
+       [322, 129],
+       [246, 151],
+       [205, 169],
+       [144, 252],
+       [139, 360],
+       [137, 365],
+       [134, 369],
+       [128, 373],
+       [ 66, 391],
+       [ 61, 392],
+       [ 56, 390],
+       [ 51, 387],
+       [ 48, 382],
+       [ 47, 377],
+       [ 49, 230],
+       [ 50, 225],
+       [ 52, 221],
+       [149,  89],
+       [153,  86],
+       [157,  84],
+       [318,  34],
+       [322,  33]])], 'dt_scores': [0.9943362380813267, 0.9994290391836306, 0.9945320407374245, 0.9908104427126033], 'rec_text': ['5263647368706', '吗繁物', '发票专用章', '天津君和缘商贸有限公司'], 'rec_score': [0.9921098351478577, 0.997374951839447, 0.9999369382858276, 0.9901710152626038]}]}
+```
+</details>
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/seal_recognition/03.png)
+
+The visualized image is saved in the `output` directory by default; you can also customize the save path via `--save_path`.
+
+
+### 2.4 Python Script Integration
+A few lines of code are enough to run fast inference with the pipeline. Taking the seal text recognition pipeline as an example:
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(pipeline="seal_recognition")
+
+output = pipeline.predict("seal_text_det.png")
+for res in output:
+    res.print() ## print the structured prediction output
+    res.save_to_img("./output/") ## save the visualization result
+```
+The result obtained is the same as with the command line method.
+
+The above Python script performs the following steps:
+
+(1) Instantiate the pipeline object with `create_pipeline`. The parameters are described below:
+
+|Parameter|Description|Type|Default|
+|-|-|-|-|
+|`pipeline`|Pipeline name or path to a pipeline configuration file. If a pipeline name, it must be a pipeline supported by PaddleX.|`str`|None|
+|`device`|Device for pipeline model inference. Supports "gpu" and "cpu".|`str`|`gpu`|
+|`enable_hpi`|Whether to enable high-performance inference; only available if the pipeline supports it.|`bool`|`False`|
+
+(2) Call the `predict` method of the pipeline object to run inference. Its parameter `x` is the input data to be predicted, and several input forms are supported, as shown below:
+
+| Parameter Type | Description |
+|---------------|-----------------------------------------------------------------------------------------------------------|
+| Python Var    | Python variables are supported directly, such as image data represented as a numpy.ndarray. |
+| str           | A file path of the data to be predicted, such as the local path of an image file: `/root/data/img.jpg`. |
+| str           | A URL of the data file to be predicted, such as the network URL of an image file: [example](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png). |
+| str           | A local directory containing the data files to be predicted, such as `/root/data/`. |
+| dict          | A dictionary whose keys correspond to the specific task, e.g. "img" for image classification; the values support the data types above, for example `{"img": "/root/data1"}`. |
+| list          | A list whose elements are of the data types above, such as `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`, `[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]`. |
+
+(3) Obtain prediction results by calling the `predict` method: `predict` is a `generator`, so results must be obtained by iterating over it. `predict` processes data batch by batch, so each prediction result is a list representing a batch of results.
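The generator-and-batch behavior described in step (3) can be sketched with a minimal mock; `MockPipeline` and its batching below are illustrative assumptions, not the actual PaddleX implementation:

```python
# Minimal sketch of the generator-style predict() contract described above.
# MockPipeline is a stand-in for illustration only, not PaddleX internals.

def chunked(items, batch_size):
    """Yield successive batches of batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

class MockPipeline:
    def __init__(self, batch_size=1):
        self.batch_size = batch_size

    def predict(self, inputs):
        # predict() is a generator: results are produced lazily, batch by batch.
        if isinstance(inputs, str):
            inputs = [inputs]
        for batch in chunked(inputs, self.batch_size):
            for path in batch:
                yield {"input_path": path, "rec_text": []}

pipeline = MockPipeline(batch_size=2)
output = pipeline.predict(["a.png", "b.png", "c.png"])
results = list(output)           # iterate the generator to obtain results
print(len(results))              # -> 3
print(results[0]["input_path"])  # -> a.png
```

Because `predict` is lazy, nothing is computed until you iterate (with a `for` loop or `list()`), which keeps memory usage bounded for large input lists.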
+
+(4) Process the prediction results: the result for each sample is of `dict` type and can be printed or saved to a file; the supported file types depend on the specific pipeline, for example:
+
+
+| Method | Description | Parameters |
+|--------------|-----------------------------|--------------------------------------------------------------------------------------------------------|
+| print        | Print the result to the terminal | `- format_json`: bool, whether to format the output with JSON indentation, default True;<br>`- indent`: int, JSON formatting setting, effective only when format_json is True, default 4;<br>`- ensure_ascii`: bool, JSON formatting setting, effective only when format_json is True, default False; |
+| save_to_json | Save the result as a JSON file | `- save_path`: str, path to save to; when it is a directory, the saved file is named after the input file;<br>`- indent`: int, JSON formatting setting, default 4;<br>`- ensure_ascii`: bool, JSON formatting setting, default False; |
+| save_to_img  | Save the result as an image file | `- save_path`: str, path to save to; when it is a directory, the saved file is named after the input file; |
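The `indent` and `ensure_ascii` options above behave like the parameters of the same names in Python's standard `json.dumps`; a quick stdlib illustration:

```python
import json

# A small stand-in result dict (illustrative values taken from the sample output above).
result = {"rec_text": ["发票专用章"], "rec_score": [0.9999]}

# indent=4 pretty-prints with 4-space indentation (what the indent option controls).
pretty = json.dumps(result, indent=4, ensure_ascii=False)

# ensure_ascii=True escapes non-ASCII characters as \uXXXX sequences;
# ensure_ascii=False keeps the Chinese text readable.
escaped = json.dumps(result, ensure_ascii=True)

print("发票专用章" in pretty)   # -> True
print("\\u53d1" in escaped)     # -> True ("发" escaped as \u53d1)
```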
+
+If you have obtained the configuration file, you can customize every setting of the seal text recognition pipeline by simply changing the `pipeline` parameter of `create_pipeline` to the path of the pipeline configuration file.
+
+For example, if your configuration file is saved at `./my_path/seal_recognition.yaml`, you only need to run:
+
+```python
+from paddlex import create_pipeline
+pipeline = create_pipeline(pipeline="./my_path/seal_recognition.yaml")
+output = pipeline.predict("seal_text_det.png")
+for res in output:
+    res.print() ## print the structured prediction output
+    res.save_to_img("./output/") ## save the visualization result
+```
+## 3. Development Integration/Deployment
+If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
+
+If you need to use the pipeline directly in your Python project, refer to the example code in the Python Script Integration section above.
+
+In addition, PaddleX provides three other deployment methods, described in detail below:
+
+🚀 **High-Performance Deployment**: In real production environments, many applications have strict performance requirements for deployment (especially response speed) to ensure efficient system operation and a smooth user experience. To this end, PaddleX provides a high-performance inference plugin that deeply optimizes model inference and pre/post-processing for significant end-to-end speedups. For the detailed procedure, refer to the [PaddleX High-Performance Deployment Guide](../../../pipeline_deploy/high_performance_deploy.md).
+
+☁️ **Service Deployment**: Service deployment is a common deployment form in real production environments. By encapsulating inference as a service, clients can access it via network requests to obtain inference results. PaddleX lets users deploy pipelines as services at low cost; for the detailed procedure, refer to the [PaddleX Service Deployment Guide](../../../pipeline_deploy/service_deploy.md).
+
+Below are the API reference and multi-language service invocation examples:
+
+
+
+📱 **Edge Deployment**: Edge deployment places computing and data processing on the user's device itself, so the device processes data directly without relying on a remote server. PaddleX supports deploying models on edge devices such as Android; for the detailed procedure, refer to the [PaddleX Edge Deployment Guide](../../../pipeline_deploy/lite_deploy.md).
+You can choose an appropriate deployment method for your pipeline as needed and proceed with subsequent AI application integration.
+
+## 4. Secondary Development
+If the default model weights provided by the seal text recognition pipeline do not meet your accuracy or speed requirements in your scenario, you can try to further **fine-tune** the existing models using **your own domain- or application-specific data** to improve recognition performance in your scenario.
+
+### 4.1 Model Fine-Tuning
+Since the seal text recognition pipeline consists of three modules, unsatisfactory pipeline performance may originate from any one of them.
+
+You can analyze images with poor recognition results and fine-tune the models according to the following rules:
+
+* If the seal region is mislocated within the overall layout, the layout detection module may be insufficient. Refer to the [Secondary Development](../../../module_usage/tutorials/ocr_modules/layout_detection.md#四二次开发) section of the [Layout Detection Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/layout_detection.md) and fine-tune the layout detection model with your private dataset.
+* If many texts are missed (text under-detection), the text detection model may be insufficient. Refer to the [Secondary Development](../../../module_usage/tutorials/ocr_modules/seal_text_detection.md#四二次开发) section of the [Seal Text Detection Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/seal_text_detection.md) and fine-tune the text detection model with your private dataset.
+* If many detected texts are recognized incorrectly (the recognized content does not match the actual content), the text recognition model needs further improvement. Refer to the [Secondary Development](../../../module_usage/tutorials/ocr_modules/text_recognition.md#四二次开发) section of the [Text Recognition Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/text_recognition.md) to fine-tune the text recognition model.
+
+### 4.2 Model Application
+After fine-tuning with your private dataset, you will obtain local model weight files.
+
+To use the fine-tuned model weights, simply modify the pipeline configuration file, replacing the corresponding entries with the local paths of the fine-tuned model weights:
+
+```yaml
+......
+ Pipeline:
+  layout_model: RT-DETR-H_layout_3cls # can be replaced with the local path of the fine-tuned model
+  text_det_model: PP-OCRv4_server_seal_det  # can be replaced with the local path of the fine-tuned model
+  text_rec_model: PP-OCRv4_server_rec # can be replaced with the local path of the fine-tuned model
+  layout_batch_size: 1
+  text_rec_batch_size: 1
+  device: "gpu:0"
+......
+```
+Then, load the modified pipeline configuration file following the command line or Python script method in the Local Experience section.
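The substitution above can also be scripted; the sketch below uses plain text replacement on the configuration fragment, and the `./finetuned/...` paths and output file name are hypothetical:

```python
import tempfile
from pathlib import Path

# Starting point: the relevant fragment of the default pipeline configuration.
config_text = """\
Pipeline:
  layout_model: RT-DETR-H_layout_3cls
  text_det_model: PP-OCRv4_server_seal_det
  text_rec_model: PP-OCRv4_server_rec
"""

# Map default model names to (hypothetical) local fine-tuned weight directories.
replacements = {
    "PP-OCRv4_server_seal_det": "./finetuned/seal_det",
    "PP-OCRv4_server_rec": "./finetuned/rec",
}
for old, new in replacements.items():
    config_text = config_text.replace(old, new)

# Write the modified config so it can be passed to the pipeline later.
out_path = Path(tempfile.gettempdir()) / "my_seal_recognition.yaml"
out_path.write_text(config_text, encoding="utf-8")
print("./finetuned/seal_det" in config_text)  # -> True
```

The resulting file path can then be passed as the `--pipeline` argument or the `pipeline` parameter of `create_pipeline`.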
+
+## 5. Multi-Hardware Support
+
+PaddleX supports a variety of mainstream hardware devices, including NVIDIA GPU, Kunlunxin XPU, Ascend NPU, and Cambricon MLU. **Simply modify the `--device` parameter** to switch seamlessly between different hardware.
+
+For example, to run seal text recognition pipeline inference on an NVIDIA GPU, the command is:
+
+```
+paddlex --pipeline seal_recognition --input seal_text_det.png --device gpu:0 --save_path output
+```
+To switch the hardware to an Ascend NPU, simply change `--device` in the command to npu:0:
+
+```
+paddlex --pipeline seal_recognition --input seal_text_det.png --device npu:0 --save_path output
+```
+If you want to use the seal text recognition pipeline on more kinds of hardware, refer to the [PaddleX Multi-Hardware Usage Guide](../../../other_devices_support/installation_other_devices.md).

+ 354 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition_en.md

@@ -0,0 +1,354 @@
+[简体中文](seal_recognition.md) | English
+  
+# Seal Text Recognition Pipeline Usage Tutorial
+
+## 1. Introduction to the Seal Text Recognition Pipeline
+Seal text recognition is a technology that automatically extracts and recognizes seal content from documents or images. It is part of document processing and has many applications, such as contract comparison, warehouse entry/exit auditing, and invoice reimbursement review.
+
+![](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_seal/01.png)
+
+The **Seal Text Recognition** pipeline includes a layout analysis module, a seal text detection module, and a text recognition module.
+
+**If you prioritize model accuracy, choose a model with higher accuracy. If you prioritize inference speed, choose a model with faster inference. If you prioritize model storage size, choose a model with a smaller storage footprint.**
+  
+<details>  
+   <summary> 👉 Detailed Model List </summary>  
+  
+
+**Layout Analysis Module Models:**
+  
+|Model Name|mAP (%)|GPU Inference Time (ms)|CPU Inference Time|Model Size (M)|
+|-|-|-|-|-|
+|PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|
+|RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1M|
+|RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2M|
+
+**Note: The evaluation set for the above accuracy indicators is a self-built layout area analysis dataset from PaddleX, containing 10,000 images. The GPU inference time for all models above is based on an NVIDIA Tesla T4 machine with a precision type of FP32. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads, and the precision type is also FP32.**
+
+
+**Seal Text Detection Module Models**:
+
+| Model | Detection Hmean (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|-------|---------------------|-------------------------|-------------------------|--------------|-------------|
+| PP-OCRv4_server_seal_det | 98.21 | 84.341 | 2425.06 | 109 | PP-OCRv4's server-side seal text detection model, featuring higher accuracy, suitable for deployment on better-equipped servers |
+| PP-OCRv4_mobile_seal_det | 96.47 | 10.5878 | 131.813 | 4.6 | PP-OCRv4's mobile seal text detection model, offering higher efficiency, suitable for deployment on edge devices |
+
+**Note: The above accuracy metrics are evaluated on a self-built dataset containing 500 circular seal images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+
+**Text Recognition Module Models**:
+
+
+| Model Name | Average Recognition Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time | Model Size (M) |
+|-|-|-|-|-|
+|PP-OCRv4_mobile_rec |78.20|7.95018|46.7868|10.6 M|
+|PP-OCRv4_server_rec |79.20|7.19439|140.179|71.2 M|
+
+**Note: The evaluation set for the above accuracy indicators is a self-built Chinese dataset from PaddleOCR, covering various scenarios such as street scenes, web images, documents, and handwriting. The text recognition subset includes 11,000 images. The GPU inference time for all models above is based on an NVIDIA Tesla T4 machine with a precision type of FP32. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads, and the precision type is also FP32.**
+
+</details>  
+
+## 2. Quick Start
+All pretrained model pipelines provided by PaddleX can be experienced quickly. You can try the seal text recognition pipeline online, or experience it locally via the command line or Python.
+
+### 2.1 Online Experience
+You can [experience online](https://aistudio.baidu.com/community/app/182491/webUI) the seal text recognition in the Document Scene Information Extraction v3 pipeline, using the official demo image for recognition, for example:
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/seal_recognition/02.png)
+
+If you are satisfied with the pipeline's performance, you can directly integrate and deploy it. If not, you can also use private data to **fine-tune the models in the pipeline online**.
+
+### 2.2 Local Experience
+Before using the seal text recognition pipeline locally, please make sure you have installed the PaddleX wheel package following the [PaddleX Local Installation Guide](../../../installation/installation_en.md).
+
+### 2.3 Command Line Experience
+A single command lets you quickly experience the seal text recognition pipeline. Use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png) and replace `--input` with its local path to run prediction:
+
+```
+paddlex --pipeline seal_recognition --input seal_text_det.png --device gpu:0 --save_path output 
+```
+
+Parameter description:
+
+```
+--pipeline: pipeline name, here the seal text recognition pipeline
+--input: local path or URL of the input image to be processed
+--device: GPU index to use (e.g. gpu:0 means GPU 0, gpu:1,2 means GPUs 1 and 2); CPU can also be selected (--device cpu)
+--save_path: path to save the output results
+```
+
+When the above command is executed, the default seal text recognition pipeline configuration file is loaded. If you need to customize the configuration file, run the following command to obtain it:
+
+<details>
+<summary>  👉 Click to expand</summary>
+
+```bash
+paddlex --get_pipeline_config seal_recognition
+```
+
+After execution, the seal text recognition pipeline configuration file will be saved in the current directory. To customize the save location, run the following command (assuming the custom location is `./my_path`):
+
+```bash
+paddlex --get_pipeline_config seal_recognition --save_path ./my_path
+```
+
+After obtaining the pipeline configuration file, replace `--pipeline` with the configuration file's save path to make the configuration take effect. For example, if the configuration file is saved at `./seal_recognition.yaml`, simply run:
+
+```bash
+paddlex --pipeline ./seal_recognition.yaml --input seal_text_det.png --save_path output 
+```
+Parameters such as `--model` and `--device` no longer need to be specified; the values in the configuration file will be used. If they are still specified, the specified values take precedence.
+
+</details>
+
+After running, the result obtained is:
+
+<details>
+<summary>  👉 Click to expand</summary>
+
+```
+{'input_path': 'seal_text_det.png', 'layout_result': {'input_path': 'seal_text_det.png', 'boxes': [{'cls_id': 2, 'label': 'seal', 'score': 0.9813116192817688, 'coordinate': [0, 5.2238655, 639.59766, 637.6985]}]}, 'ocr_result': [{'input_path': PosixPath('/root/.paddlex/temp/tmp19fn93y5.png'), 'dt_polys': [array([[468, 469],
+       [472, 469],
+       [477, 471],
+       [507, 501],
+       [509, 505],
+       [509, 509],
+       [508, 513],
+       [506, 514],
+       [456, 553],
+       [454, 555],
+       [391, 581],
+       [388, 581],
+       [309, 590],
+       [306, 590],
+       [234, 577],
+       [232, 577],
+       [172, 548],
+       [170, 546],
+       [121, 504],
+       [118, 501],
+       [118, 496],
+       [119, 492],
+       [121, 490],
+       [152, 463],
+       [156, 461],
+       [160, 461],
+       [164, 463],
+       [202, 495],
+       [252, 518],
+       [311, 530],
+       [371, 522],
+       [425, 501],
+       [464, 471]]), array([[442, 439],
+       [445, 442],
+       [447, 447],
+       [449, 490],
+       [448, 494],
+       [446, 497],
+       [440, 499],
+       [197, 500],
+       [193, 499],
+       [190, 496],
+       [188, 491],
+       [188, 448],
+       [189, 444],
+       [192, 441],
+       [197, 439],
+       [438, 438]]), array([[465, 341],
+       [470, 344],
+       [472, 346],
+       [476, 356],
+       [476, 419],
+       [475, 424],
+       [472, 428],
+       [467, 431],
+       [462, 433],
+       [175, 434],
+       [170, 433],
+       [166, 430],
+       [163, 426],
+       [161, 420],
+       [161, 354],
+       [162, 349],
+       [165, 345],
+       [170, 342],
+       [175, 340],
+       [460, 340]]), array([[326,  34],
+       [481,  85],
+       [485,  88],
+       [488,  90],
+       [584, 220],
+       [586, 225],
+       [587, 229],
+       [589, 378],
+       [588, 383],
+       [585, 388],
+       [581, 391],
+       [576, 393],
+       [570, 392],
+       [507, 373],
+       [502, 371],
+       [498, 367],
+       [496, 359],
+       [494, 255],
+       [423, 162],
+       [322, 129],
+       [246, 151],
+       [205, 169],
+       [144, 252],
+       [139, 360],
+       [137, 365],
+       [134, 369],
+       [128, 373],
+       [ 66, 391],
+       [ 61, 392],
+       [ 56, 390],
+       [ 51, 387],
+       [ 48, 382],
+       [ 47, 377],
+       [ 49, 230],
+       [ 50, 225],
+       [ 52, 221],
+       [149,  89],
+       [153,  86],
+       [157,  84],
+       [318,  34],
+       [322,  33]])], 'dt_scores': [0.9943362380813267, 0.9994290391836306, 0.9945320407374245, 0.9908104427126033], 'rec_text': ['5263647368706', '吗繁物', '发票专用章', '天津君和缘商贸有限公司'], 'rec_score': [0.9921098351478577, 0.997374951839447, 0.9999369382858276, 0.9901710152626038]}]}
+```
+</details>
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/seal_recognition/03.png)
+
+The visualized image is not saved by default. You can customize the save path via `--save_path`, and all results will then be saved to the specified path.
+
+
+### 2.4 Python Script Integration
+A few lines of code are enough to run fast inference with the pipeline. Taking the seal text recognition pipeline as an example:
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(pipeline="seal_recognition")
+
+output = pipeline.predict("seal_text_det.png")
+for res in output:
+    res.print() 
+    res.save_to_img("./output/") # Save the results in img
+```
+
+The result obtained is the same as with the command line method.
+
+In the above Python script, the following steps are executed:
+
+(1) Instantiate the pipeline object with `create_pipeline`. The parameters are described below:
+
+| Parameter | Description | Type | Default |
+|-|-|-|-|
+|`pipeline`| The pipeline name or the path to a pipeline configuration file. If it is a pipeline name, it must be a pipeline supported by PaddleX. |`str`|None|
+|`device`| The device for pipeline model inference. Supports "gpu" and "cpu". |`str`|`gpu`|
+|`use_hpip`| Whether to enable high-performance inference; only available if the pipeline supports it. |`bool`|`False`|
+
+(2) Call the `predict` method of the pipeline object to run inference. Its parameter `x` is the input data to be predicted, and several input forms are supported, as shown in the following examples:
+
+| Parameter Type | Parameter Description |
+|---------------|-----------------------------------------------------------------------------------------------------------|
+| Python Var    | Supports directly passing in Python variables, such as numpy.ndarray representing image data. |
+| str         | Supports passing in the path of the file to be predicted, such as the local path of an image file: `/root/data/img.jpg`. |
+| str           | Supports passing in the URL of the file to be predicted, such as the network URL of an image file: [Example](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png). |
+| str           | Supports passing in a local directory, which should contain files to be predicted, such as the local path: `/root/data/`. |
+| dict          | Supports passing in a dictionary type, where the key needs to correspond to a specific task, such as "img" for image classification tasks. The value of the dictionary supports the above types of data, for example: `{"img": "/root/data1"}`. |
+| list          | Supports passing in a list, where the list elements need to be of the above types of data, such as `[numpy.ndarray, numpy.ndarray], ["/root/data/img1.jpg", "/root/data/img2.jpg"], ["/root/data1", "/root/data2"], [{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]`. |
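The accepted forms in the table can be flattened into a list of items to predict. The helper below is a rough illustration of that dispatch, not the PaddleX implementation:

```python
from pathlib import Path

def normalize_inputs(x, key="img"):
    """Illustrative sketch: flatten the input forms from the table above
    (str path/URL, directory, dict, list) into a list of items to predict."""
    if isinstance(x, dict):
        return normalize_inputs(x[key])
    if isinstance(x, list):
        items = []
        for item in x:
            items.extend(normalize_inputs(item))
        return items
    if isinstance(x, str):
        p = Path(x)
        if p.is_dir():
            # A directory input expands to the files it contains.
            return sorted(str(f) for f in p.iterdir() if f.is_file())
        return [x]          # single file path or URL
    return [x]              # e.g. a numpy.ndarray is passed through as-is

print(normalize_inputs({"img": ["a.jpg", "b.jpg"]}))  # -> ['a.jpg', 'b.jpg']
```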
+
+(3)Obtain the prediction results by calling the `predict` method: The `predict` method is a `generator`, so prediction results need to be obtained through iteration. The `predict` method predicts data in batches, so the prediction results are in the form of a list.
+
+(4)Process the prediction results: The prediction result for each sample is of `dict` type and supports printing or saving to files, with the supported file types depending on the specific pipeline. For example:
+
+| Method | Description | Method Parameters |
+|--------|-------------|-------------------|
+| print | Print the result to the terminal | `- format_json`: bool, whether to format the output with JSON indentation, default True;<br>`- indent`: int, JSON formatting setting, effective only when format_json is True, default 4;<br>`- ensure_ascii`: bool, JSON formatting setting, effective only when format_json is True, default False; |
+| save_to_json | Save the result as a JSON file | `- save_path`: str, the path to save to; when it is a directory, the saved file is named after the input file;<br>`- indent`: int, JSON formatting setting, default 4;<br>`- ensure_ascii`: bool, JSON formatting setting, default False; |
+| save_to_img | Save the result as an image file | `- save_path`: str, the path to save to; when it is a directory, the saved file is named after the input file; |
+
+`save_to_img` saves visualization results (including the OCR result image and the layout analysis result image).
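The "named after the input file" rule for a directory `save_path` can be illustrated with a small sketch; the directory test and suffix handling below are assumptions for illustration:

```python
from pathlib import Path

def resolve_save_path(save_path, input_path, suffix=".png"):
    """Illustrative sketch of the naming rule above: when save_path is a
    directory (here approximated as any path without an extension), the
    output file is named after the input file."""
    save_path = Path(save_path)
    if save_path.suffix == "":
        return save_path / (Path(input_path).stem + suffix)
    return save_path

print(resolve_save_path("./output", "seal_text_det.png"))
```

Passing `save_path="./output"` would therefore produce a file like `output/seal_text_det.png`, matching the input file's name.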
+
+If you have obtained the configuration file, you can customize the configurations of the seal text recognition pipeline by simply changing the `pipeline` parameter in the `create_pipeline` method to the path of the pipeline configuration file.
+
+For example, if your configuration file is saved at `./my_path/seal_recognition.yaml`, you only need to execute:
+
+
+```python
+from paddlex import create_pipeline
+pipeline = create_pipeline(pipeline="./my_path/seal_recognition.yaml")
+output = pipeline.predict("seal_text_det.png")
+for res in output:
+    res.print() ## print the structured prediction output
+    res.save_to_img("./output/") ## save the visualization result
+```
+
+## 3. Development Integration/Deployment
+If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
+
+If you need to use the pipeline directly in your Python project, refer to the example code in the Python Script Integration section above.
+
+In addition, PaddleX provides three other deployment methods, described in detail below:
+
+🚀 **High-Performance Deployment**: In real production environments, many applications have strict performance requirements for deployment (especially response speed) to ensure efficient system operation and a smooth user experience. To this end, PaddleX provides a high-performance inference plugin that deeply optimizes model inference and pre/post-processing for significant end-to-end speedups. For the detailed procedure, refer to the [PaddleX High-Performance Deployment Guide](../../../pipeline_deploy/high_performance_deploy_en.md).
+
+☁️ **Service Deployment**: Service deployment is a common deployment form in real production environments. By encapsulating inference as a service, clients can access it via network requests to obtain inference results. PaddleX lets users deploy pipelines as services at low cost. For the detailed procedure, refer to the [PaddleX Service Deployment Guide](../../../pipeline_deploy/service_deploy_en.md).
+
+Here are the API reference and multi-language service invocation examples:
+
+
+📱 **Edge Deployment**: Edge deployment places computing and data processing on the user's device itself, so the device processes data directly without relying on a remote server. PaddleX supports deploying models on edge devices such as Android. For the detailed procedure, refer to the [PaddleX Edge Deployment Guide](../../../pipeline_deploy/lite_deploy_en.md).
+You can choose an appropriate deployment method for your pipeline as needed and proceed with subsequent AI application integration.
+
+## 4. Secondary Development
+If the default model weights provided by the seal text recognition pipeline do not meet your accuracy or speed requirements in your scenario, you can try to further **fine-tune** the existing models using **your own domain- or application-specific data** to improve recognition performance in your scenario.
+
+### 4.1 Model Fine-Tuning
+Since the seal text recognition pipeline consists of three modules, unsatisfactory pipeline performance may originate from any one of them.
+
+You can analyze images with poor recognition results and fine-tune the models according to the following rules:
+
+* If the seal region is mislocated within the overall layout, the layout detection module may be insufficient. Refer to the [Customization](../../../module_usage/tutorials/ocr_modules/layout_detection_en.md#customization) section in the [Layout Detection Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/layout_detection_en.md) and use your private dataset to fine-tune the layout detection model.
+* If a significant amount of text is missed (text under-detection), the text detection model may be insufficient. Refer to the [Customization](../../../module_usage/tutorials/ocr_modules/seal_text_detection_en.md#customization) section in the [Seal Text Detection Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/seal_text_detection_en.md) and use your private dataset to fine-tune the text detection model.
+* If many detected texts contain recognition errors (the recognized content does not match the actual content), the text recognition model requires further improvement. Refer to the [Customization](../../../module_usage/tutorials/ocr_modules/text_recognition_en.md#customization) section in the [Text Recognition Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/text_recognition_en.md) to fine-tune the text recognition model.
+
+### 4.2 Model Application
+After fine-tuning with your private dataset, you will obtain local model weight files.
+
+To use the fine-tuned model weights, simply modify the pipeline configuration file, replacing the corresponding entries with the local paths of the fine-tuned model weights:
+
+```yaml
+......
+ Pipeline:
+  layout_model: RT-DETR-H_layout_3cls # can be replaced with the local path of the fine-tuned model
+  text_det_model: PP-OCRv4_server_seal_det  # can be replaced with the local path of the fine-tuned model
+  text_rec_model: PP-OCRv4_server_rec # can be replaced with the local path of the fine-tuned model
+  layout_batch_size: 1
+  text_rec_batch_size: 1
+  device: "gpu:0"
+......
+```
+Then, load the modified pipeline configuration file following the command line or Python script method in the Local Experience section.
+
+## 5. Multi-Hardware Support
+PaddleX supports a variety of mainstream hardware devices, including NVIDIA GPU, Kunlunxin XPU, Ascend NPU, and Cambricon MLU. **Simply modify the `--device` parameter** to switch seamlessly between different hardware.
+
+For example, to run seal text recognition pipeline inference on an NVIDIA GPU, the command is:
+
+```bash
+paddlex --pipeline seal_recognition --input seal_text_det.png --device gpu:0 --save_path output
+```
+
+At this point, to switch the hardware to an Ascend NPU, simply change `--device` in the command to `npu:0`:
+
+```bash
+paddlex --pipeline seal_recognition --input seal_text_det.png --device npu:0 --save_path output
+```
+
+If you want to use the seal text recognition pipeline on a wider range of hardware, please refer to the [PaddleX Multi-Hardware Usage Guide](../../../other_devices_support/installation_other_devices_en.md).
+
+
+
+
+
+
+
+

+ 438 - 0
docs/practical_tutorials/document_scene_information_extraction(layout_detection)_tutorial.md

@@ -0,0 +1,438 @@
+简体中文 | [English](document_scene_information_extraction(layout_detection)_tutorial_en.md)
+
+# PaddleX 3.0 Document Scene Information Extraction v3 (PP-ChatOCRv3_doc) -- Tutorial on Information Extraction from Academic Papers
+
+
+PaddleX provides a rich set of model pipelines. A pipeline is composed of one or more models, and each pipeline solves a specific scenario task. Every PaddleX pipeline supports quick experience; if the results do not meet expectations, the models can also be fine-tuned with private data, and PaddleX provides a Python API for easily integrating a pipeline into your own projects. Before use, you first need to install PaddleX; for installation, please refer to the [PaddleX Local Installation Tutorial](../installation/installation.md). This tutorial uses a document scene information extraction task on academic papers as an example to introduce the usage workflow of this pipeline in a real-world scenario.
+
+
+## 1. Select the Pipeline
+
+Document information extraction is a part of document processing with a wide range of applications, such as academic research, library management, scientific and technological intelligence analysis, and literature review writing. With document information extraction technology, key information such as title, authors, abstract, keywords, publication year, journal name, and citation information can be automatically extracted from academic papers and stored in a structured form, facilitating subsequent retrieval, analysis, and application. This not only improves researchers' efficiency but also provides strong support for in-depth academic research.
+
+
+First, select the corresponding PaddleX pipeline based on the task scenario. This section uses information extraction from academic papers as an example to introduce secondary development for the Document Scene Information Extraction v3 pipeline in PaddleX. If you are unsure of the correspondence between a task and a pipeline, you can consult the capability introductions in the [Pipeline List](../support_list/pipelines_list.md) supported by PaddleX.
+
+
+## 2. Quick Experience
+
+PaddleX provides two ways to experience the pipeline: you can try the Document Scene Information Extraction v3 pipeline online, or experience it locally using Python.
+
+### 2.1 Online Experience
+
+You can experience the Document Scene Information Extraction v3 pipeline on the AI Studio community. Click the link to download the [paper test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_layout/test.jpg), then upload it to the [official Document Scene Information Extraction v3 application](https://aistudio.baidu.com/community/app/182491/webUI?source=appCenter) to experience the extraction results. As shown below:
+
+![](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_layout/06.png)
+
+
+### 2.2 Local Experience
+
+Before using the Document Scene Information Extraction v3 pipeline locally, please make sure you have completed the PaddleX wheel installation following the [PaddleX Local Installation Tutorial](../../../installation/installation.md). Quick pipeline inference takes only a few lines of code:
+
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(
+    pipeline="PP-ChatOCRv3-doc",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # Please enter your ak and sk; otherwise the LLM cannot be invoked
+    )
+
+visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_layout/test.jpg")
+
+for res in visual_result:
+    res.save_to_img("./output")
+    res.save_to_html('./output')
+    res.save_to_xlsx('./output')
+
+vector = pipeline.build_vector(visual_info=visual_info)
+chat_result = pipeline.chat(
+    key_list=["页眉", "图表标题"],  # "页眉" = header, "图表标题" = chart title
+    visual_info=visual_info,
+    vector=vector,
+    )
+chat_result.print()
+```
+
+**Note**: Please first obtain your own ak and sk on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) (for the detailed process, refer to [AK/SK Authentication for API Calls](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Hlwerugt8)). Only after filling the ak and sk into the designated positions can the LLM be invoked normally.
+
+
+The printed output is as follows:
+
+```
+The result has been saved in output/tmpfnss9sq9_layout.jpg.
+The result has been saved in output/tmpfnss9sq9_ocr.jpg.
+The result has been saved in output/tmpfnss9sq9_table.jpg.
+The result has been saved in output/tmpfnss9sq9_table.jpg.
+The result has been saved in output/tmpfnss9sq9/tmpfnss9sq9.html.
+The result has been saved in output/tmpfnss9sq9/tmpfnss9sq9.html.
+The result has been saved in output/tmpfnss9sq9/tmpfnss9sq9.xlsx.
+The result has been saved in output/tmpfnss9sq9/tmpfnss9sq9.xlsx.
+
+{'chat_res': {'页眉': '未知', '图表标题': '未知'}, 'prompt': ''}
+
+```
+
+The visualized results of layout region detection, OCR, and table recognition, as well as the table HTML and XLSX results, are saved in the `output` directory.
+
+The visualized layout region localization result is as follows:
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/practical_tutorials/PP-ChatOCRv3_doc/layout_detection_01.png)
+
+
+A bad-case analysis of the online experience above reveals the following problem with the official model of the Document Scene Information Extraction pipeline: because the official model currently distinguishes only three categories (figure, table, and seal), it cannot accurately locate and extract other information such as headers and table captions, so the results in `{'chat_res': {'页眉': '未知', '图表标题': '未知'}, 'prompt': ''}` are "未知" (unknown). Therefore, this tutorial focuses on the academic paper scenario and, using a paper document dataset and taking the extraction of headers and chart captions as an example, fine-tunes the layout analysis model in the Document Scene Information Extraction pipeline so that it can accurately extract header and table caption information from documents.
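+
+The bad case above can also be spotted programmatically; a small sketch, with the result dict from the run above inlined ("未知" meaning "unknown"):
+
+```python
+# Result dict printed by the official pipeline in the run above.
+chat_res = {'页眉': '未知', '图表标题': '未知'}
+
+# Collect the keys the pipeline failed to resolve.
+missing = [key for key, value in chat_res.items() if value == '未知']
+print(missing)  # both keys are unresolved, which motivates fine-tuning
+```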
+
+
+
+## 3. Model Selection
+
+PaddleX provides 4 end-to-end layout region localization models; see the [Model List](../support_list/models_list.md) for details. The benchmark of the layout region detection models is as follows:
+
+|Model|mAP(0.5) (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|Description|
+|-|-|-|-|-|-|
+|PicoDet_layout_1x|86.8|13.0|91.3|7.4|A high-efficiency layout region localization model trained with PicoDet-1x on the PubLayNet dataset; it can locate 5 region categories: text, title, table, image, and list|
+|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|A high-efficiency layout region localization model trained with PicoDet-L on a self-built dataset of Chinese and English papers, magazines, and research reports; it covers 3 categories: table, image, and seal|
+|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|A high-accuracy layout region localization model trained with RT-DETR-H on a self-built dataset of Chinese and English papers, magazines, and research reports; it covers 3 categories: table, image, and seal|
+|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|A high-accuracy layout region localization model trained with RT-DETR-H on a self-built dataset of Chinese and English papers, magazines, and research reports; it covers 17 common layout categories: paragraph title, image, text, number, abstract, content, chart caption, formula, table, table caption, reference, document title, footnote, header, algorithm, footer, and seal|
+
+**Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built layout analysis dataset, containing 10,000 document images of common types such as Chinese and English papers, magazines, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+
+
+## 4. Data Preparation and Validation
+### 4.1 Data Preparation
+
+This tutorial uses the `academic paper dataset` as the example dataset, which can be obtained with the command below. If you use your own annotated dataset, you need to adjust it to meet PaddleX's data format requirements; for an introduction to the data format, refer to the [PaddleX Object Detection Module Data Annotation Tutorial](../data_annotations/cv_modules/object_detection.md).
+
+Dataset acquisition command:
+```bash
+cd /path/to/paddlex
+wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/paperlayout.tar -P ./dataset
+tar -xf ./dataset/paperlayout.tar -C ./dataset/
+```
+
+### 4.2 Dataset Validation
+
+Validating the dataset takes just one command:
+
+```bash
+python main.py -c paddlex/configs/structure_analysis/RT-DETR-H_layout_3cls.yaml \
+    -o Global.mode=check_dataset \
+    -o Global.dataset_dir=./dataset/paperlayout/
+```
+
+After executing the command above, PaddleX validates the dataset and collects its basic statistics. Upon success, the log prints `Check dataset passed !`, and the related outputs are saved in the `./output/check_dataset` directory, including visualized example images and a sample distribution histogram. The validation result file is saved at `./output/check_dataset_result.json`, with the following content:
+```json
+{
+  "done_flag": true,
+  "check_pass": true,
+  "attributes": {
+    "num_classes": 4,
+    "train_samples": 4734,
+    "train_sample_paths": [
+      "check_dataset\/demo_img\/train_4612.jpg",
+      "check_dataset\/demo_img\/train_4844.jpg",
+      "check_dataset\/demo_img\/train_0084.jpg",
+      "check_dataset\/demo_img\/train_0448.jpg",
+      "check_dataset\/demo_img\/train_4703.jpg",
+      "check_dataset\/demo_img\/train_3572.jpg",
+      "check_dataset\/demo_img\/train_4516.jpg",
+      "check_dataset\/demo_img\/train_2836.jpg",
+      "check_dataset\/demo_img\/train_1353.jpg",
+      "check_dataset\/demo_img\/train_0225.jpg"
+    ],
+    "val_samples": 928,
+    "val_sample_paths": [
+      "check_dataset\/demo_img\/val_0982.jpg",
+      "check_dataset\/demo_img\/val_0607.jpg",
+      "check_dataset\/demo_img\/val_0623.jpg",
+      "check_dataset\/demo_img\/val_0890.jpg",
+      "check_dataset\/demo_img\/val_0036.jpg",
+      "check_dataset\/demo_img\/val_0654.jpg",
+      "check_dataset\/demo_img\/val_0895.jpg",
+      "check_dataset\/demo_img\/val_0059.jpg",
+      "check_dataset\/demo_img\/val_0142.jpg",
+      "check_dataset\/demo_img\/val_0088.jpg"
+    ]
+  },
+  "analysis": {
+    "histogram": "check_dataset\/histogram.png"
+  },
+  "dataset_path": ".\/dataset\/paperlayout\/",
+  "show_type": "image",
+  "dataset_type": "COCODetDataset"
+}
+```
+In the validation results above, `check_pass` being `true` indicates that the dataset format meets the requirements. Explanations of the other metrics are as follows:
+
+- attributes.num_classes: the dataset has 4 classes; this class count must be passed in for subsequent training;
+- attributes.train_samples: the training set contains 4734 samples;
+- attributes.val_samples: the validation set contains 928 samples;
+- attributes.train_sample_paths: a list of relative paths to visualized training samples;
+- attributes.val_sample_paths: a list of relative paths to visualized validation samples;
+
+In addition, dataset validation analyzes the sample count distribution across all classes in the dataset and plots a distribution histogram (histogram.png):
+
+<center>
+
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/practical_tutorials/PP-ChatOCRv3_doc/layout_detection_02.png" width=600>
+
+</center>
+
+**Note**: Only data that passes validation can be used for training and evaluation.
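+
+The fields above can also be checked programmatically before launching training; a minimal sketch, with an inlined dict standing in for the contents of `./output/check_dataset_result.json`:
+
+```python
+import json
+
+# Stand-in for ./output/check_dataset_result.json (abridged).
+result_json = '''
+{
+  "done_flag": true,
+  "check_pass": true,
+  "attributes": {"num_classes": 4, "train_samples": 4734, "val_samples": 928}
+}
+'''
+
+result = json.loads(result_json)
+assert result["check_pass"], "dataset failed validation; fix it before training"
+
+# This class count is what gets passed as -o Train.num_classes later.
+num_classes = result["attributes"]["num_classes"]
+print(num_classes)
+```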
+
+
+### 4.3 Dataset Splitting (Optional)
+
+If you need to convert the dataset format or re-split the dataset, you can do so by modifying the configuration file or appending hyperparameters.
+
+Parameters related to dataset validation can be set by modifying the fields under `CheckDataset` in the configuration file. Example explanations of some parameters:
+
+* `CheckDataset`:
+    * `split`:
+        * `enable`: whether to re-split the dataset; set to `True` to perform dataset splitting; the default is `False`;
+        * `train_percent`: if re-splitting, the percentage of the training set, any integer between 0 and 100; it must sum to 100 with `val_percent`;
+        * `val_percent`: if re-splitting, the percentage of the validation set, any integer between 0 and 100; it must sum to 100 with `train_percent`;
+
+During splitting, the original annotation files are renamed to `xxx.bak` in their original paths. These parameters can also be set by appending command-line arguments, e.g., to re-split the dataset with an 80/20 train/validation ratio: `-o CheckDataset.split.enable=True -o CheckDataset.split.train_percent=80 -o CheckDataset.split.val_percent=20`.
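+
+The split semantics can be sketched in plain Python (an illustrative stand-in for what `CheckDataset.split` does, not PaddleX code):
+
+```python
+import random
+
+def split_dataset(samples, train_percent=80, val_percent=20, seed=0):
+    """Shuffle the samples and split them by percentage, mirroring the
+    train_percent / val_percent semantics described above."""
+    assert train_percent + val_percent == 100, "percentages must sum to 100"
+    shuffled = samples[:]
+    random.Random(seed).shuffle(shuffled)
+    cut = len(shuffled) * train_percent // 100
+    return shuffled[:cut], shuffled[cut:]
+
+train, val = split_dataset([f"img_{i}.jpg" for i in range(100)])
+print(len(train), len(val))  # 80 20
+```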
+
+
+## 5. Model Training and Evaluation
+### 5.1 Model Training
+
+Before training, make sure you have validated the dataset. Training a PaddleX model takes just one command:
+
+```bash
+python main.py -c paddlex/configs/structure_analysis/RT-DETR-H_layout_3cls.yaml \
+    -o Global.mode=train \
+    -o Global.dataset_dir=./dataset/paperlayout \
+    -o Train.num_classes=4
+```
+
+Model training in PaddleX supports modifying training hyperparameters and single-machine single-GPU/multi-GPU training, simply by modifying the configuration file or appending command-line arguments.
+
+Each model in PaddleX provides a configuration file for model development to set the relevant parameters. Training-related parameters can be set by modifying the fields under `Train` in the configuration file. Example explanations of some parameters:
+
+* `Global`:
+    * `mode`: mode, supporting dataset validation (`check_dataset`), model training (`train`), and model evaluation (`evaluate`);
+    * `device`: training device, one of `cpu`, `gpu`, `xpu`, `npu`, `mlu`; except for cpu, multi-GPU training can specify card IDs, e.g., `gpu:0,1,2,3`;
+* `Train`: training hyperparameter settings;
+    * `epochs_iters`: number of training epochs;
+    * `learning_rate`: training learning rate;
+
+For more hyperparameters, refer to [PaddleX Common Model Configuration File Parameters](../module_usage/instructions/config_parameters_common.md).
+
+**Note:**
+- The parameters above can be set by appending command-line arguments, e.g., specify training mode: `-o Global.mode=train`; train on the first 2 GPUs: `-o Global.device=gpu:0,1`; set the number of epochs to 10: `-o Train.epochs_iters=10`.
+- During training, PaddleX automatically saves model weight files, to `output` by default; to specify a different save path, use the `-o Global.output` field in the configuration file.
+- PaddleX shields you from the concepts of dynamic-graph and static-graph weights. Both are produced during training, and static-graph weights are selected by default for inference.
+
+**Explanation of training outputs:**
+
+After training completes, all outputs are saved in the specified output directory (`./output/` by default), typically including:
+
+* train_result.json: the training result record, including whether the training task completed normally, the metrics of the produced weights, and related file paths;
+* train.log: the training log, recording changes in model metrics and loss during training;
+* config.yaml: the training configuration file, recording the hyperparameter configuration of this run;
+* .pdparams, .pdopt, .pdstates, .pdiparams, .pdmodel: model weight-related files, including network parameters, optimizer states, static-graph network parameters, and static-graph network structure;
+
+
+### 5.2 Model Evaluation
+
+After training, you can evaluate the specified model weight file on the validation set to verify its accuracy. Model evaluation in PaddleX takes just one command:
+
+```bash
+python main.py -c paddlex/configs/structure_analysis/RT-DETR-H_layout_3cls.yaml \
+    -o Global.mode=evaluate \
+    -o Global.dataset_dir=./dataset/paperlayout
+```
+
+Similar to training, model evaluation supports settings via the configuration file or appended command-line arguments.
+
+**Note:** Model evaluation requires specifying the model weight file path. Each configuration file has a built-in default weight save path; to change it, simply append a command-line argument, e.g., `-o Evaluate.weight_path=./output/best_model/best_model.pdparams`.
+
+### 5.3 Model Tuning
+
+Having learned training and evaluation, we can now improve model accuracy by tuning hyperparameters. By adjusting the number of training epochs appropriately, you control how long the model trains, avoiding overfitting or underfitting; the learning rate affects the speed and stability of convergence. Therefore, when optimizing model performance, carefully consider the values of these two parameters and adjust them flexibly according to the actual situation to obtain the best training results.
+
+It is recommended to follow the control-variable method when tuning:
+1. First fix the number of training epochs at 30 and the batch size at 4.
+2. Launch four experiments based on the RT-DETR-H_layout model with learning rates of 0.001, 0.0005, 0.0001, and 0.00001.
+3. You will find that experiment 2, with a learning rate of 0.0001, has the highest accuracy, and the validation score is still rising in the last few epochs. Raising the number of epochs to 50 or 100 will therefore further improve model accuracy.
+
+Learning-rate exploration results:
+
+<center>
+
+| Experiment ID | Learning Rate | mAP@0.5 |
+| --------------- | ------------- | -------------------- |
+| 1 | 0.00001 | 88.90 |
+| **2** | **0.0001** | **92.41** |
+| 3 | 0.0005 | 92.27 |
+| 4 | 0.001 | 90.66 |
+
+</center>
+
+Next, on the basis of a learning rate of 0.0001, we can increase the number of training epochs. Comparing experiments [2, 4, 5] below shows that increasing the number of epochs further improves model accuracy.
+
+<center>
+
+
+| Experiment ID | Epochs | mAP@0.5 |
+| --------------- | ------------- | -------------------- |
+| 2 | 30 | 92.41 |
+| 4 | 50 | 92.63 |
+| **5** | **100** | **92.88** |
+
+</center>
+
+**Note: This tutorial is designed for 4 GPUs. If you only have 1 GPU, you can complete the experiment by adjusting the number of training GPUs, but the final metrics may not align exactly with those above; this is normal.**
+
+When choosing a training environment, consider the relationship between the number of training GPUs, the total batch_size, and the learning rate. First, the total batch_size equals the number of training GPUs multiplied by the per-GPU batch_size. Second, the total batch_size and the learning rate are related: the learning rate should be adjusted in step with the total batch_size. The default learning rate corresponds to the total batch_size of 4-GPU training; if you plan to train on a single GPU, divide the learning rate by 4 accordingly; if you plan to train on 8 GPUs, multiply the learning rate by 2 accordingly.
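+
+The scaling rule above can be written down as a small helper (illustrative only, not part of PaddleX):
+
+```python
+def scale_learning_rate(base_lr, num_gpus, base_gpus=4):
+    """Linearly scale the learning rate with the total batch size,
+    where base_lr corresponds to training on base_gpus GPUs."""
+    return base_lr * num_gpus / base_gpus
+
+base_lr = 0.0001                                 # tuned for 4-GPU training
+print(scale_learning_rate(base_lr, num_gpus=1))  # single GPU: divide by 4
+print(scale_learning_rate(base_lr, num_gpus=8))  # 8 GPUs: multiply by 2
+```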
+
+Commands for training with different parameters can be referenced as follows:
+
+```bash
+python main.py -c paddlex/configs/structure_analysis/RT-DETR-H_layout_3cls.yaml \
+    -o Global.mode=train \
+    -o Global.dataset_dir=./dataset/paperlayout \
+    -o Train.num_classes=4 \
+    -o Train.learning_rate=0.0001 \
+    -o Train.epochs_iters=30 \
+    -o Train.batch_size=4
+```
+
+### 5.4 Model Testing
+
+You can test the fine-tuned single model using the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_layout/test.jpg) for prediction:
+
+```bash
+python main.py -c paddlex/configs/structure_analysis/RT-DETR-H_layout_3cls.yaml \
+    -o Global.mode=predict \
+    -o Predict.model_dir="output/best_model/inference" \
+    -o Predict.input="https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_layout/test.jpg"
+```
+
+The prediction results are generated under `./output`. The prediction result for `test.jpg` is as follows:
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/practical_tutorials/PP-ChatOCRv3_doc/layout_detection_03.png)
+
+
+## 6. Pipeline Testing
+
+Replace the model in the pipeline with the fine-tuned model for testing, using the [paper test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_layout/test.jpg) for prediction.
+
+First obtain and update the configuration file for Document Scene Information Extraction v3. Execute the command below to fetch the configuration file (assuming a custom save location of `./my_path`):
+
+```bash
+paddlex --get_pipeline_config PP-ChatOCRv3-doc --save_path ./my_path
+```
+
+Modify the `Pipeline.layout_model` field in `PP-ChatOCRv3-doc.yaml` to the path of the fine-tuned model above. The modified configuration is as follows:
+
+```yaml
+Pipeline:
+  layout_model: ./output/best_model/inference
+  table_model: SLANet_plus
+  text_det_model: PP-OCRv4_server_det
+  text_rec_model: PP-OCRv4_server_rec
+  seal_text_det_model: PP-OCRv4_server_seal_det
+  doc_image_ori_cls_model: null
+  doc_image_unwarp_model: null
+  llm_name: "ernie-3.5"
+  llm_params:
+    api_type: qianfan
+    ak: 
+    sk:
+```
+
+After the modification, you only need to change the `pipeline` parameter value in the `create_pipeline` method to the path of the pipeline configuration file to apply the configuration.
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(
+    pipeline="./my_path/PP-ChatOCRv3-doc.yaml",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # Please enter your ak and sk; otherwise the LLM cannot be invoked
+    )
+
+visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_layout/test.jpg")
+
+for res in visual_result:
+    res.save_to_img("./output_ft")
+    res.save_to_html('./output_ft')
+    res.save_to_xlsx('./output_ft')
+
+vector = pipeline.build_vector(visual_info=visual_info)
+chat_result = pipeline.chat(
+    key_list=["页眉", "table caption"],  # "页眉" = header
+    visual_info=visual_info,
+    vector=vector,
+    )
+chat_result.print()
+```
+
+The prediction results are generated under `./output_ft`, and the printed key information extraction result is:
+
+
+```
+{'chat_res': {'页眉': '第43卷\n 航空发动机\n 44', '表格标题': '表1模拟来流Ma=5飞行的空气加热器工作参数'}, 'prompt': ''}
+```
+As you can see, after model fine-tuning, the key information has been extracted correctly.
+
+The visualized layout result is shown below; the ability to locate header and table caption regions has been correctly added:
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/practical_tutorials/PP-ChatOCRv3_doc/layout_detection_04.png)
+
+
+## 7. Development Integration/Deployment
+
+If the Document Scene Information Extraction v3 pipeline meets your requirements for inference speed and accuracy, you can proceed directly to development integration/deployment.
+
+1. Apply the trained model pipeline directly in your Python project, as shown in the code below:
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(
+    pipeline="./my_path/PP-ChatOCRv3-doc.yaml",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # Please enter your ak and sk; otherwise the LLM cannot be invoked
+    )
+
+visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/practical_tutorial/PP-ChatOCRv3_doc_layout/test.jpg")
+
+for res in visual_result:
+    res.save_to_img("./output")
+    res.save_to_html('./output')
+    res.save_to_xlsx('./output')
+
+vector = pipeline.build_vector(visual_info=visual_info)
+chat_result = pipeline.chat(
+    key_list=["页眉", "图表标题"],  # "页眉" = header, "图表标题" = chart title
+    visual_info=visual_info,
+    vector=vector,
+    )
+chat_result.print()
+```
+
+For more parameters, refer to the [Document Scene Information Extraction v3 Pipeline Tutorial](../pipeline_usage/tutorials/cv_pipelines/image_classification.md).
+
+2. In addition, PaddleX provides three other deployment methods, detailed as follows:
+
+* High-performance deployment: In real production environments, many applications have strict performance criteria for deployment strategies (especially response speed) to ensure efficient system operation and a smooth user experience. To this end, PaddleX provides a high-performance inference plugin that deeply optimizes model inference and pre/post-processing for significant end-to-end speedups. For the detailed process, refer to the [PaddleX High-Performance Deployment Guide](../pipeline_deploy/high_performance_deploy.md).
+* Service-oriented deployment: Service-oriented deployment is a common deployment form in real production environments. By encapsulating inference as a service, clients can access it via network requests to obtain inference results. PaddleX enables users to achieve service-oriented pipeline deployment at low cost; for the detailed process, refer to the [PaddleX Service-Oriented Deployment Guide](../pipeline_deploy/service_deploy.md).
+* Edge deployment: Edge deployment places compute and data processing on the user's device itself, so the device processes data directly without relying on remote servers. PaddleX supports deploying models on edge devices such as Android; for the detailed process, refer to the [PaddleX Edge Deployment Guide](../pipeline_deploy/lite_deploy.md).
+
+You can choose the appropriate method to deploy your model pipeline based on your needs, and then proceed with subsequent AI application integration.
+
+
+
+
+
+
+
+
+
+