
refine docs (#2165) (#2169)

* refine docs

* Update OCR.md
Liu Jiaxuan, 1 year ago
Commit 843f62b565

+ 89 - 21
docs/pipeline_usage/tutorials/information_extration_pipelines/document_scene_information_extraction.md

@@ -48,32 +48,99 @@
 
 **Layout Detection Module Models:**
 
-|Model Name|mAP (%)|GPU Inference Time (ms)|CPU Inference Time|Model Size (M)|
-|-|-|-|-|
-|PicoDet_layout_1x|86.8|13.036|91.2634|7.4 M|
-|PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|
-|RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1 M|
-|RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2 M|
+|Model|mAP(0.5) (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|Description|
+|-|-|-|-|-|-|
+|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|A high-efficiency layout detection model based on PicoDet-L, covering 3 categories: table, image, and seal.|
+|PicoDet_layout_1x|86.8|13.0|91.3|7.4|A high-efficiency layout detection model based on PicoDet-1x, covering text, title, table, image, and list.|
+|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|A high-precision layout detection model based on RT-DETR-H, covering 17 common layout categories.|
+|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|A high-precision layout detection model based on RT-DETR-H, covering 3 categories: table, image, and seal.|
 
-**Note: The evaluation set for the above accuracy metrics is PaddleX's self-built layout analysis dataset, containing 10,000 images. All GPU inference times above are based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+**Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built layout analysis dataset, containing 10,000 images. GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
 **Text Detection Module Models:**
 
-|Model Name|Detection Hmean (%)|GPU Inference Time (ms)|CPU Inference Time|Model Size (M)|
-|-|-|-|-|
-|PP-OCRv4_mobile_det|77.79|10.6923|120.177|4.2 M|
-|PP-OCRv4_server_det|82.69|83.3501|2434.01|100.1 M|
+|Model|Detection Hmean (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|Description|
+|-|-|-|-|-|-|
+|PP-OCRv4_server_det|82.69|83.3501|2434.01|109|PP-OCRv4's server-side text detection model, with higher accuracy, suitable for deployment on high-performance servers.|
+|PP-OCRv4_mobile_det|77.79|10.6923|120.177|4.7|PP-OCRv4's mobile text detection model, with higher efficiency, suitable for deployment on edge devices.|
 
 **Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering street scenes, web images, documents, and handwriting, with 500 images for detection. All GPU inference times above are based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
 **Text Recognition Module Models:**
 
-|Model Name|Recognition Avg Accuracy (%)|GPU Inference Time (ms)|CPU Inference Time|Model Size (M)|
-|-|-|-|-|
-|PP-OCRv4_mobile_rec|78.20|7.95018|46.7868|10.6 M|
-|PP-OCRv4_server_rec|79.20|7.19439|140.179|71.2 M|
+<table>
+    <tr>
+        <th>Model</th>
+        <th>Recognition Avg Accuracy (%)</th>
+        <th>GPU Inference Time (ms)</th>
+        <th>CPU Inference Time (ms)</th>
+        <th>Model Size (M)</th>
+        <th>Description</th>
+    </tr>
+    <tr>
+        <td>PP-OCRv4_mobile_rec</td>
+        <td>78.20</td>
+        <td>7.95018</td>
+        <td>46.7868</td>
+        <td>10.6 M</td>
+        <td rowspan="2">PP-OCRv4 is the next version of PP-OCRv3, a text recognition model developed in-house by Baidu's PaddlePaddle vision team. By introducing data augmentation schemes and a GTC-NRTR guidance branch, it further improves text recognition accuracy while keeping inference speed unchanged. The model is offered in server and mobile versions to meet industrial needs in different scenarios.</td>
+    </tr>
+    <tr>
+        <td>PP-OCRv4_server_rec</td>
+        <td>79.20</td>
+        <td>7.19439</td>
+        <td>140.179</td>
+        <td>71.2 M</td>
+    </tr>
+</table>
+
+**Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering street scenes, web images, documents, and handwriting, with 11,000 images for text recognition. All GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+
+
+<table>
+    <tr>
+        <th>Model</th>
+        <th>Recognition Avg Accuracy (%)</th>
+        <th>GPU Inference Time (ms)</th>
+        <th>CPU Inference Time (ms)</th>
+        <th>Model Size (M)</th>
+        <th>Description</th>
+    </tr>
+    <tr>
+        <td>ch_SVTRv2_rec</td>
+        <td>68.81</td>
+        <td>8.36801</td>
+        <td>165.706</td>
+        <td>73.9 M</td>
+        <td>SVTRv2 is a server-side text recognition model developed by the OpenOCR team at the Vision and Learning Lab (FVL) of Fudan University. It won first prize in Task 1 (OCR end-to-end recognition) of the PaddleOCR Algorithm Model Challenge, improving end-to-end recognition accuracy on the A leaderboard by 6% over PP-OCRv4.</td>
+    </tr>
+</table>
+
+
+**Note: The evaluation set for the above accuracy metrics is the A leaderboard of the [PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition](https://aistudio.baidu.com/competition/detail/1131/0/introduction). GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+
+<table>
+    <tr>
+        <th>Model</th>
+        <th>Recognition Avg Accuracy (%)</th>
+        <th>GPU Inference Time (ms)</th>
+        <th>CPU Inference Time (ms)</th>
+        <th>Model Size (M)</th>
+        <th>Description</th>
+    </tr>
+    <tr>
+        <td>ch_RepSVTR_rec</td>
+        <td>65.07</td>
+        <td>10.5047</td>
+        <td>51.5647</td>
+        <td>22.1 M</td>
+        <td>RepSVTR is a mobile text recognition model based on SVTRv2. It won first prize in Task 1 (OCR end-to-end recognition) of the PaddleOCR Algorithm Model Challenge, improving end-to-end recognition accuracy on the B leaderboard by 2.5% over PP-OCRv4, with comparable inference speed.</td>
+    </tr>
+</table>
 
-**Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering street scenes, web images, documents, and handwriting, with 11,000 images for text recognition. All GPU inference times above are based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+**Note: The evaluation set for the above accuracy metrics is the B leaderboard of the [PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition](https://aistudio.baidu.com/competition/detail/1131/0/introduction). GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
 **Seal Text Detection Module Models:**
 
@@ -115,7 +182,7 @@ All pre-trained model pipelines provided by PaddleX can be tried out quickly; you can
 If you are satisfied with the pipeline's results, you can proceed directly to integration and deployment; if not, you can also use your private data to **fine-tune the models in the pipeline online**.
 
 ### 2.2 Local Experience
-Before using the document scene information extraction v3 pipeline locally, make sure you have completed the installation of the PaddleX wheel package according to the [PaddleX Local Installation Tutorial]../../../installation/installation.md).
+Before using the document scene information extraction v3 pipeline locally, make sure you have completed the installation of the PaddleX wheel package according to the [PaddleX Local Installation Tutorial](../../../installation/installation.md).
 
 Quick inference with the pipeline takes only a few lines of code. Using this [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf), take the general document scene information extraction v3 pipeline as an example (a complete sketch follows the note below):
 
@@ -135,7 +202,7 @@ for res in visual_result:
 
 print(predict.chat("乙方,手机号"))
 ```
-**Note**: Please first obtain your ak and sk from the [Baidu Cloud Qianfan Platform](https://qianfan.cloud.baidu.com/) and fill them in at the designated places; only then can the large model be called.
+**Note**: Please first obtain your ak and sk from the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) (for the detailed procedure, see [AK/SK authentication for API calls](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Hlwerugt8)), then fill them in at the designated places; only then can the large model be called.
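
Since the hunks above show only the tail of the example, here is a minimal end-to-end sketch of the quick-inference flow. The `llm_name`/`llm_params` arguments and the `visual_predict` method name are assumptions pieced together from the surrounding text, not verbatim file content:

```python
from paddlex import create_pipeline

# Sketch: pipeline creation with Qianfan credentials. The llm_name/llm_params
# keys are assumptions based on the note above; fill in your own ak/sk.
predict = create_pipeline(
    pipeline="PP-ChatOCRv3-doc",
    llm_name="ernie-3.5",
    llm_params={"api_type": "qianfan", "ak": "YOUR_AK", "sk": "YOUR_SK"},
)

# Assumed method name for the visual-analysis step; the diff only shows that
# it yields `visual_result`.
visual_result = predict.visual_predict(
    "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf"
)

for res in visual_result:
    res.save_to_img("./output")

# Ask the large model to extract the fields "乙方" (Party B) and "手机号" (phone number).
print(predict.chat("乙方,手机号"))
```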
 
 After running, the output is as follows:
 
@@ -175,9 +242,9 @@ print(predict.chat("乙方,手机号"))
 (5) Process the prediction results: the prediction result for each sample is of dict type; it supports printing or saving to a file, and the supported file types depend on the specific pipeline, for example (a usage sketch follows the table):
 |Method|Description|Method Parameters|
 |-|-|-|
-|save_to_img|Saves layout analysis, table recognition, and similar results as an image file|save_path: str, the path to save the file;|
-|save_to_html|Saves table recognition results as an HTML file|save_path: str, the path to save the file;|
-|save_to_xlsx|Saves table recognition results as a spreadsheet file|save_path: str, the path to save the file;|
+|save_to_img|Saves layout analysis, table recognition, and similar results as an image file|`save_path`: str, the path to save the file;|
+|save_to_html|Saves table recognition results as an HTML file|`save_path`: str, the path to save the file;|
+|save_to_xlsx|Saves table recognition results as a spreadsheet file|`save_path`: str, the path to save the file;|
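
A minimal usage sketch of the three save methods, assuming `visual_result` was produced as in the example above (the `./output` path is illustrative):

```python
for res in visual_result:
    res.save_to_img("./output")    # layout/table visualizations as images
    res.save_to_html("./output")   # recognized tables as HTML
    res.save_to_xlsx("./output")   # recognized tables as spreadsheets
```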
 
 When the above Python script is executed, the default configuration file of the document scene information extraction v3 pipeline is loaded. If you need a custom configuration file, you can run the following command to obtain one:
 
@@ -581,6 +648,7 @@ if __name__ == "__main__":
     print("Final result:")
     print(len(result_chat["chatResult"]))
 ```
+**Note**: Please fill in your ak and sk at `API_KEY` and `SECRET_KEY`.
 </details>
 </details>
 <br/>

+ 126 - 46
docs/pipeline_usage/tutorials/information_extration_pipelines/document_scene_information_extraction_en.md

@@ -14,14 +14,14 @@ The **PP-ChatOCRv3-doc** pipeline includes modules for **Table Structure Recogni
 <details>
    <summary> 👉Model List Details</summary>
 
-**Table Structure Recognition Module Models:**
+**Table Structure Recognition Module Models**:
 
 <table>
   <tr>
     <th>Model</th>
     <th>Accuracy (%)</th>
     <th>GPU Inference Time (ms)</th>
-    <th>CPU Inference Time</th>
+    <th>CPU Inference Time (ms)</th>
     <th>Model Size (M)</th>
     <th>Description</th>
   </tr>
@@ -31,7 +31,7 @@ The **PP-ChatOCRv3-doc** pipeline includes modules for **Table Structure Recogni
     <td>522.536</td>
     <td>1845.37</td>
     <td>6.9 M</td>
-    <td>SLANet is a table structure recognition model independently developed by Baidu's PaddlePaddle vision team. The model significantly enhances the accuracy and inference speed for table structure recognition by utilizing the CPU-friendly lightweight backbone network PP-LCNet, the CSP-PAN high-low layer feature fusion module, and the SLA Head feature decoding module that aligns structural and positional information.</td>
+    <td>SLANet is a table structure recognition model developed by Baidu PaddlePaddle Vision Team. The model significantly improves the accuracy and inference speed of table structure recognition by adopting a CPU-friendly lightweight backbone network PP-LCNet, a high-low-level feature fusion module CSP-PAN, and a feature decoding module SLA Head that aligns structural and positional information.</td>
   </tr>
   <tr>
     <td>SLANet_plus</td>
@@ -39,65 +39,132 @@ The **PP-ChatOCRv3-doc** pipeline includes modules for **Table Structure Recogni
     <td>522.536</td>
     <td>1845.37</td>
     <td>6.9 M</td>
-    <td>SLANet_plus is an enhanced version of the SLANet model independently developed by Baidu's PaddlePaddle vision team. Compared to SLANet, SLANet_plus significantly improves the recognition capability for unbounded and complex tables, and reduces the model's sensitivity to table localization accuracy. Even if the table localization is offset, it can still accurately recognize the table structure.</td>
+    <td>SLANet_plus is an enhanced version of SLANet, the table structure recognition model developed by Baidu PaddlePaddle Vision Team. Compared to SLANet, SLANet_plus significantly improves the recognition ability for wireless and complex tables and reduces the model's sensitivity to the accuracy of table positioning, enabling more accurate recognition even with offset table positioning.</td>
   </tr>
 </table>
 
-**Note: The above accuracy metrics are measured on PaddleX's internal English table recognition dataset. All models' GPU inference times are based on NVIDIA Tesla T4 machines, with accuracy type FP32. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and accuracy type FP32.**
+**Note: The above accuracy metrics are measured on PaddleX's internally built English table recognition dataset. All GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
-**Layout Area Detection Module Models:**
+**Layout Detection Module Models**:
 
-|Model Name|mAP (%)|GPU Inference Time (ms)|CPU Inference Time|Model Size (M)|
-|-|-|-|-|-|
-|PicoDet_layout_1x|86.8|13.036|91.2634|7.4 M|
-|PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|
-|RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1 M|
-|RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2 M|
+|Model|mAP(0.5) (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|Description|
+|-|-|-|-|-|-|
+|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|A high-efficiency layout detection model based on PicoDet-L, including 3 categories: table, image, and seal.|
+|PicoDet_layout_1x|86.8|13.0|91.3|7.4|A high-efficiency layout detection model based on PicoDet-1x, including text, title, table, image, and list.|
+|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|A high-precision layout detection model based on RT-DETR-H, including 17 common layout categories.|
+|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|A high-precision layout detection model based on RT-DETR-H, including 3 categories: table, image, and seal.|
 
-**Note: The above accuracy metrics are evaluated on PaddleX's internal layout area analysis dataset, which includes 10,000 images. All models' GPU inference times are based on NVIDIA Tesla T4 machines, with accuracy type FP32. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and accuracy type FP32.**
+**Note: The above accuracy metrics are evaluated on PaddleOCR's self-built layout analysis dataset, containing 10,000 images. GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
-**Text Detection Module Models:**
+**Text Detection Module Models**:
 
-|Model Name|Detection Hmean (%)|GPU Inference Time (ms)|CPU Inference Time|Model Size (M)|
-|-|-|-|-|-|
-|PP-OCRv4_mobile_det|77.79|10.6923|120.177|4.2 M|
-|PP-OCRv4_server_det|82.69|83.3501|2434.01|100.1 M|
+| Model | Detection Hmean (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|-------|---------------------|-------------------------|-------------------------|--------------|-------------|
+| PP-OCRv4_server_det | 82.69 | 83.3501 | 2434.01 | 109 | PP-OCRv4's server-side text detection model, featuring higher accuracy, suitable for deployment on high-performance servers |
+| PP-OCRv4_mobile_det | 77.79 | 10.6923 | 120.177 | 4.7 | PP-OCRv4's mobile text detection model, optimized for efficiency, suitable for deployment on edge devices |
 
-**Note: The above accuracy metrics are evaluated on PaddleOCR's internal Chinese dataset, covering multiple scenarios such as street views, web images, documents, and handwriting, with 500 images for detection. All models' GPU inference times are based on NVIDIA Tesla T4 machines, with accuracy type FP32. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and accuracy type FP32.**
+**Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering street scenes, web images, documents, and handwritten texts, with 500 images for detection. All GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
-**Text Recognition Module Models:**
+**Text Recognition Module Models**:
 
-|Model Name|Avg Accuracy (%)|GPU Inference Time (ms)|CPU Inference Time|Model Size (M)|
-|-|-|-|-|-|
-|PP-OCRv4_mobile_rec|78.20|7.95018|46.7868|10.6 M|
-|PP-OCRv4_server_rec|79.20|7.19439|140.179|71.2 M|
+<table>
+    <tr>
+        <th>Model</th>
+        <th>Recognition Avg Accuracy (%)</th>
+        <th>GPU Inference Time (ms)</th>
+        <th>CPU Inference Time (ms)</th>
+        <th>Model Size (M)</th>
+        <th>Description</th>
+    </tr>
+    <tr>
+        <td>PP-OCRv4_mobile_rec</td>
+        <td>78.20</td>
+        <td>7.95018</td>
+        <td>46.7868</td>
+        <td>10.6 M</td>
+        <td rowspan="2">PP-OCRv4 is the next version of Baidu PaddlePaddle's self-developed text recognition model PP-OCRv3. By introducing data augmentation schemes and GTC-NRTR guidance branches, it further improves text recognition accuracy without compromising inference speed. The model offers both server (server) and mobile (mobile) versions to meet industrial needs in different scenarios.</td>
+    </tr>
+    <tr>
+        <td>PP-OCRv4_server_rec</td>
+        <td>79.20</td>
+        <td>7.19439</td>
+        <td>140.179</td>
+        <td>71.2 M</td>
+    </tr>
+</table>
 
-**Note: The above accuracy metrics are evaluated on PaddleOCR's internal Chinese dataset, covering various scenarios such as street views, web images, documents, and handwriting, with 11,000 images for text recognition. All models' GPU inference times are based on NVIDIA Tesla T4 machines, with accuracy type FP32. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and accuracy type FP32.**
+**Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering street scenes, web images, documents, and handwritten texts, with 11,000 images for text recognition. All GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+
+<table>
+    <tr>
+        <th>Model</th>
+        <th>Recognition Avg Accuracy (%)</th>
+        <th>GPU Inference Time (ms)</th>
+        <th>CPU Inference Time (ms)</th>
+        <th>Model Size (M)</th>
+        <th>Description</th>
+    </tr>
+    <tr>
+        <td>ch_SVTRv2_rec</td>
+        <td>68.81</td>
+        <td>8.36801</td>
+        <td>165.706</td>
+        <td>73.9 M</td>
+        <td>SVTRv2 is a server-side text recognition model developed by the OpenOCR team at the Vision and Learning Lab (FVL) of Fudan University. It won the first prize in the OCR End-to-End Recognition Task of the PaddleOCR Algorithm Model Challenge, with a 6% improvement in end-to-end recognition accuracy compared to PP-OCRv4 on the A-list.</td>
+    </tr>
+</table>
 
-**Seal Text Detection Module Models:**
+**Note: The evaluation set for the above accuracy metrics is the [PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task](https://aistudio.baidu.com/competition/detail/1131/0/introduction) A-list. GPU inference time is based on NVIDIA Tesla T4 with FP32 precision. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+
+<table>
+    <tr>
+        <th>Model</th>
+        <th>Recognition Avg Accuracy (%)</th>
+        <th>GPU Inference Time (ms)</th>
+        <th>CPU Inference Time (ms)</th>
+        <th>Model Size (M)</th>
+        <th>Description</th>
+    </tr>
+    <tr>
+        <td>ch_RepSVTR_rec</td>
+        <td>65.07</td>
+        <td>10.5047</td>
+        <td>51.5647</td>
+        <td>22.1 M</td>
+        <td>The RepSVTR text recognition model is a mobile-oriented text recognition model based on SVTRv2. It won the first prize in the OCR End-to-End Recognition Task of the PaddleOCR Algorithm Model Challenge, with a 2.5% improvement in end-to-end recognition accuracy compared to PP-OCRv4 on the B-list, while maintaining similar inference speed.</td>
+    </tr>
+</table>
 
-|Model|Detection Hmean (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|Description|
-|-|-|-|-|-|-|
-|PP-OCRv4_server_seal_det|98.21|84.341|2425.06|109|The server-side seal text detection model of PP-OCRv4, with higher accuracy, suitable for deployment on high-performance servers.|
-|PP-OCRv4_mobile_seal_det|96.47|10.5878|131.813|4.6|The mobile-side seal text detection model of PP-OCRv4, with higher efficiency, suitable for edge deployment.|
+**Note: The evaluation set for the above accuracy metrics is the [PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task](https://aistudio.baidu.com/competition/detail/1131/0/introduction) B-list. GPU inference time is based on NVIDIA Tesla T4 with FP32 precision. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
-**Note: The above accuracy metrics are evaluated on an internal dataset containing 500 circular seal images. GPU inference times are based on NVIDIA Tesla T4 machines, with accuracy type FP32. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and accuracy type FP32.**
+**Seal Text Detection Module Models**:
 
-**Text Image Correction Module Models:**
+| Model | Detection Hmean (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|-------|---------------------|-------------------------|-------------------------|--------------|-------------|
+| PP-OCRv4_server_seal_det | 98.21 | 84.341 | 2425.06 | 109 | PP-OCRv4's server-side seal text detection model, featuring higher accuracy, suitable for deployment on better-equipped servers |
+| PP-OCRv4_mobile_seal_det | 96.47 | 10.5878 | 131.813 | 4.6 | PP-OCRv4's mobile seal text detection model, offering higher efficiency, suitable for deployment on edge devices |
 
-|Model|MS-SSIM (%)|Model Size (M)|Description|
-|-|-|-|-|
-|UVDoc|54.40|30.3 M|High-precision text image correction model|
+**Note: The above accuracy metrics are evaluated on a self-built dataset containing 500 circular seal images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
-**The accuracy metrics for the model are measured using the [DocUNet benchmark](https://www3.cs.stonybrook.edu/~cvl/docunet.html).**
+**Text Image Rectification Module Models**:
 
-**Document Image Orientation Classification Module Models:**
+| Model | MS-SSIM (%) | Model Size (M) | Description |
+|-------|-------------|--------------|-------------|
+| UVDoc | 54.40 | 30.3 M | High-precision text image rectification model |
 
-|Model|Top-1 Acc (%)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size (M)|Description|
-|-|-|-|-|-|-|
-|PP-LCNet_x1_0_doc_ori|99.06|3.84845|9.23735|7|Document image classification model based on PP-LCNet_x1_0, containing four categories: 0 degrees, 90 degrees, 180 degrees, and 270 degrees.|
+**The accuracy metrics of the models are measured from the [DocUNet benchmark](https://www3.cs.stonybrook.edu/~cvl/docunet.html).**
+
+**Document Image Orientation Classification Module Models**:
 
-**Note: The above accuracy metrics are evaluated on an internal dataset covering various scenarios such as certificates and documents, with 1,000 images. GPU inference times are based on NVIDIA Tesla T4 machines, with accuracy type FP32. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and accuracy type FP32.**
+| Model | Top-1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|-------|---------------|-------------------------|-------------------------|--------------|-------------|
+| PP-LCNet_x1_0_doc_ori | 99.06 | 3.84845 | 9.23735 | 7 | A document image classification model based on PP-LCNet_x1_0, with four categories: 0°, 90°, 180°, 270° |
+
+**Note: The above accuracy metrics are evaluated on a self-built dataset covering various scenarios such as certificates and documents, containing 1000 images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
 </details>
 
@@ -130,14 +197,14 @@ for res in visual_result:
     res.save_to_html('./output')
     res.save_to_xlsx('./output')
 
-print(predict.chat("Party B, Phone Number"))
+print(predict.chat("乙方,手机号"))
 ```
-**Note**: Please first obtain your ak and sk from the [Baidu Qianfan Platform](https://qianfan.cloud.baidu.com/) and fill them in the designated places to properly call the large model.
+**Note**: Please first obtain your ak and sk on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) (for detailed steps, please refer to the [AK and SK Authentication API Call Process](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Hlwerugt8)), and fill in your ak and sk to the specified locations to enable normal calls to the large model.
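
A minimal sketch of where the ak and sk go, assuming the `llm_params` interface used elsewhere in this tutorial (argument names are assumptions, not verbatim file content):

```python
from paddlex import create_pipeline

# Replace YOUR_AK/YOUR_SK with the credentials obtained from the Qianfan platform.
predict = create_pipeline(
    pipeline="PP-ChatOCRv3-doc",
    llm_name="ernie-3.5",
    llm_params={"api_type": "qianfan", "ak": "YOUR_AK", "sk": "YOUR_SK"},
)
```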
 
 After running, the output is as follows:
 
 ```
-{'chat_res': {'Party B': 'Shareholding Test Co., Ltd.', 'Phone Number': '19331729920'}, 'prompt': ''}
+{'chat_res': {'乙方': '股份测试有限公司', '手机号': '19331729920'}, 'prompt': ''}
 ```
 
 In the above Python script, the following steps are executed:
@@ -161,6 +228,18 @@ In the above Python script, the following steps are executed:
 | str | Supports passing a local directory, which should contain files to be predicted, such as the local path: /root/data/; |
 | dict | Supports passing a dictionary type, where the key needs to correspond to the specific pipeline, such as "img"; the value supports the data types listed above. |
 
+(3) Obtain prediction results by calling the `predict` method: The `predict` method is a `generator`, so prediction results need to be obtained through calls. The `predict` method predicts data in batches, so the prediction results are represented as a list of prediction results.
+
+(4) Interact with the large model by calling the `predict.chat` method, which takes as input keywords (multiple keywords are supported) for information extraction. The prediction results are represented as a list of information extraction results.
+
+(5) Process the prediction results: The prediction result for each sample is in the form of a dict, which supports printing or saving to a file. The supported file types depend on the specific pipeline, such as the following (a usage sketch follows the table):
+
+| Method | Description | Method Parameters |
+|-|-|-|
+| save_to_img | Saves layout analysis, table recognition, etc. results as image files. | `save_path`: str, the file path to save. |
+| save_to_html | Saves table recognition results as HTML files. | `save_path`: str, the file path to save. |
+| save_to_xlsx | Saves table recognition results as Excel files. | `save_path`: str, the file path to save. |
+
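
A minimal usage sketch of the three save methods, assuming `visual_result` comes from the pipeline example above (paths are illustrative):

```python
for res in visual_result:
    res.save_to_img("./output")    # layout/table visualizations as images
    res.save_to_html("./output")   # recognized tables as HTML
    res.save_to_xlsx("./output")   # recognized tables as Excel workbooks
```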
 When executing the above command, the default Pipeline configuration file is loaded. If you need to customize the configuration file, you can use the following command to obtain it:
 
 ```bash
@@ -210,7 +289,7 @@ for res in visual_result:
     res.save_to_html('./output')
     res.save_to_xlsx('./output')
 
-print(predict.chat("Party B, phone number"))
+print(predict.chat("乙方,手机号"))
 ```
 
 ## 3. Development Integration/Deployment
@@ -469,6 +548,7 @@ if __name__ == "__main__":
     print("Final result:")
     print(len(result_chat["chatResult"]))
 ```
+**Note**: Please fill in your ak and sk at `API_KEY` and `SECRET_KEY`.
 </details>
 </details>
 <br/>
@@ -512,7 +592,7 @@ Subsequently, load the modified pipeline configuration file using the command-li
 ## 5. Multi-hardware Support
 PaddleX supports various mainstream hardware devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU. **Seamless switching between different hardware can be achieved by simply setting the `--device` parameter**.
 
-For example, to perform inference using the PP-ChatOCRv3-doc Pipeline on an NVIDIA GPU```
+For example, you can perform inference using the PP-ChatOCRv3-doc pipeline on an NVIDIA GPU.
 At this point, if you wish to switch the hardware to Ascend NPU, simply modify the `--device` in the script to `npu`:
 
 ```python
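# (Editor's sketch, not verbatim file content: the rest of this block is elided
# in the diff view.) Hardware is switched via the device argument when creating
# the pipeline; the argument name is an assumption.
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="PP-ChatOCRv3-doc", device="npu:0")  # "gpu:0" for NVIDIA GPUs
```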

+ 5 - 9
docs/pipeline_usage/tutorials/ocr_pipelines/OCR.md

@@ -138,13 +138,10 @@ from paddlex import create_pipeline
 
 pipeline = create_pipeline(pipeline="ocr")
 
-output = pipeline.predict("pre_image.jpg")
-for batch in output:
-    for item in batch:
-        res = item['result']
-        res.print()
-        res.save_to_img("./output/")
-        res.save_to_json("./output/")
+output = pipeline.predict("general_ocr_002.png")
+for res in output:
+    res.print()
+    res.save_to_img("./output/")
 ```
 > ❗ The results obtained by running the Python script are the same as those produced by the command-line method.
 
@@ -188,9 +185,8 @@ from paddlex import create_pipeline
 pipeline = create_pipeline(pipeline="./my_path/ocr.yaml")
 output = pipeline.predict("general_ocr_002.png")
 for res in output:
-    res.print(json_format=False)
+    res.print()
     res.save_to_img("./output/")
-    res.save_to_json("./output/res.json")
 ```
 ## 3. Development Integration/Deployment
 If the general OCR pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.

+ 6 - 10
docs/pipeline_usage/tutorials/ocr_pipelines/OCR_en.md

@@ -135,13 +135,10 @@ from paddlex import create_pipeline
 
 pipeline = create_pipeline(pipeline="ocr")
 
-output = pipeline.predict("pre_image.jpg")
-for batch in output:
-    for item in batch:
-        res = item['result']
-        res.print()
-        res.save_to_img("./output/")
-        res.save_to_json("./output/")
+output = pipeline.predict("general_ocr_002.png")
+for res in output:
+    res.print()
+    res.save_to_img("./output/")
 ```
 > ❗ The results obtained from running the Python script are the same as those from the command line.
 
@@ -185,9 +182,8 @@ from paddlex import create_pipeline
 pipeline = create_pipeline(pipeline="./my_path/ocr.yaml")
 output = pipeline.predict("general_ocr_002.png")
 for res in output:
-    res.print(json_format=False)
-    res.save_to_img("./output/")
-    res.save_to_json("./output/res.json")
+    res.print()
+    res.save_to_img("./output/")
 ```
 
 ## 3. Development Integration/Deployment