update doc

zhouchangda 9 months ago
parent
commit
33d776b52b

+ 614 - 644
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.en.md

@@ -328,846 +328,816 @@ The RepSVTR text recognition model is a mobile-oriented text recognition model b
 </details>
 
 ## 2. Quick Start
-PaddleX's pre-trained model pipelines can be quickly experienced. You can experience the effect of the Document Scene Information Extraction v3 pipeline online or locally using Python.
+The pre-trained model pipelines provided by PaddleX can be experienced quickly. You can try the Document Scene Information Extraction v3 pipeline online, or experience it locally using Python.
 
 ### 2.1 Online Experience
-You can [experience online](https://aistudio.baidu.com/community/app/182491/webUI) the effect of the Document Scene Information Extraction v3 pipeline, using the official demo images for recognition, for example:
+You can [experience online](https://aistudio.baidu.com/community/app/182491/webUI) the effect of the Document Scene Information Extraction v3 pipeline, using the official demo images. For example:
 
 <img src="https://github.com/user-attachments/assets/aa261b2b-b79c-4487-9323-dfcc43c3d581"/>
 
-If you are satisfied with the pipeline's performance, you can directly integrate and deploy it. If not, you can also use your private data to <b>fine-tune the models in the pipeline online</b>.
+If you are satisfied with the pipeline's performance, you can directly integrate and deploy it. If not, you can also use private data to **fine-tune the models in the pipeline online**.
 
 ### 2.2 Local Experience
-Before using the PP-ChatOCRv3-doc pipeline locally, please ensure you have installed the PaddleX wheel package following the [PaddleX Local Installation Guide](../../../installation/installation.en.md).
+Before using the Document Scene Information Extraction v3 pipeline locally, ensure that you have completed the installation of the PaddleX wheel package according to the [PaddleX Local Installation Guide](../../../installation/installation.en.md).
 
-A few lines of code are all you need to complete the quick inference of the pipeline. Using the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf), taking the PP-ChatOCRv3-doc pipeline as an example:
+Before performing model inference, you need to prepare the API key for the large language model. PP-ChatOCRv3 supports calling the large model inference service provided by the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService). You can refer to [Authentication and Authorization](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Um2wxbaps) to obtain the API key from the Qianfan Platform.
+
+After obtaining the API key, you can use a few lines of Python code to complete quick inference. You can use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png) for testing:
 
 ```python
 from paddlex import create_pipeline
 
-pipeline = create_pipeline(
-    pipeline="PP-ChatOCRv3-doc",
-    llm_name="ernie-3.5",
-    llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # Please enter your ak and sk; otherwise, the large model cannot be invoked.
-    # llm_params={"api_type": "aistudio", "access_token": ""} # Please enter your access_token; otherwise, the large model cannot be invoked.
-    )
-
-visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf")
-
-for res in visual_result:
-    res.save_to_img("./output")
-    res.save_to_html('./output')
-    res.save_to_xlsx('./output')
-
-vector = pipeline.build_vector(visual_info=visual_info)
-
+pipeline = create_pipeline(pipeline="PP-ChatOCRv3-doc", initial_predictor=False)
+
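+# The input accepts a local file path (e.g., the test file above after downloading) or a URL.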
+visual_predict_res = pipeline.visual_predict(input="vehicle_certificate-1.png",
+    use_doc_orientation_classify=False,
+    use_doc_unwarping=False,
+    use_general_ocr=True,
+    use_seal_recognition=True,
+    use_table_recognition=True)
+
+visual_info_list = []
+for res in visual_predict_res:
+    visual_info_list.append(res["visual_info"])
+    layout_parsing_result = res["layout_parsing_result"]
+
+vector_info = pipeline.build_vector(visual_info_list, flag_save_bytes_vector=True, retriever_config={
+    "module_name": "retriever",
+    "model_name": "embedding-v1",
+    "base_url": "https://qianfan.baidubce.com/v2",
+    "api_type": "qianfan",
+    "api_key": "api_key" # your api_key
+})
 chat_result = pipeline.chat(
-    key_list=["乙方", "手机号"],
-    visual_info=visual_info,
-    vector=vector,
-    )
-chat_result.print()
+    key_list=["驾驶室准乘人数"],
+    visual_info_list=visual_info_list,
+    vector_info=vector_info,
+    chat_bot_config={
+      "module_name": "chat_bot",
+      "model_name": "ernie-3.5-8k",
+      "base_url": "https://qianfan.baidubce.com/v2",
+      "api_type": "openai",
+      "api_key": "api_key" # your api_key
+    },
+    retriever_config={
+        "module_name": "retriever",
+        "model_name": "embedding-v1",
+        "base_url": "https://qianfan.baidubce.com/v2",
+        "api_type": "qianfan",
+        "api_key": "api_key" # your api_key
+    }
+)
+print(chat_result)
+
 ```
-<b>Note</b>: Currently, the large language model only supports Ernie. You can obtain the relevant ak/sk (access_token) on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) or [Baidu AIStudio Community](https://aistudio.baidu.com/). If you use the Baidu Cloud Qianfan Platform, you can refer to the [AK and SK Authentication API Calling Process](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Hlwerugt8) to obtain ak/sk. If you use Baidu AIStudio Community, you can obtain the access_token from the [Baidu AIStudio Community Access Token](https://aistudio.baidu.com/account/accessToken).
 
-After running, the output is as follows:
+After running, the output will be as follows:
 
 ```
-{'chat_res': {'乙方': '股份测试有限公司', '手机号': '19331729920'}, 'prompt': ''}
+{'chat_res': {'驾驶室准乘人数': '2'}}
 ```
 
-In the above Python script, the following steps are executed:
+The prediction process, API descriptions, and output descriptions of PP-ChatOCRv3-doc are as follows:
+
+<details><summary>(1) Call the <code>create_pipeline</code> method to instantiate the PP-ChatOCRv3 pipeline object.</summary>
 
-(1) Call the `create_pipeline` to instantiate a PP-ChatOCRv3-doc pipeline object, related parameters descriptions are as follows:
+The relevant parameter descriptions are as follows:
 
 <table>
 <thead>
 <tr>
 <th>Parameter</th>
-<th>Type</th>
-<th>Default</th>
-<th>Description</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Default Value</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td><code>pipeline</code></td>
-<td>str</td>
-<td>None</td>
-<td>Pipeline name or pipeline configuration file path. If it's a pipeline name, it must be supported by PaddleX;</td>
-</tr>
-<tr>
-<td><code>llm_name</code></td>
-<td>str</td>
-<td>"ernie-3.5"</td>
-<td>Large Language Model name, we support <code>ernie-4.0</code> and <code>ernie-3.5</code>, with more models on the way.</td>
-</tr>
-<tr>
-<td><code>llm_params</code></td>
-<td>dict</td>
-<td><code>{}</code></td>
-<td>API configuration;</td>
-</tr>
-<tr>
-<td><code>device(kwargs)</code></td>
-<td>str/<code>None</code></td>
+<td>The name of the pipeline or the path to the pipeline configuration file. If it is the name of the pipeline, it must be a pipeline supported by PaddleX.</td>
+<td><code>str</code></td>
 <td><code>None</code></td>
-<td>Running device, support <code>cpu</code>, <code>gpu</code>, <code>gpu:0</code>, etc. <code>None</code> meaning automatic selection;</td>
-</tr>
-</tbody>
-</table>
-(2) Call the `visual_predict` of the PP-ChatOCRv3-doc pipeline object to visual predict, related parameters descriptions are as follows:
-
-<table>
-<thead>
-<tr>
-<th>Parameter</th>
-<th>Type</th>
-<th>Default</th>
-<th>Description</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td><code>input</code></td>
-<td>Python Var</td>
-<td>-</td>
-<td>Support to pass Python variables directly, such as <code>numpy.ndarray</code> representing image data;</td>
 </tr>
 <tr>
-<td><code>input</code></td>
-<td>str</td>
-<td>-</td>
-<td>Support to pass the path of the file to be predicted, such as the local path of an image file: <code>/root/data/img.jpg</code>;</td>
-</tr>
-<tr>
-<td><code>input</code></td>
-<td>str</td>
-<td>-</td>
-<td>Support to pass the URL of the file to be predicted, such as: <code>https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf</code>;</td>
-</tr>
-<tr>
-<td><code>input</code></td>
-<td>str</td>
-<td>-</td>
-<td>Support to pass the local directory, which should contain files to be predicted, such as: <code>/root/data/</code>;</td>
-</tr>
-<tr>
-<td><code>input</code></td>
-<td>dict</td>
-<td>-</td>
-<td>Support to pass a dictionary, where the key needs to correspond to the specific pipeline, such as: <code>{"img": "/root/data1"}</code>;</td>
-</tr>
-<tr>
-<td><code>input</code></td>
-<td>list</td>
-<td>-</td>
-<td>Support to pass a list, where the elements must be of the above types of data, such as: <code>[numpy.ndarray, numpy.ndarray]</code>,<code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>,<code>["/root/data1", "/root/data2"]</code>,<code>[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]</code>;</td>
+<td><code>config</code></td>
+<td>Specific configuration information for the pipeline (if set simultaneously with <code>pipeline</code>, it has higher priority than <code>pipeline</code>, and the pipeline name must be consistent).</td>
+<td><code>dict[str, Any]</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>use_doc_image_ori_cls_model</code></td>
-<td>bool</td>
-<td><code>True</code></td>
-<td>Whether or not to use the orientation classification model;</td>
+<td><code>device</code></td>
+<td>The device for pipeline inference. Supports specifying specific GPU card numbers, such as "gpu:0", specific card numbers for other hardware, such as "npu:0", and CPU such as "cpu".</td>
+<td><code>str</code></td>
+<td><code>gpu</code></td>
 </tr>
 <tr>
-<td><code>use_doc_image_unwarp_model</code></td>
-<td>bool</td>
-<td><code>True</code></td>
-<td>Whether or not to use the unwarp model;</td>
+<td><code>use_hpip</code></td>
+<td>Whether to enable high-performance inference, which is only available when the pipeline supports it.</td>
+<td><code>bool</code></td>
+<td><code>False</code></td>
 </tr>
 <tr>
-<td><code>use_seal_text_det_model</code></td>
-<td>bool</td>
+<td><code>initial_predictor</code></td>
+<td>Whether to initialize the inference module (if <code>False</code>, it will be initialized when the relevant inference module is used for the first time).</td>
+<td><code>bool</code></td>
 <td><code>True</code></td>
-<td>Whether or not to use the seal text detection model;</td>
-</tr>
-</tbody>
-</table>
-(3) Call the relevant functions of prediction object to save the prediction results. The related functions are as follows:
-
-<table>
-<thead>
-<tr>
-<th>Function</th>
-<th>Parameter</th>
-<th>Description</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td><code>save_to_img</code></td>
-<td><code>save_path</code></td>
-<td>Save OCR prediction results, layout results, and table recognition results as image files, with the parameter <code>save_path</code> used to specify the save path;</td>
-</tr>
-<tr>
-<td><code>save_to_html</code></td>
-<td><code>save_path</code></td>
-<td>Save the table recognition results as an HTML file, with the parameter 'save_path' used to specify the save path;</td>
-</tr>
-<tr>
-<td><code>save_to_xlsx</code></td>
-<td><code>save_path</code></td>
-<td>Save the table recognition results as an Excel file, with the parameter 'save_path' used to specify the save path;</td>
 </tr>
 </tbody>
 </table>
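+
+For instance, a minimal sketch (values are illustrative) of instantiating the pipeline object. The <code>pipeline</code> argument accepts either a pipeline name or the path to a pipeline configuration file (e.g., <code>./my_path/PP-ChatOCRv3-doc.yaml</code>):
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(
+    pipeline="PP-ChatOCRv3-doc",  # or the path to a pipeline configuration file
+    device="gpu:0",               # e.g. "cpu", "gpu:0", "npu:0"
+    initial_predictor=False,      # defer initialization until first use
+)
+```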
-(4) Call the `chat` of PP-ChatOCRv3-doc pipeline object to query information with LLM, related parameters are described as follows:
-
-<table>
-<thead>
-<tr>
-<th>Parameter</th>
-<th>Type</th>
-<th>Default</th>
-<th>Description</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td><code>key_list</code></td>
-<td>str</td>
-<td>-</td>
-<td>Keywords used to query. A string composed of multiple keywords with "," as separators, such as "Party B, phone number";</td>
-</tr>
-<tr>
-<td><code>key_list</code></td>
-<td>list</td>
-<td>-</td>
-<td>Keywords used to query. A list composed of multiple keywords.</td>
-</tr>
-</tbody>
-</table>
-(3) Obtain prediction results by calling the `predict` method: The `predict` method is a `generator`, so prediction results need to be obtained through calls. The `predict` method predicts data in batches, so the prediction results are represented as a list of prediction results.
+</details>
 
-(4) Interact with the large model by calling the `predict.chat` method, which takes as input keywords (multiple keywords are supported) for information extraction. The prediction results are represented as a list of information extraction results.
+<details><summary>(2) Call the <code>visual_predict()</code> method of the PP-ChatOCRv3-doc pipeline object to obtain visual prediction results. This method will return a generator.</summary>
 
-(5) Process the prediction results: The prediction result for each sample is in the form of a dict, which supports printing or saving to a file. The supported file types depend on the specific pipeline, such as:
+The following are the parameters and their descriptions for the `visual_predict()` method:
 
 <table>
 <thead>
 <tr>
-<th>Method</th>
-<th>Description</th>
-<th>Method Parameters</th>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Options</th>
+<th>Default Value</th>
 </tr>
 </thead>
-<tbody>
 <tr>
-<td>save_to_img</td>
-<td>Saves layout analysis, table recognition, etc. results as image files.</td>
-<td><code>save_path</code>: str, the file path to save.</td>
+<td><code>input</code></td>
+<td>The data to be predicted, supporting multiple input types, required.</td>
+<td><code>Python Var|str|list</code></td>
+<td>
+<ul>
+<li><b>Python Var</b>: Such as <code>numpy.ndarray</code> representing image data.</li>
+<li><b>str</b>: Such as the local path of an image file or PDF file: <code>/root/data/img.jpg</code>; <b>URL link</b>, such as the network URL of an image file or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/demo_paper.png">Example</a>; <b>Local directory</b>, which should contain images to be predicted, such as the local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories, PDF files need to be specified to the specific file path).</li>
+<li><b>List</b>: List elements need to be of the above types of data, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td>save_to_html</td>
-<td>Saves table recognition results as HTML files.</td>
-<td><code>save_path</code>: str, the file path to save.</td>
+<td><code>device</code></td>
+<td>The device for pipeline inference.</td>
+<td><code>str|None</code></td>
+<td>
+<ul>
+<li><b>CPU</b>: Such as <code>cpu</code> to use CPU for inference;</li>
+<li><b>GPU</b>: Such as <code>gpu:0</code> to use the first GPU for inference;</li>
+<li><b>NPU</b>: Such as <code>npu:0</code> to use the first NPU for inference;</li>
+<li><b>XPU</b>: Such as <code>xpu:0</code> to use the first XPU for inference;</li>
+<li><b>MLU</b>: Such as <code>mlu:0</code> to use the first MLU for inference;</li>
+<li><b>DCU</b>: Such as <code>dcu:0</code> to use the first DCU for inference;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline. During initialization, it will prioritize using the local GPU 0 device, and if not available, it will use the CPU device;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td>save_to_xlsx</td>
-<td>Saves table recognition results as Excel files.</td>
-<td><code>save_path</code>: str, the file path to save.</td>
-</tr>
-</tbody>
-</table>
-When executing the above command, the default Pipeline configuration file is loaded. If you need to customize the configuration file, you can use the following command to obtain it:
-
-```bash
-paddlex --get_pipeline_config PP-ChatOCRv3-doc
-```
-
-After execution, the configuration file for the PP-ChatOCRv3-doc pipeline will be saved in the current path. If you wish to customize the save location, you can execute the following command (assuming the custom save location is `./my_path`):
-
-```bash
-paddlex --get_pipeline_config PP-ChatOCRv3-doc --save_path ./my_path
-```
-After obtaining the configuration file, you can customize the various configurations of the PP-ChatOCRv3-doc pipeline:
-
-```yaml
-Pipeline:
-  layout_model: RT-DETR-H_layout_3cls
-  table_model: SLANet_plus
-  text_det_model: PP-OCRv4_server_det
-  text_rec_model: PP-OCRv4_server_rec
-  seal_text_det_model: PP-OCRv4_server_seal_det
-  doc_image_ori_cls_model: null
-  doc_image_unwarp_model: null
-  llm_name: "ernie-3.5"
-  llm_params:
-    api_type: qianfan
-    ak:
-    sk:
-```
-
-In the above configuration, you can modify the models loaded by each module of the pipeline, as well as the large language model used. Please refer to the module documentation for the list of supported models for each module, and the list of supported large language models includes: ernie-4.0, ernie-3.5, ernie-3.5-8k, ernie-lite, ernie-tiny-8k, ernie-speed, ernie-speed-128k, ernie-char-8k.
-
-After making modifications, simply update the `pipeline` parameter value in the `create_pipeline` method to the path of your pipeline configuration file to apply the configuration.
-
-For example, if your configuration file is saved at `./my_path/PP-ChatOCRv3-doc.yaml`, you would execute:
-
-```python
-from paddlex import create_pipeline
-
-pipeline = create_pipeline(
-    pipeline="./my_path/PP-ChatOCRv3-doc.yaml",
-    llm_name="ernie-3.5",
-    llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # Please enter your ak and sk; otherwise, the large model cannot be invoked.
-    # llm_params={"api_type": "aistudio", "access_token": ""} # Please enter your access_token; otherwise, the large model cannot be invoked.
-    )
-
-visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf")
-
-for res in visual_result:
-    res.save_to_img("./output")
-    res.save_to_html('./output')
-    res.save_to_xlsx('./output')
-
-vector = pipeline.build_vector(visual_info=visual_info)
-
-chat_result = pipeline.chat(
-    key_list=["乙方", "手机号"],
-    visual_info=visual_info,
-    vector=vector,
-    )
-chat_result.print()
-```
-
-## 3. Development Integration/Deployment
-If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
-
-If you need to directly apply the pipeline in your Python project, you can refer to the example code in [2.2 Local Experience](#22-python-script-integration).
-
-Additionally, PaddleX provides three other deployment methods, detailed as follows:
-
-🚀 <b>High-Performance Inference</b>: In actual production environments, many applications have stringent standards for the performance metrics (especially response speed) of deployment strategies to ensure efficient system operation and smooth user experience. To this end, PaddleX provides high-performance inference plugins aimed at deeply optimizing model inference and pre/post-processing to significantly speed up the end-to-end process. For detailed high-performance inference procedures, please refer to the [PaddleX High-Performance Inference Guide](../../../pipeline_deploy/high_performance_inference.en.md).
-
-☁️ <b>Serving</b>: Serving is a common deployment strategy in real-world production environments. By encapsulating inference functions into services, clients can access these services via network requests to obtain inference results. PaddleX supports various solutions for serving pipelines. For detailed pipeline serving procedures, please refer to the [PaddleX Pipeline Serving Guide](../../../pipeline_deploy/serving.md).
-
-Below are the API reference and multi-language service invocation examples for the basic serving solution:
-
-<details><summary>API Reference</summary>
-<p>For the main operations provided by the service:</p>
+<td><code>use_doc_orientation_classify</code></td>
+<td>Whether to use the document orientation classification module.</td>
+<td><code>bool|None</code></td>
+<td>
 <ul>
-<li>The HTTP request method is POST.</li>
-<li>Both the request body and response body are JSON data (JSON objects).</li>
-<li>When the request is processed successfully, the response status code is <code>200</code>, and the attributes of the response body are as follows:</li>
+<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
 </ul>
-<table>
-<thead>
-<tr>
-<th>Name</th>
-<th>Type</th>
-<th>Meaning</th>
+</td>
+<td><code>None</code></td>
 </tr>
-</thead>
-<tbody>
 <tr>
-<td><code>logId</code></td>
-<td><code>string</code></td>
-<td>The UUID of the request.</td>
+<td><code>use_doc_unwarping</code></td>
+<td>Whether to use the document distortion correction module.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>errorCode</code></td>
-<td><code>integer</code></td>
-<td>Error code. Fixed as <code>0</code>.</td>
+<td><code>use_textline_orientation</code></td>
+<td>Whether to use the text line orientation classification module.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>errorMsg</code></td>
-<td><code>string</code></td>
-<td>Error message. Fixed as <code>"Success"</code>.</td>
+<td><code>use_general_ocr</code></td>
+<td>Whether to use the OCR sub-pipeline.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>result</code></td>
-<td><code>object</code></td>
-<td>The result of the operation.</td>
-</tr>
-</tbody>
-</table>
+<td><code>use_seal_recognition</code></td>
+<td>Whether to use the seal recognition sub-pipeline.</td>
+<td><code>bool|None</code></td>
+<td>
 <ul>
-<li>When the request is not processed successfully, the attributes of the response body are as follows:</li>
+<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
 </ul>
-<table>
-<thead>
-<tr>
-<th>Name</th>
-<th>Type</th>
-<th>Meaning</th>
+</td>
+<td><code>None</code></td>
 </tr>
-</thead>
-<tbody>
 <tr>
-<td><code>logId</code></td>
-<td><code>string</code></td>
-<td>The UUID of the request.</td>
+<td><code>use_table_recognition</code></td>
+<td>Whether to use the table recognition sub-pipeline.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>errorCode</code></td>
-<td><code>integer</code></td>
-<td>Error code. Same as the response status code.</td>
+<td><code>layout_threshold</code></td>
+<td>The score threshold for the layout model.</td>
+<td><code>float|dict|None</code></td>
+<td>
+<ul>
+<li><b>float</b>: Any floating-point number between <code>0-1</code>;</li>
+<li><b>dict</b>: <code>{0:0.1}</code> where the key is the category ID and the value is the threshold for that category;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>0.5</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>errorMsg</code></td>
-<td><code>string</code></td>
-<td>Error message.</td>
-</tr>
-</tbody>
-</table>
-<p>The main operations provided by the service are as follows:</p>
+<td><code>layout_nms</code></td>
+<td>Whether to use NMS.</td>
+<td><code>bool|None</code></td>
+<td>
 <ul>
-<li><b><code>analyzeImages</code></b></li>
+<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
 </ul>
-<p>Analyze images using computer vision models to obtain OCR and table recognition results, and extract key information from images.</p>
-<p><code>POST /chatocr-visual</code></p>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>layout_unclip_ratio</code></td>
+<td>The expansion coefficient for layout detection.</td>
+<td><code>float|Tuple[float,float]|None</code></td>
+<td>
 <ul>
-<li>The attributes of the request body are as follows:</li>
+<li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
+<li><b>Tuple[float,float]</b>: The expansion coefficients in the horizontal and vertical directions, respectively;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>1.0</code>;</li>
 </ul>
-<table>
-<thead>
-<tr>
-<th>Name</th>
-<th>Type</th>
-<th>Meaning</th>
-<th>Required</th>
+</td>
+<td><code>None</code></td>
 </tr>
-</thead>
-<tbody>
 <tr>
-<td><code>file</code></td>
-<td><code>string</code></td>
-<td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. For PDF files exceeding 10 pages, only the first 10 pages will be used.</td>
-<td>Yes</td>
+<td><code>layout_merge_bboxes_mode</code></td>
+<td>The overlapping box filtering method.</td>
+<td><code>str|None</code></td>
+<td>
+<ul>
+<li><b>str</b>: large, small, union. Respectively representing retaining the large box, small box, or both when filtering overlapping boxes.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>large</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>fileType</code></td>
-<td><code>integer</code> | <code>null</code></td>
-<td>The type of the file. <code>0</code> for PDF files, <code>1</code> for image files. If this attribute is missing, the file type will be inferred from the URL.</td>
-<td>No</td>
+<td><code>text_det_limit_side_len</code></td>
+<td>The side length limit for text detection images.</td>
+<td><code>int|None</code></td>
+<td>
+<ul>
+<li><b>int</b>: Any integer greater than <code>0</code>;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>960</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>useDocOrientationClassify</code></td>
-<td><code>boolean</code> | <code>null</code></td>
-<td>Refer to the <code>use_doc_orientation_classify</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_det_limit_type</code></td>
+<td>The side length limit type for text detection images.</td>
+<td><code>str|None</code></td>
+<td>
+<ul>
+<li><b>str</b>: Supports <code>min</code> and <code>max</code>, where <code>min</code> ensures that the shortest side of the image is not less than <code>det_limit_side_len</code>, and <code>max</code> ensures that the longest side of the image is not greater than <code>limit_side_len</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>max</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>useDocUnwarping</code></td>
-<td><code>boolean</code> | <code>null</code></td>
-<td>Refer to the <code>use_doc_unwarping</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_det_thresh</code></td>
+<td>The detection pixel threshold, where pixels with scores greater than this threshold in the output probability map are considered text pixels.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+<li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, <code>0.3</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>useGeneralOcr</code></td>
-<td><code>boolean</code> | <code>null</code></td>
-<td>Refer to the <code>use_general_ocr</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_det_box_thresh</code></td>
+<td>The detection box threshold, where a detection result is considered a text region if the average score of all pixels within the border of the result is greater than this threshold.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+<li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, <code>0.6</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>useSealRecognition</code></td>
-<td><code>boolean</code> | <code>null</code></td>
-<td>Refer to the <code>use_seal_recognition</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_det_unclip_ratio</code></td>
+<td>The text detection expansion coefficient, which expands the text region using this method. The larger the value, the larger the expansion area.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+<li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, <code>2.0</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>useTableRecognition</code></td>
-<td><code>boolean</code> | <code>null</code></td>
-<td>Refer to the <code>use_table_recognition</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_rec_score_thresh</code></td>
+<td>The text recognition threshold, where text results with scores greater than this threshold are retained.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+<li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, <code>0.0</code>. I.e., no threshold is set.</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textDetLimitSideLen</code></td>
-<td><code>integer</code> | <code>null</code></td>
-<td>Refer to the <code>text_det_limit_side_len</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>seal_det_limit_side_len</code></td>
+<td>The side length limit for seal detection images.</td>
+<td><code>int|None</code></td>
+<td>
+<ul>
+<li><b>int</b>: Any integer greater than <code>0</code>;</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>960</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textDetLimitType</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>Refer to the <code>text_det_limit_type</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>seal_det_limit_type</code></td>
+<td>The side length limit type for seal detection images.</td>
+<td><code>str|None</code></td>
+<td>
+<ul>
+<li><b>str</b>: Supports <code>min</code> and <code>max</code>, where <code>min</code> ensures that the shortest side of the image is not less than <code>det_limit_side_len</code>, and <code>max</code> ensures that the longest side of the image is not greater than <code>limit_side_len</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>max</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textDetThresh</code></td>
-<td><code>number</code> | <code>null</code></td>
-<td>Refer to the <code>text_det_thresh</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>seal_det_thresh</code></td>
+<td>The detection pixel threshold, where pixels with scores greater than this threshold in the output probability map are considered seal pixels.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+<li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, <code>0.3</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textDetBoxThresh</code></td>
-<td><code>number</code> | <code>null</code></td>
-<td>Refer to the <code>text_det_box_thresh</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>seal_det_box_thresh</code></td>
+<td>The detection box threshold, where a detection result is considered a seal region if the average score of all pixels within the border of the result is greater than this threshold.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+<li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, <code>0.6</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textDetUnclipRatio</code></td>
-<td><code>number</code> | <code>null</code></td>
-<td>Refer to the <code>text_det_unclip_ratio</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>seal_det_unclip_ratio</code></td>
+<td>The seal detection expansion coefficient, which expands the seal region using this method. The larger the value, the larger the expansion area.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+<li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, <code>2.0</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textRecScoreThresh</code></td>
-<td><code>number</code> | <code>null</code></td>
-<td>Refer to the <code>text_rec_score_thresh</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>seal_rec_score_thresh</code></td>
+<td>The seal recognition threshold, where text results with scores greater than this threshold are retained.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+<li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+<li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, <code>0.0</code>. I.e., no threshold is set.</li>
+</ul>
+</td>
+<td><code>None</code></td>
 </tr>
+</table>
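+
+For example, a minimal sketch (the values shown are illustrative and match the defaults listed above) of overriding some of these optional parameters per call:
+
+```python
+visual_predict_res = pipeline.visual_predict(
+    input="vehicle_certificate-1.png",
+    use_seal_recognition=True,
+    use_table_recognition=True,
+    layout_threshold=0.5,        # score threshold of the layout model
+    text_det_unclip_ratio=2.0,   # expansion coefficient of text detection boxes
+    text_rec_score_thresh=0.0,   # keep all recognized text lines
+)
+```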
+</details>
+<details><summary>(3) Process the visual prediction results.</summary>
+
+The prediction result for each sample is of type `dict`, containing two fields: `visual_info` and `layout_parsing_result`. Obtain visual information (including `normal_text_dict`, `table_text_list`, `table_html_list`, etc.) through `visual_info`, and place the information for each sample into the `visual_info_list` list, which will be sent to the large language model later.
+
+You can also obtain the layout parsing results through `layout_parsing_result`, which contain the tables, text, images, etc. found in the file or image, and support printing as well as saving as image, `json`, `html`, and `xlsx` files:
+
+```python
+......
+for res in visual_predict_res:
+    visual_info_list.append(res["visual_info"])
+    layout_parsing_result = res["layout_parsing_result"]
+    layout_parsing_result.print()
+    layout_parsing_result.save_to_img("./output")
+    layout_parsing_result.save_to_json("./output")
+    layout_parsing_result.save_to_xlsx("./output")
+    layout_parsing_result.save_to_html("./output")
+......
+```
+
+<table>
+<thead>
 <tr>
-<td><code>sealDetLimitSideLen</code></td>
-<td><code>integer</code> | <code>null</code></td>
-<td>Refer to the <code>seal_det_limit_side_len</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<th>Method</th>
+<th>Method Description</th>
+<th>Parameters</th>
+<th>Parameter Type</th>
+<th>Parameter Description</th>
+<th>Default Value</th>
 </tr>
+</thead>
 <tr>
-<td><code>sealDetLimitType</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>Refer to the <code>seal_det_limit_type</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td rowspan="3"><code>print()</code></td>
+<td rowspan="3">Prints the result to the terminal</td>
+<td><code>format_json</code></td>
+<td><code>bool</code></td>
+<td>Whether to format the output content with JSON indentation</td>
+<td><code>True</code></td>
 </tr>
 <tr>
-<td><code>sealDetThresh</code></td>
-<td><code>number</code> | <code>null</code></td>
-<td>Refer to the <code>seal_det_thresh</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>indent</code></td>
+<td><code>int</code></td>
+<td>Specifies the indentation level to beautify the output JSON data for better readability, only valid when <code>format_json</code> is <code>True</code></td>
+<td>4</td>
 </tr>
 <tr>
-<td><code>sealDetBoxThresh</code></td>
-<td><code>number</code> | <code>null</code></td>
-<td>Refer to the <code>seal_det_box_thresh</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>ensure_ascii</code></td>
+<td><code>bool</code></td>
+<td>Controls whether to escape non-ASCII characters to Unicode. When set to <code>True</code>, all non-ASCII characters will be escaped; <code>False</code> retains the original characters, only valid when <code>format_json</code> is <code>True</code></td>
+<td><code>False</code></td>
 </tr>
 <tr>
-<td><code>sealDetUnclipRatio</code></td>
-<td><code>number</code> | <code>null</code></td>
-<td>Refer to the <code>seal_det_unclip_ratio</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td rowspan="3"><code>save_to_json()</code></td>
+<td rowspan="3">Saves the result as a JSON file</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>The file path for saving, when it is a directory, the saved file name will be consistent with the input file type</td>
+<td>N/A</td>
 </tr>
 <tr>
-<td><code>sealRecScoreThresh</code></td>
-<td><code>number</code> | <code>null</code></td>
-<td>Refer to the <code>seal_rec_score_thresh</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>indent</code></td>
+<td><code>int</code></td>
+<td>Specifies the indentation level to beautify the output JSON data for better readability, only valid when <code>format_json</code> is <code>True</code></td>
+<td>4</td>
 </tr>
-</tbody>
-</table>
-<ul>
-<li>When the request is processed successfully, the <code>result</code> in the response body has the following attributes:</li>
-</ul>
-<table>
-<thead>
 <tr>
-<th>Name</th>
-<th>Type</th>
-<th>Meaning</th>
+<td><code>ensure_ascii</code></td>
+<td><code>bool</code></td>
+<td>Controls whether to escape non-ASCII characters to Unicode. When set to <code>True</code>, all non-ASCII characters will be escaped; <code>False</code> retains the original characters, only valid when <code>format_json</code> is <code>True</code></td>
+<td><code>False</code></td>
 </tr>
-</thead>
-<tbody>
 <tr>
-<td><code>layoutParsingResults</code></td>
-<td><code>array</code></td>
-<td>The analysis results obtained using a computer vision model. The length of the array is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file.</td>
+<td><code>save_to_img()</code></td>
+<td>Saves the visual images of each module in PNG format</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>The file path for saving, supports directory or file path</td>
+<td>N/A</td>
 </tr>
 <tr>
-<td><code>visualInfo</code></td>
-<td><code>array</code></td>
-<td>Key information in the image, which can be used as input for other operations.</td>
+<td><code>save_to_html()</code></td>
+<td>Saves the tables in the file as an HTML file</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>The file path for saving, supports directory or file path</td>
+<td>N/A</td>
 </tr>
 <tr>
-<td><code>dataInfo</code></td>
-<td><code>object</code></td>
-<td>Information about the input data.</td>
+<td><code>save_to_xlsx()</code></td>
+<td>Saves the tables in the file as an XLSX file</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>The file path for saving, supports directory or file path</td>
+<td>N/A</td>
 </tr>
-</tbody>
 </table>
-<p>Each element in <code>layoutParsingResults</code> is an <code>object</code> with the following attributes:</p>
+
+- Calling the `print()` method will print the result to the terminal. The content printed to the terminal is explained as follows:
+    - `input_path`: `(str)` The input path of the image to be predicted
+
+    - `page_index`: `(Union[int, None])` If the input is a PDF file, it indicates the current page number of the PDF, otherwise it is `None`
+
+    - `model_settings`: `(Dict[str, bool])` Model parameters required for configuring the pipeline
+
+        - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing pipeline
+        - `use_general_ocr`: `(bool)` Controls whether to enable the OCR pipeline
+        - `use_seal_recognition`: `(bool)` Controls whether to enable the seal recognition pipeline
+        - `use_table_recognition`: `(bool)` Controls whether to enable the table recognition pipeline
+        - `use_formula_recognition`: `(bool)` Controls whether to enable the formula recognition pipeline
+
+    - `parsing_res_list`: `(List[Dict])` A list of parsing results, each element is a dictionary, and the list order is the reading order after parsing.
+        - `block_bbox`: `(np.ndarray)` The bounding box of the layout area.
+        - `block_label`: `(str)` The label of the layout area, such as `text`, `table`, etc.
+        - `block_content`: `(str)` The content within the layout area.
+
+    - `overall_ocr_res`: `(Dict[str, Union[List[str], List[float], numpy.ndarray]])` A dictionary of global OCR results
+      - `input_path`: `(Union[str, None])` The image path received by the image OCR pipeline, saved as `None` when the input is `numpy.ndarray`
+      - `model_settings`: `(Dict)` Model configuration parameters for the OCR pipeline
+      - `dt_polys`: `(List[numpy.ndarray])` A list of polygon boxes for text detection. Each detection box is represented by a numpy array of 4 vertex coordinates, with a shape of (4, 2) and a data type of int16
+      - `dt_scores`: `(List[float])` A list of confidence scores for text detection boxes
+      - `text_det_params`: `(Dict[str, Dict[str, int, float]])` Configuration parameters for the text detection module
+        - `limit_side_len`: `(int)` The side length limit during image preprocessing
+        - `limit_type`: `(str)` The processing method for the side length limit
+        - `thresh`: `(float)` The confidence threshold for text pixel classification
+        - `box_thresh`: `(float)` The confidence threshold for text detection boxes
+        - `unclip_ratio`: `(float)` The expansion coefficient for text detection boxes
+        - `text_type`: `(str)` The type of text detection, currently fixed as "general"
+
+      - `text_type`: `(str)` The type of text detection, currently fixed as "general"
+      - `textline_orientation_angles`: `(List[int])` The prediction results of text line orientation classification. Actual angle values are returned when enabled (e.g., [0,0,1])
+      - `text_rec_score_thresh`: `(float)` The filtering threshold for text recognition results
+      - `rec_texts`: `(List[str])` A list of text recognition results, only including texts with confidence exceeding `text_rec_score_thresh`
+      - `rec_scores`: `(List[float])` A list of confidence scores for text recognition, already filtered by `text_rec_score_thresh`
+      - `rec_polys`: `(List[numpy.ndarray])` A list of text detection boxes filtered by confidence, with the same format as `dt_polys`
+
+    - `formula_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` A list of formula recognition results, each element is a dictionary
+        - `rec_formula`: `(str)` The formula recognition result
+        - `rec_polys`: `(numpy.ndarray)` The formula detection box, with a shape of (4, 2) and a dtype of int16
+        - `formula_region_id`: `(int)` The region number where the formula is located
+- Calling the `save_to_json()` method will save the aforementioned content to the specified `save_path`. If a directory is specified, the save path will be `save_path/{your_img_basename}_res.json`. If a file is specified, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, `numpy.array` types will be converted to list form.
+- Invoking the `save_to_img()` method will save the visualization results to the specified `save_path`. If a directory is specified, the layout detection visualization image, global OCR visualization image, reading order visualization image, and other contents will be saved. If a file is specified, it will be saved directly to that file. (Pipelines often involve multiple result images, so it is not recommended to specify a specific file path directly, as multiple images will be overwritten, leaving only the last one.)
+
+In addition, you can obtain the visualized images and the prediction results through attributes, as detailed below:
+
 <table>
 <thead>
 <tr>
-<th>Name</th>
-<th>Type</th>
-<th>Meaning</th>
+<th>Attribute</th>
+<th>Attribute Description</th>
 </tr>
 </thead>
-<tbody>
 <tr>
-<td><code>prunedResult</code></td>
-<td><code>object</code></td>
-<td>A simplified version of the <code>res</code> field in the JSON representation of the result generated by the <code>predict</code> method of the production object, with the <code>input_path</code> field removed.</td>
+<td rowspan="1"><code>json</code></td>
+<td rowspan="1">Obtain prediction results in <code>json</code> format</td>
 </tr>
 <tr>
-<td><code>outputImages</code></td>
-<td><code>object</code> | <code>null</code></td>
-<td>A key-value pair of the input image and the predicted result image. The images are in JPEG format, encoded in Base64.</td>
+<td rowspan="2"><code>img</code></td>
+<td rowspan="2">Obtain visualized images in <code>dict</code> format</td>
 </tr>
-<tr>
-<td><code>inputImage</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>The input image. The image is in JPEG format, encoded in Base64.</td>
-</tr>
-</tbody>
 </table>
-<ul>
-<li><b><code>buildVectorStore</code></b></li>
-</ul>
-<p>Build a vector database.</p>
-<p><code>POST /chatocr-vector</code></p>
-<ul>
-<li>The attributes of the request body are as follows:</li>
-</ul>
+
+- The prediction result obtained by the `json` attribute is data of type `dict`, and its content is consistent with the content saved by calling the `save_to_json()` method.
+- The prediction result returned by the `img` attribute is data of type `dict`. The keys are `layout_det_res`, `overall_ocr_res`, `text_paragraphs_ocr_res`, `formula_res_region1`, `table_cell_img`, and `seal_res_region1`, and the corresponding values are `Image.Image` objects: used to display the visualized images of layout detection, OCR, OCR text paragraphs, formulas, tables, and seal results, respectively. If optional modules are not used, only `layout_det_res` is included in the dictionary.
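+
+A short sketch of using these attributes (it assumes `layout_parsing_result` from the loop above, and that the `./output` directory exists):
+
+```python
+res_dict = layout_parsing_result.json              # same content as save_to_json()
+for name, image in layout_parsing_result.img.items():
+    image.save(f"./output/{name}.png")             # each value is an Image.Image object
+```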
+</details>
+
+<details><summary>(4) Call the <code>build_vector()</code> method of the PP-ChatOCRv3-doc Pipeline object to construct vectors for text content.</summary>
+
+Below are the parameters and their descriptions for the `build_vector()` method:
+
 <table>
 <thead>
 <tr>
-<th>Name</th>
-<th>Type</th>
-<th>Meaning</th>
-<th>Required</th>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Options</th>
+<th>Default Value</th>
 </tr>
 </thead>
-<tbody>
 <tr>
-<td><code>visualInfo</code></td>
-<td><code>array</code></td>
-<td>Key information in the image. Provided by the <code>analyzeImages</code> operation.</td>
-<td>Yes</td>
+<td><code>visual_info</code></td>
+<td>Visual information, which can be a dictionary containing visual information or a list composed of such dictionaries</td>
+<td><code>list|dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>minCharacters</code></td>
-<td><code>integer</code> | <code>null</code></td>
-<td>The minimum data length required to enable the vector database.</td>
-<td>No</td>
+<td><code>min_characters</code></td>
+<td>Minimum number of characters</td>
+<td><code>int</code></td>
+<td>
+A positive integer greater than 0, determined based on the token length supported by the large language model
+</td>
+<td><code>3500</code></td>
 </tr>
 <tr>
-<td><code>llmRequestInterval</code></td>
-<td><code>number</code> | <code>null</code></td>
-<td>The interval time for calling the large language model API.</td>
-<td>No</td>
+<td><code>block_size</code></td>
+<td>Block size for vector library creation of long texts</td>
+<td><code>int</code></td>
+<td>
+A positive integer greater than 0, determined based on the token length supported by the large language model
+</td>
+<td><code>300</code></td>
 </tr>
-</tbody>
-</table>
-<ul>
-<li>When the request is processed successfully, the <code>result</code> in the response body has the following attributes:</li>
-</ul>
-<table>
-<thead>
 <tr>
-<th>Name</th>
-<th>Type</th>
-<th>Meaning</th>
+<td><code>flag_save_bytes_vector</code></td>
+<td>Whether to save text as a binary file</td>
+<td><code>bool</code></td>
+<td>
+<code>True|False</code>
+</td>
+<td><code>False</code></td>
 </tr>
-</thead>
-<tbody>
 <tr>
-<td><code>vectorInfo</code></td>
-<td><code>object</code></td>
-<td>The serialized result of the vector database, which can be used as input for other operations.</td>
+<td><code>retriever_config</code></td>
+<td>Configuration parameters for the vector retrieval large model, refer to the "LLM_Retriever" field in the configuration file</td>
+<td><code>dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
 </tr>
-</tbody>
 </table>
-<ul>
-<li><b><code>chat</code></b></li>
-</ul>
-<p>Interact with large language models to extract key information.</p>
-<p><code>POST /chatocr-chat</code></p>
-<ul>
-<li>The properties of the request body are as follows:</li>
-</ul>
+This method returns a dictionary containing visual text information, with the following content:
+
+- `flag_save_bytes_vector`: `(bool)` Whether the result is saved as a binary file
+- `flag_too_short_text`: `(bool)` Whether the text length is less than the minimum number of characters
+- `vector`: `(str|list)` The binary content or text content of the text, depending on the values of `flag_save_bytes_vector` and `min_characters`. If `flag_save_bytes_vector=True` and the text length is greater than or equal to the minimum number of characters, binary content is returned; otherwise, the original text is returned.
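+
+A minimal sketch (parameter values are illustrative) of building the vector store and inspecting the returned fields; `retriever_config` is assumed to be a dictionary of the kind shown in the quick-start code:
+
+```python
+vector_info = pipeline.build_vector(
+    visual_info_list,
+    min_characters=3500,   # below this length the original text is kept as-is
+    block_size=300,        # chunk size used when splitting long text
+    flag_save_bytes_vector=True,
+    retriever_config=retriever_config,  # assumed dict, see the table above
+)
+print(vector_info["flag_too_short_text"], vector_info["flag_save_bytes_vector"])
+```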
+</details>
+
+<details><summary>(5) Call the <code>chat()</code> method of the PP-ChatOCRv3-doc Pipeline object to extract key information.</summary>
+
+Below are the parameters and their descriptions for the `chat()` method:
+
 <table>
 <thead>
 <tr>
-<th>Name</th>
-<th>Type</th>
-<th>Meaning</th>
-<th>Required</th>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Options</th>
+<th>Default Value</th>
 </tr>
 </thead>
 <tbody>
 <tr>
-<td><code>keyList</code></td>
-<td><code>array</code></td>
-<td>List of keywords.</td>
-<td>Yes</td>
+<td><code>key_list</code></td>
+<td>A single key or a list of keys used to extract information</td>
+<td><code>Union[str, List[str]]</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>visualInfo</code></td>
-<td><code>object</code></td>
-<td>Key information in the image. Provided by the <code>analyzeImages</code> operation.</td>
-<td>Yes</td>
+<td><code>visual_info</code></td>
+<td>Visual information results</td>
+<td><code>List[dict]</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>useVectorRetrieval</code></td>
-<td><code>boolean</code> | <code>null</code></td>
-<td>See the <code>use_vector_retrieval</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>use_vector_retrieval</code></td>
+<td>Whether to use vector retrieval</td>
+<td><code>bool</code></td>
+<td><code>True|False</code></td>
+<td><code>True</code></td>
 </tr>
 <tr>
-<td><code>vectorInfo</code></td>
-<td><code>object</code> | <code>null</code></td>
-<td>Serialized result of the vector database. Provided by the <code>buildVectorStore</code> operation.</td>
-<td>No</td>
+<td><code>vector_info</code></td>
+<td>Vector information used for retrieval</td>
+<td><code>dict</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>minCharacters</code></td>
-<td><code>integer</code></td>
-<td>Minimum data length required to enable the vector database.</td>
-<td>No</td>
+<td><code>min_characters</code></td>
+<td>Required minimum number of characters</td>
+<td><code>int</code></td>
+<td>A positive integer greater than 0</td>
+<td><code>3500</code></td>
 </tr>
 <tr>
-<td><code>textTaskDescription</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>text_task_description</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_task_description</code></td>
+<td>Description of the text task</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textOutputFormat</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>text_output_format</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_output_format</code></td>
+<td>Output format of text results</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textRulesStr</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>text_rules_str</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_rules_str</code></td>
+<td>Rules for generating text results</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textFewShotDemoTextContent</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>text_few_shot_demo_text_content</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_few_shot_demo_text_content</code></td>
+<td>Text content for few-shot demonstration</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>textFewShotDemoKeyValueList</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>text_few_shot_demo_key_value_list</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>text_few_shot_demo_key_value_list</code></td>
+<td>Key-value list for few-shot demonstration</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>tableTaskDescription</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>table_task_description</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>table_task_description</code></td>
+<td>Description of the table task</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>tableOutputFormat</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>table_output_format</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>table_output_format</code></td>
+<td>Output format of table results</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>tableRulesStr</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>table_rules_str</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>table_rules_str</code></td>
+<td>Rules for generating table results</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>tableFewShotDemoTextContent</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>table_few_shot_demo_text_content</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>table_few_shot_demo_text_content</code></td>
+<td>Text content for table few-shot demonstration</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
 <tr>
-<td><code>tableFewShotDemoKeyValueList</code></td>
-<td><code>string</code> | <code>null</code></td>
-<td>See the <code>table_few_shot_demo_key_value_list</code> parameter description in the pipeline <code>predict</code> method.</td>
-<td>No</td>
+<td><code>table_few_shot_demo_key_value_list</code></td>
+<td>Key-value list for table few-shot demonstration</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
 </tr>
-</tbody>
-</table>
-<ul>
-<li>When the request is processed successfully, the <code>result</code> in the response body has the following attributes:</li>
-</ul>
-<table>
-<thead>
 <tr>
-<th>Name</th>
-<th>Type</th>
-<th>Meaning</th>
+<td><code>chat_bot_config</code></td>
+<td>Configuration information for the large language model. For details, refer to the "LLM_Chat" field in the pipeline configuration file</td>
+<td><code>dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
 </tr>
-</thead>
-<tbody>
 <tr>
-<td><code>chatResult</code></td>
-<td><code>object</code></td>
-<td>The result of key information extraction.</td>
+<td><code>retriever_config</code></td>
+<td>Configuration parameters for the vector retrieval large model. For details, refer to the "LLM_Retriever" field in the pipeline configuration file</td>
+<td><code>dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
 </tr>
 </tbody>
-</table></details>
-<details><summary>Multi-language service call example</summary>
-<details>
-<summary>Python</summary>
-
-<pre><code class="language-python">import base64
-import pprint
-import sys
-
-import requests
-
-
-API_BASE_URL = "http://0.0.0.0:8080"
-
-file_path = "./demo.jpg"
-keys = ["Name"]
+</table>
 
-with open(file_path, "rb") as file:
-    file_bytes = file.read()
-    file_data = base64.b64encode(file_bytes).decode("ascii")
+This method prints the result to the terminal. The content printed to the terminal is explained as follows:
+  - `chat_res`: `(dict)` The result of information extraction, a dictionary containing the keys to be extracted and their corresponding values.
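+
+For reference, below is a minimal sketch of a `chat()` call that combines the parameters above (the `visual_info_list` and `vector_info` variables are assumed to come from the pipeline's earlier visual prediction and vector-building steps in this tutorial, and the key name is only illustrative):
+
+```python
+chat_result = pipeline.chat(
+    key_list=["Name"],             # key(s) to extract; illustrative only
+    visual_info=visual_info_list,  # visual information produced earlier
+    use_vector_retrieval=True,
+    vector_info=vector_info,       # serialized vector store produced earlier
+)
+print(chat_result)
+```
+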
 
-payload = {
-    "file": file_data,
-    "fileType": 1,
-}
-resp_visual = requests.post(url=f"{API_BASE_URL}/chatocr-visual", json=payload)
-if resp_visual.status_code != 200:
-    print(
-        f"Request to chatocr-visual failed with status code {resp_visual.status_code}.",
-        file=sys.stderr,
-    )
-    pprint.pp(resp_visual.json())
-    sys.exit(1)
-result_visual = resp_visual.json()["result"]
-
-for i, res in enumerate(result_visual["layoutParsingResults"]):
-    print(res["prunedResult"])
-    for img_name, img in res["outputImages"].items():
-        img_path = f"{img_name}_{i}.jpg"
-        with open(img_path, "wb") as f:
-            f.write(base64.b64decode(img))
-        print(f"Output image saved at {img_path}")
-
-payload = {
-    "visualInfo": result_visual["visualInfo"],
-}
-resp_vector = requests.post(url=f"{API_BASE_URL}/chatocr-vector", json=payload)
-if resp_vector.status_code != 200:
-    print(
-        f"Request to chatocr-vector failed with status code {resp_vector.status_code}.",
-        file=sys.stderr,
-    )
-    pprint.pp(resp_vector.json())
-    sys.exit(1)
-result_vector = resp_vector.json()["result"]
-
-payload = {
-    "keyList": keys,
-    "visualInfo": result_visual["visualInfo"],
-    "useVectorRetrieval": True,
-    "vectorInfo": result_vector["vectorInfo"],
-}
-
-resp_chat = requests.post(url=f"{API_BASE_URL}/chatocr-chat", json=payload)
-if resp_chat.status_code != 200:
-    print(
-        f"Request to chatocr-chat failed with status code {resp_chat.status_code}.",
-        file=sys.stderr,
-    )
-    pprint.pp(resp_chat.json())
-    sys.exit(1)
-result_chat = resp_chat.json()["result"]
-print("Final result:")
-print(result_chat["chatResult"])
-</code></pre>
-<b>Note</b>: Please fill in your API key and secret key at `API_KEY` and `SECRET_KEY`. </details>
 </details>
-<br/>
-
-📱 <b>Edge Deployment</b>: Edge deployment is a method that places computing and data processing functions on user devices themselves, allowing devices to process data directly without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed edge deployment procedures, please refer to the [PaddleX Edge Deployment Guide](../../../pipeline_deploy/edge_deploy.en.md).
 
 ## 4. Custom Development
 

+ 0 - 2
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.md

@@ -904,8 +904,6 @@ for res in visual_predict_res:
 - 调用`save_to_json()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_res.json`,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的`numpy.array`类型转换为列表形式。
 - 调用`save_to_img()` 方法会将可视化结果保存到指定的`save_path`中,如果指定为目录,则会将版面区域检测可视化图像、全局OCR可视化图像、版面阅读顺序可视化图像等内容保存,如果指定为文件,则直接保存到该文件中。(产线通常包含较多结果图片,不建议直接指定为具体的文件路径,否则多张图会被覆盖,仅保留最后一张图)
 
-
-
 此外,也支持通过属性获取带结果的可视化图像和预测结果,具体如下:
 <table>
 <thead>

+ 2072 - 0
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.en.md

@@ -0,0 +1,2072 @@
+---
+comments: true
+---
+
+# PP-ChatOCRv4-doc Pipeline Tutorial
+
+## 1. Introduction to PP-ChatOCRv4-doc Pipeline
+PP-ChatOCRv4-doc is a unique document and image intelligent analysis solution from PaddlePaddle, combining LLM, MLLM, and OCR technologies to address complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. Integrated with ERNIE Bot, it fuses massive data and knowledge, achieving high accuracy and wide applicability. This pipeline also provides flexible service deployment options, supporting deployment on various hardware. Furthermore, it offers secondary development capabilities, allowing you to train and fine-tune models on your own datasets, with seamless integration of trained models.
+
+<img src="https://github.com/user-attachments/assets/0870cdec-1909-4247-9004-d9efb4ab9635">
+
+The Document Scene Information Extraction v4 pipeline includes modules for **Layout Region Detection**, **Table Structure Recognition**, **Table Classification**, **Table Cell Localization**, **Text Detection**, **Text Recognition**, **Seal Text Detection**, **Text Image Rectification**, and **Document Image Orientation Classification**. The relevant models are integrated as sub-pipelines, and you can view the model configurations of different modules through the [pipeline configuration](../../../../paddlex/configs/pipelines/PP-ChatOCRv4-doc.yaml).
+
+<b>If you prioritize model accuracy, choose a model with higher accuracy. If you prioritize inference speed, select a model with faster inference. If you prioritize model storage size, choose a model with a smaller storage size.</b> Benchmarks for some models are as follows:
+
+<details><summary> 👉Model List Details</summary>
+<p><b>Table Structure Recognition Module Models</b>:</p>
+<table>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Accuracy (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Size (M)</th>
+<th>Description</th>
+</tr>
+<tr>
+<td>SLANet</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/SLANet_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/SLANet_pretrained.pdparams">Trained Model</a></td>
+<td>59.52</td>
+<td>103.08 / 103.08</td>
+<td>197.99 / 197.99</td>
+<td>6.9 M</td>
+<td>SLANet is a table structure recognition model developed by Baidu PaddleX Team. The model significantly improves the accuracy and inference speed of table structure recognition by adopting a CPU-friendly lightweight backbone network PP-LCNet, a high-low-level feature fusion module CSP-PAN, and a feature decoding module SLA Head that aligns structural and positional information.</td>
+</tr>
+<tr>
+<td>SLANet_plus</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/SLANet_plus_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/SLANet_plus_pretrained.pdparams">Trained Model</a></td>
+<td>63.69</td>
+<td>140.29 / 140.29</td>
+<td>195.39 / 195.39</td>
+<td>6.9 M</td>
+<td>SLANet_plus is an enhanced version of SLANet, the table structure recognition model developed by Baidu PaddleX Team. Compared to SLANet, SLANet_plus significantly improves the recognition ability for wireless and complex tables and reduces the model's sensitivity to the accuracy of table positioning, enabling more accurate recognition even with offset table positioning.</td>
+</tr>
+</table>
+
+<p><b>Layout Detection Module Models</b>:</p>
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PP-DocLayout-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-L_pretrained.pdparams">Training Model</a></td>
+<td>90.4</td>
+<td>34.6244 / 10.3945</td>
+<td>510.57 / -</td>
+<td>123.76 M</td>
+<td>A high-precision layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using RT-DETR-L.</td>
+</tr>
+<tr>
+<td>PP-DocLayout-M</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-M_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-M_pretrained.pdparams">Training Model</a></td>
+<td>75.2</td>
+<td>13.3259 / 4.8685</td>
+<td>44.0680 / 44.0680</td>
+<td>22.578</td>
+<td>A layout area localization model with balanced precision and efficiency, trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-L.</td>
+</tr>
+<tr>
+<td>PP-DocLayout-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-S_pretrained.pdparams">Training Model</a></td>
+<td>70.9</td>
+<td>8.3008 / 2.3794</td>
+<td>10.0623 / 9.9296</td>
+<td>4.834</td>
+<td>A high-efficiency layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-S.</td>
+</tr>
+</tbody>
+</table>
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout area detection dataset by PaddleOCR, containing 500 common document-type images of Chinese and English papers, magazines, contracts, books, exams, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
+
+> ❗ The above list shows the <b>3 core models</b> that the layout detection module primarily supports. The module supports <b>11 models</b> in total, including several predefined models covering different category sets. The complete model list is as follows:
+
+<details><summary> 👉 Details of Model List</summary>
+
+* <b>Table Layout Detection Model</b>
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PicoDet_layout_1x_table</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_table_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_table_pretrained.pdparams">Training Model</a></td>
+<td>97.5</td>
+<td>8.02 / 3.09</td>
+<td>23.70 / 20.41</td>
+<td>7.4 M</td>
+<td>A high-efficiency layout area localization model trained on a self-built dataset using PicoDet-1x, capable of detecting table regions.</td>
+</tr>
+</tbody></table>
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout table area detection dataset by PaddleOCR, containing 7835 Chinese and English document images with tables. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
+
+* <b>3-Class Layout Detection Model, including Table, Image, and Stamp</b>
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PicoDet-S_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_3cls_pretrained.pdparams">Training Model</a></td>
+<td>88.2</td>
+<td>8.99 / 2.22</td>
+<td>16.11 / 8.73</td>
+<td>4.8</td>
+<td>A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S.</td>
+</tr>
+<tr>
+<td>PicoDet-L_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_3cls_pretrained.pdparams">Training Model</a></td>
+<td>89.0</td>
+<td>13.05 / 4.50</td>
+<td>41.30 / 41.30</td>
+<td>22.6</td>
+<td>A balanced efficiency and precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L.</td>
+</tr>
+<tr>
+<td>RT-DETR-H_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_3cls_pretrained.pdparams">Training Model</a></td>
+<td>95.8</td>
+<td>114.93 / 27.71</td>
+<td>947.56 / 947.56</td>
+<td>470.1</td>
+<td>A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H.</td>
+</tr>
+</tbody></table>
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout area detection dataset by PaddleOCR, containing 1154 common document images of Chinese and English papers, magazines, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
+
+* <b>5-Class English Document Area Detection Model, including Text, Title, Table, Image, and List</b>
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PicoDet_layout_1x</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_pretrained.pdparams">Training Model</a></td>
+<td>97.8</td>
+<td>9.03 / 3.10</td>
+<td>25.82 / 20.70</td>
+<td>7.4</td>
+<td>A high-efficiency English document layout area localization model trained on the PubLayNet dataset using PicoDet-1x.</td>
+</tr>
+</tbody></table>
+<b>Note: The evaluation dataset for the above precision metrics is the [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) dataset, containing 11245 English document images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
+
+* <b>17-Class Area Detection Model, including 17 common layout categories: Paragraph Title, Image, Text, Number, Abstract, Content, Figure Caption, Formula, Table, Table Caption, References, Document Title, Footnote, Header, Algorithm, Footer, and Stamp</b>
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PicoDet-S_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_17cls_pretrained.pdparams">Training Model</a></td>
+<td>87.4</td>
+<td>9.11 / 2.12</td>
+<td>15.42 / 9.12</td>
+<td>4.8</td>
+<td>A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S.</td>
+</tr>
+<tr>
+<td>PicoDet-L_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_17cls_pretrained.pdparams">Training Model</a></td>
+<td>89.0</td>
+<td>13.50 / 4.69</td>
+<td>43.32 / 43.32</td>
+<td>22.6</td>
+<td>A balanced efficiency and precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L.</td>
+</tr>
+<tr>
+<td>RT-DETR-H_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_17cls_pretrained.pdparams">Training Model</a></td>
+<td>98.3</td>
+<td>115.29 / 104.09</td>
+<td>995.27 / 995.27</td>
+<td>470.2</td>
+<td>A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H.</td>
+</tr>
+</tbody>
+</table>
+
+<p><b>Text Detection Module Models</b>:</p>
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Detection Hmean (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Size (M)</th>
+<th>Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PP-OCRv4_server_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_server_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_det_pretrained.pdparams">Trained Model</a></td>
+<td>82.69</td>
+<td>83.34 / 80.91</td>
+<td>442.58 / 442.58</td>
+<td>109</td>
+<td>PP-OCRv4's server-side text detection model, featuring higher accuracy, suitable for deployment on high-performance servers</td>
+</tr>
+<tr>
+<td>PP-OCRv4_mobile_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_mobile_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_det_pretrained.pdparams">Trained Model</a></td>
+<td>77.79</td>
+<td>8.79 / 3.13</td>
+<td>51.00 / 28.58</td>
+<td>4.7</td>
+<td>PP-OCRv4's mobile text detection model, optimized for efficiency, suitable for deployment on edge devices</td>
+</tr>
+</tbody>
+</table>
+
+<p><b>Text Recognition Module Models</b>:</p>
+<table>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Recognition Avg Accuracy (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Size (M)</th>
+<th>Description</th>
+</tr>
+<tr>
+<td>PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_rec_pretrained.pdparams">Trained Model</a></td>
+<td>78.20</td>
+<td>4.82 / 4.82</td>
+<td>16.74 / 4.64</td>
+<td>10.6 M</td>
+<td rowspan="2">PP-OCRv4 is the next version of Baidu PaddlePaddle's self-developed text recognition model PP-OCRv3. By introducing data augmentation schemes and GTC-NRTR guidance branches, it further improves text recognition accuracy without compromising inference speed. The model offers both server (server) and mobile (mobile) versions to meet industrial needs in different scenarios.</td>
+</tr>
+<tr>
+<td>PP-OCRv4_server_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_server_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_pretrained.pdparams">Trained Model</a></td>
+<td>79.20</td>
+<td>6.58 / 6.58</td>
+<td>33.17 / 33.17</td>
+<td>71.2 M</td>
+</tr>
+</table>
+
+<table>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Recognition Avg Accuracy (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Size (M)</th>
+<th>Description</th>
+</tr>
+<tr>
+<td>ch_SVTRv2_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/ch_SVTRv2_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ch_SVTRv2_rec_pretrained.pdparams">Trained Model</a></td>
+<td>68.81</td>
+<td>8.08 / 8.08</td>
+<td>50.17 / 42.50</td>
+<td>73.9 M</td>
+<td rowspan="1">
+SVTRv2 is a server-side text recognition model developed by the OpenOCR team at the Vision and Learning Lab (FVL) of Fudan University. It won the first prize in the OCR End-to-End Recognition Task of the PaddleOCR Algorithm Model Challenge, with a 6% improvement in end-to-end recognition accuracy compared to PP-OCRv4 on the A-list.
+</td>
+</tr>
+</table>
+
+<table>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Recognition Avg Accuracy (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Size (M)</th>
+<th>Description</th>
+</tr>
+<tr>
+<td>ch_RepSVTR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/ch_RepSVTR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ch_RepSVTR_rec_pretrained.pdparams">Trained Model</a></td>
+<td>65.07</td>
+<td>5.93 / 5.93</td>
+<td>20.73 / 7.32</td>
+<td>22.1 M</td>
+<td rowspan="1">
+The RepSVTR text recognition model is a mobile-oriented text recognition model based on SVTRv2. It won the first prize in the OCR End-to-End Recognition Task of the PaddleOCR Algorithm Model Challenge, with a 2.5% improvement in end-to-end recognition accuracy compared to PP-OCRv4 on the B-list, while maintaining similar inference speed.
+</td>
+</tr>
+</table>
+
+<p><b>Formula Recognition Module Models</b>:</p>
+<table>
+<thead>
+<tr>
+<th>Model Name</th><th>Model Download Link</th>
+<th>BLEU Score</th>
+<th>Normed Edit Distance</th>
+<th>ExpRate (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Size</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Trained Model</a></td>
+<td>0.8821</td>
+<td>0.0823</td>
+<td>40.01</td>
+<td>2047.13 / 2047.13</td>
+<td>10582.73 / 10582.73</td>
+<td>89.7 M</td>
+</tr>
+</tbody>
+</table>
+
+<p><b>Seal Text Detection Module Models</b>:</p>
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Detection Hmean (%)</th>
+<th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+<th>Model Size (M)</th>
+<th>Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PP-OCRv4_server_seal_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_server_seal_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_seal_det_pretrained.pdparams">Trained Model</a></td>
+<td>98.21</td>
+<td>74.75 / 67.72</td>
+<td>382.55 / 382.55</td>
+<td>109</td>
+<td>PP-OCRv4's server-side seal text detection model, featuring higher accuracy, suitable for deployment on better-equipped servers</td>
+</tr>
+<tr>
+<td>PP-OCRv4_mobile_seal_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_mobile_seal_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_seal_det_pretrained.pdparams">Trained Model</a></td>
+<td>96.47</td>
+<td>7.82 / 3.09</td>
+<td>48.28 / 23.97</td>
+<td>4.6</td>
+<td>PP-OCRv4's mobile seal text detection model, offering higher efficiency, suitable for deployment on edge devices</td>
+</tr>
+</tbody>
+</table>
+
+**Test Environment Description**:
+
+- **Performance Test Environment**
+  - **Test Dataset**:
+    - Text Image Rectification Model: [DocUNet](https://www3.cs.stonybrook.edu/~cvl/docunet.html).
+    - Layout Region Detection Model: A self-built layout analysis dataset using PaddleOCR, containing 10,000 images of common document types such as Chinese and English papers, magazines, and research reports.
+    - Table Structure Recognition Model: A self-built English table recognition dataset using PaddleX.
+    - Text Detection Model: A self-built Chinese dataset using PaddleOCR, covering multiple scenarios such as street scenes, web images, documents, and handwriting, with 500 images for detection.
+    - Chinese Recognition Model: A self-built Chinese dataset using PaddleOCR, covering multiple scenarios such as street scenes, web images, documents, and handwriting, with 11,000 images for text recognition.
+    - ch_SVTRv2_rec: Evaluation set A for "OCR End-to-End Recognition Task" in the [PaddleOCR Algorithm Model Challenge](https://aistudio.baidu.com/competition/detail/1131/0/introduction).
+    - ch_RepSVTR_rec: Evaluation set B for "OCR End-to-End Recognition Task" in the [PaddleOCR Algorithm Model Challenge](https://aistudio.baidu.com/competition/detail/1131/0/introduction).
+    - English Recognition Model: A self-built English dataset using PaddleX.
+    - Multilingual Recognition Model: A self-built multilingual dataset using PaddleX.
+    - Text Line Orientation Classification Model: A self-built dataset using PaddleX, covering various scenarios such as ID cards and documents, containing 1000 images.
+    - Seal Text Detection Model: A self-built dataset using PaddleX, containing 500 images of circular seal textures.
+  - **Hardware Configuration**:
+    - GPU: NVIDIA Tesla T4
+    - CPU: Intel Xeon Gold 6271C @ 2.60GHz
+    - Other Environments: Ubuntu 20.04 / cuDNN 8.6 / TensorRT 8.5.2.2
+
+- **Inference Mode Description**
+
+| Mode        | GPU Configuration                        | CPU Configuration | Acceleration Technology Combination                   |
+|-------------|----------------------------------------|-------------------|---------------------------------------------------|
+| Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference |
+| High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 Precision / 8 Threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |
+
+</details>
+
+## 2. Quick Start
+The pre-trained model pipelines provided by PaddleX allow for quick experience of their effects. You can use Python locally to experience the effect of the PP-ChatOCRv4-doc pipeline.
+
+### 2.1 Local Experience
+Before using the PP-ChatOCRv4-doc pipeline locally, ensure you have completed the installation of the PaddleX wheel package according to the [PaddleX Local Installation Tutorial](../../../installation/installation_en.md).
+
+Before performing model inference, you first need to prepare the API key for the large language model. PP-ChatOCRv4 supports large model services on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) or the locally deployed standard OpenAI interface. If using the Baidu Cloud Qianfan Platform, refer to [Authentication and Authorization](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Um2wxbaps_en) to obtain the API key. If using a locally deployed large model service, refer to the [PaddleNLP Large Model Deployment Documentation](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm) for deployment of the dialogue interface and vectorization interface for large models, and fill in the corresponding `base_url` and `api_key`. If you need to use a multimodal large model for data fusion, refer to the OpenAI service deployment in the [PaddleMIX Model Documentation](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee) for multimodal large model deployment, and fill in the corresponding `base_url` and `api_key`.
+
+After updating the configuration file, you can complete quick inference using just a few lines of Python code. You can use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png) for testing:
+
+**Note**: If your local environment cannot support deploying a multimodal large model, you can comment out the lines containing the `mllm` variable in the code and use only the large language model for information extraction.
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(pipeline="PP-ChatOCRv4-doc", initial_predictor=False)
+
+visual_predict_res = pipeline.visual_predict(input="vehicle_certificate-1.png",
+    use_doc_orientation_classify=False,
+    use_doc_unwarping=False,
+    use_common_ocr=True,
+    use_seal_recognition=True,
+    use_table_recognition=True)
+
+visual_info_list = []
+for res in visual_predict_res:
+    visual_info_list.append(res["visual_info"])
+    layout_parsing_result = res["layout_parsing_result"]
+
+vector_info = pipeline.build_vector(visual_info_list, flag_save_bytes_vector=True, retriever_config={
+    "module_name": "retriever",
+    "model_name": "embedding-v1",
+    "base_url": "https://qianfan.baidubce.com/v2",
+    "api_type": "qianfan",
+    "api_key": "api_key" # your api_key
+})
+mllm_predict_res = pipeline.mllm_pred(input="vehicle_certificate-1.png", key_list=["Driver Compartment Occupancy"], mllm_chat_bot_config={
+    "module_name": "chat_bot",
+    "model_name": "PP-DocBee",
+    "base_url": "http://172.0.0.1:8080/v1/chat/completions", # your local mllm service url
+    "api_type": "openai",
+    "api_key": "api_key" # your api_key
+})
+mllm_predict_info = mllm_predict_res["mllm_res"]
+chat_result = pipeline.chat(
+    key_list=["驾驶室准乘人数"],
+    visual_info_list=visual_info_list,
+    vector_info=vector_info,
+    mllm_predict_info=mllm_predict_info,
+    chat_bot_config={
+      "module_name": "chat_bot",
+      "model_name": "ernie-3.5-8k",
+      "base_url": "https://qianfan.baidubce.com/v2",
+      "api_type": "openai",
+      "api_key": "api_key" # your api_key
+    },
+    retriever_config={
+      "module_name": "retriever",
+      "model_name": "embedding-v1",
+      "base_url": "https://qianfan.baidubce.com/v2",
+      "api_type": "qianfan",
+      "api_key": "api_key" # your api_key
+    }
+)
+print(chat_result)
+
+```
+
+After running, the output result is as follows:
+
+```
+{'chat_res': {'驾驶室准乘人数': '2'}}
+```
+
+PP-ChatOCRv4 Prediction Process, API Description, and Output Description:
+
+<details><summary>(1) Instantiate the PP-ChatOCRv4 Pipeline Object by Calling the <code>create_pipeline</code> Method.</summary>
+
+The following are the parameter descriptions:
+
+<table>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>pipeline</code></td>
+<td>The name of the pipeline or the path to the pipeline configuration file. If it is the name of the pipeline, it must be a pipeline supported by PaddleX.</td>
+<td><code>str</code></td>
+<td>None</td>
+</tr>
+<tr>
+<td><code>device</code></td>
+<td>The device for pipeline inference. Supports specifying specific GPU card numbers, such as "gpu:0", other hardware card numbers, such as "npu:0", and CPU as "cpu".</td>
+<td><code>str</code></td>
+<td><code>gpu</code></td>
+</tr>
+<tr>
+<td><code>use_hpip</code></td>
+<td>Whether to enable high-performance inference, which is only available if the pipeline supports it.</td>
+<td><code>bool</code></td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td><code>initial_predictor</code></td>
+<td>Whether to initialize the inference module (if <code>False</code>, it will be initialized when the relevant inference module is used for the first time).</td>
+<td><code>bool</code></td>
+<td><code>True</code></td>
+</tr>
+</tbody>
+</table>
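+
+For example, a hedged sketch that selects a device explicitly and defers predictor initialization (all argument values are illustrative, not recommendations):
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(
+    pipeline="PP-ChatOCRv4-doc",  # pipeline name or path to a pipeline configuration file
+    device="gpu:0",               # e.g. "cpu", "gpu:0", "npu:0"
+    use_hpip=False,               # enable high-performance inference only if the pipeline supports it
+    initial_predictor=False,      # initialize inference modules lazily, on first use
+)
+```
+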
+</details>
+
+<details><summary>(2) Call the <code>visual_predict()</code> Method of the PP-ChatOCRv4 Pipeline Object to Obtain Visual Prediction Results. This method returns a generator.</summary>
+
+The following are the parameters and descriptions of the `visual_predict()` method:
+
+<table>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Options</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>input</code></td>
+<td>The data to be predicted, supporting multiple input types, required.</td>
+<td><code>Python Var|str|list</code></td>
+<td>
+<ul>
+  <li><b>Python Var</b>: Such as <code>numpy.ndarray</code> representing image data.</li>
+  <li><b>str</b>: Such as the local path of an image file or PDF file: <code>/root/data/img.jpg</code>; <b>URL link</b>, such as the network URL of an image file or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/demo_paper.png">Example</a>; <b>Local directory</b>, which should contain images to be predicted, such as the local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories, PDF files need to be specified to the specific file path).</li>
+  <li><b>List</b>: List elements need to be of the above types, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>device</code></td>
+<td>The device for pipeline inference.</td>
+<td><code>str|None</code></td>
+<td>
+<ul>
+  <li><b>CPU</b>: Such as <code>cpu</code> to use CPU for inference;</li>
+  <li><b>GPU</b>: Such as <code>gpu:0</code> to use the first GPU for inference;</li>
+  <li><b>NPU</b>: Such as <code>npu:0</code> to use the first NPU for inference;</li>
+  <li><b>XPU</b>: Such as <code>xpu:0</code> to use the first XPU for inference;</li>
+  <li><b>MLU</b>: Such as <code>mlu:0</code> to use the first MLU for inference;</li>
+  <li><b>DCU</b>: Such as <code>dcu:0</code> to use the first DCU for inference;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline. During initialization, it will prioritize using the local GPU 0 device, and if not available, it will use the CPU device;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>use_doc_orientation_classify</code></td>
+<td>Whether to use the document orientation classification module.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+  <li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>use_doc_unwarping</code></td>
+<td>Whether to use the document distortion correction module.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+  <li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>use_textline_orientation</code></td>
+<td>Whether to use the text line orientation classification module.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+  <li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>use_general_ocr</code></td>
+<td>Whether to use the OCR sub-pipeline.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+  <li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>use_seal_recognition</code></td>
+<td>Whether to use the seal recognition sub-pipeline.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+  <li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>use_table_recognition</code></td>
+<td>Whether to use the table recognition sub-pipeline.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+  <li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>layout_threshold</code></td>
+<td>The score threshold for the layout model.</td>
+<td><code>float|dict|None</code></td>
+<td>
+<ul>
+  <li><b>float</b>: Any floating-point number between <code>0-1</code>;</li>
+  <li><b>dict</b>: <code>{0:0.1}</code> where the key is the category ID and the value is the threshold for that category;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>0.5</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>layout_nms</code></td>
+<td>Whether to use NMS.</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+  <li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>layout_unclip_ratio</code></td>
+<td>The expansion coefficient for layout detection.</td>
+<td><code>float|Tuple[float,float]|None</code></td>
+<td>
+<ul>
+  <li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
+  <li><b>Tuple[float,float]</b>: The expansion coefficients in the horizontal and vertical directions, respectively;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>1.0</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>layout_merge_bboxes_mode</code></td>
+<td>The method for filtering overlapping bounding boxes.</td>
+<td><code>str|None</code></td>
+<td>
+<ul>
+  <li><b>str</b>: large, small, union. Respectively representing retaining the larger box, smaller box, or both when overlapping boxes are filtered.</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>large</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_det_limit_side_len</code></td>
+<td>The side length limit for text detection images.</td>
+<td><code>int|None</code></td>
+<td>
+<ul>
+  <li><b>int</b>: Any integer greater than <code>0</code>;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>960</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_det_limit_type</code></td>
+<td>The type of side length limit for text detection images.</td>
+<td><code>str|None</code></td>
+<td>
+<ul>
+  <li><b>str</b>: Supports <code>min</code> and <code>max</code>, where <code>min</code> ensures that the shortest side of the image is not less than <code>det_limit_side_len</code>, and <code>max</code> ensures that the longest side of the image is not greater than <code>limit_side_len</code>.</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>max</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_det_thresh</code></td>
+<td>The pixel threshold for detection. In the output probability map, pixel points with scores greater than this threshold will be considered as text pixels.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+    <li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+    <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>0.3</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_det_box_thresh</code></td>
+<td>The bounding box threshold for detection. When the average score of all pixel points within the detection result bounding box is greater than this threshold, the result will be considered as a text region.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+    <li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+    <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>0.6</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_det_unclip_ratio</code></td>
+<td>The expansion coefficient for text detection. This method is used to expand the text region, and the larger the value, the larger the expansion area.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+    <li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+    <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>2.0</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_rec_score_thresh</code></td>
+<td>The text recognition threshold. Text results with scores greater than this threshold will be retained.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+    <li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+    <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>0.0</code>. I.e., no threshold is set.</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>seal_det_limit_side_len</code></td>
+<td>The side length limit for seal detection images.</td>
+<td><code>int|None</code></td>
+<td>
+<ul>
+  <li><b>int</b>: Any integer greater than <code>0</code>;</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>960</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>seal_det_limit_type</code></td>
+<td>The type of side length limit for seal detection images.</td>
+<td><code>str|None</code></td>
+<td>
+<ul>
+  <li><b>str</b>: Supports <code>min</code> and <code>max</code>, where <code>min</code> ensures that the shortest side of the image is not less than <code>det_limit_side_len</code>, and <code>max</code> ensures that the longest side of the image is not greater than <code>limit_side_len</code>.</li>
+  <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>max</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>seal_det_thresh</code></td>
+<td>The pixel threshold for detection. In the output probability map, pixel points with scores greater than this threshold will be considered as seal pixels.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+    <li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+    <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>0.3</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>seal_det_box_thresh</code></td>
+<td>The bounding box threshold for detection. When the average score of all pixel points within the detection result bounding box is greater than this threshold, the result will be considered as a seal region.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+    <li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+    <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>0.6</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>seal_det_unclip_ratio</code></td>
+<td>The expansion coefficient for seal detection. This method is used to expand the seal region, and the larger the value, the larger the expansion area.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+    <li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+    <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>2.0</code>.</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>seal_rec_score_thresh</code></td>
+<td>The seal recognition threshold. Text results with scores greater than this threshold will be retained.</td>
+<td><code>float|None</code></td>
+<td>
+<ul>
+    <li><b>float</b>: Any floating-point number greater than <code>0</code>.</li>
+    <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>0.0</code>. I.e., no threshold is set.</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+</tbody>
+</table>
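+
+As a reference, below is a minimal sketch of a `visual_predict()` call that overrides a few of the options above; it continues the quick-start example (the `pipeline` object and input file come from there), and all values are illustrative, with unspecified parameters keeping their pipeline defaults:
+
+```python
+visual_predict_res = pipeline.visual_predict(
+    input="vehicle_certificate-1.png",
+    use_doc_orientation_classify=False,
+    use_doc_unwarping=False,
+    use_seal_recognition=True,
+    use_table_recognition=True,
+    layout_threshold=0.5,          # score threshold of the layout model
+    text_det_limit_side_len=960,   # side length limit for text detection
+    text_rec_score_thresh=0.0,     # keep all recognized text lines
+)
+for res in visual_predict_res:     # the method returns a generator
+    visual_info = res["visual_info"]
+    layout_parsing_result = res["layout_parsing_result"]
+```
+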
+</details>
+
+<details><summary>(3) Process the Visual Prediction Results.</summary>
+
+The prediction result for each sample is of `dict` type, containing two fields: `visual_info` and `layout_parsing_result`. You can obtain visual information through `visual_info` (including `normal_text_dict`, `table_text_list`, `table_html_list`, etc.), and place the information for each sample into the `visual_info_list` list, which will be fed into the large language model later.
+
+Of course, you can also obtain the layout parsing results through `layout_parsing_result`, which includes tables, text, images, and other content contained in the document or image. It supports operations such as printing, saving as an image, and saving as a `json` file:
+
+```python
+......
+for res in visual_predict_res:
+    visual_info_list.append(res["visual_info"])
+    layout_parsing_result = res["layout_parsing_result"]
+    layout_parsing_result.print()
+    layout_parsing_result.save_to_img("./output")
+    layout_parsing_result.save_to_json("./output")
+    layout_parsing_result.save_to_xlsx("./output")
+    layout_parsing_result.save_to_html("./output")
+......
+```
+
+<table>
+<thead>
+<tr>
+<th>Method</th>
+<th>Method Description</th>
+<th>Parameter</th>
+<th>Parameter Type</th>
+<th>Parameter Description</th>
+<th>Default Value</th>
+</tr>
+</thead>
+
+<tr>
+<td rowspan = "3"><code>print()</code></td>
+<td rowspan = "3">Prints the result to the terminal</td>
+<td><code>format_json</code></td>
+<td><code>bool</code></td>
+<td>Whether to format the output content with JSON indentation</td>
+<td><code>True</code></td>
+</tr>
+<tr>
+<td><code>indent</code></td>
+<td><code>int</code></td>
+<td>Specifies the indentation level to beautify the output JSON data for better readability, only valid when <code>format_json</code> is <code>True</code></td>
+<td>4</td>
+</tr>
+<tr>
+<td><code>ensure_ascii</code></td>
+<td><code>bool</code></td>
+<td>Controls whether to escape non-ASCII characters to Unicode. When set to <code>True</code>, all non-ASCII characters will be escaped; <code>False</code> retains the original characters, only valid when <code>format_json</code> is <code>True</code></td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td rowspan = "3"><code>save_to_json()</code></td>
+<td rowspan = "3">Saves the result as a json file</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>The path to save the file. When it is a directory, the saved file name is consistent with the input file type</td>
+<td>N/A</td>
+</tr>
+<tr>
+<td><code>indent</code></td>
+<td><code>int</code></td>
+<td>Specifies the indentation level to beautify the output JSON data for better readability, only valid when <code>format_json</code> is <code>True</code></td>
+<td>4</td>
+</tr>
+<tr>
+<td><code>ensure_ascii</code></td>
+<td><code>bool</code></td>
+<td>Controls whether to escape non-ASCII characters to Unicode. When set to <code>True</code>, all non-ASCII characters will be escaped; <code>False</code> retains the original characters, only valid when <code>format_json</code> is <code>True</code></td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td><code>save_to_img()</code></td>
+<td>Saves the visual images of each intermediate module in png format</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>The path to save the file, supports directory or file path</td>
+<td>N/A</td>
+</tr>
+<tr>
+<td><code>save_to_html()</code></td>
+<td>Saves the tables in the file as html files</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>The path to save the file, supports directory or file path</td>
+<td>N/A</td>
+</tr>
+<tr>
+<td><code>save_to_xlsx()</code></td>
+<td>Saves the tables in the file as xlsx files</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>The path to save the file, supports directory or file path</td>
+<td>N/A</td>
+</tr>
+</table>
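+
+A short sketch of these methods called with explicit arguments (the output directory is only a placeholder):
+
+```python
+layout_parsing_result.print(format_json=True, indent=2, ensure_ascii=False)
+layout_parsing_result.save_to_json(save_path="./output")   # a directory path; the file name follows the input file
+layout_parsing_result.save_to_img(save_path="./output")    # one visualization image per intermediate module
+layout_parsing_result.save_to_html(save_path="./output")   # tables in the file saved as HTML
+layout_parsing_result.save_to_xlsx(save_path="./output")   # tables in the file saved as XLSX
+```
+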
+
+- Calling the `print()` method will print the results to the terminal. The content printed to the terminal is explained as follows:
+    - `input_path`: `(str)` The input path of the image to be predicted
+
+    - `page_index`: `(Union[int, None])` If the input is a PDF file, it indicates the current page number of the PDF; otherwise, it is `None`
+
+    - `model_settings`: `(Dict[str, bool])` Model parameters required for the pipeline
+
+        - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing pipeline
+        - `use_general_ocr`: `(bool)` Controls whether to enable the OCR pipeline
+        - `use_seal_recognition`: `(bool)` Controls whether to enable the seal recognition pipeline
+        - `use_table_recognition`: `(bool)` Controls whether to enable the table recognition pipeline
+        - `use_formula_recognition`: `(bool)` Controls whether to enable the formula recognition pipeline
+
+    - `parsing_res_list`: `(List[Dict])` A list of parsing results, each element is a dictionary, and the list order is the reading order after parsing.
+        - `block_bbox`: `(np.ndarray)` The bounding box of the layout area.
+        - `block_label`: `(str)` The label of the layout area, such as `text`, `table`, etc.
+        - `block_content`: `(str)` The content within the layout area.
+
+    - `overall_ocr_res`: `(Dict[str, Union[List[str], List[float], numpy.ndarray]])` A dictionary of global OCR results
+      - `input_path`: `(Union[str, None])` The image path accepted by the OCR pipeline, when the input is `numpy.ndarray`, it is saved as `None`
+      - `model_settings`: `(Dict)` Model configuration parameters for the OCR pipeline
+      - `dt_polys`: `(List[numpy.ndarray])` A list of polygon boxes for text detection. Each detection box is represented by a numpy array of 4 vertex coordinates, with a shape of (4, 2) and a data type of int16
+      - `dt_scores`: `(List[float])` A list of confidence scores for text detection boxes
+      - `text_det_params`: `(Dict[str, Dict[str, int, float]])` Configuration parameters for the text detection module
+        - `limit_side_len`: `(int)` The side length limit for image preprocessing
+        - `limit_type`: `(str)` The processing method for the side length limit
+        - `thresh`: `(float)` The confidence threshold for text pixel classification
+        - `box_thresh`: `(float)` The confidence threshold for text detection boxes
+        - `unclip_ratio`: `(float)` The inflation coefficient for text detection boxes
+        - `text_type`: `(str)` The type of text detection, currently fixed as "general"
+
+      - `text_type`: `(str)` The type of text detection, currently fixed as "general"
+      - `textline_orientation_angles`: `(List[int])` The prediction results of text line orientation classification. When enabled, it returns actual angle values (e.g., [0,0,1])
+      - `text_rec_score_thresh`: `(float)` The filtering threshold for text recognition results
+      - `rec_texts`: `(List[str])` A list of text recognition results, only including texts with confidence exceeding `text_rec_score_thresh`
+- Calling the `save_to_json()` method will save the aforementioned content to the specified `save_path`. If a directory is specified, the save path will be `save_path/{your_img_basename}.json`. If a file is specified, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, `numpy.array` types will be converted to list form.
+- Calling the `save_to_img()` method will save the visualization results to the specified `save_path`. If a directory is specified, the save path will be `save_path/{your_img_basename}_ocr_res_img.{your_img_extension}`. If a file is specified, it will be saved directly to that file. (Production pipelines often involve numerous result images, so it is not recommended to specify a specific file path directly, as multiple images will be overwritten, leaving only the last one.)
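+
+As a quick, hedged illustration of how these methods fit together, the snippet below iterates over the result objects produced by the visual prediction step and persists them; the variable name `visual_predict_res`, the `"layout_parsing_result"` key, and the `./output/` directory are assumptions used only for this sketch:
+
+```python
+# Hedged sketch: `visual_predict_res` is assumed to be the iterable returned by the
+# pipeline's visual prediction step, each element holding a structured result under
+# the assumed key "layout_parsing_result".
+for res in visual_predict_res:
+    visual_res = res["layout_parsing_result"]
+    visual_res.print(format_json=True, indent=4)    # formatted output to the terminal
+    visual_res.save_to_json(save_path="./output/")  # structured results as JSON
+    visual_res.save_to_img(save_path="./output/")   # visualization images in PNG format
+```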
+
+In addition, it is also supported to obtain visualized images with results and prediction results through attributes, as detailed below:
+
+<table>
+<thead>
+<tr>
+<th>Attribute</th>
+<th>Attribute Description</th>
+</tr>
+</thead>
+<tr>
+<td rowspan="1"><code>json</code></td>
+<td rowspan="1">Obtain prediction results in <code>json</code> format</td>
+</tr>
+<tr>
+<td rowspan="2"><code>img</code></td>
+<td rowspan="2">Obtain visualized images in <code>dict</code> format</td>
+</tr>
+</table>
+
+- The prediction result obtained by the `json` attribute is data of type `dict`, with content consistent with that saved by calling the `save_to_json()` method.
+- The prediction result returned by the `img` attribute is data of type `dict`. The keys are `layout_det_res`, `overall_ocr_res`, `text_paragraphs_ocr_res`, `formula_res_region1`, `table_cell_img`, and `seal_res_region1`, and the corresponding values are `Image.Image` objects used to display the visualized results of layout detection, OCR, OCR text paragraphs, formulas, tables, and seals, respectively. If the optional modules are not used, the dictionary only contains `layout_det_res`.
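+
+A minimal, hedged sketch of reading these attributes (the variable name `layout_parsing_res` is an assumption for this example):
+
+```python
+# `layout_parsing_res` is assumed to be one visual prediction result object.
+data = layout_parsing_res.json   # dict with the same content as save_to_json()
+imgs = layout_parsing_res.img    # dict of PIL Image.Image objects keyed by module
+imgs["layout_det_res"].save("layout_det_res.png")  # persist, e.g., the layout detection visualization
+```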
+</details>
+
+<details><summary>(4) Call the <code>build_vector()</code> method of the PP-ChatOCRv4 pipeline object to construct vectors for text content.</summary>
+
+Below are the parameters and their descriptions for the `build_vector()` method:
+
+<table>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Options</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tr>
+<td><code>visual_info</code></td>
+<td>Visual information, which can be a dictionary containing visual information or a list composed of such dictionaries</td>
+<td><code>list|dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>min_characters</code></td>
+<td>Minimum number of characters</td>
+<td><code>int</code></td>
+<td>
+A positive integer greater than 0, determined based on the token length supported by the large language model
+</td>
+<td><code>3500</code></td>
+</tr>
+<tr>
+<td><code>block_size</code></td>
+<td>Chunk size for establishing a vector library for long text</td>
+<td><code>int</code></td>
+<td>
+A positive integer greater than 0, determined based on the token length supported by the large language model
+</td>
+<td><code>300</code></td>
+</tr>
+<tr>
+<td><code>flag_save_bytes_vector</code></td>
+<td>Whether to save text as a binary file</td>
+<td><code>bool</code></td>
+<td>
+<code>True|False</code>
+</td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td><code>retriever_config</code></td>
+<td>Configuration parameters for the vector retrieval large model, referring to the "LLM_Retriever" field in the configuration file</td>
+<td><code>dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
+</tr>
+</table>
+
+This method returns a dictionary containing visual text information, with the following content:
+
+- `flag_save_bytes_vector`: `(bool)` Whether the result is saved as a binary file
+- `flag_too_short_text`: `(bool)` Whether the text length is less than the minimum number of characters
+- `vector`: `(str|list)` The binary or plain-text content of the text, depending on the values of `flag_save_bytes_vector` and `min_characters`. If `flag_save_bytes_vector=True` and the text length is greater than or equal to the minimum number of characters, binary content is returned; otherwise, the original text is returned.
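+
+A minimal, hedged sketch of a call (the variables `pipeline` and `visual_info_list` are assumed to come from the pipeline creation and visual prediction steps):
+
+```python
+# Hedged sketch of building the text vector store from collected visual information.
+vector_info = pipeline.build_vector(
+    visual_info_list,
+    min_characters=3500,  # skip the vector store for text shorter than this
+    block_size=300,       # chunk size used when splitting long text
+)
+print(vector_info["flag_too_short_text"])  # True if the text was below min_characters
+```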
+</details>
+
+<details><summary>(5) Call the <code>mllm_pred()</code> method of the PP-ChatOCRv4 pipeline object to obtain multimodal large model extraction results.</summary>
+
+Below are the parameters and their descriptions for the `mllm_pred()` method:
+
+<table>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Options</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>input</code></td>
+<td>Data to be predicted, supporting multiple input types, required</td>
+<td><code>Python Var|str</code></td>
+<td>
+<ul>
+  <li><b>Python Var</b>: Such as <code>numpy.ndarray</code> representing image data</li>
+  <li><b>str</b>: Local path of an image file or a single-page PDF file, e.g., <code>/root/data/img.jpg</code>; <b>or URL link</b>, such as the network URL of an image file or a single-page PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/demo_paper.png">Example</a>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>key_list</code></td>
+<td>A single key or a list of keys used to extract information</td>
+<td><code>Union[str, List[str]]</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>mllm_chat_bot_config</code></td>
+<td>Configuration parameters for the multimodal large model, referring to the "MLLM_Chat" field in the configuration file</td>
+<td><code>dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
+</tr>
+</tbody>
+</table>
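+
+A minimal, hedged sketch of a call (the file path and key below are placeholders, and `pipeline` is assumed to be the created pipeline object):
+
+```python
+# Hedged sketch: run the multimodal large model on a single image for one key.
+mllm_predict_res = pipeline.mllm_pred(
+    input="vehicle_certificate-1.png",         # local image / single-page PDF path or URL (placeholder)
+    key_list=["Approved passenger capacity"],  # key(s) to extract (placeholder)
+)
+# The returned result (or the relevant field of it) is later supplied to the chat()
+# method through its `mllm_predict_info` parameter.
+```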
+
+</details>
+
+<details><summary>(6) Call the <code>chat()</code> method of the PP-ChatOCRv4 pipeline object to extract key information.</summary>
+
+Below are the parameters and their descriptions for the `chat()` method:
+
+<table>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Options</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>key_list</code></td>
+<td>A single key or a list of keys used to extract information</td>
+<td><code>Union[str, List[str]]</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>visual_info</code></td>
+<td>Visual information results</td>
+<td><code>List[dict]</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>use_vector_retrieval</code></td>
+<td>Whether to use vector retrieval</td>
+<td><code>bool</code></td>
+<td><code>True|False</code></td>
+<td><code>True</code></td>
+</tr>
+<tr>
+<td><code>vector_info</code></td>
+<td>Vector information for retrieval</td>
+<td><code>dict</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>min_characters</code></td>
+<td>Minimum number of characters required</td>
+<td><code>int</code></td>
+<td>A positive integer greater than 0</td>
+<td><code>3500</code></td>
+</tr>
+<tr>
+<td><code>text_task_description</code></td>
+<td>Description of the text task</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_output_format</code></td>
+<td>Output format of the text result</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_rules_str</code></td>
+<td>Rules for generating text results</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_few_shot_demo_text_content</code></td>
+<td>Text content for few-shot demonstration</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>text_few_shot_demo_key_value_list</code></td>
+<td>Key-value list for few-shot demonstration</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>table_task_description</code></td>
+<td>Description of the table task</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>table_output_format</code></td>
+<td>Output format of the table result</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>table_rules_str</code></td>
+<td>Rules for generating table results</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>table_few_shot_demo_text_content</code></td>
+<td>Text content for table few-shot demonstration</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>table_few_shot_demo_key_value_list</code></td>
+<td>Key-value list for table few-shot demonstration</td>
+<td><code>str</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>mllm_predict_info</code></td>
+<td>Results from the multimodal large language model</td>
+<td><code>dict</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>mllm_integration_strategy</code></td>
+<td>Integration strategy for multimodal large language model and large language model data, supporting the use of either alone or the fusion of both results</td>
+<td><code>str</code></td>
+<td><code>"integration"</code></td>
+<td><code>"integration", "llm_only", and "mllm_only"</code></td>
+</tr>
+<tr>
+<td><code>chat_bot_config</code></td>
+<td>Configuration information for the large language model, with content referring to the "LLM_Chat" field in the pipeline configuration file</td>
+<td><code>dict</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>retriever_config</code></td>
+<td>Configuration parameters for the vector retrieval large model, with content referring to the "LLM_Retriever" field in the configuration file</td>
+<td><code>dict</code></td>
+<td><code>None</code></td>
+<td><code>None</code></td>
+</tr>
+</tbody>
+</table>
+
+This method will print the results to the terminal. The content printed to the terminal is explained as follows:
+  - `chat_res`: `(dict)` The result of information extraction, which is a dictionary containing the keys to be extracted and their corresponding values.
+
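+A minimal, hedged sketch of a call (the key is a placeholder; `visual_info_list`, `vector_info`, and `mllm_predict_info` are assumed to come from the previous steps):
+
+```python
+# Hedged sketch: combine visual information, vector retrieval, and the multimodal
+# result to extract the requested keys with the large language model.
+chat_result = pipeline.chat(
+    key_list=["Approved passenger capacity"],  # placeholder key
+    visual_info=visual_info_list,
+    use_vector_retrieval=True,
+    vector_info=vector_info,
+    mllm_predict_info=mllm_predict_info,
+    mllm_integration_strategy="integration",   # or "llm_only" / "mllm_only"
+)
+print(chat_result)  # prints a dictionary containing `chat_res` with the extracted key-value pairs
+```
+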
+</details>
+
+## 3. Development Integration/Deployment
+If the pipeline meets your requirements for inference speed and accuracy in production, you can proceed directly with development integration/deployment.
+
+If you need to apply the pipeline directly in your Python project, you can refer to the sample code in [2.2 Local Experience](#22-local-experience).
+
+Additionally, PaddleX provides three other deployment methods, detailed as follows:
+
+🚀 **High-Performance Inference**: In actual production environments, many applications have stringent standards for the performance metrics of deployment strategies (especially response speed) to ensure efficient system operation and smooth user experience. To this end, PaddleX provides a high-performance inference plugin aimed at deeply optimizing model inference and pre/post-processing to significantly speed up the end-to-end process. For detailed instructions on high-performance inference, please refer to the [PaddleX High-Performance Inference Guide](../../../pipeline_deploy/high_performance_inference.md).
+
+☁️ **Service-Oriented Deployment**: Service-oriented deployment is a common deployment form in actual production environments. By encapsulating the inference functionality as a service, clients can access these services through network requests to obtain inference results. PaddleX supports multiple service-oriented deployment solutions for pipelines. For detailed instructions on service-oriented deployment, please refer to the [PaddleX Service-Oriented Deployment Guide](../../../pipeline_deploy/serving.md).
+
+Below are the API references for basic service-oriented deployment and multi-language service invocation examples:
+
+<details><summary>API Reference</summary>
+
+<p>For the main operations provided by the service:</p>
+<ul>
+<li>The HTTP request method is POST.</li>
+<li>Both the request body and response body are JSON data (JSON objects).</li>
+<li>When the request is successfully processed, the response status code is <code>200</code>, and the response body has the following attributes:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>logId</code></td>
+<td><code>string</code></td>
+<td>UUID of the request.</td>
+</tr>
+<tr>
+<td><code>errorCode</code></td>
+<td><code>integer</code></td>
+<td>Error code. Fixed at <code>0</code>.</td>
+</tr>
+<tr>
+<td><code>errorMsg</code></td>
+<td><code>string</code></td>
+<td>Error description. Fixed at <code>"Success"</code>.</td>
+</tr>
+<tr>
+<td><code>result</code></td>
+<td><code>object</code></td>
+<td>Operation result.</td>
+</tr>
+</tbody>
+</table>
+<ul>
+<li>When the request is not successfully processed, the response body has the following attributes:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>logId</code></td>
+<td><code>string</code></td>
+<td>UUID of the request.</td>
+</tr>
+<tr>
+<td><code>errorCode</code></td>
+<td><code>integer</code></td>
+<td>Error code. Same as the response status code.</td>
+</tr>
+<tr>
+<td><code>errorMsg</code></td>
+<td><code>string</code></td>
+<td>Error description.</td>
+</tr>
+</tbody>
+</table>
+<p>The main operations provided by the service are as follows:</p>
+<ul>
+<li><b><code>analyzeImages</code></b></li>
+</ul>
+<p>Analyzes images using computer vision models to obtain OCR, table recognition, and other results, and extracts the key information in the images.</p>
+<p><code>POST /chatocr-visual</code></p>
+<ul>
+<li>Attributes of the request body:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+<th>Required</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>file</code></td>
+<td><code>string</code></td>
+<td>URL of an image file or PDF file accessible to the server, or Base64 encoded result of the content of the above file types. For PDF files exceeding 10 pages, only the content of the first 10 pages will be used.</td>
+<td>Yes</td>
+</tr>
+<tr>
+<td><code>fileType</code></td>
+<td><code>integer</code></td>
+<td>File type. <code>0</code> represents a PDF file, <code>1</code> represents an image file. If this attribute is not present in the request body, the file type will be inferred based on the URL.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>useImgOrientationCls</code></td>
+<td><code>boolean</code></td>
+<td>Whether to enable document image orientation classification. This feature is enabled by default.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>useImgUnwarping</code></td>
+<td><code>boolean</code></td>
+<td>Whether to enable text image correction. This feature is enabled by default.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>useSealTextDet</code></td>
+<td><code>boolean</code></td>
+<td>Whether to enable seal text detection. This feature is enabled by default.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>inferenceParams</code></td>
+<td><code>object</code></td>
+<td>Inference parameters.</td>
+<td>No</td>
+</tr>
+</tbody>
+</table>
+<p>Attributes of <code>inferenceParams</code>:</p>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+<th>Required</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>maxLongSide</code></td>
+<td><code>integer</code></td>
+<td>During inference, if the length of the longer side of the input image to the text detection model is greater than <code>maxLongSide</code>, the image will be scaled so that the length of its longer side equals <code>maxLongSide</code>.</td>
+<td>No</td>
+</tr>
+</tbody>
+</table>
+<ul>
+<li>When the request is successfully processed, the <code>result</code> of the response body has the following attributes:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>visualResults</code></td>
+<td><code>array</code></td>
+<td>Analysis results obtained using computer vision models. The array length is 1 (for image input) or the smaller value between the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file in sequence.</td>
+</tr>
+<tr>
+<td><code>visualInfo</code></td>
+<td><code>object</code></td>
+<td>Key information in the image, which can be used as input for other operations.</td>
+</tr>
+<tr>
+<td><code>dataInfo</code></td>
+<td><code>object</code></td>
+<td>Input data information.</td>
+</tr>
+</tbody>
+</table>
+<p>Each element in <code>visualResults</code> is an <code>object</code> with the following attributes:</p>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>texts</code></td>
+<td><code>array</code></td>
+<td>Text positions, contents, and scores.</td>
+</tr>
+<tr>
+<td><code>tables</code></td>
+<td><code>array</code></td>
+<td>Table positions and contents.</td>
+</tr>
+<tr>
+<td><code>inputImage</code></td>
+<td><code>string</code></td>
+<td>Input image. The image is in JPEG format and encoded using Base64.</td>
+</tr>
+<tr>
+<td><code>layoutImage</code></td>
+<td><code>string</code></td>
+<td>Detection result image of the layout area. The image is in JPEG format and encoded using Base64.</td>
+</tr>
+<tr>
+<td><code>ocrImage</code></td>
+<td><code>string</code></td>
+<td>OCR result image. The image is in JPEG format and encoded using Base64.</td>
+</tr>
+</tbody>
+</table>
+<p>Each element in <code>texts</code> is an <code>object</code> with the following attributes:</p>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>poly</code></td>
+<td><code>array</code></td>
+<td>Text position. The elements in the array are the vertex coordinates of the polygon enclosing the text in sequence.</td>
+</tr>
+<tr>
+<td><code>text</code></td>
+<td><code>string</code></td>
+<td>Text content.</td>
+</tr>
+<tr>
+<td><code>score</code></td>
+<td><code>number</code></td>
+<td>Text recognition score.</td>
+</tr>
+</tbody>
+</table>
+<p>Each element in <code>tables</code> is an <code>object</code> with the following attributes:</p>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>bbox</code></td>
+<td><code>array</code></td>
+<td>Table position. The elements in the array are the x-coordinate of the top-left corner, the y-coordinate of the top-left corner, the x-coordinate of the bottom-right corner, and the y-coordinate of the bottom-right corner of the bounding box in sequence.</td>
+</tr>
+<tr>
+<td><code>html</code></td>
+<td><code>string</code></td>
+<td>Table recognition result in HTML format.</td>
+</tr>
+</tbody>
+</table>
+<ul>
+<li><b><code>buildVectorStore</code></b></li>
+</ul>
+<p>Builds a vector database.</p>
+<p><code>POST /chatocr-vector</code></p>
+<ul>
+<li>Attributes of the request body:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+<th>Required</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>visualInfo</code></td>
+<td><code>object</code></td>
+<td>Key information in the image. Provided by the <code>analyzeImages</code> operation.</td>
+<td>Yes</td>
+</tr>
+<tr>
+<td><code>minChars</code></td>
+<td><code>integer</code></td>
+<td>Minimum data length to enable the vector database.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>llmRequestInterval</code></td>
+<td><code>number</code></td>
+<td>Interval time for calling the large language model API.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>llmName</code></td>
+<td><code>string</code></td>
+<td>Name of the large language model.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>llmParams</code></td>
+<td><code>object</code></td>
+<td>Parameters for the large language model API.</td>
+<td>No</td>
+</tr>
+</tbody>
+</table>
+<p>Currently, <code>llmParams</code> can take one of the following forms:</p>
+<pre><code class="language-json">{
+&quot;apiType&quot;: &quot;qianfan&quot;,
+&quot;apiKey&quot;: &quot;{API key of Qianfan Platform}&quot;,
+&quot;secretKey&quot;: &quot;{Secret key of Qianfan Platform}&quot;
+}
+</code></pre>
+<pre><code class="language-json">{
+&quot;apiType&quot;: &quot;aistudio&quot;,
+&quot;accessToken&quot;: &quot;{Access token of Baidu AIStudio Community}&quot;
+}
+</code></pre>
+<ul>
+<li>When the request is successfully processed, the <code>result</code> of the response body has the following attribute:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>vectorStore</code></td>
+<td><code>string</code></td>
+<td>Serialized result of the vector database, which can be used as input for other operations.</td>
+</tr>
+</tbody>
+</table>
+<ul>
+<li><b><code>retrieveKnowledge</code></b></li>
+</ul>
+<p>Performs knowledge retrieval.</p>
+<p><code>POST /chatocr-retrieval</code></p>
+<ul>
+<li>Attributes of the request body:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+<th>Required</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>keys</code></td>
+<td><code>array</code></td>
+<td>List of keywords.</td>
+<td>Yes</td>
+</tr>
+<tr>
+<td><code>vectorStore</code></td>
+<td><code>string</code></td>
+<td>Serialized result of the vector database. Provided by the <code>buildVectorStore</code> operation.</td>
+<td>Yes</td>
+</tr>
+<tr>
+<td><code>llmName</code></td>
+<td><code>string</code></td>
+<td>Name of the large language model.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>llmParams</code></td>
+<td><code>object</code></td>
+<td>Parameters for the large language model API.</td>
+<td>No</td>
+</tr>
+</tbody>
+</table>
+<p>Currently, <code>llmParams</code> can take one of the following forms:</p>
+<pre><code class="language-json">{
+&quot;apiType&quot;: &quot;qianfan&quot;,
+&quot;apiKey&quot;: &quot;{API key of Qianfan Platform}&quot;,
+&quot;secretKey&quot;: &quot;{Secret key of Qianfan Platform}&quot;
+}
+</code></pre>
+<pre><code class="language-json">{
+&quot;apiType&quot;: &quot;aistudio&quot;,
+&quot;accessToken&quot;: &quot;{Access token of Baidu AIStudio Community}&quot;
+}
+</code></pre>
+<ul>
+<li>When the request is successfully processed, the <code>result</code> of the response body has the following attribute:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>retrievalResult</code></td>
+<td><code>string</code></td>
+<td>Knowledge retrieval result, which can be used as input for other operations.</td>
+</tr>
+</tbody>
+</table>
+<ul>
+<li><b><code>chat</code></b></li>
+</ul>
+<p>Interacts with the large language model to extract key information using it.</p>
+<p><code>POST /chatocr-chat</code></p>
+<ul>
+<li>Attributes of the request body:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+<th>Required</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>keys</code></td>
+<td><code>array</code></td>
+<td>List of keywords.</td>
+<td>Yes</td>
+</tr>
+<tr>
+<td><code>visualInfo</code></td>
+<td><code>object</code></td>
+<td>Key information in the image. Provided by the <code>analyzeImages</code> operation.</td>
+<td>Yes</td>
+</tr>
+<tr>
+<td><code>vectorStore</code></td>
+<td><code>string</code></td>
+<td>Serialized result of the vector database. Provided by the <code>buildVectorStore</code> operation.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>retrievalResult</code></td>
+<td><code>string</code></td>
+<td>Knowledge retrieval result. Provided by the <code>retrieveKnowledge</code> operation.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>taskDescription</code></td>
+<td><code>string</code></td>
+<td>Task description for prompts.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>rules</code></td>
+<td><code>string</code></td>
+<td>Prompt rules. Used to customize information extraction rules, such as specifying the output format.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>fewShot</code></td>
+<td><code>string</code></td>
+<td>Prompt examples.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>llmName</code></td>
+<td><code>string</code></td>
+<td>Name of the large language model.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>llmParams</code></td>
+<td><code>object</code></td>
+<td>Parameters for the large language model API.</td>
+<td>No</td>
+</tr>
+<tr>
+<td><code>returnPrompts</code></td>
+<td><code>boolean</code></td>
+<td>Whether to return the used prompts. Disabled by default.</td>
+<td>No</td>
+</tr>
+</tbody>
+</table>
+<p>Currently, <code>llmParams</code> can take one of the following forms:</p>
+<pre><code class="language-json">{
+&quot;apiType&quot;: &quot;qianfan&quot;,
+&quot;apiKey&quot;: &quot;{API key of Qianfan Platform}&quot;,
+&quot;secretKey&quot;: &quot;{Secret key of Qianfan Platform}&quot;
+}
+</code></pre>
+<pre><code class="language-json">{
+&quot;apiType&quot;: &quot;aistudio&quot;,
+&quot;accessToken&quot;: &quot;{Access token of Baidu AIStudio Community}&quot;
+}
+</code></pre>
+<ul>
+<li>When the request is successfully processed, the <code>result</code> of the response body has the following attributes:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>chatResult</code></td>
+<td><code>object</code></td>
+<td>Key information extraction result.</td>
+</tr>
+<tr>
+<td><code>prompts</code></td>
+<td><code>object</code></td>
+<td>Used prompts.</td>
+</tr>
+</tbody>
+</table>
+<p>Attributes of <code>prompts</code>:</p>
+<table>
+<thead>
+<tr>
+<th>Name</th>
+<th>Type</th>
+<th>Meaning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>ocr</code></td>
+<td><code>array</code></td>
+<td>OCR prompts.</td>
+</tr>
+<tr>
+<td><code>table</code></td>
+<td><code>array</code></td>
+<td>Table prompts.</td>
+</tr>
+<tr>
+<td><code>html</code></td>
+<td><code>array</code></td>
+<td>HTML prompts.</td>
+</tr>
+</tbody>
+</table></details>
+
+<details><summary>Multi-language Service Invocation Examples</summary>
+
+<details>
+<summary>Python</summary>
+
+<pre><code class="language-python">import base64
+import pprint
+import sys
+
+import requests
+
+
+API_BASE_URL = "http://0.0.0.0:8080"
+API_KEY = "{API key of Qianfan Platform}"
+SECRET_KEY = "{Secret key of Qianfan Platform}"
+LLM_NAME = "ernie-3.5"
+LLM_PARAMS = {
+    "apiType": "qianfan",
+    "apiKey": API_KEY,
+    "secretKey": SECRET_KEY,
+}
+
+file_path = "./demo.jpg"
+keys = ["电话"]
+
+with open(file_path, "rb") as file:
+    file_bytes = file.read()
+    file_data = base64.b64encode(file_bytes).decode("ascii")
+
+payload = {
+    "file": file_data,
+    "fileType": 1,
+    "useImgOrientationCls": True,
+    "useImgUnwarping": True,
+    "useSealTextDet": True,
+}
+resp_visual = requests.post(url=f"{API_BASE_URL}/chatocr-visual", json=payload)
+if resp_visual.status_code != 200:
+    print(
+        f"Request to chatocr-visual failed with status code {resp_visual.status_code}.",
+        file=sys.stderr,
+    )
+    pprint.pp(resp_visual.json())
+    sys.exit(1)
+result_visual = resp_visual.json()["result"]
+
+for i, res in enumerate(result_visual["visualResults"]):
+    print("Texts:")
+    pprint.pp(res["texts"])
+    print("Tables:")
+    pprint.pp(res["tables"])
+    layout_img_path = f"layout_{i}.jpg"
+    with open(layout_img_path, "wb") as f:
+        f.write(base64.b64decode(res["layoutImage"]))
+    ocr_img_path = f"ocr_{i}.jpg"
+    with open(ocr_img_path, "wb") as f:
+        f.write(base64.b64decode(res["ocrImage"]))
+    print(f"Output images saved at {layout_img_path} and {ocr_img_path}")
+
+payload = {
+    "visualInfo": result_visual["visualInfo"],
+    "minChars": 200,
+    "llmRequestInterval": 1000,
+    "llmName": LLM_NAME,
+    "llmParams": LLM_PARAMS,
+}
+resp_vector = requests.post(url=f"{API_BASE_URL}/chatocr-vector", json=payload)
+if resp_vector.status_code != 200:
+    print(
+        f"Request to chatocr-vector failed with status code {resp_vector.status_code}.",
+        file=sys.stderr,
+    )
+    pprint.pp(resp_vector.json())
+    sys.exit(1)
+result_vector = resp_vector.json()["result"]
+
+payload = {
+    "keys": keys,
+    "vectorStore": result_vector["vectorStore"],
+    "llmName": LLM_NAME,
+    "llmParams": LLM_PARAMS,
+}
+resp_retrieval = requests.post(url=f"{API_BASE_URL}/chatocr-retrieval", json=payload)
+if resp_retrieval.status_code != 200:
+    print(
+        f"Request to chatocr-retrieval failed with status code {resp_retrieval.status_code}.",
+        file=sys.stderr,
+    )
+    pprint.pp(resp_retrieval.json())
+    sys.exit(1)
+result_retrieval = resp_retrieval.json()["result"]
+
+payload = {
+    "keys": keys,
+    "visualInfo": result_visual["visualInfo"],
+    "vectorStore": result_vector["vectorStore"],
+    "retrievalResult": result_retrieval["retrievalResult"],
+    "taskDescription": "",
+    "rules": "",
+    "fewShot": "",
+    "llmName": LLM_NAME,
+    "llmParams": LLM_PARAMS,
+    "returnPrompts": True,
+}
+resp_chat = requests.post(url=f"{API_BASE_URL}/chatocr-chat", json=payload)
+if resp_chat.status_code != 200:
+    print(
+        f"Request to chatocr-chat failed with status code {resp_chat.status_code}.",
+        file=sys.stderr,
+    )
+    pprint.pp(resp_chat.json())
+    sys.exit(1)
+result_chat = resp_chat.json()["result"]
+print("\nPrompts:")
+pprint.pp(result_chat["prompts"])
+print("Final result:")
+print(result_chat["chatResult"])
+</code></pre>
+
+<b>Note</b>: Please fill in your API key and secret key in `API_KEY` and `SECRET_KEY`.
+</details>
+</details>
+<br/>
+
+📱 **Edge Deployment**: Edge deployment is a method where computing and data processing functions are placed on the user's device itself. The device can directly process data without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed instructions on edge deployment, please refer to the [PaddleX Edge Deployment Guide](../../../pipeline_deploy/edge_deploy.md).
+
+You can choose an appropriate deployment method for your pipeline based on your needs and proceed with subsequent AI application integration.
+
+## 4. Custom Development
+If the default model weights provided by the Document Scene Information Extraction v4 Pipeline do not meet your expectations in terms of accuracy or speed in your specific scenario, you can try to further **fine-tune** the existing models using **data from your own domain or application scenario** to improve the extraction performance of this pipeline in your context.
+
+### 4.1 Model Fine-Tuning
+Since the Document Scene Information Extraction v4 Pipeline consists of several modules, suboptimal performance may stem from any of these modules. You can analyze cases with poor extraction results, identify which module is problematic through visual image inspection, and refer to the fine-tuning tutorial links in the table below for model fine-tuning.
+
+<table>
+  <thead>
+    <tr>
+      <th>Scenario</th>
+      <th>Module to Fine-Tune</th>
+      <th>Fine-Tuning Reference Link</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Inaccurate layout area detection, such as missed detection of seals or tables</td>
+      <td>Layout Area Detection Module</td>
+      <td><a href="../../../module_usage/tutorials/ocr_modules/layout_detection.en.md">Link</a></td>
+    </tr>
+    <tr>
+      <td>Inaccurate table structure recognition</td>
+      <td>Table Structure Recognition</td>
+      <td><a href="../../../module_usage/tutorials/ocr_modules/table_structure_recognition.en.md">Link</a></td>
+    </tr>
+    <tr>
+      <td>Missed detection of seal text</td>
+      <td>Seal Text Detection Module</td>
+      <td><a href="../../../module_usage/tutorials/ocr_modules/seal_text_detection.en.md">Link</a></td>
+    </tr>
+    <tr>
+      <td>Missed detection of text</td>
+      <td>Text Detection Module</td>
+      <td><a href="../../../module_usage/tutorials/ocr_modules/text_detection.en.md">Link</a></td>
+    </tr>
+    <tr>
+      <td>Inaccurate text content</td>
+      <td>Text Recognition Module</td>
+      <td><a href="../../../module_usage/tutorials/ocr_modules/text_recognition.en.md">Link</a></td>
+    </tr>
+    <tr>
+      <td>Inaccurate correction of vertical or rotated text lines</td>
+      <td>Text Line Orientation Classification Module</td>
+      <td><a href="../../../module_usage/tutorials/ocr_modules/textline_orientation_classification.en.md">Link</a></td>
+    </tr>
+    <tr>
+      <td>Inaccurate correction of overall image rotation</td>
+      <td>Document Image Orientation Classification Module</td>
+      <td><a href="../../../module_usage/tutorials/ocr_modules/doc_img_orientation_classification.en.md">Link</a></td>
+    </tr>
+    <tr>
+      <td>Inaccurate correction of image distortion</td>
+      <td>Text Image Rectification Module</td>
+      <td>Fine-tuning Not Supported Yet</td>
+    </tr>
+  </tbody>
+</table>
+
+### 4.2 Model Deployment
+After fine-tuning using your private dataset, you will obtain local model weights files.
+
+To use the fine-tuned model weights, you only need to modify the pipeline configuration file by replacing the path to the default model weights with the path to your fine-tuned model weights in the corresponding location:
+
+```yaml
+......
+SubModules:
+    TextDetection:
+      module_name: text_detection
+      model_name: PP-OCRv4_server_det
+      model_dir: null # Replace with the path to the fine-tuned text detection model weights
+      limit_side_len: 960
+      limit_type: max
+      thresh: 0.3
+      box_thresh: 0.6
+      unclip_ratio: 2.0
+
+    TextRecognition:
+      module_name: text_recognition
+      model_name: PP-OCRv4_server_rec
+      model_dir: null # Replace with the path to the fine-tuned text recognition model weights
+      batch_size: 1
+      score_thresh: 0
+......
+```
+
+Subsequently, refer to the command line method or Python script method in [2.2 Local Experience](#22-local-experience) to load the modified pipeline configuration file.
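+
+For example, a minimal sketch of loading a locally modified configuration file in Python might look like the following (the file name `my_PP-ChatOCRv4-doc.yaml` is a placeholder, and passing a local config path to `create_pipeline` is the assumed loading mechanism):
+
+```python
+from paddlex import create_pipeline
+
+# "my_PP-ChatOCRv4-doc.yaml" is a placeholder for your modified pipeline configuration file.
+pipeline = create_pipeline(pipeline="my_PP-ChatOCRv4-doc.yaml")
+```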
+
+## 5. Multi-Hardware Support
+PaddleX supports various mainstream hardware devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU, allowing seamless switching between different hardware **by simply setting the `device` parameter**.
+
+For example, when using the Document Scene Information Extraction v4 Pipeline, to switch the running device from an NVIDIA GPU to an Ascend NPU, you only need to change the `device` parameter in the script to `npu`:
+
+```python
+from paddlex import create_pipeline
+pipeline = create_pipeline(
+    pipeline="PP-ChatOCRv4-doc",
+    device="npu:0" # gpu:0 --> npu:0
+    )
+```
+
+If you want to use the Document Scene Information Extraction v4 Pipeline on more types of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../other_devices_support/multi_devices_use_guide_en.md).

+ 362 - 162
docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.md

@@ -14,185 +14,190 @@ comments: true
 <b>如您更考虑模型精度,请选择精度较高的模型,如您更考虑模型推理速度,请选择推理速度较快的模型,如您更考虑模型存储大小,请选择存储大小较小的模型</b>。其中部分模型的 benchmark 如下:
 
 <details><summary> 👉模型列表详情</summary>
+<p><b>文档图像方向分类模块(可选):</b></p>
+<table>
+<thead>
+<tr>
+<th>模型</th><th>模型下载链接</th>
+<th>Top-1 Acc(%)</th>
+<th>GPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>CPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>模型存储大小(M)</th>
+<th>介绍</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PP-LCNet_x1_0_doc_ori</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-LCNet_x1_0_doc_ori_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-LCNet_x1_0_doc_ori_pretrained.pdparams">训练模型</a></td>
+<td>99.06</td>
+<td>2.31 / 0.43</td>
+<td>3.37 / 1.27</td>
+<td>7</td>
+<td>基于PP-LCNet_x1_0的文档图像分类模型,含有四个类别,即0度,90度,180度,270度</td>
+</tr>
+</tbody>
+</table>
+
+<p><b>文本图像矫正模块(可选):</b></p>
+<table>
+<thead>
+<tr>
+<th>模型</th><th>模型下载链接</th>
+<th>CER </th>
+<th>模型存储大小(M)</th>
+<th>介绍</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>UVDoc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UVDoc_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UVDoc_pretrained.pdparams">训练模型</a></td>
+<td>0.179</td>
+<td>30.3 M</td>
+<td>高精度文本图像矫正模型</td>
+</tr>
+</tbody>
+</table>
 
-<p><b>版面区域检测模块模型:</b></p>
+<p><b>版面区域检测模块模型(必选):</b></p>
 <table>
 <thead>
 <tr>
 <th>模型</th><th>模型下载链接</th>
 <th>mAP(0.5)(%)</th>
-<th>GPU推理耗时(ms)</th>
-<th>CPU推理耗时 (ms)</th>
+<th>GPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>CPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
 <th>模型存储大小(M)</th>
 <th>介绍</th>
 </tr>
 </thead>
 <tbody>
 <tr>
+<td>PP-DocLayout-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-L_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-L_pretrained.pdparams">训练模型</a></td>
+<td>90.4</td>
+<td>34.6244 / 10.3945</td>
+<td>510.57 / -</td>
+<td>123.76 M</td>
+<td>基于RT-DETR-L在包含中英文论文、杂志、合同、书本、试卷和研报等场景的自建数据集训练的高精度版面区域定位模型</td>
+</tr>
+<tr>
+<td>PP-DocLayout-M</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-M_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-M_pretrained.pdparams">训练模型</a></td>
+<td>75.2</td>
+<td>13.3259 / 4.8685</td>
+<td>44.0680 / 44.0680</td>
+<td>22.578</td>
+<td>基于PicoDet-L在包含中英文论文、杂志、合同、书本、试卷和研报等场景的自建数据集训练的精度效率平衡的版面区域定位模型</td>
+</tr>
+<tr>
+<td>PP-DocLayout-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-S_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-S_pretrained.pdparams">训练模型</a></td>
+<td>70.9</td>
+<td>8.3008 / 2.3794</td>
+<td>10.0623 / 9.9296</td>
+<td>4.834</td>
+<td>基于PicoDet-S在中英文论文、杂志、合同、书本、试卷和研报等场景上自建数据集训练的高效率版面区域定位模型</td>
+</tr>
+<tr>
 <td>PicoDet_layout_1x</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_pretrained.pdparams">训练模型</a></td>
 <td>86.8</td>
-<td>13.0</td>
-<td>91.3</td>
+<td>9.03 / 3.10</td>
+<td>25.82 / 20.70</td>
 <td>7.4</td>
 <td>基于PicoDet-1x在PubLayNet数据集训练的高效率版面区域定位模型,可定位包含文字、标题、表格、图片以及列表这5类区域</td>
 </tr>
 <tr>
 <td>PicoDet_layout_1x_table</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_table_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_table_pretrained.pdparams">训练模型</a></td>
 <td>95.7</td>
-<td>12.623</td>
-<td>90.8934</td>
+<td>8.02 / 3.09</td>
+<td>23.70 / 20.41</td>
 <td>7.4 M</td>
 <td>基于PicoDet-1x在自建数据集训练的高效率版面区域定位模型,可定位包含表格这1类区域</td>
 </tr>
 <tr>
 <td>PicoDet-S_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_3cls_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_3cls_pretrained.pdparams">训练模型</a></td>
 <td>87.1</td>
-<td>13.5</td>
-<td>45.8</td>
+<td>8.99 / 2.22</td>
+<td>16.11 / 8.73</td>
 <td>4.8</td>
 <td>基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章</td>
 </tr>
 <tr>
 <td>PicoDet-S_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_17cls_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_17cls_pretrained.pdparams">训练模型</a></td>
 <td>70.3</td>
-<td>13.6</td>
-<td>46.2</td>
+<td>9.11 / 2.12</td>
+<td>15.42 / 9.12</td>
 <td>4.8</td>
 <td>基于PicoDet-S轻量模型在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章</td>
 </tr>
 <tr>
 <td>PicoDet-L_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_3cls_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_3cls_pretrained.pdparams">训练模型</a></td>
 <td>89.3</td>
-<td>15.7</td>
-<td>159.8</td>
+<td>13.05 / 4.50</td>
+<td>41.30 / 41.30</td>
 <td>22.6</td>
 <td>基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含3个类别:表格,图像和印章</td>
 </tr>
 <tr>
 <td>PicoDet-L_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_17cls_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_17cls_pretrained.pdparams">训练模型</a></td>
 <td>79.9</td>
-<td>17.2</td>
-<td>160.2</td>
+<td>13.50 / 4.69</td>
+<td>43.32 / 43.32</td>
 <td>22.6</td>
 <td>基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章</td>
 </tr>
 <tr>
 <td>RT-DETR-H_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_3cls_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_3cls_pretrained.pdparams">训练模型</a></td>
 <td>95.9</td>
-<td>114.6</td>
-<td>3832.6</td>
+<td>114.93 / 27.71</td>
+<td>947.56 / 947.56</td>
 <td>470.1</td>
 <td>基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含3个类别:表格,图像和印章</td>
 </tr>
 <tr>
 <td>RT-DETR-H_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_17cls_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_17cls_pretrained.pdparams">训练模型</a></td>
 <td>92.6</td>
-<td>115.1</td>
-<td>3827.2</td>
+<td>115.29 / 104.09</td>
+<td>995.27 / 995.27</td>
 <td>470.2</td>
 <td>基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型,包含17个版面常见类别,分别是:段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章</td>
 </tr>
 </tbody>
 </table>
-<p><b>注:以上精度指标的评估集是 PaddleOCR 自建的版面区域分析数据集,包含中英文论文、杂志和研报等常见的 1w 张文档类型图片。GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为 8,精度类型为 FP32。</b></p>
 
-<p><b>表格结构识别模块模型:</b></p>
+<p><b>表格结构识别模块(可选):</b></p>
 <table>
 <tr>
 <th>模型</th><th>模型下载链接</th>
 <th>精度(%)</th>
-<th>GPU推理耗时 (ms)</th>
-<th>CPU推理耗时 (ms)</th>
+<th>GPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>CPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
 <th>模型存储大小 (M)</th>
 <th>介绍</th>
 </tr>
 <tr>
 <td>SLANet</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/SLANet_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/SLANet_pretrained.pdparams">训练模型</a></td>
 <td>59.52</td>
-<td>522.536</td>
-<td>1845.37</td>
+<td>103.08 / 103.08</td>
+<td>197.99 / 197.99</td>
 <td>6.9 M</td>
-<td rowspan="1">SLANet 是百度飞桨视觉团队自研的表格结构识别模型。该模型通过采用 CPU 友好型轻量级骨干网络 PP-LCNet、高低层特征融合模块 CSP-PAN、结构与位置信息对齐的特征解码模块 SLA Head,大幅提升了表格结构识别的精度和推理速度。</td>
+<td>SLANet 是百度飞桨视觉团队自研的表格结构识别模型。该模型通过采用CPU 友好型轻量级骨干网络PP-LCNet、高低层特征融合模块CSP-PAN、结构与位置信息对齐的特征解码模块SLA Head,大幅提升了表格结构识别的精度和推理速度。</td>
 </tr>
 <tr>
 <td>SLANet_plus</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/SLANet_plus_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/SLANet_plus_pretrained.pdparams">训练模型</a></td>
 <td>63.69</td>
-<td>522.536</td>
-<td>1845.37</td>
+<td>140.29 / 140.29</td>
+<td>195.39 / 195.39</td>
 <td>6.9 M</td>
-<td rowspan="1">SLANet_plus 是百度飞桨视觉团队自研的表格结构识别模型 SLANet 的增强版。相较于 SLANet,SLANet_plus 对无线表、复杂表格的识别能力得到了大幅提升,并降低了模型对表格定位准确性的敏感度,即使表格定位出现偏移,也能够较准确地进行识别。
-</td>
-</tr>
-<tr>
-<td>SLANeXt_wired</td>
-<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/SLANeXt_wired_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/SLANeXt_wired_pretrained.pdparams">训练模型</a></td>
-<td rowspan="2">69.65</td>
-<td rowspan="2">--</td>
-<td rowspan="2">--</td>
-<td rowspan="2">--</td>
-<td rowspan="2">SLANeXt 系列是百度飞桨视觉团队自研的新一代表格结构识别模型。相较于 SLANet 和 SLANet_plus,SLANeXt 专注于对表格结构进行识别,并且对有线表格(wired)和无线表格(wireless)的识别分别训练了专用的权重,对各类型表格的识别能力都得到了明显提高,特别是对有线表格的识别能力得到了大幅提升。</td>
-</tr>
-<tr>
-<td>SLANeXt_wireless</td>
-<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/SLANeXt_wireless_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/SLANeXt_wireless_pretrained.pdparams">训练模型</a></td>
-</tr>
-</table>
-<p><b>注:以上精度指标测量PaddleX 内部自建英文表格识别数据集。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
-
-<p><b>表格分类模块模型:</b></p>
-<table>
-<tr>
-<th>模型</th><th>模型下载链接</th>
-<th>Top1 Acc(%)</th>
-<th>GPU推理耗时 (ms)</th>
-<th>CPU推理耗时 (ms)</th>
-<th>模型存储大小 (M)</th>
-</tr>
-<tr>
-<td>PP-LCNet_x1_0_table_cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/CLIP_vit_base_patch16_224_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-LCNet_x1_0_table_cls_pretrained.pdparams">训练模型</a></td>
-<td>--</td>
-<td>--</td>
-<td>--</td>
-<td>--</td>
-</tr>
-</table>
-<p><b>注:以上精度指标测量自 PaddleX 内部自建表格分类数据集。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
-
-<p><b>表格单元格检测模块模型:</b></p>
-<table>
-<tr>
-<th>模型</th><th>模型下载链接</th>
-<th>mAP(%)</th>
-<th>GPU推理耗时 (ms)</th>
-<th>CPU推理耗时 (ms)</th>
-<th>模型存储大小 (M)</th>
-<th>介绍</th>
-</tr>
-<tr>
-<td>RT-DETR-L_wired_table_cell_det</td>
-<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-L_wired_table_cell_det_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-L_wired_table_cell_det_pretrained.pdparams">训练模型</a></td>
-<td rowspan="2">--</td>
-<td rowspan="2">--</td>
-<td rowspan="2">--</td>
-<td rowspan="2">--</td>
-<td rowspan="2">RT-DETR 是第一个实时的端到端目标检测模型。百度飞桨视觉团队基于 RT-DETR-L 作为基础模型,在自建表格单元格检测数据集上完成预训练,实现了对有线表格、无线表格均有较好性能的表格单元格检测。
-</td>
-</tr>
-<tr>
-<td>RT-DETR-L_wireless_table_cell_det</td>
-<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-L_wireless_table_cell_det_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-L_wired_table_cell_det_pretrained.pdparams">训练模型</a></td>
+<td>SLANet_plus 是百度飞桨视觉团队自研的表格结构识别模型SLANet的增强版。相较于SLANet,SLANet_plus 对无线表、复杂表格的识别能力得到了大幅提升,并降低了模型对表格定位准确性的敏感度,即使表格定位出现偏移,也能够较准确地进行识别。</td>
 </tr>
 </table>
-<p><b>注:以上精度指标测量自 PaddleX 内部自建表格单元格检测数据集。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
 
-<p><b>文本检测模块模型:</b></p>
+<p><b>文本检测模块(必选):</b></p>
 <table>
 <thead>
 <tr>
 <th>模型</th><th>模型下载链接</th>
 <th>检测Hmean(%)</th>
-<th>GPU推理耗时(ms)</th>
-<th>CPU推理耗时 (ms)</th>
+<th>GPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>CPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
 <th>模型存储大小(M)</th>
 <th>介绍</th>
 </tr>
@@ -200,66 +205,99 @@ comments: true
 <tbody>
 <tr>
 <td>PP-OCRv4_server_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_server_det_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_det_pretrained.pdparams">训练模型</a></td>
-<td>82.69</td>
-<td>83.3501</td>
-<td>2434.01</td>
+<td>82.56</td>
+<td>83.34 / 80.91</td>
+<td>442.58 / 442.58</td>
 <td>109</td>
 <td>PP-OCRv4 的服务端文本检测模型,精度更高,适合在性能较好的服务器上部署</td>
 </tr>
 <tr>
 <td>PP-OCRv4_mobile_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_mobile_det_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_det_pretrained.pdparams">训练模型</a></td>
-<td>77.79</td>
-<td>10.6923</td>
-<td>120.177</td>
+<td>77.35</td>
+<td>8.79 / 3.13</td>
+<td>51.00 / 28.58</td>
 <td>4.7</td>
 <td>PP-OCRv4 的移动端文本检测模型,效率更高,适合在端侧设备部署</td>
 </tr>
+<tr>
+<td>PP-OCRv3_mobile_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv3_mobile_det_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv3_mobile_det_pretrained.pdparams">训练模型</a></td>
+<td>78.68</td>
+<td>8.44 / 2.91</td>
+<td>27.87 / 27.87</td>
+<td>2.1</td>
+<td>PP-OCRv3 的移动端文本检测模型,效率更高,适合在端侧设备部署</td>
+</tr>
+<tr>
+<td>PP-OCRv3_server_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv3_server_det_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv3_server_det_pretrained.pdparams">训练模型</a></td>
+<td>80.11</td>
+<td>65.41 / 13.67</td>
+<td>305.07 / 305.07</td>
+<td>102.1</td>
+<td>PP-OCRv3 的服务端文本检测模型,精度更高,适合在性能较好的服务器上部署</td>
+</tr>
 </tbody>
 </table>
-<p><b>注:以上精度指标的评估集是 PaddleOCR 自建的中文数据集,覆盖街景、网图、文档、手写多个场景,其中检测包含 500 张图片。以上所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
 
-<p><b>文本识别模块模型:</b></p>
+* <b>中文识别模型</b>
 <table>
 <tr>
 <th>模型</th><th>模型下载链接</th>
 <th>识别 Avg Accuracy(%)</th>
-<th>GPU推理耗时(ms)</th>
-<th>CPU推理耗时 (ms)</th>
+<th>GPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>CPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
 <th>模型存储大小(M)</th>
 <th>介绍</th>
 </tr>
 <tr>
+<td>PP-OCRv4_server_rec_doc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
+PP-OCRv4_server_rec_doc_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>81.53</td>
+<td>6.65 / 6.65</td>
+<td>32.92 / 32.92</td>
+<td>74.7 M</td>
+<td>PP-OCRv4_server_rec_doc是在PP-OCRv4_server_rec的基础上,在更多中文文档数据和PP-OCR训练数据的混合数据训练而成,增加了部分繁体字、日文、特殊字符的识别能力,可支持识别的字符为1.5万+,除文档相关的文字识别能力提升外,也同时提升了通用文字的识别能力</td>
+</tr>
+<tr>
 <td>PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_rec_pretrained.pdparams">训练模型</a></td>
-<td>78.20</td>
-<td>7.95018</td>
-<td>46.7868</td>
+<td>78.74</td>
+<td>4.82 / 4.82</td>
+<td>16.74 / 4.64</td>
 <td>10.6 M</td>
-<td rowspan="2">PP-OCRv4是百度飞桨视觉团队自研的文本识别模型PP-OCRv3的下一个版本,通过引入数据增强方案、GTC-NRTR指导分支等策略,在模型推理速度不变的情况下,进一步提升了文本识别精度。该模型提供了服务端(server)和移动端(mobile)两个不同版本,来满足不同场景下的工业需求。</td>
+<td>PP-OCRv4的轻量级识别模型,推理效率高,可以部署在包含端侧设备的多种硬件设备中</td>
 </tr>
 <tr>
 <td>PP-OCRv4_server_rec </td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_server_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_pretrained.pdparams">训练模型</a></td>
-<td>79.20</td>
-<td>7.19439</td>
-<td>140.179</td>
+<td>80.61 </td>
+<td>6.58 / 6.58</td>
+<td>33.17 / 33.17</td>
 <td>71.2 M</td>
+<td>PP-OCRv4的服务器端模型,推理精度高,可以部署在多种不同的服务器上</td>
+</tr>
+<tr>
+<td>PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
+PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>72.96</td>
+<td>5.87 / 5.87</td>
+<td>9.07 / 4.28</td>
+<td>9.2 M</td>
+<td>PP-OCRv3的轻量级识别模型,推理效率高,可以部署在包含端侧设备的多种硬件设备中</td>
 </tr>
 </table>
 
-<p><b>注:以上精度指标的评估集是 PaddleOCR 自建的中文数据集,覆盖街景、网图、文档、手写多个场景,其中文本识别包含 1.1w 张图片。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
 <table>
 <tr>
 <th>模型</th><th>模型下载链接</th>
 <th>识别 Avg Accuracy(%)</th>
-<th>GPU推理耗时(ms)</th>
-<th>CPU推理耗时(ms)</th>
+<th>GPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>CPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
 <th>模型存储大小(M)</th>
 <th>介绍</th>
 </tr>
 <tr>
 <td>ch_SVTRv2_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/ch_SVTRv2_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ch_SVTRv2_rec_pretrained.pdparams">训练模型</a></td>
 <td>68.81</td>
-<td>8.36801</td>
-<td>165.706</td>
+<td>8.08 / 8.08</td>
+<td>50.17 / 42.50</td>
 <td>73.9 M</td>
 <td rowspan="1">
 SVTRv2 是一种由复旦大学视觉与学习实验室(FVL)的OpenOCR团队研发的服务端文本识别模型,其在PaddleOCR算法模型挑战赛 - 赛题一:OCR端到端识别任务中荣获一等奖,A榜端到端识别精度相比PP-OCRv4提升6%。
@@ -267,108 +305,270 @@ SVTRv2 是一种由复旦大学视觉与学习实验室(FVL)的OpenOCR团队
 </tr>
 </table>
 
-<p><b>注:以上精度指标的评估集是 <a href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">PaddleOCR算法模型挑战赛 - 赛题一:OCR端到端识别任务</a>A榜。 所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
 <table>
 <tr>
 <th>模型</th><th>模型下载链接</th>
 <th>识别 Avg Accuracy(%)</th>
-<th>GPU推理耗时(ms)</th>
-<th>CPU推理耗时(ms)</th>
+<th>GPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>CPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
 <th>模型存储大小(M)</th>
 <th>介绍</th>
 </tr>
 <tr>
 <td>ch_RepSVTR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/ch_RepSVTR_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ch_RepSVTR_rec_pretrained.pdparams">训练模型</a></td>
 <td>65.07</td>
-<td>10.5047</td>
-<td>51.5647</td>
+<td>5.93 / 5.93</td>
+<td>20.73 / 7.32</td>
 <td>22.1 M</td>
 <td rowspan="1">    RepSVTR 文本识别模型是一种基于SVTRv2 的移动端文本识别模型,其在PaddleOCR算法模型挑战赛 - 赛题一:OCR端到端识别任务中荣获一等奖,B榜端到端识别精度相比PP-OCRv4提升2.5%,推理速度持平。</td>
 </tr>
 </table>
 
-<p><b>注:以上精度指标的评估集是 <a href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">PaddleOCR算法模型挑战赛 - 赛题一:OCR端到端识别任务</a>B榜。 所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
 
-<p><b>印章文本检测模块模型:</b></p>
+* <b>英文识别模型</b>
 <table>
-<thead>
 <tr>
 <th>模型</th><th>模型下载链接</th>
-<th>检测Hmean(%)</th>
-<th>GPU推理耗时(ms)</th>
-<th>CPU推理耗时 (ms)</th>
+<th>识别 Avg Accuracy(%)</th>
+<th>GPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>CPU推理耗时(ms)<br/>[常规模式 / 高性能模式]</th>
+<th>模型存储大小(M)</th>
+<th>介绍</th>
+</tr>
+<tr>
+<td>en_PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/en_PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>70.39</td>
+<td>4.81 / 4.81</td>
+<td>16.10 / 5.31</td>
+<td>6.8 M</td>
+<td>An ultra-lightweight English recognition model trained on the PP-OCRv4 recognition model, supporting recognition of English letters and digits</td>
+</tr>
+<tr>
+<td>en_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/en_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>70.69</td>
+<td>5.44 / 5.44</td>
+<td>8.65 / 5.57</td>
+<td>7.8 M</td>
+<td>An ultra-lightweight English recognition model trained on the PP-OCRv3 recognition model, supporting recognition of English letters and digits</td>
+</tr>
+</table>
+
+* <b>Multilingual Recognition Models</b> (see the configuration sketch after this table)
+<table>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Recognition Avg Accuracy (%)</th>
+<th>GPU Inference Time (ms)<br/>[Regular Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Regular Mode / High-Performance Mode]</th>
+<th>Model Storage Size (M)</th>
+<th>Description</th>
+</tr>
+<tr>
+<td>korean_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/korean_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>60.21</td>
+<td>5.40 / 5.40</td>
+<td>9.11 / 4.05</td>
+<td>8.6 M</td>
+<td>An ultra-lightweight Korean recognition model trained on the PP-OCRv3 recognition model, supporting Korean and digit recognition</td>
+</tr>
+<tr>
+<td>japan_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/japan_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>45.69</td>
+<td>5.70 / 5.70</td>
+<td>8.48 / 4.07</td>
+<td>8.8 M</td>
+<td>An ultra-lightweight Japanese recognition model trained on the PP-OCRv3 recognition model, supporting Japanese and digit recognition</td>
+</tr>
+<tr>
+<td>chinese_cht_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/chinese_cht_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>82.06</td>
+<td>5.90 / 5.90</td>
+<td>9.28 / 4.34</td>
+<td>9.7 M</td>
+<td>An ultra-lightweight Traditional Chinese recognition model trained on the PP-OCRv3 recognition model, supporting Traditional Chinese and digit recognition</td>
+</tr>
+<tr>
+<td>te_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/te_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>95.88</td>
+<td>5.42 / 5.42</td>
+<td>8.10 / 6.91</td>
+<td>7.8 M</td>
+<td>An ultra-lightweight Telugu recognition model trained on the PP-OCRv3 recognition model, supporting Telugu and digit recognition</td>
+</tr>
+<tr>
+<td>ka_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/ka_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>96.96</td>
+<td>5.25 / 5.25</td>
+<td>9.09 / 3.86</td>
+<td>8.0 M</td>
+<td>An ultra-lightweight Kannada recognition model trained on the PP-OCRv3 recognition model, supporting Kannada and digit recognition</td>
+</tr>
+<tr>
+<td>ta_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/ta_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>76.83</td>
+<td>5.23 / 5.23</td>
+<td>10.13 / 4.30</td>
+<td>8.0 M</td>
+<td>An ultra-lightweight Tamil recognition model trained on the PP-OCRv3 recognition model, supporting Tamil and digit recognition</td>
+</tr>
+<tr>
+<td>latin_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/latin_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>76.93</td>
+<td>5.20 / 5.20</td>
+<td>8.83 / 7.15</td>
+<td>7.8 M</td>
+<td>An ultra-lightweight Latin-script recognition model trained on the PP-OCRv3 recognition model, supporting Latin-script and digit recognition</td>
+</tr>
+<tr>
+<td>arabic_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/arabic_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>73.55</td>
+<td>5.35 / 5.35</td>
+<td>8.80 / 4.56</td>
+<td>7.8 M</td>
+<td>An ultra-lightweight Arabic-script recognition model trained on the PP-OCRv3 recognition model, supporting Arabic script and digit recognition</td>
+</tr>
+<tr>
+<td>cyrillic_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/cyrillic_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>94.28</td>
+<td>5.23 / 5.23</td>
+<td>8.89 / 3.88</td>
+<td>7.9 M</td>
+<td>An ultra-lightweight Cyrillic-script recognition model trained on the PP-OCRv3 recognition model, supporting Cyrillic script and digit recognition</td>
+</tr>
+<tr>
+<td>devanagari_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/devanagari_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
+<td>96.44</td>
+<td>5.22 / 5.22</td>
+<td>8.56 / 4.06</td>
+<td>7.9 M</td>
+<td>An ultra-lightweight Devanagari-script recognition model trained on the PP-OCRv3 recognition model, supporting Devanagari script and digit recognition</td>
+</tr>
+</table>
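+
+The recognition models listed above are alternatives for the text recognition sub-module: choosing, say, the Korean model means pointing the pipeline at a different recognition model. Below is a minimal sketch of how this could be wired up in Python; the config file name and the exact YAML field to edit are assumptions, so export the real pipeline configuration first (for example with `paddlex --get_pipeline_config`) and check the actual key names before relying on it.
+
+```python
+from paddlex import create_pipeline
+
+# Hypothetical sketch: a locally exported pipeline config whose text
+# recognition entry has been edited to use korean_PP-OCRv3_mobile_rec.
+# The file name "./PP-ChatOCRv4-doc.yaml" is a placeholder, not an official path.
+pipeline = create_pipeline(pipeline="./PP-ChatOCRv4-doc.yaml")
+```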
+
+<p><b>Text Line Orientation Classification Module (Optional):</b></p>
+<table>
+<thead>
+<tr>
+<th>Model</th>
+<th>Model Download Link</th>
+<th>Top-1 Acc (%)</th>
+<th>GPU Inference Time (ms)<br/>[Regular Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Regular Mode / High-Performance Mode]</th>
+<th>Model Storage Size (M)</th>
+<th>Description</th>
 </tr>
 </thead>
 <tbody>
 <tr>
-<td>PP-OCRv4_server_seal_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_server_seal_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_seal_det_pretrained.pdparams">Training Model</a></td>
-<td>98.21</td>
-<td>84.341</td>
-<td>2425.06</td>
-<td>109</td>
-<td>The server-side seal text detection model of PP-OCRv4, with higher accuracy, suitable for deployment on better-equipped servers</td>
-</tr>
-<tr>
-<td>PP-OCRv4_mobile_seal_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_mobile_seal_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_seal_det_pretrained.pdparams">Training Model</a></td>
-<td>96.47</td>
-<td>10.5878</td>
-<td>131.813</td>
-<td>4.6</td>
-<td>The mobile seal text detection model of PP-OCRv4, with higher efficiency, suitable for edge-side deployment</td>
+<td>PP-LCNet_x0_25_textline_ori</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-LCNet_x0_25_textline_ori_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-LCNet_x0_25_textline_ori_pretrained.pdparams">Training Model</a></td>
+<td>95.54</td>
+<td>-</td>
+<td>-</td>
+<td>0.32</td>
+<td>A text line classification model based on PP-LCNet_x0_25, with two classes: 0 degrees and 180 degrees</td>
 </tr>
 </tbody>
 </table>
-<p><b>Note: The evaluation set for the above accuracy metrics is a self-built dataset containing 500 circular seal images. GPU inference time is measured on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is measured on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p>
 
-<p><b>Text Image Rectification Module Models:</b></p>
+<p><b>Formula Recognition Module (Optional):</b></p>
 <table>
 <thead>
 <tr>
 <th>Model</th><th>Model Download Link</th>
-<th>MS-SSIM (%)</th>
-<th>Model Storage Size (M)</th>
-<th>Description</th>
+<th>BLEU score</th>
+<th>normed edit distance</th>
+<th>ExpRate (%)</th>
+<th>GPU Inference Time (ms)<br/>[Regular Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Regular Mode / High-Performance Mode]</th>
+<th>Model Storage Size</th>
 </tr>
 </thead>
 <tbody>
 <tr>
-<td>UVDoc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UVDoc_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UVDoc_pretrained.pdparams">Training Model</a></td>
-<td>54.40</td>
-<td>30.3 M</td>
-<td>High-accuracy text image rectification model</td>
+<td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Training Model</a></td>
+<td>0.8821</td>
+<td>0.0823</td>
+<td>40.01</td>
+<td>2047.13 / 2047.13</td>
+<td>10582.73 / 10582.73</td>
+<td>89.7 M</td>
 </tr>
 </tbody>
 </table>
-<p><b>The accuracy metrics of the model are measured on the <a href="https://www3.cs.stonybrook.edu/~cvl/docunet.html">DocUNet benchmark</a>.</b></p>
 
-<p><b>Document Image Orientation Classification Module Models:</b></p>
+<p><b>Seal Text Detection Module (Optional):</b></p>
 <table>
 <thead>
 <tr>
 <th>Model</th><th>Model Download Link</th>
-<th>Top-1 Acc (%)</th>
-<th>GPU Inference Time (ms)</th>
-<th>CPU Inference Time (ms)</th>
+<th>Detection Hmean (%)</th>
+<th>GPU Inference Time (ms)<br/>[Regular Mode / High-Performance Mode]</th>
+<th>CPU Inference Time (ms)<br/>[Regular Mode / High-Performance Mode]</th>
 <th>Model Storage Size (M)</th>
 <th>Description</th>
 </tr>
 </thead>
 <tbody>
 <tr>
-<td>PP-LCNet_x1_0_doc_ori</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-LCNet_x1_0_doc_ori_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-LCNet_x1_0_doc_ori_pretrained.pdparams">Training Model</a></td>
-<td>99.06</td>
-<td>3.84845</td>
-<td>9.23735</td>
-<td>7</td>
-<td>A document image classification model based on PP-LCNet_x1_0, with four classes: 0, 90, 180, and 270 degrees</td>
+<td>PP-OCRv4_server_seal_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_server_seal_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_seal_det_pretrained.pdparams">Training Model</a></td>
+<td>98.21</td>
+<td>74.75 / 67.72</td>
+<td>382.55 / 382.55</td>
+<td>109</td>
+<td>The server-side seal text detection model of PP-OCRv4, with higher accuracy, suitable for deployment on better-equipped servers</td>
+</tr>
+<tr>
+<td>PP-OCRv4_mobile_seal_det</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_mobile_seal_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_seal_det_pretrained.pdparams">Training Model</a></td>
+<td>96.47</td>
+<td>7.82 / 3.09</td>
+<td>48.28 / 23.97</td>
+<td>4.6</td>
+<td>The mobile seal text detection model of PP-OCRv4, with higher efficiency, suitable for edge-side deployment</td>
 </tr>
 </tbody>
 </table>
-<p><b>Note: The evaluation set for the above accuracy metrics is a self-built dataset covering multiple scenarios such as certificates and documents, containing 1,000 images. GPU inference time is measured on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is measured on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p></details>
 
-<b></b>
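+
+Several of the modules above are marked as optional. Assuming the v4 pipeline exposes the same `visual_predict` interface as PP-ChatOCRv3-doc, these sub-modules would typically be switched on or off per call, as in the sketch below. The flag names (`use_textline_orientation`, `use_seal_recognition`, `use_formula_recognition`) and the input path are assumptions for illustration only; check the pipeline's documented parameter list for the names it actually accepts.
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(pipeline="PP-ChatOCRv4-doc")
+
+# Hypothetical flags; verify against the pipeline's actual parameters.
+visual_results, visual_info = pipeline.visual_predict(
+    "./demo_document.png",              # placeholder input image
+    use_textline_orientation=False,     # skip text line orientation classification
+    use_seal_recognition=True,          # keep seal text detection and recognition
+    use_formula_recognition=False,      # skip formula recognition
+)
+```
+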
+**Test Environment Notes:**
+
+- **Performance Test Environment**
+  - **Test Datasets**:
+    - Document image orientation classification model: a PaddleX self-built dataset covering multiple scenarios such as certificates and documents, containing 1,000 images.
+    - Text image rectification model: <a href="https://www3.cs.stonybrook.edu/~cvl/docunet.html">DocUNet</a>.
+    - Layout region detection model: a PaddleOCR self-built layout analysis dataset containing 10,000 images of common document types such as Chinese and English papers, magazines, and research reports.
+    - Table structure recognition model: a PaddleX internal self-built English table recognition dataset.
+    - Text detection model: a PaddleOCR self-built Chinese dataset covering street scenes, web images, documents, and handwriting, with 500 images for detection.
+    - Chinese recognition models: a PaddleOCR self-built Chinese dataset covering street scenes, web images, documents, and handwriting, with 11,000 images for text recognition.
+    - ch_SVTRv2_rec: the A-leaderboard evaluation set of the <a href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition</a>.
+    - ch_RepSVTR_rec: the B-leaderboard evaluation set of the <a href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition</a>.
+    - English recognition models: a PaddleX self-built English dataset.
+    - Multilingual recognition models: a PaddleX self-built multilingual dataset.
+    - Text line orientation classification model: a PaddleX self-built dataset covering multiple scenarios such as certificates and documents, containing 1,000 images.
+    - Seal text detection model: a PaddleX self-built dataset containing 500 circular seal images.
+  - **Hardware Configuration**:
+    - GPU: NVIDIA Tesla T4
+    - CPU: Intel Xeon Gold 6271C @ 2.60GHz
+    - Other environment: Ubuntu 20.04 / cuDNN 8.6 / TensorRT 8.5.2.2
+
+- **Inference Mode Description** (see the usage sketch after this table)
+
+| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
+|-------------|----------------------------------|------------------|---------------------------------------------|
+| Regular Mode | FP32 precision / No TRT acceleration | FP32 precision / 8 threads | PaddleInference |
+| High-Performance Mode | Optimal combination of pre-selected precision type and acceleration strategy | FP32 precision / 8 threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |
+
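+To reproduce the high-performance mode figures, the high-performance inference plugin must be enabled when the pipeline is created. The snippet below is a minimal sketch, assuming the plugin is installed and that `create_pipeline` accepts a `use_hpip` switch as it does for other PaddleX pipelines; verify the option name against your installed PaddleX version.
+
+```python
+from paddlex import create_pipeline
+
+# Assumption: the high-performance inference plugin is installed
+# (see the PaddleX high-performance inference guide) and use_hpip is supported.
+pipeline = create_pipeline(pipeline="PP-ChatOCRv4-doc", use_hpip=True)
+```
+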
+</details>
 
 ## 2. Quick Start
 PaddleX's pre-trained model pipelines can all be tried out quickly. You can use Python locally to experience the effect of the Document Scene Information Extraction v4 pipeline.

File diff suppressed because it is too large
+ 12 - 29
docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.en.md


File diff suppressed because it is too large
+ 2 - 3
docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.md


Some files were not shown because too many files changed in this diff