@@ -7,7 +7,7 @@ PP-ChatOCRv3-doc is a unique intelligent analysis solution for documents and ima

-The **PP-ChatOCRv3-doc** pipeline includes modules for **Table Structure Recognition, Layout Region Detection, Text Detection, Text Recognition, Seal Text Detection, Text Image Rectification, and Document Image Orientation Classification**.
+The **PP-ChatOCRv3-doc** pipeline includes modules for **Table Structure Recognition**, **Layout Region Detection**, **Text Detection**, **Text Recognition**, **Seal Text Detection**, **Text Image Rectification**, and **Document Image Orientation Classification**.
**If you prioritize model accuracy, choose a model with higher accuracy. If you prioritize inference speed, choose a model with faster inference speed. If you prioritize model storage size, choose a model with a smaller storage size.** Some benchmarks for these models are as follows:
@@ -186,18 +186,21 @@ A few lines of code are all you need to complete the quick inference of the pipe
```python
from paddlex import create_pipeline
-predict = create_pipeline(pipeline="PP-ChatOCRv3-doc",
- llm_name="ernie-3.5",
- llm_params={"api_type":"qianfan","ak":"","sk":""}) ## Please fill in your ak and sk, or you cannot call the large model
+pipeline = create_pipeline(
+ pipeline="PP-ChatOCRv3-doc",
+ llm_name="ernie-3.5",
+ llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # Please fill in ak and sk, required for LLM.
+ )
-visual_result, visual_inf = predict(["contract.pdf"])
+visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf")
for res in visual_result:
res.save_to_img("./output")
res.save_to_html('./output')
res.save_to_xlsx('./output')
-print(predict.chat("乙方,手机号"))
+chat_result = pipeline.chat(["乙方", "手机号"])
+chat_result.print()
```
**Note**: Please first obtain your ak and sk on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) (for detailed steps, please refer to the [AK and SK Authentication API Call Process](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Hlwerugt8)), and fill in your ak and sk to the specified locations to enable normal calls to the large model.
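+
+If you prefer not to hard-code credentials in the script, one option is to read them from environment variables. The snippet below is only a minimal sketch of that pattern; the variable names `QIANFAN_AK` and `QIANFAN_SK` are illustrative choices for this example, not names required by PaddleX:
+
+```python
+import os
+
+# Read the Qianfan credentials from environment variables instead of
+# hard-coding them (QIANFAN_AK / QIANFAN_SK are illustrative names).
+llm_params = {
+    "api_type": "qianfan",
+    "ak": os.environ.get("QIANFAN_AK", ""),
+    "sk": os.environ.get("QIANFAN_SK", ""),
+}
+```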
@@ -209,36 +212,43 @@ After running, the output is as follows:
In the above Python script, the following steps are executed:
-(1) Instantiate the `create_pipeline` to create a PP-ChatOCRv3-doc pipeline object: Specific parameter descriptions are as follows:
+(1) Call `create_pipeline` to instantiate a PP-ChatOCRv3-doc pipeline object. The related parameter descriptions are as follows:
-| Parameter | Description | Default | Type |
+| Parameter | Type | Default | Description |
|-|-|-|-|
-| `pipeline` | Pipeline name or pipeline configuration file path. If it's a pipeline name, it must be supported by PaddleX. | None | str |
-| `llm_name` | Large Language Model name | "ernie-3.5" | str |
-| `llm_params` | API configuration | {} | dict |
-| `device(kwargs)` | Running device (None for automatic adaptation) | None | str/None |
+| `pipeline` | str | None | Pipeline name or pipeline configuration file path. If it's a pipeline name, it must be supported by PaddleX; |
+| `llm_name` | str | "ernie-3.5" | Large Language Model name; |
+| `llm_params` | dict | `{}` | API configuration; |
+| `device(kwargs)` | str/`None` | `None` | Running device (`None` means automatic selection); |
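+
+As a minimal sketch of these parameters (the `device` value below is only an example; omit it or pass `None` to let PaddleX select a device automatically, and fill in your own ak and sk):
+
+```python
+from paddlex import create_pipeline
+
+# Pin the pipeline to the first GPU explicitly; omitting device (or passing
+# None) lets PaddleX choose a device automatically.
+pipeline = create_pipeline(
+    pipeline="PP-ChatOCRv3-doc",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""},  # fill in your ak and sk
+    device="gpu:0",
+)
+```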
-(2) Call the `predict` method of the PP-ChatOCRv3-doc pipeline object for inference prediction: The `predict` method parameter is `x`, used to input data to be predicted, supporting multiple input methods, as shown in the following examples:
+(2) Call the `visual_predict` method of the PP-ChatOCRv3-doc pipeline object to perform visual prediction. The related parameter descriptions are as follows:
-| Parameter Type | Description |
-|-|-|
-| Python Var | Supports directly passing Python variables, such as numpy.ndarray representing image data; |
-| str | Supports passing the path of the file to be predicted, such as the local path of an image file: /root/data/img.jpg; |
-| str | Supports passing the URL of the file to be predicted, such as [example](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf); |
-| str | Supports passing a local directory, which should contain files to be predicted, such as the local path: /root/data/; |
-| dict | Supports passing a dictionary type, where the key needs to correspond to the specific pipeline, such as "img
-
-(3) Obtain prediction results by calling the `predict` method: The `predict` method is a `generator`, so prediction results need to be obtained through calls. The `predict` method predicts data in batches, so the prediction results are represented as a list of prediction results.
-
-(4) Interact with the large model by calling the `predict.chat` method, which takes as input keywords (multiple keywords are supported) for information extraction. The prediction results are represented as a list of information extraction results.
+| Parameter | Type | Default | Description |
+|-|-|-|-|
+|`input`|Python Var|-|Supports passing Python variables directly, such as `numpy.ndarray` representing image data;|
+|`input`|str|-|Supports passing the path of the file to be predicted, such as the local path of an image file: `/root/data/img.jpg`;|
+|`input`|str|-|Supports passing the URL of the file to be predicted, such as: `https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf`;|
+|`input`|str|-|Supports passing a local directory, which should contain files to be predicted, such as: `/root/data/`;|
+|`input`|dict|-|Supports passing a dictionary, where the key needs to correspond to the specific pipeline, such as: `{"img": "/root/data1"}`;|
+|`input`|list|-|Supports passing a list, whose elements must be of the above types, such as: `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`, `[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]`;|
+|`use_doc_image_ori_cls_model`|bool|`True`|Whether to use the document image orientation classification model;|
+|`use_doc_image_unwarp_model`|bool|`True`|Whether to use the document image unwarping model;|
+|`use_seal_text_det_model`|bool|`True`|Whether to use the seal text detection model;|
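+
+For instance, building on the `pipeline` object created above, a sketch of passing a list of local images while skipping the seal text detection module might look like the following (the image paths are placeholders):
+
+```python
+# Predict several local images at once and skip the seal text detection
+# module; the paths below are placeholders.
+visual_result, visual_info = pipeline.visual_predict(
+    ["/root/data/img1.jpg", "/root/data/img2.jpg"],
+    use_seal_text_det_model=False,
+)
+```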
+
+(3) Call the relevant functions of the prediction object to save the prediction results. The related functions are as follows:
+
+|Function|Parameter|Description|
+|-|-|-|
+|`save_to_img`|`save_path`|Save OCR prediction results, layout results, and table recognition results as image files, with the parameter `save_path` used to specify the save path;|
+|`save_to_html`|`save_path`|Save the table recognition results as an HTML file, with the parameter `save_path` used to specify the save path;|
+|`save_to_xlsx`|`save_path`|Save the table recognition results as an Excel file, with the parameter `save_path` used to specify the save path;|
-(5) Process the prediction results: The prediction result for each sample is in the form of a dict, which supports printing or saving to a file. The supported file types depend on the specific pipeline, such as:
+(4) Call the `chat` method of the PP-ChatOCRv3-doc pipeline object to extract key information with the large language model. The related parameters are described as follows:
-| Method | Description | Method Parameters |
-|-|-|-|
-| save_to_img | Saves layout analysis, table recognition, etc. results as image files. | `save_path`: str, the file path to save. |
-| save_to_html | Saves table recognition results as HTML files. | `save_path`: str, the file path to save. |
-| save_to_xlsx | Saves table recognition results as Excel files. | `save_path`: str, the file path to save. |
+| Parameter | Type | Default | Description |
+|-|-|-|-|
+|`key_list`|str|-|Keywords used to query. A string composed of multiple keywords with "," as separators, such as "Party B, phone number";|
+|`key_list`|list|-|Keywords used to query. A list composed of multiple keywords.|
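+
+Both forms extract the same information; for example, the comma-separated string form with the quick-start keywords can be sketched as:
+
+```python
+# Equivalent to pipeline.chat(["乙方", "手机号"]): a single string in which
+# "," separates the keywords to extract.
+chat_result = pipeline.chat("乙方,手机号")
+chat_result.print()
+```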
When executing the above command, the default Pipeline configuration file is loaded. If you need to customize the configuration file, you can use the following command to obtain it:
@@ -277,19 +287,18 @@ For example, if your configuration file is saved at `./my_path/PP-ChatOCRv3-doc.
```python
from paddlex import create_pipeline
-
-predict = create_pipeline(pipeline="./my_path/PP-ChatOCRv3-doc.yaml",
- llm_name="ernie-3.5",
- llm_params={"api_type":"qianfan","ak":"","sk":""} ) ## Please fill in your ak and sk, or you will not be able to call the large language model
-
-visual_result, visual_inf = predict(["contract.pdf"])
-
+pipeline = create_pipeline(
+ pipeline="./my_path/PP-ChatOCRv3-doc.yaml",
+ llm_name="ernie-3.5",
+ llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # Please fill in ak and sk, required for LLM.
+ )
+visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf")
for res in visual_result:
res.save_to_img("./output")
res.save_to_html('./output')
res.save_to_xlsx('./output')
-
-print(predict.chat("乙方,手机号"))
+chat_result = pipeline.chat(["乙方", "手机号"])
+chat_result.print()
```
## 3. Development Integration/Deployment
@@ -590,16 +599,21 @@ Pipeline:
Subsequently, load the modified pipeline configuration file using the command-line interface or Python script as described in the local experience section.
## 5. Multi-hardware Support
-PaddleX supports various mainstream hardware devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU. **Seamless switching between different hardware can be achieved by simply setting the `--device` parameter**.
-For example, to perform inference using the PP-ChatOCRv3-doc Pipeline on an NVIDIA GPU.
-At this point, if you wish to switch the hardware to Ascend NPU, simply modify the `--device` in the script to `npu`:
+PaddleX supports various devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU. Seamless switching between different hardware can be achieved by simply setting the **`device` parameter**.
+
+For example, when using the PP-ChatOCRv3-doc Pipeline, changing the running device from an NVIDIA GPU to an Ascend NPU only requires modifying the `device` parameter in the script to `npu`:
```python
from paddlex import create_pipeline
-predict = create_pipeline(pipeline="PP-ChatOCRv3-doc",
- llm_name="ernie-3.5",
- llm_params = {"api_type":"qianfan","ak":"","sk":""}, ## Please fill in your ak and sk, or you will not be able to call the large model
- device = "npu:0")
+pipeline = create_pipeline(
+ pipeline="PP-ChatOCRv3-doc",
+ llm_name="ernie-3.5",
+ llm_params={"api_type": "qianfan", "ak": "", "sk": ""}, # Please fill in ak and sk, required for LLM.
+ device="npu:0" # gpu:0 --> npu:0
+ )
```
+
If you want to use the PP-ChatOCRv3-doc Pipeline on more types of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../installation/installation_other_devices_en.md).