
fix chatocr api & doc (#2204)

Tingquan Gao, 1 year ago
Parent
Commit
661d826d4f

+ 63 - 57
docs/pipeline_usage/tutorials/information_extration_pipelines/document_scene_information_extraction.md

@@ -7,7 +7,7 @@
 
 ![](https://github.com/user-attachments/assets/90cb740b-7741-4383-bc4c-663f9d042d02)
 
-The Document Scene Information Extraction v3 **pipeline includes the Table Structure Recognition Module, Layout Region Detection Module, Text Detection Module, Text Recognition Module, Seal Text Detection Module, Text Image Rectification Module, and Document Image Orientation Classification Module**.
+The Document Scene Information Extraction v3 pipeline includes the **Table Structure Recognition Module**, **Layout Region Detection Module**, **Text Detection Module**, **Text Recognition Module**, **Seal Text Detection Module**, **Text Image Rectification Module**, and **Document Image Orientation Classification Module**.
 
 **If you prioritize model accuracy, choose a model with higher accuracy; if you prioritize inference speed, choose a model with faster inference; if you prioritize model storage size, choose a model with a smaller storage footprint.** Benchmarks for some of these models are as follows:
 
@@ -189,18 +189,21 @@ All of the pretrained model pipelines provided by PaddleX can be quickly experienced; you can
 ```python
 from paddlex import create_pipeline
 
-predict = create_pipeline( pipeline="PP-ChatOCRv3-doc",
-                            llm_name="ernie-3.5",
-                            llm_params = {"api_type":"qianfan","ak":"","sk":""} )  ## Please fill in your ak and sk; otherwise the large model cannot be called
+pipeline = create_pipeline(
+    pipeline="PP-ChatOCRv3-doc",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # 请填入您的ak与sk,否则无法调用大模型
+    )
 
-visual_result, visual_inf = predict(["contract.pdf"])
+visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf")
 
 for res in visual_result:
     res.save_to_img("./output")
     res.save_to_html('./output')
     res.save_to_xlsx('./output')
 
-print(predict.chat("乙方,手机号"))
+chat_result = pipeline.chat(["乙方", "手机号"])
+chat_result.print()
 ```
 **Note**: Please first obtain your ak and sk on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) (for the detailed procedure, see [the AK/SK authentication and API invocation process](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Hlwerugt8)); the large model can be called normally only after the ak and sk have been filled in at the specified locations.
 
@@ -210,55 +213,61 @@ print(predict.chat("乙方,手机号"))
 {'chat_res': {'乙方': '股份测试有限公司', '手机号': '19331729920'}, 'prompt': ''}
 ```
 
 In the above Python script, the following steps are executed:
 
-(1) Instantiate `create_pipeline` to create the Document Scene Information Extraction v3 pipeline object; the specific parameters are described as follows:
+(1) Call the `create_pipeline` method to instantiate the Document Scene Information Extraction v3 pipeline object; the relevant parameters are described as follows:
 
-|Parameter|Description|Default|Type|
+|Parameter|Type|Default|Description|
 |-|-|-|-|
-|`pipeline`|Pipeline name or pipeline configuration file path; if a pipeline name, it must be a pipeline supported by PaddleX.|None|str|
-|`llm_name`|Large language model name|"ernie-3.5"|str|
-|`llm_params`|API configuration|{}|dict|
-|`device(kwargs)`|Running device (None means automatic adaptation)|None|str/None|
-
-(2) Call the `predict` method of the Document Scene Information Extraction v3 pipeline object for inference: the `predict` method takes the parameter `x`, the data to be predicted, and supports multiple input types, as shown in the following examples:
-
-|Parameter Type|Description|
-|-|-|
-|Python Var|Supports passing Python variables directly, such as image data represented by numpy.ndarray;|
-|str|Supports passing the path of the file to be predicted, such as the local path of an image file: /root/data/img.jpg;|
-|str|Supports passing the URL of the file to be predicted, such as [this example](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf);|
-|str|Supports passing a local directory, which must contain the files to be predicted, such as the local path: /root/data/;|
-|dict|Supports passing a dictionary whose keys must correspond to the specific pipeline ("img" for the Document Scene Information Extraction v3 pipeline) and whose values support the above data types, e.g.: {"img": "/root/data1"};|
-|list|Supports passing a list whose elements are of the above data types, e.g. [numpy.ndarray, numpy.ndarray, ], ["/root/data/img1.jpg", "/root/data/img2.jpg", ], ["/root/data1", "/root/data2", ], [{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}, ];|
-|use_oricls_model|Whether to use the orientation classification model (default: False)|
-|use_curve_model|Whether to use the curved text detection pipeline (default: False)|
-|use_uvdoc_model|Whether to use the layout rectification pipeline (default: False)|
-
-(3) Obtain prediction results by calling the `predict` method: `predict` is a `generator`, so results must be obtained by iterating over it; `predict` processes the data in batches, so the prediction results are a list of results.
-
-(4) Interact with the large model by calling the `predict.chat` method: it takes the keywords (multiple are supported) whose information is to be extracted, and the prediction result is a list of information extraction results.
-
-(5) Process the prediction results: the result for each sample is of type dict, and supports printing or saving to a file; the file types supported depend on the specific pipeline, e.g.:
-|Method|Description|Method Parameters|
+|`pipeline`|str|None|Pipeline name or pipeline configuration file path; if a pipeline name, it must be a pipeline supported by PaddleX;|
+|`llm_name`|str|"ernie-3.5"|Large language model name;|
+|`llm_params`|dict|`{}`|LLM-related API configuration;|
+|`device`|str, None|`None`|Running device (`None` means automatic adaptation);|
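
A minimal sketch that combines the four parameters above (the `device` value is illustrative, and the empty ak/sk strings must be replaced with your own credentials):

```python
from paddlex import create_pipeline

# Instantiate the pipeline with an explicit device instead of the
# automatic adaptation that device=None would give.
pipeline = create_pipeline(
    pipeline="PP-ChatOCRv3-doc",  # pipeline name supported by PaddleX
    llm_name="ernie-3.5",         # large language model name
    llm_params={"api_type": "qianfan", "ak": "", "sk": ""},  # LLM API configuration
    device="gpu:0",               # e.g. "gpu:0" or "npu:0"
)
```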
+
+(2) Call the `visual_predict` method of the Document Scene Information Extraction v3 pipeline object to run visual inference; the relevant parameters are described as follows:
+
+|Parameter|Type|Default|Description|
+|-|-|-|-|
+|`input`|Python Var|None|Data to be predicted; supports passing Python variables directly, such as image data represented by `numpy.ndarray`;|
+|`input`|str|None|Data to be predicted; supports passing the path of the file to be predicted, such as the local path of an image file: `/root/data/img.jpg`;|
+|`input`|str|None|Data to be predicted; supports passing the URL of the file to be predicted, such as `https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf`;|
+|`input`|str|None|Data to be predicted; supports passing a local directory, which must contain the files to be predicted, such as the local path: `/root/data/`;|
+|`input`|dict|None|Data to be predicted; supports passing a dictionary whose keys must correspond to the specific pipeline ("img" for the Document Scene Information Extraction v3 pipeline) and whose values support the above data types, e.g.: `{"img": "/root/data1"}`;|
+|`input`|list|None|Data to be predicted; supports passing a list whose elements are of the above data types, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`, `[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]`;|
+|`use_doc_image_ori_cls_model`|bool|`True`|Whether to use the document image orientation classification model;|
+|`use_doc_image_unwarp_model`|bool|`True`|Whether to use the document image unwarping model;|
+|`use_seal_text_det_model`|bool|`True`|Whether to use the seal text detection model;|
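
The module switches can be combined with any of the input forms above; a minimal sketch (the local path is a placeholder):

```python
# Run visual prediction on a local PDF while skipping document unwarping;
# the other two switches keep their default value of True.
visual_result, visual_info = pipeline.visual_predict(
    "/root/data/contract.pdf",          # placeholder local path
    use_doc_image_ori_cls_model=True,   # orientation classification on
    use_doc_image_unwarp_model=False,   # skip image unwarping
    use_seal_text_det_model=True,       # seal text detection on
)
```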
+
+(3) Call the relevant methods of the visual prediction result object to save the results; the methods are as follows:
+
+|Method|Parameter|Description|
 |-|-|-|
-|save_to_img|Saves layout analysis, table recognition, and other results as an image file|`save_path`: str, the path to save to;|
-|save_to_html|Saves table recognition and other results as an HTML file|`save_path`: str, the path to save to;|
-|save_to_xlsx|Saves table recognition and other results as a spreadsheet file|`save_path`: str, the path to save to;|
+|`save_to_img`|`save_path`|Saves OCR prediction, layout analysis, and table recognition results as image files; the `save_path` parameter specifies the save path;|
+|`save_to_html`|`save_path`|Saves table recognition results as an HTML file; the `save_path` parameter specifies the save path;|
+|`save_to_xlsx`|`save_path`|Saves table recognition results as an xlsx file; the `save_path` parameter specifies the save path;|
+
+(4) Call the `chat` method of the Document Scene Information Extraction v3 pipeline object to interact with the large model; the relevant parameters are described as follows:
+
+|Parameter|Type|Default|Description|
+|-|-|-|-|
+|`key_list`|str|None|Keywords (the query) used for extraction; supports a string of multiple keywords separated by "," or ",", such as "乙方,手机号";|
+|`key_list`|list|None|Keywords (the query) used for extraction; supports a `list` of keywords whose elements are of type `str`;|
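
The two forms of `key_list` are interchangeable; a small sketch using the same keys as the example above:

```python
# Equivalent queries: a separator-delimited string, or a list of strings.
chat_result = pipeline.chat("乙方,手机号")
chat_result = pipeline.chat(["乙方", "手机号"])
chat_result.print()
```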
 
 When the above Python script is executed, the default configuration file of the Document Scene Information Extraction v3 pipeline is loaded. If you need to customize the configuration file, you can run the following command to obtain it:
 
 ```
 paddlex --get_pipeline_config PP-ChatOCRv3-doc
 ```
+
 After execution, the configuration file of the Document Scene Information Extraction v3 pipeline will be saved in the current directory. If you would like to customize the save location, you can run the following command (assuming the custom save location is `./my_path`):
 
 ```
 paddlex --get_pipeline_config PP-ChatOCRv3-doc --save_path ./my_path
 ```
+
 After obtaining the configuration file, you can customize each setting of the Document Scene Information Extraction v3 pipeline:
 
-```
+```yaml
 Pipeline:
   layout_model: RT-DETR-H_layout_3cls
   table_model: SLANet_plus
@@ -283,18 +292,21 @@ Pipeline:
 ```python
 from paddlex import create_pipeline
 
-predict = create_pipeline( pipeline="./my_path/PP-ChatOCRv3-doc.yaml",
-                            llm_name="ernie-3.5",
-                            llm_params = {"api_type":"qianfan","ak":"","sk":""} )  ## Please fill in your ak and sk; otherwise the large model cannot be called
+pipeline = create_pipeline(
+    pipeline="./my_path/PP-ChatOCRv3-doc.yaml",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # 请填入您的ak与sk,否则无法调用大模型
+    )
 
-visual_result, visual_inf = predict(["contract.pdf"])
+visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf")
 
 for res in visual_result:
     res.save_to_img("./output")
     res.save_to_html('./output')
     res.save_to_xlsx('./output')
 
-print(predict.chat("乙方,手机号"))
+chat_result = pipeline.chat(["乙方", "手机号"])
+chat_result.print()
 ```
 
 ## 3. Development Integration/Deployment
@@ -692,24 +704,18 @@ Pipeline:
 Then, simply load the modified pipeline configuration file following the command-line or Python-script approach described in the local experience section.
 
 ## 5. Multi-Hardware Support
-PaddleX supports a variety of mainstream hardware devices such as NVIDIA GPU, Kunlun XPU, Ascend NPU, and Cambricon MLU; **simply set the `--device` parameter** to switch seamlessly between them.
+PaddleX supports a variety of mainstream hardware devices such as NVIDIA GPU, Kunlun XPU, Ascend NPU, and Cambricon MLU; **simply set the `device` parameter** to switch seamlessly between them.
 
-For example, to run inference with the Document Scene Information Extraction v3 pipeline on an NVIDIA GPU, the Python script used is
+For example, when using the Document Scene Information Extraction v3 pipeline, switching the running device from an NVIDIA GPU to an Ascend NPU only requires changing `device` in the script to npu:
 
 ```python
 from paddlex import create_pipeline
-predict = create_pipeline( pipeline="PP-ChatOCRv3-doc",
-                            llm_name="ernie-3.5",
-                            llm_params = {"api_type":"qianfan","ak":"","sk":""},  ## Please fill in your ak and sk; otherwise the large model cannot be called
-                            device = "gpu:0" )
+predict = create_pipeline(
+    pipeline="PP-ChatOCRv3-doc",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""},  # 请填入您的ak与sk,否则无法调用大模型
+    device="npu:0" # gpu:0 --> npu:0
+    )
 ```
-At this point, to switch the hardware to an Ascend NPU, simply change `--device` in the script to npu:
 
-```python
-from paddlex import create_pipeline
-predict = create_pipeline( pipeline="PP-ChatOCRv3-doc",
-                            llm_name="ernie-3.5",
-                            llm_params = {"api_type":"qianfan","ak":"","sk":""},  ## Please fill in your ak and sk; otherwise the large model cannot be called
-                            device = "npu:0" )
-```
 If you would like to use the general Document Scene Information Extraction pipeline on more kinds of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../other_devices_support/installation_other_devices.md).

+ 60 - 46
docs/pipeline_usage/tutorials/information_extration_pipelines/document_scene_information_extraction_en.md

@@ -7,7 +7,7 @@ PP-ChatOCRv3-doc is a unique intelligent analysis solution for documents and ima
 
 ![](https://github.com/user-attachments/assets/90cb740b-7741-4383-bc4c-663f9d042d02)
 
-The **PP-ChatOCRv3-doc** pipeline includes modules for **Table Structure Recognition, Layout Region Detection, Text Detection, Text Recognition, Seal Text Detection, Text Image Rectification, and Document Image Orientation Classification**.
+The **PP-ChatOCRv3-doc** pipeline includes modules for **Table Structure Recognition**, **Layout Region Detection**, **Text Detection**, **Text Recognition**, **Seal Text Detection**, **Text Image Rectification**, and **Document Image Orientation Classification**.
 
 **If you prioritize model accuracy, choose a model with higher accuracy. If you prioritize inference speed, choose a model with faster inference speed. If you prioritize model storage size, choose a model with a smaller storage size.** Some benchmarks for these models are as follows:
 
@@ -186,18 +186,21 @@ A few lines of code are all you need to complete the quick inference of the pipe
 ```python
 from paddlex import create_pipeline
 
-predict = create_pipeline(pipeline="PP-ChatOCRv3-doc",
-                          llm_name="ernie-3.5",
-                          llm_params={"api_type":"qianfan","ak":"","sk":""})  ## Please fill in your ak and sk, or you cannot call the large model
+pipeline = create_pipeline(
+    pipeline="PP-ChatOCRv3-doc",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # Please fill in ak and sk, required for LLM.
+    )
 
-visual_result, visual_inf = predict(["contract.pdf"])
+visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf")
 
 for res in visual_result:
     res.save_to_img("./output")
     res.save_to_html('./output')
     res.save_to_xlsx('./output')
 
-print(predict.chat("乙方,手机号"))
+chat_result = pipeline.chat(["乙方", "手机号"])
+chat_result.print()
 ```
 **Note**: Please first obtain your ak and sk on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) (for detailed steps, please refer to the [AK and SK Authentication API Call Process](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Hlwerugt8)), and fill in your ak and sk to the specified locations to enable normal calls to the large model.
 
@@ -209,36 +212,43 @@ After running, the output is as follows:
 
 In the above Python script, the following steps are executed:
 
-(1) Instantiate the `create_pipeline` to create a PP-ChatOCRv3-doc pipeline object: Specific parameter descriptions are as follows:
+(1) Call `create_pipeline` to instantiate a PP-ChatOCRv3-doc pipeline object; the relevant parameters are described as follows:
 
-| Parameter | Description | Default | Type |
+| Parameter | Type | Default | Description |
 |-|-|-|-|
-| `pipeline` | Pipeline name or pipeline configuration file path. If it's a pipeline name, it must be supported by PaddleX. | None | str |
-| `llm_name` | Large Language Model name | "ernie-3.5" | str |
-| `llm_params` | API configuration | {} | dict |
-| `device(kwargs)` | Running device (None for automatic adaptation) | None | str/None |
+| `pipeline` | str | None | Pipeline name or pipeline configuration file path. If it's a pipeline name, it must be supported by PaddleX; |
+| `llm_name` | str | "ernie-3.5" | Large Language Model name; |
+| `llm_params` | dict | `{}` | API configuration; |
+| `device` | str/`None` | `None` | Running device (`None` means automatic selection); |
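
A minimal sketch of these parameters together (the `device` value is illustrative; fill in your own ak and sk):

```python
from paddlex import create_pipeline

# Explicit device selection; leaving device=None selects the device automatically.
pipeline = create_pipeline(
    pipeline="PP-ChatOCRv3-doc",
    llm_name="ernie-3.5",
    llm_params={"api_type": "qianfan", "ak": "", "sk": ""},
    device="gpu:0",
)
```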
 
-(2) Call the `predict` method of the PP-ChatOCRv3-doc pipeline object for inference prediction: The `predict` method parameter is `x`, used to input data to be predicted, supporting multiple input methods, as shown in the following examples:
+(2) Call the `visual_predict` method of the PP-ChatOCRv3-doc pipeline object to run visual prediction; the relevant parameters are described as follows:
 
-| Parameter Type | Description |
-|-|-|
-| Python Var | Supports directly passing Python variables, such as numpy.ndarray representing image data; |
-| str | Supports passing the path of the file to be predicted, such as the local path of an image file: /root/data/img.jpg; |
-| str | Supports passing the URL of the file to be predicted, such as [example](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf); |
-| str | Supports passing a local directory, which should contain files to be predicted, such as the local path: /root/data/; |
-| dict | Supports passing a dictionary type, where the key needs to correspond to the specific pipeline, such as "img
-
-(3) Obtain prediction results by calling the `predict` method: The `predict` method is a `generator`, so prediction results need to be obtained through calls. The `predict` method predicts data in batches, so the prediction results are represented as a list of prediction results.
-
-(4) Interact with the large model by calling the `predict.chat` method, which takes as input keywords (multiple keywords are supported) for information extraction. The prediction results are represented as a list of information extraction results.
+| Parameter | Type | Default | Description |
+|-|-|-|-|
+|`input`|Python Var|None|Supports passing Python variables directly, such as image data represented by `numpy.ndarray`;|
+|`input`|str|None|Supports passing the path of the file to be predicted, such as the local path of an image file: `/root/data/img.jpg`;|
+|`input`|str|None|Supports passing the URL of the file to be predicted, such as: `https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf`;|
+|`input`|str|None|Supports passing a local directory, which should contain the files to be predicted, such as: `/root/data/`;|
+|`input`|dict|None|Supports passing a dictionary, where the key needs to correspond to the specific pipeline, such as: `{"img": "/root/data1"}`;|
+|`input`|list|None|Supports passing a list, where the elements must be of the above types of data, such as: `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`, `[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]`;|
+|`use_doc_image_ori_cls_model`|bool|`True`|Whether to use the document image orientation classification model;|
+|`use_doc_image_unwarp_model`|bool|`True`|Whether to use the document image unwarping model;|
+|`use_seal_text_det_model`|bool|`True`|Whether to use the seal text detection model;|
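
A hedged sketch combining these switches with a local-file input (the path is a placeholder):

```python
# Visual prediction on a local PDF, skipping document unwarping;
# the remaining switches keep their default value of True.
visual_result, visual_info = pipeline.visual_predict(
    "/root/data/contract.pdf",          # placeholder local path
    use_doc_image_unwarp_model=False,   # skip image unwarping
)
```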
+
+(3) Call the relevant methods of the visual prediction result object to save the prediction results. The methods are as follows:
+
+|Function|Parameter|Description|
+|-|-|-|
+|`save_to_img`|`save_path`|Save OCR prediction results, layout results, and table recognition results as image files, with the parameter `save_path` used to specify the save path;|
+|`save_to_html`|`save_path`|Save the table recognition results as an HTML file, with the parameter `save_path` used to specify the save path;|
+|`save_to_xlsx`|`save_path`|Save the table recognition results as an Excel file, with the parameter `save_path` used to specify the save path;|
 
-(5) Process the prediction results: The prediction result for each sample is in the form of a dict, which supports printing or saving to a file. The supported file types depend on the specific pipeline, such as:
+(4) Call the `chat` method of the PP-ChatOCRv3-doc pipeline object to query information with the LLM; the relevant parameters are described as follows:
 
-| Method | Description | Method Parameters |
-|-|-|-|
-| save_to_img | Saves layout analysis, table recognition, etc. results as image files. | `save_path`: str, the file path to save. |
-| save_to_html | Saves table recognition results as HTML files. | `save_path`: str, the file path to save. |
-| save_to_xlsx | Saves table recognition results as Excel files. | `save_path`: str, the file path to save. |
+| Parameter | Type | Default | Description |
+|-|-|-|-|
+|`key_list`|str|-|Keywords used for the query; a string of multiple keywords separated by "," or ",", such as "乙方,手机号" ("Party B, phone number");|
+|`key_list`|list|-|Keywords used for the query; a `list` of keywords whose elements are of type `str`.|
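
Both forms of `key_list` are interchangeable; a small sketch using the keys from the example above:

```python
# Equivalent queries: a separator-delimited string, or a list of strings.
chat_result = pipeline.chat("乙方,手机号")
chat_result = pipeline.chat(["乙方", "手机号"])
chat_result.print()
```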
 
 When executing the above Python script, the default pipeline configuration file is loaded. If you need to customize the configuration file, you can use the following command to obtain it:
 
@@ -277,19 +287,18 @@ For example, if your configuration file is saved at `./my_path/PP-ChatOCRv3-doc.
 
 ```python
 from paddlex import create_pipeline
-
-predict = create_pipeline(pipeline="./my_path/PP-ChatOCRv3-doc.yaml",
-                          llm_name="ernie-3.5",
-                          llm_params={"api_type":"qianfan","ak":"","sk":""} )  ## Please fill in your ak and sk, or you will not be able to call the large language model
-
-visual_result, visual_inf = predict(["contract.pdf"])
-
+pipeline = create_pipeline(
+    pipeline="./my_path/PP-ChatOCRv3-doc.yaml",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""} # Please fill in ak and sk, required for LLM.
+    )
+visual_result, visual_info = pipeline.visual_predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf")
 for res in visual_result:
     res.save_to_img("./output")
     res.save_to_html('./output')
     res.save_to_xlsx('./output')
-
-print(predict.chat("乙方,手机号"))
+chat_result = pipeline.chat(["乙方", "手机号"])
+chat_result.print()
 ```
 
 ## 3. Development Integration/Deployment
@@ -590,16 +599,21 @@ Pipeline:
 Subsequently, load the modified pipeline configuration file using the command-line interface or Python script as described in the local experience section.
 
 ## 5. Multi-hardware Support
-PaddleX supports various mainstream hardware devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU. **Seamless switching between different hardware can be achieved by simply setting the `--device` parameter**.
 
-For example, to perform inference using the PP-ChatOCRv3-doc Pipeline on an NVIDIA GPU.
-At this point, if you wish to switch the hardware to Ascend NPU, simply modify the `--device` in the script to `npu`:
+PaddleX supports various mainstream devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU; **simply set the `device` parameter** to switch seamlessly between them.
+
+For example, when using the PP-ChatOCRv3-doc pipeline, switching the running device from an NVIDIA GPU to an Ascend NPU only requires changing the `device` value to npu:
 
 ```python
 from paddlex import create_pipeline
-predict = create_pipeline(pipeline="PP-ChatOCRv3-doc",
-                            llm_name="ernie-3.5",
-                            llm_params = {"api_type":"qianfan","ak":"","sk":""},  ## Please fill in your ak and sk, or you will not be able to call the large model
-                            device = "npu:0")
+predict = create_pipeline(
+    pipeline="PP-ChatOCRv3-doc",
+    llm_name="ernie-3.5",
+    llm_params={"api_type": "qianfan", "ak": "", "sk": ""},  # Please fill in ak and sk, required for LLM.
+    device="npu:0" # gpu:0 --> npu:0
+    )
 ```
+
 If you want to use the PP-ChatOCRv3-doc Pipeline on more types of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../installation/installation_other_devices_en.md).

+ 24 - 13
paddlex/inference/pipelines/ppchatocrv3/ppchatocrv3.py

@@ -184,7 +184,13 @@ class PPChatOCRPipeline(_TableRecPipeline):
             self.layout_batch_size.set_predictor(device=device)
             self.ocr_pipeline.set_predictor(device=device)
 
-    def predict(
+    def predict(self, *args, **kwargs):
+        logging.error(
+            "The PP-ChatOCRv3-doc pipeline does not support calling `predict()` directly! Please call `visual_predict(input)` first to get the visual prediction of `input`, and then call `chat(key_list)` to get the result of the query specified by `key_list`."
+        )
+        return
+
+    def visual_predict(
         self,
         input,
         use_doc_image_ori_cls_model=True,
@@ -194,6 +200,11 @@ class PPChatOCRPipeline(_TableRecPipeline):
         **kwargs,
     ):
         self.set_predictor(**kwargs)
+        if self.uvdoc_predictor and uvdoc_batch_size:
+            self.uvdoc_predictor.set_predictor(
+                batch_size=uvdoc_batch_size, device=device
+            )
+
         visual_info = {"ocr_text": [], "table_html": [], "table_text": []}
         # get all visual result
         visual_result = list(
@@ -390,7 +401,7 @@ class PPChatOCRPipeline(_TableRecPipeline):
 
         return ocr_text, table_text_list, table_html
 
-    def get_vector_text(
+    def build_vector(
         self,
         llm_name=None,
         llm_params={},
@@ -439,7 +450,7 @@ class PPChatOCRPipeline(_TableRecPipeline):
 
         return VectorResult({"vector": text_result})
 
-    def get_retrieval_text(
+    def retrieval(
         self,
         key_list,
         visual_info=None,
@@ -457,7 +468,7 @@ class PPChatOCRPipeline(_TableRecPipeline):
         is_seving = visual_info and llm_name
 
         if self.visual_flag and not is_seving:
-            self.vector = self.get_vector_text()
+            self.vector = self.build_vector()
 
         if not any([vector, self.vector]):
             logging.warning(
@@ -465,11 +476,11 @@ class PPChatOCRPipeline(_TableRecPipeline):
             )
             if is_seving:
                 # for serving
-                vector = self.get_vector_text(
+                vector = self.build_vector(
                     llm_name=llm_name, llm_params=llm_params, visual_info=visual_info
                 )
             else:
-                self.vector = self.get_vector_text()
+                self.vector = self.build_vector()
 
         if vector and llm_name:
             _vector = vector["vector"]
@@ -497,7 +508,7 @@ class PPChatOCRPipeline(_TableRecPipeline):
         user_task_description="",
         rules="",
         few_shot="",
-        use_vector=True,
+        use_retrieval=True,
         save_prompt=False,
         llm_name="ernie-3.5",
         llm_params={},
@@ -539,8 +550,8 @@ class PPChatOCRPipeline(_TableRecPipeline):
                 res = self.get_llm_result(prompt)
                 # TODO: why use one html but the whole table_text in next step
                 if list(res.values())[0] in failed_results:
-                    logging.info(
-                        "table html sequence is too much longer, using ocr directly"
+                    logging.debug(
+                        "table html sequence is too long, using ocr directly!"
                     )
                     prompt = self.get_prompt_for_ocr(
                         table_text, key_list, rules, few_shot, user_task_description
@@ -553,12 +564,12 @@ class PPChatOCRPipeline(_TableRecPipeline):
                         key_list.remove(key)
                         final_results[key] = value
         if len(key_list) > 0:
-            logging.info("get result from ocr")
+            logging.debug("get result from ocr")
             if retrieval_result:
                 ocr_text = retrieval_result.get("retrieval")
-            elif use_vector and any([visual_info, vector]):
+            elif use_retrieval and any([visual_info, vector]):
                 # for serving or local
-                ocr_text = self.get_retrieval_text(
+                ocr_text = self.retrieval(
                     key_list=key_list,
                     visual_info=visual_info,
                     vector=vector,
@@ -567,7 +578,7 @@ class PPChatOCRPipeline(_TableRecPipeline):
                 )["retrieval"]
             else:
                 # for local
-                ocr_text = self.get_retrieval_text(key_list=key_list)["retrieval"]
+                ocr_text = self.retrieval(key_list=key_list)["retrieval"]
             prompt = self.get_prompt_for_ocr(
                 ocr_text,
                 key_list,