@@ -232,12 +232,12 @@ In the above Python script, the following steps are executed:

| Parameter | Type | Default | Description |
|-|-|-|-|
-|`input`|Python Var|无|Support to pass Python variables directly, such as `numpy.ndarray` representing image data;|
-|`input`|str|无|Support to pass the path of the file to be predicted, such as the local path of an image file: `/root/data/img.jpg`;|
-|`input`|str|无|Support to pass the URL of the file to be predicted, such as: `https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf`;|
-|`input`|str|无|Support to pass the local directory, which should contain files to be predicted, such as: `/root/data/`;|
-|`input`|dict|无|Support to pass a dictionary, where the key needs to correspond to the specific pipeline, such as: `{"img": "/root/data1"}`;|
-|`input`|list|无|Support to pass a list, where the elements must be of the above types of data, such as: `[numpy.ndarray, numpy.ndarray]`,`["/root/data/img1.jpg", "/root/data/img2.jpg"]`,`["/root/data1", "/root/data2"]`,`[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]`;|
+|`input`|Python Var|-|Supports passing Python variables directly, such as `numpy.ndarray` representing image data;|
+|`input`|str|-|Supports passing the path of the file to be predicted, such as the local path of an image file: `/root/data/img.jpg`;|
+|`input`|str|-|Supports passing the URL of the file to be predicted, such as: `https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf`;|
+|`input`|str|-|Supports passing a local directory, which should contain the files to be predicted, such as: `/root/data/`;|
+|`input`|dict|-|Supports passing a dictionary, where the key needs to correspond to the specific pipeline, such as: `{"img": "/root/data1"}`;|
+|`input`|list|-|Supports passing a list, where the elements must be of the above types, such as: `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`, `[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]`;|
|`use_doc_image_ori_cls_model`|bool|`True`|Whether or not to use the orientation classification model;|
|`use_doc_image_unwarp_model`|bool|`True`|Whether or not to use the unwarp model;|
|`use_seal_text_det_model`|bool|`True`|Whether or not to use the seal text detection model;|
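+
+For illustration, each value below is a valid `input` form from the table above; the paths, directory, and dictionary key are placeholders, and the commented prediction call refers to the pipeline object created in the script above rather than a fixed API:
+
+```python
+import numpy as np
+
+img_array = np.zeros((224, 224, 3), dtype=np.uint8)  # Python Var: in-memory image data
+img_path = "/root/data/img.jpg"                      # str: local file path
+pdf_url = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/contract.pdf"  # str: URL
+img_dir = "/root/data/"                              # str: directory of files to predict
+img_dict = {"img": "/root/data1"}                    # dict: key must match this pipeline
+img_list = [img_path, img_dir, img_dict]             # list: elements of any of the above types
+
+# Any of these values can be passed as `input` to the prediction method shown in the
+# script above, optionally toggling the boolean sub-model switches from the table.
+```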
@@ -338,7 +338,7 @@ If you need to directly apply the pipeline in your Python project, you can refer

Additionally, PaddleX provides three other deployment methods, detailed as follows:

-🚀 **High-Performance Inference**: In actual production environments, many applications have stringent standards for the performance metrics (especially response speed) of deployment strategies to ensure efficient system operation and smooth user experience. To this end, PaddleX provides high-performance inference plugins aimed at deeply optimizing model inference and pre/post-processing to significantly speed up the end-to-end process. For detailed High-Performance Inference procedures, please refer to the [PaddleX High-Performance Inference Guide](../../../pipeline_deploy/high_performance_deploy_en.md).
+🚀 **High-Performance Inference**: In actual production environments, many applications have stringent standards for the performance metrics (especially response speed) of deployment strategies to ensure efficient system operation and smooth user experience. To this end, PaddleX provides high-performance inference plugins aimed at deeply optimizing model inference and pre/post-processing to significantly speed up the end-to-end process. For detailed high-performance inference procedures, please refer to the [PaddleX High-Performance Inference Guide](../../../pipeline_deploy/high_performance_inference_en.md).

☁️ **Service-Oriented Deployment**: Service-oriented deployment is a common deployment form in actual production environments. By encapsulating inference functions as services, clients can access these services through network requests to obtain inference results. PaddleX supports users in achieving low-cost service-oriented deployment of pipelines. For detailed service-oriented deployment procedures, please refer to the [PaddleX Service-Oriented Deployment Guide](../../../pipeline_deploy/service_deploy_en.md).
@@ -370,95 +370,167 @@ Operations provided by the service are as follows:

- **`analyzeImage`**

- Analyzes images using computer vision models to obtain OCR, table recognition results, etc., and extracts key information from the images.
+ Analyzes images using computer vision models to obtain OCR and table recognition results, and extracts key information from the images.

`POST /chatocr-vision`

- Request body properties:

| Name | Type | Description | Required |
- |------|------|-------------|----------|
- | `image` | `string` | The URL of an image file or PDF file accessible by the service, or the Base64 encoded result of the content of the above-mentioned file types. For PDF files with more than 10 pages, only the content of the first 10 pages will be used. | Yes |
- | `fileType` | `integer` | File type. `0` indicates a PDF file, `1` indicates an image file. If this property is not present in the request body, the service will attempt to automatically infer the file type based on the URL. | No |
- | `useOricls` | `boolean` | Whether to enable document image orientation classification. This feature is enabled by default. | No |
- | `useCurve` | `boolean` | Whether to enable seal text detection. This feature is enabled by default. | No |
- | `useUvdoc` | `boolean` | Whether to enable text image correction. This feature is enabled by default. | No |
- | `inferenceParams` | `object` | Inference parameters. | No |
+ |-|-|-|-|
+ |`image`|`string`|The URL of an accessible image file or PDF file, or the Base64 encoded content of the above file types. For PDF files with more than 10 pages, only the first 10 pages will be used. | Yes |
+ |`fileType`|`integer`|File type. `0` represents PDF files, `1` represents image files. If this property is not present in the request body, the service will attempt to infer the file type automatically based on the URL. | No |
+ |`useOricls`|`boolean`|Whether to enable document image orientation classification. This feature is enabled by default. | No |
+ |`useCurve`|`boolean`|Whether to enable seal text detection. This feature is enabled by default. | No |
+ |`useUvdoc`|`boolean`|Whether to enable text image correction. This feature is enabled by default. | No |
+ |`inferenceParams`|`object`|Inference parameters. | No |

Properties of `inferenceParams`:

| Name | Type | Description | Required |
- |------|------|-------------|----------|
- | `maxLongSide` | `integer` | During inference, if the length of the longer side of the input image for the text detection model is greater than `maxLongSide`, the image will be scaled so that the length of the longer side equals `maxLongSide`. | No |
+ |-|-|-|-|
+ |`maxLongSide`|`integer`|During inference, if the length of the longer side of the input image for the text detection model is greater than `maxLongSide`, the image will be scaled so that the length of the longer side equals `maxLongSide`. | No |

- - When the request is processed successfully, the `result` of the response body has the following properties:
+ - When the request is processed successfully, the `result` in the response body has the following properties:

| Name | Type | Description |
- |------|------|-------------|
- | `visionResults` | `array` | Analysis results obtained using computer vision models. The array length is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file. |
- | `visionInfo` | `object` | Key information in the image, which can be used as input for other operations. |
+ |-|-|-|
+ |`visionResults`|`array`|Analysis results obtained using the computer vision model. The array length is 1 (for image input) or the smaller of the number of document pages and 10 (for PDF input). For PDF input, each element in the array represents the processing result of each page in the PDF file in sequence. |
+ |`visionInfo`|`object`|Key information in the image, which can be used as input for other operations. |

Each element in `visionResults` is an `object` with the following properties:

| Name | Type | Description |
- |------|------|-------------|
- | `texts` | `array` | Text positions, contents, and scores. |
- | `tables` | `array` | Table positions and contents. |
- | `inputImage` | `string` | Input image. The image is in JPEG format and encoded using Base64. |
- | `ocrImage` | `string` | OCR result image. The image is in JPEG format and encoded using Base64. |
- | `layoutImage` | `string` | Layout area detection result image. The image is in JPEG format and encoded using Base64. |
+ |-|-|-|
+ |`texts`|`array`|Text locations, contents, and scores. |
+ |`tables`|`array`|Table locations and contents. |
+ |`inputImage`|`string`|Input image. The image is in JPEG format and encoded in Base64. |
+ |`ocrImage`|`string`|OCR result image. The image is in JPEG format and encoded in Base64. |
+ |`layoutImage`|`string`|Layout area detection result image. The image is in JPEG format and encoded in Base64. |

Each element in `texts` is an `object` with the following properties:

| Name | Type | Description |
- |------|------|-------------|
- | `poly` | `array` | Text position. The elements in the array are the vertex coordinates of the polygon enclosing the text in sequence. |
-### chat
-
-Interact with large language models to extract key information.
-
-`POST /chatocr-vision`
-
-- Request body properties:
-
- | Name | Type | Description | Required |
- |------|------|-------------|----------|
- |`keys`|`array`|List of keywords.|Yes|
- |`visionInfo`|`object`|Key information from the image. Provided by the `analyzeImage` operation.|Yes|
- |`taskDescription`|`string`|Task prompt.|No|
- |`rules`|`string`|Extraction rules. Used to customize the information extraction rules, e.g., to specify output formats.|No|
- |`fewShot`|`string`|Example prompts.|No|
- |`useVectorStore`|`boolean`|Whether to enable the vector database. Enabled by default.|No|
- |`vectorStore`|`object`|Serialized result of the vector database. Provided by the `buildVectorStore` operation.|No|
- |`retrievalResult`|`string`|Knowledge retrieval result. Provided by the `retrieveKnowledge` operation.|No|
- |`returnPrompts`|`boolean`|Whether to return the prompts used. Enabled by default.|No|
- |`llmName`|`string`|Name of the large language model.|No|
- |`llmParams`|`object`|API parameters for the large language model.|No|
-
- Currently, `llmParams` can take the following form:
-
- ```json
- {
- "apiType": "qianfan",
- "apiKey": "{Qianfan Platform API key}",
- "secretKey": "{Qianfan Platform secret key}"
- }
- ```
+ |-|-|-|
+ |`poly`|`array`|Text location. The elements in the array are the vertex coordinates of the polygon enclosing the text in sequence. |
+ |`text`|`string`|Text content. |
+ |`score`|`number`|Text recognition score. |

-- On successful request processing, the `result` in the response body has the following properties:

+ Each element in `tables` is an `object` with the following properties:

- | Name | Type | Description |
- |------|------|-------------|
- |`chatResult`|`string`|Extracted key information result.|
- |`prompts`|`object`|Prompts used.|
+ | Name | Type | Description |
+ |-|-|-|
+ |`bbox`|`array`|Table location. The elements in the array are the x-coordinate of the top-left corner, the y-coordinate of the top-left corner, the x-coordinate of the bottom-right corner, and the y-coordinate of the bottom-right corner of the bounding box in sequence. |
+ |`html`|`string`|Table recognition result in HTML format. |

- Properties of `prompts`:
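+
+ As a minimal, non-authoritative sketch, the `analyzeImage` operation can be exercised with `requests`; the service address, image file name, and the `maxLongSide` value are assumptions to adapt to your deployment:
+
+ ```python
+ import base64
+ import requests
+
+ API_BASE = "http://localhost:8080"  # assumed service address
+
+ with open("./demo.jpg", "rb") as f:  # hypothetical local image
+     image_b64 = base64.b64encode(f.read()).decode("ascii")
+
+ resp = requests.post(
+     f"{API_BASE}/chatocr-vision",
+     json={"image": image_b64, "fileType": 1, "inferenceParams": {"maxLongSide": 960}},
+ )
+ result_vision = resp.json()["result"]
+ vision_info = result_vision["visionInfo"]  # reusable as input to the other operations
+ for page in result_vision["visionResults"]:
+     print([t["text"] for t in page["texts"]])
+ ```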
+- **`buildVectorStore`**
- | Name | Type | Description |
- |------|------|-------------|
- |`ocr`|`string`|OCR prompt.|
- |`table`|`string`|Table prompt.|
- |`html`|`string`|HTML prompt.|
+
+ Builds a vector database.
+
+ `POST /chatocr-vector`
+
+ - The request body properties are as follows:
+
+ | Name | Type | Description | Required |
+ |-|-|-|-|
+ |`visionInfo`|`object`|Key information from the image. Provided by the `analyzeImage` operation.|Yes|
+ |`minChars`|`integer`|Minimum data length to enable the vector database.|No|
+ |`llmRequestInterval`|`number`|Interval time for calling the large language model API.|No|
+ |`llmName`|`string`|Name of the large language model.|No|
+ |`llmParams`|`object`|API parameters for the large language model.|No|
+
+ Currently, `llmParams` can take the following form:
+
+ ```json
+ {
+ "apiType": "qianfan",
+ "apiKey": "{Qianfan Platform API key}",
+ "secretKey": "{Qianfan Platform secret key}"
+ }
+ ```
+
+ - When the request is processed successfully, the `result` in the response body has the following properties:
+
+ | Name | Type | Description |
+ |-|-|-|
+ |`vectorStore`|`object`|Serialized result of the vector database, which can be used as input for other operations.|
+
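+ Continuing the sketch from `analyzeImage` above, a hedged call (the Qianfan key placeholders must be filled in):
+
+ ```python
+ LLM_PARAMS = {
+     "apiType": "qianfan",
+     "apiKey": "{Qianfan Platform API key}",      # fill in your key
+     "secretKey": "{Qianfan Platform secret key}",
+ }
+
+ resp = requests.post(
+     f"{API_BASE}/chatocr-vector",
+     json={"visionInfo": vision_info, "llmParams": LLM_PARAMS},
+ )
+ vector_store = resp.json()["result"]["vectorStore"]
+ ```
+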
+- **`retrieveKnowledge`**
+
+ Performs knowledge retrieval.
+
+ `POST /chatocr-retrieval`
+
+ - The request body properties are as follows:
+
+ | Name | Type | Description | Required |
+ |-|-|-|-|
+ |`keys`|`array`|List of keywords.|Yes|
+ |`vectorStore`|`object`|Serialized result of the vector database. Provided by the `buildVectorStore` operation.|Yes|
+ |`llmName`|`string`|Name of the large language model.|No|
+ |`llmParams`|`object`|API parameters for the large language model.|No|
+
+ Currently, `llmParams` can take the following form:
+
+ ```json
+ {
+ "apiType": "qianfan",
+ "apiKey": "{Qianfan Platform API key}",
+ "secretKey": "{Qianfan Platform secret key}"
+ }
+ ```
+
+ - When the request is processed successfully, the `result` in the response body has the following properties:
+
+ | Name | Type | Description |
+ |-|-|-|
+ |`retrievalResult`|`object`|The result of knowledge retrieval, which can be used as input for other operations.|
+
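+ Continuing the sketch, with an illustrative keyword list:
+
+ ```python
+ resp = requests.post(
+     f"{API_BASE}/chatocr-retrieval",
+     json={"keys": ["Name"], "vectorStore": vector_store, "llmParams": LLM_PARAMS},
+ )
+ retrieval_result = resp.json()["result"]["retrievalResult"]
+ ```
+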
+- **`chat`**
+
+ Interacts with large language models to extract key information.
+
+ `POST /chatocr-chat`
+
+ - Request body properties:
+
+ | Name | Type | Description | Required |
+ |-|-|-|-|
+ |`keys`|`array`|List of keywords.|Yes|
+ |`visionInfo`|`object`|Key information from the image. Provided by the `analyzeImage` operation.|Yes|
+ |`taskDescription`|`string`|Task prompt.|No|
+ |`rules`|`string`|Custom extraction rules, e.g., to specify output formats.|No|
+ |`fewShot`|`string`|Example prompts.|No|
+ |`vectorStore`|`object`|Serialized result of the vector database. Provided by the `buildVectorStore` operation.|No|
+ |`retrievalResult`|`object`|Result of knowledge retrieval. Provided by the `retrieveKnowledge` operation.|No|
+ |`returnPrompts`|`boolean`|Whether to return the prompts used. Enabled by default.|No|
+ |`llmName`|`string`|Name of the large language model.|No|
+ |`llmParams`|`object`|API parameters for the large language model.|No|
+
+ Currently, `llmParams` can take the following form:
+
+ ```json
+ {
+ "apiType": "qianfan",
+ "apiKey": "{Qianfan Platform API key}",
+ "secretKey": "{Qianfan Platform secret key}"
+ }
+ ```
+
+ - On successful request processing, the `result` in the response body has the following properties:
+
+ | Name | Type | Description |
+ |-|-|-|
+ |`chatResult`|`string`|Extracted key information.|
+ |`prompts`|`object`|Prompts used.|
+
+ Properties of `prompts`:
+
+ | Name | Type | Description |
+ |-|-|-|
+ |`ocr`|`string`|OCR prompt.|
+ |`table`|`string`|Table prompt.|
+ |`html`|`string`|HTML prompt.|
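+
+ Completing the sketch, `chat` chains together the outputs of the previous operations:
+
+ ```python
+ resp = requests.post(
+     f"{API_BASE}/chatocr-chat",
+     json={
+         "keys": ["Name"],
+         "visionInfo": vision_info,
+         "vectorStore": vector_store,
+         "retrievalResult": retrieval_result,
+         "returnPrompts": True,
+         "llmParams": LLM_PARAMS,
+     },
+ )
+ result_chat = resp.json()["result"]
+ print(result_chat["chatResult"])
+ ```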
</details>
@@ -523,7 +595,6 @@ if __name__ == "__main__":
f.write(base64.b64decode(res["layoutImage"]))
print(f"Output images saved at {ocr_img_path} and {layout_img_path}")
print("")
- print("="*50 + "\n\n")

payload = {
"visionInfo": result_vision["visionInfo"],
@@ -540,7 +611,6 @@ if __name__ == "__main__":
pprint.pp(resp_vector.json())
sys.exit(1)
result_vector = resp_vector.json()["result"]
- print("="*50 + "\n\n")

payload = {
"keys": keys,
@@ -556,9 +626,6 @@ if __name__ == "__main__":
pprint.pp(resp_retrieval.json())
sys.exit(1)
result_retrieval = resp_retrieval.json()["result"]
- print("Knowledge retrieval result:")
- print(result_retrieval["retrievalResult"])
- print("="*50 + "\n\n")

payload = {
"keys": keys,
@@ -566,7 +633,6 @@ if __name__ == "__main__":
"taskDescription": "",
"rules": "",
"fewShot": "",
- "useVectorStore": True,
"vectorStore": result_vector["vectorStore"],
"retrievalResult": result_retrieval["retrievalResult"],
"returnPrompts": True,
@@ -581,33 +647,44 @@ if __name__ == "__main__":
pprint.pp(resp_chat.json())
sys.exit(1)
result_chat = resp_chat.json()["result"]
- print("Prompts:")
+ print("\nPrompts:")
pprint.pp(result_chat["prompts"])
print("Final result:")
print(len(result_chat["chatResult"]))
```
+
**Note**: Please fill in your API key and secret key at `API_KEY` and `SECRET_KEY`.
+
</details>
</details>
<br/>

📱 **Edge Deployment**: Edge deployment is a method that places computing and data processing functions on user devices themselves, allowing devices to process data directly without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed edge deployment procedures, please refer to the [PaddleX Edge Deployment Guide](../../../pipeline_deploy/lite_deploy_en.md).
+
## 4. Custom Development
+
If the default model weights provided by the PP-ChatOCRv3-doc Pipeline do not meet your requirements in terms of accuracy or speed for your specific scenario, you can attempt to further **fine-tune** the existing models using **your own domain-specific or application-specific data** to enhance the recognition performance of this pipeline in your scenario.

### 4.1 Model Fine-tuning
+
Since the PP-ChatOCRv3-doc Pipeline comprises six modules, unsatisfactory performance may stem from any of these modules (note that the text image rectification module does not support customization at this time).

You can analyze images with poor recognition results and follow the guidelines below for analysis and model fine-tuning:

* Incorrect table structure detection (e.g., row/column misidentification, cell position errors) may indicate deficiencies in the table structure recognition module. You need to refer to the **Customization** section in the [Table Structure Recognition Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/table_structure_recognition_en.md) and fine-tune the table structure recognition model using your private dataset.
+
* Misplaced layout elements (e.g., incorrect positioning of tables or seals) may suggest issues with the layout detection module. Consult the **Customization** section in the [Layout Detection Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/layout_detection_en.md) and fine-tune the layout detection model with your private dataset.
+
* Frequent undetected text (i.e., text leakage) may indicate limitations in the text detection model. Refer to the **Customization** section in the [Text Detection Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/text_detection_en.md) and fine-tune the text detection model using your private dataset.
+
* High text recognition errors (i.e., recognized text content does not match the actual text) suggest that the text recognition model requires improvement. Follow the **Customization** section in the [Text Recognition Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/text_recognition_en.md) to fine-tune the text recognition model.
+
* Frequent recognition errors in detected seal text indicate that the seal text detection model needs further refinement. Consult the **Customization** section in the [Seal Text Detection Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/seal_text_detection_en.md) to fine-tune the seal text detection model.
+
* Frequent misidentifications of document or certificate orientations with text regions suggest that the document image orientation classification model requires improvement. Refer to the **Customization** section in the [Document Image Orientation Classification Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/doc_img_orientation_classification_en.md) to fine-tune the document image orientation classification model.

### 4.2 Model Deployment
+
After fine-tuning your models using your private dataset, you will obtain local model weights files.

To use the fine-tuned model weights, simply modify the pipeline configuration file by replacing the local paths of the default model weights with those of your fine-tuned models:
@@ -642,4 +719,5 @@ pipeline = create_pipeline(
device="npu:0" # gpu:0 --> npu:0
)
```
-If you want to use the PP-ChatOCRv3-doc Pipeline on more types of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../installation/installation_other_devices_en.md).
+
+If you want to use the PP-ChatOCRv3-doc Pipeline on more types of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../installation/multi_devices_use_guide_en.md).