8 months ago · 9155d03cf8
--- a/README.md
+++ b/README.md
@@ -43,7 +43,7 @@ PaddleX 3.0 是基于飞桨框架构建的低代码开发工具，它集成了
 
				 
			
 
				 🔥🔥 **2025.2.14**，PaddleX v3.0.0rc0 重磅升级。本次版本全面适配 PaddlePaddle 3.0rc0，核心升级如下：
			
 
				 
			
 
				-- 新增 12 条高价值产线，重磅推出自研 **[版面解析v2产线](docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.md)**、**[PP-ChatOCRv4-doc产线](docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.md)**、**[表格识别v2产线](docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md)**。此外新增了文档处理、旋转框检测、开放词汇检测/分割、视频分析、多语种语音识别、3D 等场景的产线。
			
 
				+- 新增 12 条高价值产线，重磅推出自研 **[通用版面解析v3产线](docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.md)**、**[PP-ChatOCRv4-doc产线](docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.md)**、**[表格识别v2产线](docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md)**。此外新增了文档处理、旋转框检测、开放词汇检测/分割、视频分析、多语种语音识别、3D 等场景的产线。
			
 
				 
			
 
				 - 扩充 48 个前沿模型，包括重磅推出的 OCR 领域的**版面区域检测模型 [PP-DocLayout](docs/module_usage/tutorials/ocr_modules/layout_detection.md)**、**公式识别模型 [PP-FormulaNet](docs/module_usage/tutorials/ocr_modules/formula_recognition.md)**，**表格结构识别模型 [SLANeXt](docs/module_usage/tutorials/ocr_modules/table_structure_recognition.md)**，**文本识别模型 [PP-OCRv4_server_rec_doc](docs/module_usage/tutorials/ocr_modules/text_recognition.md)**。CV 领域的 3D 检测、人体关键点、开放词汇检测/分割模型，以及语音识别领域的 Whisper 系列等模型。
			
 
				 
			
--- a/README_en.md
+++ b/README_en.md
@@ -46,7 +46,7 @@ PaddleX 3.0 is a low-code development tool for AI models built on the PaddlePadd
 
				 
			
 
				 🔥🔥 **2025.2.14**, PaddleX v3.0.0rc0 major upgrade. This version fully adapts to PaddlePaddle 3.0rc0, with the following core upgrades:
			
 
				 
			
 
				-- **Added 12 high-value pipelines**, launching self-developed **[Layout Parsing v2 Pipeline](docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.en.md)**, **[PP-ChatOCRv4-doc Pipeline](docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.en.md)**, **[Table Recognition v2 Pipeline](docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md)**. Additionally, new pipelines for document processing, rotated box detection, open vocabulary detection/segmentation, video analysis, multilingual speech recognition, 3D, and other scenarios have been added.
			
 
				+- **Added 12 high-value pipelines**, launching self-developed **[PP-StructureV3 Pipeline](docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.en.md)**, **[PP-ChatOCRv4-doc Pipeline](docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.en.md)**, **[Table Recognition v2 Pipeline](docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md)**. Additionally, new pipelines for document processing, rotated box detection, open vocabulary detection/segmentation, video analysis, multilingual speech recognition, 3D, and other scenarios have been added.
			
 
				 
			
 
				 - **Expanded 48 cutting-edge models**, including the major releases in the OCR field such as **Document Layout Detection Model [PP-DocLayout](docs/module_usage/tutorials/ocr_modules/layout_detection.en.md)**, **Formula Recognition Model [PP-FormulaNet](docs/module_usage/tutorials/ocr_modules/formula_recognition.en.md)**, **Table Structure Recognition Model [SLANeXt](docs/module_usage/tutorials/ocr_modules/table_structure_recognition.en.md)**, **Text Recognition Model [PP-OCRv4_server_rec_doc](docs/module_usage/tutorials/ocr_modules/text_recognition.en.md)**. In the CV field, models for 3D detection, human keypoints, open vocabulary detection/segmentation, and in the speech recognition field, models from the Whisper series, among others.
			
 
				 
			
--- a/api_examples/pipelines/test_layout_parsing_v2.py
+++ b/api_examples/pipelines/test_layout_parsing_v2.py
@@ -14,7 +14,7 @@
 
				 
			
 
				 from paddlex import create_pipeline
			
 
				 
			
 
				-pipeline = create_pipeline(pipeline="layout_parsing_v2")
			
 
				+pipeline = create_pipeline(pipeline="PP-StructureV3")
			
 
				 
			
 
				 output = pipeline.predict(
			
 
				     "./test_samples/demo_paper.png",
			
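The hunk above changes the name passed to `create_pipeline` from `layout_parsing_v2` to `PP-StructureV3`. For downstream scripts that still use the old name, the same substitution could be applied mechanically. The following is only a sketch: the helper `migrate_pipeline_names` and the `RENAMED_PIPELINES` table are hypothetical and not part of PaddleX, and the only mapping taken from this diff is the single pair shown.

```python
import re

# Old -> new pipeline identifiers. Only this one pair comes from the diff;
# extend the table yourself if other names change (assumption: none listed here).
RENAMED_PIPELINES = {"layout_parsing_v2": "PP-StructureV3"}

# Matches pipeline="<name>" or pipeline='<name>' and captures the quote style.
_PIPELINE_ARG = re.compile(r"""(pipeline\s*=\s*)(["'])([^"']+)\2""")


def migrate_pipeline_names(source: str) -> str:
    """Rewrite renamed pipeline identifiers inside `pipeline=...` arguments.

    Hypothetical helper: names that are not in RENAMED_PIPELINES
    (for example a path to a YAML config) are left untouched.
    """

    def _swap(match: re.Match) -> str:
        prefix, quote, name = match.groups()
        return f"{prefix}{quote}{RENAMED_PIPELINES.get(name, name)}{quote}"

    return _PIPELINE_ARG.sub(_swap, source)


if __name__ == "__main__":
    old_line = 'pipeline = create_pipeline(pipeline="layout_parsing_v2")'
    print(migrate_pipeline_names(old_line))
```

Restricting the match to the `pipeline=` keyword argument keeps the rewrite from touching unrelated occurrences of the old string, such as file names.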
--- a/docs/CHANGLOG.en.md
+++ b/docs/CHANGLOG.en.md
@@ -12,7 +12,7 @@ PaddleX 3.0 rc0 is fully compatible with PaddlePaddle 3.0rc0 version, adding 10+
 
				 - <b>New pipelines</b>:
			
 
				   - <b>[Document Image Preprocessing Pipeline](pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.en.md)</b>, supporting the correction of rotated and distorted document images.
			
 
				   - <b>[PP-ChatOCRv4-doc Pipeline](pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.en.md)</b>, which integrates multimodal capabilities on the basis of the Document PP-ChatOCRv3-doc pipeline, enhances OCR recognition capabilities, optimizes Prompts, and ultimately improves the accuracy of document information extraction by 15 percentage points. Supports local large model OpenAI interface calls.
			
 
				-  - <b>[Layout Parsing v3 Pipeline](pipeline_usage/tutorials/ocr_pipelines/layout_parsing.en.md)</b>, the core solution of PP-StructureV3. Based on the General Layout Parsing v1 pipeline, it optimizes layout area detection, table recognition, formula recognition, and reading order recovery capabilities, supports converting different types of document images and document PDF files into standard Markdown files, and performs strongly in document recovery capabilities in most scenarios.
			
 
				+  - <b>[PP-StructureV3 Pipeline](pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.en.md)</b>, the core solution of PP-StructureV3. Based on the General Layout Parsing v1 pipeline, it optimizes layout area detection, table recognition, formula recognition, and reading order recovery capabilities, supports converting different types of document images and document PDF files into standard Markdown files, and performs strongly in document recovery capabilities in most scenarios.
			
 
				   - <b>[Table Recognition v2 Pipeline](pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md)</b>, adopting a multi-model series networking solution of "table classification + table structure recognition + cell detection" to achieve higher precision end-to-end table recognition.
			
 
				   - <b>[Rotated Object Detection Pipeline](pipeline_usage/tutorials/cv_pipelines/rotated_object_detection.en.md)</b>, supporting the detection of rotated objects.
			
 
				   - <b>[Human Keypoint Detection Pipeline](pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.en.md)</b>, supporting precise acquisition of human keypoint positions such as shoulders, elbows, knees, etc., for pose estimation and behavior recognition.
			
--- a/docs/CHANGLOG.md
+++ b/docs/CHANGLOG.md
@@ -11,7 +11,7 @@ PaddleX 3.0 rc0 全面适配 PaddlePaddle 3.0rc0 版本，新增10+条产线，4
 
				 - 新增产线：
			
 
				   - 新增[文档预处理产线](pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.md)，支持将矫正旋转和扭曲的文档图像。
			
 
				   - 新增[文档场景信息抽取v4产线](pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.md)，在文档场景信息抽取v3产线的基础上，融合了多模态能力，增强了OCR识别能力，优化了Prompt，最终文档信息抽取的准确率提升15个百分点。支持本地大模型OpenAI接口调用。
			
 
				-  - 新增[通用版面解析v3产线](pipeline_usage/tutorials/ocr_pipelines/layout_parsing.md)，PP-StructureV3 的核心方案。在通用版面解析v1产线的基础上，优化了版面区域检测、表格识别、公式识别、阅读顺序恢复的能力，支持将不同类型的文档图像和文档PDF文件转换为标准的Markdown文件，在大多数场景的文档恢复能力表现强劲。
			
 
				+  - 新增[通用版面解析v3产线](pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.md)，PP-StructureV3 的核心方案。在通用版面解析v1产线的基础上，优化了版面区域检测、表格识别、公式识别、阅读顺序恢复的能力，支持将不同类型的文档图像和文档PDF文件转换为标准的Markdown文件，在大多数场景的文档恢复能力表现强劲。
			
 
				   - 新增[通用表格识别v2产线](pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md)，采用了“表格分类+表格结构识别+单元格检测”的多模型串联组网方案，实现更高精度的端到端表格识别。
			
 
				   - 新增[旋转框检测产线](pipeline_usage/tutorials/cv_pipelines/rotated_object_detection.md)，支持对旋转目标进行检测。
			
 
				   - 新增[人体关键点检测产线](pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.md)，支持精确获取人体的关键点位置，如肩膀、肘部、膝盖等，从而进行姿态估计和行为识别。
			
--- a/docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v3.en.md
+++ b/docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v3.en.md
@@ -2,12 +2,12 @@
 
				 comments: true
			
 
				 ---
			
 
				 
			
 
				-# Universal PP-StructureV3 Pipeline Tutorial
			
 
				+# PP-StructureV3 Pipeline Tutorial
			
 
				 
			
 
				-## 1. Introduction to Universal PP-StructureV3 pipeline
			
 
				-Layout parsing is a technology that extracts structured information from document images, primarily used to convert complex document layouts into machine-readable data formats. This technology is widely applied in document management, information extraction, and data digitization. By combining Optical Character Recognition (OCR), image processing, and machine learning algorithms, layout parsing can identify and extract text blocks, headings, paragraphs, images, tables, and other layout elements from documents. The process typically involves three main steps: layout detection, element analysis, and data formatting, ultimately generating structured document data to improve the efficiency and accuracy of data processing. <b>The Universal PP-StructureV3 pipeline, based on the v1 pipeline, enhances the capabilities of layout region detection, table recognition, and formula recognition, and adds the ability to restore multi-column reading order and convert results into Markdown files. It performs excellently on various document data and can handle more complex document data.</b> This pipeline also provides flexible serving deployment options, supporting the use of multiple programming languages on various hardware. Moreover, this pipeline offers the capability for custom development; you can train and optimize models on your own dataset based on this pipeline, and the trained models can be seamlessly integrated.
			
 
				+## 1. Introduction to PP-StructureV3 pipeline
			
 
				+Layout parsing is a technology that extracts structured information from document images, primarily used to convert complex document layouts into machine-readable data formats. This technology is widely applied in document management, information extraction, and data digitization. By combining Optical Character Recognition (OCR), image processing, and machine learning algorithms, layout parsing can identify and extract text blocks, headings, paragraphs, images, tables, and other layout elements from documents. The process typically involves three main steps: layout detection, element analysis, and data formatting, ultimately generating structured document data to improve the efficiency and accuracy of data processing. <b>The PP-StructureV3 pipeline, based on the v1 pipeline, enhances the capabilities of layout region detection, table recognition, and formula recognition, and adds the ability to restore multi-column reading order and convert results into Markdown files. It performs excellently on various document data and can handle more complex document data.</b> This pipeline also provides flexible serving deployment options, supporting the use of multiple programming languages on various hardware. Moreover, this pipeline offers the capability for custom development; you can train and optimize models on your own dataset based on this pipeline, and the trained models can be seamlessly integrated.
			
 
				 
			
 
				-<b>The Universal PP-StructureV3 pipeline includes a mandatory layout region analysis module and a general OCR sub-pipeline,</b> as well as optional sub-pipelines for document image preprocessing, table recognition, seal recognition, and formula recognition.
			
 
				+<b>The PP-StructureV3 pipeline includes a mandatory layout region analysis module and a general OCR sub-pipeline,</b> as well as optional sub-pipelines for document image preprocessing, table recognition, seal recognition, and formula recognition.
			
 
				 
			
 
				 <b>If you prioritize model accuracy, choose a high-accuracy model; if you prioritize model inference speed, choose a faster inference model; if you prioritize model storage size, choose a smaller storage model.</b>
			
 
				 <details><summary>👉 Model List Details</summary>
			
@@ -599,17 +599,17 @@ The ultra-lightweight cyrillic alphabet recognition model trained based on the P
 
				 </details>
			
 
				 
			
 
				 ## 2. Quick Start
			
 
				-All the model pipelines provided by PaddleX can be quickly experienced. You can use the command line or Python on your local machine to experience the effect of the General PP-StructureV3 pipeline.
			
 
				+All the model pipelines provided by PaddleX can be quickly experienced. You can use the command line or Python on your local machine to experience the effect of the PP-StructureV3 pipeline.
			
 
				 
			
 
				-Before using the General PP-StructureV3 pipeline locally, please ensure that you have completed the installation of the PaddleX wheel package according to the [PaddleX Local Installation Guide](../../../installation/installation.en.md).
			
 
				+Before using the PP-StructureV3 pipeline locally, please ensure that you have completed the installation of the PaddleX wheel package according to the [PaddleX Local Installation Guide](../../../installation/installation.en.md).
			
 
				 
			
 
				 ### 2.1 Experiencing via Command Line
			
 
				 
			
 
				-You can quickly experience the PP-StructureV3 pipeline with a single command. Use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/layout_parsing_v3_demo.png) and replace `--input` with the local path to perform prediction.
			
 
				+You can quickly experience the PP-StructureV3 pipeline with a single command. Use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structrue_v3_demo.png) and replace `--input` with the local path to perform prediction.
			
 
				 
			
 
				 ```
			
 
				 paddlex --pipeline PP-StructureV3 \
			
 
				-        --input layout_parsing_v3_demo.png \
			
 
				+        --input pp_structrue_v3_demo.png \
			
 
				         --use_doc_orientation_classify False \
			
 
				         --use_doc_unwarping False \
			
 
				         --use_textline_orientation False \
			
@@ -661,7 +661,7 @@ The result parameter description can be found in the result interpretation in [2
 
				 <b>Note:</b> Since the default model of the pipeline is relatively large, the inference speed may be slow. You can refer to the model list in Section 1 and replace it with a model that has faster inference speed.
			
 
				 
			
 
				 ### 2.2 Python Script Integration
			
 
				-Just a few lines of code can complete the quick inference of the pipeline. Taking the General PP-StructureV3 pipeline as an example:
			
 
				+Just a few lines of code can complete the quick inference of the pipeline. Taking the PP-StructureV3 pipeline as an example:
			
 
				 
			
 
				 ```python
			
 
				 from paddlex import create_pipeline
			
@@ -669,7 +669,7 @@ from paddlex import create_pipeline
 
				 pipeline = create_pipeline(pipeline="PP-StructureV3")
			
 
				 
			
 
				 output = pipeline.predict(
			
 
				-    input="./layout_parsing_v3_demo.png",
			
 
				+    input="./pp_structrue_v3_demo.png",
			
 
				     use_doc_orientation_classify=False,
			
 
				     use_doc_unwarping=False,
			
 
				     use_textline_orientation=False,
			
@@ -1300,7 +1300,7 @@ If you have obtained the configuration file, you can customize the PP-StructureV
 
				 from paddlex import create_pipeline
			
 
				 pipeline = create_pipeline(pipeline="./my_path/PP-StructureV3.yaml")
			
 
				 output = pipeline.predict(
			
 
				-    input="./layout_parsing_v3_demo.png",,
			
 
				+    input="./pp_structrue_v3_demo.png",
			
 
				     use_doc_orientation_classify=False,
			
 
				     use_doc_unwarping=False,
			
 
				     use_textline_orientation=False,
			
@@ -1312,7 +1312,7 @@ for res in output:
 
				     res.save_to_markdown(save_path="output") ## Save the result of the current image in Markdown format
			
 
				 ```
			
 
				 
			
 
				-<b>Note:</b> The parameters in the configuration file are the pipeline initialization parameters. If you wish to change the initialization parameters of the General PP-StructureV3 pipeline, you can directly modify the parameters in the configuration file and load the configuration file for prediction. Additionally, CLI prediction also supports passing in a configuration file, simply specify the path of the configuration file with `--pipeline`.
			
 
				+<b>Note:</b> The parameters in the configuration file are the pipeline initialization parameters. If you wish to change the initialization parameters of the PP-StructureV3 pipeline, you can directly modify the parameters in the configuration file and load the configuration file for prediction. Additionally, CLI prediction also supports passing in a configuration file, simply specify the path of the configuration file with `--pipeline`.
			
 
				 
			
 
				 ## 3. Development Integration/Deployment
			
 
				 If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
			
@@ -1707,12 +1707,12 @@ for i, res in enumerate(result["layoutParsingResults"]):
 
				 You can choose the appropriate deployment method based on your needs to integrate the model into your pipeline and proceed with subsequent AI application integration.
			
 
				 
			
 
				 ## 4. Custom Development
			
 
				-If the default model weights provided by the General PP-StructureV3 pipeline do not meet your requirements in terms of accuracy or speed, you can try to <b>fine-tune</b> the existing model using <b>your own domain-specific or application-specific data</b> to improve the recognition performance of the General PP-StructureV3 pipeline in your scenario.
			
 
				+If the default model weights provided by the PP-StructureV3 pipeline do not meet your requirements in terms of accuracy or speed, you can try to <b>fine-tune</b> the existing model using <b>your own domain-specific or application-specific data</b> to improve the recognition performance of the PP-StructureV3 pipeline in your scenario.
			
 
				 
			
 
				 ### 4.1 Model Fine-Tuning
			
 
				-Since the General PP-StructureV3 pipeline consists of 7 modules, the unsatisfactory performance of the pipeline may originate from any one of these modules.
			
 
				+Since the PP-StructureV3 pipeline consists of 7 modules, the unsatisfactory performance of the pipeline may originate from any one of these modules.
			
 
				 
			
 
				-Since the General PP-StructureV3 pipeline includes several modules, the unsatisfactory performance of the pipeline may originate from any one of these modules. You can analyze the cases with poor extraction results, identify which module is problematic through visualizing the images, and refer to the corresponding fine-tuning tutorial links in the table below to fine-tune the model.
			
 
				+Since the PP-StructureV3 pipeline includes several modules, the unsatisfactory performance of the pipeline may originate from any one of these modules. You can analyze the cases with poor extraction results, identify which module is problematic through visualizing the images, and refer to the corresponding fine-tuning tutorial links in the table below to fine-tune the model.
			
 
				 
			
 
				 <table>
			
 
				 <thead>
			
@@ -1819,7 +1819,7 @@ For example, if you use Ascend NPU for PP-StructureV3 pipeline inference, the CL
 
				 
			
 
				 ```bash
			
 
				 paddlex --pipeline PP-StructureV3 \
			
 
				-        --input layout_parsing_v3_demo.png  \
			
 
				+        --input pp_structrue_v3_demo.png  \
			
 
				         --use_doc_orientation_classify False \
			
 
				         --use_doc_unwarping False \
			
 
				         --use_textline_orientation False \
			
@@ -1829,4 +1829,4 @@ paddlex --pipeline PP-StructureV3 \
 
				 
			
 
				 Of course, you can also specify the hardware device when calling `create_pipeline()` or `predict()` in your Python script.
			
 
				 
			
 
				-If you want to use the Universal PP-StructureV3 pipeline on a wider range of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../other_devices_support/multi_devices_use_guide.en.md).
			
 
				+If you want to use the PP-StructureV3 pipeline on a wider range of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../other_devices_support/multi_devices_use_guide.en.md).
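The CLI invocations in this file differ from the old ones only in the `--input` file name. As a rough illustration, the command could be assembled programmatically so that the file name lives in one place. The helper `build_paddlex_command` is hypothetical; it uses only the flags visible in the hunks above, which are truncated, so the real command may carry further options. The sketch builds the argument list and does not run `paddlex`.

```python
import shlex


def build_paddlex_command(
    input_path: str,
    use_doc_orientation_classify: bool = False,
    use_doc_unwarping: bool = False,
    use_textline_orientation: bool = False,
) -> list[str]:
    """Return the argv list for a PP-StructureV3 CLI run (hypothetical helper).

    Only flags that appear in the diff are emitted; booleans are rendered
    as the literal strings "True"/"False", matching the documented commands.
    """
    argv = ["paddlex", "--pipeline", "PP-StructureV3", "--input", input_path]
    flags = {
        "use_doc_orientation_classify": use_doc_orientation_classify,
        "use_doc_unwarping": use_doc_unwarping,
        "use_textline_orientation": use_textline_orientation,
    }
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    # Print a shell-safe rendering; pass the list to subprocess.run to execute it.
    print(shlex.join(build_paddlex_command("pp_structrue_v3_demo.png")))
```

Returning a list rather than a single string avoids shell-quoting problems when the input path contains spaces.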
			
--- a/docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.md
+++ b/docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.md
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -331,7 +331,7 @@ nav:
 
				          - 通用表格识别产线: pipeline_usage/tutorials/ocr_pipelines/table_recognition.md
			
 
				          - 通用表格识别v2产线: pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md
			
 
				          - 通用版面解析产线: pipeline_usage/tutorials/ocr_pipelines/layout_parsing_.md
			
 
				-         - 通用版面解析v2产线: pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.md
			
 
				+         - 通用版面解析v3产线: pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.md
			
 
				          - 公式识别产线: pipeline_usage/tutorials/ocr_pipelines/formula_recognition.md
			
 
				          - 印章文本识别产线: pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md
			
 
				          - 文档图像预处理产线: pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.md
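Because this commit renames a documentation file and updates the `nav` entry to match, it may be worth checking that every `nav` target still exists on disk. The sketch below assumes simple one-line entries of the form `- title: path.md`, as in the hunk above, and a docs root passed in by the caller. The functions `nav_targets` and `missing_targets` are hypothetical and use a regular expression rather than a YAML parser, so nested or multi-line entries are out of scope.

```python
import re
from pathlib import Path

# One-line nav entry: "- <title>: <path>.md" with arbitrary leading indentation.
_NAV_ENTRY = re.compile(r"^\s*-\s+(?P<title>[^:]+):\s+(?P<path>\S+\.md)\s*$")


def nav_targets(nav_lines):
    """Extract (title, relative_path) pairs from simple mkdocs nav lines."""
    targets = []
    for line in nav_lines:
        match = _NAV_ENTRY.match(line)
        if match:
            targets.append((match.group("title").strip(), match.group("path")))
    return targets


def missing_targets(nav_lines, docs_root):
    """Return the nav entries whose file does not exist under docs_root."""
    root = Path(docs_root)
    return [
        (title, path)
        for title, path in nav_targets(nav_lines)
        if not (root / path).is_file()
    ]


if __name__ == "__main__":
    with open("mkdocs.yml", encoding="utf-8") as handle:
        # "docs" is assumed to be the documentation root of the repository.
        for title, path in missing_targets(handle, "docs"):
            print(f"missing: {path} ({title})")
```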