7 months ago · 6495b3f30f
--- a/README.md
+++ b/README.md
@@ -401,6 +401,16 @@ PaddleX的各个产线均支持本地**快速推理**，部分模型支持在[AI
 
				         <td>✅</td>
			
 
				         <td>🚧</td>
			
 
				     </tr>
			
 
				+    <tr>
			
 
				+        <td><a href="https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.html">文档理解</a></td>
			
 
				+        <td>🚧</td>
			
 
				+        <td>✅</td>
			
 
				+        <td>🚧</td>
			
 
				+        <td>✅</td>
			
 
				+        <td>🚧</td>
			
 
				+        <td>🚧</td>
			
 
				+        <td>🚧</td>
			
 
				+    </tr>
			
 
				 
			
 
				 
			
 
				 </table>
			
@@ -697,6 +707,7 @@ for res in output:
 
				 | 多语种语音识别       | `multilingual_speech_recognition`                           | [多语种语音识别产线Python脚本使用说明](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/time_series_pipelines/multilingual_speech_recognition.html#212-python脚本方式集成)                 |
			
 
				 | 通用视频分类       | `video_classification`                           | [通用视频分类产线Python脚本使用说明](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/time_series_pipelines/video_classification.html#22-python脚本方式集成)                 |
			
 
				 | 通用视频检测       | `video_detection`                           | [通用视频检测产线Python脚本使用说明](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/time_series_pipelines/video_detection.html#212-python脚本方式集成)                 |
			
 
				+| 文档理解       | `doc_understanding`                           | [文档理解产线Python脚本使用说明](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.html#211-python脚本方式集成)                 |
			
 
				 
			
 
				 </details>
			
 
				 
			
@@ -775,6 +786,12 @@ for res in output:
 
				     * [📈 通用视频分类产线使用教程](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/video_pipelines/video_classification.html)
			
 
				     * [🔍 通用视频检测产线使用教程](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/video_pipelines/video_detection.html)
			
 
				 
			
 
				+* <details open>
			
 
				+    <summary> <b> 🌐 多模态视觉语言模型</b> </summary>
			
 
				+
			
 
				+   * [📝 文档理解产线使用教程](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.html)
			
 
				+  </details>
			
 
				+
			
 
				 * <details>
			
 
				     <summary> <b>🔧 相关说明文件</b> </summary>
			
 
				 
			
@@ -883,6 +900,12 @@ for res in output:
 
				   * [📈 视频分类模块使用教程](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/video_modules/video_classification.html)
			
 
				   * [🔍 视频检测模块使用教程](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/video_modules/video_detection.html)
			
 
				 
			
 
				+* <details open>
			
 
				+  <summary> <b> 🌐 多模态视觉语言模型 </b></summary>
			
 
				+
			
 
				+  * [📝 文档类视觉语言模型模块使用教程](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/vlm_modules/doc_vlm.html)
			
 
				+  </details>
			
 
				+
			
 
				 * <details>
			
 
				   <summary> <b> 📄 相关说明文件 </b></summary>
			
 
				 
			
--- a/README_en.md
+++ b/README_en.md
@@ -399,6 +399,16 @@ In addition, PaddleX provides developers with a full-process efficient model tra
 
				         <td>✅</td>
			
 
				         <td>🚧</td>
			
 
				     </tr>
			
 
				+    <tr>
			
 
				+        <td><a href="https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.html">Document Understanding</a></td>
			
 
				+        <td>🚧</td>
			
 
				+        <td>✅</td>
			
 
				+        <td>🚧</td>
			
 
				+        <td>✅</td>
			
 
				+        <td>🚧</td>
			
 
				+        <td>🚧</td>
			
 
				+        <td>🚧</td>
			
 
				+    </tr>
			
 
				 
			
 
				 </table>
			
 
				 
			
@@ -696,6 +706,7 @@ To use the Python script for other pipelines, simply adjust the `pipeline` param
 
				 | Multilingual Speech Recognition | `multilingual_speech_recognition` | [Instructions for Using the Multilingual Speech Recognition Pipeline Python Script](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/time_series_pipelines/multilingual_speech_recognition.html#212-python-script-integration) |
			
 
				 | General Video Classification | `video_classification` | [Instructions for Using the General Video Classification Pipeline Python Script](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/time_series_pipelines/video_classification.html#22-python-script-integration) |
			
 
				 | General Video Detection | `video_detection` | [Instructions for Using the General Video Detection Pipeline Python Script](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/time_series_pipelines/video_detection.html#212-python-script-integration) |
			
 
				+| Document Understanding       | `doc_understanding`                           | [Instructions for Using the Document Understanding Pipeline Python Script](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.html#211-python-script-integration)   |
			
 
				 </details>
			
 
				 
			
 
				 ## 📖 Documentation
			
@@ -772,6 +783,12 @@ To use the Python script for other pipelines, simply adjust the `pipeline` param
 
				     * [🔍 General Video Detection Pipeline Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/video_pipelines/video_detection.html)
			
 
				 
			
 
				 * <details open>
			
 
				+    <summary> <b> 🌐 Multimodal Vision-Language Model</b> </summary>
			
 
				+
			
 
				+   * [📝 Document Understanding Pipeline Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.html)
			
 
				+  </details>
			
 
				+
			
 
				+* <details open>
			
 
				     <summary> <b>🔧 Related Instructions</b> </summary>
			
 
				 
			
 
				    * [🖥️ PaddleX pipeline Command Line Instruction](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/instructions/pipeline_CLI_usage.html)
			
@@ -884,6 +901,12 @@ To use the Python script for other pipelines, simply adjust the `pipeline` param
 
				   * [🔍 Video Detection Module Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/video_modules/video_detection.html)
			
 
				 
			
 
				 * <details open>
			
 
				+  <summary> <b> 🌐 Multimodal Vision-Language Model </b></summary>
			
 
				+
			
 
				+  * [📝 Document Vision-Language Model Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/vlm_modules/doc_vlm.html)
			
 
				+  </details>
			
 
				+
			
 
				+* <details open>
			
 
				   <summary> <b> 📄 Related Instructions </b></summary>
			
 
				 
			
 
				   * [📝 PaddleX Single Model Python Script Instruction](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/instructions/model_python_API.html)
			
--- a/docs/index.en.md
+++ b/docs/index.en.md
@@ -1180,7 +1180,7 @@ The following steps were executed:
 
				 * Process the prediction results
			
 
				 
			
 
				 
			
 
				-!!! example "OCR-related Python"
			
 
				+!!! example "OCR-related Python Usage"
			
 
				 
			
 
				     === "OCR"
			
 
				 
			
@@ -1342,7 +1342,7 @@ The following steps were executed:
 
				             res.save_to_json(save_path="./output/")
			
 
				         ```
			
 
				 
			
 
				-!!! example "Computer Vision Pipeline Command-Line Usage"
			
 
				+!!! example "Computer Vision Pipeline Python Usage"
			
 
				 
			
 
				     === "General Image Classification"
			
 
				 
			
@@ -1526,7 +1526,7 @@ The following steps were executed:
 
				             res.save_to_json(save_path="./output/")
			
 
				         ```
			
 
				 
			
 
				-!!! example "Command Line Usage for Time Series pipelines"
			
 
				+!!! example "Time Series pipelines Python Usage"
			
 
				 
			
 
				     === "Time Series Forecasting"
			
 
				 
			
@@ -1568,7 +1568,7 @@ The following steps were executed:
 
				             res.save_to_json(save_path="./output/") ## Save results in JSON format
			
 
				         ```
			
 
				 
			
 
				-!!! example "Command Line Usage for Speech pipelines"
			
 
				+!!! example "Speech pipelines Python Usage"
			
 
				 
			
 
				     === "Multilingual Speech Recognition"
			
 
				 
			
@@ -1583,7 +1583,7 @@ The following steps were executed:
 
				             res.save_to_json(save_path="./output/")
			
 
				         ```
			
 
				 
			
 
				-!!! example "Command Line Usage for Video pipelines"
			
 
				+!!! example "Video pipelines Python Usage"
			
 
				 
			
 
				     === "General Video Classification"
			
 
				 
			
@@ -1612,6 +1612,25 @@ The following steps were executed:
 
				             res.save_to_json(save_path="./output/") ## Save the structured prediction output
			
 
				         ```
			
 
				 
			
 
				+!!! example "Multimodal Vision-Language Model pipelines Python Usage"
			
 
				+
			
 
				+    === "doc_understanding"
			
 
				+
			
 
				+        ```python
			
 
				+        from paddlex import create_pipeline
			
 
				+        pipeline = create_pipeline(pipeline="doc_understanding")
			
 
				+        output = pipeline.predict(
			
 
				+            {
			
 
				+                "image": "medal_table.png",
			
 
				+                "query": "识别这份表格的内容"
			
 
				+            }
			
 
				+        )
			
 
				+        for res in output:
			
 
				+            res.print() ## Print the structured prediction output
			
 
				+            res.save_to_json("./output/") ## Save the structured prediction output
			
 
				+        ```
			
 
				+
			
 
				+
			
 
				 ## 🚀 Detailed Tutorials
			
 
				 
			
 
				 <div class="grid cards" markdown>
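
The `doc_understanding` example in this hunk passes a single `{"image", "query"}` dict to `pipeline.predict`. A minimal sketch of asking several questions about one image follows, assuming one `predict` call per question. `build_request` and `ask_document` are hypothetical helper names, not part of PaddleX. Only `create_pipeline`, `predict`, and `res.print()` are taken from the example above.

```python
from typing import Dict, Iterable


def build_request(image: str, query: str) -> Dict[str, str]:
    """Build the {"image", "query"} dict used by the doc_understanding example."""
    if not image or not image.strip():
        raise ValueError("image path must be a non-empty string")
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    return {"image": image, "query": query.strip()}


def ask_document(image: str, queries: Iterable[str]) -> None:
    """Ask several questions about one image, one predict() call per question."""
    # Imported lazily so build_request() can be used without PaddleX installed.
    from paddlex import create_pipeline

    pipeline = create_pipeline(pipeline="doc_understanding")
    for query in queries:
        # Same call shape as the example above: a single dict per call.
        for res in pipeline.predict(build_request(image, query)):
            res.print()  # Print the structured prediction output
```

Validating the request before the pipeline is created keeps an empty path or question from triggering a model load first.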
			
--- a/docs/index.md
+++ b/docs/index.md
@@ -1136,7 +1136,7 @@ for res in output:
 
				             res.save_to_json("./output/")
			
 
				         ```
			
 
				 
			
 
				-!!! example "计算机视觉相关产线命令行使用"
			
 
				+!!! example "计算机视觉相关产线Python脚本使用"
			
 
				 
			
 
				     === "通用图像分类"
			
 
				 
			
@@ -1320,7 +1320,7 @@ for res in output:
 
				             res.save_to_json(save_path="./output/")
			
 
				         ```
			
 
				 
			
 
				-!!! example "时序分析相关产线命令行使用"
			
 
				+!!! example "时序分析相关产线Python脚本使用"
			
 
				 
			
 
				     === "时序预测"
			
 
				 
			
@@ -1362,7 +1362,7 @@ for res in output:
 
				             res.save_to_json(save_path="./output/") ## 保存json格式结果
			
 
				         ```
			
 
				 
			
 
				-!!! example "语音相关产线命令行使用"
			
 
				+!!! example "语音相关产线Python脚本使用"
			
 
				 
			
 
				     === "多语种语音识别"
			
 
				 
			
@@ -1377,7 +1377,7 @@ for res in output:
 
				             res.save_to_json(save_path="./output/")
			
 
				         ```
			
 
				 
			
 
				-!!! example "视频相关产线命令行使用"
			
 
				+!!! example "视频相关产线Python脚本使用"
			
 
				 
			
 
				     === "通用视频分类"
			
 
				 
			
@@ -1406,6 +1406,24 @@ for res in output:
 
				             res.save_to_json(save_path="./output/") ## 保存预测的结构化输出
			
 
				         ```
			
 
				 
			
 
				+!!! example "多模态视觉语言模型相关产线Python脚本使用"
			
 
				+
			
 
				+    === "文档理解"
			
 
				+
			
 
				+        ```python
			
 
				+        from paddlex import create_pipeline
			
 
				+        pipeline = create_pipeline(pipeline="doc_understanding")
			
 
				+        output = pipeline.predict(
			
 
				+            {
			
 
				+                "image": "medal_table.png",
			
 
				+                "query": "识别这份表格的内容"
			
 
				+            }
			
 
				+        )
			
 
				+        for res in output:
			
 
				+            res.print() ## 打印预测的结构化输出
			
 
				+            res.save_to_json("./output/") ## 保存预测的结构化输出
			
 
				+        ```
			
 
				+
			
 
				 ## 🚀 详细教程
			
 
				 
			
 
				 <div class="grid cards" markdown>
			
--- a/docs/support_list/models_list.en.md
+++ b/docs/support_list/models_list.en.md
@@ -1371,6 +1371,15 @@ PaddleX includes multiple pipelines, each containing several modules, and each m
 
				 <td>658.3</td>
			
 
				 <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/GroundingDINO-T_infer.tar">Inference Model</a></td>
			
 
				 </tr>
			
 
				+<tr>
			
 
				+<td>YOLO-Worldv2-L</td>
			
 
				+<td>44.4</td>
			
 
				+<td>59.8</td>
			
 
				+<td>24.32</td>
			
 
				+<td>374.89</td>
			
 
				+<td>421.4</td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/YOLO-Worldv2-L_infer.tar">Inference Model</a></td>
			
 
				+</tr>
			
 
				 </table>
			
 
				 <b>Note: The above accuracy metrics are based on the COCO val2017 validation set mAP(0.5:0.95).</b>
			
 
				 
			
@@ -2899,6 +2908,27 @@ PaddleX includes multiple pipelines, each containing several modules, and each m
 
				 </table>
			
 
				 <p><b>Note: The above accuracy metrics are based on the test dataset <a href="http://www.thumos.info/download.html">UCF101-24</a>, using the Frame-mAP (@ IoU 0.5) metric.</b></p>
			
 
				 
			
 
				+## [Document Vision-Language Model Module](../module_usage/tutorials/vlm_modules/doc_vlm.en.md)
			
 
				+
			
 
				+<table>
			
 
				+<tr>
			
 
				+<th>Model</th>
			
 
				+<th>Model Storage Size (GB)</th>
			
 
				+<th>Model Download Link</th>
			
 
				+</tr>
			
 
				+<tr>
			
 
				+<td>PP-DocBee-2B</td>
			
 
				+<td>4.2</td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-2B_infer.tar">Inference Model</a></td>
			
 
				+</tr>
			
 
				+<tr>
			
 
				+<td>PP-DocBee-7B</td>
			
 
				+<td>15.8</td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-7B_infer.tar">Inference Model</a></td>
			
 
				+</tr>
			
 
				+</table>
			
 
				+
			
 
				+
			
 
				 **Test Environment Description:**
			
 
				 
			
 
				 - **Performance Test Environment**
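
The Document Vision-Language Model table added above lists two models with their storage sizes. As an illustration only, the sketch below picks the largest listed model that fits a given storage budget and rebuilds its download URL from the pattern the two table links share. `DocVLM`, `MODELS`, and `largest_model_within` are hypothetical names, and "largest that fits" is an assumed heuristic, not a PaddleX recommendation. The sizes are storage sizes from the table, not memory requirements.

```python
from typing import NamedTuple, Optional

# Common prefix of the two download links in the table above.
BASE_URL = (
    "https://paddle-model-ecology.bj.bcebos.com/paddlex/"
    "official_inference_model/paddle3.0.0/"
)


class DocVLM(NamedTuple):
    name: str
    size_gb: float  # "Model Storage Size (GB)" column

    @property
    def url(self) -> str:
        # Both table links follow the <name>_infer.tar pattern.
        return f"{BASE_URL}{self.name}_infer.tar"


# Values copied from the table in this hunk.
MODELS = (DocVLM("PP-DocBee-2B", 4.2), DocVLM("PP-DocBee-7B", 15.8))


def largest_model_within(budget_gb: float) -> Optional[DocVLM]:
    """Return the largest listed model whose storage size fits the budget."""
    candidates = [m for m in MODELS if m.size_gb <= budget_gb]
    return max(candidates, key=lambda m: m.size_gb) if candidates else None
```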
			
--- a/docs/support_list/models_list.md
+++ b/docs/support_list/models_list.md
@@ -1256,6 +1256,15 @@ PaddleX 内置了多条产线，每条产线都包含了若干模块，每个模
 
				 <td>658.3</td>
			
 
				 <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/GroundingDINO-T_infer.tar">推理模型</a></td>
			
 
				 </tr>
			
 
				+<tr>
			
 
				+<td>YOLO-Worldv2-L</td>
			
 
				+<td>44.4</td>
			
 
				+<td>59.8</td>
			
 
				+<td>24.32</td>
			
 
				+<td>374.89</td>
			
 
				+<td>421.4</td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/YOLO-Worldv2-L_infer.tar">推理模型</a></td>
			
 
				+</tr>
			
 
				 </table>
			
 
				 <b>注：以上精度指标为 COCO val2017 验证集 mAP(0.5:0.95)。</b>。
			
 
				 
			
@@ -2776,6 +2785,26 @@ devanagari_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模
 
				 </table>
			
 
				 <p><b>注：以上精度指标为 <a href="http://www.thumos.info/download.html">UCF101-24</a> test数据集上的测试指标Frame-mAP (@ IoU 0.5)</b></p>
			
 
				 
			
 
				+## [文档类视觉语言模型模块](../module_usage/tutorials/vlm_modules/doc_vlm.md)
			
 
				+
			
 
				+<table>
			
 
				+<tr>
			
 
				+<th>模型</th>
			
 
				+<th>模型存储大小（GB）</th>
			
 
				+<th>模型下载链接</th>
			
 
				+</tr>
			
 
				+<tr>
			
 
				+<td>PP-DocBee-2B</td>
			
 
				+<td>4.2</td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-2B_infer.tar">推理模型</a></td>
			
 
				+</tr>
			
 
				+<tr>
			
 
				+<td>PP-DocBee-7B</td>
			
 
				+<td>15.8</td>
			
 
				+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-7B_infer.tar">推理模型</a></td>
			
 
				+</tr>
			
 
				+</table>
			
 
				+
			
 
				 <strong>测试环境说明:</strong>
			
 
				 
			
 
				   <ul>
			
--- a/docs/support_list/pipelines_list.en.md
+++ b/docs/support_list/pipelines_list.en.md
@@ -526,6 +526,19 @@ comments: true
 
				     </ul>
			
 
				     </td>
			
 
				 </tr>
			
 
				+<tr>
			
 
				+    <td>Document Understanding</td>
			
 
				+    <td>Document Vision-Language Model</td>
			
 
				+    <td>Not Available</td>
			
 
				+    <td>The Document Understanding pipeline is a document processing technology built on Vision-Language Models (VLMs) that aims to overcome the limitations of traditional document processing. Traditional methods rely on fixed templates or predefined rules to parse documents. This pipeline instead takes only a document image and a user question as input and answers the question by combining visual and linguistic information. It does not require pre-training for specific document formats, so it handles diverse document content flexibly and improves the generalization and practicality of document processing. It has broad application prospects in scenarios such as intelligent Q&A and information extraction.</td>
			
 
				+    <td>
			
 
				+    <ul>
			
 
				+        <li>Intelligent Q&A</li>
			
 
				+        <li>Information Extraction</li>
			
 
				+        <li>Contract Review and Risk Management</li>
			
 
				+    </ul>
			
 
				+    </td>
			
 
				+</tr>
			
 
				 </table>
			
 
				 
			
 
				 ## 2. Featured Pipelines
			
--- a/docs/support_list/pipelines_list.md
+++ b/docs/support_list/pipelines_list.md
@@ -527,6 +527,19 @@ comments: true
 
				     </ul>
			
 
				     </td>
			
 
				 </tr>
			
 
				+<tr>
			
 
				+    <td>文档理解</td>
			
 
				+    <td>文档类视觉语言模型</td>
			
 
				+    <td>暂无</td>
			
 
				+    <td>文档理解产线是基于视觉-语言模型（VLM）打造的先进文档处理技术，旨在突破传统文档处理的局限。传统方法依赖固定模板或预定义规则解析文档，而该产线借助VLM的多模态能力，仅需输入文档图片和用户问题，即可通过融合视觉与语言信息，精准回答用户提问。这种技术无需针对特定文档格式预训练，能够灵活应对多样化文档内容，显著提升文档处理的泛化性与实用性，在智能问答、信息提取等场景中具有广阔应用前景。</td>
			
 
				+    <td>
			
 
				+    <ul>
			
 
				+        <li>智能问答</li>
			
 
				+        <li>信息提取</li>
			
 
				+        <li>合同审查与风险管理</li>
			
 
				+    </ul>
			
 
				+    </td>
			
 
				+</tr>
			
 
				 </table>
			
 
				 
			
 
				 
			
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -168,6 +168,7 @@ plugins:
 
				             实例分割模块: Instance Segmentation
			
 
				             人脸特征模块: Face Feature
			
 
				             图像异常检测模块: Image Anomaly Detection
			
 
				+            文档类视觉语言模型模块: Document Vision-Language Model
			
 
				             时序分析: Time Series Analysis
			
 
				             时序预测模块: Time Series Forecasting
			
 
				             时序异常检测模块: Time Series Anomaly Detection
			
@@ -361,6 +362,8 @@ nav:
 
				        - 视频分析:
			
 
				          - 通用视频分类: pipeline_usage/tutorials/video_pipelines/video_classification.md
			
 
				          - 通用视频检测: pipeline_usage/tutorials/video_pipelines/video_detection.md
			
 
				+       - 多模态视觉语言模型:
			
 
				+         - 文档理解产线: pipeline_usage/tutorials/vlm_pipelines/doc_understanding.md
			
 
				        - 说明文件: 
			
 
				          - PaddleX产线命令行使用说明: pipeline_usage/instructions/pipeline_CLI_usage.md
			
 
				          - PaddleX产线Python脚本使用说明: pipeline_usage/instructions/pipeline_python_API.md
			
@@ -407,6 +410,8 @@ nav:
 
				          - 多语种语音识别模块: module_usage/tutorials/speech_modules/multilingual_speech_recognition.md
			
 
				        - 3D检测: 
			
 
				          - BEV融合3D检测模块: module_usage/tutorials/cv_modules/3d_bev_detection.md
			
 
				+       - 多模态视觉语言模型:
			
 
				+         - 文档类视觉语言模型模块: module_usage/tutorials/vlm_modules/doc_vlm.md
			
 
				        - 说明文件:
			
 
				          - PaddleX单模型Python脚本使用说明: module_usage/instructions/model_python_API.md
			
 
				          - PaddleX通用模型配置文件参数说明: module_usage/instructions/config_parameters_common.md