Add misc x docs (#3981)

* add module doc for docbee2 and chart2table

* add docs

* add misc x docs
Zhang Zelun 6 months ago
Parent
Commit
c6a24da39a

+ 4 - 0
README.md

@@ -903,22 +903,26 @@ for res in output:
   <summary> <b> 📦 3D </b></summary>
 
   * [📦 3D多模态融合检测模块使用教程](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/cv_modules/3d_bev_detection.html)
+  </details>
 
 * <details open>
   <summary> <b> 🎤 语音识别 </b></summary>
 
   * [🌐 多语种语音识别模块使用教程](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/speech_modules/multilingual_speech_recognition.html)
+  </details>
 
 * <details open>
   <summary> <b> 🎥 视频识别 </b></summary>
 
   * [📈 视频分类模块使用教程](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/video_modules/video_classification.html)
   * [🔍 视频检测模块使用教程](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/video_modules/video_detection.html)
+  </details>
 
 * <details open>
   <summary> <b> 🌐 多模态视觉语言模型 </b></summary>
 
   * [📝 文档类视觉语言模型模块使用教程](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/vlm_modules/doc_vlm.html)
+  * [📈 图表解析模块使用教程](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/vlm_modules/chart_parsing.html)
   </details>
 
 * <details>

+ 5 - 1
README_en.md

@@ -902,22 +902,26 @@ To use the Python script for other pipelines, simply adjust the `pipeline` param
   <summary> <b> 📦 3D  </b></summary>
 
   * [📦 3D Multimodal Fusion Detection Module Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/cv_modules/3d_bev_detection.html)
+  </details>
 
 * <details open>
   <summary> <b> 🎤 Speech Recognition </b></summary>
 
   * [🌐 Multilingual Speech Recognition Module Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/speech_modules/multilingual_speech_recognition.html)
+  </details>
 
 * <details open>
   <summary> <b> 🎥 Video Recognition </b></summary>
 
   * [📈 Video Classification Module Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/video_modules/video_classification.html)
   * [🔍 Video Detection Module Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/video_modules/video_detection.html)
+  </details>
 
 * <details open>
   <summary> <b> 🌐 Multimodal Vision-Language Model </b></summary>
 
-  * [📝 Document Vision-Language Model Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/vlm_modules/doc_vlm.html)
+  * [📝 Document Vision-Language Model Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/vlm_modules/doc_vlm.html)
+  * [📈 Chart Parsing Module Usage Tutorial](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/vlm_modules/chart_parsing.html)
   </details>
 
 * <details open>

+ 2 - 0
docs/module_usage/tutorials/vlm_modules/chart_parsing.en.md

@@ -14,11 +14,13 @@ Multimodal chart parsing is a cutting-edge technology in the OCR field, focusing
 <table>
 <tr>
 <th>Model</th><th>Model Download Link</th>
+<th>Model parameter size(B)</th>
 <th>Model Storage Size (GB)</th>
 <th>Description</th>
 </tr>
 <tr>
 <td>PP-Chart2Table</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-Chart2Table_infer.tar">Inference Model</a></td>
+<td>0.58</td>
 <td>1.4</td>
 <td>PP-Chart2Table is a self-developed multimodal model by the PaddlePaddle team, focusing on chart parsing, demonstrating outstanding performance in both Chinese and English chart parsing tasks. The team adopted a carefully designed data generation strategy, constructing a high-quality multimodal dataset of nearly 700,000 entries covering common chart types like pie charts, bar charts, stacked area charts, and various application scenarios. They also designed a two-stage training method, utilizing large model distillation to fully leverage massive unlabeled OOD data. In internal business tests in both Chinese and English scenarios, PP-Chart2Table not only achieved the SOTA level among models of the same parameter scale but also reached accuracy comparable to 7B parameter scale VLM models in critical scenarios.</td>
 </tr>

+ 2 - 0
docs/module_usage/tutorials/vlm_modules/chart_parsing.md

@@ -15,11 +15,13 @@ comments: true
 <table>
 <tr>
 <th>模型</th><th>模型下载链接</th>
+<th>模型参数规模(B)</th>
 <th>模型存储大小(GB)</th>
 <th>介绍</th>
 </tr>
 <tr>
 <td>PP-Chart2Table</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-Chart2Table_infer.tar">推理模型</a></td>
+<td>0.58</td>
 <td>1.4</td>
 <td>PP-Chart2Table是飞桨团队自研的一款专注于图表解析的多模态模型,在中英文图表解析任务中展现出卓越性能。团队采用精心设计的数据生成策略,构建了近70万条高质量的图表解析多模态数据集,全面覆盖饼图、柱状图、堆叠面积图等常见图表类型及各类应用场景。同时设计了二阶段训练方法,结合大模型蒸馏实现对海量无标注OOD数据的充分利用。在内部业务的中英文场景测试中,PP-Chart2Table不仅达到同参数量级模型中的SOTA水平,更在关键场景中实现了与7B参数量级VLM模型相当的精度。</td>
 </tr>

+ 15 - 0
docs/support_list/models_list.en.md

@@ -2913,19 +2913,34 @@ PaddleX includes multiple pipelines, each containing several modules, and each m
 <table>
 <tr>
 <th>Model</th>
+<th>Model Parameter Size(B)</th>
 <th>Model Storage Size(GB)</th>
 <th>Model Download Lin</th>
 </tr>
 <tr>
 <td>PP-DocBee-2B</td>
+<td>2</td>
 <td>4.2</td>
 <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-2B_infer.tar">Inference Model</a></td>
 </tr>
 <tr>
 <td>PP-DocBee-7B</td>
+<td>7</td>
 <td>15.8</td>
 <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-7B_infer.tar">Inference Model</a></td>
 </tr>
+<tr>
+<td>PP-DocBee2-3B</td>
+<td>3</td>
+<td>7.6</td>
+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee2-3B_infer.tar">Inference Model</a></td>
+</tr>
+<tr>
+<td>PP-Chart2Table</td>
+<td>0.58</td>
+<td>1.4</td>
+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-Chart2Table_infer.tar">Inference Model</a></td>
+</tr>
 </table>
 
 

+ 15 - 0
docs/support_list/models_list.md

@@ -2864,19 +2864,34 @@ devanagari_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模
 <table>
 <tr>
 <th>模型</th>
+<th>模型参数尺寸(B)</th>
 <th>模型存储大小(GB)</th>
 <th>模型下载链接</th>
 </tr>
 <tr>
 <td>PP-DocBee-2B</td>
+<td>2</td>
 <td>4.2</td>
 <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-2B_infer.tar">推理模型</a></td>
 </tr>
 <tr>
 <td>PP-DocBee-7B</td>
+<td>7</td>
 <td>15.8</td>
 <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-7B_infer.tar">推理模型</a></td>
 </tr>
+<tr>
+<td>PP-DocBee2-3B</td>
+<td>3</td>
+<td>7.6</td>
+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee2-3B_infer.tar">推理模型</a></td>
+</tr>
+<tr>
+<td>PP-Chart2Table</td>
+<td>0.58</td>
+<td>1.4</td>
+<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-Chart2Table_infer.tar">推理模型</a></td>
+</tr>
 </table>
 
 <strong>测试环境说明:</strong>

+ 1 - 0
mkdocs.yml

@@ -413,6 +413,7 @@ nav:
          - BEV融合3D检测模块: module_usage/tutorials/cv_modules/3d_bev_detection.md
        - 多模态视觉语言模型:
          - 文档类视觉语言模型模块: module_usage/tutorials/vlm_modules/doc_vlm.md
+         - 图表解析模块: module_usage/tutorials/vlm_modules/chart_parsing.md
        - 说明文件:
          - PaddleX单模型Python脚本使用说明: module_usage/instructions/model_python_API.md
          - PaddleX通用模型配置文件参数说明: module_usage/instructions/config_parameters_common.md