Fix sth (#3986)

* add module doc for docbee2 and chart2table

* add docs

* add misc x docs

* fix sth
Zhang Zelun, 6 months ago
Commit
7ca9abbd09

+ 5 - 0
docs/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.en.md

@@ -27,6 +27,11 @@ The Document Understanding Pipeline is an advanced document processing technolog
 <td>PP-DocBee-7B</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-7B_infer.tar">Inference Model</a></td>
 <td>15.8</td>
 </tr>
+<tr>
+<td>PP-DocBee2-3B</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee2-3B_infer.tar">Inference Model</a></td>
+<td>7.6</td>
+<td>PP-DocBee2 is a multimodal large model developed by the PaddlePaddle team, specifically designed for document understanding. Building upon PP-DocBee, the team has further optimized the foundational model and introduced a new data optimization scheme to enhance data quality. With just a relatively small dataset of 470,000 samples generated using the team's proprietary data synthesis strategy, PP-DocBee2 demonstrates superior performance in Chinese document understanding tasks. In terms of internal business metrics for Chinese language scenarios, PP-DocBee2 has achieved an approximately 11.4% improvement over PP-DocBee, and it also outperforms current popular open-source and closed-source models of a similar scale.</td>
+</tr>
 </table>
 
 ## 2. Quick Start

+ 5 - 0
docs/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.md

@@ -30,6 +30,11 @@ comments: true
 <td>PP-DocBee-7B</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee-7B_infer.tar">推理模型</a></td>
 <td>15.8</td>
 </tr>
+<tr>
+<td>PP-DocBee2-3B</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocBee2-3B_infer.tar">推理模型</a></td>
+<td>7.6</td>
+<td>PP-DocBee2 是飞桨团队自研的一款专注于文档理解的多模态大模型,在PP-DocBee的基础上进一步优化了基础模型,并引入了新的数据优化方案,提高了数据质量,使用自研数据合成策略生成的少量的47万数据便使得PP-DocBee2在中文文档理解任务上表现更佳。在内部业务中文场景类的指标上,PP-DocBee2相较于PP-DocBee提升了约11.4%,同时也高于目前的同规模热门开源和闭源模型。</td>
+</tr>
 </table>
 
 ## 2. 快速开始

+ 1 - 1
paddlex/configs/pipelines/doc_understanding.yaml

@@ -4,6 +4,6 @@ pipeline_name: doc_understanding
 SubModules:
   DocUnderstanding:
     module_name: doc_vlm
-    model_name: PP-DocBee-2B
+    model_name: PP-DocBee2-3B
     model_dir: null
     batch_size: 1
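
Note on the config change: this hunk switches the pipeline's default model from PP-DocBee-2B to PP-DocBee2-3B, so callers who relied on the previous default would need to override `model_name` themselves. The following is a minimal sketch of such an override, assuming the YAML has already been loaded into a plain dict. Only the keys visible in the hunk are reproduced, and `with_model` is a hypothetical helper for illustration, not a PaddleX API.

```python
import copy

# Config as it might look after loading doc_understanding.yaml into a dict.
# Assumption: only the keys visible in the diff hunk are reproduced here.
CONFIG = {
    "pipeline_name": "doc_understanding",
    "SubModules": {
        "DocUnderstanding": {
            "module_name": "doc_vlm",
            "model_name": "PP-DocBee2-3B",  # new default after this commit
            "model_dir": None,
            "batch_size": 1,
        }
    },
}


def with_model(config, model_name):
    """Return a deep copy of config with a different DocUnderstanding model.

    Hypothetical helper; it leaves the original config untouched.
    """
    updated = copy.deepcopy(config)
    updated["SubModules"]["DocUnderstanding"]["model_name"] = model_name
    return updated


# Restore the pre-commit default for callers that depend on it.
legacy = with_model(CONFIG, "PP-DocBee-2B")
```

Copying before mutating keeps the shipped default intact, so both variants can coexist in the same process.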