update README (#4054)

* update README

* update
cuicheng01 6 months ago
parent
commit
6ac249d14f
4 changed files with 94 additions and 67 deletions
  1. README.md (+24 -23)
  2. README_en.md (+24 -26)
  3. docs/CHANGELOG.en.md (+24 -11)
  4. docs/CHANGELOG.md (+22 -7)
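
As a quick sanity check, the per-file counts in the list above can be summed and compared with the commit totals. This is only a minimal sketch: the dictionary layout and variable names are illustrative, and the numbers are copied from the summary.

```python
# Per-file (additions, deletions), copied from the changed-files summary.
per_file = {
    "README.md": (24, 23),
    "README_en.md": (24, 26),
    "docs/CHANGELOG.en.md": (24, 11),
    "docs/CHANGELOG.md": (22, 7),
}

total_additions = sum(added for added, _ in per_file.values())
total_deletions = sum(deleted for _, deleted in per_file.values())

# Both totals should agree with the headline figures for the commit.
print(total_additions, total_deletions)
```

The totals come out to 94 additions and 67 deletions across the 4 files, matching the headline.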

+ 24 - 23
README.md

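@@ -35,29 +35,30 @@ PaddleX 3.0 is a low-code development tool built on the PaddlePaddle framework; it integrates
 
 ## 📣 Recent Updates
 
-🔥🔥 **2025.5.20, PaddleX v3.0.0 released.** Core upgrades are as follows:
-- **Major capability releases:**
-  - **Launch of the text recognition model PP-OCRv5**: OCR accuracy across all scenarios improves by 13%. A single model supports 5 text types (Simplified Chinese, Traditional Chinese, Chinese Pinyin, English, and Japanese), with marked gains on Chinese and English handwriting, vertical text, and rare characters. Try it now in the [online demo](https://aistudio.baidu.com/community/app/91660/webUI?source=appCenter).
-  - **Launch of the document parsing solution PP-StructureV3**: Strengthens layout region detection, table recognition, Chinese and English formula recognition, and multi-column reading-order recovery, and adds chart understanding. On the OmniDocBench leaderboard, PP-StructureV3's overall edit distance for both Chinese and English reaches SOTA level. Try it now in the [online demo](https://aistudio.baidu.com/community/app/518494/webUI?source=appCenter).
-  - **Optimized PP-ChatOCRv4**: Natively supports ERNIE 4.5T. Combined with the newly added PP-DocBee2, key information extraction accuracy improves by 15.7 percentage points over the previous generation. Try it now in the [online demo](https://aistudio.baidu.com/community/app/518493/webUI?source=appCenter).
-- **Inference optimizations:**
-  - The general OCR, general layout parsing v3, formula recognition, seal text recognition, and document image preprocessing pipelines support batch size > 1, processing multiple pages at once.
-  - 17 pipelines, including general OCR and general layout parsing v3, support multi-GPU parallel inference; sample code for multi-process parallel pipeline inference has been added.
-
-🔥🔥 **2025.4.22, PaddleX v3.0.0rc1 released.** This version is fully adapted to the official PaddlePaddle 3.0 release, with the following core upgrades:
-
-- **Full adaptation to new PaddlePaddle 3.0 features**: Supports compiler training, enabled by appending `-o Global.dy2st=True` to the training command. On GPUs, most models train more than 10% faster, and a few train more than 30% faster. For inference, models are fully adapted to PaddlePaddle 3.0's intermediate representation (PIR), which offers more flexible extensibility and compatibility; the static graph model file name changes from `xxx.pdmodel` to `xxx.json`.
-- **New in-house multimodal LLM for document image understanding, PP-DocBee**: On academic and internal business document-understanding benchmarks, PP-DocBee reaches SOTA among models of the same parameter scale. It can be applied to document QA scenarios such as financial reports, research reports, contracts, manuals, and laws and regulations.
-- **Full support for ONNX-format models, with model format conversion via the Paddle2ONNX plugin.**
-- **Upgraded high-performance inference:**
-    - **Added support for ONNX and OM format models:** PaddleX can intelligently select the model format as needed;
-    - **Expanded supported pipelines and modules:** All single-function modules and pipelines that use static graph inference can use the high-performance inference plugin to improve inference performance;
-    - **Three configuration methods (CLI, API, and configuration file):** Supports finer-grained configuration; users can enable and disable the high-performance inference plugin at sub-pipeline and sub-module granularity.
-
-- **Expanded multi-hardware support:**
-  - **NPU: The number of models fully validated on Ascend has risen to 200. In addition, common pipelines such as general OCR, image classification, and object detection support OM-format model inference, with inference speedups of 113.8%-226.4%, and support inference deployment on Atlas 200 and Atlas 300 series products.**
-  - **GCU: Enflame has been formally included in PaddlePaddle's regular release process and has completed PaddleX ecosystem adaptation. Training and inference are supported for 90 models.**
-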
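+🔥🔥 **2025.5.20, PaddleX v3.0.0 released.** Compared with PaddleX v2.x, the core upgrades are as follows:
+
+**Rich model library:**
+- **Extensive models:** PaddleX 3.0 includes 270+ models covering image (video) classification/detection/segmentation, OCR, speech recognition, time series, and other scenarios.
+- **Mature solutions:** Building on this model library, PaddleX 3.0 **provides important, mature AI solutions such as general document parsing, key information extraction, document understanding, table recognition, and general image recognition.**
+
+**Unified inference interface and rebuilt deployment capabilities:**
+- **Standardized inference interface**, which reduces API differences across model types, lowers users' learning cost, and speeds up enterprise adoption.
+- **Multi-model composition**, so complex tasks can be handled by conveniently combining different models for a 1+1>2 effect.
+- **Upgraded deployment: deployments of many model types can be managed with unified commands, with support for multi-GPU inference and multi-GPU, multi-instance serving deployment.**
+
+**Full adaptation to PaddlePaddle framework 3.0:**
+- **Full adaptation to new PaddlePaddle 3.0 features:** Supports compiler training, enabled by appending `-o Global.dy2st=True` to the training command. On GPUs, most models train more than 10% faster, and a few train more than 30% faster. For inference, models are fully adapted to PaddlePaddle 3.0's intermediate representation (PIR), which offers more flexible extensibility and compatibility; the static graph model file name changes from `xxx.pdmodel` to `xxx.json`.
+- **Full support for ONNX-format models:** Model formats can be converted via the Paddle2ONNX plugin.
+
+**Support for flagship capabilities:**
+- **Supports PP-OCRv5's model-chaining logic as well as its multi-hardware inference, multi-backend inference, and serving deployment capabilities.**
+- **Supports PP-StructureV3's complex serial and parallel model-chaining logic, for the first time chaining 15 models in series and parallel to build a complex pipeline of cooperating models. Its accuracy reaches SOTA level on the OmniDocBench leaderboard.**
+- **Supports PP-ChatOCRv4's LLM chaining logic; combined with ERNIE 4.5Turbo and the newly added PP-DocBee2, key information extraction accuracy improves by 15.7 percentage points over the previous generation.**
+
+**Multi-hardware support:**
+- **Overall support for training and inference on chips from NVIDIA, Intel, Apple M-series, Kunlunxin, Ascend, Cambricon, Hygon, Enflame, and others.**
+- **On Ascend, 200 models are fully adapted,** and 21 models support OM high-performance inference. Key model solutions such as PP-OCRv5 and PP-StructureV3 are also supported.
+- On Kunlunxin, important classification, detection, and OCR models (including PP-OCRv5) are supported.
 
  ## 🔠 Model Pipeline Description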
 

+ 24 - 26
README_en.md

@@ -42,32 +42,30 @@ PaddleX 3.0 is a low-code development tool for AI models built on the PaddlePadd
 
 Core upgrades are as follows:
 
-- **Major Capability Releases:**
-  - **Launch of the groundbreaking text recognition model PP-OCRv5**: Achieves a 13% improvement in OCR accuracy across all scenarios. A single model now supports 5 types of text (Simplified Chinese, Traditional Chinese, Chinese Pinyin, English, and Japanese), with significant enhancements in recognizing handwritten fonts, vertical text, and rare characters in both Chinese and English. You can experience it immediately in the [online demo](https://aistudio.baidu.com/community/app/91660/webUI?source=appCenter).
-  
-  - **Launch of the groundbreaking document parsing solution PP-StructureV3**: Enhanced capabilities in layout area detection, table recognition, Chinese and English formula recognition, and restoration of multi-column reading order, with added abilities for chart understanding. PP-StructureV3 achieves state-of-the-art (SOTA) levels in both Chinese and English editing distances on the OmniDocBench leaderboard. Experience it in the [online demo](https://aistudio.baidu.com/community/app/518494/webUI?source=appCenter).
-  
-  - **Optimization of PP-ChatOCRv4**: Supports the Ernie 4.5T. Combined with PP-DocBee2, it shows a 15.7 percentage point improvement in key information extraction accuracy compared to the previous generation. Experience it in the [online demo](https://aistudio.baidu.com/community/app/518493/webUI?source=appCenter).
-
-- **Inference Capability Optimization:**
-  - The general OCR, PP-StructureV3, formula recognition, seal text recognition, and document image preprocessing pipelines support setting batch size >1, allowing multiple pages to be processed at once.
-  
-  - 17 pipelines, including general OCR and PP-StructureV3, now support multi-GPU parallel inference. Sample code for multi-process parallel inference has been added.
-
-
-🔥 **2025.4.22, PaddleX v3.0.0rc1 major upgrade.** This version fully adapts to PaddlePaddle 3.0.0, with the following core upgrades:
-
-- **Adapts to New Features of PaddlePaddle 3.0**: Supports compiler training, which can be enabled by appending `-o Global.dy2st=True` to the training command. On GPUs, the training speed of most models can be improved by over 10%, and for a few models, the improvement can exceed 30%. For inference, the models are fully adapted to PaddlePaddle 3.0's Intermediate Representation (PIR) technology, offering more flexible extensibility and compatibility. The file names for inference model have been changed from `xxx.pdmodel` to `xxx.json`.
-- **Newly Added Self-developed MLLM for Document Image Understanding, PP-DocBee**: PP-DocBee has achieved SOTA performance among models with similar parameter sizes on academic and internal business scenario document understanding evaluation benchmarks. It can be applied to document QA scenarios such as financial reports, research reports, contracts, manuals, and legal regulations.
-- **Full Support for ONNX Format Models, with Support for Model Format Conversion via the Paddle2ONNX Plugin.**
-- **Enhanced High-Performance Inference**:
-    - **Added Support for ONNX and OM Format Models**: PaddleX can intelligently select the model format based on needs;
-    - **Expanded Supported Pipelines and Modules**: All single modules and pipelines for inference model can use the high-performance inference plugin to improve inference performance;
-    - **Support for 3 Configuration Methods: CLI, API, and Configuration Files**: Enables more granular configuration, allowing users to enable and disable the high-performance inference plugin at the sub-pipeline and sub-module level.
-
-- **Expanded Multi-Hardware Support**:
-  - **NPU: The number of models fully validated on Ascend NPU has increased to 200. Additionally, common pipelines such as general OCR, image classification, and object detection support OM model format inference, with inference speed improvements ranging from 113.8% to 226.4%. Inference deployment is supported on Atlas 200 and Atlas 300 series products.**
-  - **GCU: Enflame has been officially integrated into the PaddlePaddle regular release system, completing the adaptation of the PaddleX ecosystem. Supports the training and inference of 90 models.**
+- **Rich Model Library:**  
+  - **Extensive Model Coverage:** PaddleX 3.0 includes **270+ models**, covering diverse scenarios such as image/video classification/detection/segmentation, OCR, speech recognition, time series analysis, and more.  
+  - **Mature Solutions:** Built on this robust model library, PaddleX 3.0 offers **critical and production-ready AI solutions**, including general document parsing, key information extraction, document understanding, table recognition, and general image recognition.  
+
+- **Unified Inference API & Enhanced Deployment Capabilities:**  
+  - **Standardized Inference Interface:** Reduces API fragmentation across model types, lowering the learning curve for users and accelerating enterprise adoption.  
+  - **Multi-Model Composition:** Complex tasks can be efficiently tackled by combining different models, achieving synergistic performance (1+1>2).  
+  - **Upgraded Deployment:** Unified commands now manage deployments for diverse models, supporting **multi-GPU inference** and **multi-instance serving deployments**.  
+
+- **Full Compatibility with PaddlePaddle Framework 3.0:**  
+  - **Leveraging New Paddle 3.0 Features:**  
+    - Compiler-accelerated training: Enable by appending `-o Global.dy2st=True` to training commands. **Most GPU-based models see >10% speed gains, with some exceeding 30%.**  
+    - Inference upgrades: Full adaptation to Paddle 3.0’s Program Intermediate Representation (PIR) enhances flexibility and compatibility. Static graph models now use `xxx.json` instead of `xxx.pdmodel`.  
+  - **ONNX Model Support:** Seamless format conversion via the Paddle2ONNX plugin.  
+
+- **Flagship Capabilities:**  
+  - **PP-OCRv5:** Powers **multi-hardware inference, multi-backend support, and serving deployments** for this industry-leading OCR system.  
+  - **PP-StructureV3:** Orchestrates **15+ models** in hybrid (serial/parallel) pipelines, achieving **SOTA accuracy on OmniDocBench**.  
+  - **PP-ChatOCRv4:** Integrates with **PP-DocBee2 and ERNIE 4.5Turbo**, boosting key information extraction accuracy by **15.7 percentage points** over the previous generation.  
+
+- **Multi-Hardware Support:**  
+  - **Broad Compatibility:** Training and inference supported on **NVIDIA, Intel, Apple M-series, Kunlunxin, Ascend, Cambricon, Hygon, Enflame**, and more.  
+  - **Ascend-Optimized:** **200+ fully adapted models**, including **21 OM-accelerated inference models**, plus key solutions like PP-OCRv5 and PP-StructureV3.  
+  - **Kunlunxin-Optimized:** Critical classification, detection, and OCR models (including PP-OCRv5) are fully supported.  
 
 
 ## 🔠 Explanation of Pipeline
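
The notes in this hunk say compiler-accelerated training is enabled by appending `-o Global.dy2st=True` to a training command. A minimal sketch of doing that programmatically is shown here; the helper name `with_compiler_training` and the base command are hypothetical placeholders, and only the override string itself comes from the notes.

```python
import shlex

# Override quoted in the release notes; everything else here is illustrative.
COMPILER_OVERRIDE = ["-o", "Global.dy2st=True"]


def with_compiler_training(train_cmd: str) -> list[str]:
    """Split train_cmd into argv and append the override if it is not already there."""
    argv = shlex.split(train_cmd)
    if "Global.dy2st=True" not in argv:
        argv += COMPILER_OVERRIDE
    return argv


# Hypothetical base command: substitute the training command you actually use.
base = "python main.py -c path/to/config.yaml -o Global.mode=train"
print(shlex.join(with_compiler_training(base)))
```

Building an argv list rather than concatenating strings keeps the result safe to hand to `subprocess.run` without a shell, and the membership check makes the helper safe to call twice.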

+ 24 - 11
docs/CHANGELOG.en.md

@@ -10,17 +10,30 @@ comments: true
 
 Core upgrades are as follows:
 
-- **Major Capability Releases:**
-  - **Launch of the groundbreaking text recognition model PP-OCRv5**: Achieves a 13% improvement in OCR accuracy across all scenarios. A single model now supports 5 types of text (Simplified Chinese, Traditional Chinese, Chinese Pinyin, English, and Japanese), with significant enhancements in recognizing handwritten fonts, vertical text, and rare characters in both Chinese and English. You can experience it immediately in the [online demo](https://aistudio.baidu.com/community/app/91660/webUI?source=appCenter).
-  
-  - **Launch of the groundbreaking document parsing solution PP-StructureV3**: Enhanced capabilities in layout area detection, table recognition, Chinese and English formula recognition, and restoration of multi-column reading order, with added abilities for chart understanding. PP-StructureV3 achieves state-of-the-art (SOTA) levels in both Chinese and English editing distances on the OmniDocBench leaderboard. Experience it in the [online demo](https://aistudio.baidu.com/community/app/518494/webUI?source=appCenter).
-  
-  - **Optimization of PP-ChatOCRv4**: Supports the Ernie 4.5T. Combined with PP-DocBee2, it shows a 15.7 percentage point improvement in key information extraction accuracy compared to the previous generation. Experience it in the [online demo](https://aistudio.baidu.com/community/app/518493/webUI?source=appCenter).
-
-- **Inference Capability Optimization:**
-  - The general OCR, PP-StructureV3, formula recognition, seal text recognition, and document image preprocessing pipelines support setting batch size >1, allowing multiple pages to be processed at once.
-  
-  - 17 pipelines, including general OCR and PP-StructureV3, now support multi-GPU parallel inference. Sample code for multi-process parallel inference has been added.
+- **Rich Model Library:**  
+  - **Extensive Model Coverage:** PaddleX 3.0 includes **270+ models**, covering diverse scenarios such as image/video classification/detection/segmentation, OCR, speech recognition, time series analysis, and more.  
+  - **Mature Solutions:** Built on this robust model library, PaddleX 3.0 offers **critical and production-ready AI solutions**, including general document parsing, key information extraction, document understanding, table recognition, and general image recognition.  
+
+- **Unified Inference API & Enhanced Deployment Capabilities:**  
+  - **Standardized Inference Interface:** Reduces API fragmentation across model types, lowering the learning curve for users and accelerating enterprise adoption.  
+  - **Multi-Model Composition:** Complex tasks can be efficiently tackled by combining different models, achieving synergistic performance (1+1>2).  
+  - **Upgraded Deployment:** Unified commands now manage deployments for diverse models, supporting **multi-GPU inference** and **multi-instance serving deployments**.  
+
+- **Full Compatibility with PaddlePaddle Framework 3.0:**  
+  - **Leveraging New Paddle 3.0 Features:**  
+    - Compiler-accelerated training: Enable by appending `-o Global.dy2st=True` to training commands. **Most GPU-based models see >10% speed gains, with some exceeding 30%.**  
+    - Inference upgrades: Full adaptation to Paddle 3.0’s Program Intermediate Representation (PIR) enhances flexibility and compatibility. Static graph models now use `xxx.json` instead of `xxx.pdmodel`.  
+  - **ONNX Model Support:** Seamless format conversion via the Paddle2ONNX plugin.  
+
+- **Flagship Capabilities:**  
+  - **PP-OCRv5:** Powers **multi-hardware inference, multi-backend support, and serving deployments** for this industry-leading OCR system.  
+  - **PP-StructureV3:** Orchestrates **15+ models** in hybrid (serial/parallel) pipelines, achieving **SOTA accuracy on OmniDocBench**.  
+  - **PP-ChatOCRv4:** Integrates with **PP-DocBee2 and ERNIE 4.5Turbo**, boosting key information extraction accuracy by **15.7 percentage points** over the previous generation.  
+
+- **Multi-Hardware Support:**  
+  - **Broad Compatibility:** Training and inference supported on **NVIDIA, Intel, Apple M-series, Kunlunxin, Ascend, Cambricon, Hygon, Enflame**, and more.  
+  - **Ascend-Optimized:** **200+ fully adapted models**, including **21 OM-accelerated inference models**, plus key solutions like PP-OCRv5 and PP-StructureV3.  
+  - **Kunlunxin-Optimized:** Critical classification, detection, and OCR models (including PP-OCRv5) are fully supported.  
 
 ### PaddleX v3.0.0rc1(4.22/2025)
 

+ 22 - 7
docs/CHANGELOG.md

@@ -9,13 +9,28 @@ comments: true
 
 ### PaddleX v3.0.0(5.20/2025) 
 
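-- **Major capability releases:**
-  - **Launch of the text recognition model PP-OCRv5**: OCR accuracy across all scenarios improves by 13%. A single model supports 5 text types (Simplified Chinese, Traditional Chinese, Chinese Pinyin, English, and Japanese), with marked gains on Chinese and English handwriting, vertical text, and rare characters.
-  - **Launch of the document parsing solution PP-StructureV3**: Strengthens layout region detection, table recognition, Chinese and English formula recognition, and multi-column reading-order recovery, and adds chart understanding. On the OmniDocBench leaderboard, PP-StructureV3's overall edit distance for both Chinese and English reaches SOTA level.
-  - **Optimized PP-ChatOCRv4**: Natively supports ERNIE 4.5T. Combined with the newly added PP-DocBee2, key information extraction accuracy improves by 15.7 percentage points over the previous generation.
-- **Inference optimizations:**
-  - The general OCR, general layout parsing v3, formula recognition, seal text recognition, and document image preprocessing pipelines support batch size > 1, processing multiple pages at once.
-  - 17 pipelines, including general OCR and general layout parsing v3, support multi-GPU parallel inference; sample code for multi-process parallel pipeline inference has been added.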
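+**Rich model library:**
+- **Extensive models:** PaddleX 3.0 includes 270+ models covering image (video) classification/detection/segmentation, OCR, speech recognition, time series, and other scenarios.
+- **Mature solutions:** Building on this model library, PaddleX 3.0 **provides important, mature AI solutions such as general document parsing, key information extraction, document understanding, table recognition, and general image recognition.**
+
+**Unified inference interface and rebuilt deployment capabilities:**
+- **Standardized inference interface**, which reduces API differences across model types, lowers users' learning cost, and speeds up enterprise adoption.
+- **Multi-model composition**, so complex tasks can be handled by conveniently combining different models for a 1+1>2 effect.
+- **Upgraded deployment: deployments of many model types can be managed with unified commands, with support for multi-GPU inference and multi-GPU, multi-instance serving deployment.**
+
+**Full adaptation to PaddlePaddle framework 3.0:**
+- **Full adaptation to new PaddlePaddle 3.0 features:** Supports compiler training, enabled by appending `-o Global.dy2st=True` to the training command. On GPUs, most models train more than 10% faster, and a few train more than 30% faster. For inference, models are fully adapted to PaddlePaddle 3.0's intermediate representation (PIR), which offers more flexible extensibility and compatibility; the static graph model file name changes from `xxx.pdmodel` to `xxx.json`.
+- **Full support for ONNX-format models:** Model formats can be converted via the Paddle2ONNX plugin.
+
+**Support for flagship capabilities:**
+- **Supports PP-OCRv5's model-chaining logic as well as its multi-hardware inference, multi-backend inference, and serving deployment capabilities.**
+- **Supports PP-StructureV3's complex serial and parallel model-chaining logic, for the first time chaining 15 models in series and parallel to build a complex pipeline of cooperating models. Its accuracy reaches SOTA level on the OmniDocBench leaderboard.**
+- **Supports PP-ChatOCRv4's LLM chaining logic; combined with ERNIE 4.5Turbo and the newly added PP-DocBee2, key information extraction accuracy improves by 15.7 percentage points over the previous generation.**
+
+**Multi-hardware support:**
+- **Overall support for training and inference on chips from NVIDIA, Intel, Apple M-series, Kunlunxin, Ascend, Cambricon, Hygon, Enflame, and others.**
+- **On Ascend, 200 models are fully adapted,** and 21 models support OM high-performance inference. Key model solutions such as PP-OCRv5 and PP-StructureV3 are also supported.
+- On Kunlunxin, important classification, detection, and OCR models (including PP-OCRv5) are supported.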
 
 ### PaddleX v3.0.0rc1(4.22/2025)
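
The release notes in this commit state that static graph models are now stored as `xxx.json` instead of `xxx.pdmodel`. For tooling that has to cope with both old and new exports, one possible approach is to look for the new name first and fall back to the legacy one. The sketch below assumes this lookup order; the function name `find_static_graph_file` and the directory layout are hypothetical, not part of PaddleX.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_static_graph_file(model_dir: Path, stem: str) -> Optional[Path]:
    """Return `<stem>.json` if present, else the legacy `<stem>.pdmodel`, else None."""
    for suffix in (".json", ".pdmodel"):  # new name first, legacy name second
        candidate = model_dir / f"{stem}{suffix}"
        if candidate.is_file():
            return candidate
    return None


# Demo in a throwaway directory; "xxx" mirrors the placeholder used in the notes.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "xxx.pdmodel").touch()
    legacy_hit = find_static_graph_file(model_dir, "xxx")
    (model_dir / "xxx.json").touch()
    new_hit = find_static_graph_file(model_dir, "xxx")
    print(legacy_hit.name, new_hit.name)
```

Passing the stem explicitly avoids matching unrelated `.json` files that may sit in the same directory.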