@@ -42,32 +42,30 @@ PaddleX 3.0 is a low-code development tool for AI models built on the PaddlePadd
Core upgrades are as follows:
-- **Major Capability Releases:**
- - **Launch of the groundbreaking text recognition model PP-OCRv5**: Achieves a 13% improvement in OCR accuracy across all scenarios. A single model now supports 5 types of text (Simplified Chinese, Traditional Chinese, Chinese Pinyin, English, and Japanese), with significant enhancements in recognizing handwritten fonts, vertical text, and rare characters in both Chinese and English. You can experience it immediately in the [online demo](https://aistudio.baidu.com/community/app/91660/webUI?source=appCenter).
-
- - **Launch of the groundbreaking document parsing solution PP-StructureV3**: Enhanced capabilities in layout area detection, table recognition, Chinese and English formula recognition, and restoration of multi-column reading order, with added abilities for chart understanding. PP-StructureV3 achieves state-of-the-art (SOTA) edit-distance scores for both Chinese and English on the OmniDocBench leaderboard. Experience it in the [online demo](https://aistudio.baidu.com/community/app/518494/webUI?source=appCenter).
-
- - **Optimization of PP-ChatOCRv4**: Supports ERNIE 4.5 Turbo. Combined with PP-DocBee2, it shows a 15.7 percentage point improvement in key information extraction accuracy compared to the previous generation. Experience it in the [online demo](https://aistudio.baidu.com/community/app/518493/webUI?source=appCenter).
-
-- **Inference Capability Optimization:**
- - The general OCR, PP-StructureV3, formula recognition, seal text recognition, and document image preprocessing pipelines support setting batch size >1, allowing multiple pages to be processed at once.
-
- - 17 pipelines, including general OCR and PP-StructureV3, now support multi-GPU parallel inference. Sample code for multi-process parallel inference has been added.
-
-
-🔥 **2025.4.22, PaddleX v3.0.0rc1 major upgrade.** This version fully adapts to PaddlePaddle 3.0.0, with the following core upgrades:
-
-- **Adapts to New Features of PaddlePaddle 3.0**: Supports compiler training, which can be enabled by appending `-o Global.dy2st=True` to the training command. On GPUs, the training speed of most models can be improved by over 10%, and for a few models, the improvement can exceed 30%. For inference, the models are fully adapted to PaddlePaddle 3.0's Intermediate Representation (PIR) technology, offering more flexible extensibility and compatibility. Inference model file names have changed from `xxx.pdmodel` to `xxx.json`.
-- **Newly Added Self-developed MLLM for Document Image Understanding, PP-DocBee**: PP-DocBee has achieved SOTA performance among models with similar parameter sizes on academic and internal business scenario document understanding evaluation benchmarks. It can be applied to document QA scenarios such as financial reports, research reports, contracts, manuals, and legal regulations.
-- **Full Support for ONNX Format Models, with Support for Model Format Conversion via the Paddle2ONNX Plugin.**
-- **Enhanced High-Performance Inference**:
- - **Added Support for ONNX and OM Format Models**: PaddleX can intelligently select the model format based on needs;
- - **Expanded Supported Pipelines and Modules**: All single modules and pipelines that run inference models can use the high-performance inference plugin to improve inference performance;
- - **Support for 3 Configuration Methods: CLI, API, and Configuration Files**: Enables more granular configuration, allowing users to enable and disable the high-performance inference plugin at the sub-pipeline and sub-module level.
-
-- **Expanded Multi-Hardware Support**:
- - **NPU: The number of models fully validated on Ascend NPU has increased to 200. Additionally, common pipelines such as general OCR, image classification, and object detection support OM model format inference, with inference speed improvements ranging from 113.8% to 226.4%. Inference deployment is supported on Atlas 200 and Atlas 300 series products.**
- - **GCU: Enflame has been officially integrated into the PaddlePaddle regular release system, completing the adaptation of the PaddleX ecosystem. Supports the training and inference of 90 models.**
+- **Rich Model Library:**
+ - **Extensive Model Coverage:** PaddleX 3.0 includes **270+ models**, covering diverse scenarios such as image/video classification/detection/segmentation, OCR, speech recognition, time series analysis, and more.
+ - **Mature Solutions:** Built on this robust model library, PaddleX 3.0 offers **key, production-ready AI solutions**, including general document parsing, key information extraction, document understanding, table recognition, and general image recognition.
+
+- **Unified Inference API & Enhanced Deployment Capabilities:**
+ - **Standardized Inference Interface:** Reduces API fragmentation across model types, shortening the learning curve for users and accelerating enterprise adoption.
+ - **Multi-Model Composition:** Complex tasks can be efficiently tackled by combining different models, achieving synergistic performance (1+1>2).
+ - **Upgraded Deployment:** Unified commands now manage deployments for diverse models, supporting **multi-GPU inference** and **multi-instance serving deployments**.
+
+- **Full Compatibility with PaddlePaddle Framework 3.0:**
+ - **Leveraging New Paddle 3.0 Features:**
+ - Compiler-accelerated training: Enable by appending `-o Global.dy2st=True` to training commands. **Most GPU-based models see >10% speed gains, with some exceeding 30%.**
+ - Inference upgrades: Full adaptation to Paddle 3.0’s Program Intermediate Representation (PIR) enhances flexibility and compatibility. Static graph models now use `xxx.json` instead of `xxx.pdmodel`.
+ - **ONNX Model Support:** Seamless format conversion via the Paddle2ONNX plugin.
+
+- **Flagship Capabilities:**
+ - **PP-OCRv5:** Provides **multi-hardware inference, multi-backend support, and serving deployments** for this industry-leading OCR system.
+ - **PP-StructureV3:** Orchestrates **15+ models** in hybrid (serial/parallel) pipelines, achieving **SOTA accuracy on OmniDocBench**.
+ - **PP-ChatOCRv4:** Integrates with **PP-DocBee2 and ERNIE 4.5 Turbo**, boosting key information extraction accuracy by **15.7 percentage points** over the previous generation.
+
+- **Multi-Hardware Support:**
+ - **Broad Compatibility:** Training and inference supported on **NVIDIA, Intel, Apple M-series, Kunlunxin, Ascend, Cambricon, Hygon, Enflame**, and more.
+ - **Ascend-Optimized:** **200+ fully adapted models**, including **21 OM-accelerated inference models**, plus key solutions like PP-OCRv5 and PP-StructureV3.
+ - **Kunlunxin-Optimized:** Critical classification, detection, and OCR models (including PP-OCRv5) are fully supported.
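
The compiler-accelerated training option noted above is enabled by appending `-o Global.dy2st=True` to an existing training command. A minimal sketch of doing that programmatically follows. The helper `with_compiler_training` and the base command (script name, config path, and the `Global.mode=train` override) are illustrative assumptions, not part of PaddleX; only the `-o Global.dy2st=True` flag comes from the notes.

```python
import shlex

# The override quoted in the release notes above.
DY2ST_OVERRIDE = "Global.dy2st=True"


def with_compiler_training(train_cmd: list[str]) -> list[str]:
    """Return a copy of train_cmd with `-o Global.dy2st=True` appended once.

    Hypothetical helper for illustration; it is not a PaddleX API.
    """
    # Look for an existing `-o Global.dy2st=True` pair so the flag is not duplicated.
    pairs = zip(train_cmd, train_cmd[1:])
    if any(flag == "-o" and value == DY2ST_OVERRIDE for flag, value in pairs):
        return list(train_cmd)
    return [*train_cmd, "-o", DY2ST_OVERRIDE]


# Assumed base command: every element here is a placeholder for your own setup.
base_cmd = ["python", "main.py", "-c", "path/to/config.yaml", "-o", "Global.mode=train"]

# Print the final command line as it would be typed in a shell.
print(shlex.join(with_compiler_training(base_cmd)))
```

Building the command as a list and joining it only for display avoids quoting mistakes if a path contains spaces, and the duplicate check keeps the helper safe to call more than once.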
## 🔠 Explanation of Pipeline