update README & CHANGELOG for 3.0rc0 (#3340)

cuicheng01 9 months ago
parent
commit
166c010e10
4 changed files with 133 additions and 4 deletions
  1. 18 2
      README.md
  2. 14 2
      README_en.md
  3. 51 0
      docs/CHANGLOG.en.md
  4. 50 0
      docs/CHANGLOG.md

+ 18 - 2
README.md

@@ -41,9 +41,25 @@ PaddleX 3.0 is a low-code development tool built on the PaddlePaddle framework; it integrates
 
 ## 📣 Recent Updates
 
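-🔥🔥 **2024.11.15**: PaddleX 3.0 Beta2 open-source version officially released, fully adapted to PaddlePaddle 3.0b2. **Adds pipelines for general image recognition, face recognition, vehicle attribute recognition, and pedestrian attribute recognition; adapts the full development workflow of 42 additional models to Ascend 910B; and fully supports the [GitHub Pages documentation](https://paddlepaddle.github.io/PaddleX/latest/index.html).**
+🔥🔥 **2025.2.14**: PaddleX v3.0.0rc0 major upgrade. This release is fully adapted to PaddlePaddle 3.0rc0, with the following core upgrades:
 
-🔥🔥 **2024.9.30**: PaddleX 3.0 Beta1 open-source version officially released. It provides **200+ models** callable in one step through a minimal Python API; implements full-workflow model development based on unified commands and open-sources the basic capabilities of the **PP-ChatOCRv3** featured pipeline; supports **high-performance inference and serving deployment for 100+ models** (under continuous iteration) and **edge deployment of 8 key vision models across 4 pipelines**; and adapts the **full development workflow of 100+ models to Ascend 910B** and of **39+ models to Kunlunxin and Cambricon**.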
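+- Adds 12 high-value pipelines, headlined by the self-developed **[Layout Parsing v2 pipeline](docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.md)**, **[PP-ChatOCRv4-doc pipeline](docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.md)**, and **[Table Recognition v2 pipeline](docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md)**. Also adds pipelines for document processing, rotated box detection, open-vocabulary detection/segmentation, video analysis, multilingual speech recognition, 3D, and other scenarios.
+
+- Adds 48 cutting-edge models. In OCR, these include the **layout region detection model [PP-DocLayout](docs/module_usage/tutorials/ocr_modules/layout_detection.md)**, the **formula recognition model [PP-FormulaNet](docs/module_usage/tutorials/ocr_modules/formula_recognition.md)**, the **table structure recognition model [SLANeXt](docs/module_usage/tutorials/ocr_modules/table_structure_recognition.md)**, and the **text recognition model [PP-OCRv4_server_rec_doc](docs/module_usage/tutorials/ocr_modules/text_recognition.md)**. In CV, they include 3D detection, human keypoint, and open-vocabulary detection/segmentation models; in speech recognition, the Whisper series.
+
+- Optimizes and upgrades the inference APIs for models and pipelines, supporting more configurable parameters and making model and pipeline inference more flexible. [Details](docs/API_change_log/v3.0.0rc.md).
+
+- Expanded multi-hardware support: adds support for Suiyuan GCU (90+ models) and significantly increases the number of models supported on Ascend NPU, Kunlunxin XPU, Cambricon MLU, and Hygon DCU.
+
+- Upgraded full-scenario deployment capabilities:
+  - High-performance inference supports one-click installation, Windows, and 220+ models; the core library ultra-infer is open-sourced;
+  - Serving deployment adds a high-stability solution that supports dynamic configuration optimization.
+
+- Enhanced system compatibility: adapted for training/inference on Windows, with full support for Python 3.11/3.12.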
+
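+🔥 **2024.11.15**: PaddleX 3.0 Beta2 open-source version officially released, fully adapted to PaddlePaddle 3.0b2. **Adds pipelines for general image recognition, face recognition, vehicle attribute recognition, and pedestrian attribute recognition; adapts the full development workflow of 42 additional models to Ascend 910B; and fully supports the [GitHub Pages documentation](https://paddlepaddle.github.io/PaddleX/latest/index.html).**
+
+🔥 **2024.9.30**: PaddleX 3.0 Beta1 open-source version officially released. It provides **200+ models** callable in one step through a minimal Python API; implements full-workflow model development based on unified commands and open-sources the basic capabilities of the **PP-ChatOCRv3** featured pipeline; supports **high-performance inference and serving deployment for 100+ models** (under continuous iteration) and **edge deployment of 8 key vision models across 4 pipelines**; and adapts the **full development workflow of 100+ models to Ascend 910B** and of **39+ models to Kunlunxin and Cambricon**.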
 
 
 🔥 **2024.6.27**: PaddleX 3.0 Beta open-source version officially released, supporting low-code pipeline and model development locally on a variety of mainstream hardware.

+ 14 - 2
README_en.md

@@ -44,9 +44,21 @@ PaddleX 3.0 is a low-code development tool for AI models built on the PaddlePadd
 
 ## 📣 Recent Updates
 
-🔥🔥 **"PaddleX Document Information Personalized Extraction Upgrade"**, PP-ChatOCRv3 innovatively provides custom development functions for OCR models based on data fusion technology, offering stronger model fine-tuning capabilities. Millions of high-quality general OCR text recognition data are automatically integrated into vertical model training data at a specific ratio, solving the problem of weakened general text recognition capabilities caused by vertical model training in the industry. Suitable for practical scenarios in industries such as automated office, financial risk control, healthcare, education and publishing, and legal and government sectors. **October 24th (Thursday) 19:00** Join our live session for an in-depth analysis of the open-source version of PP-ChatOCRv3 and the outstanding advantages of PaddleX 3.0 Beta1 in terms of accuracy and speed. [Registration Link](https://www.wjx.top/vm/wpPu8HL.aspx?udsid=994465)
+🔥🔥 **2025.2.14**, PaddleX v3.0.0rc0 major upgrade. This version fully adapts to PaddlePaddle 3.0rc0, with the following core upgrades:
 
-> [❗ Get more courses for free](https://aistudio.baidu.com/education/group/info/32160)
+- **Added 12 high-value pipelines**, launching self-developed **[Layout Parsing v2 Pipeline](docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.en.md)**, **[PP-ChatOCRv4-doc Pipeline](docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.en.md)**, **[Table Recognition v2 Pipeline](docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md)**. Additionally, new pipelines for document processing, rotated box detection, open vocabulary detection/segmentation, video analysis, multilingual speech recognition, 3D, and other scenarios have been added.
+
+- **Added 48 cutting-edge models**, including major OCR releases such as the **Document Layout Detection Model [PP-DocLayout](docs/module_usage/tutorials/ocr_modules/layout_detection.en.md)**, the **Formula Recognition Model [PP-FormulaNet](docs/module_usage/tutorials/ocr_modules/formula_recognition.en.md)**, the **Table Structure Recognition Model [SLANeXt](docs/module_usage/tutorials/ocr_modules/table_structure_recognition.en.md)**, and the **Text Recognition Model [PP-OCRv4_server_rec_doc](docs/module_usage/tutorials/ocr_modules/text_recognition.en.md)**. Also added are CV models for 3D detection, human keypoints, and open vocabulary detection/segmentation, and speech recognition models from the Whisper series.
+
+- **Optimized and upgraded the inference APIs for models and pipelines**, supporting more parameter configurations to enhance the flexibility of model and pipeline inference. [Details](docs/API_change_log/v3.0.0rc.en.md).
+
+- **Expanded hardware support:** added support for Suiyuan GCU (90+ models), and significantly increased the number of models for Ascend NPU/Kunlunxin XPU/Cambricon MLU/Hygon DCU.
+
+- **Upgraded full-scenario deployment capabilities:**
+  - High-performance inference supports one-click installation, Windows systems, and 220+ models, with the core library ultra-infer open-sourced;
+  - Serving deployment added a highly stable solution, supporting dynamic configuration optimization.
+
+- **Enhanced system compatibility:** adapted to Windows training/inference, fully supporting Python 3.11/3.12.
 
 🔥🔥 **11.15, 2024**, PaddleX 3.0 Beta2 open source version is officially released and is fully compatible with the PaddlePaddle 3.0b2 version. <b>This update introduces new pipelines for general image recognition, face recognition, vehicle attribute recognition, and pedestrian attribute recognition. We have also developed 42 new models to fully support the Ascend 910B, with extensive documentation available on [GitHub Pages](https://paddlepaddle.github.io/PaddleX/latest/en/index.html).</b>
 

+ 51 - 0
docs/CHANGLOG.en.md

@@ -5,6 +5,57 @@ comments: true
 # Version Update Information
 
 ## Latest Version Information
+### PaddleX v3.0.0rc0 (2.14/2025)
+
+PaddleX 3.0 rc0 is fully compatible with PaddlePaddle 3.0rc0. It adds 10+ pipelines and 40+ models, optimizes the model and pipeline APIs, and adapts more models to multiple hardware platforms. The high-performance inference and serving capabilities have been comprehensively upgraded. The specific new features are as follows:
+
+- <b>New pipelines</b>:
+  - <b>[Document Image Preprocessing Pipeline](pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.en.md)</b>, supporting the correction of rotated and distorted document images.
+  - <b>[PP-ChatOCRv4-doc Pipeline](pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.en.md)</b>, which builds on the PP-ChatOCRv3-doc pipeline by integrating multimodal capabilities, strengthening OCR recognition, and optimizing prompts, improving the accuracy of document information extraction by 15 percentage points. It also supports calling locally deployed large models through the OpenAI interface.
+  - <b>[Layout Parsing v2 Pipeline](pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.en.md)</b>, the core solution of PP-StructureV3. Building on the General Layout Parsing v1 pipeline, it improves layout area detection, table recognition, formula recognition, and reading order recovery; supports converting different types of document images and PDF documents into standard Markdown files; and delivers strong document recovery in most scenarios.
+  - <b>[Table Recognition v2 Pipeline](pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md)</b>, which chains multiple models ("table classification + table structure recognition + cell detection") to achieve higher-precision end-to-end table recognition.
+  - <b>[Rotated Object Detection Pipeline](pipeline_usage/tutorials/cv_pipelines/rotated_object_detection.en.md)</b>, supporting the detection of rotated objects.
+  - <b>[Human Keypoint Detection Pipeline](pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.en.md)</b>, supporting precise acquisition of human keypoint positions such as shoulders, elbows, knees, etc., for pose estimation and behavior recognition.
+  - <b>[Open Vocabulary Object Detection Pipeline](pipeline_usage/tutorials/cv_pipelines/open_vocabulary_detection.en.md)</b>, supporting the detection of open-domain objects and predicting categories.
+  - <b>[Open Vocabulary Segmentation Pipeline](pipeline_usage/tutorials/cv_pipelines/open_vocabulary_segmentation.en.md)</b>, supporting image segmentation of open-domain objects.
+  - <b>[Video Detection Pipeline](pipeline_usage/tutorials/video_pipelines/video_detection.en.md)</b>, supporting efficient extraction of spatial and temporal features from videos, achieving accurate recognition and localization of objects in videos.
+  - <b>[Video Classification Pipeline](pipeline_usage/tutorials/video_pipelines/video_classification.en.md)</b>, supporting the extraction of spatiotemporal features from videos and accurate classification.
+  - <b>[Multilingual Speech Recognition Pipeline](pipeline_usage/tutorials/speech_pipelines/multilingual_speech_recognition.en.md)</b>, supporting automatic conversion of speech in multiple languages into the corresponding text or commands.
+  - <b>[3D BEV Detection Pipeline](pipeline_usage/tutorials/cv_pipelines/3d_bev_detection.en.md)</b>, supporting input from multiple sensors (LiDAR, surround-view RGB cameras, etc.), processing the data with deep learning methods, and outputting the position, shape, orientation, and category of objects in three-dimensional space.
+
+- <b>New models:</b>
+  - Added 28 OCR models, including the self-developed PP-DocLayout series for high precision and efficiency in layout area detection, the self-developed PP-FormulaNet series for high precision and efficiency in formula recognition, the self-developed SLANeXt series for table structure recognition, and the PP-OCRv4_server_rec_doc model for higher recognition accuracy in text recognition.
+  - Added 11 CV models, including 3D multimodal fusion detection models, open vocabulary object detection and segmentation models, rotated object detection models, and human keypoint detection models.
+  - Added 5 Speech models, all from the Whisper series.
+  - Added 4 Video models, including 1 video detection model and 3 video classification models.
+
+- <b>Model and pipeline capability upgrades:</b>
+  - Models and pipelines support more parameters, such as category thresholds for object detection models, expansion coefficients for text detection models, etc. CV and OCR models support PDF file input.
+  - OCR pipelines support document preprocessing operations such as document orientation classification and document correction, with built-in text line orientation classification models.
+  - The Document Scene Information Extraction v3 pipeline can call large language models through the standard OpenAI interface, which extends the range of large language models it supports.
+  - Optimized user experience, with changes to some model and pipeline interfaces. For details, refer to the [API Upgrade Document](API_change_log/v3.0.0rc.md).
+
+- <b>Multi-hardware support:</b>
+  - Added model training and inference capabilities for the Suiyuan GCU hardware, supporting 90+ models, [GCU Model List](support_list/model_list_gcu.md)
+  - Added 50+ models for Ascend NPU, [NPU Model List](support_list/model_list_npu.en.md)
+  - Added 10+ models for Kunlunxin XPU, [XPU Model List](support_list/model_list_xpu.en.md)
+  - Added 10+ models for Cambricon MLU, [MLU Model List](support_list/model_list_mlu.en.md)
+  - Added 30+ models for Hygon DCU, [DCU Model List](support_list/model_list_dcu.en.md)
+
+- <b>Multi-environment adaptation:</b>
+  - Adapted to Windows systems, supporting the use of PaddleX for model training and inference on Windows. Fixed some installation failures on Windows systems.
+  - Training and inference fully adapted to Python 3.11 and Python 3.12.
+
+- <b>Comprehensive upgrade of deployment capabilities:</b>
+  - <b>High-performance inference:</b>
+    - Simplified installation and usage: Supports one-click installation of high-performance inference plugins using PaddleX CLI; no authentication is required to use high-performance inference plugins.
+    - Cross-platform support: Added support for Windows systems.
+    - Expanded model support: Increased the number of supported models, currently supporting 220+ models.
+    - Open-sourced core code: Open-sourced the core inference library ultra-infer, facilitating secondary development and customization by developers.
+  - <b>Serving:</b>
+    - Upgraded basic serving solution: supports the new pipelines and adapts to new features of existing pipelines.
+    - Added high-stability serving solution: supports flexible adjustment of service configurations to optimize service performance; multiple deployment options meet different user needs.
+
 ### PaddleX v3.0.0beta2 (11.15/2024)
 
 PaddleX 3.0 Beta2 is fully compatible with the PaddlePaddle 3.0b2 version. <b>This update introduces new pipelines for general image recognition, face recognition, vehicle attribute recognition, and pedestrian attribute recognition. We have also developed 42 new models to fully support the Ascend 910B, with extensive documentation available on [GitHub Pages](https://paddlepaddle.github.io/PaddleX/latest/en/index.html).</b> Here’s a detailed look at the new features and improvements:

+ 50 - 0
docs/CHANGLOG.md

@@ -5,6 +5,56 @@ comments: true
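 # Version Update Information
 
 ## Latest Version Information
+### PaddleX v3.0.0rc0(2.14/2025)
+PaddleX 3.0 rc0 is fully adapted to PaddlePaddle 3.0rc0. It adds 10+ pipelines and 40+ models, optimizes the model and pipeline APIs, and adapts more models to multiple hardware platforms. High-performance inference and serving deployment capabilities are comprehensively upgraded. The new capabilities are as follows: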
+
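+- New pipelines:
+  - Added the [Document Image Preprocessing pipeline](pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.md), which corrects rotated and distorted document images.
+  - Added the [Document Scene Information Extraction v4 pipeline](pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.md). Building on the Document Scene Information Extraction v3 pipeline, it integrates multimodal capabilities, strengthens OCR recognition, and optimizes prompts, improving document information extraction accuracy by 15 percentage points. It supports calling locally deployed large models through the OpenAI interface.
+  - Added the [General Layout Parsing v2 pipeline](pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.md), the core solution of PP-StructureV3. Building on the General Layout Parsing v1 pipeline, it improves layout region detection, table recognition, formula recognition, and reading order recovery; supports converting different types of document images and PDF documents into standard Markdown files; and delivers strong document recovery in most scenarios.
+  - Added the [General Table Recognition v2 pipeline](pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md), which chains multiple models ("table classification + table structure recognition + cell detection") to achieve higher-precision end-to-end table recognition.
+  - Added the [Rotated Object Detection pipeline](pipeline_usage/tutorials/cv_pipelines/rotated_object_detection.md), which detects rotated objects.
+  - Added the [Human Keypoint Detection pipeline](pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.md), which precisely locates human keypoints such as shoulders, elbows, and knees for pose estimation and behavior recognition.
+  - Added the [Open Vocabulary Object Detection pipeline](pipeline_usage/tutorials/cv_pipelines/open_vocabulary_detection.md), which detects open-domain objects and predicts their categories.
+  - Added the [Open Vocabulary Segmentation pipeline](pipeline_usage/tutorials/cv_pipelines/open_vocabulary_segmentation.md), which performs image segmentation of open-domain objects.
+  - Added the [General Video Detection pipeline](pipeline_usage/tutorials/video_pipelines/video_detection.md), which efficiently extracts spatial and temporal features from videos to accurately recognize and localize objects in them.
+  - Added the [General Video Classification pipeline](pipeline_usage/tutorials/video_pipelines/video_classification.md), which extracts spatiotemporal features from videos and classifies them accurately.
+  - Added the [Multilingual Speech Recognition pipeline](pipeline_usage/tutorials/speech_pipelines/multilingual_speech_recognition.md), which automatically converts speech in multiple languages into the corresponding text or commands.
+  - Added the [3D Multimodal Fusion Detection pipeline](pipeline_usage/tutorials/cv_pipelines/3d_bev_detection.md), which accepts data from multiple sensors (LiDAR, surround-view RGB cameras, etc.), processes it with deep learning and related methods, and outputs the position, shape, orientation, and category of objects in three-dimensional space.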
+
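+- New models:
+  - Added 28 OCR models, including the self-developed PP-DocLayout series of layout region detection models, which balance high precision and high efficiency; the self-developed PP-FormulaNet series of formula recognition models, which likewise balance precision and efficiency; the self-developed SLANeXt series of table structure recognition models; and the self-developed text recognition model PP-OCRv4_server_rec_doc, which offers higher recognition accuracy.
+  - Added 11 CV models, including 3D multimodal fusion detection models, open vocabulary object detection and segmentation models, rotated object detection models, and human keypoint detection models.
+  - Added 5 Speech models, all from the Whisper series.
+  - Added 4 Video models: 1 video detection model and 3 video classification models.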
+
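+- Model and pipeline capability upgrades:
+  - Models and pipelines support more parameters, such as category thresholds for object detection models and expansion coefficients for text detection models. CV and OCR models support PDF file input.
+  - OCR pipelines support document preprocessing operations such as document orientation classification and document correction, and include a built-in text line orientation classification model.
+  - The Document Scene Information Extraction v3 pipeline can call large language models through the standard OpenAI interface, which extends the range of large language models it supports.
+  - Improved the user experience; some model and pipeline interfaces have changed. For details, see the [API upgrade document](API_change_log/v3.0.0rc.md).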
+
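+- Multi-hardware support:
+  - Added model training and inference capabilities for Suiyuan GCU hardware, supporting 90+ models, [GCU model list](support_list/model_list_gcu.md)
+  - Added 50+ models for Ascend NPU, [NPU model list](support_list/model_list_npu.md)
+  - Added 10+ models for Kunlunxin XPU, [XPU model list](support_list/model_list_xpu.md)
+  - Added 10+ models for Cambricon MLU, [MLU model list](support_list/model_list_mlu.md)
+  - Added 30+ models for Hygon DCU, [DCU model list](support_list/model_list_dcu.md)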
+
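+- Multi-environment adaptation:
+  - Adapted to Windows, supporting model training and inference with PaddleX on Windows. Fixed some installation failures on Windows.
+  - Training and inference fully adapted to Python 3.11 and Python 3.12.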
+
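+- Comprehensive upgrade of deployment capabilities:
+  - High-performance inference:
+    - Simplified installation and usage: supports one-click installation of the high-performance inference plugin with the PaddleX CLI; using the plugin no longer requires authentication.
+    - Cross-platform support: added support for Windows.
+    - Expanded model support: increased the number of supported models, now 220+ in total.
+    - Open-sourced core code: open-sourced the core inference library ultra-infer, making secondary development and customization easier for developers.
+  - Serving deployment:
+    - Upgraded basic serving solution: supports the new pipelines and adapts to new features of existing pipelines.
+    - Added high-stability serving solution: supports flexible adjustment of service configurations to optimize service performance; multiple deployment options meet different user needs.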
+
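 ### PaddleX v3.0.0beta2(11.15/2024)
 PaddleX 3.0 Beta2 is fully adapted to PaddlePaddle 3.0b2. **Adds pipelines for general image recognition, face recognition, vehicle attribute recognition, and pedestrian attribute recognition; adapts the full development workflow of 42 additional models to Ascend 910B; and fully supports the [GitHub Pages documentation](https://paddlepaddle.github.io/PaddleX/latest/index.html).** The new capabilities are as follows: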
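
The release labels in this commit appear in several spellings (`v3.0.0beta2`, `v3.0.0rc0`, and PaddlePaddle `3.0b2` / `3.0rc0`). As a rough illustration only, the sketch below orders such labels, assuming they follow the usual Python pre-release convention in which a beta precedes a release candidate, which precedes the final release. The `version_key` helper is hypothetical and is not part of PaddleX or PaddlePaddle.

```python
import re

# Rank of each pre-release phase; the empty string stands for a final release.
_PHASE_RANK = {"a": 0, "alpha": 0, "b": 1, "beta": 1, "rc": 2, "": 3}

_TAG_RE = re.compile(r"v?(\d+(?:\.\d+)*)(?:(a|alpha|b|beta|rc)(\d+))?")


def version_key(tag: str):
    """Return a sortable key for labels such as 'v3.0.0rc0' or '3.0b2'.

    Hypothetical helper for illustration; assumes Python-style pre-release
    ordering (alpha < beta < rc < final).
    """
    match = _TAG_RE.fullmatch(tag.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized version label: {tag!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # Pad to three components so that '3.0' and '3.0.0' compare equal.
    release += (0,) * (3 - len(release))
    phase = match.group(2) or ""
    number = int(match.group(3) or 0)
    return release, _PHASE_RANK[phase], number


if __name__ == "__main__":
    labels = ["v3.0.0rc0", "v3.0.0beta2", "3.0.0", "v3.0.0beta1"]
    for label in sorted(labels, key=version_key):
        print(label)
```

Under that assumption, `version_key("3.0rc0")` and `version_key("v3.0.0rc0")` produce the same key, so the short and long spellings used across the README and changelog sort identically.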