
bugfix (#4086)

* fix RGB channel in table recognition result of PP-StructureV3

* update pdx version from 3.0rc1 to 3.0.x in readme

* update pdx installation cmd

* [TEMP] support an MKLDNN blocklist to avoid triggering errors in PP-StructureV3 when using MKLDNN
Tingquan Gao, 5 months ago
parent
commit
540ed5590e

+ 2 - 2
README.md

@@ -548,7 +548,7 @@ Every PaddleX pipeline supports local **quick inference**, and some models can be used on [AI
 
 ### 🛠️ Installation
 
-> ❗Before installing PaddleX, please ensure that you have a basic **Python runtime environment** (Note: currently supports Python 3.8 to Python 3.12). The PaddleX 3.0-rc1 version depends on PaddlePaddle version 3.0.0 and above. Please make sure the version compatibility is maintained before use.
+> ❗Before installing PaddleX, please ensure that you have a basic **Python runtime environment** (Note: currently supports Python 3.8 to Python 3.12). The PaddleX 3.0.x version depends on PaddlePaddle version 3.0.0 and above. Please make sure the version compatibility is maintained before use.
 
 * **Installing PaddlePaddle**
 ```bash
@@ -566,7 +566,7 @@ python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/pac
 * **Installing PaddleX**
 
 ```bash
-pip install paddlex[base]==3.0.0
+pip install "paddlex[base]==3.0.0"
 ```
 
 > ❗For more installation methods, refer to the [PaddleX Installation Guide](https://paddlepaddle.github.io/PaddleX/latest/installation/installation.html)

+ 22 - 22
README_en.md

@@ -42,30 +42,30 @@ PaddleX 3.0 is a low-code development tool for AI models built on the PaddlePadd
 
 Core upgrades are as follows:
 
-- **Rich Model Library:**  
-  - **Extensive Model Coverage:** PaddleX 3.0 includes **270+ models**, covering diverse scenarios such as image/video classification/detection/segmentation, OCR, speech recognition, time series analysis, and more.  
-  - **Mature Solutions:** Built on this robust model library, PaddleX 3.0 offers **critical and production-ready AI solutions**, including general document parsing, key information extraction, document understanding, table recognition, and general image recognition.  
+- **Rich Model Library:**
+  - **Extensive Model Coverage:** PaddleX 3.0 includes **270+ models**, covering diverse scenarios such as image/video classification/detection/segmentation, OCR, speech recognition, time series analysis, and more.
+  - **Mature Solutions:** Built on this robust model library, PaddleX 3.0 offers **critical and production-ready AI solutions**, including general document parsing, key information extraction, document understanding, table recognition, and general image recognition.
 
-- **Unified Inference API & Enhanced Deployment Capabilities:**  
-  - **Standardized Inference Interface:** Reduces API fragmentation across model types, lowering the learning curve for users and accelerating enterprise adoption.  
-  - **Multi-Model Composition:** Complex tasks can be efficiently tackled by combining different models, achieving synergistic performance (1+1>2).  
-  - **Upgraded Deployment:** Unified commands now manage deployments for diverse models, supporting **multi-GPU inference** and **multi-instance serving deployments**.  
+- **Unified Inference API & Enhanced Deployment Capabilities:**
+  - **Standardized Inference Interface:** Reduces API fragmentation across model types, lowering the learning curve for users and accelerating enterprise adoption.
+  - **Multi-Model Composition:** Complex tasks can be efficiently tackled by combining different models, achieving synergistic performance (1+1>2).
+  - **Upgraded Deployment:** Unified commands now manage deployments for diverse models, supporting **multi-GPU inference** and **multi-instance serving deployments**.
 
-- **Full Compatibility with PaddlePaddle Framework 3.0:**  
-  - **Leveraging New Paddle 3.0 Features:**  
-    - Compiler-accelerated training: Enable by appending `-o Global.dy2st=True` to training commands. **Most GPU-based models see >10% speed gains, with some exceeding 30%.**  
-    - Inference upgrades: Full adaptation to Paddle 3.0’s Program Intermediate Representation (PIR) enhances flexibility and compatibility. Static graph models now use `xxx.json` instead of `xxx.pdmodel`.  
-  - **ONNX Model Support:** Seamless format conversion via the Paddle2ONNX plugin.  
+- **Full Compatibility with PaddlePaddle Framework 3.0:**
+  - **Leveraging New Paddle 3.0 Features:**
+    - Compiler-accelerated training: Enable by appending `-o Global.dy2st=True` to training commands. **Most GPU-based models see >10% speed gains, with some exceeding 30%.**
+    - Inference upgrades: Full adaptation to Paddle 3.0’s Program Intermediate Representation (PIR) enhances flexibility and compatibility. Static graph models now use `xxx.json` instead of `xxx.pdmodel`.
+  - **ONNX Model Support:** Seamless format conversion via the Paddle2ONNX plugin.
 
-- **Flagship Capabilities:**  
-  - **PP-OCRv5:** Powers **multi-hardware inference, multi-backend support, and serving deployments** for this industry-leading OCR system.  
-  - **PP-StructureV3:** Orchestrates **15+ models** in hybrid (serial/parallel) pipelines, achieving **SOTA accuracy on OmniDocBench**.  
-  - **PP-ChatOCRv4:** Integrates with **PP-DocBee2 and ERNIE 4.5Turbo**, boosting key information extraction accuracy by **15.7 percentage points** over the previous generation.  
+- **Flagship Capabilities:**
+  - **PP-OCRv5:** Powers **multi-hardware inference, multi-backend support, and serving deployments** for this industry-leading OCR system.
+  - **PP-StructureV3:** Orchestrates **15+ models** in hybrid (serial/parallel) pipelines, achieving **SOTA accuracy on OmniDocBench**.
+  - **PP-ChatOCRv4:** Integrates with **PP-DocBee2 and ERNIE 4.5Turbo**, boosting key information extraction accuracy by **15.7 percentage points** over the previous generation.
 
-- **Multi-Hardware Support:**  
-  - **Broad Compatibility:** Training and inference supported on **NVIDIA, Intel, Apple M-series, Kunlunxin, Ascend, Cambricon, Hygon, Enflame**, and more.  
-  - **Ascend-Optimized:** **200+ fully adapted models**, including **21 OM-accelerated inference models**, plus key solutions like PP-OCRv5 and PP-StructureV3.  
-  - **Kunlunxin-Optimized:** Critical classification, detection, and OCR models (including PP-OCRv5) are fully supported.  
+- **Multi-Hardware Support:**
+  - **Broad Compatibility:** Training and inference supported on **NVIDIA, Intel, Apple M-series, Kunlunxin, Ascend, Cambricon, Hygon, Enflame**, and more.
+  - **Ascend-Optimized:** **200+ fully adapted models**, including **21 OM-accelerated inference models**, plus key solutions like PP-OCRv5 and PP-StructureV3.
+  - **Kunlunxin-Optimized:** Critical classification, detection, and OCR models (including PP-OCRv5) are fully supported.
 
 
 ## 🔠 Explanation of Pipeline
@@ -553,7 +553,7 @@ In addition, PaddleX provides developers with a full-process efficient model tra
 
 ### 🛠️ Installation
 
-> ❗Before installing PaddleX, please ensure that you have a basic **Python runtime environment** (Note: Currently supports Python 3.8 to Python 3.12). The PaddleX 3.0-rc1 version depends on PaddlePaddle version 3.0.0 and above. Please make sure the version compatibility is maintained before use.
+> ❗Before installing PaddleX, please ensure that you have a basic **Python runtime environment** (Note: Currently supports Python 3.8 to Python 3.12). The PaddleX 3.0.x version depends on PaddlePaddle version 3.0.0 and above. Please make sure the version compatibility is maintained before use.
 
 * **Installing PaddlePaddle**
 
@@ -572,7 +572,7 @@ python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/pac
 * **Installing PaddleX**
 
 ```bash
-pip install paddlex[base]==3.0.0
+pip install "paddlex[base]==3.0.0"
 ```
 
 > ❗For more installation methods, refer to the [PaddleX Installation Guide](https://paddlepaddle.github.io/PaddleX/latest/en/installation/installation.html).
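The added quotes in both READMEs are not cosmetic: in zsh (and in bash when a matching file exists), unquoted square brackets are interpreted as a glob pattern, so the extras specifier never reaches pip. A quick illustration:

```shell
# Unquoted, zsh tries to glob-expand the brackets and aborts:
#   $ pip install paddlex[base]==3.0.0
#   zsh: no matches found: paddlex[base]==3.0.0
# Quoting (or escaping the brackets) passes the spec to pip literally:
pip install "paddlex[base]==3.0.0"
```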

+ 9 - 3
paddlex/inference/pipelines/base.py

@@ -96,14 +96,20 @@ class BasePipeline(ABC, metaclass=AutoRegisterABCMetaClass):
 
         logging.info("Creating model: %s", (config["model_name"], model_dir))
 
+        # TODO(gaotingquan): support to specify pp_option by model in pipeline
+        if self.pp_option is not None:
+            pp_option = self.pp_option.copy()
+            pp_option.model_name = config["model_name"]
+            pp_option.run_mode = self.pp_option.run_mode
+        else:
+            pp_option = None
+
         model = create_predictor(
             model_name=config["model_name"],
             model_dir=model_dir,
             device=self.device,
             batch_size=config.get("batch_size", 1),
-            pp_option=(
-                self.pp_option.copy() if self.pp_option is not None else self.pp_option
-            ),
+            pp_option=pp_option,
             use_hpip=use_hpip,
             hpi_config=hpi_config,
             **kwargs,
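The refactor above gives each model its own copy of the pipeline-wide `pp_option` before stamping it with the model's name, so per-model adjustments (such as the blocklist-driven `run_mode` fallback that fires when `run_mode` is re-assigned) never leak back into the shared instance. The idea can be sketched with an illustrative stand-in class (not the real `PaddlePredictorOption` API):

```python
import copy


class PredictorOption:
    """Illustrative stand-in for PaddlePredictorOption (names are hypothetical)."""

    def __init__(self, model_name=None, run_mode="paddle"):
        self.model_name = model_name
        self.run_mode = run_mode

    def copy(self):
        return copy.deepcopy(self)


def option_for_model(shared_option, model_name):
    """Copy the pipeline-wide option so each model gets its own
    model_name without mutating the shared instance."""
    if shared_option is None:
        return None
    per_model = shared_option.copy()
    per_model.model_name = model_name
    return per_model


shared = PredictorOption(run_mode="mkldnn")
opt = option_for_model(shared, "SLANeXt_wired")
print(shared.model_name, opt.model_name)  # None SLANeXt_wired
```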

+ 1 - 1
paddlex/inference/pipelines/layout_parsing/result_v2.py

@@ -220,7 +220,7 @@ class LayoutParsingResultV2(BaseCVResult, HtmlMixin, XlsxMixin, MarkdownMixin):
 
         if model_settings["use_table_recognition"] and len(self["table_res_list"]) > 0:
             table_cell_img = Image.fromarray(
-                copy.deepcopy(self["doc_preprocessor_res"]["output_img"])
+                copy.deepcopy(self["doc_preprocessor_res"]["output_img"][:, :, ::-1])
             )
             table_draw = ImageDraw.Draw(table_cell_img)
             rectangle_color = (255, 0, 0)
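The one-line fix above reverses the last array axis because OpenCV-style image arrays store channels as BGR, while `PIL.Image.fromarray` interprets a 3-channel array as RGB; without the flip, red and blue are swapped in the rendered table-recognition result. A minimal NumPy-only sketch of the conversion:

```python
import numpy as np

# One pure-blue pixel in BGR channel order (as produced by OpenCV-style code).
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the channel axis swaps the first and third channels: BGR -> RGB.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]: blue now sits in the B slot of RGB
```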

+ 25 - 0
paddlex/inference/utils/mkldnn_blocklist.py

@@ -0,0 +1,25 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+MKLDNN_BLOCKLIST = [
+    "SLANeXt_wired",
+    "SLANeXt_wireless",
+    "LaTeX_OCR_rec",
+    "PP-FormulaNet-L",
+    "PP-FormulaNet-S",
+    "UniMERNet",
+    "PP-FormulaNet_plus-L",
+    "PP-FormulaNet_plus-M",
+    "PP-FormulaNet_plus-S",
+]

+ 16 - 7
paddlex/inference/utils/pp_option.py

@@ -24,6 +24,7 @@ from ...utils.device import (
     set_env_for_device_type,
 )
 from ...utils.flags import USE_PIR_TRT
+from .mkldnn_blocklist import MKLDNN_BLOCKLIST
 from .new_ir_blocklist import NEWIR_BLOCKLIST
 from .trt_blocklist import TRT_BLOCKLIST
 from .trt_config import TRT_CFG_SETTING, TRT_PRECISION_MAP
@@ -45,7 +46,7 @@ class PaddlePredictorOption(object):
     )
     SUPPORT_DEVICE = ("gpu", "cpu", "npu", "xpu", "mlu", "dcu", "gcu")
 
-    def __init__(self, model_name, **kwargs):
+    def __init__(self, model_name=None, **kwargs):
         super().__init__()
         self._model_name = model_name
         self._cfg = {}
@@ -137,12 +138,20 @@ class PaddlePredictorOption(object):
             raise ValueError(
                 f"`run_mode` must be {support_run_mode_str}, but received {repr(run_mode)}."
             )
-        # TRT Blocklist
-        if run_mode.startswith("trt") and self._model_name in TRT_BLOCKLIST:
-            logging.warning(
-                f"The model({self._model_name}) is not supported to run in trt mode! Using `paddle` instead!"
-            )
-            run_mode = "paddle"
+
+        if self._model_name is not None:
+            # TRT Blocklist
+            if run_mode.startswith("trt") and self._model_name in TRT_BLOCKLIST:
+                logging.warning(
+                    f"The model({self._model_name}) is not supported to run in trt mode! Using `paddle` instead!"
+                )
+                run_mode = "paddle"
+            # MKLDNN Blocklist
+            elif run_mode.startswith("mkldnn") and self._model_name in MKLDNN_BLOCKLIST:
+                logging.warning(
+                    f"The model({self._model_name}) is not supported to run in MKLDNN mode! Using `paddle` instead!"
+                )
+                run_mode = "paddle"
 
         self._update("run_mode", run_mode)
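With this change both blocklists gate the requested backend the same way, and the new `model_name=None` default simply skips the checks when no model is known yet. The fallback logic can be sketched standalone (blocklist contents abbreviated; names mirror the diff):

```python
import logging

# Abbreviated stand-ins for the blocklists kept in
# paddlex/inference/utils/trt_blocklist.py and mkldnn_blocklist.py.
TRT_BLOCKLIST = {"SomeTRTBlockedModel"}
MKLDNN_BLOCKLIST = {"SLANeXt_wired", "SLANeXt_wireless", "LaTeX_OCR_rec"}


def resolve_run_mode(run_mode, model_name):
    """Fall back to the plain 'paddle' executor when the requested
    backend is known to fail for this model; skip the check entirely
    when no model name is set."""
    if model_name is None:
        return run_mode
    if run_mode.startswith("trt") and model_name in TRT_BLOCKLIST:
        logging.warning("%s is not supported in TRT mode; using paddle", model_name)
        return "paddle"
    if run_mode.startswith("mkldnn") and model_name in MKLDNN_BLOCKLIST:
        logging.warning("%s is not supported in MKLDNN mode; using paddle", model_name)
        return "paddle"
    return run_mode


print(resolve_run_mode("mkldnn", "SLANeXt_wired"))  # paddle
print(resolve_run_mode("mkldnn", None))             # mkldnn
```

The `elif` in the real patch works because a run mode cannot start with both `trt` and `mkldnn`; the sketch uses two independent `if`s to the same effect.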