Add methods for table related pipelines (#3616)

* fix bugs

* refine codes

* fix bugs

* refine code

* add new algorithm

* refine codes

* refine codes

* refine table method

* fix bugs

* test

* refine codes

* refine codes

* refine codes

* refine codes

* refine codes

* refine codes

* refine codes

* refine codes
Liu Jiaxuan 8 months ago
parent
commit
34c4f8b660

+ 15 - 8
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.en.md

@@ -7,7 +7,7 @@ comments: true
 ## 1. Introduction to General Table Recognition Pipeline
 Table recognition is a technology that automatically identifies and extracts table content and structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into editable formats, facilitating further processing and analysis of data.
 
-The General Table Recognition Pipeline is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. This pipeline integrates the well-known SLANet and SLANet_plus table recognition models. Based on this pipeline, precise predictions of tables can be achieved, covering a wide range of applications in general, manufacturing, finance, transportation, and other fields. The pipeline also provides flexible service deployment options, supporting various hardware and programming languages for integration. Moreover, it offers custom development capabilities, allowing you to train and optimize models on your own dataset, which can then be seamlessly integrated.
+The General Table Recognition Pipeline is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. This pipeline integrates the well-known SLANet and SLANet_plus table structure recognition models. Based on this pipeline, precise predictions of tables can be achieved, covering a wide range of applications in general, manufacturing, finance, transportation, and other fields. The pipeline also provides flexible service deployment options, supporting various hardware and programming languages for integration. Moreover, it offers custom development capabilities, allowing you to train and optimize models on your own dataset, which can then be seamlessly integrated.
 
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition/01.png"/>
 <b>The General Table Recognition Pipeline includes essential modules for table structure recognition, text detection, and text recognition, as well as optional modules for layout area detection, document image orientation classification, and text image correction.</b>
@@ -16,7 +16,7 @@ The General Table Recognition Pipeline is designed to solve table recognition ta
 
 <details><summary>👉Model List Details</summary>
 
-<p><b>Table Recognition Module Models:</b></p>
+<p><b>Table Structure Recognition Module Models:</b></p>
 <table>
 <tr>
 <th>Model</th><th>Model Download Link</th>
@@ -868,6 +868,13 @@ In the above Python script, the following steps are executed:
 </td>
 <td><code>None</code></td>
 </tr>
+<tr>
+<td><code>use_table_cells_ocr_results</code></td>
+<td>Whether to enable table-cells OCR mode. When disabled, the global OCR result is used to fill the HTML table; when enabled, OCR is run on each cell individually and the results fill the HTML table (this increases processing time). The two modes perform differently in different scenarios; choose according to your actual situation.</td>
+<td><code>bool</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code></li>
+</ul>
+</td>
+<td><code>False</code></td>
+</tr>
 </table>
 
 (3) Process the prediction results. Each sample's prediction result is represented as a corresponding Result object, and supports operations such as printing, saving as an image, saving as an `xlsx` file, saving as an `HTML` file, and saving as a `json` file.
@@ -1390,12 +1397,12 @@ SubModules:
   LayoutDetection:
     module_name: layout_detection
     model_name: PicoDet_layout_1x_table
-    model_dir: null # 替换为微调后的版面区域检测模型权重路径
+    model_dir: null # Replace with the path to your fine-tuned layout detection model weights
 
   TableStructureRecognition:
     module_name: table_structure_recognition
     model_name: SLANet_plus
-    model_dir: null # 替换为微调后的表格结构识别模型权重路径
+    model_dir: null # Replace with the path to your fine-tuned table structure recognition model weights
 
 SubPipelines:
   DocPreprocessor:
@@ -1406,7 +1413,7 @@ SubPipelines:
       DocOrientationClassify:
         module_name: doc_text_orientation
         model_name: PP-LCNet_x1_0_doc_ori
-        model_dir: null # 替换为微调后的文档图像方向分类模型权重路径
+        model_dir: null # Replace with the path to your fine-tuned document image orientation classification model weights
 
       DocUnwarping:
         module_name: image_unwarping
@@ -1422,16 +1429,16 @@ SubPipelines:
       TextDetection:
         module_name: text_detection
         model_name: PP-OCRv4_server_det
-        model_dir: null # 替换为微调后的文本检测模型权重路径
+        model_dir: null # Replace with the path to your fine-tuned text detection model weights
         limit_side_len: 960
         limit_type: max
         thresh: 0.3
-        box_thresh: 0.6
+        box_thresh: 0.4
         unclip_ratio: 2.0
       TextRecognition:
         module_name: text_recognition
         model_name: PP-OCRv4_server_rec
-        model_dir: null # 替换为微调后文本识别的模型权重路径
+        model_dir: null # Replace with the path to your fine-tuned text recognition model weights
         batch_size: 1
         score_thresh: 0
 ```
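
For context, a minimal sketch of how the new `use_table_cells_ocr_results` switch is exercised from Python, assuming the standard PaddleX `create_pipeline` API that these docs describe; the image path and output directory are placeholders:

```python
# Minimal sketch (PaddleX API as described in these docs; paths are placeholders).
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="table_recognition")

# Per-cell OCR: slower, but each HTML cell is filled from OCR run on its own
# crop instead of from the global OCR result.
output = pipeline.predict("table_example.jpg", use_table_cells_ocr_results=True)

for res in output:
    res.print()                    # inspect the structured result
    res.save_to_html("./output/")  # save the recovered table as HTML
    res.save_to_xlsx("./output/")  # save as an editable spreadsheet
```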

+ 11 - 3
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.md

@@ -7,7 +7,7 @@ comments: true
 ## 1. Introduction to the General Table Recognition Pipeline
 Table recognition is a technology that automatically identifies and extracts table content and structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into editable formats, facilitating further processing and analysis of data.
 
-The General Table Recognition Pipeline is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. This pipeline integrates the well-known SLANet and SLANet_plus table recognition models. Based on this pipeline, precise predictions of tables can be achieved, covering a wide range of applications in general, manufacturing, finance, transportation, and other fields. The pipeline also provides flexible service deployment options, supporting various hardware and programming languages for integration. Moreover, it offers custom development capabilities, allowing you to train and optimize models on your own dataset, which can then be seamlessly integrated.
+The General Table Recognition Pipeline is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. This pipeline integrates the well-known SLANet and SLANet_plus table structure recognition models. Based on this pipeline, precise predictions of tables can be achieved, covering a wide range of applications in general, manufacturing, finance, transportation, and other fields. The pipeline also provides flexible service deployment options, supporting various hardware and programming languages for integration. Moreover, it offers custom development capabilities, allowing you to train and optimize models on your own dataset, which can then be seamlessly integrated.
 
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition/01.png"/>
 <b>The General Table Recognition Pipeline includes essential modules for table structure recognition, text detection, and text recognition, as well as optional modules for layout area detection, document image orientation classification, and text image correction.</b>
@@ -16,7 +16,7 @@ comments: true
 
 <details><summary>👉Model List Details</summary>
 
-<p><b>Table Recognition Module Models:</b></p>
+<p><b>Table Structure Recognition Module Models:</b></p>
 <table>
 <tr>
 <th>Model</th><th>Model Download Link</th>
@@ -812,6 +812,14 @@ for res in output:
 <li><b>float</b>: Any float greater than <code>0</code></li>
     <li><b>None</b>: If set to <code>None</code>, the parameter value from pipeline initialization, <code>0.0</code> (i.e., no threshold), is used by default</li></ul></td>
 <td><code>None</code></td>
+</tr>
+<tr>
+<td><code>use_table_cells_ocr_results</code></td>
+<td>Whether to enable table-cells OCR mode. When disabled, the global OCR result is used to fill the HTML table; when enabled, OCR is run on each cell individually and the results fill the HTML table (this increases processing time). The two modes perform differently in different scenarios; choose according to your actual situation.</td>
+<td><code>bool</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code></li>
+</ul>
+</td>
+<td><code>False</code></td>
 
 </tr></table>
 
@@ -1367,7 +1375,7 @@ SubPipelines:
         limit_side_len: 960
         limit_type: max
         thresh: 0.3
-        box_thresh: 0.6
+        box_thresh: 0.4
         unclip_ratio: 2.0
       TextRecognition:
         module_name: text_recognition

+ 45 - 6
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md

@@ -7,9 +7,7 @@ comments: true
 ## 1. Introduction to General Table Recognition v2 Pipeline
 Table recognition is a technology that automatically identifies and extracts table content and its structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into an editable format, making it easier for users to further process and analyze data.
 
-The General Table Recognition v2 Pipeline(PP-TableMagic) is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules: table classification and table cell detection, which are linked with the table structure recognition module to complete the table recognition task. This pipeline can achieve accurate table predictions and is applicable in various fields such as general, manufacturing, finance, and transportation. It also provides flexible service deployment options, supporting multiple programming languages on various hardware. Additionally, it offers custom development capabilities, allowing you to train and fine-tune models on your own dataset, with seamless integration of the trained models.
-
-<b>❗ The General Table Recognition v2 Pipeline is still being optimized and the final version will be released in the next version of PaddleX. In order to maintain the stability of use, you can use the General Table Recognition Pipeline for table processing first, and we will release a notice when the final version of v2 is open-sourced, so please stay tuned!</b>
+The General Table Recognition v2 Pipeline (PP-TableMagic) is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules: table classification and table cell detection, which are linked with the table structure recognition module to complete the table recognition task. This pipeline can achieve accurate table predictions and is applicable in various fields such as general, manufacturing, finance, and transportation. It also provides flexible service deployment options, supporting multiple programming languages on various hardware. Additionally, it offers custom development capabilities, allowing you to train and fine-tune models on your own dataset, with seamless integration of the trained models. <b>In addition, the General Table Recognition v2 Pipeline also supports end-to-end table structure recognition models (e.g., SLANet and SLANet_plus), and supports configuring table recognition independently for wired and wireless tables, allowing developers to freely select and combine the best table recognition solutions.</b>
 
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition_v2/01.png"/>
 
@@ -19,7 +17,7 @@ The General Table Recognition v2 Pipeline(PP-TableMagic) is designed to solve ta
 
 <details><summary> 👉Model List Details</summary>
 
-<p><b>Table Recognition Module Models:</b></p>
+<p><b>Table Structure Recognition Module Models:</b></p>
 <table>
 <tr>
 <th>Model</th><th>Model Download Link</th>
@@ -894,7 +892,23 @@ In the above Python script, the following steps are executed:
 <td><code>None</code></td>
 </tr>
 <td><code>use_table_cells_ocr_results</code></td>
-<td>Whether to enable Table-Cells-OCR mode, when not enabled, use global OCR result to fill to html table, when enabled, do OCR cell by cell and fill to html table. Both of them perform differently in different scenarios, please choose according to the actual situation.</td>
+<td>Whether to enable table-cells OCR mode. When disabled, the global OCR result is used to fill the HTML table; when enabled, OCR is run on each cell individually and the results fill the HTML table (this increases processing time). The two modes perform differently in different scenarios; choose according to your actual situation.</td>
+<td><code>bool</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code></li>
+</ul>
+</td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td><code>use_e2e_wired_table_rec_model</code></td>
+<td>Whether to enable end-to-end prediction mode for wired tables. When disabled, the table cells detection model's predictions are used to fill the HTML table; when enabled, the cell predictions of the end-to-end table structure recognition model are used instead. The two modes perform differently in different scenarios; choose according to your actual situation.</td>
+<td><code>bool</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code></li>
+</ul>
+</td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td><code>use_e2e_wireless_table_rec_model</code></td>
+<td>Whether to enable end-to-end prediction mode for wireless tables. When disabled, the table cells detection model's predictions are used to fill the HTML table; when enabled, the cell predictions of the end-to-end table structure recognition model are used instead. The two modes perform differently in different scenarios; choose according to your actual situation.</td>
 <td><code>bool</code></td>
 <td>
 <ul>
@@ -902,6 +916,31 @@ In the above Python script, the following steps are executed:
 <td><code>False</code></td>
 </table>
 
+<b>If you need to use an end-to-end table structure recognition model, simply replace the corresponding table structure recognition model in the pipeline config file with the end-to-end model, then load the modified config file and set the corresponding `predict()` method parameter</b>. For example, to use SLANet_plus for end-to-end recognition of wireless tables, just replace `model_name` under `WirelessTableStructureRecognition` in the config file with SLANet_plus (as shown below) and specify `use_e2e_wireless_table_rec_model=True` at prediction time; nothing else needs to be modified. The wireless table cells detection model will then not take effect, and SLANet_plus is used directly for end-to-end table recognition.
+
+```yaml
+SubModules:
+  WiredTableStructureRecognition:
+    module_name: table_structure_recognition
+    model_name: SLANeXt_wired
+    model_dir: null
+
+  WirelessTableStructureRecognition:
+    module_name: table_structure_recognition
+    model_name: SLANet_plus  # Replace with the end-to-end table structure recognition model
+    model_dir: null
+
+  WiredTableCellsDetection:
+    module_name: table_cells_detection
+    model_name: RT-DETR-L_wired_table_cell_det
+    model_dir: null
+
+  WirelessTableCellsDetection:
+    module_name: table_cells_detection
+    model_name: RT-DETR-L_wireless_table_cell_det
+    model_dir: null
+```
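
A minimal sketch of how the modified config above would be exercised, assuming `create_pipeline` accepts a config-file path as in the PaddleX docs; file names are placeholders:

```python
# Sketch: load the edited config and enable end-to-end prediction for
# wireless tables (file names are placeholders).
from paddlex import create_pipeline

# "table_recognition_v2.yaml" is the config shown above, with SLANet_plus set
# as the wireless table structure recognition model.
pipeline = create_pipeline(pipeline="./table_recognition_v2.yaml")

# With this flag set, the wireless table cells detection model does not take
# effect; SLANet_plus performs end-to-end table recognition instead.
output = pipeline.predict("table_example.jpg", use_e2e_wireless_table_rec_model=True)

for res in output:
    res.save_to_html("./output/")
```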
+
 (3) Process the prediction results, where each sample's prediction result is represented as a corresponding Result object, and supports operations such as printing, saving as an image, saving as an `xlsx` file, saving as an `HTML` file, and saving as a `json` file:
 
 <table>
@@ -1471,7 +1510,7 @@ SubPipelines:
         limit_side_len: 960
         limit_type: max
         thresh: 0.3
-        box_thresh: 0.6
+        box_thresh: 0.4
         unclip_ratio: 2.0
 
       TextRecognition:

+ 46 - 6
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md

@@ -7,19 +7,17 @@ comments: true
 ## 1. Introduction to the General Table Recognition v2 Pipeline
 Table recognition is a technology that automatically identifies and extracts table content and its structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into an editable format, making it easier for users to further process and analyze data.
 
-The General Table Recognition v2 Pipeline (PP-TableMagic) is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules, table classification and table cell detection, and adopts a multi-model pipeline scheme of "table classification + table structure recognition + cell detection", achieving better end-to-end table recognition performance than the General Table Recognition Pipeline. In addition, the General Table Recognition v2 Pipeline natively supports targeted model fine-tuning, so developers of all kinds can customize it to varying degrees and obtain satisfactory performance in different application scenarios.
+The General Table Recognition v2 Pipeline (PP-TableMagic) is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules, table classification and table cell detection, and adopts <b>a multi-model pipeline scheme of "table classification + table structure recognition + cell detection"</b>, achieving better end-to-end table recognition performance than the General Table Recognition Pipeline. Building on this, the General Table Recognition v2 Pipeline <b>natively supports targeted model fine-tuning</b>, so developers of all kinds can customize it to varying degrees and obtain satisfactory performance in different application scenarios. <b>In addition, the General Table Recognition v2 Pipeline also supports end-to-end table structure recognition models (e.g., SLANet and SLANet_plus), and supports configuring table recognition independently for wired and wireless tables, allowing developers to freely select and combine the best table recognition solutions.</b>
 
 The pipeline's application scenarios cover general, manufacturing, finance, transportation, and other fields. It also provides flexible service deployment options, supporting calls in multiple programming languages on various hardware. Moreover, it offers custom development capabilities: you can train and fine-tune models on your own dataset, and the trained models can be seamlessly integrated.
 
-<b>❗ The General Table Recognition v2 Pipeline is still being optimized and its final version will be released in the next PaddleX release. To keep your usage stable, you can use the General Table Recognition Pipeline for table processing for now; we will publish a notice when the final version of v2 is open-sourced, so stay tuned!</b>
-
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition_v2/01.png"/>
 
 <b>The General Table Recognition v2 Pipeline includes essential modules for table structure recognition, table classification, table cell localization, text detection, and text recognition, as well as optional modules for layout area detection, document image orientation classification, and text image correction.</b>
 
 <b>If you prioritize model accuracy, choose a model with higher accuracy; if you care more about inference speed, choose a model with faster inference; if you care about model storage size, choose a model with a smaller storage footprint.</b>
 <details><summary>👉Model List Details</summary>
-<p><b>Table Recognition Module Models:</b></p>
+<p><b>Table Structure Recognition Module Models:</b></p>
 <table>
 <tr>
 <th>Model</th><th>Model Download Link</th>
@@ -897,7 +895,23 @@ for res in output:
 <td><code>None</code></td>
 </tr>
 <td><code>use_table_cells_ocr_results</code></td>
-<td>Whether to enable table-cells OCR mode. When disabled, the global OCR result is used to fill the html table; when enabled, OCR is done cell by cell and filled into the html table. The two modes perform differently in different scenarios; choose according to the actual situation.</td>
+<td>Whether to enable table-cells OCR mode. When disabled, the global OCR result is used to fill the HTML table; when enabled, OCR is run on each cell individually and the results fill the HTML table (this increases processing time). The two modes perform differently in different scenarios; choose according to your actual situation.</td>
+<td><code>bool</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code></li>
+</ul>
+</td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td><code>use_e2e_wired_table_rec_model</code></td>
+<td>Whether to enable end-to-end prediction mode for wired tables. When disabled, the table cells detection model's predictions are used to fill the HTML table; when enabled, the cell predictions of the end-to-end table structure recognition model are used instead. The two modes perform differently in different scenarios; choose according to your actual situation.</td>
+<td><code>bool</code></td>
+<td>
+<ul>
+<li><b>bool</b>: <code>True</code> or <code>False</code></li>
+</ul>
+</td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td><code>use_e2e_wireless_table_rec_model</code></td>
+<td>Whether to enable end-to-end prediction mode for wireless tables. When disabled, the table cells detection model's predictions are used to fill the HTML table; when enabled, the cell predictions of the end-to-end table structure recognition model are used instead. The two modes perform differently in different scenarios; choose according to your actual situation.</td>
 <td><code>bool</code></td>
 <td>
 <ul>
@@ -906,6 +920,32 @@ for res in output:
 
 </tr></table>
 
+<b>If you need to use an end-to-end table structure recognition model, simply replace the corresponding table structure recognition model in the pipeline config file with the end-to-end model, then load the modified config file and set the corresponding `predict()` method parameter</b>. For example, to use SLANet_plus for end-to-end recognition of wireless tables, just replace `model_name` under `WirelessTableStructureRecognition` in the config file with SLANet_plus (as shown below) and specify `use_e2e_wireless_table_rec_model=True` at prediction time; nothing else needs to be modified. The wireless table cells detection model will then not take effect, and SLANet_plus is used directly for end-to-end table recognition.
+
+```yaml
+SubModules:
+  WiredTableStructureRecognition:
+    module_name: table_structure_recognition
+    model_name: SLANeXt_wired
+    model_dir: null
+
+  WirelessTableStructureRecognition:
+    module_name: table_structure_recognition
+    model_name: SLANet_plus  # Replace with the end-to-end table structure recognition model to be used
+    model_dir: null
+
+  WiredTableCellsDetection:
+    module_name: table_cells_detection
+    model_name: RT-DETR-L_wired_table_cell_det
+    model_dir: null
+
+  WirelessTableCellsDetection:
+    module_name: table_cells_detection
+    model_name: RT-DETR-L_wireless_table_cell_det
+    model_dir: null
+```
+
+
 (3) Process the prediction results. Each sample's prediction result is a corresponding Result object, and supports printing, saving as an image, saving as an `xlsx` file, saving as an `HTML` file, and saving as a `json` file:
 
 <table>
@@ -1473,7 +1513,7 @@ SubPipelines:
         limit_side_len: 960
         limit_type: max
         thresh: 0.3
-        box_thresh: 0.6
+        box_thresh: 0.4
         unclip_ratio: 2.0
 
       TextRecognition:

+ 1 - 1
paddlex/configs/pipelines/table_recognition_v2.yaml

@@ -65,7 +65,7 @@ SubPipelines:
         limit_side_len: 960
         limit_type: max
         thresh: 0.3
-        box_thresh: 0.6
+        box_thresh: 0.4
         unclip_ratio: 2.0
         
       TextRecognition:

+ 42 - 0
paddlex/inference/pipelines/table_recognition/pipeline.py

@@ -15,6 +15,7 @@
 import os, sys
 from typing import Any, Dict, Optional, Union, Tuple, List
 import numpy as np
+import math
 import cv2
 from ..base import BasePipeline
 from ..components import CropByBoxes
@@ -216,12 +217,40 @@ class TableRecognitionPipeline(BasePipeline):
             doc_preprocessor_res = {}
             doc_preprocessor_image = image_array
         return doc_preprocessor_res, doc_preprocessor_image
+    
+    def split_ocr_bboxes_by_table_cells(self, ori_img, cells_bboxes):
+        """
+        Splits OCR bounding boxes by table cells and retrieves text.
+
+        Args:
+            ori_img (ndarray): The original image from which text regions will be extracted.
+            cells_bboxes (list or ndarray): Detected cell bounding boxes to extract text from.
+
+        Returns:
+            list: A list containing the recognized texts from each cell.
+        """
+
+        # Check if cells_bboxes is a list and convert it if not.
+        if not isinstance(cells_bboxes, list):
+            cells_bboxes = cells_bboxes.tolist()
+        texts_list = []  # Initialize a list to store the recognized texts.
+        # Process each bounding box provided in cells_bboxes.
+        for bbox in cells_bboxes:
+            # Extract and round up the coordinates of the bounding box.
+            x1, y1, x2, y2 = [math.ceil(k) for k in bbox]
+            # Run OCR on the cell region of the image to recognize its text.
+            rec_res = next(self.general_ocr_pipeline(ori_img[y1:y2, x1:x2, :]))
+            # Concatenate the recognized texts and append them to texts_list.
+            texts_list.append("".join(rec_res["rec_texts"]))
+        # Return the list of recognized texts from each cell.
+        return texts_list
 
     def predict_single_table_recognition_res(
         self,
         image_array: np.ndarray,
         overall_ocr_res: OCRResult,
         table_box: list,
+        use_table_cells_ocr_results: bool = False,
         flag_find_nei_text: bool = True,
         cell_sort_by_y_projection: bool = False,
     ) -> SingleTableRecognitionResult:
@@ -233,16 +262,25 @@ class TableRecognitionPipeline(BasePipeline):
             overall_ocr_res (OCRResult): Overall OCR result obtained after running the OCR pipeline.
                 The overall OCR results containing text recognition information.
             table_box (list): The table box coordinates.
+            use_table_cells_ocr_results (bool): Whether to use per-cell OCR results.
             flag_find_nei_text (bool): Whether to find neighboring text.
             cell_sort_by_y_projection (bool): Whether to sort the matched OCR boxes by y-projection.
         Returns:
             SingleTableRecognitionResult: single table recognition result.
         """
         table_structure_pred = next(self.table_structure_model(image_array))
+        if use_table_cells_ocr_results == True:
+            table_cells_result = list(map(lambda arr: arr.tolist(), table_structure_pred['bbox']))
+            table_cells_result = [[rect[0], rect[1], rect[4], rect[5]] for rect in table_cells_result]
+            cells_texts_list = self.split_ocr_bboxes_by_table_cells(image_array, table_cells_result)
+        else:
+            cells_texts_list = []
         single_table_recognition_res = get_table_recognition_res(
             table_box,
             table_structure_pred,
             overall_ocr_res,
+            cells_texts_list,
+            use_table_cells_ocr_results,
             cell_sort_by_y_projection=cell_sort_by_y_projection,
         )
         neighbor_text = ""
@@ -271,6 +309,7 @@ class TableRecognitionPipeline(BasePipeline):
         text_det_box_thresh: Optional[float] = None,
         text_det_unclip_ratio: Optional[float] = None,
         text_rec_score_thresh: Optional[float] = None,
+        use_table_cells_ocr_results: Optional[bool] = False,
         cell_sort_by_y_projection: Optional[bool] = None,
         **kwargs,
     ) -> TableRecognitionResult:
@@ -286,6 +325,7 @@ class TableRecognitionPipeline(BasePipeline):
                 It will be used if it is not None and use_ocr_model is False.
             layout_det_res (DetResult): The layout detection result.
                 It will be used if it is not None and use_layout_detection is False.
+            use_table_cells_ocr_results (bool): Whether to use per-cell OCR results.
             cell_sort_by_y_projection (bool): Whether to sort the matched OCR boxes by y-projection.
             **kwargs: Additional keyword arguments.
 
@@ -346,6 +386,7 @@ class TableRecognitionPipeline(BasePipeline):
                     doc_preprocessor_image,
                     overall_ocr_res,
                     table_box,
+                    use_table_cells_ocr_results,
                     flag_find_nei_text=False,
                     cell_sort_by_y_projection=cell_sort_by_y_projection,
                 )
@@ -366,6 +407,7 @@ class TableRecognitionPipeline(BasePipeline):
                                 crop_img_info["img"],
                                 overall_ocr_res,
                                 table_box,
+                                use_table_cells_ocr_results,
                                 cell_sort_by_y_projection=cell_sort_by_y_projection,
                             )
                         )
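
The cell-OCR path above converts each predicted cell box into an axis-aligned crop before running OCR on it: the indexing `[rect[0], rect[1], rect[4], rect[5]]` implies each entry of `table_structure_pred['bbox']` is an 8-value quadrilateral (x1, y1, x2, y2, x3, y3, x4, y4), so positions 0, 1, 4, 5 yield the top-left and bottom-right corners. A standalone illustration with made-up values:

```python
# Corner extraction as used by the cell-OCR path (quad layout inferred from
# the indexing in the diff; the coordinate values here are made up).
import math

quad = [10.2, 20.7, 98.0, 20.7, 98.0, 55.3, 10.2, 55.3]  # x1,y1,...,x4,y4
x1, y1, x2, y2 = (math.ceil(quad[i]) for i in (0, 1, 4, 5))
print(x1, y1, x2, y2)  # 11 21 98 56 -> crop with ori_img[y1:y2, x1:x2, :]
```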

+ 62 - 22
paddlex/inference/pipelines/table_recognition/pipeline_v2.py

@@ -22,6 +22,7 @@ from ..base import BasePipeline
 from ..components import CropByBoxes
 from .utils import get_neighbor_boxes_idx
 from .table_recognition_post_processing_v2 import get_table_recognition_res
+from .table_recognition_post_processing import get_table_recognition_res as get_table_recognition_res_e2e
 from .result import SingleTableRecognitionResult, TableRecognitionResult
 from ....utils import logging
 from ...utils.pp_option import PaddlePredictorOption
@@ -532,6 +533,8 @@ class TableRecognitionPipelineV2(BasePipeline):
         overall_ocr_res: OCRResult,
         table_box: list,
         use_table_cells_ocr_results: bool = False,
+        use_e2e_wired_table_rec_model: bool = False, 
+        use_e2e_wireless_table_rec_model: bool = False,
         flag_find_nei_text: bool = True,
     ) -> SingleTableRecognitionResult:
         """
@@ -542,6 +545,9 @@ class TableRecognitionPipelineV2(BasePipeline):
             overall_ocr_res (OCRResult): Overall OCR result obtained after running the OCR pipeline.
                 The overall OCR results containing text recognition information.
             table_box (list): The table box coordinates.
+            use_table_cells_ocr_results (bool): Whether to use per-cell OCR results.
+            use_e2e_wired_table_rec_model (bool): Whether to use the end-to-end wired table recognition model.
+            use_e2e_wireless_table_rec_model (bool): Whether to use the end-to-end wireless table recognition model.
             flag_find_nei_text (bool): Whether to find neighboring text.
         Returns:
             SingleTableRecognitionResult: single table recognition result.
@@ -549,32 +555,53 @@ class TableRecognitionPipelineV2(BasePipeline):
 
         table_cls_pred = next(self.table_cls_model(image_array))
         table_cls_result = self.extract_results(table_cls_pred, "cls")
+        use_e2e_model = False
+
         if table_cls_result == "wired_table":
             table_structure_pred = next(self.wired_table_rec_model(image_array))
-            table_cells_pred = next(
-                self.wired_table_cells_detection_model(image_array, threshold=0.3)
-            ) # Setting the threshold to 0.3 can improve the accuracy of table cells detection. 
-              # If you really want more or fewer table cells detection boxes, the threshold can be adjusted.
+            if use_e2e_wired_table_rec_model == True:
+                use_e2e_model = True
+            else:
+                table_cells_pred = next(
+                    self.wired_table_cells_detection_model(image_array, threshold=0.3)
+                ) # Setting the threshold to 0.3 can improve the accuracy of table cells detection. 
+                  # If you really want more or fewer table cells detection boxes, the threshold can be adjusted.
         elif table_cls_result == "wireless_table":
             table_structure_pred = next(self.wireless_table_rec_model(image_array))
-            table_cells_pred = next(
-                self.wireless_table_cells_detection_model(image_array, threshold=0.3)
-            ) # Setting the threshold to 0.3 can improve the accuracy of table cells detection. 
-              # If you really want more or fewer table cells detection boxes, the threshold can be adjusted.
-        table_structure_result = self.extract_results(table_structure_pred, "table_stru")
-        table_cells_result, table_cells_score = self.extract_results(table_cells_pred, "det")
-        table_cells_result, table_cells_score = self.cells_det_results_nms(table_cells_result, table_cells_score)
-        ocr_det_boxes = self.get_region_ocr_det_boxes(overall_ocr_res["rec_boxes"].tolist(), table_box)
-        table_cells_result = self.cells_det_results_reprocessing(
-            table_cells_result, table_cells_score, ocr_det_boxes, len(table_structure_pred['bbox'])
-        )
-        if use_table_cells_ocr_results == True:
-            cells_texts_list = self.split_ocr_bboxes_by_table_cells(image_array, table_cells_result)
+            if use_e2e_wireless_table_rec_model == True:
+                use_e2e_model = True
+            else:
+                table_cells_pred = next(
+                    self.wireless_table_cells_detection_model(image_array, threshold=0.3)
+                ) # Setting the threshold to 0.3 can improve the accuracy of table cells detection. 
+                  # If you really want more or fewer table cells detection boxes, the threshold can be adjusted.
+
+        if use_e2e_model == False:
+            table_structure_result = self.extract_results(table_structure_pred, "table_stru")
+            table_cells_result, table_cells_score = self.extract_results(table_cells_pred, "det")
+            table_cells_result, table_cells_score = self.cells_det_results_nms(table_cells_result, table_cells_score)
+            ocr_det_boxes = self.get_region_ocr_det_boxes(overall_ocr_res["rec_boxes"].tolist(), table_box)
+            table_cells_result = self.cells_det_results_reprocessing(
+                table_cells_result, table_cells_score, ocr_det_boxes, len(table_structure_pred['bbox'])
+            )
+            if use_table_cells_ocr_results == True:
+                cells_texts_list = self.split_ocr_bboxes_by_table_cells(image_array, table_cells_result)
+            else:
+                cells_texts_list = []
+            single_table_recognition_res = get_table_recognition_res(
+                table_box, table_structure_result, table_cells_result, overall_ocr_res, cells_texts_list, use_table_cells_ocr_results
+            )
         else:
-            cells_texts_list = []
-        single_table_recognition_res = get_table_recognition_res(
-            table_box, table_structure_result, table_cells_result, overall_ocr_res, cells_texts_list, use_table_cells_ocr_results
-        )
+            if use_table_cells_ocr_results == True:
+                table_cells_result_e2e = list(map(lambda arr: arr.tolist(), table_structure_pred['bbox']))
+                table_cells_result_e2e = [[rect[0], rect[1], rect[4], rect[5]] for rect in table_cells_result_e2e]
+                cells_texts_list = self.split_ocr_bboxes_by_table_cells(image_array, table_cells_result_e2e)
+            else:
+                cells_texts_list = []
+            single_table_recognition_res = get_table_recognition_res_e2e(
+                table_box, table_structure_pred, overall_ocr_res, cells_texts_list, use_table_cells_ocr_results
+            )
+
         neighbor_text = ""
         if flag_find_nei_text:
             match_idx_list = get_neighbor_boxes_idx(
@@ -602,6 +629,8 @@ class TableRecognitionPipelineV2(BasePipeline):
         text_det_unclip_ratio: Optional[float] = None,
         text_rec_score_thresh: Optional[float] = None,
         use_table_cells_ocr_results: Optional[bool] = False,
+        use_e2e_wired_table_rec_model: Optional[bool] = False,
+        use_e2e_wireless_table_rec_model: Optional[bool] = False,
         **kwargs,
     ) -> TableRecognitionResult:
         """
@@ -616,6 +645,10 @@ class TableRecognitionPipelineV2(BasePipeline):
                 It will be used if it is not None and use_ocr_model is False.
             layout_det_res (DetResult): The layout detection result.
                 It will be used if it is not None and use_layout_detection is False.
+            use_table_cells_ocr_results (bool): Whether to use per-cell OCR results.
+            use_e2e_wired_table_rec_model (bool): Whether to use the end-to-end wired table recognition model.
+            use_e2e_wireless_table_rec_model (bool): Whether to use the end-to-end wireless table recognition model.
             **kwargs: Additional keyword arguments.
 
         Returns:
@@ -674,6 +707,8 @@ class TableRecognitionPipelineV2(BasePipeline):
                     overall_ocr_res,
                     table_box,
                     use_table_cells_ocr_results,
+                    use_e2e_wired_table_rec_model,
+                    use_e2e_wireless_table_rec_model,
                     flag_find_nei_text=False,
                 )
                 single_table_rec_res["table_region_id"] = table_region_id
@@ -690,7 +725,12 @@ class TableRecognitionPipelineV2(BasePipeline):
                         table_box = crop_img_info["box"]
                         single_table_rec_res = (
                             self.predict_single_table_recognition_res(
-                                crop_img_info["img"], overall_ocr_res, table_box, use_table_cells_ocr_results
+                                crop_img_info["img"],
+                                overall_ocr_res,
+                                table_box,
+                                use_table_cells_ocr_results,
+                                use_e2e_wired_table_rec_model,
+                                use_e2e_wireless_table_rec_model,
                             )
                         )
                         single_table_rec_res["table_region_id"] = table_region_id
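
Condensed, the new control flow is: the table classifier picks the wired or wireless branch, and the matching `use_e2e_*` flag decides whether that branch fills the HTML table from the cells detection model or from the end-to-end structure model. A minimal sketch of that dispatch (the helper name is illustrative, not a real pipeline method):

```python
# Sketch of the branch selection added above (helper name is illustrative).
def select_table_mode(table_cls_result: str,
                      use_e2e_wired_table_rec_model: bool = False,
                      use_e2e_wireless_table_rec_model: bool = False) -> str:
    if table_cls_result == "wired_table":
        return "e2e" if use_e2e_wired_table_rec_model else "cells_detection"
    if table_cls_result == "wireless_table":
        return "e2e" if use_e2e_wireless_table_rec_model else "cells_detection"
    raise ValueError(f"unexpected table class: {table_cls_result}")

assert select_table_mode("wired_table") == "cells_detection"
assert select_table_mode("wireless_table",
                         use_e2e_wireless_table_rec_model=True) == "e2e"
```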

+ 24 - 2
paddlex/inference/pipelines/table_recognition/table_recognition_post_processing.py

@@ -299,6 +299,8 @@ def get_table_recognition_res(
     table_box: list,
     table_structure_pred: dict,
     overall_ocr_res: OCRResult,
+    cells_texts_list: list,
+    use_table_cells_ocr_results: bool,
     cell_sort_by_y_projection: bool = False,
 ) -> SingleTableRecognitionResult:
     """
@@ -308,6 +310,8 @@ def get_table_recognition_res(
         table_box (list): Information about the location of cropped image, including the bounding box.
         table_structure_pred (dict): Predicted table structure.
         overall_ocr_res (OCRResult): Overall OCR result from the input image.
+        cells_texts_list (list): Per-cell OCR results.
+        use_table_cells_ocr_results (bool): Whether to use per-cell OCR results.
         cell_sort_by_y_projection (bool): Whether to sort the matched OCR boxes by y-projection.
 
     Returns:
@@ -319,13 +323,31 @@ def get_table_recognition_res(
     crop_start_point = [table_box[0][0], table_box[0][1]]
     img_shape = overall_ocr_res["doc_preprocessor_res"]["output_img"].shape[0:2]
 
+    if len(table_structure_pred['bbox']) == 0 or len(table_ocr_pred["rec_boxes"]) == 0:
+        pred_html = ' '.join(list(table_structure_pred["structure"]))
+        if len(table_structure_pred['bbox']) != 0:
+            convert_table_structure_pred_bbox(table_structure_pred, crop_start_point, img_shape)
+            table_cells_result = table_structure_pred["cell_box_list"]
+        else:
+            table_cells_result = []
+        single_img_res = {
+            "cell_box_list": table_cells_result,
+            "table_ocr_pred": table_ocr_pred,
+            "pred_html": pred_html,
+        }
+        return SingleTableRecognitionResult(single_img_res)
+
     convert_table_structure_pred_bbox(table_structure_pred, crop_start_point, img_shape)
 
     structures = table_structure_pred["structure"]
     cell_box_list = table_structure_pred["cell_box_list"]
 
-    ocr_dt_boxes = table_ocr_pred["rec_boxes"]
-    ocr_texts_res = table_ocr_pred["rec_texts"]
+    if use_table_cells_ocr_results == True:
+        ocr_dt_boxes = cell_box_list
+        ocr_texts_res = cells_texts_list
+    else:
+        ocr_dt_boxes = table_ocr_pred["rec_boxes"]
+        ocr_texts_res = table_ocr_pred["rec_texts"]
 
     matched_index = match_table_and_ocr(
         cell_box_list, ocr_dt_boxes, cell_sort_by_y_projection=cell_sort_by_y_projection
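
Two behaviors are added here: if the structure model predicts no cell boxes, or no OCR boxes fall inside the table, the function now short-circuits and returns the bare structure tokens joined into an HTML string; otherwise the matcher consumes either the per-cell OCR texts or the global OCR result, depending on `use_table_cells_ocr_results`. A small sketch of both (token list and data shapes are illustrative):

```python
# Fallback when there are no cell boxes or no in-table OCR boxes: emit the
# structure tokens as-is (token list is illustrative).
structure_tokens = ["<table>", "<tr>", "<td></td>", "</tr>", "</table>"]
pred_html = " ".join(structure_tokens)

# OCR-source selection mirroring the branch above (illustrative helper).
def choose_ocr_source(use_table_cells_ocr_results, cell_box_list,
                      cells_texts_list, table_ocr_pred):
    if use_table_cells_ocr_results:
        # Per-cell mode: each cell box is paired with its own OCR text.
        return cell_box_list, cells_texts_list
    # Global mode: match the table cells against the full-image OCR output.
    return table_ocr_pred["rec_boxes"], table_ocr_pred["rec_texts"]
```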

+ 4 - 4
paddlex/inference/pipelines/table_recognition/table_recognition_post_processing_v2.py

@@ -446,12 +446,12 @@ def get_table_recognition_res(
         table_cells_result, crop_start_point, img_shape
     )
 
-    if use_table_cells_ocr_results == False:
-        ocr_dt_boxes = table_ocr_pred["rec_boxes"]
-        ocr_texts_res = table_ocr_pred["rec_texts"]
-    else:
+    if use_table_cells_ocr_results == True:
         ocr_dt_boxes = table_cells_result
         ocr_texts_res = cells_texts_list
+    else:
+        ocr_dt_boxes = table_ocr_pred["rec_boxes"]
+        ocr_texts_res = table_ocr_pred["rec_texts"]
 
     table_cells_result, table_cells_flag = sort_table_cells_boxes(table_cells_result)
     row_start_index = find_row_start_index(table_structure_result)

+ 15 - 0
paddlex/utils/pipeline_arguments.py

@@ -144,6 +144,11 @@ PIPELINE_ARGUMENTS = {
     ],
     "table_recognition": [
         {
+            "name": "--use_table_cells_ocr_results",
+            "type": bool,
+            "help": "Determines whether to use cells OCR results",
+        },
+        {
             "name": "--use_doc_orientation_classify",
             "type": bool,
             "help": "Determines whether to use document preprocessing",
@@ -201,6 +206,16 @@ PIPELINE_ARGUMENTS = {
             "help": "Determines whether to use cells OCR results",
         },
         {
+            "name": "--use_e2e_wired_table_rec_model",
+            "type": bool,
+            "help": "Determines whether to use end-to-end wired table recognition model",
+        },
+        {
+            "name": "--use_e2e_wireless_table_rec_model",
+            "type": bool,
+            "help": "Determines whether to use end-to-end wireless table recognition model",
+        },
+        {
             "name": "--use_doc_orientation_classify",
             "type": bool,
             "help": "Determines whether to use document preprocessing",