
refine codes for pp-tablemagic (#3538)

* fix bugs

* refine codes

* fix bugs

* refine code

* add new algorithm

* refine codes

* refine codes

* refine table method

* fix bugs

* test

* refine codes

* refine codes
Liu Jiaxuan, 8 months ago
parent commit a00bb9b041

+ 0 - 3
docs/module_usage/tutorials/ocr_modules/table_cells_detection.en.md

@@ -505,8 +505,5 @@ The table cell detection module can be integrated into the PaddleX pipeline [Gen
 2.<b>Module Integration</b>
 
 The weights you generate can be directly integrated into the table cell detection module. You can refer to the Python example code in [Quick Integration](#3-Quick-Integration). Simply replace the model with the path of the model you have trained.
-<<<<<<< HEAD
 
 You can also use the PaddleX high-performance inference plugin to optimize the inference process of your model and further improve efficiency. For detailed procedures, please refer to the [PaddleX High-Performance Inference Guide](../../../pipeline_deploy/high_performance_inference.en.md).
-=======
->>>>>>> update docs of benchmark

+ 20 - 20
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md

@@ -7,7 +7,7 @@ comments: true
 ## 1. Introduction to General Table Recognition v2 Pipeline
 Table recognition is a technology that automatically identifies and extracts table content and its structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into an editable format, making it easier for users to further process and analyze data.
 
-The General Table Recognition v2 Pipeline is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules: table classification and table cell detection, which are linked with the table structure recognition module to complete the table recognition task. This pipeline can achieve accurate table predictions and is applicable in various fields such as general, manufacturing, finance, and transportation. It also provides flexible service deployment options, supporting multiple programming languages on various hardware. Additionally, it offers custom development capabilities, allowing you to train and fine-tune models on your own dataset, with seamless integration of the trained models.
+The General Table Recognition v2 Pipeline (PP-TableMagic) is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules: table classification and table cell detection, which are linked with the table structure recognition module to complete the table recognition task. This pipeline can achieve accurate table predictions and is applicable in various fields such as general, manufacturing, finance, and transportation. It also provides flexible service deployment options, supporting multiple programming languages on various hardware. Additionally, it offers custom development capabilities, allowing you to train and fine-tune models on your own dataset, with seamless integration of the trained models.
 
 <b>❗ The General Table Recognition v2 Pipeline is still being optimized and the final version will be released in the next version of PaddleX. In order to maintain the stability of use, you can use the General Table Recognition Pipeline for table processing first, and we will release a notice when the final version of v2 is open-sourced, so please stay tuned!</b>
 
@@ -630,49 +630,49 @@ Online experience is not supported at the moment.
 Before using the General Table Recognition v2 Pipeline locally, please ensure that you have completed the installation of the PaddleX wheel package according to the [PaddleX Local Installation Tutorial](../../../installation/installation.en.md).
 
 ### 2.3 Command Line Experience
-You can quickly experience the table recognition pipeline with a single command. Use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg) (Note: The link may not be accessible due to network issues or link validity. Please check the link and try again if necessary.) and replace `--input` with the local path for prediction.
+You can quickly experience the table recognition pipeline with a single command. Use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition_v2.jpg) (Note: The link may not be accessible due to network issues or link validity. Please check the link and try again if necessary.) and replace `--input` with the local path for prediction.
 
 ```bash
 paddlex --pipeline table_recognition_v2 \
         --use_doc_orientation_classify=False \
         --use_doc_unwarping=False \
-        --input table_recognition.jpg \
+        --input table_recognition_v2.jpg \
         --save_path ./output \
         --device gpu:0
 ```
 
 <details><summary>👉 <b>After running, the result obtained is: (Click to expand)</b></summary>
 
-```bash
-{'res': {'input_path': 'table_recognition.jpg', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'Table', 'score': 0.9922188520431519, 'coordinate': [3.0127392, 0.14648987, 547.5102, 127.72023]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[234,   6],
+```
+{'res': {'input_path': 'table_recognition_v2.jpg', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 8, 'label': 'table', 'score': 0.86655592918396, 'coordinate': [0.0125130415, 0.41920784, 1281.3737, 585.3884]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[   9,   21],
         ...,
-        [234,  25]],
+        [   9,   59]],
 
        ...,
 
-       [[448, 101],
+       [[1046,  536],
         ...,
-        [448, 121]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': array([0.99512607, ..., 0.99844509]), 'rec_polys': array([[[234,   6],
+        [1046,  573]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['部门', '报销人', '报销事由', '批准人:', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_polys': array([[[   9,   21],
         ...,
-        [234,  25]],
+        [   9,   59]],
 
        ...,
 
-       [[448, 101],
+       [[1046,  536],
         ...,
-        [448, 121]]], dtype=int16), 'rec_boxes': array([[234, ...,  25],
+        [1046,  573]]], dtype=int16), 'rec_boxes': array([[   9, ...,   59],
        ...,
-       [448, ..., 121]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 3.18822289, ..., 30.87823655]), array([ 3.21032453, ..., 65.14108063]), array([110.18174553, ...,  65.02860047]), array([212.96108818, ...,  64.99535157]), array([404.08112907, ...,  65.0847223 ]), array([ 3.21772957, ..., 96.07921387]), array([110.23703575, ...,  96.01378419]), array([213.06095695, ...,  95.97141816]), array([404.23704338, ...,  96.03654267]), array([  3.22793937, ..., 127.08698823]), array([110.40586662, ..., 127.07002045]), array([213.12627983, ..., 127.02842499]), array([404.33042717, ..., 126.45088746])], 'pred_html': '<html><body><table><tr><td colspan="4">CRuncover</td></tr><tr><td>Dres</td><td>连续工作3</td><td>取出来放在网上 没想</td><td>江、整江等八大</td></tr><tr><td>Abstr</td><td></td><td>rSrivi</td><td>$709.</td></tr><tr><td>cludingGiv</td><td>2.72</td><td>Ingcubic</td><td>$744.78</td></tr></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[234,   6],
+       [1046, ...,  573]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 0.13052222, ..., 73.08310249]), array([104.43082511, ...,  73.27777413]), array([319.39041221, ...,  73.30439308]), array([424.2436837 , ...,  73.44736794]), array([580.75836265, ...,  73.24003914]), array([723.04370201, ...,  73.22717598]), array([984.67315757, ...,  73.20420387]), array([1.25130415e-02, ..., 5.85419208e+02]), array([984.37072837, ..., 137.02281502]), array([984.26586998, ..., 201.22290352]), array([984.24017417, ..., 585.30775765]), array([1039.90606773, ...,  265.44664314]), array([1039.69549644, ...,  329.30540779]), array([1039.66546714, ...,  393.57319954]), array([1039.5122689 , ...,  457.74644783]), array([1039.55535972, ...,  521.73030403]), array([1039.58612144, ...,  585.09468392])], 'pred_html': '<html><body><table><tbody><tr><td>部门</td><td></td><td>报销人</td><td></td><td>报销事由</td><td></td><td colspan="2">批准人:</td></tr><tr><td colspan="6" rowspan="8"></td><td colspan="2">单据 张</td></tr><tr><td colspan="2">合计金额 元</td></tr><tr><td rowspan="6">其 中</td><td>车费票</td></tr><tr><td>火车费票</td></tr><tr><td>飞机票</td></tr><tr><td>旅住宿费</td></tr><tr><td>其他</td></tr><tr><td>补贴</td></tr></tbody></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[   9,   21],
         ...,
-        [234,  25]],
+        [   9,   59]],
 
        ...,
 
-       [[448, 101],
+       [[1046,  536],
         ...,
-        [448, 121]]], dtype=int16), 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': array([0.99512607, ..., 0.99844509]), 'rec_boxes': array([[234, ...,  25],
+        [1046,  573]]], dtype=int16), 'rec_texts': ['部门', '报销人', '报销事由', '批准人:', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_boxes': array([[   9, ...,   59],
        ...,
-       [448, ..., 121]], dtype=int16)}}]}}
+       [1046, ...,  573]], dtype=int16)}}]}}
 ```
 
 The explanation of the running result parameters can refer to the result interpretation in [2.2.2 Python Script Integration](#222-python-script-integration).
@@ -693,7 +693,7 @@ from paddlex import create_pipeline
 pipeline = create_pipeline(pipeline="table_recognition_v2")
 
 output = pipeline.predict(
-    input="table_recognition.jpg",
+    input="table_recognition_v2.jpg",
     use_doc_orientation_classify=False,
     use_doc_unwarping=False,
 )
@@ -767,7 +767,7 @@ In the above Python script, the following steps are executed:
 <td>
 <ul>
 <li><b>Python Var</b>: Image data represented by <code>numpy.ndarray</code>.</li>
-<li><b>str</b>: Local path of image or PDF files, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, such as the network URL of an image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., <code>/root/data/</code> (currently, prediction for PDF files in directories is not supported; PDF files must specify the exact file path).</li>
+<li><b>str</b>: Local path of image or PDF files, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, such as the network URL of an image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition_v2.jpg">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., <code>/root/data/</code> (currently, prediction for PDF files in directories is not supported; PDF files must specify the exact file path).</li>
 <li><b>List</b>: List elements must be of the above types, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>[“/root/data/img1.jpg”, “/root/data/img2.jpg”]</code>, <code>[“/root/data1”, “/root/data2”]</code>.</li>
 </ul>
 </td>
@@ -1065,7 +1065,7 @@ from paddlex import create_pipeline
 pipeline = create_pipeline(pipeline="./my_path/table_recognition_v2.yaml")
 
 output = pipeline.predict(
-    input="table_recognition.jpg",
+    input="table_recognition_v2.jpg",
     use_doc_orientation_classify=False,
     use_doc_unwarping=False,
 )
@@ -1486,7 +1486,7 @@ For example, if you use Ascend NPU for OCR pipeline inference, the CLI command i
 paddlex --pipeline table_recognition_v2 \
         --use_doc_orientation_classify=False \
         --use_doc_unwarping=False \
-        --input table_recognition.jpg \
+        --input table_recognition_v2.jpg \
         --save_path ./output \
         --device npu:0
 ```
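For downstream consumers, the nested result shown in the expanded output above can be traversed with plain Python. A minimal sketch, using a hypothetical stand-in dict whose keys mirror the printed result (`res` → `table_res_list` → `pred_html`):

```python
# Hypothetical stand-in for the pipeline output shown above; only the keys
# relevant to extracting the recognized tables are reproduced here.
res = {
    "res": {
        "input_path": "table_recognition_v2.jpg",
        "table_res_list": [
            {"pred_html": "<html><body><table><tr><td>部门</td></tr></table></body></html>"},
        ],
    }
}

# Each detected table contributes one entry to table_res_list;
# pred_html holds the reconstructed table as an HTML string.
html_tables = [t["pred_html"] for t in res["res"]["table_res_list"]]
```

Keys not needed for extraction (`overall_ocr_res`, `layout_det_res`, the NumPy arrays) are omitted here; the real result carries them as shown in the expanded output.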

+ 20 - 20
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md

@@ -7,7 +7,7 @@ comments: true
 ## 1. 通用表格识别v2产线介绍
 表格识别是一种自动从文档或图像中识别和提取表格内容及其结构的技术,广泛应用于数据录入、信息检索和文档分析等领域。通过使用计算机视觉和机器学习算法,表格识别能够将复杂的表格信息转换为可编辑的格式,方便用户进一步处理和分析数据。
 
-通用表格识别v2产线用于解决表格识别任务,对图片中的表格进行识别,并以HTML格式输出。与通用表格识别产线不同,本产线新引入了表格分类和表格单元格检测两个模块,通过采用“表格分类+表格结构识别+单元格检测”多模型串联组网方案,实现了相比通用表格识别产线更好的端到端表格识别性能。除此之外,通用表格识别v2产线原生支持针对性地模型微调,各类开发者均能对通用表格识别v2产线进行不同程度的自定义微调,使其在不同应用场景下都能得到令人满意的性能。
+通用表格识别v2产线(PP-TableMagic)用于解决表格识别任务,对图片中的表格进行识别,并以HTML格式输出。与通用表格识别产线不同,本产线新引入了表格分类和表格单元格检测两个模块,通过采用“表格分类+表格结构识别+单元格检测”多模型串联组网方案,实现了相比通用表格识别产线更好的端到端表格识别性能。除此之外,通用表格识别v2产线原生支持针对性地模型微调,各类开发者均能对通用表格识别v2产线进行不同程度的自定义微调,使其在不同应用场景下都能得到令人满意的性能。
 
 本产线的使用场景覆盖通用、制造、金融、交通等各个领域。本产线同时提供了灵活的服务化部署方式,支持在多种硬件上使用多种编程语言调用。不仅如此,本产线也提供了二次开发的能力,您可以基于本产线在您自己的数据集上训练调优,训练后的模型也可以无缝集成。
 
@@ -642,13 +642,13 @@ PaddleX 所提供的模型产线均可以快速体验效果,你可以在本地
 在本地使用通用表格识别v2产线前,请确保您已经按照[PaddleX本地安装教程](../../../installation/installation.md)完成了PaddleX的wheel包安装。
 
 ### 2.1 命令行方式体验
-一行命令即可快速体验表格识别产线效果,使用 [测试文件](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg),并将 `--input` 替换为本地路径,进行预测
+一行命令即可快速体验表格识别产线效果,使用 [测试文件](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition_v2.jpg),并将 `--input` 替换为本地路径,进行预测
 
 ```bash
 paddlex --pipeline table_recognition_v2 \
         --use_doc_orientation_classify=False \
         --use_doc_unwarping=False \
-        --input table_recognition.jpg \
+        --input table_recognition_v2.jpg \
         --save_path ./output \
         --device gpu:0
 ```
@@ -657,36 +657,36 @@ paddlex --pipeline table_recognition_v2 \
 
 <details><summary>👉 <b>运行后,得到的结果为:(点击展开)</b></summary>
 
-```bash
-{'res': {'input_path': 'table_recognition.jpg', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'Table', 'score': 0.9922188520431519, 'coordinate': [3.0127392, 0.14648987, 547.5102, 127.72023]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[234,   6],
+```
+{'res': {'input_path': 'table_recognition_v2.jpg', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 8, 'label': 'table', 'score': 0.86655592918396, 'coordinate': [0.0125130415, 0.41920784, 1281.3737, 585.3884]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[   9,   21],
         ...,
-        [234,  25]],
+        [   9,   59]],
 
        ...,
 
-       [[448, 101],
+       [[1046,  536],
         ...,
-        [448, 121]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': array([0.99512607, ..., 0.99844509]), 'rec_polys': array([[[234,   6],
+        [1046,  573]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['部门', '报销人', '报销事由', '批准人:', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_polys': array([[[   9,   21],
         ...,
-        [234,  25]],
+        [   9,   59]],
 
        ...,
 
-       [[448, 101],
+       [[1046,  536],
         ...,
-        [448, 121]]], dtype=int16), 'rec_boxes': array([[234, ...,  25],
+        [1046,  573]]], dtype=int16), 'rec_boxes': array([[   9, ...,   59],
        ...,
-       [448, ..., 121]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 3.18822289, ..., 30.87823655]), array([ 3.21032453, ..., 65.14108063]), array([110.18174553, ...,  65.02860047]), array([212.96108818, ...,  64.99535157]), array([404.08112907, ...,  65.0847223 ]), array([ 3.21772957, ..., 96.07921387]), array([110.23703575, ...,  96.01378419]), array([213.06095695, ...,  95.97141816]), array([404.23704338, ...,  96.03654267]), array([  3.22793937, ..., 127.08698823]), array([110.40586662, ..., 127.07002045]), array([213.12627983, ..., 127.02842499]), array([404.33042717, ..., 126.45088746])], 'pred_html': '<html><body><table><tr><td colspan="4">CRuncover</td></tr><tr><td>Dres</td><td>连续工作3</td><td>取出来放在网上 没想</td><td>江、整江等八大</td></tr><tr><td>Abstr</td><td></td><td>rSrivi</td><td>$709.</td></tr><tr><td>cludingGiv</td><td>2.72</td><td>Ingcubic</td><td>$744.78</td></tr></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[234,   6],
+       [1046, ...,  573]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 0.13052222, ..., 73.08310249]), array([104.43082511, ...,  73.27777413]), array([319.39041221, ...,  73.30439308]), array([424.2436837 , ...,  73.44736794]), array([580.75836265, ...,  73.24003914]), array([723.04370201, ...,  73.22717598]), array([984.67315757, ...,  73.20420387]), array([1.25130415e-02, ..., 5.85419208e+02]), array([984.37072837, ..., 137.02281502]), array([984.26586998, ..., 201.22290352]), array([984.24017417, ..., 585.30775765]), array([1039.90606773, ...,  265.44664314]), array([1039.69549644, ...,  329.30540779]), array([1039.66546714, ...,  393.57319954]), array([1039.5122689 , ...,  457.74644783]), array([1039.55535972, ...,  521.73030403]), array([1039.58612144, ...,  585.09468392])], 'pred_html': '<html><body><table><tbody><tr><td>部门</td><td></td><td>报销人</td><td></td><td>报销事由</td><td></td><td colspan="2">批准人:</td></tr><tr><td colspan="6" rowspan="8"></td><td colspan="2">单据 张</td></tr><tr><td colspan="2">合计金额 元</td></tr><tr><td rowspan="6">其 中</td><td>车费票</td></tr><tr><td>火车费票</td></tr><tr><td>飞机票</td></tr><tr><td>旅住宿费</td></tr><tr><td>其他</td></tr><tr><td>补贴</td></tr></tbody></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[   9,   21],
         ...,
-        [234,  25]],
+        [   9,   59]],
 
        ...,
 
-       [[448, 101],
+       [[1046,  536],
         ...,
-        [448, 121]]], dtype=int16), 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': array([0.99512607, ..., 0.99844509]), 'rec_boxes': array([[234, ...,  25],
+        [1046,  573]]], dtype=int16), 'rec_texts': ['部门', '报销人', '报销事由', '批准人:', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_boxes': array([[   9, ...,   59],
        ...,
-       [448, ..., 121]], dtype=int16)}}]}}
+       [1046, ...,  573]], dtype=int16)}}]}}
 ```
 运行结果参数说明可以参考[2.2 Python脚本方式集成](#22-python脚本方式集成)中的结果解释。
 
@@ -705,7 +705,7 @@ from paddlex import create_pipeline
 pipeline = create_pipeline(pipeline="table_recognition_v2")
 
 output = pipeline.predict(
-    input="table_recognition.jpg",
+    input="table_recognition_v2.jpg",
     use_doc_orientation_classify=False,
     use_doc_unwarping=False,
 )
@@ -778,7 +778,7 @@ for res in output:
 <td>
 <ul>
 <li><b>Python Var</b>:如 <code>numpy.ndarray</code> 表示的图像数据</li>
-<li><b>str</b>:如图像文件或者PDF文件的本地路径:<code>/root/data/img.jpg</code>;<b>如URL链接</b>,如图像文件或PDF文件的网络URL:<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg">示例</a>;<b>如本地目录</b>,该目录下需包含待预测图像,如本地路径:<code>/root/data/</code>(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)</li>
+<li><b>str</b>:如图像文件或者PDF文件的本地路径:<code>/root/data/img.jpg</code>;<b>如URL链接</b>,如图像文件或PDF文件的网络URL:<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition_v2.jpg">示例</a>;<b>如本地目录</b>,该目录下需包含待预测图像,如本地路径:<code>/root/data/</code>(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)</li>
 <li><b>List</b>:列表元素需为上述类型数据,如<code>[numpy.ndarray, numpy.ndarray]</code>,<code>[\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]</code>,<code>[\"/root/data1\", \"/root/data2\"]</code></li>
 </ul>
 </td>
@@ -1066,7 +1066,7 @@ from paddlex import create_pipeline
 pipeline = create_pipeline(pipeline="./my_path/table_recognition_v2.yaml")
 
 output = pipeline.predict(
-    input="table_recognition.jpg",
+    input="table_recognition_v2.jpg",
     use_doc_orientation_classify=False,
     use_doc_unwarping=False,
 )
@@ -1486,7 +1486,7 @@ PaddleX 支持英伟达 GPU、昆仑芯 XPU、昇腾 NPU和寒武纪 MLU 等多
 paddlex --pipeline table_recognition_v2 \
         --use_doc_orientation_classify=False \
         --use_doc_unwarping=False \
-        --input table_recognition.jpg \
+        --input table_recognition_v2.jpg \
         --save_path ./output \
         --device npu:0
 ```

+ 1 - 1
paddlex/configs/modules/table_cells_detection/RT-DETR-L_wired_table_cell_det.yaml

@@ -17,7 +17,7 @@ CheckDataset:
 Train:
   num_classes: 1
   epochs_iters: 40
-  batch_size: 2
+  batch_size: 8
   learning_rate: 0.0001
   pretrain_weight_path: "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-L_wired_table_cell_det_pretrained.pdparams"
   warmup_steps: 100

+ 1 - 1
paddlex/configs/modules/table_cells_detection/RT-DETR-L_wireless_table_cell_det.yaml

@@ -17,7 +17,7 @@ CheckDataset:
 Train:
   num_classes: 1
   epochs_iters: 40
-  batch_size: 2
+  batch_size: 8
   learning_rate: 0.0001
   pretrain_weight_path: "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-L_wireless_table_cell_det_pretrained.pdparams"
   warmup_steps: 100

+ 2 - 2
paddlex/configs/pipelines/table_recognition.yaml

@@ -8,7 +8,7 @@ use_ocr_model: True
 SubModules:
   LayoutDetection:
     module_name: layout_detection
-    model_name: PicoDet_layout_1x_table
+    model_name: PP-DocLayout-L
     model_dir: null
 
   TableStructureRecognition:
@@ -50,7 +50,7 @@ SubPipelines:
         
       TextRecognition:
         module_name: text_recognition
-        model_name: PP-OCRv4_server_rec
+        model_name: PP-OCRv4_server_rec_doc
         model_dir: null
         batch_size: 1
         score_thresh: 0

+ 2 - 2
paddlex/configs/pipelines/table_recognition_v2.yaml

@@ -8,7 +8,7 @@ use_ocr_model: True
 SubModules:
   LayoutDetection:
     module_name: layout_detection
-    model_name: PicoDet_layout_1x_table
+    model_name: PP-DocLayout-L
     model_dir: null
   
   TableClassification:
@@ -70,7 +70,7 @@ SubPipelines:
         
       TextRecognition:
         module_name: text_recognition
-        model_name: PP-OCRv4_server_rec
+        model_name: PP-OCRv4_server_rec_doc
         model_dir: null
         batch_size: 1
         score_thresh: 0
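The two model swaps above follow the same pattern: locate the submodule entry in the pipeline config and replace its `model_name`. A sketch of the equivalent edit on an in-memory mirror of the config (the dict below is a hand-written stand-in for the YAML, not loaded from the real file):

```python
# Hand-written mirror of the relevant fragment of table_recognition_v2.yaml.
pipeline_config = {
    "SubModules": {
        "LayoutDetection": {
            "module_name": "layout_detection",
            "model_name": "PicoDet_layout_1x_table",
            "model_dir": None,
        },
    },
}

# The commit swaps the layout model for the stronger PP-DocLayout-L.
pipeline_config["SubModules"]["LayoutDetection"]["model_name"] = "PP-DocLayout-L"
```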

+ 115 - 56
paddlex/inference/pipelines/table_recognition/table_recognition_post_processing_v2.py

@@ -120,6 +120,7 @@ def compute_iou(rec1: list, rec2: list) -> float:
         intersect = (right_line - left_line) * (bottom_line - top_line)
         return (intersect / (sum_area - intersect)) * 1.0
 
+
 def compute_inter(rec1, rec2):
     """
     computing intersection over rec2_area
@@ -144,7 +145,8 @@ def compute_inter(rec1, rec2):
     iou = inter_area / rec2_area
     return iou
 
-def match_table_and_ocr(cell_box_list: list, ocr_dt_boxes: list) -> dict:
+
+def match_table_and_ocr(cell_box_list, ocr_dt_boxes, table_cells_flag, row_start_index):
     """
     match table and ocr
 
@@ -155,57 +157,36 @@ def match_table_and_ocr(cell_box_list: list, ocr_dt_boxes: list) -> dict:
     Returns:
         dict: matched dict, key is table index, value is ocr index
     """
-    matched = {}
-    del_ocr = []
-    for i, table_box in enumerate(cell_box_list):
-        if len(table_box) == 8:
-            table_box = [
-                np.min(table_box[0::2]),
-                np.min(table_box[1::2]),
-                np.max(table_box[0::2]),
-                np.max(table_box[1::2]),
-            ]
-        for j, ocr_box in enumerate(np.array(ocr_dt_boxes)):
-            if compute_inter(table_box, ocr_box) > 0.8:
-                if i not in matched.keys():
-                    matched[i] = [j]
-                else:
-                    matched[i].append(j)
-                del_ocr.append(j)
-    miss_ocr = []
-    miss_ocr_index = []
-    for m in range(len(ocr_dt_boxes)):
-        if m not in del_ocr:
-            miss_ocr.append(ocr_dt_boxes[m])
-            miss_ocr_index.append(m)
-    if len(miss_ocr) != 0:
-        for k, miss_ocr_box in enumerate(miss_ocr):
-            distances = []
-            for q, table_box in enumerate(cell_box_list):
-                if len(table_box) == 0:
-                    continue
-                if len(table_box) == 8:
-                    table_box = [
-                        np.min(table_box[0::2]),
-                        np.min(table_box[1::2]),
-                        np.max(table_box[0::2]),
-                        np.max(table_box[1::2]),
-                    ]
-                distances.append(
-                    (distance(table_box, miss_ocr_box), 1.0 - compute_iou(table_box, miss_ocr_box))
-                )  # compute iou and l1 distance
-            sorted_distances = distances.copy()
-            # select det box by iou and l1 distance
-            sorted_distances = sorted(sorted_distances, key=lambda item: (item[1], item[0]))
-            if distances.index(sorted_distances[0]) not in matched.keys():
-                matched[distances.index(sorted_distances[0])] = [miss_ocr_index[k]]
-            else:
-                matched[distances.index(sorted_distances[0])].append(miss_ocr_index[k])
-    # print(matched)
-    return matched
+    all_matched = []
+    for k in range(len(table_cells_flag)-1):
+        matched = {}
+        for i, table_box in enumerate(cell_box_list[table_cells_flag[k]:table_cells_flag[k+1]]):
+            if len(table_box) == 8:
+                table_box = [
+                    np.min(table_box[0::2]),
+                    np.min(table_box[1::2]),
+                    np.max(table_box[0::2]),
+                    np.max(table_box[1::2]),
+                ]
+            for j, ocr_box in enumerate(np.array(ocr_dt_boxes)):
+                if compute_inter(table_box, ocr_box) > 0.7:
+                    if i not in matched.keys():
+                        matched[i] = [j]
+                    else:
+                        matched[i].append(j)
+        real_len=max(matched.keys())+1 if len(matched)!=0 else 0
+        if table_cells_flag[k+1] < row_start_index[k+1]:
+            for s in range(row_start_index[k+1]-table_cells_flag[k+1]):
+                matched[real_len+s] = []
+        elif table_cells_flag[k+1] > row_start_index[k+1]:
+            for s in range(table_cells_flag[k+1]-row_start_index[k+1]):
+                matched[real_len-1].append(matched[real_len+s])
+        all_matched.append(matched)
+    return all_matched
+
 
 def get_html_result(
-    matched_index: dict, ocr_contents: dict, pred_structures: list
+    all_matched_index: dict, ocr_contents: dict, pred_structures: list, table_cells_flag
 ) -> str:
     """
     Generates HTML content based on the matched index, OCR contents, and predicted structures.
@@ -220,10 +201,13 @@ def get_html_result(
     """
     pred_html = []
     td_index = 0
+    td_count = 0
+    matched_list_index = 0
     head_structure = pred_structures[0:3]
     html = "".join(head_structure)
     table_structure = pred_structures[3:-3]
     for tag in table_structure:
+        matched_index = all_matched_index[matched_list_index]
         if "</td>" in tag:
             if "<td></td>" == tag:
                 pred_html.extend("<td>")
@@ -260,6 +244,10 @@ def get_html_result(
             else:
                 pred_html.append(tag)
             td_index += 1
+            td_count += 1
+            if td_count>=table_cells_flag[matched_list_index+1] and matched_list_index<len(all_matched_index)-1:
+                matched_list_index += 1
+                td_index = 0
         else:
             pred_html.append(tag)
     html += "".join(pred_html)
@@ -267,6 +255,7 @@ def get_html_result(
     html += "".join(end_structure)
     return html
 
+
 def sort_table_cells_boxes(boxes):
     """
     Sort the input list of bounding boxes.
@@ -299,8 +288,14 @@ def sort_table_cells_boxes(boxes):
     if current_row:
         current_row.sort(key=lambda x: x[0])
         rows.append(current_row)
-    sorted_boxes = [box for row in rows for box in row] 
-    return sorted_boxes
+    sorted_boxes = []
+    flag = [0]
+    for i in range(len(rows)):
+        sorted_boxes.extend(rows[i])
+        if i < len(rows):
+            flag.append(flag[i] + len(rows[i]))
+    return sorted_boxes, flag
+
 
 def convert_to_four_point_coordinates(boxes):
     """
@@ -342,6 +337,67 @@ def convert_to_four_point_coordinates(boxes):
     return converted_boxes
 
 
+def find_row_start_index(html_list):
+    """
+    Find the index of the first cell in each row.
+
+    Args:
+        html_list (list): List of predicted HTML structure tags.
+
+    Returns:
+        row_start_indices (list): Index of the first cell of each row.
+    """
+    # Initialize an empty list to store the indices of row start positions
+    row_start_indices = []
+    # Variable to track the current index in the flattened HTML content
+    current_index = 0
+    # Flag to check if we are inside a table row
+    inside_row = False
+    # Iterate through the HTML tags
+    for keyword in html_list:
+        # If a new row starts, set the inside_row flag to True
+        if keyword == "<tr>":
+            inside_row = True
+        # If we encounter a closing row tag, set the inside_row flag to False
+        elif keyword == "</tr>":
+            inside_row = False
+        # If we encounter a cell and we are inside a row
+        elif (keyword == "<td></td>" or keyword == "</td>") and inside_row:
+            # Append the current index as the starting index of the row
+            row_start_indices.append(current_index)
+            # Set the flag to ensure we only record the first cell of the current row
+            inside_row = False
+        # Increment the current index if we encounter a cell regardless of being inside a row or not
+        if keyword == "<td></td>" or keyword == "</td>":
+            current_index += 1
+    # Return the computed starting indices of each row
+    return row_start_indices
+
+
+def map_and_get_max(table_cells_flag, row_start_index):
+    """
+    Align each table-structure row start with the cell-detection row flags, keeping the largest flag not exceeding each row start.
+
+    Args:
+        table_cells_flag (list): Flags marking the end of each row in the table cells detection results.
+        row_start_index (list): Indices of the first cell of each row in the table structure prediction.
+
+    Returns:
+        max_values: List of aligned row flags, one per entry of row_start_index.
+    """
+
+    max_values = []
+    i = 0
+    max_value = None
+    for j in range(len(row_start_index)):
+        while i < len(table_cells_flag) and table_cells_flag[i] <= row_start_index[j]:
+            if max_value is None or table_cells_flag[i] > max_value:
+                max_value = table_cells_flag[i]
+            i += 1
+        max_values.append(max_value if max_value is not None else row_start_index[j])
+    return max_values
+
+
 def get_table_recognition_res(
     table_box: list,
     table_structure_result: list,
@@ -376,10 +432,13 @@ def get_table_recognition_res(
     ocr_dt_boxes = table_ocr_pred["rec_boxes"]
     ocr_texts_res = table_ocr_pred["rec_texts"]
 
-    table_cells_result = sort_table_cells_boxes(table_cells_result)
-
-    matched_index = match_table_and_ocr(table_cells_result, ocr_dt_boxes)
-    pred_html = get_html_result(matched_index, ocr_texts_res, table_structure_result)
+    table_cells_result, table_cells_flag = sort_table_cells_boxes(table_cells_result)
+    row_start_index = find_row_start_index(table_structure_result)
+    table_cells_flag = map_and_get_max(table_cells_flag, row_start_index)
+    table_cells_flag.append(len(table_cells_result))
+    row_start_index.append(len(table_cells_result))
+    matched_index = match_table_and_ocr(table_cells_result, ocr_dt_boxes, table_cells_flag, table_cells_flag)
+    pred_html = get_html_result(matched_index, ocr_texts_res, table_structure_result, row_start_index)
 
     single_img_res = {
         "cell_box_list": table_cells_result,
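The row-aware matching added in `table_recognition_post_processing_v2.py` is easiest to see in isolation. Below is a simplified, self-contained rendition of two of its helpers — the intersection-over-cell measure and the row-start scan — with the 8-point box handling dropped for brevity; signatures follow the diff, but this is a sketch, not the shipped code:

```python
def compute_inter(rec1, rec2):
    """Intersection area of rec1 and rec2, normalized by rec2's area.

    Boxes are axis-aligned [x1, y1, x2, y2]; an OCR box (rec2) lying
    mostly inside a table cell (rec1) scores close to 1.0.
    """
    left = max(rec1[0], rec2[0])
    top = max(rec1[1], rec2[1])
    right = min(rec1[2], rec2[2])
    bottom = min(rec1[3], rec2[3])
    inter = max(0, right - left) * max(0, bottom - top)
    rec2_area = (rec2[2] - rec2[0]) * (rec2[3] - rec2[1])
    return inter / rec2_area if rec2_area else 0.0


def find_row_start_index(html_list):
    """Return the flat cell index of the first <td> in each <tr>."""
    row_start_indices = []
    current_index = 0
    inside_row = False
    for keyword in html_list:
        if keyword == "<tr>":
            inside_row = True
        elif keyword == "</tr>":
            inside_row = False
        elif keyword in ("<td></td>", "</td>") and inside_row:
            # Record only the first cell of the current row.
            row_start_indices.append(current_index)
            inside_row = False
        if keyword in ("<td></td>", "</td>"):
            current_index += 1
    return row_start_indices


# A 2x2 table followed by a 1-cell row: rows start at cell indices 0, 2, 4.
tags = ["<tr>", "<td></td>", "<td></td>", "</tr>",
        "<tr>", "<td></td>", "<td></td>", "</tr>",
        "<tr>", "<td></td>", "</tr>"]
```

With these per-row starts, the new `match_table_and_ocr` assigns OCR boxes to cells row by row using the 0.7 `compute_inter` threshold, rather than matching across the whole table at once as the removed implementation did.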