
refine table codes and docs (#3353)

* add table docs

* fix bugs

* refine docs

* refine docs

* refine codes

* refine docs

* refine docs and codes

* refine docs and codes
Liu Jiaxuan, 9 months ago
commit 96cef809a7

+ 64 - 136
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.md

@@ -13,6 +13,9 @@ comments: true
 <b>通用</b><b>表格识别</b><b>产线中包含必选的表格结构识别模块、文本检测模块和文本识别模块,以及可选的版面区域检测模块、文档图像方向分类模块和文本图像矫正模块</b>。
 
 <b>如果您更注重模型的精度,请选择精度较高的模型;如果您更在意模型的推理速度,请选择推理速度较快的模型;如果您关注模型的存储大小,请选择存储体积较小的模型。</b>
+
+<details><summary> 👉模型列表详情</summary>
+
 <p><b>表格识别模块模型:</b></p>
 <table>
 <tr>
@@ -310,6 +313,8 @@ PaddleX 所提供的预训练的模型产线均可以快速体验效果,你可
 
 ```bash
 paddlex --pipeline table_recognition \
+        --use_doc_orientation_classify=False \
+        --use_doc_unwarping=False \
         --input table_recognition.jpg \
         --save_path ./output \
         --device gpu:0
@@ -319,145 +324,47 @@ paddlex --pipeline table_recognition \
 
 运行后,会将结果打印到终端上,结果如下:
 
+<details><summary>👉 <b>运行后,得到的结果为:(点击展开)</b></summary>
+
 ```bash
-{'res': {'input_path': 'table_recognition.jpg', 'model_settings': {'use_doc_preprocessor': True, 'use_layout_detection': True, 'use_ocr_model': True}, 'doc_preprocessor_res': {'input_path': '0.jpg', 'model_settings': {'use_doc_orientation_classify': True, 'use_doc_unwarping': True}, 'angle': 0}, 'layout_det_res': {'input_path': None, 'boxes': [{'cls_id': 0, 'label': 'Table', 'score': 0.9196816086769104, 'coordinate': [0, 8.614925, 550.9877, 132]}]}, 'overall_ocr_res': {'input_path': '0.jpg', 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': [array([[232,   0],
-       [318,   1],
-       [318,  24],
-       [232,  21]], dtype=int16), array([[32, 38],
-       [67, 38],
-       [67, 55],
-       [32, 55]], dtype=int16), array([[119,  34],
-       [196,  34],
-       [196,  57],
-       [119,  57]], dtype=int16), array([[222,  29],
-       [396,  31],
-       [396,  60],
-       [222,  58]], dtype=int16), array([[420,  30],
-       [542,  32],
-       [542,  61],
-       [419,  59]], dtype=int16), array([[29, 71],
-       [72, 71],
-       [72, 92],
-       [29, 92]], dtype=int16), array([[287,  72],
-       [329,  72],
-       [329,  93],
-       [287,  93]], dtype=int16), array([[458,  68],
-       [501,  71],
-       [499,  94],
-       [456,  91]], dtype=int16), array([[  9, 101],
-       [ 89, 103],
-       [ 89, 130],
-       [  8, 128]], dtype=int16), array([[139, 105],
-       [172, 105],
-       [172, 126],
-       [139, 126]], dtype=int16), array([[274, 103],
-       [339, 101],
-       [340, 128],
-       [275, 130]], dtype=int16), array([[451, 103],
-       [508, 103],
-       [508, 126],
-       [451, 126]], dtype=int16)], 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], 'text_rec_score_thresh': 0, 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上,没想', '江、江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': [0.9943075180053711, 0.9951075315475464, 0.9907732009887695, 0.9975494146347046, 0.9974043369293213, 0.9983242750167847, 0.991967499256134, 0.9898287653923035, 0.9961177110671997, 0.9975040555000305, 0.9986456632614136, 0.9987970590591431], 'rec_polys': [array([[232,   0],
-       [318,   1],
-       [318,  24],
-       [232,  21]], dtype=int16), array([[32, 38],
-       [67, 38],
-       [67, 55],
-       [32, 55]], dtype=int16), array([[119,  34],
-       [196,  34],
-       [196,  57],
-       [119,  57]], dtype=int16), array([[222,  29],
-       [396,  31],
-       [396,  60],
-       [222,  58]], dtype=int16), array([[420,  30],
-       [542,  32],
-       [542,  61],
-       [419,  59]], dtype=int16), array([[29, 71],
-       [72, 71],
-       [72, 92],
-       [29, 92]], dtype=int16), array([[287,  72],
-       [329,  72],
-       [329,  93],
-       [287,  93]], dtype=int16), array([[458,  68],
-       [501,  71],
-       [499,  94],
-       [456,  91]], dtype=int16), array([[  9, 101],
-       [ 89, 103],
-       [ 89, 130],
-       [  8, 128]], dtype=int16), array([[139, 105],
-       [172, 105],
-       [172, 126],
-       [139, 126]], dtype=int16), array([[274, 103],
-       [339, 101],
-       [340, 128],
-       [275, 130]], dtype=int16), array([[451, 103],
-       [508, 103],
-       [508, 126],
-       [451, 126]], dtype=int16)], 'rec_boxes': array([[232,   0, 318,  24],
-       [ 32,  38,  67,  55],
-       [119,  34, 196,  57],
-       [222,  29, 396,  60],
-       [419,  30, 542,  61],
-       [ 29,  71,  72,  92],
-       [287,  72, 329,  93],
-       [456,  68, 501,  94],
-       [  8, 101,  89, 130],
-       [139, 105, 172, 126],
-       [274, 101, 340, 130],
-       [451, 103, 508, 126]], dtype=int16)}, 'table_res_list': [{'cell_box_list': array([[  8.        ,   9.61492538, 532.        ,  26.61492538],
-       [  3.        ,  27.61492538, 104.        ,  65.61492538],
-       [109.        ,  28.61492538, 215.        ,  66.61492538],
-       [219.        ,  28.61492538, 396.        ,  64.61492538],
-       [396.        ,  29.61492538, 546.        ,  66.61492538],
-       [  1.        ,  65.61492538, 110.        ,  93.61492538],
-       [111.        ,  65.61492538, 215.        ,  94.61492538],
-       [220.        ,  66.61492538, 397.        ,  94.61492538],
-       [398.        ,  67.61492538, 544.        ,  94.61492538],
-       [  2.        ,  98.61492538, 111.        , 131.61492538],
-       [113.        ,  98.61492538, 216.        , 131.61492538],
-       [219.        ,  98.61492538, 400.        , 131.61492538],
-       [403.        ,  99.61492538, 545.        , 130.61492538]]), 'pred_html': '<html><body><table><tr><td colspan="4">CRuncover</td></tr><tr><td>Dres</td><td>连续工作3</td><td>取出来放在网上,没想</td><td>江、江等八大</td></tr><tr><td>Abstr</td><td></td><td>rSrivi</td><td>$709.</td></tr><tr><td>cludingGiv</td><td>2.72</td><td>Ingcubic</td><td>$744.78</td></tr></table></body></html>', 'table_ocr_pred': {'rec_polys': [array([[232,   0],
-       [318,   1],
-       [318,  24],
-       [232,  21]], dtype=int16), array([[32, 38],
-       [67, 38],
-       [67, 55],
-       [32, 55]], dtype=int16), array([[119,  34],
-       [196,  34],
-       [196,  57],
-       [119,  57]], dtype=int16), array([[222,  29],
-       [396,  31],
-       [396,  60],
-       [222,  58]], dtype=int16), array([[420,  30],
-       [542,  32],
-       [542,  61],
-       [419,  59]], dtype=int16), array([[29, 71],
-       [72, 71],
-       [72, 92],
-       [29, 92]], dtype=int16), array([[287,  72],
-       [329,  72],
-       [329,  93],
-       [287,  93]], dtype=int16), array([[458,  68],
-       [501,  71],
-       [499,  94],
-       [456,  91]], dtype=int16), array([[  9, 101],
-       [ 89, 103],
-       [ 89, 130],
-       [  8, 128]], dtype=int16), array([[139, 105],
-       [172, 105],
-       [172, 126],
-       [139, 126]], dtype=int16), array([[274, 103],
-       [339, 101],
-       [340, 128],
-       [275, 130]], dtype=int16), array([[451, 103],
-       [508, 103],
-       [508, 126],
-       [451, 126]], dtype=int16)], 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上,没想', '江、江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': [0.9943075180053711, 0.9951075315475464, 0.9907732009887695, 0.9975494146347046, 0.9974043369293213, 0.9983242750167847, 0.991967499256134, 0.9898287653923035, 0.9961177110671997, 0.9975040555000305, 0.9986456632614136, 0.9987970590591431], 'rec_boxes': [array([232,   0, 318,  24], dtype=int16), array([32, 38, 67, 55], dtype=int16), array([119,  34, 196,  57], dtype=int16), array([222,  29, 396,  60], dtype=int16), array([419,  30, 542,  61], dtype=int16), array([29, 71, 72, 92], dtype=int16), array([287,  72, 329,  93], dtype=int16), array([456,  68, 501,  94], dtype=int16), array([  8, 101,  89, 130], dtype=int16), array([139, 105, 172, 126], dtype=int16), array([274, 101, 340, 130], dtype=int16), array([451, 103, 508, 126], dtype=int16)]}}]}}
+{'res': {'input_path': 'table_recognition.jpg', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'Table', 'score': 0.9922188520431519, 'coordinate': [3.0127392, 0.14648987, 547.5102, 127.72023]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[234,   6],
+        ...,
+        [234,  25]],
+
+       ...,
+
+       [[448, 101],
+        ...,
+        [448, 121]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': array([0.99512607, ..., 0.99844509]), 'rec_polys': array([[[234,   6],
+        ...,
+        [234,  25]],
+
+       ...,
+
+       [[448, 101],
+        ...,
+        [448, 121]]], dtype=int16), 'rec_boxes': array([[234, ...,  25],
+       ...,
+       [448, ..., 121]], dtype=int16)}, 'table_res_list': [{'cell_box_list': array([[  5.01273918, ...,  32.14648987],
+       ...,
+       [405.01273918, ..., 124.14648987]]), 'pred_html': '<html><body><table><tbody><tr><td colspan="4">CRuncover</td></tr><tr><td>Dres</td><td>连续工作3</td><td>取出来放在网上 没想</td><td>江、整江等八大</td></tr><tr><td>Abstr</td><td></td><td>rSrivi</td><td>$709.</td></tr><tr><td>cludingGiv</td><td>2.72</td><td>Ingcubic</td><td>$744.78</td></tr></tbody></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[234,   6],
+        ...,
+        [234,  25]],
+
+       ...,
+
+       [[448, 101],
+        ...,
+        [448, 121]]], dtype=int16), 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': array([0.99512607, ..., 0.99844509]), 'rec_boxes': array([[234, ...,  25],
+       ...,
+       [448, ..., 121]], dtype=int16)}}]}}
 ```
 运行结果参数说明可以参考[2.2 Python脚本方式](#22-python脚本方式集成)中的结果解释。
 
 可视化结果保存在`save_path`下,其中表格识别的可视化结果如下:
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition/03.png"/>
 
+</details>
 
 ### 2.2 Python脚本方式集成
 * 上述命令行是为了快速体验查看效果,一般来说,在项目中,往往需要通过代码集成,您可以通过几行代码即可完成产线的快速推理,推理代码如下:
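The hunk boundary elides the snippet this line refers to. As a hedged illustration only (not the file's exact snippet), a minimal PaddleX integration sketch consistent with the `create_pipeline`/`predict`/`save_to_*` API used throughout these docs:

```python
from paddlex import create_pipeline

# Pipeline name matches the CLI examples above.
pipeline = create_pipeline(pipeline="table_recognition")

output = pipeline.predict(
    input="table_recognition.jpg",
    use_doc_orientation_classify=False,  # skip document orientation classification
    use_doc_unwarping=False,             # skip text image unwarping
)
for res in output:
    res.print()                    # print the structured result to the terminal
    res.save_to_img("./output/")   # visualizations
    res.save_to_xlsx("./output/")  # table as an Excel file
    res.save_to_html("./output/")  # table as HTML
    res.save_to_json("./output/")  # raw prediction data
```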
@@ -588,6 +495,19 @@ for res in output:
 </td>
 <td><code>None</code></td>
 </tr>
+<tr>
+<td><code>use_layout_detection</code></td>
+<td>是否使用版面检测模块</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+<li><b>bool</b>:<code>True</code> 或者 <code>False</code>;</li>
+<li><b>None</b>:如果设置为<code>None</code>, 将默认使用产线初始化的该参数值,初始化为<code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+
 <td><code>text_det_limit_side_len</code></td>
 <td>文本检测的图像边长限制</td>
 <td><code>int|None</code></td>
@@ -598,6 +518,7 @@ for res in output:
 </ul>
 </td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_det_limit_type</code></td>
 <td>文本检测的图像边长限制类型</td>
 <td><code>str|None</code></td>
@@ -608,6 +529,7 @@ for res in output:
 </ul>
 </td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_det_thresh</code></td>
 <td>检测像素阈值,输出的概率图中,得分大于该阈值的像素点才会被认为是文字像素点</td>
 <td><code>float|None</code></td>
@@ -616,6 +538,7 @@ for res in output:
 <li><b>float</b>:大于 <code>0</code> 的任意浮点数
     <li><b>None</b>:如果设置为 <code>None</code>, 将默认使用产线初始化的该参数值 <code>0.3</code></li></li></ul></td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_det_box_thresh</code></td>
 <td>检测框阈值,检测结果边框内,所有像素点的平均得分大于该阈值时,该结果会被认为是文字区域</td>
 <td><code>float|None</code></td>
@@ -624,6 +547,7 @@ for res in output:
 <li><b>float</b>:大于 <code>0</code> 的任意浮点数
     <li><b>None</b>:如果设置为 <code>None</code>, 将默认使用产线初始化的该参数值 <code>0.6</code></li></li></ul></td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_det_unclip_ratio</code></td>
 <td>文本检测扩张系数,使用该方法对文字区域进行扩张,该值越大,扩张的面积越大</td>
 <td><code>float|None</code></td>
@@ -632,6 +556,7 @@ for res in output:
 <li><b>float</b>:大于 <code>0</code> 的任意浮点数
     <li><b>None</b>:如果设置为 <code>None</code>, 将默认使用产线初始化的该参数值 <code>2.0</code></li></li></ul></td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_rec_score_thresh</code></td>
 <td>文本识别阈值,得分大于该阈值的文本结果会被保留</td>
 <td><code>float|None</code></td>
@@ -640,7 +565,8 @@ for res in output:
 <li><b>float</b>:大于 <code>0</code> 的任意浮点数
     <li><b>None</b>:如果设置为 <code>None</code>, 将默认使用产线初始化的该参数值 <code>0.0</code>。即不设阈值</li></li></ul></td>
 <td><code>None</code></td>
-</table>
+
+</tr></table>
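A hedged sketch of how the parameters tabulated above would be passed to `predict()`; the literal values simply restate the documented defaults:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="table_recognition")
# Passing None instead keeps the pipeline's initialization-time settings.
output = pipeline.predict(
    input="table_recognition.jpg",
    use_layout_detection=True,    # the parameter documented in the rows added above
    text_det_limit_side_len=960,
    text_det_limit_type="max",
    text_det_thresh=0.3,
    text_det_box_thresh=0.6,
    text_det_unclip_ratio=2.0,
    text_rec_score_thresh=0.0,    # 0.0 keeps all recognized text
)
```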
 
 (3)对预测结果进行处理,每个样本的预测结果均为对应的Result对象,且支持打印、保存为图片、保存为`xlsx`文件、保存为`HTML`文件、保存为`json`文件的操作:
 
@@ -760,10 +686,10 @@ for res in output:
     - `rec_boxes`: `(numpy.ndarray)` 检测框的矩形边界框数组,shape为(n, 4),dtype为int16。每一行表示一个矩形框的[x_min, y_min, x_max, y_max]坐标
     ,其中(x_min, y_min)为左上角坐标,(x_max, y_max)为右下角坐标
 
-- 调用`save_to_json()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}.json`,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的`numpy.array`类型转换为列表形式。
+- 调用`save_to_json()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_res.json`,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的`numpy.array`类型转换为列表形式。
 - 调用`save_to_img()` 方法会将可视化结果保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_ocr_res_img.{your_img_extension}`,如果指定为文件,则直接保存到该文件中。(产线通常包含较多结果图片,不建议直接指定为具体的文件路径,否则多张图会被覆盖,仅保留最后一张图)
-- 调用`save_to_html()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}.html`,如果指定为文件,则直接保存到该文件中。在通用表格识别产线中,将会把图像中表格的HTML形式写入到指定的html文件中。
-- 调用`save_to_xlsx()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}.xlsx`,如果指定为文件,则直接保存到该文件中。在通用表格识别产线中,将会把图像中表格的Excel表格形式写入到指定的xlsx文件中。
+- 调用`save_to_html()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_table_1.html`,如果指定为文件,则直接保存到该文件中。在通用表格识别v2产线中,将会把图像中表格的HTML形式写入到指定的html文件中。
+- 调用`save_to_xlsx()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_res.xlsx`,如果指定为文件,则直接保存到该文件中。在通用表格识别v2产线中,将会把图像中表格的Excel表格形式写入到指定的xlsx文件中。
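To make the directory-vs-file behavior above concrete, a short sketch (file names follow the `_res.json`/`_table_1.html`/`_res.xlsx` patterns this commit documents):

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="table_recognition")
for res in pipeline.predict(input="table_recognition.jpg"):
    # Directory: files are auto-named, e.g. {your_img_basename}_res.json,
    # {your_img_basename}_table_1.html, {your_img_basename}_res.xlsx.
    res.save_to_json("./output/")
    res.save_to_html("./output/")
    res.save_to_xlsx("./output/")
    # Explicit file path: the result is written directly to that file.
    res.save_to_json("./output/table_res.json")
```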
 
 * 此外,也支持通过属性获取带结果的可视化图像和预测结果,具体如下:
 
@@ -1212,6 +1138,8 @@ PaddleX 支持英伟达 GPU、昆仑芯 XPU、昇腾 NPU和寒武纪 MLU 等多
 
 ```bash
 paddlex --pipeline table_recognition \
+        --use_doc_orientation_classify=False \
+        --use_doc_unwarping=False \
         --input table_recognition.jpg \
         --save_path ./output \
         --device npu:0

+ 23 - 18
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md

@@ -2,16 +2,19 @@
 comments: true
 ---
 
-# General Table Recognition Pipeline v2 User Guide
+# General Table Recognition v2 Pipeline User Guide
 
-## 1. Introduction to General Table Recognition Pipeline v2
+## 1. Introduction to General Table Recognition v2 Pipeline
 Table recognition is a technology that automatically identifies and extracts table content and its structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into an editable format, making it easier for users to further process and analyze data.
 
-The General Table Recognition Pipeline v2 is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules: table classification and table cell detection, which are linked with the table structure recognition module to complete the table recognition task. This pipeline can achieve accurate table predictions and is applicable in various fields such as general, manufacturing, finance, and transportation. It also provides flexible service deployment options, supporting multiple programming languages on various hardware. Additionally, it offers secondary development capabilities, allowing you to train and fine-tune models on your own dataset, with seamless integration of the trained models.
+The General Table Recognition v2 Pipeline is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules: table classification and table cell detection, which are linked with the table structure recognition module to complete the table recognition task. This pipeline can achieve accurate table predictions and is applicable in various fields such as general, manufacturing, finance, and transportation. It also provides flexible service deployment options, supporting multiple programming languages on various hardware. Additionally, it offers secondary development capabilities, allowing you to train and fine-tune models on your own dataset, with seamless integration of the trained models.
 
-<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition/01.png"/>
+<b>❗ The General Table Recognition v2 Pipeline is still being optimized; its final version will be released in the next PaddleX release. To keep usage stable, you can rely on the General Table Recognition Pipeline for table processing for now; we will post an announcement once the final v2 is open-sourced, so stay tuned!</b>
+
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition_v2/01.png"/>
+
+<b>The General Table Recognition v2 Pipeline includes mandatory modules such as table structure recognition, table classification, table cell localization, text detection, and text recognition, as well as optional modules like layout area detection, document image orientation classification, and text image correction.</b>
 
-<b>The General Table Recognition Pipeline v2 includes mandatory modules such as table structure recognition, table classification, table cell localization, text detection, and text recognition, as well as optional modules like layout area detection, document image orientation classification, and text image correction.</b>
 <b>If you prioritize model accuracy, choose a model with higher accuracy; if you care more about inference speed, choose a model with faster inference speed; if you are concerned about model storage size, choose a model with a smaller storage size.</b>
 
 <details><summary> 👉Model List Details</summary>
@@ -432,15 +435,17 @@ SVTRv2 is a server-side text recognition model developed by the OpenOCR team fro
 
 </details>
 
+</details>
+
 
 ## 2. Quick Start
-All model pipelines provided by PaddleX can be quickly experienced. You can use the command line or Python locally to experience the effect of the general table recognition pipeline v2.
+All model pipelines provided by PaddleX can be quickly experienced. You can use the command line or Python locally to experience the effect of the General Table Recognition v2 Pipeline.
 
 ### 2.1 Online Experience
 Online experience is not supported at the moment.
 
 ### 2.2 Local Experience
-Before using the General Table Recognition pipeline v2 locally, please ensure that you have completed the installation of the PaddleX wheel package according to the [PaddleX Local Installation Tutorial](../../../installation/installation.en.md).
+Before using the General Table Recognition v2 Pipeline locally, please ensure that you have completed the installation of the PaddleX wheel package according to the [PaddleX Local Installation Tutorial](../../../installation/installation.en.md).
 
 ### 2.1 Command Line Experience
 You can quickly experience the effect of the table recognition pipeline with one command. Use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg), and replace `--input` with the local path for prediction.
@@ -594,7 +599,7 @@ The explanation of the running result parameters can refer to the result interpr
 
 The visualization results are saved under `save_path`, where the visualization result of table recognition is as follows:
 
-<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition_v2/01.jpg">
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition_v2/02.jpg">
 
 </details>
 
@@ -622,7 +627,7 @@ for res in output:
 
 In the above Python script, the following steps are executed:
 
-(1) The `create_pipeline()` function is used to instantiate a General Table Recognition Pipeline v2 object. The specific parameter descriptions are as follows:
+(1) The `create_pipeline()` function is used to instantiate a General Table Recognition v2 Pipeline object. The specific parameter descriptions are as follows:
 
 <table>
 <thead>
@@ -661,7 +666,7 @@ In the above Python script, the following steps are executed:
 </tbody>
 </table>
 
-(2) Call the `predict()` method of the general table recognition pipeline v2 object for inference prediction. This method will return a `generator`. The parameters of the `predict()` method and their descriptions are as follows:
+(2) Call the `predict()` method of the General Table Recognition v2 Pipeline object for inference prediction. This method will return a `generator`. The parameters of the `predict()` method and their descriptions are as follows:
 
 <table>
 <thead>
@@ -987,9 +992,9 @@ In the above Python script, the following steps are executed:
 
 - Calling the `save_to_img()` method will save the visualization results to the specified `save_path`. If specified as a directory, the saved path will be `save_path/{your_img_basename}_ocr_res_img.{your_img_extension}`; if specified as a file, it will be saved directly to that file. (The pipeline usually contains many result images, it is not recommended to specify a specific file path directly, otherwise multiple images will be overwritten, leaving only the last image)
 
-- Calling the `save_to_html()` method will save the above content to the specified `save_path`. If specified as a directory, the saved path will be `save_path/{your_img_basename}.html`; if specified as a file, it will be saved directly to that file. In the general table recognition pipeline v2, the HTML form of the table in the image will be written to the specified HTML file.
+- Calling the `save_to_html()` method will save the above content to the specified `save_path`. If specified as a directory, the saved path will be `save_path/{your_img_basename}.html`; if specified as a file, it will be saved directly to that file. In the General Table Recognition v2 Pipeline, the HTML form of the table in the image will be written to the specified HTML file.
 
-- Calling the `save_to_xlsx()` method will save the above content to the specified `save_path`. If specified as a directory, the saved path will be `save_path/{your_img_basename}.xlsx`; if specified as a file, it will be saved directly to that file. In the general table recognition pipeline v2, the Excel form of the table in the image will be written to the specified XLSX file.
+- Calling the `save_to_xlsx()` method will save the above content to the specified `save_path`. If specified as a directory, the saved path will be `save_path/{your_img_basename}.xlsx`; if specified as a file, it will be saved directly to that file. In the General Table Recognition v2 Pipeline, the Excel form of the table in the image will be written to the specified XLSX file.
 
 * Additionally, it also supports obtaining visualized images and prediction results through attributes, as follows:
 
@@ -1013,13 +1018,13 @@ In the above Python script, the following steps are executed:
 - The prediction result obtained by the `json` attribute is a dict type of data, with content consistent with the content saved by calling the `save_to_json()` method.
 - The prediction result returned by the `img` attribute is a dictionary type of data. The keys are `table_res_img`, `ocr_res_img`, `layout_res_img`, and `preprocessed_img`, and the corresponding values are four `Image.Image` objects, in order: visualized image of table recognition result, visualized image of OCR result, visualized image of layout region detection result, and visualized image of image preprocessing. If a sub-module is not used, the corresponding result image is not included in the dictionary.
 
-In addition, you can obtain the general table recognition pipeline v2 configuration file and load the configuration file for prediction. You can execute the following command to save the result in `my_path`:
+In addition, you can obtain the General Table Recognition v2 Pipeline configuration file and load the configuration file for prediction. You can execute the following command to save the result in `my_path`:
 
 ```
 paddlex --get_pipeline_config table_recognition_v2 --save_path ./my_path
 ```
 
-If you have obtained the configuration file, you can customize the settings for the General Table Recognition Pipeline v2. Simply modify the `pipeline` parameter value in the `create_pipeline` method to the path of the pipeline configuration file. The example is as follows:
+If you have obtained the configuration file, you can customize the settings for the General Table Recognition v2 Pipeline. Simply modify the `pipeline` parameter value in the `create_pipeline` method to the path of the pipeline configuration file. The example is as follows:
 
 ```python
 from paddlex import create_pipeline
@@ -1041,7 +1046,7 @@ for res in output:
 
 ```
 
-<b>Note:</b> The parameters in the configuration file are the initialization parameters for the pipeline. If you want to change the initialization parameters of the General Table Recognition Pipeline v2, you can directly modify the parameters in the configuration file and load the configuration file for prediction. Additionally, CLI prediction also supports passing in the configuration file by specifying the path with `--pipeline`.
+<b>Note:</b> The parameters in the configuration file are the initialization parameters for the pipeline. If you want to change the initialization parameters of the General Table Recognition v2 Pipeline, you can directly modify the parameters in the configuration file and load the configuration file for prediction. Additionally, CLI prediction also supports passing in the configuration file by specifying the path with `--pipeline`.
 
 ## 3. Development Integration/Deployment
 If the pipeline meets your requirements for inference speed and accuracy, you can proceed with development integration/deployment.
@@ -1305,10 +1310,10 @@ for i, res in enumerate(result["tableRecResults"]):
 You can choose the appropriate deployment method based on your needs to integrate the model pipeline into subsequent AI applications.
 
 ## 4. Custom Development
-If the default model weights provided by the General Table Recognition pipeline v2 do not meet your requirements in terms of accuracy or speed, you can try to further <b>fine-tune</b> the existing models using <b>your own domain-specific or application data</b> to improve the recognition performance of the General Table Recognition pipeline v2 in your specific scenario.
+If the default model weights provided by the General Table Recognition v2 Pipeline do not meet your requirements in terms of accuracy or speed, you can try to further <b>fine-tune</b> the existing models using <b>your own domain-specific or application data</b> to improve the recognition performance of the General Table Recognition v2 Pipeline in your specific scenario.
 
 ### 4.1 Model Fine-Tuning
-Since the General Table Recognition pipeline v2 consists of several modules, if the overall performance is not satisfactory, the issue may lie in any one of these modules. You can analyze the images with poor recognition results to identify which module is problematic and refer to the corresponding fine-tuning tutorial links in the table below.
+Since the General Table Recognition v2 Pipeline consists of several modules, if the overall performance is not satisfactory, the issue may lie in any one of these modules. You can analyze the images with poor recognition results to identify which module is problematic and refer to the corresponding fine-tuning tutorial links in the table below.
 
 <table>
 <thead>
@@ -1453,4 +1458,4 @@ paddlex --pipeline table_recognition_v2 \
         --device npu:0
 ```
 
-If you want to use the General Table Recognition pipeline v2 on a wider variety of hardware, please refer to the [PaddleX Multi-Hardware Usage Guide](../../../other_devices_support/multi_devices_use_guide.en.md).
+If you want to use the General Table Recognition v2 Pipeline on a wider variety of hardware, please refer to the [PaddleX Multi-Hardware Usage Guide](../../../other_devices_support/multi_devices_use_guide.en.md).

+ 76 - 209
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md

@@ -2,14 +2,19 @@
 comments: true
 ---
 
-# 通用表格识别产线v2使用教程
+# 通用表格识别v2产线使用教程
 
-## 1. 通用表格识别产线v2介绍
+## 1. 通用表格识别v2产线介绍
 表格识别是一种自动从文档或图像中识别和提取表格内容及其结构的技术,广泛应用于数据录入、信息检索和文档分析等领域。通过使用计算机视觉和机器学习算法,表格识别能够将复杂的表格信息转换为可编辑的格式,方便用户进一步处理和分析数据。
 
-通用表格识别产线v2用于解决表格识别任务,对图片中的表格进行识别,并以HTML格式输出。与通用表格识别产线不同,本产线引入了表格分类和表格单元格检测两个模块,并将其与表格结构识别模块串联以完成表格识别任务。基于本产线,可实现对表格的精准预测,使用场景覆盖通用、制造、金融、交通等各个领域。本产线同时提供了灵活的服务化部署方式,支持在多种硬件上使用多种编程语言调用。不仅如此,本产线也提供了二次开发的能力,您可以基于本产线在您自己的数据集上训练调优,训练后的模型也可以无缝集成。
+通用表格识别v2产线用于解决表格识别任务,对图片中的表格进行识别,并以HTML格式输出。与通用表格识别产线不同,本产线新引入了表格分类和表格单元格检测两个模块,通过采用“表格分类+表格结构识别+单元格检测”多模型串联组网方案,实现了相比通用表格识别产线更好的端到端表格识别性能。除此之外,通用表格识别v2产线原生支持针对性的模型微调,各类开发者均能对通用表格识别v2产线进行不同程度的自定义微调,使其在不同应用场景下都能得到令人满意的性能。
+
+本产线的使用场景覆盖通用、制造、金融、交通等各个领域。本产线同时提供了灵活的服务化部署方式,支持在多种硬件上使用多种编程语言调用。不仅如此,本产线也提供了二次开发的能力,您可以基于本产线在您自己的数据集上训练调优,训练后的模型也可以无缝集成。
+
+<b>❗ 通用表格识别v2产线仍在持续优化中,将在 PaddleX 下一版本发布最终版。为保持使用的稳定性,您可以先使用通用表格识别产线进行表格处理,v2最终版开源后我们将发布通知,敬请期待!</b>
+
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition_v2/01.png"/>
 
-<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition/01.png"/>
 <b>通用</b><b>表格识别</b><b>产线v2中包含必选的表格结构识别模块、表格分类模块、表格单元格定位模块、文本检测模块和文本识别模块,以及可选的版面区域检测模块、文档图像方向分类模块和文本图像矫正模块</b>。
 
 <b>如果您更注重模型的精度,请选择精度较高的模型;如果您更在意模型的推理速度,请选择推理速度较快的模型;如果您关注模型的存储大小,请选择存储体积较小的模型。</b>
@@ -426,19 +431,21 @@ SVTRv2 是一种由复旦大学视觉与学习实验室(FVL)的OpenOCR团队
 </details>
 
 ## 2. 快速开始
-PaddleX 所提供的模型产线均可以快速体验效果,你可以在本地使用命令行或 Python 体验通用表格识别产线v2的效果。
+PaddleX 所提供的模型产线均可以快速体验效果,你可以在本地使用命令行或 Python 体验通用表格识别v2产线的效果。
 
 ### 2.1 在线体验
 暂不支持在线体验。
 
 ### 2.2 本地体验
-在本地使用通用表格识别产线v2前,请确保您已经按照[PaddleX本地安装教程](../../../installation/installation.md)完成了PaddleX的wheel包安装。
+在本地使用通用表格识别v2产线前,请确保您已经按照[PaddleX本地安装教程](../../../installation/installation.md)完成了PaddleX的wheel包安装。
 
 ### 2.1 命令行方式体验
 一行命令即可快速体验表格识别产线效果,使用 [测试文件](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg),并将 `--input` 替换为本地路径,进行预测
 
 ```bash
 paddlex --pipeline table_recognition_v2 \
+        --use_doc_orientation_classify=False \
+        --use_doc_unwarping=False \
         --input table_recognition.jpg \
         --save_path ./output \
         --device gpu:0
@@ -449,141 +456,40 @@ paddlex --pipeline table_recognition_v2 \
 <details><summary>👉 <b>运行后,得到的结果为:(点击展开)</b></summary>
 
 ```bash
-{'res': {'input_path': 'table_recognition.jpg', 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'Table', 'score': 0.9922188520431519, 'coordinate': [3.0127392, 0.14648987, 547.5102, 127.72023]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': [array([[234,   6],
-       [316,   6],
-       [316,  25],
-       [234,  25]], dtype=int16), array([[38, 39],
-       [73, 39],
-       [73, 57],
-       [38, 57]], dtype=int16), array([[122,  32],
-       [201,  32],
-       [201,  58],
-       [122,  58]], dtype=int16), array([[227,  34],
-       [346,  34],
-       [346,  57],
-       [227,  57]], dtype=int16), array([[351,  34],
-       [391,  34],
-       [391,  58],
-       [351,  58]], dtype=int16), array([[417,  35],
-       [534,  35],
-       [534,  58],
-       [417,  58]], dtype=int16), array([[34, 70],
-       [78, 70],
-       [78, 90],
-       [34, 90]], dtype=int16), array([[287,  70],
-       [328,  70],
-       [328,  90],
-       [287,  90]], dtype=int16), array([[454,  69],
-       [496,  69],
-       [496,  90],
-       [454,  90]], dtype=int16), array([[ 17, 101],
-       [ 95, 101],
-       [ 95, 124],
-       [ 17, 124]], dtype=int16), array([[144, 101],
-       [178, 101],
-       [178, 122],
-       [144, 122]], dtype=int16), array([[278, 101],
-       [338, 101],
-       [338, 124],
-       [278, 124]], dtype=int16), array([[448, 101],
-       [503, 101],
-       [503, 121],
-       [448, 121]], dtype=int16)], 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], 'text_rec_score_thresh': 0, 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': [0.9951260685920715, 0.9943379759788513, 0.9968608021736145, 0.9978817105293274, 0.9985721111297607, 0.9616036415100098, 0.9977153539657593, 0.987593948841095, 0.9906861186027527, 0.9959743618965149, 0.9970152378082275, 0.9977849721908569, 0.9984450936317444], 'rec_polys': [array([[234,   6],
-       [316,   6],
-       [316,  25],
-       [234,  25]], dtype=int16), array([[38, 39],
-       [73, 39],
-       [73, 57],
-       [38, 57]], dtype=int16), array([[122,  32],
-       [201,  32],
-       [201,  58],
-       [122,  58]], dtype=int16), array([[227,  34],
-       [346,  34],
-       [346,  57],
-       [227,  57]], dtype=int16), array([[351,  34],
-       [391,  34],
-       [391,  58],
-       [351,  58]], dtype=int16), array([[417,  35],
-       [534,  35],
-       [534,  58],
-       [417,  58]], dtype=int16), array([[34, 70],
-       [78, 70],
-       [78, 90],
-       [34, 90]], dtype=int16), array([[287,  70],
-       [328,  70],
-       [328,  90],
-       [287,  90]], dtype=int16), array([[454,  69],
-       [496,  69],
-       [496,  90],
-       [454,  90]], dtype=int16), array([[ 17, 101],
-       [ 95, 101],
-       [ 95, 124],
-       [ 17, 124]], dtype=int16), array([[144, 101],
-       [178, 101],
-       [178, 122],
-       [144, 122]], dtype=int16), array([[278, 101],
-       [338, 101],
-       [338, 124],
-       [278, 124]], dtype=int16), array([[448, 101],
-       [503, 101],
-       [503, 121],
-       [448, 121]], dtype=int16)], 'rec_boxes': array([[234,   6, 316,  25],
-       [ 38,  39,  73,  57],
-       [122,  32, 201,  58],
-       [227,  34, 346,  57],
-       [351,  34, 391,  58],
-       [417,  35, 534,  58],
-       [ 34,  70,  78,  90],
-       [287,  70, 328,  90],
-       [454,  69, 496,  90],
-       [ 17, 101,  95, 124],
-       [144, 101, 178, 122],
-       [278, 101, 338, 124],
-       [448, 101, 503, 121]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([3.18822289e+00, 1.46489874e-01, 5.46996138e+02, 3.08782365e+01]), array([  3.21032453,  31.1510637 , 110.20750237,  65.14108063]), array([110.18174553,  31.13076188, 213.00813103,  65.02860047]), array([212.96108818,  31.09959008, 404.19618034,  64.99535157]), array([404.08112907,  31.18304802, 547.00864983,  65.0847223 ]), array([  3.21772957,  65.0738733 , 110.33685875,  96.07921387]), array([110.23703575,  65.02486207, 213.08839226,  96.01378419]), array([213.06095695,  64.96230103, 404.28425407,  95.97141816]), array([404.23704338,  65.04879548, 547.01273918,  96.03654267]), array([  3.22793937,  96.08334137, 110.38572502, 127.08698823]), array([110.40586662,  96.10539795, 213.19943047, 127.07002045]), array([213.12627983,  96.0539148 , 404.42686272, 127.02842499]), array([404.33042717,  96.07251526, 547.01273918, 126.45088746])], 'pred_html': '<html><body><table><tr><td colspan="4">CRuncover</td></tr><tr><td>Dres</td><td>连续工作3</td><td>取出来放在网上 没想</td><td>江、整江等八大</td></tr><tr><td>Abstr</td><td></td><td>rSrivi</td><td>$709.</td></tr><tr><td>cludingGiv</td><td>2.72</td><td>Ingcubic</td><td>$744.78</td></tr></table></body></html>', 'table_ocr_pred': {'rec_polys': [array([[234,   6],
-       [316,   6],
-       [316,  25],
-       [234,  25]], dtype=int16), array([[38, 39],
-       [73, 39],
-       [73, 57],
-       [38, 57]], dtype=int16), array([[122,  32],
-       [201,  32],
-       [201,  58],
-       [122,  58]], dtype=int16), array([[227,  34],
-       [346,  34],
-       [346,  57],
-       [227,  57]], dtype=int16), array([[351,  34],
-       [391,  34],
-       [391,  58],
-       [351,  58]], dtype=int16), array([[417,  35],
-       [534,  35],
-       [534,  58],
-       [417,  58]], dtype=int16), array([[34, 70],
-       [78, 70],
-       [78, 90],
-       [34, 90]], dtype=int16), array([[287,  70],
-       [328,  70],
-       [328,  90],
-       [287,  90]], dtype=int16), array([[454,  69],
-       [496,  69],
-       [496,  90],
-       [454,  90]], dtype=int16), array([[ 17, 101],
-       [ 95, 101],
-       [ 95, 124],
-       [ 17, 124]], dtype=int16), array([[144, 101],
-       [178, 101],
-       [178, 122],
-       [144, 122]], dtype=int16), array([[278, 101],
-       [338, 101],
-       [338, 124],
-       [278, 124]], dtype=int16), array([[448, 101],
-       [503, 101],
-       [503, 121],
-       [448, 121]], dtype=int16)], 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': [0.9951260685920715, 0.9943379759788513, 0.9968608021736145, 0.9978817105293274, 0.9985721111297607, 0.9616036415100098, 0.9977153539657593, 0.987593948841095, 0.9906861186027527, 0.9959743618965149, 0.9970152378082275, 0.9977849721908569, 0.9984450936317444], 'rec_boxes': [array([234,   6, 316,  25], dtype=int16), array([38, 39, 73, 57], dtype=int16), array([122,  32, 201,  58], dtype=int16), array([227,  34, 346,  57], dtype=int16), array([351,  34, 391,  58], dtype=int16), array([417,  35, 534,  58], dtype=int16), array([34, 70, 78, 90], dtype=int16), array([287,  70, 328,  90], dtype=int16), array([454,  69, 496,  90], dtype=int16), array([ 17, 101,  95, 124], dtype=int16), array([144, 101, 178, 122], dtype=int16), array([278, 101, 338, 124], dtype=int16), array([448, 101, 503, 121], dtype=int16)]}}]}}
+{'res': {'input_path': 'table_recognition.jpg', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'Table', 'score': 0.9922188520431519, 'coordinate': [3.0127392, 0.14648987, 547.5102, 127.72023]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[234,   6],
+        ...,
+        [234,  25]],
+
+       ...,
+
+       [[448, 101],
+        ...,
+        [448, 121]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': array([0.99512607, ..., 0.99844509]), 'rec_polys': array([[[234,   6],
+        ...,
+        [234,  25]],
+
+       ...,
+
+       [[448, 101],
+        ...,
+        [448, 121]]], dtype=int16), 'rec_boxes': array([[234, ...,  25],
+       ...,
+       [448, ..., 121]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 3.18822289, ..., 30.87823655]), array([ 3.21032453, ..., 65.14108063]), array([110.18174553, ...,  65.02860047]), array([212.96108818, ...,  64.99535157]), array([404.08112907, ...,  65.0847223 ]), array([ 3.21772957, ..., 96.07921387]), array([110.23703575, ...,  96.01378419]), array([213.06095695, ...,  95.97141816]), array([404.23704338, ...,  96.03654267]), array([  3.22793937, ..., 127.08698823]), array([110.40586662, ..., 127.07002045]), array([213.12627983, ..., 127.02842499]), array([404.33042717, ..., 126.45088746])], 'pred_html': '<html><body><table><tr><td colspan="4">CRuncover</td></tr><tr><td>Dres</td><td>连续工作3</td><td>取出来放在网上 没想</td><td>江、整江等八大</td></tr><tr><td>Abstr</td><td></td><td>rSrivi</td><td>$709.</td></tr><tr><td>cludingGiv</td><td>2.72</td><td>Ingcubic</td><td>$744.78</td></tr></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[234,   6],
+        ...,
+        [234,  25]],
+
+       ...,
+
+       [[448, 101],
+        ...,
+        [448, 121]]], dtype=int16), 'rec_texts': ['CRuncover', 'Dres', '连续工作3', '取出来放在网上', '没想', '江、整江等八大', 'Abstr', 'rSrivi', '$709.', 'cludingGiv', '2.72', 'Ingcubic', '$744.78'], 'rec_scores': array([0.99512607, ..., 0.99844509]), 'rec_boxes': array([[234, ...,  25],
+       ...,
+       [448, ..., 121]], dtype=int16)}}]}}
 ```
 运行结果参数说明可以参考[2.2 Python脚本方式集成](#22-python脚本方式集成)中的结果解释。
 
 可视化结果保存在`save_path`下,其中表格识别的可视化结果如下:
-<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition_v2/01.jpg">
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition_v2/02.jpg">
 
 </details>
 
@@ -612,7 +518,7 @@ for res in output:
 
 在上述 Python 脚本中,执行了如下几个步骤:
 
-(1)通过 `create_pipeline()` 实例化通用表格识别产线v2对象,具体参数说明如下:
+(1)通过 `create_pipeline()` 实例化通用表格识别v2产线对象,具体参数说明如下:
 
 <table>
 <thead>
@@ -651,7 +557,7 @@ for res in output:
 </tbody>
 </table>
 
-(2)调用通用表格识别产线v2对象的 `predict()` 方法进行推理预测。该方法将返回一个 `generator`。以下是 `predict()` 方法的参数及其说明:
+(2)调用通用表格识别v2产线对象的 `predict()` 方法进行推理预测。该方法将返回一个 `generator`。以下是 `predict()` 方法的参数及其说明:
 
 <table>
 <thead>
@@ -717,6 +623,19 @@ for res in output:
 </td>
 <td><code>None</code></td>
 </tr>
+<tr>
+<td><code>use_layout_detection</code></td>
+<td>是否使用版面检测模块</td>
+<td><code>bool|None</code></td>
+<td>
+<ul>
+<li><b>bool</b>:<code>True</code> 或者 <code>False</code>;</li>
+<li><b>None</b>:如果设置为<code>None</code>, 将默认使用产线初始化的该参数值,初始化为<code>True</code>;</li>
+</ul>
+</td>
+<td><code>None</code></td>
+</tr>
+
 <td><code>text_det_limit_side_len</code></td>
 <td>文本检测的图像边长限制</td>
 <td><code>int|None</code></td>
@@ -727,6 +646,7 @@ for res in output:
 </ul>
 </td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_det_limit_type</code></td>
 <td>文本检测的图像边长限制类型</td>
 <td><code>str|None</code></td>
@@ -737,6 +657,7 @@ for res in output:
 </ul>
 </td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_det_thresh</code></td>
 <td>检测像素阈值,输出的概率图中,得分大于该阈值的像素点才会被认为是文字像素点</td>
 <td><code>float|None</code></td>
@@ -745,6 +666,7 @@ for res in output:
 <li><b>float</b>:大于 <code>0</code> 的任意浮点数
     <li><b>None</b>:如果设置为 <code>None</code>, 将默认使用产线初始化的该参数值 <code>0.3</code></li></li></ul></td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_det_box_thresh</code></td>
 <td>检测框阈值,检测结果边框内,所有像素点的平均得分大于该阈值时,该结果会被认为是文字区域</td>
 <td><code>float|None</code></td>
@@ -753,6 +675,7 @@ for res in output:
 <li><b>float</b>:大于 <code>0</code> 的任意浮点数
     <li><b>None</b>:如果设置为 <code>None</code>, 将默认使用产线初始化的该参数值 <code>0.6</code></li></li></ul></td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_det_unclip_ratio</code></td>
 <td>文本检测扩张系数,使用该方法对文字区域进行扩张,该值越大,扩张的面积越大</td>
 <td><code>float|None</code></td>
@@ -761,6 +684,7 @@ for res in output:
 <li><b>float</b>:大于 <code>0</code> 的任意浮点数
     <li><b>None</b>:如果设置为 <code>None</code>, 将默认使用产线初始化的该参数值 <code>2.0</code></li></li></ul></td>
 <td><code>None</code></td>
+</tr>
 <td><code>text_rec_score_thresh</code></td>
 <td>文本识别阈值,得分大于该阈值的文本结果会被保留</td>
 <td><code>float|None</code></td>
@@ -769,66 +693,7 @@ for res in output:
 <li><b>float</b>:大于 <code>0</code> 的任意浮点数
     <li><b>None</b>:如果设置为 <code>None</code>, 将默认使用产线初始化的该参数值 <code>0.0</code>。即不设阈值</li></li></ul></td>
 <td><code>None</code></td>
-<tr>
-<td><code>use_layout_detection</code></td>
-<td>是否使用版面检测模块</td>
-<td><code>bool|None</code></td>
-<td>
-<ul>
-<li><b>bool</b>:<code>True</code> 或者 <code>False</code>;</li>
-<li><b>None</b>:如果设置为<code>None</code>, 将默认使用产线初始化的该参数值,初始化为<code>True</code>;</li>
-</ul>
-</td>
-<td><code>None</code></td>
-</tr>
-<tr>
-<td><code>layout_threshold</code></td>
-<td>版面检测置信度阈值,得分大于该阈值才会被输出</td>
-<td><code>float|dict|None</code></td>
-<td>
-<ul>
-<li><b>float</b>:大于 <code>0</code> 的任意浮点数
-    <li><b>dict</b>:key是int类别id, value是大于 <code>0</code> 的任意浮点数
-    <li><b>None</b>:如果设置为 <code>None</code>, 将默认使用产线初始化的该参数值 <code>0.5</code></li></li></li></ul></td>
-<td><code>None</code></td>
-</tr>
-<tr>
-<td><code>layout_nms</code></td>
-<td>是否使用版面检测后处理NMS</td>
-<td><code>bool|None</code></td>
-<td>
-<ul>
-<li><b>bool</b>:<code>True</code> 或者 <code>False</code>;</li>
-<li><b>None</b>:如果设置为<code>None</code>, 将默认使用产线初始化的该参数值,初始化为<code>True</code>;</li>
-</ul>
-</td>
-<td><code>None</code></td>
-</tr>
-<tr>
-<td><code>layout_unclip_ratio</code></td>
-<td>检测框的边长缩放倍数;如果不指定,将默认使用PaddleX官方模型配置</td>
-<td><code>float|list|None</code></td>
-<td>
-<ul>
-<li><b>float</b>, 大于0的浮点数,如 1.1 , 表示将模型输出的检测框中心不变,宽和高都扩张1.1倍</li>
-<li><b>列表</b>, 如 [1.2, 1.5] , 表示将模型输出的检测框中心不变,宽度扩张1.2倍,高度扩张1.5倍</li>
-<li><b>None</b>:如果设置为<code>None</code>, 将默认使用产线初始化的该参数值,初始化为1.0</li>
-</ul>
-</td>
-<tr>
-<td><code>layout_merge_bboxes_mode</code></td>
-<td>模型输出的检测框的合并处理模式;如果不指定,将默认使用PaddleX官方模型配置</td>
-<td><code>string|None</code></td>
-<td>
-<ul>
-<li><b>large</b>, 设置为large时,表示在模型输出的检测框中,对于互相重叠包含的检测框,只保留外部最大的框,删除重叠的内部框。</li>
-<li><b>small</b>, 设置为small,表示在模型输出的检测框中,对于互相重叠包含的检测框,只保留内部被包含的小框,删除重叠的外部框。</li>
-<li><b>union</b>, 不进行框的过滤处理,内外框都保留</li>
-<li><b>None</b>:如果设置为<code>None</code>, 将默认使用产线初始化的该参数值,初始化为<code>large</code></li>
-</ul>
-</td>
-<td>None</td>
-</tr>
+
 </tr></table>
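As in the v1 doc, a hedged sketch of passing the tabulated options to the v2 pipeline's `predict()`:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="table_recognition_v2")
output = pipeline.predict(
    input="table_recognition.jpg",
    use_layout_detection=True,  # the module toggle documented above
    text_det_box_thresh=0.6,    # other text_det_*/text_rec_* options as in the table
)
for res in output:
    res.print()
```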
 
 (3)对预测结果进行处理,每个样本的预测结果均为对应的Result对象,且支持打印、保存为图片、保存为`xlsx`文件、保存为`HTML`文件、保存为`json`文件的操作:
@@ -958,10 +823,10 @@ for res in output:
     - `rec_boxes`: `(numpy.ndarray)` 检测框的矩形边界框数组,shape为(n, 4),dtype为int16。每一行表示一个矩形框的[x_min, y_min, x_max, y_max]坐标
     ,其中(x_min, y_min)为左上角坐标,(x_max, y_max)为右下角坐标
 
-- 调用`save_to_json()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}.json`,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的`numpy.array`类型转换为列表形式。
+- 调用`save_to_json()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_res.json`,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的`numpy.array`类型转换为列表形式。
 - 调用`save_to_img()` 方法会将可视化结果保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_ocr_res_img.{your_img_extension}`,如果指定为文件,则直接保存到该文件中。(产线通常包含较多结果图片,不建议直接指定为具体的文件路径,否则多张图会被覆盖,仅保留最后一张图)
-- 调用`save_to_html()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}.html`,如果指定为文件,则直接保存到该文件中。在通用表格识别产线v2中,将会把图像中表格的HTML形式写入到指定的html文件中。
-- 调用`save_to_xlsx()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}.xlsx`,如果指定为文件,则直接保存到该文件中。在通用表格识别产线v2中,将会把图像中表格的Excel表格形式写入到指定的xlsx文件中。
+- 调用`save_to_html()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_table_1.html`,如果指定为文件,则直接保存到该文件中。在通用表格识别v2产线中,将会把图像中表格的HTML形式写入到指定的html文件中。
+- 调用`save_to_xlsx()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_res.xlsx`,如果指定为文件,则直接保存到该文件中。在通用表格识别v2产线中,将会把图像中表格的Excel表格形式写入到指定的xlsx文件中。
 
 * 此外,也支持通过属性获取带结果的可视化图像和预测结果,具体如下:
 
@@ -985,13 +850,13 @@ for res in output:
 - `json` 属性获取的预测结果为dict类型的数据,相关内容与调用 `save_to_json()` 方法保存的内容一致。
 - `img` 属性返回的预测结果是一个字典类型的数据。其中,键分别为 `table_res_img`、`ocr_res_img` 、`layout_res_img` 和 `preprocessed_img`,对应的值是四个 `Image.Image` 对象,按顺序分别为:表格识别结果的可视化图像、OCR 结果的可视化图像、版面区域检测结果的可视化图像、图像预处理的可视化图像。如果没有使用某个子模块,则字典中不包含对应的结果图像。
 
-此外,您可以获取通用表格识别产线v2配置文件,并加载配置文件进行预测。可执行如下命令将结果保存在 `my_path` 中:
+此外,您可以获取通用表格识别v2产线配置文件,并加载配置文件进行预测。可执行如下命令将结果保存在 `my_path` 中:
 
 ```
 paddlex --get_pipeline_config table_recognition_v2 --save_path ./my_path
 ```
 
-若您获取了配置文件,即可对通用表格识别产线v2各项配置进行自定义,只需要修改 `create_pipeline` 方法中的 `pipeline` 参数值为产线配置文件路径即可。示例如下:
+若您获取了配置文件,即可对通用表格识别v2产线各项配置进行自定义,只需要修改 `create_pipeline` 方法中的 `pipeline` 参数值为产线配置文件路径即可。示例如下:
 
 ```python
 from paddlex import create_pipeline
@@ -1013,7 +878,7 @@ for res in output:
 
 ```
 
-<b>注:</b> 配置文件中的参数为产线初始化参数,如果希望更改通用通用表格识别产线v2初始化参数,可以直接修改配置文件中的参数,并加载配置文件进行预测。同时,CLI 预测也支持传入配置文件,`--pipeline` 指定配置文件的路径即可。
+<b>注:</b> 配置文件中的参数为产线初始化参数,如果希望更改通用表格识别v2产线初始化参数,可以直接修改配置文件中的参数,并加载配置文件进行预测。同时,CLI 预测也支持传入配置文件,`--pipeline` 指定配置文件的路径即可。
 
 
 ## 3. 开发集成/部署
@@ -1278,10 +1143,10 @@ for i, res in enumerate(result["tableRecResults"]):
 您可以根据需要选择合适的方式部署模型产线,进而进行后续的 AI 应用集成。
 
 ## 4. 二次开发
-如果通用表格识别产线v2提供的默认模型权重在您的场景中,精度或速度不满意,您可以尝试利用<b>您自己拥有的特定领域或应用场景的数据</b>对现有模型进行进一步的<b>微调</b>,以提升通用表格识别产线v2的在您的场景中的识别效果。
+如果通用表格识别v2产线提供的默认模型权重在您的场景中精度或速度不满意,您可以尝试利用<b>您自己拥有的特定领域或应用场景的数据</b>对现有模型进行进一步的<b>微调</b>,以提升通用表格识别v2产线在您的场景中的识别效果。
 
 ### 4.1 模型微调
-由于通用表格识别产线v2包含若干模块,模型产线的效果如果不及预期,可能来自于其中任何一个模块。您可以对识别效果差的图片进行分析,进而确定是哪个模块存在问题,并参考以下表格中对应的微调教程链接进行模型微调。
+由于通用表格识别v2产线包含若干模块,模型产线的效果如果不及预期,可能来自于其中任何一个模块。您可以对识别效果差的图片进行分析,进而确定是哪个模块存在问题,并参考以下表格中对应的微调教程链接进行模型微调。
 
 <table>
 <thead>
@@ -1416,10 +1281,12 @@ SubPipelines:
 ##  5. 多硬件支持
 PaddleX 支持英伟达 GPU、昆仑芯 XPU、昇腾 NPU和寒武纪 MLU 等多种主流硬件设备,<b>仅需修改 `--device` 参数</b>即可完成不同硬件之间的无缝切换。
 
-例如,您使用昇腾 NPU 进行 OCR 产线的推理,使用的 CLI 命令为:
+例如,您使用昇腾 NPU 进行通用表格识别v2产线的推理,使用的 CLI 命令为:
 
 ```bash
 paddlex --pipeline table_recognition_v2 \
+        --use_doc_orientation_classify=False \
+        --use_doc_unwarping=False \
         --input table_recognition.jpg \
         --save_path ./output \
         --device npu:0
@@ -1427,4 +1294,4 @@ paddlex --pipeline table_recognition_v2 \
 
 当然,您也可以在 Python 脚本中 `create_pipeline()` 时或者 `predict()` 时指定硬件设备。
 
-若您想在更多种类的硬件上使用通用表格识别产线v2,请参考[PaddleX多硬件使用指南](../../../other_devices_support/multi_devices_use_guide.md)。
+若您想在更多种类的硬件上使用通用表格识别v2产线,请参考[PaddleX多硬件使用指南](../../../other_devices_support/multi_devices_use_guide.md)。

+ 69 - 22
paddlex/inference/pipelines/table_recognition/table_recognition_post_processing_v2.py

@@ -120,6 +120,29 @@ def compute_iou(rec1: list, rec2: list) -> float:
         intersect = (right_line - left_line) * (bottom_line - top_line)
         return (intersect / (sum_area - intersect)) * 1.0
 
+def compute_inter(rec1: list, rec2: list) -> float:
+    """
+    computing intersection over rec2_area
+    Args:
+        rec1 (list): (x1, y1, x2, y2)
+        rec2 (list): (x1, y1, x2, y2)
+    Returns:
+        float: Intersection over rec2_area
+    """
+    x1_1, y1_1, x2_1, y2_1 = rec1
+    x1_2, y1_2, x2_2, y2_2 = rec2
+    x_left = max(x1_1, x1_2)
+    y_top = max(y1_1, y1_2)
+    x_right = min(x2_1, x2_2)
+    y_bottom = min(y2_1, y2_2)
+    inter_width = max(0, x_right - x_left)
+    inter_height = max(0, y_bottom - y_top)
+    inter_area = inter_width * inter_height
+    rec2_area = (x2_2 - x1_2) * (y2_2 - y1_2)
+    if rec2_area == 0:
+        return 0
+    # note: this ratio is intersection over rec2's area (IoA), not a symmetric IoU
+    return inter_area / rec2_area
 
 def match_table_and_ocr(cell_box_list: list, ocr_dt_boxes: list) -> dict:
     """
@@ -133,27 +156,52 @@ def match_table_and_ocr(cell_box_list: list, ocr_dt_boxes: list) -> dict:
         dict: matched dict, key is table index, value is ocr index
     """
     matched = {}
-    for i, ocr_box in enumerate(np.array(ocr_dt_boxes)):
-        ocr_box = ocr_box.astype(np.float32)
-        distances = []
-        for j, table_box in enumerate(cell_box_list):
-            if len(table_box) == 8:
-                table_box = [
-                    np.min(table_box[0::2]),
-                    np.min(table_box[1::2]),
-                    np.max(table_box[0::2]),
-                    np.max(table_box[1::2]),
-                ]
-            distances.append(
-                (distance(table_box, ocr_box), 1.0 - compute_iou(table_box, ocr_box))
-            )  # compute iou and l1 distance
-        sorted_distances = distances.copy()
-        # select det box by iou and l1 distance
-        sorted_distances = sorted(sorted_distances, key=lambda item: (item[1], item[0]))
-        if distances.index(sorted_distances[0]) not in matched.keys():
-            matched[distances.index(sorted_distances[0])] = [i]
-        else:
-            matched[distances.index(sorted_distances[0])].append(i)
+    del_ocr = []
+    for i, table_box in enumerate(cell_box_list):
+        if len(table_box) == 8:
+            table_box = [
+                np.min(table_box[0::2]),
+                np.min(table_box[1::2]),
+                np.max(table_box[0::2]),
+                np.max(table_box[1::2]),
+            ]
+        for j, ocr_box in enumerate(np.array(ocr_dt_boxes)):
+            if compute_inter(table_box, ocr_box) > 0.8:
+                if i not in matched.keys():
+                    matched[i] = [j]
+                else:
+                    matched[i].append(j)
+                del_ocr.append(j)
+    miss_ocr = []
+    miss_ocr_index = []
+    for m in range(len(ocr_dt_boxes)):
+        if m not in del_ocr:
+            miss_ocr.append(ocr_dt_boxes[m])
+            miss_ocr_index.append(m)
+    if len(miss_ocr) != 0:
+        for k, miss_ocr_box in enumerate(miss_ocr):
+            distances = []
+            for q, table_box in enumerate(cell_box_list):
+                if len(table_box) == 0:
+                    continue
+                if len(table_box) == 8:
+                    table_box = [
+                        np.min(table_box[0::2]),
+                        np.min(table_box[1::2]),
+                        np.max(table_box[0::2]),
+                        np.max(table_box[1::2]),
+                    ]
+                distances.append(
+                    (distance(table_box, miss_ocr_box), 1.0 - compute_iou(table_box, miss_ocr_box))
+                )  # compute iou and l1 distance
+            sorted_distances = distances.copy()
+            # select det box by iou and l1 distance
+            sorted_distances = sorted(sorted_distances, key=lambda item: (item[1], item[0]))
+            if distances.index(sorted_distances[0]) not in matched.keys():
+                matched[distances.index(sorted_distances[0])] = [miss_ocr_index[k]]
+            else:
+                matched[distances.index(sorted_distances[0])].append(miss_ocr_index[k])
     return matched
 
 def get_html_result(
@@ -329,7 +377,6 @@ def get_table_recognition_res(
     ocr_texts_res = table_ocr_pred["rec_texts"]
 
     table_cells_result = sort_table_cells_boxes(table_cells_result)
-    ocr_dt_boxes = sort_table_cells_boxes(ocr_dt_boxes)
 
     matched_index = match_table_and_ocr(table_cells_result, ocr_dt_boxes)
     pred_html = get_html_result(matched_index, ocr_texts_res, table_structure_result)
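For intuition, a toy walk-through of the new two-stage matching (a sketch, not part of the commit; assumes `match_table_and_ocr` is imported from this module and boxes are `[x1, y1, x2, y2]`):

```python
# Stage 1 assigns an OCR box to a cell when their intersection covers more than
# 80% of the OCR box's own area (compute_inter); stage 2 falls back to sorting
# cells by (1 - IoU, L1 distance) for any OCR boxes left unmatched.
cells = [[0, 0, 100, 50], [100, 0, 200, 50]]
ocr_boxes = [
    [10, 10, 60, 40],   # lies inside cell 0 -> matched in stage 1
    [70, 12, 120, 38],  # straddles both cells -> resolved by the stage-2 fallback
]
print(match_table_and_ocr(cells, ocr_boxes))  # -> {0: [0, 1]} (cell 0 has the larger IoU)
```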

+ 104 - 2
paddlex/utils/pipeline_arguments.py

@@ -142,8 +142,110 @@ PIPELINE_ARGUMENTS = {
             "help": "Sets the threshold for human detection.",
         },
     ],
-    "table_recognition": None,
-    "table_recognition_v2": None,
+    "table_recognition": [
+        {
+            "name": "--use_doc_orientation_classify",
+            "type": bool,
+            "help": "Determines whether to use document preprocessing",
+        },
+        {
+            "name": "--use_doc_unwarping",
+            "type": bool,
+            "help": "Determines whether to use document unwarping",
+        },
+        {
+            "name": "--use_layout_detection",
+            "type": bool,
+            "help": "Determines whether to use document layout detection",
+        },
+        {
+            "name": "--use_ocr_model",
+            "type": bool,
+            "help": "Determines whether to use OCR",
+        },
+        {
+            "name": "--text_det_limit_side_len",
+            "type": int,
+            "help": "Sets the side length limit for text detection.",
+        },
+        {
+            "name": "--text_det_limit_type",
+            "type": str,
+            "help": "Sets the limit type for text detection.",
+        },
+        {
+            "name": "--text_det_thresh",
+            "type": float,
+            "help": "Sets the threshold for text detection.",
+        },
+        {
+            "name": "--text_det_box_thresh",
+            "type": float,
+            "help": "Sets the box threshold for text detection.",
+        },
+        {
+            "name": "--text_det_unclip_ratio",
+            "type": float,
+            "help": "Sets the unclip ratio for text detection.",
+        },
+        {
+            "name": "--text_rec_score_thresh",
+            "type": float,
+            "help": "Sets the score threshold for text recognition.",
+        },
+    ],
+    "table_recognition_v2": [
+        {
+            "name": "--use_doc_orientation_classify",
+            "type": bool,
+            "help": "Determines whether to use document preprocessing",
+        },
+        {
+            "name": "--use_doc_unwarping",
+            "type": bool,
+            "help": "Determines whether to use document unwarping",
+        },
+        {
+            "name": "--use_layout_detection",
+            "type": bool,
+            "help": "Determines whether to use document layout detection",
+        },
+        {
+            "name": "--use_ocr_model",
+            "type": bool,
+            "help": "Determines whether to use OCR",
+        },
+        {
+            "name": "--text_det_limit_side_len",
+            "type": int,
+            "help": "Sets the side length limit for text detection.",
+        },
+        {
+            "name": "--text_det_limit_type",
+            "type": str,
+            "help": "Sets the limit type for text detection.",
+        },
+        {
+            "name": "--text_det_thresh",
+            "type": float,
+            "help": "Sets the threshold for text detection.",
+        },
+        {
+            "name": "--text_det_box_thresh",
+            "type": float,
+            "help": "Sets the box threshold for text detection.",
+        },
+        {
+            "name": "--text_det_unclip_ratio",
+            "type": float,
+            "help": "Sets the unclip ratio for text detection.",
+        },
+        {
+            "name": "--text_rec_score_thresh",
+            "type": float,
+            "help": "Sets the score threshold for text recognition.",
+        },
+    ],
     "seal_recognition": [
         {
             "name": "--use_doc_orientation_classify",