
Fix bugs for table related code and docs (#3338)

* fix bugs

* refine codes

* fix bugs

* refine code

* add new algorithm

* refine codes

* refine codes

* refine table method

* fix bugs
Liu Jiaxuan, 9 months ago
parent
commit
5d4f06f5f1

Diff suppressed because it is too large
+ 0 - 1
docs/module_usage/tutorials/ocr_modules/table_cells_detection.en.md


Diff suppressed because it is too large
+ 0 - 1
docs/module_usage/tutorials/ocr_modules/table_cells_detection.md


+ 2 - 2
docs/module_usage/tutorials/ocr_modules/table_classification.en.md

@@ -45,14 +45,14 @@ for res in output:
 After running the code, the result obtained is:
 
 ```
-{'res': {'input_path': 'table_recognition.jpg', 'class_ids': array([0, 1], dtype=int32), 'scores': array([0.84421, 0.15579], dtype=float32), 'label_names': ['wired_table', 'wireless_table']}}
+{'res': {'input_path': 'table_recognition.jpg', "page_index": None, 'class_ids': array([0, 1], dtype=int32), 'scores': array([0.84421, 0.15579], dtype=float32), 'label_names': ['wired_table', 'wireless_table']}}
 ```
 
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/modules/table_classification/01.jpg">
 
 The meanings of the parameters in the running results are as follows:
 - `input_path`: Indicates the path of the input image.
-- `page_index`:If the input is a PDF file, this indicates the current page number of the PDF. Otherwise, it is `null`
+- `page_index`:If the input is a PDF file, this indicates the current page number of the PDF. Otherwise, it is `None`
 - `class_ids`: Indicates the class ID of the prediction result.
 - `scores`: Indicates the confidence of the prediction result.
 - `label_names`: Indicates the class name of the prediction result.
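
The `null` to `None` change in this hunk matches what a printed Python dict shows, as opposed to its JSON form. A minimal sketch of the difference, using only the standard library and a hypothetical result dict trimmed to two fields (the real result also carries NumPy arrays, which `json.dumps` would not accept directly):

```python
import json

# Hypothetical, trimmed-down result used only for illustration.
res = {"input_path": "table_recognition.jpg", "page_index": None}

printed = repr(res)        # what print(res) displays: Python's None
as_json = json.dumps(res)  # the JSON form of the same dict: null

print(printed)
print(as_json)
```

Which spelling a reader sees therefore depends on whether the result was printed or serialized to JSON; the documented sample is a printed dict.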

+ 2 - 2
docs/module_usage/tutorials/ocr_modules/table_classification.md

@@ -45,12 +45,12 @@ for res in output:
 
 运行后,得到的结果为:
 ```
-{"res": {"input_path": "table_recognition.jpg", "page_index": null, "class_ids": array([0, 1], dtype=int32), "scores": array([0.84421, 0.15579], dtype=float32), "label_names": ["wired_table", "wireless_table"]}}
+{"res": {"input_path": "table_recognition.jpg", "page_index": None, "page_index": null, "class_ids": array([0, 1], dtype=int32), "scores": array([0.84421, 0.15579], dtype=float32), "label_names": ["wired_table", "wireless_table"]}}
 ```
 
 运行结果参数含义如下:
 - `input_path`:表示输入图片的路径
-- `page_index`:如果输入是PDF文件,则表示当前是PDF的第几页,否则为 `null`
+- `page_index`:如果输入是PDF文件,则表示当前是PDF的第几页,否则为 `None`
 - `class_ids`:表示预测结果的类别id
 - `scores`:表示预测结果的置信度
 - `label_names`:表示预测结果的类别名

+ 2 - 2
docs/module_usage/tutorials/ocr_modules/table_structure_recognition.en.md

@@ -72,13 +72,13 @@ for res in output:
 <details><summary>👉 <b>After running, the result is: (Click to expand)</b></summary>
 
 ```
-{'res': {'input_path': 'table_recognition.jpg', 'page_index': null, 'bbox': [array([ 42,   2, 390,   2, 388,  27,  40,  26]), array([11, 35, 89, 35, 87, 63, 11, 63]), array([113,  34, 192,  34, 186,  64, 109,  64]), array([219,  33, 399,  33, 393,  62, 212,  62]), array([413,  33, 544,  33, 544,  64, 407,  64]), array([12, 67, 98, 68, 96, 93, 12, 93]), array([115,  66, 205,  66, 200,  91, 111,  91]), array([234,  65, 390,  65, 385,  92, 227,  92]), array([414,  66, 537,  67, 537,  95, 409,  95]), array([  7,  97, 106,  97, 104, 128,   7, 128]), array([113,  96, 206,  95, 201, 127, 109, 127]), array([236,  96, 386,  96, 381, 128, 230, 128]), array([413,  96, 534,  95, 533, 127, 408, 127])], 'structure': ['<html>', '<body>', '<table>', '<tr>', '<td', ' colspan="4"', '>', '</td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</table>', '</body>', '</html>'], 'structure_score': 0.99948007}}
+{'res': {'input_path': 'table_recognition.jpg', 'page_index': None, 'bbox': [array([ 42,   2, 390,   2, 388,  27,  40,  26]), array([11, 35, 89, 35, 87, 63, 11, 63]), array([113,  34, 192,  34, 186,  64, 109,  64]), array([219,  33, 399,  33, 393,  62, 212,  62]), array([413,  33, 544,  33, 544,  64, 407,  64]), array([12, 67, 98, 68, 96, 93, 12, 93]), array([115,  66, 205,  66, 200,  91, 111,  91]), array([234,  65, 390,  65, 385,  92, 227,  92]), array([414,  66, 537,  67, 537,  95, 409,  95]), array([  7,  97, 106,  97, 104, 128,   7, 128]), array([113,  96, 206,  95, 201, 127, 109, 127]), array([236,  96, 386,  96, 381, 128, 230, 128]), array([413,  96, 534,  95, 533, 127, 408, 127])], 'structure': ['<html>', '<body>', '<table>', '<tr>', '<td', ' colspan="4"', '>', '</td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</table>', '</body>', '</html>'], 'structure_score': 0.99948007}}
 ```
 
 Parameter meanings are as follows:
 <ul>
 <li><code>input_path</code>: The path of the input image to be predicted</li>
-<li><code>page_index</code>:If the input is a PDF file, this indicates the current page number of the PDF. Otherwise, it is `null`</li>
+<li><code>page_index</code>:If the input is a PDF file, this indicates the current page number of the PDF. Otherwise, it is `None`</li>
 <li><code>boxes</code>: Predicted table cell information, a list composed of several predicted table cell coordinates. Note that the table cell predictions from the SLANeXt series models are invalid</li>
 <li><code>structure</code>: Predicted table structure in HTML expressions, a list composed of several predicted HTML keywords in order</li>
 <li><code>structure_score</code>: Confidence score of the predicted table structure</li>

+ 3 - 3
docs/module_usage/tutorials/ocr_modules/table_structure_recognition.md

@@ -66,13 +66,13 @@ for res in output:
 
 <details><summary>👉 <b>运行后,得到的结果为:(点击展开)</b></summary>
 
-```json
-{'res': {'input_path': 'table_recognition.jpg', 'page_index': null, 'bbox': [array([ 42,   2, 390,   2, 388,  27,  40,  26]), array([11, 35, 89, 35, 87, 63, 11, 63]), array([113,  34, 192,  34, 186,  64, 109,  64]), array([219,  33, 399,  33, 393,  62, 212,  62]), array([413,  33, 544,  33, 544,  64, 407,  64]), array([12, 67, 98, 68, 96, 93, 12, 93]), array([115,  66, 205,  66, 200,  91, 111,  91]), array([234,  65, 390,  65, 385,  92, 227,  92]), array([414,  66, 537,  67, 537,  95, 409,  95]), array([  7,  97, 106,  97, 104, 128,   7, 128]), array([113,  96, 206,  95, 201, 127, 109, 127]), array([236,  96, 386,  96, 381, 128, 230, 128]), array([413,  96, 534,  95, 533, 127, 408, 127])], 'structure': ['<html>', '<body>', '<table>', '<tr>', '<td', ' colspan="4"', '>', '</td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</table>', '</body>', '</html>'], 'structure_score': 0.99948007}}
+```
+{'res': {'input_path': 'table_recognition.jpg', 'page_index': None, 'bbox': [array([ 42,   2, 390,   2, 388,  27,  40,  26]), array([11, 35, 89, 35, 87, 63, 11, 63]), array([113,  34, 192,  34, 186,  64, 109,  64]), array([219,  33, 399,  33, 393,  62, 212,  62]), array([413,  33, 544,  33, 544,  64, 407,  64]), array([12, 67, 98, 68, 96, 93, 12, 93]), array([115,  66, 205,  66, 200,  91, 111,  91]), array([234,  65, 390,  65, 385,  92, 227,  92]), array([414,  66, 537,  67, 537,  95, 409,  95]), array([  7,  97, 106,  97, 104, 128,   7, 128]), array([113,  96, 206,  95, 201, 127, 109, 127]), array([236,  96, 386,  96, 381, 128, 230, 128]), array([413,  96, 534,  95, 533, 127, 408, 127])], 'structure': ['<html>', '<body>', '<table>', '<tr>', '<td', ' colspan="4"', '>', '</td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</table>', '</body>', '</html>'], 'structure_score': 0.99948007}}
 ```
 
 参数含义如下:
 - `input_path`:输入的待预测表格图像的路径
-- `page_index`:如果输入是PDF文件,则表示当前是PDF的第几页,否则为 `null`
+- `page_index`:如果输入是PDF文件,则表示当前是PDF的第几页,否则为 `None`
 - `boxes`:预测的表格单元格信息,一个列表,由预测的若干表格单元格坐标组成。特别地, SLANeXt 系列模型预测的表格单元格无效
 - `structure`:预测的表格结构Html表达式,一个列表,由预测的若干Html关键字按顺序组成
 - `structure_score`:预测表格结构的置信度
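
Since `structure` is described as an ordered list of HTML keywords, one plausible way to turn it into a single HTML string is plain concatenation. The sketch below assumes that reading and uses a shortened token list taken from the start and end of the sample output; whether the library offers its own helper for this is not shown in this diff.

```python
# Shortened token list: the first row of the sample output, then the closing tags.
structure = [
    "<html>", "<body>", "<table>",
    "<tr>", "<td", ' colspan="4"', ">", "</td>", "</tr>",
    "</table>", "</body>", "</html>",
]

# The tokens are already in document order, so joining them rebuilds the markup.
html = "".join(structure)
print(html)
```

Note that a cell with attributes is split across three tokens (`<td`, the attribute, `>`), which is why the tokens are joined without a separator.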

+ 9 - 0
paddlex/inference/pipelines/table_recognition/result.py

@@ -96,6 +96,15 @@ class TableRecognitionResult(BaseCVResult, HtmlMixin, XlsxMixin):
         super().__init__(data)
         HtmlMixin.__init__(self)
         XlsxMixin.__init__(self)
+    
+    def _get_input_fn(self):
+        fn = super()._get_input_fn()
+        if (page_idx := self["page_index"]) is not None:
+            fp = Path(fn)
+            stem, suffix = fp.stem, fp.suffix
+            return f"{stem}_{page_idx}{suffix}"
+        else:
+            return fn
 
     def _to_img(self) -> Dict[str, np.ndarray]:
         res_img_dict = {}
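
The naming rule added in `_get_input_fn` can be tried in isolation. The following is a standalone sketch, not the library's code: `page_suffixed_name` is a hypothetical helper, and it assumes the parent method returns a plain file name string.

```python
from pathlib import Path
from typing import Optional


def page_suffixed_name(fn: str, page_idx: Optional[int]) -> str:
    """Hypothetical standalone version of the naming rule in the hunk above."""
    if page_idx is None:
        # Non-PDF input: the name is returned unchanged.
        return fn
    fp = Path(fn)
    # Only stem and suffix are reused, so any directory part of fn is dropped.
    return f"{fp.stem}_{page_idx}{fp.suffix}"


print(page_suffixed_name("report.pdf", 0))
print(page_suffixed_name("table_recognition.jpg", None))
```

Inserting the page index before the extension keeps the results of a multi-page PDF from overwriting each other when they are saved under the input's name. The original hunk uses the walrus operator (`:=`), which requires Python 3.8 or later.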

Some files were not shown because too many files have changed in this diff