16 Commits d163421bd0 ... 3a5b2ab300

Author SHA1 Message Date
  zhch158_admin 3a5b2ab300 chore: Add .gitignore and a script to verify GridRecovery module import and cell computation with mocked dependencies. 3 days ago
  zhch158_admin 76f8e864a8 feat: Add .gitignore, implement grid recovery syntax verification, and enhance HuggingFace model loading with local cache prioritization. 3 days ago
  zhch158_admin e355727495 feat: Add wired table processing modules, `wired_table` adapter, and enhance HuggingFace model caching in `docling_layout_adapter`. 3 days ago
  zhch158_admin a4ad1d803a feat: Implement wired table processing with grid recovery and skew detection, and improve HuggingFace model caching. 3 days ago
  zhch158_admin 4f32495604 feat: Introduce new wired table processing module with enhanced skew detection, grid recovery, and output capabilities, and update pipeline to utilize it. 3 days ago
  zhch158_admin 3b3c3c9c5a feat: Introduce wired table parsing adapter with grid recovery, OCR formatting, and enhanced region cropping. 3 days ago
  zhch158_admin ce29ee3458 feat: Implement `mineru_wired_table_v2` adapter with enhanced table OCR preprocessing, grid recovery, and visualization utilities. 3 days ago
  zhch158_admin 6477e9183b feat: Add wired table adapter components, update Mineru wired table adapter, and improve HuggingFace model caching logic. 3 days ago
  zhch158_admin f7da730070 fix: Enhance error logging with detailed stack traces for easier debugging 5 days ago
  zhch158_admin 4b399d085e feat: Add skew detection and correction, integrating BBoxExtractor to improve OCR box handling 5 days ago
  zhch158_admin 05d07bb9ef feat: Add BBoxExtractor to compute and log the skew angle of OCR text 5 days ago
  zhch158_admin d7e5f2f689 refactor: Remove polygon-to-bbox conversion logic, simplifying IoU computation 5 days ago
  zhch158_admin bd17ca00f4 feat: Update sample input/output paths and add new test images to improve test coverage of document parsing 5 days ago
  zhch158_admin 5235aff1b9 feat: Update MinerU adapter, adding the ocr_platform root to the Python path and refining coordinate handling 5 days ago
  zhch158_admin fe223cd19d feat: Improve OCR text box recognition, preferring polygon data and enriching error logs 5 days ago
  zhch158_admin 60f761a6b5 feat: Refactor cell computation and grid recovery logic to better handle complex tables 6 days ago
22 changed files with 2492 additions and 1161 deletions
  1. 36 0
      docs/ocr_tools/universal_doc_parser/unet表格识别连通域图-算法.md
  2. 5 3
      ocr_tools/universal_doc_parser/config/bank_statement_wired_unet.yaml
  3. 31 11
      ocr_tools/universal_doc_parser/core/coordinate_utils.py
  4. 84 29
      ocr_tools/universal_doc_parser/core/element_processors.py
  5. 0 2
      ocr_tools/universal_doc_parser/core/layout_utils.py
  6. 21 12
      ocr_tools/universal_doc_parser/core/pipeline_manager_v2.py
  7. 2 1
      ocr_tools/universal_doc_parser/core/pipeline_manager_v2_streaming.py
  8. 14 7
      ocr_tools/universal_doc_parser/main_v2.py
  9. 91 3
      ocr_tools/universal_doc_parser/models/adapters/docling_layout_adapter.py
  10. 11 4
      ocr_tools/universal_doc_parser/models/adapters/mineru_adapter.py
  11. 220 1086
      ocr_tools/universal_doc_parser/models/adapters/mineru_wired_table.py
  12. 32 0
      ocr_tools/universal_doc_parser/models/adapters/wired_table/__init__.py
  13. 107 0
      ocr_tools/universal_doc_parser/models/adapters/wired_table/debug_utils.py
  14. 537 0
      ocr_tools/universal_doc_parser/models/adapters/wired_table/grid_recovery.py
  15. 154 0
      ocr_tools/universal_doc_parser/models/adapters/wired_table/html_generator.py
  16. 46 0
      ocr_tools/universal_doc_parser/models/adapters/wired_table/ocr_formatter.py
  17. 455 0
      ocr_tools/universal_doc_parser/models/adapters/wired_table/skew_detection.py
  18. 463 0
      ocr_tools/universal_doc_parser/models/adapters/wired_table/text_filling.py
  19. 177 0
      ocr_tools/universal_doc_parser/models/adapters/wired_table/visualization.py
  20. 2 1
      ocr_utils/pdf_utils.py
  21. 2 0
      table_line_generator/backend/api/editor.py
  22. 2 2
      table_line_generator/core/table_analyzer.py

+ 36 - 0
docs/ocr_tools/universal_doc_parser/unet表格识别连通域图-算法.md

@@ -0,0 +1,36 @@
+### User Input
+
+If the UNet can correctly detect the horizontal and vertical lines, could we output the line coordinates,
+and then compute the connected components from the intersections of those lines?
+Compared with the current approach, which method is more stable, and why?
+
+### Planner Response
+
+This is a deep and very relevant question. The short answer: **extracting line coordinates first, computing intersections and extending lines until they connect, then re-rendering a mask for connected-component analysis (i.e. "vectorize and tidy first, then rasterize and analyze") is widely regarded as the most stable approach to wired-table recovery.**
+
+This is also the core strategy adopted by MinerU (and many commercial-grade OCR engines).
+
+A detailed comparison of the two approaches:
+
+### Comparison
+
+| Aspect | **Approach A: connected components directly on the mask (current)** | **Approach B: reconstructed connected components from vectorized lines (recommended / MinerU)** |
+| :--- | :--- | :--- |
+| **Core logic** | UNet -> probability map -> binarization -> i.e. the current pipeline | UNet -> probability map -> **extract line-segment coordinates** -> **mathematically extend segments until they intersect** -> **redraw the mask** -> connected components |
+| **Handling junctions** | Relies on morphological dilation to "smear over" gaps. Fails easily when gaps are large or misaligned. | Relies on geometric computation. **Segments are forcibly extended until they meet another line**, eliminating gaps mathematically. |
+| **Noise robustness** | Weak. Speckles, dashed lines, and stains directly distort component shapes. | Strong. Only prominent long segments are extracted; small noise is filtered out. |
+| **Deformation robustness** | Strong. Curved tables adapt well (the mask is pixel-level). | Weaker. Severe table curvature increases straight-line fitting error (curve fitting would be needed). |
+| **Complexity** | Low. Mostly basic OpenCV operators. | High. Involves segment clustering, line-equation solving, intersection computation, etc. |
+| **Stability** | **Medium**. Prone to "missed cells" (lines not connected) or "merged cells". | **Very high**. As long as a line is detected, the cell is guaranteed to close, with a very regular shape. |
+
+### Why is Approach B (vectorized reconstruction) more stable?
+
+The failure you observed, *"the lines look perfectly clear to the eye, yet a connected component is missing"*, is usually caused by **pixel-level breaks**.
+*   **Limitation of the mask approach**: at junctions (crossings or T-intersections), the predicted pixels may fall short by just 1-2 pixels. They look adjacent to the eye, but to the algorithm they are disconnected; the "room" leaks, the background floods in during connected-component analysis, and the cell disappears.
+*   **Advantages of the vectorized approach**:
+    1.  **Forced closure**: it computes, say, horizontal line $y=100$ and vertical line $x=50$. Even if the predicted vertical line only reaches $y=98$ (2 pixels short), the vector algorithm forcibly extends it until it hits the horizontal line.
+    2.  **Regular structure**: the redrawn table is perfectly axis-aligned, eliminating edge jitter from handwriting or scanning.
+
+### Conclusion and Recommendation
+
+The `enhance_box_line` (dilation) step we have tried so far is still a patch within **Approach A**. If your workload contains many poorly printed tables with slight line breaks, **Approach B is the definitive solution**.
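The "extend to intersection, re-rasterize, then run connected components" pipeline of Approach B can be sketched in a few dozen lines. The segment representation (`(y, x1, x2)` / `(x, y1, y2)`) and the 5-pixel snapping tolerance below are illustrative assumptions, not MinerU's actual implementation:

```python
import numpy as np
from collections import deque

def extend_and_rasterize(h_lines, v_lines, shape, tol=5):
    """Draw horizontal lines (y, x1, x2) and vertical lines (x, y1, y2)
    into a binary mask, extending each vertical segment outward to the
    nearest horizontal line within `tol` pixels so junctions close."""
    ys = sorted(y for y, _, _ in h_lines)
    mask = np.zeros(shape, dtype=np.uint8)
    for y, x1, x2 in h_lines:
        mask[y, x1:x2 + 1] = 1
    for x, y1, y2 in v_lines:
        top = max((y for y in ys if y <= y1 + tol), default=y1)
        bot = min((y for y in ys if y >= y2 - tol), default=y2)
        mask[top:bot + 1, x] = 1
    return mask

def count_cells(mask):
    """Count enclosed background regions (table cells) via 4-connected
    flood fill; regions touching the image border are not cells."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    cells = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] or seen[sy, sx]:
                continue
            # BFS flood fill of one background region
            q = deque([(sy, sx)])
            seen[sy, sx] = True
            touches_border = False
            while q:
                y, x = q.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        q.append((ny, nx))
            if not touches_border:
                cells += 1  # an enclosed region = one table cell
    return cells
```

On a 3x3-line grid whose middle vertical line stops 2 px short of the frame, the raw mask leaks and merges cells, while the extended mask closes every junction and recovers all four cells.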

+ 5 - 3
ocr_tools/universal_doc_parser/config/bank_statement_wired_unet.yaml

@@ -38,12 +38,16 @@ table_recognition_wired:
   col_threshold: 15
   ocr_conf_threshold: 0.5
   cell_crop_margin: 2
+  use_custom_postprocess: true  # whether to use custom post-processing (enabled by default)
+
+  # whether to enable deskew
+  enable_deskew: true
 
   # Debug visualization config (aligned with MinerUWiredTableRecognizer.DebugOptions)
   # Off by default. When enabled, saves: table lines, connected components, logical grid structure, text-overlay visualizations.
   debug_options:
     enabled: true               # whether to enable debug visualization output
-    output_dir: "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/output"             # debug output dir; null disables output
+    output_dir: null             # debug output dir; null disables output
     save_table_lines: true       # save table-line visualization (unet horizontal/vertical line overlay)
     save_connected_components: true  # save cell images extracted via connected components
     save_grid_structure: true    # save the logical grid structure (row/col/rowspan/colspan)
@@ -65,5 +69,3 @@ output:
   save_enhanced_json: true
   coordinate_precision: 2
   normalize_numbers: true
-
-use_custom_postprocess: true  # whether to use custom post-processing (enabled by default)

+ 31 - 11
ocr_tools/universal_doc_parser/core/coordinate_utils.py

@@ -197,28 +197,48 @@ class CoordinateUtils:
     # ==================== Image cropping ====================
     
     @staticmethod
-    def crop_region(image: np.ndarray, bbox: List[float]) -> np.ndarray:
+    def crop_region(image: np.ndarray, bbox: List[float], padding: int = 0) -> np.ndarray:
         """
         Crop an image region
-        
+
         Args:
             image: source image
             bbox: crop region [x1, y1, x2, y2]
-            
+            padding: edge padding in pixels; positive expands the crop region, negative shrinks it
+
         Returns:
             the cropped image
         """
         if len(bbox) < 4:
             return image
-        
-        x1, y1, x2, y2 = map(int, bbox[:4])
+
         h, w = image.shape[:2]
-        
-        x1 = max(0, min(x1, w))
-        y1 = max(0, min(y1, h))
-        x2 = max(x1, min(x2, w))
-        y2 = max(y1, min(y2, h))
-        
+
+        # parse padding (supports a single value or four values)
+        if isinstance(padding, (int, float)):
+            pad_left = pad_right = pad_top = pad_bottom = int(padding)
+        else:
+            # assume a length-4 tuple/list [left, top, right, bottom]
+            if len(padding) >= 4:
+                pad_left, pad_top, pad_right, pad_bottom = [int(p) for p in padding[:4]]
+            else:
+                pad_left = pad_top = pad_right = pad_bottom = 0
+
+        x1 = max(0 - pad_left, int(bbox[0]) - pad_left)
+        y1 = max(0 - pad_top, int(bbox[1]) - pad_top)
+        x2 = min(w + pad_right, int(bbox[2]) + pad_right)
+        y2 = min(h + pad_bottom, int(bbox[3]) + pad_bottom)
+
+        # clamp coordinates to the image bounds
+        x1 = max(0, x1)
+        y1 = max(0, y1)
+        x2 = min(w, x2)
+        y2 = min(h, y2)
+
+        # bail out on an empty region
+        if x2 <= x1 or y2 <= y1:
+            return image
+
         return image[y1:y2, x1:x2]
     
     @staticmethod
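As a sanity check of the padding semantics above, a minimal standalone sketch of the padded crop (scalar padding only; the tuple form from the diff is omitted, and the clamping is folded into one step):

```python
import numpy as np

def crop_region(image, bbox, padding=0):
    """Minimal re-implementation of the padded crop: expand the bbox by
    `padding` pixels on every side, then clamp to the image bounds."""
    if len(bbox) < 4:
        return image
    h, w = image.shape[:2]
    pad = int(padding)
    x1 = max(0, int(bbox[0]) - pad)
    y1 = max(0, int(bbox[1]) - pad)
    x2 = min(w, int(bbox[2]) + pad)
    y2 = min(h, int(bbox[3]) + pad)
    if x2 <= x1 or y2 <= y1:
        return image  # degenerate region: fall back to the full image
    return image[y1:y2, x1:x2]
```

With a 100x100 image, a 10x10 bbox padded by 20 px yields a 50x50 crop; the same padding on a bbox at the origin is clamped at the border and yields 30x30.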

+ 84 - 29
ocr_tools/universal_doc_parser/core/element_processors.py

@@ -183,7 +183,7 @@ class ElementProcessors:
         image: np.ndarray,
         bbox: List[float],
         pre_matched_spans: Optional[List[Dict[str, Any]]] = None
-    ) -> Tuple[np.ndarray, List[Dict[str, Any]], int, str]:
+    ) -> Tuple[np.ndarray, List[Dict[str, Any]], int, str, int]:
         """
         Table OCR preprocessing (shared logic)
         
@@ -195,10 +195,19 @@ class ElementProcessors:
             pre_matched_spans: pre-matched OCR spans
             
         Returns:
-            (cropped_table, ocr_boxes, table_angle, ocr_source)
+            (cropped_table, ocr_boxes, table_angle, ocr_source, crop_padding)
             where cropped_table has already been orientation-detected and rotated
+            crop_padding: the padding value added when cropping
         """
-        cropped_table = CoordinateUtils.crop_region(image, bbox)
+        # compute the table region size to pick a suitable padding
+        table_width = bbox[2] - bbox[0]
+        table_height = bbox[3] - bbox[1]
+
+        # add padding for skewed images so corner content is not cut off
+        # padding = 1% of the smaller table dimension, minimum 20 px
+        crop_padding = max(20, int(min(table_width, table_height) * 0.01))
+
+        cropped_table = CoordinateUtils.crop_region(image, bbox, padding=crop_padding)
         table_angle = 0
         
         # 1. table orientation detection
@@ -214,16 +223,42 @@ class ElementProcessors:
         ocr_boxes = []
         ocr_source = "none"
         
+        # compute the top-left of the cropped image in original coordinates (accounting for padding)
+        # (0, 0) of the cropped image maps to (bbox[0] - crop_padding, bbox[1] - crop_padding) in the original
+        cropped_offset_x = bbox[0] - crop_padding
+        cropped_offset_y = bbox[1] - crop_padding
+        
         if pre_matched_spans and len(pre_matched_spans) > 0 and table_angle == 0:
             # use the full-page OCR results
             for idx, span in enumerate(pre_matched_spans):
+                # prefer poly data; fall back to bbox when absent
+                span_poly = span.get('poly', [])
                 span_bbox = span.get('bbox', [])
-                if span_bbox:
+                
+                if span_poly:
+                    # with poly data, convert to coordinates relative to the cropped image (accounting for padding)
+                    if isinstance(span_poly[0], (list, tuple)) and len(span_poly) >= 4:
+                        # convert to coordinates relative to (0, 0) of the cropped image
+                        relative_poly = [
+                            [float(p[0]) - cropped_offset_x, float(p[1]) - cropped_offset_y]
+                            for p in span_poly[:4]
+                        ]
+                        formatted_box = CoordinateUtils.convert_ocr_to_matcher_format(
+                            relative_poly,
+                            span.get('text', ''),
+                            span.get('confidence', 0.0),
+                            idx,
+                            table_bbox=None
+                        )
+                        if formatted_box:
+                            ocr_boxes.append(formatted_box)
+                elif span_bbox and len(span_bbox) >= 4:
+                    # fallback: use bbox data, converted to cropped-image coordinates (accounting for padding)
                     relative_bbox = [
-                        span_bbox[0] - bbox[0],
-                        span_bbox[1] - bbox[1],
-                        span_bbox[2] - bbox[0],
-                        span_bbox[3] - bbox[1]
+                        span_bbox[0] - cropped_offset_x,
+                        span_bbox[1] - cropped_offset_y,
+                        span_bbox[2] - cropped_offset_x,
+                        span_bbox[3] - cropped_offset_y
                     ]
                     formatted_box = CoordinateUtils.convert_ocr_to_matcher_format(
                         relative_bbox,
@@ -245,7 +280,8 @@ class ElementProcessors:
                 ocr_results = self.ocr_recognizer.recognize_text(cropped_table)
                 if ocr_results:
                     for idx, item in enumerate(ocr_results):
-                        ocr_poly = item.get('bbox', [])
+                        # prefer poly; fall back to bbox
+                        ocr_poly = item.get('poly', item.get('bbox', []))
                         if ocr_poly:
                             formatted_box = CoordinateUtils.convert_ocr_to_matcher_format(
                                 ocr_poly, 
@@ -272,16 +308,18 @@ class ElementProcessors:
                 ocr_source = "cropped_ocr"
                 logger.info(f"📊 OCR detected {len(ocr_boxes)} text boxes in table (cropped)")
             except Exception as e:
-                logger.warning(f"Table OCR detection failed: {e}")
+                logger.warning(f"Table OCR failed: {e}")
         
-        return cropped_table, ocr_boxes, table_angle, ocr_source
+        return cropped_table, ocr_boxes, table_angle, ocr_source, crop_padding
     
     def process_table_element_wired(
         self,
         image: np.ndarray,
         layout_item: Dict[str, Any],
         scale: float,
-        pre_matched_spans: Optional[List[Dict[str, Any]]] = None
+        pre_matched_spans: Optional[List[Dict[str, Any]]] = None,
+        output_dir: Optional[str] = None,
+        basename: Optional[str] = None
     ) -> Dict[str, Any]:
         """
         Process a table element with UNet wired-table recognition
@@ -302,8 +340,8 @@ class ElementProcessors:
         """
         bbox = layout_item.get('bbox', [0, 0, 0, 0])
         
-        # OCR preprocessing (returns the rotated table image + OCR boxes)
-        cropped_table, ocr_boxes, table_angle, ocr_source = \
+        # OCR preprocessing (returns the rotated table image + OCR boxes + padding)
+        cropped_table, ocr_boxes, table_angle, ocr_source, crop_padding = \
             self._prepare_table_ocr(image, bbox, pre_matched_spans)
         
         # get the size of the cropped table image
@@ -318,10 +356,19 @@ class ElementProcessors:
             if not self.wired_table_recognizer:
                 raise RuntimeError("Wired table recognizer not available")
             
+            # build debug-option overrides
+            debug_opts_override = {}
+            if output_dir:
+                debug_opts_override['output_dir'] = output_dir
+            if basename:
+                # use the full basename as the prefix (e.g. "filename_page_001")
+                debug_opts_override['prefix'] = basename
+
             wired_res = self.wired_table_recognizer.recognize(
                 table_image=cropped_table,
                 # ocr_boxes=ocr_boxes_for_wired,
                 ocr_boxes=ocr_boxes,
+                debug_options=debug_opts_override
             )
             
             if not (wired_res.get('html') or wired_res.get('cells')):
@@ -337,26 +384,29 @@ class ElementProcessors:
             return self._create_empty_table_result(layout_item, bbox, table_angle, ocr_source)
         
         # coordinate transform: map rotated coordinates back to the original image
+        # compute the correct offset: (0, 0) of the cropped image maps to (bbox[0] - crop_padding, bbox[1] - crop_padding) in the original
+        cropped_offset_bbox = [bbox[0] - crop_padding, bbox[1] - crop_padding, bbox[2] + crop_padding, bbox[3] + crop_padding]
+        
         if table_angle != 0 and MERGER_AVAILABLE:
             cells, enhanced_html = CoordinateUtils.inverse_rotate_table_coords(
                 cells=cells,
                 html=enhanced_html,
                 rotation_angle=table_angle,
                 orig_table_size=orig_table_size,
-                table_bbox=bbox
+                table_bbox=cropped_offset_bbox
             )
             ocr_boxes = CoordinateUtils.inverse_rotate_ocr_boxes(
                 ocr_boxes=ocr_boxes,
                 rotation_angle=table_angle,
                 orig_table_size=orig_table_size,
-                table_bbox=bbox
+                table_bbox=cropped_offset_bbox
             )
             logger.info(f"📐 Wired table coordinates transformed back to original image")
         else:
-            # no rotation: just add the table offset
-            cells = CoordinateUtils.add_table_offset_to_cells(cells, bbox)
-            enhanced_html = CoordinateUtils.add_table_offset_to_html(enhanced_html, bbox)
-            ocr_boxes = CoordinateUtils.add_table_offset_to_ocr_boxes(ocr_boxes, bbox)
+            # no rotation: use the correct offset (accounting for padding)
+            cells = CoordinateUtils.add_table_offset_to_cells(cells, cropped_offset_bbox)
+            enhanced_html = CoordinateUtils.add_table_offset_to_html(enhanced_html, cropped_offset_bbox)
+            ocr_boxes = CoordinateUtils.add_table_offset_to_ocr_boxes(ocr_boxes, cropped_offset_bbox)
         
         return {
             'type': 'table',
@@ -401,8 +451,8 @@ class ElementProcessors:
         """
         bbox = layout_item.get('bbox', [0, 0, 0, 0])
         
-        # OCR preprocessing (returns the rotated table image + OCR boxes)
-        cropped_table, ocr_boxes, table_angle, ocr_source = \
+        # OCR preprocessing (returns the rotated table image + OCR boxes + padding)
+        cropped_table, ocr_boxes, table_angle, ocr_source, crop_padding = \
             self._prepare_table_ocr(image, bbox, pre_matched_spans)
         
         # get the size of the cropped table image
@@ -429,37 +479,42 @@ class ElementProcessors:
         
         if table_html and ocr_boxes and self.table_cell_matcher:
             try:
+                # the table_bbox argument is relative to the cropped image, and the OCR boxes already are too
+                # use the actual size of the cropped image
                 enhanced_html, cells, _, skew_angle = self.table_cell_matcher.enhance_table_html_with_bbox(
                     html=table_html,
                     paddle_text_boxes=ocr_boxes,
                     start_pointer=0,
-                    table_bbox=[0, 0, bbox[2] - bbox[0], bbox[3] - bbox[1]]
+                    table_bbox=[0, 0, orig_table_w, orig_table_h]
                 )
                 logger.info(f"📊 Matched {len(cells)} cells with coordinates (skew: {skew_angle:.2f}°)")
             except Exception as e:
                 logger.warning(f"Cell coordinate matching failed: {e}")
         
         # coordinate transform: map rotated coordinates back to the original image
+        # compute the correct offset: (0, 0) of the cropped image maps to (bbox[0] - crop_padding, bbox[1] - crop_padding) in the original
+        cropped_offset_bbox = [bbox[0] - crop_padding, bbox[1] - crop_padding, bbox[2] + crop_padding, bbox[3] + crop_padding]
+        
         if table_angle != 0 and MERGER_AVAILABLE:
             cells, enhanced_html = CoordinateUtils.inverse_rotate_table_coords(
                 cells=cells,
                 html=enhanced_html,
                 rotation_angle=table_angle,
                 orig_table_size=orig_table_size,
-                table_bbox=bbox
+                table_bbox=cropped_offset_bbox
             )
             ocr_boxes = CoordinateUtils.inverse_rotate_ocr_boxes(
                 ocr_boxes=ocr_boxes,
                 rotation_angle=table_angle,
                 orig_table_size=orig_table_size,
-                table_bbox=bbox
+                table_bbox=cropped_offset_bbox
             )
             logger.info(f"📐 VLM table coordinates transformed back to original image")
         else:
-            # no rotation: just add the table offset
-            cells = CoordinateUtils.add_table_offset_to_cells(cells, bbox)
-            enhanced_html = CoordinateUtils.add_table_offset_to_html(enhanced_html, bbox)
-            ocr_boxes = CoordinateUtils.add_table_offset_to_ocr_boxes(ocr_boxes, bbox)
+            # no rotation: use the correct offset (accounting for padding)
+            cells = CoordinateUtils.add_table_offset_to_cells(cells, cropped_offset_bbox)
+            enhanced_html = CoordinateUtils.add_table_offset_to_html(enhanced_html, cropped_offset_bbox)
+            ocr_boxes = CoordinateUtils.add_table_offset_to_ocr_boxes(ocr_boxes, cropped_offset_bbox)
         
         return {
             'type': 'table',
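The offset bookkeeping in the hunks above reduces to one mapping: (0, 0) of the padded crop corresponds to (bbox[0] - crop_padding, bbox[1] - crop_padding) on the page. A hypothetical helper mirroring `CoordinateUtils.add_table_offset_to_cells` (its real signature is not shown in this diff) makes the round trip explicit:

```python
def add_table_offset_to_cells(cells, offset_bbox):
    """Hypothetical stand-in for CoordinateUtils.add_table_offset_to_cells:
    shift cell bboxes from cropped-image coordinates into page coordinates,
    where (0, 0) of the crop maps to (offset_bbox[0], offset_bbox[1])."""
    ox, oy = offset_bbox[0], offset_bbox[1]
    return [
        {**c, 'bbox': [c['bbox'][0] + ox, c['bbox'][1] + oy,
                       c['bbox'][2] + ox, c['bbox'][3] + oy]}
        for c in cells
    ]

table_bbox = [100, 200, 500, 400]   # table region on the page
crop_padding = 20
# same construction as cropped_offset_bbox in the diff
offset_bbox = [table_bbox[0] - crop_padding, table_bbox[1] - crop_padding,
               table_bbox[2] + crop_padding, table_bbox[3] + crop_padding]
cells = [{'row': 0, 'col': 0, 'bbox': [20, 20, 120, 60]}]
```

A cell at (20, 20) in the padded crop lands at (100, 200) on the page, i.e. exactly at the table's top-left corner, confirming that subtracting the padding from the offset undoes the padded crop.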

+ 0 - 2
ocr_tools/universal_doc_parser/core/layout_utils.py

@@ -489,14 +489,12 @@ class SpanMatcher:
                 continue
             
             bbox1 = span1.get('bbox', [0, 0, 0, 0])
-            bbox1 = CoordinateUtils.poly_to_bbox(bbox1)
             
             for j in range(i + 1, len(spans)):
                 if j in removed:
                     continue
                 
                 bbox2 = spans[j].get('bbox', [0, 0, 0, 0])
-                bbox2 = CoordinateUtils.poly_to_bbox(bbox2)
                 
                 iou = CoordinateUtils.calculate_iou(bbox1, bbox2)
                 

+ 21 - 12
ocr_tools/universal_doc_parser/core/pipeline_manager_v2.py

@@ -57,6 +57,7 @@ except ImportError:
     TableCellMatcher = None
     TextMatcher = None
 
+from ocr_utils.bbox_utils import BBoxExtractor
 
 class EnhancedDocPipeline:
     """Enhanced document processing pipeline"""
@@ -184,7 +185,8 @@ class EnhancedDocPipeline:
     def process_document(
         self, 
         document_path: str,
-        page_range: Optional[str] = None
+        page_range: Optional[str] = None,
+        output_dir: Optional[str] = None
     ) -> Dict[str, Any]:
         """
         Main document-processing flow
@@ -232,7 +234,8 @@ class EnhancedDocPipeline:
                     page_idx=page_idx,
                     pdf_type=pdf_type,
                     pdf_doc=pdf_doc,
-                    page_name=page_name
+                    page_name=page_name,
+                    output_dir=output_dir,
                 )
                 results['pages'].append(page_result)
             
@@ -251,13 +254,14 @@ class EnhancedDocPipeline:
             raise
     
     def _process_single_page(
-        self,
-        image_dict: Dict[str, Any],
-        page_idx: int,
-        pdf_type: str,
-        pdf_doc: Optional[Any] = None,
-        page_name: Optional[str] = None
-    ) -> Dict[str, Any]:
+            self,
+            image_dict: Dict[str, Any],
+            page_idx: int,
+            pdf_type: str,
+            pdf_doc: Optional[Any] = None,
+            page_name: Optional[str] = None,
+            output_dir: Optional[str] = None
+        ) -> Dict[str, Any]:
         """
         Process a single page
         
@@ -368,7 +372,9 @@ class EnhancedDocPipeline:
             page_idx=page_idx,
             scale=scale,
             matched_spans=matched_spans,
-            layout_results=layout_results
+            layout_results=layout_results,
+            output_dir=output_dir,
+            basename=page_name
         )
         
         # 7. sort in reading order
@@ -508,7 +514,9 @@ class EnhancedDocPipeline:
         page_idx: int,
         scale: float,
         matched_spans: Optional[Dict[int, List[Dict[str, Any]]]] = None,
-        layout_results: Optional[List[Dict[str, Any]]] = None
+        layout_results: Optional[List[Dict[str, Any]]] = None,
+        output_dir: Optional[str] = None,
+        basename: Optional[str] = None,
     ) -> tuple:
         """
         Process all classified elements
@@ -596,7 +604,8 @@ class EnhancedDocPipeline:
                     # wired-table path: UNet recognition
                     logger.info(f"🔷 Using wired UNet table recognition (configured)")
                     element = self.element_processors.process_table_element_wired(
-                        detection_image, item, scale, pre_matched_spans=spans
+                        detection_image, item, scale, pre_matched_spans=spans,
+                        output_dir=output_dir, basename=basename
                     )
                     
                     # if wired recognition fails (returns empty HTML), fall back to VLM

+ 2 - 1
ocr_tools/universal_doc_parser/core/pipeline_manager_v2_streaming.py

@@ -157,7 +157,8 @@ class StreamingDocPipeline(EnhancedDocPipeline):
                     page_idx=page_idx,
                     pdf_type=pdf_type,
                     pdf_doc=pdf_doc,
-                    page_name=page_name
+                    page_name=page_name,
+                    output_dir=self.output_dir
                 )
                 
                 # save this page's result immediately (via OutputFormatterV2's method, keeping output consistent)

+ 14 - 7
ocr_tools/universal_doc_parser/main_v2.py

@@ -402,25 +402,31 @@ if __name__ == "__main__":
             # "input": "/Users/zhch158/workspace/data/流水分析/康强_北京农村商业银行.pdf",
             # "output_dir": "./output/康强_北京农村商业银行_bank_statement_v2",
 
-            # "input": "/Users/zhch158/workspace/data/流水分析/2023年度报告母公司/mineru_vllm_results/2023年度报告母公司/2023年度报告母公司_page_003.png",
-            # "output_dir": "./output/2023年度报告母公司_bank_statement_v2",
+            "input": "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/tests/A用户_单元格扫描流水_page_002.png",
+            "output_dir": "./output/A用户_单元格扫描流水_bank_statement_wired_unet",
             
             # "input": "/Users/zhch158/workspace/data/流水分析/B用户_扫描流水.pdf",
             # "output_dir": "/Users/zhch158/workspace/data/流水分析/B用户_扫描流水/bank_statement_yusys_v2",
 
-            # "input": "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/tests/2023年度报告母公司_page_006_270.png",
+            # "input": "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/tests/2023年度报告母公司_page_005.png",
+            # "input": "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/tests/2023年度报告母公司_page_003_270.png",
+            # "input": "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/tests/2023年度报告母公司_page_003_270_skew(-0.4).png",
             # "output_dir": "./output/2023年度报告母公司/bank_statement_wired_unet",
+
             # "input": "/Users/zhch158/workspace/data/流水分析/2023年度报告母公司.pdf",
+            # "output_dir": "/Users/zhch158/workspace/data/流水分析/2023年度报告母公司/bank_statement_wired_unet",
             # "output_dir": "/Users/zhch158/workspace/data/流水分析/2023年度报告母公司/bank_statement_yusys_v2",
 
-            "input": "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/tests/600916_中国黄金_2022年报_page_096.png",
-            "output_dir": "./output/600916_中国黄金_2022年报/bank_statement_wired_unet",
+            # "input": "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/tests/600916_中国黄金_2022年报_page_096.png",
+            # "output_dir": "./output/600916_中国黄金_2022年报/bank_statement_wired_unet",
+            # "input": "/Users/zhch158/workspace/data/流水分析/600916_中国黄金_2022年报.pdf",
+            # "output_dir": "./output/600916_中国黄金_2022年报/bank_statement_wired_unet",
 
             # "input": "/Users/zhch158/workspace/data/流水分析/施博深.pdf",
             # "output_dir": "/Users/zhch158/workspace/data/流水分析/施博深/bank_statement_yusys_v2",
 
-            # "input": "/Users/zhch158/workspace/data/流水分析/施博深.wiredtable/施博深_page_001.png",
-            # "output_dir": "./output/施博深_page_001_bank_statement_wired_unet",
+            # "input": "/Users/zhch158/workspace/data/流水分析/施博深.wiredtable/施博深_page_020.png",
+            # "output_dir": "./output/施博深/bank_statement_wired_unet",
 
 
             # "input": "/Users/zhch158/workspace/data/流水分析/施博深.wiredtable",
             # "input": "/Users/zhch158/workspace/data/流水分析/施博深.wiredtable",
             # "output_dir": "/Users/zhch158/workspace/data/流水分析/施博深/bank_statement_wired_unet",
             # "output_dir": "/Users/zhch158/workspace/data/流水分析/施博深/bank_statement_wired_unet",
@@ -436,6 +442,7 @@ if __name__ == "__main__":
             # Page range (optional)
             # "pages": "6",  # process only page 6
             # "pages": "1-3,5,7-10",  # process the specified pages
+            # "pages": "83-109",  # 处理指定页面
 
 
             "streaming": True,
             "streaming": True,
 
 
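The commented `pages` options above accept specs like `"6"` or `"1-3,5,7-10"`. As a rough sketch of how such a spec can be expanded into page numbers (`parse_pages` is a hypothetical helper; main_v2's actual parser may differ):

```python
def parse_pages(spec: str) -> list:
    """Expand a page spec like '1-3,5,7-10' into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            pages.update(range(int(start), int(end) + 1))
        elif part:
            pages.add(int(part))
    return sorted(pages)

print(parse_pages("1-3,5,7-10"))  # → [1, 2, 3, 5, 7, 8, 9, 10]
```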

+ 91 - 3
ocr_tools/universal_doc_parser/models/adapters/docling_layout_adapter.py

@@ -15,6 +15,7 @@
 import cv2
 import numpy as np
 import threading
+import os
 from pathlib import Path
 from typing import Dict, List, Union, Any, Optional
 from PIL import Image
@@ -127,9 +128,96 @@ class DoclingLayoutDetector(BaseLayoutDetector):
                 self._model_path = str(model_path)
                 print(f"📂 Loading model from local path: {self._model_path}")
             else:
-                # Download from HuggingFace
-                print(f"📥 Downloading model from HuggingFace: {model_dir}")
-                self._model_path = snapshot_download(repo_id=model_dir)
+                # HuggingFace repo ID: check the local cache first
+                # Resolve the HuggingFace cache directory
+                hf_home = os.environ.get('HF_HOME', None)
+                if hf_home:
+                    cache_dir = Path(hf_home) / "hub"
+                else:
+                    cache_dir = Path.home() / ".cache" / "huggingface" / "hub"
+                
+                # Convert the repo ID to the cache directory naming scheme,
+                # e.g. ds4sd/docling-layout-old -> models--ds4sd--docling-layout-old
+                repo_id_escaped = model_dir.replace("/", "--")
+                model_cache_dir = cache_dir / f"models--{repo_id_escaped}"
+                
+                # Try the local cache first (avoids an unnecessary network request)
+                local_model_path = None
+                if model_cache_dir.exists() and model_cache_dir.is_dir():
+                    snapshots_dir = model_cache_dir / "snapshots"
+                    if snapshots_dir.exists():
+                        # Collect all snapshot directories, sorted by mtime (newest first)
+                        snapshots = sorted(
+                            [d for d in snapshots_dir.iterdir() if d.is_dir()],
+                            key=lambda x: x.stat().st_mtime,
+                            reverse=True
+                        )
+                        if snapshots:
+                            # Check whether the latest snapshot is complete
+                            latest_snapshot = snapshots[0]
+                            processor_config = latest_snapshot / "preprocessor_config.json"
+                            model_config = latest_snapshot / "config.json"
+                            safetensors_file = latest_snapshot / "model.safetensors"
+                            
+                            if processor_config.exists() and model_config.exists() and safetensors_file.exists():
+                                local_model_path = latest_snapshot
+                
+                if local_model_path:
+                    # Local cache exists and is complete: use it directly (no network request)
+                    self._model_path = str(local_model_path)
+                    print(f"📂 Using local cached model: {self._model_path}")
+                    print(f"   (Skipping network check - model already cached)")
+                else:
+                    # Local cache missing or incomplete: download or update from HuggingFace
+                    print(f"📥 Model not found in local cache, downloading from HuggingFace: {model_dir}")
+                    try:
+                        # snapshot_download checks the local cache itself: if it is present
+                        # and up to date, nothing is re-downloaded
+                        self._model_path = snapshot_download(repo_id=model_dir)
+                        print(f"✅ Model downloaded/updated: {self._model_path}")
+                    except Exception as e:
+                        # HuggingFace unreachable: look for a local cache again (the earlier check may have missed it)
+                        print(f"⚠️ Failed to download from HuggingFace: {e}")
+                        print(f"🔍 Trying to find local cached model again...")
+                        
+                        if model_cache_dir.exists() and model_cache_dir.is_dir():
+                            snapshots_dir = model_cache_dir / "snapshots"
+                            if snapshots_dir.exists():
+                                snapshots = sorted(
+                                    [d for d in snapshots_dir.iterdir() if d.is_dir()],
+                                    key=lambda x: x.stat().st_mtime,
+                                    reverse=True
+                                )
+                                if snapshots:
+                                    local_model_path = snapshots[0]
+                                    processor_config = local_model_path / "preprocessor_config.json"
+                                    model_config = local_model_path / "config.json"
+                                    safetensors_file = local_model_path / "model.safetensors"
+                                    
+                                    if processor_config.exists() and model_config.exists() and safetensors_file.exists():
+                                        self._model_path = str(local_model_path)
+                                        print(f"✅ Found local cached model: {self._model_path}")
+                                    else:
+                                        raise FileNotFoundError(
+                                            f"Local cached model found but missing required files in {local_model_path}. "
+                                            f"Required: preprocessor_config.json, config.json, model.safetensors"
+                                        )
+                                else:
+                                    raise FileNotFoundError(
+                                        f"No snapshots found in {snapshots_dir}. "
+                                        f"Please download the model first or check your network connection."
+                                    )
+                            else:
+                                raise FileNotFoundError(
+                                    f"Cache directory exists but no snapshots found: {model_cache_dir}. "
+                                    f"Please download the model first or check your network connection."
+                                )
+                        else:
+                            raise FileNotFoundError(
+                                f"Model not found in local cache: {model_cache_dir}. "
+                                f"Please download the model first or check your network connection. "
+                                f"Original error: {e}"
+                            )
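The cache lookup above mirrors HuggingFace's hub layout (`models--{org}--{repo}/snapshots/<revision>`). The path-resolution part alone can be sketched like this, assuming `HF_HOME` overrides the default `~/.cache/huggingface` as in the code above:

```python
import os
from pathlib import Path

def model_cache_dir(repo_id: str) -> Path:
    """Map a HuggingFace repo ID to its local hub cache directory."""
    hf_home = os.environ.get("HF_HOME")
    hub = Path(hf_home) / "hub" if hf_home else Path.home() / ".cache" / "huggingface" / "hub"
    # "org/name" becomes "models--org--name" in the hub cache
    return hub / f"models--{repo_id.replace('/', '--')}"

print(model_cache_dir("ds4sd/docling-layout-old").name)  # → models--ds4sd--docling-layout-old
```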
             
             
             # Check required files
             processor_config = Path(self._model_path) / "preprocessor_config.json"

+ 11 - 4
ocr_tools/universal_doc_parser/models/adapters/mineru_adapter.py

@@ -7,11 +7,17 @@ from PIL import Image
 from loguru import logger
 
 # Add the MinerU path
-mineru_path = Path(__file__).parents[4] / "mineru"
-if str(mineru_path) not in sys.path:
-    sys.path.insert(0, str(mineru_path))
+# mineru_path = Path(__file__).parents[4] / "mineru"
+# if str(mineru_path) not in sys.path:
+#     sys.path.insert(0, str(mineru_path))
+
+# Add the ocr_platform root directory to the Python path (needed to import ocr_utils)
+ocr_platform_root = Path(__file__).parents[4]  # adapters -> models -> universal_doc_parser -> ocr_tools -> ocr_platform 
+if str(ocr_platform_root) not in sys.path:
+    sys.path.insert(0, str(ocr_platform_root))
 
 
 from .base import BasePreprocessor, BaseLayoutDetector, BaseVLRecognizer, BaseOCRRecognizer
+from core.coordinate_utils import CoordinateUtils
 
 
 # Import MinerU components
 try:
@@ -490,7 +496,8 @@ class MinerUOCRRecognizer(BaseOCRRecognizer):
                 for item in ocr_results[0]:
                     if len(item) >= 2 and len(item[1]) >= 2:
                         formatted_results.append({
-                            'bbox': item[0],  # coordinates
+                            'bbox': CoordinateUtils.poly_to_bbox(item[0]),  # axis-aligned bbox
+                            'poly': item[0],  # polygon points
                         'text': item[1][0],  # recognized text
                         'confidence': item[1][1]  # confidence score
                     })
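The change above keeps both representations. Assuming `CoordinateUtils.poly_to_bbox` reduces a polygon to the min/max of its points (the adapter's actual helper may differ), a minimal equivalent is:

```python
def poly_to_bbox(poly):
    """Collapse a polygon [[x, y], ...] to an axis-aligned [x1, y1, x2, y2]."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return [min(xs), min(ys), max(xs), max(ys)]

# A slightly rotated quadrilateral as returned by the OCR detector:
print(poly_to_bbox([[10, 5], [98, 8], [97, 30], [9, 27]]))  # → [9, 5, 98, 30]
```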

File diff suppressed because it is too large
+ 220 - 1086
ocr_tools/universal_doc_parser/models/adapters/mineru_wired_table.py


+ 32 - 0
ocr_tools/universal_doc_parser/models/adapters/wired_table/__init__.py

@@ -0,0 +1,32 @@
+"""
+有线表格识别子模块
+
+提供表格识别的各个功能模块:
+- 倾斜检测和矫正
+- 网格结构恢复
+- 文本填充
+- HTML生成
+- 可视化
+- OCR格式转换
+- 调试工具
+"""
+
+from .debug_utils import WiredTableDebugOptions, WiredTableDebugUtils
+from .ocr_formatter import OCRFormatter
+from .skew_detection import SkewDetector
+from .grid_recovery import GridRecovery
+from .text_filling import TextFiller
+from .html_generator import WiredTableHTMLGenerator
+from .visualization import WiredTableVisualizer
+
+__all__ = [
+    'WiredTableDebugOptions',
+    'WiredTableDebugUtils',
+    'OCRFormatter',
+    'SkewDetector',
+    'GridRecovery',
+    'TextFiller',
+    'WiredTableHTMLGenerator',
+    'WiredTableVisualizer',
+]
+

+ 107 - 0
ocr_tools/universal_doc_parser/models/adapters/wired_table/debug_utils.py

@@ -0,0 +1,107 @@
+"""
+有线表格识别调试工具模块
+
+提供调试选项管理和路径生成功能。
+"""
+from typing import Dict, Any, Optional
+from dataclasses import dataclass
+
+
+@dataclass
+class WiredTableDebugOptions:
+    """调试选项数据类"""
+    enabled: bool = False
+    output_dir: Optional[str] = None
+    save_table_lines: bool = False
+    save_connected_components: bool = False
+    save_grid_structure: bool = False
+    save_text_overlay: bool = False
+    image_format: str = "png"
+    prefix: str = ""
+
+
+class WiredTableDebugUtils:
+    """调试工具类"""
+    
+    @staticmethod
+    def merge_debug_options(
+        config: Dict[str, Any],
+        override: Optional[Dict[str, Any]] = None
+    ) -> WiredTableDebugOptions:
+        """
+        合并调试选项
+        
+        Args:
+            config: 配置字典
+            override: 覆盖选项字典
+            
+        Returns:
+            合并后的调试选项
+        """
+        debug_config = config.get("debug_options", {})
+        if not isinstance(debug_config, dict):
+            # Backward compatibility: if the value is not a dict, fall back to defaults
+            debug_config = {}
+
+        opts = WiredTableDebugOptions(
+            enabled=bool(debug_config.get("enabled", False)),
+            output_dir=debug_config.get("output_dir"),
+            save_table_lines=bool(debug_config.get("save_table_lines", False)),
+            save_connected_components=bool(debug_config.get("save_connected_components", False)),
+            save_grid_structure=bool(debug_config.get("save_grid_structure", False)),
+            save_text_overlay=bool(debug_config.get("save_text_overlay", False)),
+            image_format=str(debug_config.get("image_format", "png")),
+            prefix=str(debug_config.get("prefix", "")),
+        )
+        
+        if override and isinstance(override, dict):
+            # The override layer may temporarily enable debugging or set a directory
+            for k, v in override.items():
+                if hasattr(opts, k):
+                    setattr(opts, k, v)
+        
+        return opts
+    
+    @staticmethod
+    def debug_is_on(
+        flag: str,
+        opts: Optional[WiredTableDebugOptions] = None
+    ) -> bool:
+        """
+        检查调试标志是否启用
+        
+        Args:
+            flag: 调试标志名称
+            opts: 调试选项(可选)
+            
+        Returns:
+            是否启用
+        """
+        if not opts or not opts.enabled:
+            return False
+        if not opts.output_dir:
+            return False
+        return bool(getattr(opts, flag, False))
+    
+    @staticmethod
+    def debug_path(
+        name: str,
+        opts: Optional[WiredTableDebugOptions] = None
+    ) -> Optional[str]:
+        """
+        生成调试文件路径
+        
+        Args:
+            name: 文件名(不含扩展名)
+            opts: 调试选项(可选)
+            
+        Returns:
+            完整文件路径,如果未启用则返回 None
+        """
+        if not opts or not opts.output_dir:
+            return None
+        
+        prefix = (opts.prefix + "_") if opts.prefix else ""
+        ext = opts.image_format or "png"
+        return f"{opts.output_dir}/{prefix}{name}.{ext}"
+
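Taken together, `merge_debug_options` and `debug_path` produce paths like `<output_dir>/<prefix>_<name>.<ext>`. A self-contained sketch with a minimal stand-in for `WiredTableDebugOptions`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DebugOptions:  # minimal stand-in for WiredTableDebugOptions
    enabled: bool = False
    output_dir: Optional[str] = None
    prefix: str = ""
    image_format: str = "png"

def debug_path(name, opts):
    # same rule as above: no options or no output_dir -> no path
    if not opts or not opts.output_dir:
        return None
    prefix = (opts.prefix + "_") if opts.prefix else ""
    return f"{opts.output_dir}/{prefix}{name}.{opts.image_format or 'png'}"

opts = DebugOptions(enabled=True, output_dir="/tmp/debug", prefix="page_001")
print(debug_path("grid_structure", opts))  # → /tmp/debug/page_001_grid_structure.png
```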

+ 537 - 0
ocr_tools/universal_doc_parser/models/adapters/wired_table/grid_recovery.py

@@ -0,0 +1,537 @@
+"""
+网格结构恢复模块
+
+提供从表格线提取单元格和恢复网格结构的功能。
+"""
+from typing import List, Dict
+import cv2
+import numpy as np
+from loguru import logger
+
+
+class GridRecovery:
+    """网格结构恢复工具类"""
+    
+    @staticmethod
+    def compute_cells_from_lines(
+        hpred_up: np.ndarray,
+        vpred_up: np.ndarray,
+        upscale: float = 1.0,
+        debug_dir: str = None,
+        debug_prefix: str = "",
+    ) -> List[List[float]]:
+        """
+        基于矢量重构的连通域分析 (Advanced Vector-based Recovery)
+        
+        策略 (自定义增强版):
+        1. 预处理:自适应形态学闭运算修复像素级断连
+        2. 提取矢量线段 (get_table_line)
+        3. 线段归并/连接 (adjust_lines)
+        4. 几何延长线段 (Custom final_adjust_lines with larger threshold)
+        5. 重绘Mask并进行连通域分析
+        
+        Args:
+            hpred_up: 横线预测mask(上采样后)
+            vpred_up: 竖线预测mask(上采样后)
+            upscale: 上采样比例
+            debug_dir: 调试输出目录 (Optional)
+            debug_prefix: 调试文件名前缀 (Optional)
+            
+        Returns:
+            单元格bbox列表 [[x1, y1, x2, y2], ...]
+        """
+        import math
+        import os
+        # cv2, numpy and logger are already imported at module level
+        
+        # Try to import MinerU's table-line utilities (basic extraction helpers only)
+        try:
+            from mineru.model.table.rec.unet_table.utils_table_line_rec import (
+                get_table_line,
+                draw_lines,
+                adjust_lines
+            )
+        except ImportError:
+            logger.error("Could not import mineru utils. Please ensure MinerU is on the Python path.")
+            raise
+            
+        # --- Local Helper Functions for Robust Line Adjustment ---
+        # Ported and modified from MinerU to handle larger gaps
+        
+        def fit_line(p):
+            x1, y1 = p[0]
+            x2, y2 = p[1]
+            A = y2 - y1
+            B = x1 - x2
+            C = x2 * y1 - x1 * y2
+            return A, B, C
+
+        def point_line_cor(p, A, B, C):
+            x, y = p
+            r = A * x + B * y + C
+            return r
+
+        def dist_sqrt(p1, p2):
+            return np.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)
+
+        def line_to_line(points1, points2, alpha=10, angle=30, max_len=None):
+            x1, y1, x2, y2 = points1
+            ox1, oy1, ox2, oy2 = points2
+            
+            # Calculate current line length
+            current_len = dist_sqrt((x1, y1), (x2, y2))
+            
+            # If we already exceeded max_len, don't extend further
+            if max_len is not None and current_len >= max_len:
+                return points1
+
+            # Dynamic alpha based on the CURRENT length: cap each extension step
+            # at 1.0x the segment's length to avoid huge jumps, and rely on
+            # max_len to bound the total size.
+            step_limit = current_len
+            effective_alpha = min(alpha, step_limit)
+            
+            # Fit lines
+            xy = np.array([(x1, y1), (x2, y2)], dtype="float32")
+            A1, B1, C1 = fit_line(xy)
+            oxy = np.array([(ox1, oy1), (ox2, oy2)], dtype="float32")
+            A2, B2, C2 = fit_line(oxy)
+            
+            flag1 = point_line_cor(np.array([x1, y1], dtype="float32"), A2, B2, C2)
+            flag2 = point_line_cor(np.array([x2, y2], dtype="float32"), A2, B2, C2)
+
+            # Both endpoints on the same side (segments do not cross): try to extend
+            if (flag1 > 0 and flag2 > 0) or (flag1 < 0 and flag2 < 0):
+                if (A1 * B2 - A2 * B1) != 0:
+                    # Compute the intersection point
+                    x = (B1 * C2 - B2 * C1) / (A1 * B2 - A2 * B1)
+                    y = (A2 * C1 - A1 * C2) / (A1 * B2 - A2 * B1)
+                    p = (x, y)
+                    r0 = dist_sqrt(p, (x1, y1))
+                    r1 = dist_sqrt(p, (x2, y2))
+                    
+                    if min(r0, r1) < effective_alpha:
+                        # Check total length constraint
+                        if max_len is not None:
+                            # Estimate new length
+                            if r0 < r1: # Extending (x1,y1) -> p
+                                new_len = dist_sqrt(p, (x2, y2))
+                            else: # Extending (x2,y2) -> p
+                                new_len = dist_sqrt((x1, y1), p)
+                            
+                            if new_len > max_len:
+                                return points1
+
+                        if r0 < r1:
+                            k = abs((y2 - p[1]) / (x2 - p[0] + 1e-10))
+                            a = math.atan(k) * 180 / math.pi
+                            if a < angle or abs(90 - a) < angle:
+                                points1 = np.array([p[0], p[1], x2, y2], dtype="float32")
+                        else:
+                            k = abs((y1 - p[1]) / (x1 - p[0] + 1e-10))
+                            a = math.atan(k) * 180 / math.pi
+                            if a < angle or abs(90 - a) < angle:
+                                points1 = np.array([x1, y1, p[0], p[1]], dtype="float32")
+            return points1
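`line_to_line` relies on the implicit form A·x + B·y + C = 0 produced by `fit_line` and the standard two-line intersection formula. That geometric core can be checked in isolation:

```python
def fit_line(p):
    # implicit form A*x + B*y + C = 0 through two points
    (x1, y1), (x2, y2) = p
    return y2 - y1, x1 - x2, x2 * y1 - x1 * y2

def intersect(seg1, seg2):
    A1, B1, C1 = fit_line(seg1)
    A2, B2, C2 = fit_line(seg2)
    d = A1 * B2 - A2 * B1
    if d == 0:
        return None  # parallel lines never meet
    return ((B1 * C2 - B2 * C1) / d, (A2 * C1 - A1 * C2) / d)

# A horizontal row line at y=10 and a vertical column line at x=40:
print(intersect([(0, 10), (100, 10)], [(40, 0), (40, 50)]))  # → (40.0, 10.0)
```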
+
+        def custom_final_adjust_lines(rowboxes, colboxes, alpha=50):
+            nrow = len(rowboxes)
+            ncol = len(colboxes)
+            
+            # Pre-calculate the maximum allowed length per line (original length * multiplier).
+            # A multiplier of 3.0 lets a line grow to at most triple its size, which
+            # stops short noise segments from becoming page-height lines.
+            extension_multiplier = 3.0 
+            
+            row_max_lens = [dist_sqrt(b[:2], b[2:]) * extension_multiplier for b in rowboxes]
+            col_max_lens = [dist_sqrt(b[:2], b[2:]) * extension_multiplier for b in colboxes]
+            
+            for i in range(nrow):
+                for j in range(ncol):
+                    rowboxes[i] = line_to_line(rowboxes[i], colboxes[j], alpha=alpha, angle=30, max_len=row_max_lens[i])
+                    colboxes[j] = line_to_line(colboxes[j], rowboxes[i], alpha=alpha, angle=30, max_len=col_max_lens[j])
+            return rowboxes, colboxes
+            
+        def save_debug_image(step_name, img, is_lines=False, lines=None):
+            if debug_dir:
+                try:
+                    os.makedirs(debug_dir, exist_ok=True)
+                    name = f"{debug_prefix}_{step_name}.png" if debug_prefix else f"{step_name}.png"
+                    path = os.path.join(debug_dir, name)
+                    
+                    if is_lines and lines:
+                        # Draw lines on black background
+                        tmp = np.zeros(img.shape[:2], dtype=np.uint8)
+                        tmp = draw_lines(tmp, lines, color=255, lineW=2)
+                        cv2.imwrite(path, tmp)
+                    else:
+                        cv2.imwrite(path, img)
+                    logger.debug(f"Saved debug image: {path}")
+                except Exception as e:
+                    logger.warning(f"Failed to save debug image {step_name}: {e}")
+
+        # ---------------------------------------------------------
+
+        h, w = hpred_up.shape[:2]
+        
+        # 1. Preprocess: binarize
+        _, h_bin = cv2.threshold(hpred_up, 127, 255, cv2.THRESH_BINARY)
+        _, v_bin = cv2.threshold(vpred_up, 127, 255, cv2.THRESH_BINARY)
+        
+        # 1.1 Adaptive morphological repair
+        hors_k = int(math.sqrt(w) * 1.2)
+        vert_k = int(math.sqrt(h) * 1.2)
+        hors_k = max(10, min(hors_k, 50))
+        vert_k = max(10, min(vert_k, 50))
+        
+        kernel_h = cv2.getStructuringElement(cv2.MORPH_RECT, (hors_k, 1))
+        kernel_v = cv2.getStructuringElement(cv2.MORPH_RECT, (1, vert_k))
+        
+        h_bin = cv2.morphologyEx(h_bin, cv2.MORPH_CLOSE, kernel_h, iterations=1)
+        v_bin = cv2.morphologyEx(v_bin, cv2.MORPH_CLOSE, kernel_v, iterations=1)
+        
+        # 2. Extract vector line segments
+        rowboxes = get_table_line(h_bin, axis=0, lineW=int(10))
+        colboxes = get_table_line(v_bin, axis=1, lineW=int(10))
+        
+        logger.debug(f"Initial lines -> Rows: {len(rowboxes)}, Cols: {len(colboxes)}")
+        
+        # Step 2 Debug
+        save_debug_image("step02_raw_vectors", h_bin, is_lines=True, lines=rowboxes + colboxes)
+        
+        # 3. Merge segments (adjust_lines)
+        rboxes_row_ = adjust_lines(rowboxes, alph=100, angle=50)
+        rboxes_col_ = adjust_lines(colboxes, alph=15, angle=50)
+        
+        if rboxes_row_:
+            rowboxes += rboxes_row_
+        if rboxes_col_:
+            colboxes += rboxes_col_
+            
+        # Step 3 Debug
+        save_debug_image("step03_merged_vectors", h_bin, is_lines=True, lines=rowboxes + colboxes)
+        
+        # 3.5 Filter short lines (noise filtering)
+        # Before extending segments, drop ones that are too short (usually noise,
+        # text underlines, etc.). Threshold: min(w, h) * 0.02, at least 20 px.
+        filter_threshold = max(20, min(w, h) * 0.02)
+        
+        def filter_short_lines(lines, thresh):
+            valid_lines = []
+            for line in lines:
+                x1, y1, x2, y2 = line
+                length = math.sqrt((x2-x1)**2 + (y2-y1)**2)
+                if length > thresh:
+                    valid_lines.append(line)
+            return valid_lines
+            
+        len_row_before = len(rowboxes)
+        len_col_before = len(colboxes)
+        
+        rowboxes = filter_short_lines(rowboxes, filter_threshold)
+        colboxes = filter_short_lines(colboxes, filter_threshold)
+        
+        if len(rowboxes) < len_row_before or len(colboxes) < len_col_before:
+            logger.info(f"Filtered short lines (thresh={filter_threshold:.1f}): Rows {len_row_before}->{len(rowboxes)}, Cols {len_col_before}->{len(colboxes)}")
+            # Optional: Save filtered state
+            save_debug_image("step03b_filtered_vectors", h_bin, is_lines=True, lines=rowboxes + colboxes)
+
+        # 4. Geometrically extend segments (custom function with a larger threshold).
+        # Use a dynamic alpha (5% of the smaller dimension, at least 50 px):
+        # at high resolution, breaks of ~100 px are exactly the ones to repair.
+        dynamic_alpha = max(50, int(min(w, h) * 0.05))  # 5% of min dimension
+        logger.info(f"Using dynamic alpha for line extension: {dynamic_alpha}")
+        
+        rowboxes, colboxes = custom_final_adjust_lines(rowboxes, colboxes, alpha=dynamic_alpha)
+        
+        # Step 4 Debug
+        save_debug_image("step04_extended_vectors", h_bin, is_lines=True, lines=rowboxes + colboxes)
+        
+        # 5. Re-rasterize a clean mask
+        line_mask = np.zeros((h, w), dtype=np.uint8)
+        # Line width 4 guarantees physical contact between segments
+        line_mask = draw_lines(line_mask, rowboxes + colboxes, color=255, lineW=4)
+        
+        # Step 5a Debug (Before Dilation)
+        save_debug_image("step05a_rerasterized", line_mask)
+        
+        # Enhancement: slight global dilation
+        kernel_dilate = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
+        line_mask = cv2.dilate(line_mask, kernel_dilate, iterations=1)
+        
+        # Step 5b Debug (After Dilation)
+        save_debug_image("step05b_dilated", line_mask)
+        
+        # 6. Invert the image
+        inv_grid = cv2.bitwise_not(line_mask)
+        
+        # Step 6 Debug (Input to ConnectedComponents)
+        save_debug_image("step06_inverted_input", inv_grid)
+        
+        # 7. Connected components
+        num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(inv_grid, connectivity=8)
+        
+        bboxes = []
+        
+        # 8. Filter components
+        for i in range(1, num_labels):
+            x = stats[i, cv2.CC_STAT_LEFT]
+            y = stats[i, cv2.CC_STAT_TOP]
+            w_cell = stats[i, cv2.CC_STAT_WIDTH]
+            h_cell = stats[i, cv2.CC_STAT_HEIGHT]
+            area = stats[i, cv2.CC_STAT_AREA]
+            
+            if w_cell > w * 0.98 and h_cell > h * 0.98:
+                continue
+            if area < 50:
+                continue
+                
+            orig_h = h_cell / upscale
+            orig_w = w_cell / upscale
+            
+            if orig_h < 4.0 or orig_w < 4.0:
+                continue
+            
+            bboxes.append([
+                x / upscale,
+                y / upscale,
+                (x + w_cell) / upscale,
+                (y + h_cell) / upscale
+            ])
+        
+        bboxes.sort(key=lambda b: (int(b[1] / 10), b[0]))
+        
+        logger.info(f"矢量重构分析提取到 {len(bboxes)} 个单元格 (Dynamic Alpha: {dynamic_alpha})")
+        
+        return bboxes
+
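The sort key in the final step, `(int(y1 / 10), x1)`, buckets boxes into 10 px horizontal bands so that cells on the same visual row come out left-to-right even when their top edges jitter by a few pixels. A small illustration:

```python
cells = [
    [300.0, 12.0, 400.0, 40.0],  # row 1, rightmost (y1 = 12)
    [10.0, 15.0, 120.0, 42.0],   # row 1, leftmost (y1 = 15, same 10 px band)
    [10.0, 55.0, 120.0, 80.0],   # row 2
]
# Band index int(y1 / 10) groups 12 and 15 together (band 1), 55 into band 5
cells.sort(key=lambda b: (int(b[1] / 10), b[0]))
print([(c[0], c[1]) for c in cells])  # → [(10.0, 15.0), (300.0, 12.0), (10.0, 55.0)]
```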
+    @staticmethod
+    def find_grid_lines(coords: List[float], tolerance: float = 5.0, min_support: int = 2) -> List[float]:
+        """
+        聚类坐标点并筛选出高支持度的网格线
+        
+        Args:
+            coords: 坐标列表
+            tolerance: 容差(像素)
+            min_support: 最小支持度(至少有多少个坐标点对齐)
+            
+        Returns:
+            网格线坐标列表
+        """
+        if not coords:
+            return []
+        
+        coords.sort()
+        
+        # 1. Simple 1-D clustering
+        clusters = []
+        if coords:
+            curr_cluster = [coords[0]]
+            for x in coords[1:]:
+                if x - curr_cluster[-1] < tolerance:
+                    curr_cluster.append(x)
+                else:
+                    clusters.append(curr_cluster)
+                    curr_cluster = [x]
+            clusters.append(curr_cluster)
+        
+        # 2. 计算聚类中心和支持度
+        grid_lines = []
+        for cluster in clusters:
+            if len(cluster) >= min_support:
+                center = sum(cluster) / len(cluster)
+                grid_lines.append(center)
+        
+        return grid_lines
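The 1-D clustering above can be exercised standalone: coordinates within the tolerance of the running cluster merge, and only clusters with at least `min_support` members become grid lines (a minimal re-statement of `find_grid_lines`):

```python
def find_grid_lines(coords, tolerance=5.0, min_support=2):
    if not coords:
        return []
    coords = sorted(coords)
    clusters, curr = [], [coords[0]]
    for x in coords[1:]:
        if x - curr[-1] < tolerance:
            curr.append(x)       # within tolerance: extend the current cluster
        else:
            clusters.append(curr)
            curr = [x]
    clusters.append(curr)
    # keep cluster means with enough supporting coordinates
    return [sum(c) / len(c) for c in clusters if len(c) >= min_support]

# Three edges near x=100, a lone stray at 150, two edges near x=200:
print(find_grid_lines([99.0, 100.0, 101.0, 150.0, 199.0, 201.0]))  # → [100.0, 200.0]
```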
+    
+    @staticmethod
+    def recover_grid_structure(bboxes: List[List[float]]) -> List[Dict]:
+        """
+        从散乱的单元格 bbox 恢复表格的行列结构 (row, col, rowspan, colspan)
+        重构版:基于投影网格线 (Projected Grid Lines) 的算法
+        适用于行高差异巨大、存在密集小行的复杂表格
+        
+        Args:
+            bboxes: 单元格bbox列表
+            
+        Returns:
+            结构化单元格列表,包含 row, col, rowspan, colspan
+        """
+        if not bboxes:
+            return []
+        
+        # 1. Detect row dividers (Y axis)
+        y_coords = []
+        for b in bboxes:
+            y_coords.append(b[1])
+            y_coords.append(b[3])
+        
+        row_dividers = GridRecovery.find_grid_lines(y_coords, tolerance=5, min_support=2)
+        
+        # 2. Detect column dividers (X axis)
+        x_coords = []
+        for b in bboxes:
+            x_coords.append(b[0])
+            x_coords.append(b[2])
+        col_dividers = GridRecovery.find_grid_lines(x_coords, tolerance=5, min_support=2)
+        
+        # 3. Build the grid structure
+        structured_cells = []
+        
+        # Define row intervals
+        row_intervals = []
+        for i in range(len(row_dividers) - 1):
+            row_intervals.append({
+                "top": row_dividers[i],
+                "bottom": row_dividers[i+1],
+                "height": row_dividers[i+1] - row_dividers[i],
+                "index": i
+            })
+        
+        # Define column intervals
+        col_intervals = []
+        for i in range(len(col_dividers) - 1):
+            col_intervals.append({
+                "left": col_dividers[i],
+                "right": col_dividers[i+1],
+                "width": col_dividers[i+1] - col_dividers[i],
+                "index": i
+            })
+        
+        for bbox in bboxes:
+            b_top, b_bottom = bbox[1], bbox[3]
+            b_left, b_right = bbox[0], bbox[2]
+            b_h = b_bottom - b_top
+            b_w = b_right - b_left
+            
+            # Match rows
+            matched_rows = []
+            for r in row_intervals:
+                inter_top = max(b_top, r["top"])
+                inter_bottom = min(b_bottom, r["bottom"])
+                inter_h = max(0, inter_bottom - inter_top)
+                
+                if r["height"] > 0 and (inter_h / r["height"] > 0.5 or inter_h / b_h > 0.5):
+                    matched_rows.append(r["index"])
+            
+            if not matched_rows:
+                cy = (b_top + b_bottom) / 2
+                closest_r = min(row_intervals, key=lambda r: abs((r["top"]+r["bottom"])/2 - cy))
+                matched_rows = [closest_r["index"]]
+            
+            row_start = min(matched_rows)
+            row_end = max(matched_rows)
+            rowspan = row_end - row_start + 1
+            
+            # 匹配列
+            matched_cols = []
+            for c in col_intervals:
+                inter_left = max(b_left, c["left"])
+                inter_right = min(b_right, c["right"])
+                inter_w = max(0, inter_right - inter_left)
+                
+                if c["width"] > 0 and (inter_w / c["width"] > 0.5 or inter_w / b_w > 0.5):
+                    matched_cols.append(c["index"])
+            
+            if not matched_cols:
+                cx = (b_left + b_right) / 2
+                closest_c = min(col_intervals, key=lambda c: abs((c["left"]+c["right"])/2 - cx))
+                matched_cols = [closest_c["index"]]
+            
+            col_start = min(matched_cols)
+            col_end = max(matched_cols)
+            colspan = col_end - col_start + 1
+            
+            structured_cells.append({
+                "bbox": bbox,
+                "row": row_start,
+                "col": col_start,
+                "rowspan": rowspan,
+                "colspan": colspan
+            })
+        
+        # Sort by row, then column
+        structured_cells.sort(key=lambda c: (c["row"], c["col"]))
+        
+        # Compress the grid (drop empty rows/columns)
+        structured_cells = GridRecovery.compress_grid(structured_cells)
+        
+        return structured_cells
+    
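The row/column assignment above keys on a simple rule: a bbox is assigned to an interval when their overlap covers more than half of either the interval's extent or the bbox's extent. A minimal standalone sketch of that rule (the `match_intervals` helper is hypothetical, not part of this module):

```python
def match_intervals(lo, hi, intervals):
    """Return indices of intervals whose overlap with [lo, hi] exceeds
    half of either the interval's extent or the span's extent."""
    span = hi - lo
    matched = []
    for idx, (a, b) in enumerate(intervals):
        inter = max(0, min(hi, b) - max(lo, a))
        extent = b - a
        if extent > 0 and (inter / extent > 0.5 or (span > 0 and inter / span > 0.5)):
            matched.append(idx)
    return matched

# A bbox spanning y=10..55 against row intervals [0,20], [20,40], [40,60]:
rows = [(0, 20), (20, 40), (40, 60)]
print(match_intervals(10, 55, rows))  # [1, 2] -> the cell spans rows 1..2
```

With `matched == [1, 2]` the cell gets `row=1, rowspan=2`, mirroring the `min`/`max` span computation above.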
+    @staticmethod
+    def compress_grid(cells: List[Dict]) -> List[Dict]:
+        """
+        Compress grid indices by removing empty rows and columns
+        
+        Args:
+            cells: list of cells
+            
+        Returns:
+            Compressed list of cells
+        """
+        if not cells:
+            return []
+        
+        # 1. Determine the current grid extent (max row/column)
+        max_row = 0
+        max_col = 0
+        for cell in cells:
+            max_row = max(max_row, cell["row"] + cell.get("rowspan", 1))
+            max_col = max(max_col, cell["col"] + cell.get("colspan", 1))
+        
+        # 2. Mark which row/column indices host a cell start
+        row_occupied = [False] * max_row
+        col_occupied = [False] * max_col
+        
+        for cell in cells:
+            if cell["row"] < max_row:
+                row_occupied[cell["row"]] = True
+            if cell["col"] < max_col:
+                col_occupied[cell["col"]] = True
+        
+        # 3. Build the index remapping tables
+        row_map = [0] * (max_row + 1)
+        current_row = 0
+        for r in range(max_row):
+            if row_occupied[r]:
+                current_row += 1
+            row_map[r + 1] = current_row
+        
+        col_map = [0] * (max_col + 1)
+        current_col = 0
+        for c in range(max_col):
+            if col_occupied[c]:
+                current_col += 1
+            col_map[c + 1] = current_col
+        
+        # 4. Remap cell indices
+        new_cells = []
+        for cell in cells:
+            new_cell = cell.copy()
+            
+            old_r1 = cell["row"]
+            old_r2 = old_r1 + cell.get("rowspan", 1)
+            new_r1 = row_map[old_r1]
+            new_r2 = row_map[old_r2]
+            
+            old_c1 = cell["col"]
+            old_c2 = old_c1 + cell.get("colspan", 1)
+            new_c1 = col_map[old_c1]
+            new_c2 = col_map[old_c2]
+            
+            new_span_r = max(1, new_r2 - new_r1)
+            new_span_c = max(1, new_c2 - new_c1)
+            
+            new_cell["row"] = new_r1
+            new_cell["col"] = new_c1
+            new_cell["rowspan"] = new_span_r
+            new_cell["colspan"] = new_span_c
+            
+            new_cells.append(new_cell)
+        
+        return new_cells
+
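The mapping tables above are a coordinate compression: `row_map[i]` counts how many occupied rows lie strictly before index `i`, so cells keep their relative order while empty rows collapse away. The same prefix-count idea in isolation (hypothetical helper, same logic as the row mapping in `compress_grid`):

```python
def build_index_map(occupied):
    """index_map[i] = number of occupied slots strictly before index i."""
    index_map = [0] * (len(occupied) + 1)
    count = 0
    for i, occ in enumerate(occupied):
        if occ:
            count += 1
        index_map[i + 1] = count
    return index_map

# Rows 0 and 3 host cell starts; empty rows 1-2 collapse away.
row_map = build_index_map([True, False, False, True])
print(row_map)  # [0, 1, 1, 1, 2]
```

A cell starting at old row 3 maps to `row_map[3] == 1`, i.e. it becomes the second row of the compressed grid.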

+ 154 - 0
ocr_tools/universal_doc_parser/models/adapters/wired_table/html_generator.py

@@ -0,0 +1,154 @@
+"""
+HTML generation module
+
+Provides table HTML generation and enhancement utilities.
+"""
+import html
+from typing import List, Dict, Any
+
+
+class WiredTableHTMLGenerator:
+    """HTML generation utilities"""
+    
+    @staticmethod
+    def build_html_from_merged_cells(merged_cells: List[Dict]) -> str:
+        """
+        Build HTML via matrix filling to prevent cell misalignment
+        
+        Args:
+            merged_cells: list of merged cells with row, col, rowspan, colspan, text, bbox fields
+            
+        Returns:
+            HTML string
+        """
+        if not merged_cells:
+            return "<table><tbody></tbody></table>"
+        
+        # 1. Compute the grid dimensions
+        max_row = 0
+        max_col = 0
+        for cell in merged_cells:
+            max_row = max(max_row, cell["row"] + cell.get("rowspan", 1))
+            max_col = max(max_col, cell["col"] + cell.get("colspan", 1))
+        
+        # 2. Build the occupancy matrix (True = position already taken)
+        occupied = [[False for _ in range(max_col)] for _ in range(max_row)]
+        
+        # 3. Index cells by (row, col) for fast lookup
+        cell_map = {}
+        for cell in merged_cells:
+            key = (cell["row"], cell["col"])
+            cell_map[key] = cell
+        
+        html_parts = ["<table><tbody>"]
+        
+        # 4. Scan row by row, column by column
+        for r in range(max_row):
+            html_parts.append("<tr>")
+            for c in range(max_col):
+                # Skip positions consumed by an earlier rowspan/colspan
+                if occupied[r][c]:
+                    continue
+                
+                # Check whether a cell starts here
+                cell = cell_map.get((r, c))
+                
+                if cell:
+                    # Cell found: emit the td and mark its occupied area
+                    bbox = cell["bbox"]
+                    colspan = cell.get("colspan", 1)
+                    rowspan = cell.get("rowspan", 1)
+                    text = html.escape(cell.get("text", ""))
+                    bbox_str = f"[{int(bbox[0])},{int(bbox[1])},{int(bbox[2])},{int(bbox[3])}]"
+                    
+                    attrs = [f'data-bbox="{bbox_str}"']
+                    if colspan > 1:
+                        attrs.append(f'colspan="{colspan}"')
+                    if rowspan > 1:
+                        attrs.append(f'rowspan="{rowspan}"')
+                    
+                    html_parts.append(f'<td {" ".join(attrs)}>{text}</td>')
+                    
+                    # Mark occupancy
+                    for i in range(rowspan):
+                        for j in range(colspan):
+                            if r + i < max_row and c + j < max_col:
+                                occupied[r + i][c + j] = True
+                else:
+                    # No cell (a hole): emit an empty td so later cells do not shift left
+                    html_parts.append("<td></td>")
+                    occupied[r][c] = True
+                    
+            html_parts.append("</tr>")
+        
+        html_parts.append("</tbody></table>")
+        return "".join(html_parts)
+    
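The matrix-fill approach can be exercised end-to-end: a header spanning two columns plus two body cells should yield one `<td colspan="2">` and a correctly aligned second row. A minimal sketch (stripped of the `data-bbox` attribute and `<table>` wrapper for brevity; not the module's exact output):

```python
def fill_matrix_html(cells, n_rows, n_cols):
    """Emit <tr>/<td> rows, honoring spans via an occupancy matrix."""
    occupied = [[False] * n_cols for _ in range(n_rows)]
    lookup = {(c["row"], c["col"]): c for c in cells}
    out = []
    for r in range(n_rows):
        out.append("<tr>")
        for c in range(n_cols):
            if occupied[r][c]:
                continue  # consumed by an earlier span
            cell = lookup.get((r, c))
            if cell:
                rs, cs = cell.get("rowspan", 1), cell.get("colspan", 1)
                span = f' colspan="{cs}"' if cs > 1 else ""
                out.append(f"<td{span}>{cell['text']}</td>")
                for i in range(rs):
                    for j in range(cs):
                        occupied[r + i][c + j] = True
            else:
                out.append("<td></td>")  # placeholder keeps later cells aligned
                occupied[r][c] = True
        out.append("</tr>")
    return "".join(out)

cells = [
    {"row": 0, "col": 0, "colspan": 2, "text": "Header"},
    {"row": 1, "col": 0, "text": "A"},
    {"row": 1, "col": 1, "text": "B"},
]
print(fill_matrix_html(cells, 2, 2))
# <tr><td colspan="2">Header</td></tr><tr><td>A</td><td>B</td></tr>
```

Without the occupancy matrix, the second header column would receive a spurious empty `<td>` and push the body row out of alignment.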
+    @staticmethod
+    def enhance_html_with_cell_data(html_code: str, cells: List[Dict[str, Any]]) -> str:
+        """
+        Enhance HTML via BeautifulSoup, adding data-bbox and data-score attributes to each td
+        
+        Preserves MinerU's original rowspan/colspan attributes and only adds positioning info.
+        Cells are matched to tds by comparing their row/col to the td's position in the HTML.
+        
+        Args:
+            html_code: original HTML produced by MinerU
+            cells: list of cells with bbox, row, col, etc.
+        
+        Returns:
+            Enhanced HTML string with data-bbox and data-score attributes
+        """
+        if not html_code or not cells:
+            return html_code
+        
+        try:
+            from bs4 import BeautifulSoup
+        except ImportError:
+            return html_code
+        
+        soup = BeautifulSoup(html_code, 'html.parser')
+        
+        # Build a fast lookup: (row, col) -> cell
+        cell_dict = {}
+        for cell in cells:
+            row = cell.get("row", 0)
+            col = cell.get("col", 0)
+            key = (row, col)
+            cell_dict[key] = cell
+        
+        # Walk every tr/td in the HTML, matching by row/column order.
+        # Track positions consumed by rowspans so column indices stay aligned.
+        occupied = set()
+        rows = soup.find_all('tr')
+        for row_idx, tr in enumerate(rows):
+            tds = tr.find_all('td')  # type: ignore
+            col_idx = 0
+            for td in tds:
+                # Skip grid positions consumed by a rowspan from an earlier row
+                while (row_idx, col_idx) in occupied:
+                    col_idx += 1
+                # Read the colspan and rowspan attributes
+                colspan_str = td.get('colspan')  # type: ignore
+                rowspan_str = td.get('rowspan')  # type: ignore
+                try:
+                    colspan = int(str(colspan_str)) if colspan_str else 1
+                    rowspan = int(str(rowspan_str)) if rowspan_str else 1
+                except (ValueError, TypeError):
+                    colspan = 1
+                    rowspan = 1
+                
+                # Look up the cell at (row_idx, col_idx)
+                cell_key = (row_idx, col_idx)
+                if cell_key in cell_dict:
+                    cell = cell_dict[cell_key]
+                    bbox = cell.get("bbox", [])
+                    score = cell.get("score", 100.0)
+                    
+                    # Add the data-bbox attribute
+                    if bbox and len(bbox) >= 4:
+                        bbox_str = ",".join(map(str, map(int, bbox[:4])))
+                        td['data-bbox'] = f"[{bbox_str}]"  # type: ignore
+                    
+                    # Add the data-score attribute
+                    td['data-score'] = f"{score:.4f}"  # type: ignore
+                
+                # Reserve positions covered by this td's rowspan in later rows
+                for i in range(1, rowspan):
+                    for j in range(colspan):
+                        occupied.add((row_idx + i, col_idx + j))
+                
+                # Advance the column index (accounting for colspan)
+                col_idx += colspan
+        
+        return str(soup)
+

+ 46 - 0
ocr_tools/universal_doc_parser/models/adapters/wired_table/ocr_formatter.py

@@ -0,0 +1,46 @@
+"""
+OCR format conversion module
+
+Converts OCR results into the format expected by the UNet model.
+"""
+from typing import List, Dict, Any
+
+
+class OCRFormatter:
+    """OCR format conversion utilities"""
+    
+    @staticmethod
+    def to_unet_ocr_format(ocr_boxes: List[Dict[str, Any]]) -> List[List[Any]]:
+        """
+        Convert OCR results to the UNet format [[poly4, text, score], ...] with float coordinates.
+        
+        Args:
+            ocr_boxes: list of OCR results; each element has bbox, text, confidence fields
+            
+        Returns:
+            OCR results in UNet format
+        """
+        formatted = []
+        for item in ocr_boxes:
+            poly = item.get("bbox", [])
+            text = item.get("text", "")
+            score = item.get("confidence", 0.0)
+            if not poly or len(poly) < 4:
+                continue
+            
+            # Normalize to 4 points, shape (4, 2)
+            if len(poly) == 8:
+                # 8-point format: [x1, y1, x2, y2, x3, y3, x4, y4]
+                poly_pts = [[float(poly[i]), float(poly[i + 1])] for i in range(0, 8, 2)]
+            elif len(poly) == 4:
+                # 4-value bbox format: [x1, y1, x2, y2]; cast to float as documented
+                x1, y1, x2, y2 = poly
+                poly_pts = [[float(x1), float(y1)], [float(x2), float(y1)],
+                            [float(x2), float(y2)], [float(x1), float(y2)]]
+            else:
+                # Skip other formats
+                continue
+            
+            formatted.append([poly_pts, text, float(score)])
+        
+        return formatted
+
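Assuming OCR boxes shaped like the dicts above, the 4-value branch turns an axis-aligned bbox into a clockwise 4-point polygon. A quick standalone sketch of that transformation (mirrors the 4-value branch of `to_unet_ocr_format`):

```python
def bbox_to_poly4(bbox):
    """[x1, y1, x2, y2] -> [[x1,y1],[x2,y1],[x2,y2],[x1,y2]] as floats."""
    x1, y1, x2, y2 = (float(v) for v in bbox)
    # Clockwise from the top-left corner
    return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]

print(bbox_to_poly4([10, 20, 110, 50]))
# [[10.0, 20.0], [110.0, 20.0], [110.0, 50.0], [10.0, 50.0]]
```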

+ 455 - 0
ocr_tools/universal_doc_parser/models/adapters/wired_table/skew_detection.py

@@ -0,0 +1,455 @@
+"""
+Skew detection and correction module
+
+Provides skew detection and correction for table images.
+"""
+from typing import Dict, Any, List, Tuple, Optional
+import cv2
+import numpy as np
+from loguru import logger
+import math
+
+# Import the deskew helper
+try:
+    from ocr_utils import BBoxExtractor
+    BBOX_EXTRACTOR_AVAILABLE = True
+except ImportError:
+    BBoxExtractor = None
+    BBOX_EXTRACTOR_AVAILABLE = False
+
+
+class SkewDetector:
+    """Skew detection and correction utilities"""
+    
+    def __init__(self, config: Dict[str, Any]):
+        """
+        Initialize the skew detector
+        
+        Args:
+            config: configuration dict
+        """
+        self.enable_deskew: bool = config.get("enable_deskew", True) and BBOX_EXTRACTOR_AVAILABLE
+        self.skew_threshold: float = config.get("skew_threshold", 0.1)  # angles below this are not corrected
+        # Hough transform parameters (for image-based skew detection)
+        self.hough_rho: float = config.get("hough_rho", 1.0)  # distance resolution (pixels)
+        self.hough_theta: float = config.get("hough_theta", np.pi / 180)  # angular resolution (radians; default 1 degree)
+        self.skew_angle_range: Tuple[float, float] = config.get("skew_angle_range", (-30, 30))  # allowed angle range (degrees)
+        self.skew_outlier_threshold: float = config.get("skew_outlier_threshold", 2.0)  # outlier threshold (degrees)
+        self.skew_small_angle_range: Tuple[float, float] = config.get("skew_small_angle_range", (-2.0, 2.0))  # small-angle range (degrees)
+    
+    def detect_skew_angle_from_image(self, image: np.ndarray) -> float:
+        """
+        Detect the skew angle directly from the image via a Hough transform (no OCR results needed)
+        
+        Args:
+            image: table image (grayscale or color)
+            
+        Returns:
+            Skew angle in degrees (positive = counterclockwise, negative = clockwise)
+        """
+        if not self.enable_deskew:
+            return 0.0
+        
+        try:
+            # 1. Preprocess: convert to grayscale
+            if len(image.shape) == 3:
+                gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
+            else:
+                gray = image.copy()
+            
+            # 2. Binarize (Otsu adaptive threshold)
+            _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
+            
+            # 3. Edge detection (Canny)
+            edges = cv2.Canny(binary, 50, 150, apertureSize=3)
+            
+            # 4. Detect lines with the Hough transform
+            h, w = edges.shape
+            hough_threshold = max(int(min(h, w) * 0.15), 50)  # adaptive threshold
+            lines = cv2.HoughLines(edges, 
+                                   rho=self.hough_rho,
+                                   theta=self.hough_theta,
+                                   threshold=hough_threshold)
+            
+            if lines is None or len(lines) == 0:
+                logger.debug("Hough transform found no lines; returning 0°")
+                return 0.0
+            
+            # 5. Analyze line angles
+            line_data = []  # stores each angle and its weight (line length)
+            min_angle, max_angle = self.skew_angle_range
+            
+            for line in lines:
+                line_item = line[0]  # type: ignore
+                rho = float(line_item[0])
+                theta = float(line_item[1])
+                
+                # Convert to the line's angle relative to the horizontal (degrees)
+                if theta < np.pi / 4 or theta > 3 * np.pi / 4:
+                    # Near-vertical line; skip
+                    continue
+                
+                # Convert theta to the line angle
+                angle_rad = theta - np.pi / 2
+                angle_degrees = np.degrees(angle_rad)
+                
+                # Keep only angles inside the configured range
+                if min_angle <= angle_degrees <= max_angle:
+                    # Use the line's length within the image as its weight
+                    cos_theta = np.cos(theta)
+                    sin_theta = np.sin(theta)
+                    
+                    # Intersections with the image borders
+                    intersections = []
+                    # Left border (x=0)
+                    if abs(sin_theta) > 1e-6:
+                        y = rho / sin_theta
+                        if 0 <= y <= h:
+                            intersections.append((0.0, y))
+                    # Right border (x=w)
+                    if abs(sin_theta) > 1e-6:
+                        y = (rho - w * cos_theta) / sin_theta
+                        if 0 <= y <= h:
+                            intersections.append((float(w), y))
+                    # Top border (y=0)
+                    if abs(cos_theta) > 1e-6:
+                        x = rho / cos_theta
+                        if 0 <= x <= w:
+                            intersections.append((x, 0.0))
+                    # Bottom border (y=h)
+                    if abs(cos_theta) > 1e-6:
+                        x = (rho - h * sin_theta) / cos_theta
+                        if 0 <= x <= w:
+                            intersections.append((x, float(h)))
+                    
+                    # De-duplicate intersection points
+                    unique_intersections = []
+                    for pt in intersections:
+                        is_duplicate = False
+                        for existing_pt in unique_intersections:
+                            if abs(pt[0] - existing_pt[0]) < 1e-3 and abs(pt[1] - existing_pt[1]) < 1e-3:
+                                is_duplicate = True
+                                break
+                        if not is_duplicate:
+                            unique_intersections.append(pt)
+                    
+                    # Compute the line length
+                    if len(unique_intersections) >= 2:
+                        max_dist = 0
+                        for i in range(len(unique_intersections)):
+                            for j in range(i + 1, len(unique_intersections)):
+                                dx = unique_intersections[i][0] - unique_intersections[j][0]
+                                dy = unique_intersections[i][1] - unique_intersections[j][1]
+                                dist = np.sqrt(dx * dx + dy * dy)
+                                if dist > max_dist:
+                                    max_dist = dist
+                        line_length = max_dist
+                    else:
+                        line_length = float(w)
+                    
+                    line_data.append({
+                        'angle': angle_degrees,
+                        'weight': line_length
+                    })
+            
+            if len(line_data) == 0:
+                logger.debug("No qualifying line angles found; returning 0°")
+                return 0.0
+            
+            # 6. Compute the skew angle (two-stage filtering plus weighted average)
+            significant_angle_data = [
+                item for item in line_data 
+                if abs(item['angle']) >= 0.1
+            ]
+            
+            small_min, small_max = self.skew_small_angle_range
+            if len(significant_angle_data) >= 10:
+                candidate_data = significant_angle_data
+                logger.debug(f"Using clearly skewed angle data: {len(candidate_data)} lines (|angle| >= 0.1°)")
+            else:
+                small_angle_data = [
+                    item for item in line_data 
+                    if small_min <= item['angle'] <= small_max
+                ]
+                if len(small_angle_data) >= 10:
+                    candidate_data = small_angle_data
+                    logger.debug(f"Too few clearly skewed angles ({len(significant_angle_data)}); using small-angle data: {len(candidate_data)} lines")
+                else:
+                    candidate_data = line_data
+                    logger.debug(f"Too few small-angle samples ({len(small_angle_data)}); using all data: {len(candidate_data)} lines")
+            
+            # Extract angles and weights
+            angles_array = np.array([item['angle'] for item in candidate_data])
+            weights_array = np.array([item['weight'] for item in candidate_data])
+            
+            if len(angles_array) == 0:
+                logger.debug("No candidate angle data; returning 0°")
+                return 0.0
+            
+            # Median angle
+            median_angle = np.median(angles_array)
+            
+            # Filter outliers
+            outlier_threshold = self.skew_outlier_threshold
+            filtered_indices = np.abs(angles_array - median_angle) <= outlier_threshold
+            filtered_angles = angles_array[filtered_indices]
+            filtered_weights = weights_array[filtered_indices]
+            
+            if len(filtered_angles) < 3:
+                skew_angle = float(median_angle)
+                logger.debug(f"Too few angles after filtering ({len(filtered_angles)}); using the median: {skew_angle:.3f}°")
+            else:
+                total_weight = np.sum(filtered_weights)
+                if total_weight > 0:
+                    weighted_angle = np.sum(filtered_angles * filtered_weights) / total_weight
+                    skew_angle = float(weighted_angle)
+                else:
+                    skew_angle = float(np.median(filtered_angles))
+            
+            logger.debug(f"Image-based skew detection: {skew_angle:.3f}° ({len(lines)} lines detected, {len(line_data)} valid angles, {len(filtered_angles)} after filtering)")
+            return skew_angle
+            
+        except Exception as e:
+            logger.warning(f"Image-based skew detection failed: {e}")
+            return 0.0
+
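In OpenCV's `HoughLines` convention a near-horizontal line has `theta` close to π/2, so the deviation from the horizontal is simply `theta − π/2`, which is exactly what the loop above computes. A small math-only sketch of that conversion and the near-vertical filter:

```python
import math

def hough_theta_to_skew_deg(theta):
    """Return the line's angle to the horizontal in degrees,
    or None for near-vertical lines (theta outside [pi/4, 3*pi/4])."""
    if theta < math.pi / 4 or theta > 3 * math.pi / 4:
        return None  # near-vertical: not useful for horizontal skew
    return math.degrees(theta - math.pi / 2)

print(hough_theta_to_skew_deg(math.pi / 2))  # 0.0 (perfectly horizontal)
print(hough_theta_to_skew_deg(math.pi / 2 + math.radians(1)))  # ~1.0
print(hough_theta_to_skew_deg(0.0))  # None (vertical line)
```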
+    def get_oriented_lines_from_mask(self, mask: np.ndarray, lineW: int = 10) -> List[Tuple[float, float, float, float]]:
+        """
+        Extract oriented line segments (x1, y1, x2, y2) from a binary mask.
+        Uses cv2.fitLine so the slope direction (+/-) is preserved.
+        
+        Args:
+            mask: binary image
+            lineW: approximate minimum line width/length filter
+            
+        Returns:
+            List of (x1, y1, x2, y2)
+        """
+        # Ensure the mask is contiguous uint8
+        if not isinstance(mask, np.ndarray):
+            mask = np.array(mask)
+        if not mask.flags['C_CONTIGUOUS']:
+            mask = np.ascontiguousarray(mask)
+        if mask.dtype != np.uint8:
+            mask = mask.astype(np.uint8)
+            
+        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
+        lines = []
+        
+        for cnt in contours:
+            # Skip small contours
+            if cv2.contourArea(cnt) < lineW * 2:
+                continue
+                
+            # fitLine returns the normalized direction (vx, vy) and a point (x, y) on the line;
+            # flatten so the components are plain scalars rather than 1-element arrays
+            vx, vy, x, y = cv2.fitLine(cnt, cv2.DIST_L2, 0, 0.01, 0.01).flatten()
+            
+            # Project every contour point onto the fitted direction to find the extremes.
+            # Line: P = (x, y) + t * (vx, vy);  t = dx*vx + dy*vy
+            pts = cnt.reshape(-1, 2)
+            dx = pts[:, 0] - x
+            dy = pts[:, 1] - y
+            t = dx * vx + dy * vy
+            
+            t_min = np.min(t)
+            t_max = np.max(t)
+            
+            # Compute Endpoints
+            p1_x = x + t_min * vx
+            p1_y = y + t_min * vy
+            p2_x = x + t_max * vx
+            p2_y = y + t_max * vy
+            
+            lines.append((float(p1_x), float(p1_y), float(p2_x), float(p2_y)))
+            
+        return lines
+
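The endpoint recovery above projects every contour point onto the fitted unit direction and keeps the extreme parameters `t_min`/`t_max`. The same projection in plain Python (hypothetical standalone helper; assumes a normalized direction vector):

```python
def segment_endpoints(points, origin, direction):
    """Project points onto a unit direction through origin and
    return the two extreme points along that line."""
    ox, oy = origin
    vx, vy = direction  # assumed normalized
    # Signed distance of each point along the direction
    ts = [(px - ox) * vx + (py - oy) * vy for px, py in points]
    t_min, t_max = min(ts), max(ts)
    return ((ox + t_min * vx, oy + t_min * vy),
            (ox + t_max * vx, oy + t_max * vy))

# Points scattered along the x axis, fitted line through (5, 0):
pts = [(2, 0), (9, 0), (5, 0)]
print(segment_endpoints(pts, (5, 0), (1.0, 0.0)))  # ((2.0, 0.0), (9.0, 0.0))
```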
+    def calculate_skew_from_vectors(self, lines: List[Tuple[float, float, float, float]]) -> float:
+        """Compute the weighted mean skew angle from vector line segments"""
+        angles = []
+        weights = []
+        
+        for seg in lines:
+            x1, y1, x2, y2 = seg
+            length = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
+            if length < 100:
+                continue
+                
+            angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
+            
+            # Normalize to horizontal (-45 to 45)
+            if angle > 45:
+                angle -= 180
+            elif angle < -45:
+                angle += 180
+            
+            if abs(angle) > 30:
+                continue
+            
+            angles.append(angle)
+            weights.append(length)
+            
+        if not angles:
+            return 0.0
+            
+        angles = np.array(angles)
+        weights = np.array(weights)
+        
+        avg_angle = np.average(angles, weights=weights)
+        
+        # Filter outliers
+        valid = np.abs(angles - avg_angle) < 5.0
+        if np.sum(valid) > 0:
+            return float(np.average(angles[valid], weights=weights[valid]))
+        return float(avg_angle)
+
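The weighting above means one long border line dominates many short noisy segments. A math-only sketch of the same length-weighted average (mirrors `calculate_skew_from_vectors` without the outlier pass):

```python
import math

def weighted_skew(segments):
    """Length-weighted mean angle (degrees) of (x1, y1, x2, y2) segments."""
    num = den = 0.0
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        num += angle * length
        den += length
    return num / den if den else 0.0

# A 200 px line at +2 degrees outweighs a 50 px line at 0 degrees:
segs = [(0, 0, 200, 200 * math.tan(math.radians(2))), (0, 0, 50, 0)]
print(round(weighted_skew(segs), 2))  # 1.6
```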
+    def detect_skew_from_mask(self, mask: np.ndarray) -> float:
+        """
+        Detect the skew angle from a UNet-predicted binary mask (using GridRecovery's fitLine logic)
+        
+        Args:
+            mask: horizontal- or vertical-line mask (0/255)
+            
+        Returns:
+            Skew angle
+        """
+        if not self.enable_deskew:
+            return 0.0
+            
+        try:
+            # 1. Extract oriented vector line segments
+            lines = self.get_oriented_lines_from_mask(mask)
+            
+            # 2. Compute the weighted skew angle
+            final_angle = self.calculate_skew_from_vectors(lines)
+            
+            logger.debug(f"Mask-based (fitLine) detection: {final_angle:.3f}° (from {len(lines)} segments)")
+            return final_angle
+            
+        except Exception as e:
+            logger.warning(f"Mask-based skew detection failed: {e}")
+            import traceback
+            logger.warning(traceback.format_exc())
+            return 0.0
+    
+    def apply_deskew(
+        self,
+        table_image: np.ndarray,
+        ocr_boxes: List[Dict[str, Any]],
+        skew_angle: float
+    ) -> Tuple[np.ndarray, List[Dict[str, Any]]]:
+        """
+        Apply skew correction and update OCR coordinates to match
+        
+        Args:
+            table_image: table image
+            ocr_boxes: list of OCR results
+            skew_angle: skew angle in degrees (positive = counterclockwise, negative = clockwise)
+            
+        Returns:
+            (corrected image, updated list of OCR boxes)
+        """
+        if abs(skew_angle) < self.skew_threshold:
+            return table_image, ocr_boxes
+        
+        if not BBOX_EXTRACTOR_AVAILABLE:
+            logger.warning("BBoxExtractor unavailable; skipping skew correction")
+            return table_image, ocr_boxes
+        
+        try:
+            h, w = table_image.shape[:2]
+            center = (w / 2, h / 2)
+            
+            # Correction angle
+            correction_angle = skew_angle
+            
+            # Build the rotation matrix
+            rotation_matrix = cv2.getRotationMatrix2D(center, correction_angle, 1.0)
+            
+            # Compute the rotated image size (avoid cropping)
+            cos_val = abs(rotation_matrix[0, 0])
+            sin_val = abs(rotation_matrix[0, 1])
+            new_w = int((h * sin_val) + (w * cos_val))
+            new_h = int((h * cos_val) + (w * sin_val))
+            
+            # Adjust the translation part of the matrix so the image stays centered
+            rotation_matrix[0, 2] += (new_w / 2) - center[0]
+            rotation_matrix[1, 2] += (new_h / 2) - center[1]
+            
+            # Apply the rotation (fill the background with white)
+            if len(table_image.shape) == 2:
+                # Grayscale image
+                deskewed_image = cv2.warpAffine(
+                    table_image, rotation_matrix, (new_w, new_h),
+                    flags=cv2.INTER_LINEAR,
+                    borderMode=cv2.BORDER_CONSTANT,
+                    borderValue=255
+                )
+            else:
+                # Color image
+                deskewed_image = cv2.warpAffine(
+                    table_image, rotation_matrix, (new_w, new_h),
+                    flags=cv2.INTER_LINEAR,
+                    borderMode=cv2.BORDER_CONSTANT,
+                    borderValue=(255, 255, 255)
+                )
+            
+            # Update OCR box coordinates (if any were provided)
+            if ocr_boxes and len(ocr_boxes) > 0:
+                paddle_boxes = ocr_boxes
+                
+                # Update coordinates via BBoxExtractor
+                if BBoxExtractor is None:
+                    logger.warning("BBoxExtractor unavailable; cannot update OCR coordinates")
+                    return deskewed_image, ocr_boxes
+                
+                updated_paddle_boxes = BBoxExtractor.correct_boxes_skew(
+                    paddle_boxes, correction_angle, (new_w, new_h)
+                )
+                
+                # Convert back to the original format
+                updated_ocr_boxes = []
+                for i, paddle_box in enumerate(updated_paddle_boxes):
+                    original_box = ocr_boxes[i] if i < len(ocr_boxes) else {}
+                    
+                    # Recompute the bbox from the poly (ensures correct coordinates)
+                    poly = paddle_box.get("poly", [])
+                    if poly and len(poly) >= 4:
+                        xs = [p[0] for p in poly]
+                        ys = [p[1] for p in poly]
+                        bbox = [min(xs), min(ys), max(xs), max(ys)]
+                    else:
+                        bbox = paddle_box.get("bbox", [])
+                    
+                    updated_box = {
+                        "bbox": bbox,
+                        "text": paddle_box.get("text", original_box.get("text", "")),
+                        "confidence": paddle_box.get("confidence", original_box.get("confidence", original_box.get("score", 1.0))),
+                    }
+                    # Keep any other fields (including original_bbox)
+                    for key in original_box:
+                        if key not in updated_box:
+                            updated_box[key] = original_box[key]
+                    # Prefer original_bbox from paddle_box when present
+                    if 'original_bbox' in paddle_box:
+                        updated_box['original_bbox'] = paddle_box['original_bbox']
+                    
+                    updated_ocr_boxes.append(updated_box)
+                
+                logger.info(f"✅ Skew correction done: {skew_angle:.3f}° → 0° (image size: {w}x{h} → {new_w}x{new_h}); updated {len(updated_ocr_boxes)} OCR boxes")
+                
+                return deskewed_image, updated_ocr_boxes
+            else:
+                # No OCR boxes; return only the corrected image
+                logger.info(f"✅ Skew correction done: {skew_angle:.3f}° → 0° (image size: {w}x{h} → {new_w}x{new_h}); no OCR boxes to update")
+                return deskewed_image, ocr_boxes
+            
+        except Exception as e:
+            logger.error(f"Skew correction failed: {e}")
+            return table_image, ocr_boxes

+ 463 - 0
ocr_tools/universal_doc_parser/models/adapters/wired_table/text_filling.py

@@ -0,0 +1,463 @@
+"""
+Text filling module
+
+Fills table cell text via OCR text matching and secondary (re-)OCR.
+"""
+from typing import List, Dict, Any, Tuple, Optional
+import cv2
+import numpy as np
+from loguru import logger
+
+from ocr_tools.universal_doc_parser.core.coordinate_utils import CoordinateUtils
+
+
+class TextFiller:
+    """Text filling utilities"""
+    
+    def __init__(self, ocr_engine: Any, config: Dict[str, Any]):
+        """
+        Initialize the text filler
+        
+        Args:
+            ocr_engine: OCR engine
+            config: configuration dict
+        """
+        self.ocr_engine = ocr_engine
+        self.cell_crop_margin: int = config.get("cell_crop_margin", 2)
+        self.ocr_conf_threshold: float = config.get("ocr_conf_threshold", 0.5)
+    
+    def fill_text_by_center_point(
+        self,
+        bboxes: List[List[float]],
+        ocr_boxes: List[Dict[str, Any]],
+    ) -> Tuple[List[str], List[float], List[List[Dict[str, Any]]], List[int]]:
+        """
+        Fill text using a center-point-in-cell strategy.
+        
+        Modeled on fill_html_with_ocr_by_bbox:
+        - an OCR text matches a cell when its center point falls inside the cell's bbox
+        - multi-line text is sorted by y coordinate before concatenation
+        
+        Args:
+            bboxes: cell coordinates [[x1,y1,x2,y2], ...]
+            ocr_boxes: OCR results [{"bbox": [...], "text": "..."}, ...]
+            
+        Returns:
+            list of texts, one per cell
+            list of confidences, one per cell
+            list of matched OCR boxes, one list per cell
+            indices of cells needing a second OCR pass (OCR box spans multiple cells or is oversized)
+        """
+        texts: List[str] = ["" for _ in bboxes]
+        scores: List[float] = [0.0 for _ in bboxes]
+        matched_boxes_list: List[List[Dict[str, Any]]] = [[] for _ in bboxes]
+        need_reocr_indices: List[int] = []
+        
+        if not ocr_boxes:
+            return texts, scores, matched_boxes_list, need_reocr_indices
+        
+        # Preprocess OCR results: compute center points
+        ocr_items: List[Dict[str, Any]] = []
+        for item in ocr_boxes:
+            # Use CoordinateUtils.poly_to_bbox() in place of _normalize_bbox()
+            box = CoordinateUtils.poly_to_bbox(item.get("bbox", []))
+            if not box:
+                continue
+            cx = (box[0] + box[2]) / 2
+            cy = (box[1] + box[3]) / 2
+            ocr_items.append({
+                "center_x": cx,
+                "center_y": cy,
+                "y1": box[1],
+                "bbox": box,  # kept for cross-cell detection
+                "text": item.get("text", ""),
+                "confidence": float(item.get("confidence", item.get("score", 1.0))),
+                "original_box": item,  # keep the full OCR box object
+            })
+        
+        # Match OCR text to each cell
+        for idx, bbox in enumerate(bboxes):
+            x1, y1, x2, y2 = bbox
+            matched: List[Tuple[str, float, float, Dict[str, Any]]] = []  # (text, y1, score, ocr_item)
+            
+            for ocr in ocr_items:
+                if x1 <= ocr["center_x"] <= x2 and y1 <= ocr["center_y"] <= y2:
+                    matched.append((ocr["text"], ocr["y1"], ocr["confidence"], ocr))
+            
+            if matched:
+                # Sort by y coordinate so multi-line text keeps its order
+                matched.sort(key=lambda x: x[1])
+                texts[idx] = "".join([t for t, _, _, _ in matched])
+                # Average confidence
+                avg_score = sum([s for _, _, s, _ in matched]) / len(matched)
+                scores[idx] = avg_score
+                # Keep the matched OCR boxes
+                matched_boxes_list[idx] = [o["original_box"] for _, _, _, o in matched]
+                
+                # Check only the OCR boxes matched to this cell (not all boxes)
+                # for spanning multiple cells or being oversized
+                for _, _, _, ocr_item in matched:
+                    ocr_bbox = ocr_item["bbox"]
+                    # Does the OCR box span several cells?
+                    overlapping_cells = self.detect_ocr_box_spanning_cells(ocr_bbox, bboxes, overlap_threshold=0.3)
+                    if len(overlapping_cells) >= 2:
+                        # The OCR box spans multiple cells; mark them all for re-OCR
+                        for cell_idx in overlapping_cells:
+                            if cell_idx not in need_reocr_indices:
+                                need_reocr_indices.append(cell_idx)
+                        logger.debug(f"OCR box spans {len(overlapping_cells)} cells: {ocr_item['text'][:20]}...")
+                    
+                    # Is the OCR box oversized relative to this cell?
+                    if self.is_ocr_box_too_large(ocr_bbox, bbox, size_ratio_threshold=1.5):
+                        if idx not in need_reocr_indices:
+                            need_reocr_indices.append(idx)
+                        logger.debug(f"OCR box oversized for cell {idx}: {ocr_item['text'][:20]}...")
+            else:
+                scores[idx] = 0.0  # no matching text; confidence is 0
+        
+        return texts, scores, matched_boxes_list, need_reocr_indices
+    
+    @staticmethod
+    def merge_boxes_original_bbox(boxes: List[Dict[str, Any]]) -> List[float]:
+        """
+        合并多个 OCR box 的原始坐标(优先使用 original_bbox)
+        
+        Args:
+            boxes: OCR box 列表,每个 box 可能包含 'original_bbox' 或 'bbox' 字段
+            
+        Returns:
+            合并后的 bbox [x1, y1, x2, y2](原始坐标系)
+        """
+        if not boxes:
+            return [0.0, 0.0, 0.0, 0.0]
+        
+        # 优先使用 original_bbox,如果没有则使用 bbox
+        def get_coords(b):
+            return b.get('original_bbox', b.get('bbox', [0.0, 0.0, 0.0, 0.0]))
+        
+        coords_list = [get_coords(b) for b in boxes if get_coords(b) and len(get_coords(b)) == 4]
+        if not coords_list:
+            return [0.0, 0.0, 0.0, 0.0]
+        
+        x1 = min(c[0] for c in coords_list)
+        y1 = min(c[1] for c in coords_list)
+        x2 = max(c[2] for c in coords_list)
+        y2 = max(c[3] for c in coords_list)
+        return [float(x1), float(y1), float(x2), float(y2)]
+    
+    @staticmethod
+    def detect_ocr_box_spanning_cells(
+        ocr_bbox: List[float],
+        cell_bboxes: List[List[float]],
+        overlap_threshold: float = 0.3
+    ) -> List[int]:
+        """
+        检测 OCR box 是否跨多个单元格
+        
+        Args:
+            ocr_bbox: OCR box 坐标 [x1, y1, x2, y2]
+            cell_bboxes: 单元格坐标列表
+            overlap_threshold: 重叠比例阈值(OCR box 与单元格的重叠面积占 OCR box 面积的比例)
+            
+        Returns:
+            与 OCR box 重叠的单元格索引列表
+        """
+        if not ocr_bbox or len(ocr_bbox) < 4:
+            return []
+        
+        overlapping_cells = []
+        ocr_area = (ocr_bbox[2] - ocr_bbox[0]) * (ocr_bbox[3] - ocr_bbox[1])
+        
+        if ocr_area <= 0:
+            return []
+        
+        for idx, cell_bbox in enumerate(cell_bboxes):
+            if not cell_bbox or len(cell_bbox) < 4:
+                continue
+            
+            # Compute the intersection
+            inter_x1 = max(ocr_bbox[0], cell_bbox[0])
+            inter_y1 = max(ocr_bbox[1], cell_bbox[1])
+            inter_x2 = min(ocr_bbox[2], cell_bbox[2])
+            inter_y2 = min(ocr_bbox[3], cell_bbox[3])
+            
+            if inter_x2 > inter_x1 and inter_y2 > inter_y1:
+                inter_area = (inter_x2 - inter_x1) * (inter_y2 - inter_y1)
+                overlap_ratio = inter_area / ocr_area
+                
+                if overlap_ratio > overlap_threshold:
+                    overlapping_cells.append(idx)
+        
+        return overlapping_cells
+    
+    @staticmethod
+    def is_ocr_box_too_large(
+        ocr_bbox: List[float],
+        cell_bbox: List[float],
+        size_ratio_threshold: float = 1.5
+    ) -> bool:
+        """
+        检测 OCR box 是否相对于单元格过大
+        
+        Args:
+            ocr_bbox: OCR box 坐标 [x1, y1, x2, y2]
+            cell_bbox: 单元格坐标 [x1, y1, x2, y2]
+            size_ratio_threshold: 面积比阈值,如果 OCR box 面积 > 单元格面积 * 阈值,则认为过大
+            
+        Returns:
+            是否过大
+        """
+        if not ocr_bbox or len(ocr_bbox) < 4 or not cell_bbox or len(cell_bbox) < 4:
+            return False
+        
+        ocr_area = (ocr_bbox[2] - ocr_bbox[0]) * (ocr_bbox[3] - ocr_bbox[1])
+        cell_area = (cell_bbox[2] - cell_bbox[0]) * (cell_bbox[3] - cell_bbox[1])
+        
+        if cell_area <= 0:
+            return False
+        
+        size_ratio = ocr_area / cell_area
+        return size_ratio > size_ratio_threshold
+    
+    def second_pass_ocr_fill(
+        self,
+        table_image: np.ndarray,
+        bboxes: List[List[float]],
+        texts: List[str],
+        scores: Optional[List[float]] = None,
+        need_reocr_indices: Optional[List[int]] = None,
+        force_all: bool = False,
+    ) -> List[str]:
+        """
+        二次OCR统一封装:
+        - 对空文本单元格裁剪图块并少量外扩
+        - 对低置信度文本进行重识别
+        - 对竖排单元格(高宽比大)进行旋转后识别
+        - 对 OCR 误合并的单元格进行重识别(OCR box 跨多个单元格或过大)
+        - [New] force_all=True: 强制对所有单元格进行裁剪识别 (Full-page OCR 作为 fallback)
+        
+        Args:
+            table_image: 表格图像
+            bboxes: 单元格坐标列表
+            texts: 当前文本列表
+            scores: 当前置信度列表
+            need_reocr_indices: 需要二次 OCR 的单元格索引列表(OCR 误合并检测结果)
+            force_all: 是否强制对所有单元格进行 OCR (Default: False)
+        """
+        try:
+            if not self.ocr_engine:
+                return texts
+            
+            # If scores were not provided, default to 1.0 (only empty cells get handled)
+            if scores is None:
+                scores = [1.0 if t else 0.0 for t in texts]
+            
+            # If need_reocr_indices was not provided, start from an empty list
+            if need_reocr_indices is None:
+                need_reocr_indices = []
+
+            h_img, w_img = table_image.shape[:2]
+            margin = self.cell_crop_margin
+            
+            # Confidence threshold that triggers second-pass OCR
+            trigger_score_thresh = 0.90
+
+            crop_list: List[np.ndarray] = []
+            crop_indices: List[int] = []
+
+            # Collect crops that need second-pass OCR
+            for i, t in enumerate(texts):
+                if i >= len(bboxes):
+                    break
+                bbox = bboxes[i]
+                w_box = bbox[2] - bbox[0]
+                h_box = bbox[3] - bbox[1]
+                
+                # Decide whether this cell needs second-pass OCR
+                need_reocr = False
+                reocr_reason = ""
+                
+                if force_all:
+                    need_reocr = True
+                    reocr_reason = "forced full OCR"
+                else:
+                    # 1. Empty text
+                    if not t or not t.strip():
+                        need_reocr = True
+                        reocr_reason = "empty text"
+                    # 2. Low confidence
+                    elif scores[i] < trigger_score_thresh:
+                        need_reocr = True
+                        reocr_reason = "low confidence"
+                    # 3. Vertical cell (aspect ratio > 2.5) without very high confidence
+                    elif h_box > w_box * 2.5 and scores[i] < 0.98:
+                        need_reocr = True
+                        reocr_reason = "vertical text"
+                    # 4. OCR merge error: box spans multiple cells or is oversized
+                    elif i in need_reocr_indices:
+                        need_reocr = True
+                        reocr_reason = "OCR merge error"
+
+                if not need_reocr:
+                    continue
+                
+                logger.debug(f"Cell {i} triggered second-pass OCR: {reocr_reason} (text: '{t[:30]}...')")
+
+                x1, y1, x2, y2 = map(int, bboxes[i])
+                x1 = max(0, x1 - margin)
+                y1 = max(0, y1 - margin)
+                x2 = min(w_img, x2 + margin)
+                y2 = min(h_img, y2 + margin)
+                if x2 <= x1 or y2 <= y1:
+                    continue
+
+                cell_img = table_image[y1:y2, x1:x2]
+                if cell_img.size == 0:
+                    continue
+
+                ch, cw = cell_img.shape[:2]
+                # Upscale small crops
+                if ch < 64 or cw < 64:
+                    cell_img = cv2.resize(cell_img, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)
+                    ch, cw = cell_img.shape[:2]
+
+                # Rotate vertical text to horizontal
+                if ch > cw * 2.0:
+                    cell_img = cv2.rotate(cell_img, cv2.ROTATE_90_COUNTERCLOCKWISE)
+
+                crop_list.append(cell_img)
+                crop_indices.append(i)
+
+            if not crop_list:
+                return texts
+            
+            logger.info(f"触发二次OCR: {len(crop_list)} 个单元格 (总数 {len(texts)})")
+
+            # Batch-detect text blocks first, then batch-recognize (more efficient)
+            # Step 1: batch detection
+            det_results = []
+            for cell_img in crop_list:
+                try:
+                    det_res = self.ocr_engine.ocr(cell_img, det=True, rec=False)
+                    if det_res and len(det_res) > 0:
+                        dt_boxes = det_res[0]
+                        det_results.append(dt_boxes if dt_boxes else [])
+                    else:
+                        det_results.append([])
+                except Exception as e:
+                    logger.warning(f"单元格文本检测失败: {e}")
+                    det_results.append([])
+            
+            # Step 2: crop images from the detection boxes and batch-recognize
+            rec_img_list = []
+            rec_indices = []
+            for cell_idx, dt_boxes in enumerate(det_results):
+                if not dt_boxes:
+                    continue
+                cell_img = crop_list[cell_idx]
+                h, w = cell_img.shape[:2]
+                
+                for box_idx, box in enumerate(dt_boxes):
+                    if not box or len(box) < 4:
+                        continue
+                    # Convert the detection box to bbox form and crop
+                    if isinstance(box[0], (list, tuple)):
+                        # Polygon form
+                        xs = [p[0] for p in box]
+                        ys = [p[1] for p in box]
+                        x1, y1 = int(max(0, min(xs))), int(max(0, min(ys)))
+                        x2, y2 = int(min(w, max(xs))), int(min(h, max(ys)))
+                    else:
+                        # Flat bbox form
+                        xs = [box[i] for i in range(0, len(box), 2)]
+                        ys = [box[i] for i in range(1, len(box), 2)]
+                        x1, y1 = int(max(0, min(xs))), int(max(0, min(ys)))
+                        x2, y2 = int(min(w, max(xs))), int(min(h, max(ys)))
+                    
+                    if x2 > x1 and y2 > y1:
+                        cropped = cell_img[y1:y2, x1:x2]
+                        if cropped.size > 0:
+                            rec_img_list.append(cropped)
+                            rec_indices.append((cell_idx, box_idx))
+            
+            # Step 3: batch recognition
+            results = [[] for _ in crop_list]
+            if rec_img_list:
+                try:
+                    rec_res = self.ocr_engine.ocr(rec_img_list, det=False, rec=True)
+                    if rec_res and len(rec_res) > 0:
+                        rec_results = rec_res[0] if isinstance(rec_res[0], list) else rec_res
+                        # Write recognition results back to their cells
+                        for (cell_idx, box_idx), rec_item in zip(rec_indices, rec_results):
+                            if rec_item:
+                                if isinstance(rec_item, (list, tuple)) and len(rec_item) >= 2:
+                                    text = str(rec_item[0] or "").strip()
+                                    score = float(rec_item[1] or 0.0)
+                                    if text:
+                                        results[cell_idx].append((text, score))
+                except Exception as e:
+                    logger.warning(f"批量识别失败: {e}")
+
+            # Parse into (text, score); supports merging multiple text blocks
+            def _parse_item(res_item) -> Tuple[str, float]:
+                if res_item is None:
+                    return "", 0.0
+                
+                # List form: several text blocks that need merging
+                if isinstance(res_item, list) and len(res_item) > 0:
+                    texts_list = []
+                    scores_list = []
+                    
+                    for item in res_item:
+                        if isinstance(item, (tuple, list)) and len(item) >= 2:
+                            text = str(item[0] or "").strip()
+                            score = float(item[1] or 0.0)
+                        elif isinstance(item, dict):
+                            text = str(item.get("text") or item.get("label") or "").strip()
+                            score = float(item.get("score") or item.get("confidence") or 0.0)
+                        else:
+                            continue
+                        if text:
+                            texts_list.append(text)
+                            scores_list.append(score)
+                    
+                    if texts_list:
+                        combined_text = "".join(texts_list)
+                        avg_score = sum(scores_list) / len(scores_list)
+                        return combined_text, avg_score
+                    return "", 0.0
+                
+                # Bare (text, score) tuple
+                if isinstance(res_item, tuple) and len(res_item) >= 2:
+                    return str(res_item[0] or ""), float(res_item[1] or 0.0)
+                
+                # Dict form
+                if isinstance(res_item, dict):
+                    txt = str(res_item.get("text") or res_item.get("label") or "")
+                    sc = float(res_item.get("score") or res_item.get("confidence") or 0.0)
+                    return txt, sc
+                
+                return "", 0.0
+
+            # Align lengths to avoid out-of-range indexing
+            n = min(len(results), len(crop_list), len(crop_indices))
+            conf_th = self.ocr_conf_threshold
+
+            for k in range(n):
+                text_k, score_k = _parse_item(results[k])
+                if text_k and score_k >= conf_th:
+                    texts[crop_indices[k]] = text_k
+
+        except Exception as e:
+            logger.warning(f"二次OCR失败: {e}")
+
+        return texts
+
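The flagging logic above rests on two pure-geometry helpers: `detect_ocr_box_spanning_cells` flags cells whose overlap with an OCR box exceeds a fraction of the box's own area, and `is_ocr_box_too_large` compares box area to cell area. A minimal standalone sketch of the same arithmetic (free functions mirroring the static methods; the toy boxes are made up for illustration):

```python
from typing import List

def spanning_cells(ocr_bbox: List[float], cell_bboxes: List[List[float]],
                   overlap_threshold: float = 0.3) -> List[int]:
    """Indices of cells whose intersection with the OCR box exceeds
    overlap_threshold as a fraction of the OCR box's own area."""
    ocr_area = (ocr_bbox[2] - ocr_bbox[0]) * (ocr_bbox[3] - ocr_bbox[1])
    if ocr_area <= 0:
        return []
    hits = []
    for idx, cell in enumerate(cell_bboxes):
        ix1, iy1 = max(ocr_bbox[0], cell[0]), max(ocr_bbox[1], cell[1])
        ix2, iy2 = min(ocr_bbox[2], cell[2]), min(ocr_bbox[3], cell[3])
        if ix2 > ix1 and iy2 > iy1:
            if (ix2 - ix1) * (iy2 - iy1) / ocr_area > overlap_threshold:
                hits.append(idx)
    return hits

def too_large(ocr_bbox: List[float], cell_bbox: List[float],
              size_ratio_threshold: float = 1.5) -> bool:
    """True when the OCR box area exceeds the cell area by the threshold."""
    cell_area = (cell_bbox[2] - cell_bbox[0]) * (cell_bbox[3] - cell_bbox[1])
    if cell_area <= 0:
        return False
    ocr_area = (ocr_bbox[2] - ocr_bbox[0]) * (ocr_bbox[3] - ocr_bbox[1])
    return ocr_area / cell_area > size_ratio_threshold

# One OCR line stretching across two adjacent cells: both get flagged,
# and the line is oversized relative to either single cell.
cells = [[0, 0, 50, 10], [50, 0, 100, 10]]
print(spanning_cells([0, 0, 100, 10], cells))  # each cell covers 50% of the box
print(too_large([0, 0, 100, 10], cells[0]))    # area ratio is 2.0
```

With the default thresholds (0.3 and 1.5), a full-width line over two half-width cells trips both checks, which is exactly the "OCR merge error" case routed to second-pass OCR.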

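Step 2 of `second_pass_ocr_fill` has to cope with detection engines that return either polygons (`[[x, y], ...]`) or flat coordinate arrays, and must clip the result to the crop. That conversion can be sketched in isolation (`det_box_to_bbox` is a hypothetical helper name, not part of the module above):

```python
from typing import Sequence, Tuple, Union

def det_box_to_bbox(box: Union[Sequence[Sequence[float]], Sequence[float]],
                    w: int, h: int) -> Tuple[int, int, int, int]:
    """Convert a detection box (polygon [[x, y], ...] or flat
    [x1, y1, x2, y2, ...]) into an axis-aligned bbox clipped to a
    w x h image, matching the logic in second_pass_ocr_fill Step 2."""
    if isinstance(box[0], (list, tuple)):
        # Polygon form: list of points
        xs = [p[0] for p in box]
        ys = [p[1] for p in box]
    else:
        # Flat form: alternating x and y coordinates
        xs = [box[i] for i in range(0, len(box), 2)]
        ys = [box[i] for i in range(1, len(box), 2)]
    x1, y1 = int(max(0, min(xs))), int(max(0, min(ys)))
    x2, y2 = int(min(w, max(xs))), int(min(h, max(ys)))
    return x1, y1, x2, y2

poly = [[5.0, 5.0], [95.0, 5.0], [95.0, 20.0], [5.0, 20.0]]
flat = [5.0, 5.0, 95.0, 20.0]
print(det_box_to_bbox(poly, 100, 50))  # both forms give the same bbox
print(det_box_to_bbox(flat, 100, 50))
print(det_box_to_bbox(poly, 80, 50))   # right edge clipped to the image width
```

The clip to `[0, w] x [0, h]` is what keeps the subsequent `cell_img[y1:y2, x1:x2]` slice valid even when the detector overshoots the crop boundary.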
+ 177 - 0
ocr_tools/universal_doc_parser/models/adapters/wired_table/visualization.py

@@ -0,0 +1,177 @@
+"""
+可视化模块
+
+提供表格识别结果的可视化功能。
+"""
+from typing import List, Dict, Optional
+import cv2
+import numpy as np
+from loguru import logger
+
+
+class WiredTableVisualizer:
+    """可视化工具类"""
+    
+    @staticmethod
+    def visualize_table_lines(
+        table_image: np.ndarray,
+        hpred: np.ndarray,
+        vpred: np.ndarray,
+        output_path: str
+    ) -> np.ndarray:
+        """
+        可视化 UNet 检测到的表格线
+        
+        Args:
+            table_image: 原始图片
+            hpred: 横线mask(已缩放到原图大小)
+            vpred: 竖线mask(已缩放到原图大小)
+            output_path: 输出路径
+            
+        Returns:
+            可视化图像
+        """
+        vis_img = table_image.copy()
+        if len(vis_img.shape) == 2:
+            vis_img = cv2.cvtColor(vis_img, cv2.COLOR_GRAY2BGR)
+        
+        # 横线用红色,竖线用蓝色
+        vis_img[hpred > 128] = [0, 0, 255]  # 红色横线
+        vis_img[vpred > 128] = [255, 0, 0]  # 蓝色竖线
+        
+        cv2.imwrite(output_path, vis_img)
+        logger.info(f"表格线可视化: {output_path}")
+        
+        return vis_img
+    
+    @staticmethod
+    def visualize_connected_components(
+        hpred_up: np.ndarray,
+        vpred_up: np.ndarray,
+        bboxes: List[List[float]],
+        upscale: float,
+        output_path: str
+    ) -> None:
+        """
+        复刻连通域风格:红色网格线背景 + 绿色单元格框。
+        使用上采样尺度的 mask 与坐标,保证线条清晰。
+        
+        Args:
+            hpred_up: 横线预测mask(上采样后)
+            vpred_up: 竖线预测mask(上采样后)
+            bboxes: 单元格bbox列表
+            upscale: 上采样比例
+            output_path: 输出路径
+        """
+        h, w = hpred_up.shape[:2]
+
+        # 与连通域提取相同的预处理,以获得直观的网格线背景
+        _, h_bin = cv2.threshold(hpred_up, 127, 255, cv2.THRESH_BINARY)
+        _, v_bin = cv2.threshold(vpred_up, 127, 255, cv2.THRESH_BINARY)
+        kernel_h = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 1))
+        kernel_v = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 5))
+        h_bin = cv2.dilate(h_bin, kernel_h, iterations=1)
+        v_bin = cv2.dilate(v_bin, kernel_v, iterations=1)
+        grid_mask = cv2.bitwise_or(h_bin, v_bin)
+
+        vis = np.zeros((h, w, 3), dtype=np.uint8)
+        vis[grid_mask > 0] = [0, 0, 255]  # 红色线条
+
+        # 在上采样坐标系上绘制单元格框
+        for box in bboxes:
+            x1, y1, x2, y2 = [int(c * upscale) for c in box]
+            cv2.rectangle(vis, (x1, y1), (x2, y2), (0, 255, 0), 2)
+
+        cv2.imwrite(output_path, vis)
+        logger.info(f"连通域可视化: {output_path}")
+    
+    @staticmethod
+    def visualize_grid_structure(
+        table_image: np.ndarray,
+        cells: List[Dict],
+        output_path: str
+    ) -> None:
+        """
+        可视化表格逻辑结构 (row, col, span)
+        
+        Args:
+            table_image: 表格图像
+            cells: 单元格列表,包含 row, col, rowspan, colspan, bbox 等字段
+            output_path: 输出路径
+        """
+        vis = table_image.copy()
+        if len(vis.shape) == 2:
+            vis = cv2.cvtColor(vis, cv2.COLOR_GRAY2BGR)
+            
+        for cell in cells:
+            x1, y1, x2, y2 = [int(c) for c in cell["bbox"]]
+            
+            # 绘制边框
+            cv2.rectangle(vis, (x1, y1), (x2, y2), (0, 255, 0), 2)
+            
+            # 绘制逻辑坐标
+            info = f"R{cell['row']}C{cell['col']}"
+            if cell.get('rowspan', 1) > 1:
+                info += f" rs{cell['rowspan']}"
+            if cell.get('colspan', 1) > 1:
+                info += f" cs{cell['colspan']}"
+            
+            # 居中显示
+            font_scale = 0.5
+            thickness = 1
+            (tw, th), _ = cv2.getTextSize(info, cv2.FONT_HERSHEY_SIMPLEX, font_scale, thickness)
+            tx = x1 + (x2 - x1 - tw) // 2
+            ty = y1 + (y2 - y1 + th) // 2
+            
+            # 描边以增加可读性
+            cv2.putText(vis, info, (tx, ty), cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 0, 0), thickness + 2)
+            cv2.putText(vis, info, (tx, ty), cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 255, 255), thickness)
+            
+        cv2.imwrite(output_path, vis)
+        logger.info(f"表格结构可视化: {output_path}")
+    
+    @staticmethod
+    def visualize_with_text(
+        image: np.ndarray,
+        bboxes: List[List[float]],
+        texts: List[str],
+        output_path: Optional[str] = None
+    ) -> np.ndarray:
+        """
+        可视化单元格及其文本内容
+        
+        Args:
+            image: 原始图像
+            bboxes: 单元格坐标列表
+            texts: 文本列表
+            output_path: 输出路径(可选)
+            
+        Returns:
+            可视化图像
+        """
+        vis_img = image.copy()
+        if len(vis_img.shape) == 2:
+            vis_img = cv2.cvtColor(vis_img, cv2.COLOR_GRAY2BGR)
+        
+        for idx, (bbox, text) in enumerate(zip(bboxes, texts)):
+            x1, y1, x2, y2 = map(int, bbox)
+            
+            # 有文本用绿色,无文本用红色
+            color = (0, 255, 0) if text else (0, 0, 255)
+            cv2.rectangle(vis_img, (x1, y1), (x2, y2), color, 2)
+            
+            # 显示文本预览(最多10个字符)
+            preview = text[:10] + "..." if len(text) > 10 else text
+            if preview:
+                cv2.putText(
+                    vis_img, preview,
+                    (x1 + 2, y1 + 15),
+                    cv2.FONT_HERSHEY_SIMPLEX, 0.35, (255, 0, 0), 1
+                )
+        
+        if output_path:
+            cv2.imwrite(output_path, vis_img)
+            logger.info(f"文本填充可视化已保存: {output_path}")
+        
+        return vis_img
+
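`visualize_grid_structure` centers each label by offsetting from the cell origin by half the leftover width and height; the `+ th` in the y term accounts for OpenCV's `putText` anchoring text at its bottom-left baseline. The arithmetic alone, with a hypothetical helper name:

```python
from typing import Tuple

def center_text_origin(bbox: Tuple[int, int, int, int],
                       text_size: Tuple[int, int]) -> Tuple[int, int]:
    """Bottom-left origin that centers text of (width, height) = text_size
    inside bbox, matching the tx/ty computation in visualize_grid_structure."""
    x1, y1, x2, y2 = bbox
    tw, th = text_size
    tx = x1 + (x2 - x1 - tw) // 2  # half the horizontal slack
    ty = y1 + (y2 - y1 + th) // 2  # half the vertical slack, plus text height
    return tx, ty

# A 40x10 label inside a 100x40 cell anchored at the origin
print(center_text_origin((0, 0, 100, 40), (40, 10)))
```

In the module itself, `text_size` would come from `cv2.getTextSize`, which also returns a baseline offset that this sketch (like the original code) ignores.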

+ 2 - 1
ocr_utils/pdf_utils.py

@@ -145,7 +145,8 @@ class PDFUtils:
                     'img_pil': img_dict['img_pil'],
                     'scale': img_dict.get('scale', dpi / 72),
                     'source_path': str(document_path),
-                    'page_idx': idx  # original page index
+                    'page_idx': idx,  # original page index
+                    'page_name': f"{document_path.stem}_page_{idx + 1:03d}"
                 })
                 
         elif document_path.suffix.lower() in ['.png', '.jpg', '.jpeg', '.bmp', '.tiff', '.tif']:

+ 2 - 0
table_line_generator/backend/api/editor.py

@@ -71,6 +71,8 @@ async def upload_files(
         
         
     except ValueError as e:
         logger.error(f"Upload processing failed: {e}")
+        import traceback
+        logger.error(traceback.format_exc())
         raise HTTPException(status_code=400, detail=str(e))
     except Exception as e:
         logger.exception(f"Upload processing exception: {e}")

+ 2 - 2
table_line_generator/core/table_analyzer.py

@@ -232,7 +232,7 @@ class TableAnalyzer:
             'horizontal_lines': horizontal_lines,
             'vertical_lines': vertical_lines,
             'row_height': self.row_height,
-            'col_widths': self.col_widths,
+            'col_widths': [int(round(c)) for c in self.col_widths],
             'table_bbox': self._get_table_bbox(),
             'total_rows': actual_rows,
             'total_cols': actual_cols,
@@ -302,7 +302,7 @@ class TableAnalyzer:
             'horizontal_lines': horizontal_lines,
             'vertical_lines': vertical_lines,
             'row_height': self.row_height,
-            'col_widths': self.col_widths,
+            'col_widths': [int(round(c)) for c in self.col_widths],
             'table_bbox': self._get_table_bbox(),
             'mode': 'fixed',
             'modified_h_lines': [],

Some files were not shown because too many files changed in this diff