@@ -0,0 +1,537 @@
+"""
+Grid structure recovery module.
+
+Provides functionality for extracting table cells from detected table lines and
+recovering the table's grid structure.
+"""
+import math
+import os
+from typing import Dict, List, Optional
+
+import cv2
+import numpy as np
+from loguru import logger
+
+
+class GridRecovery:
+    """Utility class for recovering table grid structure."""
+
+    @staticmethod
+    def compute_cells_from_lines(
+        hpred_up: np.ndarray,
+        vpred_up: np.ndarray,
+        upscale: float = 1.0,
+        debug_dir: Optional[str] = None,
+        debug_prefix: str = "",
+    ) -> List[List[float]]:
+ """
|
|
|
+ 基于矢量重构的连通域分析 (Advanced Vector-based Recovery)
|
|
|
+
|
|
|
+ 策略 (自定义增强版):
|
|
|
+ 1. 预处理:自适应形态学闭运算修复像素级断连
|
|
|
+ 2. 提取矢量线段 (get_table_line)
|
|
|
+ 3. 线段归并/连接 (adjust_lines)
|
|
|
+ 4. 几何延长线段 (Custom final_adjust_lines with larger threshold)
|
|
|
+ 5. 重绘Mask并进行连通域分析
|
|
|
+
|
|
|
+ Args:
|
|
|
+ hpred_up: 横线预测mask(上采样后)
|
|
|
+ vpred_up: 竖线预测mask(上采样后)
|
|
|
+ upscale: 上采样比例
|
|
|
+ debug_dir: 调试输出目录 (Optional)
|
|
|
+ debug_prefix: 调试文件名前缀 (Optional)
|
|
|
+
|
|
|
+ Returns:
|
|
|
+ 单元格bbox列表 [[x1, y1, x2, y2], ...]
|
|
|
+ """
|
|
|
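+        # Illustrative usage (a sketch; `h_mask` and `v_mask` are placeholder names for
+        # the upscaled horizontal/vertical line masks produced by the detection model):
+        #     cells = GridRecovery.compute_cells_from_lines(h_mask, v_mask, upscale=2.0)
+        #     # cells -> [[x1, y1, x2, y2], ...] in original-image coordinates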
+
+        # Try to import MinerU's helper functions (only the basic extraction utilities)
+        try:
+            from mineru.model.table.rec.unet_table.utils_table_line_rec import (
+                get_table_line,
+                draw_lines,
+                adjust_lines
+            )
+        except ImportError:
+            logger.error("Could not import mineru utils. Please ensure MinerU is on the Python path.")
+            raise
+
+        # --- Local helper functions for robust line adjustment ---
+        # Ported from MinerU and modified to bridge larger gaps.
+
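+        # fit_line returns the coefficients (A, B, C) of the implicit line
+        # A*x + B*y + C = 0 through the two points in `p`; point_line_cor evaluates
+        # that expression for a point, so its sign indicates which side of the line
+        # the point lies on.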
+        def fit_line(p):
+            x1, y1 = p[0]
+            x2, y2 = p[1]
+            A = y2 - y1
+            B = x1 - x2
+            C = x2 * y1 - x1 * y2
+            return A, B, C
+
+        def point_line_cor(p, A, B, C):
+            x, y = p
+            r = A * x + B * y + C
+            return r
+
+        def dist_sqrt(p1, p2):
+            return np.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)
+
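+        # line_to_line extends segment `points1` toward its intersection with segment
+        # `points2` when the gap is smaller than `alpha` pixels, the extension keeps the
+        # segment roughly horizontal/vertical (angle check), and the total length stays
+        # within `max_len`.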
+        def line_to_line(points1, points2, alpha=10, angle=30, max_len=None):
+            x1, y1, x2, y2 = points1
+            ox1, oy1, ox2, oy2 = points2
+
+            # Current segment length
+            current_len = dist_sqrt((x1, y1), (x2, y2))
+
+            # If max_len is already reached, do not extend further
+            if max_len is not None and current_len >= max_len:
+                return points1
+
+            # Cap the per-step extension at the current segment length ("step limit")
+            # to avoid huge jumps; max_len bounds the total length the segment may reach.
+            step_limit = current_len
+            effective_alpha = min(alpha, step_limit)
+
+            # Fit both segments as implicit lines
+            xy = np.array([(x1, y1), (x2, y2)], dtype="float32")
+            A1, B1, C1 = fit_line(xy)
+            oxy = np.array([(ox1, oy1), (ox2, oy2)], dtype="float32")
+            A2, B2, C2 = fit_line(oxy)
+
+            flag1 = point_line_cor(np.array([x1, y1], dtype="float32"), A2, B2, C2)
+            flag2 = point_line_cor(np.array([x2, y2], dtype="float32"), A2, B2, C2)
+
+            # If both endpoints lie on the same side (the segments do not cross), try to extend
+            if (flag1 > 0 and flag2 > 0) or (flag1 < 0 and flag2 < 0):
+                if (A1 * B2 - A2 * B1) != 0:
+                    # Intersection point of the two fitted lines
+                    x = (B1 * C2 - B2 * C1) / (A1 * B2 - A2 * B1)
+                    y = (A2 * C1 - A1 * C2) / (A1 * B2 - A2 * B1)
+                    p = (x, y)
+                    r0 = dist_sqrt(p, (x1, y1))
+                    r1 = dist_sqrt(p, (x2, y2))
+
+                    if min(r0, r1) < effective_alpha:
+                        # Check the total-length constraint
+                        if max_len is not None:
+                            # Estimate the length after extension
+                            if r0 < r1:  # extending (x1, y1) -> p
+                                new_len = dist_sqrt(p, (x2, y2))
+                            else:  # extending (x2, y2) -> p
+                                new_len = dist_sqrt((x1, y1), p)
+
+                            if new_len > max_len:
+                                return points1
+
+                        if r0 < r1:
+                            k = abs((y2 - p[1]) / (x2 - p[0] + 1e-10))
+                            a = math.atan(k) * 180 / math.pi
+                            if a < angle or abs(90 - a) < angle:
+                                points1 = np.array([p[0], p[1], x2, y2], dtype="float32")
+                        else:
+                            k = abs((y1 - p[1]) / (x1 - p[0] + 1e-10))
+                            a = math.atan(k) * 180 / math.pi
+                            if a < angle or abs(90 - a) < angle:
+                                points1 = np.array([x1, y1, p[0], p[1]], dtype="float32")
+            return points1
+
+        def custom_final_adjust_lines(rowboxes, colboxes, alpha=50):
+            nrow = len(rowboxes)
+            ncol = len(colboxes)
+
+            # Pre-calculate the maximum allowed length per line (original length * multiplier).
+            # A multiplier of 3.0 lets a line grow to at most three times its original length,
+            # which stops short noise segments from turning into page-height lines.
+            extension_multiplier = 3.0
+
+            row_max_lens = [dist_sqrt(b[:2], b[2:]) * extension_multiplier for b in rowboxes]
+            col_max_lens = [dist_sqrt(b[:2], b[2:]) * extension_multiplier for b in colboxes]
+
+            for i in range(nrow):
+                for j in range(ncol):
+                    rowboxes[i] = line_to_line(rowboxes[i], colboxes[j], alpha=alpha, angle=30, max_len=row_max_lens[i])
+                    colboxes[j] = line_to_line(colboxes[j], rowboxes[i], alpha=alpha, angle=30, max_len=col_max_lens[j])
+            return rowboxes, colboxes
+
+        def save_debug_image(step_name, img, is_lines=False, lines=None):
+            if debug_dir:
+                try:
+                    os.makedirs(debug_dir, exist_ok=True)
+                    name = f"{debug_prefix}_{step_name}.png" if debug_prefix else f"{step_name}.png"
+                    path = os.path.join(debug_dir, name)
+
+                    if is_lines and lines:
+                        # Draw lines on a black background
+                        tmp = np.zeros(img.shape[:2], dtype=np.uint8)
+                        tmp = draw_lines(tmp, lines, color=255, lineW=2)
+                        cv2.imwrite(path, tmp)
+                    else:
+                        cv2.imwrite(path, img)
+                    logger.debug(f"Saved debug image: {path}")
+                except Exception as e:
+                    logger.warning(f"Failed to save debug image {step_name}: {e}")
+
+        # ---------------------------------------------------------
+
+        h, w = hpred_up.shape[:2]
+
+        # 1. Pre-processing: binarization
+        _, h_bin = cv2.threshold(hpred_up, 127, 255, cv2.THRESH_BINARY)
+        _, v_bin = cv2.threshold(vpred_up, 127, 255, cv2.THRESH_BINARY)
+
+        # 1.1 Adaptive morphological repair: close small gaps along each direction
+        hors_k = int(math.sqrt(w) * 1.2)
+        vert_k = int(math.sqrt(h) * 1.2)
+        hors_k = max(10, min(hors_k, 50))
+        vert_k = max(10, min(vert_k, 50))
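+        # e.g. for a 1600 px wide mask: int(sqrt(1600) * 1.2) = 48, clamped to [10, 50]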
+
+        kernel_h = cv2.getStructuringElement(cv2.MORPH_RECT, (hors_k, 1))
+        kernel_v = cv2.getStructuringElement(cv2.MORPH_RECT, (1, vert_k))
+
+        h_bin = cv2.morphologyEx(h_bin, cv2.MORPH_CLOSE, kernel_h, iterations=1)
+        v_bin = cv2.morphologyEx(v_bin, cv2.MORPH_CLOSE, kernel_v, iterations=1)
+
+        # 2. Extract vector line segments
+        rowboxes = get_table_line(h_bin, axis=0, lineW=10)
+        colboxes = get_table_line(v_bin, axis=1, lineW=10)
+
+        logger.debug(f"Initial lines -> Rows: {len(rowboxes)}, Cols: {len(colboxes)}")
+
+        # Step 2 debug
+        save_debug_image("step02_raw_vectors", h_bin, is_lines=True, lines=rowboxes + colboxes)
+
+        # 3. Merge line segments (adjust_lines)
+        rboxes_row_ = adjust_lines(rowboxes, alph=100, angle=50)
+        rboxes_col_ = adjust_lines(colboxes, alph=15, angle=50)
+
+        if rboxes_row_:
+            rowboxes += rboxes_row_
+        if rboxes_col_:
+            colboxes += rboxes_col_
+
+        # Step 3 debug
+        save_debug_image("step03_merged_vectors", h_bin, is_lines=True, lines=rowboxes + colboxes)
+
+        # 3.5 Filter short lines (noise filtering)
+        # Before extending segments, drop segments that are too short (usually noise,
+        # text underlines and similar artefacts).
+        # Threshold: min(w, h) * 0.02, with a floor of 20 px.
+        filter_threshold = max(20, min(w, h) * 0.02)
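+        # e.g. for a 2000 x 1500 mask: max(20, 1500 * 0.02) = 30 px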
+
+        def filter_short_lines(lines, thresh):
+            valid_lines = []
+            for line in lines:
+                x1, y1, x2, y2 = line
+                length = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
+                if length > thresh:
+                    valid_lines.append(line)
+            return valid_lines
+
+        len_row_before = len(rowboxes)
+        len_col_before = len(colboxes)
+
+        rowboxes = filter_short_lines(rowboxes, filter_threshold)
+        colboxes = filter_short_lines(colboxes, filter_threshold)
+
+        if len(rowboxes) < len_row_before or len(colboxes) < len_col_before:
+            logger.info(f"Filtered short lines (thresh={filter_threshold:.1f}): Rows {len_row_before}->{len(rowboxes)}, Cols {len_col_before}->{len(colboxes)}")
+            # Optional: save the filtered state
+            save_debug_image("step03b_filtered_vectors", h_bin, is_lines=True, lines=rowboxes + colboxes)
+
+        # 4. Geometrically extend line segments (custom function with a larger threshold).
+        # The extension threshold is dynamic: 5% of the smaller image dimension, with a
+        # floor of 50 px, so that fairly large breaks are still bridged at high resolution.
+        dynamic_alpha = max(50, int(min(w, h) * 0.05))
+        logger.info(f"Using dynamic alpha for line extension: {dynamic_alpha}")
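+        # e.g. for a 2000 x 1500 mask: max(50, int(1500 * 0.05)) = 75 px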
+
+        rowboxes, colboxes = custom_final_adjust_lines(rowboxes, colboxes, alpha=dynamic_alpha)
+
+        # Step 4 debug
+        save_debug_image("step04_extended_vectors", h_bin, is_lines=True, lines=rowboxes + colboxes)
+
+        # 5. Re-rasterize a clean mask from the adjusted vectors
+        line_mask = np.zeros((h, w), dtype=np.uint8)
+        # Line width 4 so crossing lines are guaranteed to touch
+        line_mask = draw_lines(line_mask, rowboxes + colboxes, color=255, lineW=4)
+
+        # Step 5a debug (before dilation)
+        save_debug_image("step05a_rerasterized", line_mask)
+
+        # Enhancement: slight global dilation
+        kernel_dilate = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
+        line_mask = cv2.dilate(line_mask, kernel_dilate, iterations=1)
+
+        # Step 5b debug (after dilation)
+        save_debug_image("step05b_dilated", line_mask)
+
+        # 6. Invert the mask so cell interiors become foreground
+        inv_grid = cv2.bitwise_not(line_mask)
+
+        # Step 6 debug (input to connected components)
+        save_debug_image("step06_inverted_input", inv_grid)
+
+        # 7. Connected-component analysis (label 0 is the background)
+        num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(inv_grid, connectivity=8)
+
+        bboxes = []
+
+        # 8. Filter connected components
+        for i in range(1, num_labels):
+            x = stats[i, cv2.CC_STAT_LEFT]
+            y = stats[i, cv2.CC_STAT_TOP]
+            w_cell = stats[i, cv2.CC_STAT_WIDTH]
+            h_cell = stats[i, cv2.CC_STAT_HEIGHT]
+            area = stats[i, cv2.CC_STAT_AREA]
+
+            if w_cell > w * 0.98 and h_cell > h * 0.98:
+                continue
+            if area < 50:
+                continue
+
+            orig_h = h_cell / upscale
+            orig_w = w_cell / upscale
+
+            if orig_h < 4.0 or orig_w < 4.0:
+                continue
+
+            bboxes.append([
+                x / upscale,
+                y / upscale,
+                (x + w_cell) / upscale,
+                (y + h_cell) / upscale
+            ])
+
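+        # Sort roughly top-to-bottom (y binned to 10 px bands), then left-to-right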
+        bboxes.sort(key=lambda b: (int(b[1] / 10), b[0]))
+
+        logger.info(f"Vector reconstruction extracted {len(bboxes)} cells (dynamic alpha: {dynamic_alpha})")
+
+        return bboxes
+
+    @staticmethod
+    def find_grid_lines(coords: List[float], tolerance: float = 5.0, min_support: int = 2) -> List[float]:
+        """
+        Cluster coordinate values and keep only well-supported clusters as grid lines.
+
+        Args:
+            coords: List of coordinate values.
+            tolerance: Clustering tolerance in pixels.
+            min_support: Minimum support (how many coordinates must align to form a grid line).
+
+        Returns:
+            List of grid-line coordinates (cluster centers).
+        """
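+        # Illustrative example (made-up values): with tolerance=5 and min_support=2,
+        # [10.1, 10.4, 50.0, 50.2, 199.0] -> [10.25, 50.1]; the lone 199.0 is dropped.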
+        if not coords:
+            return []
+
+        # Work on a sorted copy so the caller's list is not mutated
+        coords = sorted(coords)
+
+        # 1. Simple 1-D clustering: group consecutive values closer than the tolerance
+        clusters = []
+        curr_cluster = [coords[0]]
+        for x in coords[1:]:
+            if x - curr_cluster[-1] < tolerance:
+                curr_cluster.append(x)
+            else:
+                clusters.append(curr_cluster)
+                curr_cluster = [x]
+        clusters.append(curr_cluster)
+
+        # 2. Compute cluster centers and keep clusters with enough support
+        grid_lines = []
+        for cluster in clusters:
+            if len(cluster) >= min_support:
+                center = sum(cluster) / len(cluster)
+                grid_lines.append(center)
+
+        return grid_lines
+
+    @staticmethod
+    def recover_grid_structure(bboxes: List[List[float]]) -> List[Dict]:
+        """
+        Recover the table's row/column structure (row, col, rowspan, colspan) from a set
+        of unordered cell bounding boxes.
+
+        Reworked version based on projected grid lines, suited to complex tables with
+        widely varying row heights and densely packed small rows.
+
+        Args:
+            bboxes: List of cell bounding boxes.
+
+        Returns:
+            List of structured cells, each containing row, col, rowspan and colspan.
+        """
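+        # Each returned entry has the shape
+        #     {"bbox": [x1, y1, x2, y2], "row": 0, "col": 1, "rowspan": 1, "colspan": 2}
+        # (index values shown are illustrative).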
+        if not bboxes:
+            return []
+
+        # 1. Identify row dividers (Y axis)
+        y_coords = []
+        for b in bboxes:
+            y_coords.append(b[1])
+            y_coords.append(b[3])
+
+        row_dividers = GridRecovery.find_grid_lines(y_coords, tolerance=5, min_support=2)
+
+        # 2. Identify column dividers (X axis)
+        x_coords = []
+        for b in bboxes:
+            x_coords.append(b[0])
+            x_coords.append(b[2])
+
+        col_dividers = GridRecovery.find_grid_lines(x_coords, tolerance=5, min_support=2)
+
+        # Degenerate case: fewer than two dividers in either direction means no grid
+        # intervals can be formed (and the matching below would fail), so bail out early.
+        if len(row_dividers) < 2 or len(col_dividers) < 2:
+            return []
+
+        # 3. Build the grid structure
+        structured_cells = []
+
+        # Define row intervals
+        row_intervals = []
+        for i in range(len(row_dividers) - 1):
+            row_intervals.append({
+                "top": row_dividers[i],
+                "bottom": row_dividers[i + 1],
+                "height": row_dividers[i + 1] - row_dividers[i],
+                "index": i
+            })
+
+        # Define column intervals
+        col_intervals = []
+        for i in range(len(col_dividers) - 1):
+            col_intervals.append({
+                "left": col_dividers[i],
+                "right": col_dividers[i + 1],
+                "width": col_dividers[i + 1] - col_dividers[i],
+                "index": i
+            })
+
+        for bbox in bboxes:
+            b_top, b_bottom = bbox[1], bbox[3]
+            b_left, b_right = bbox[0], bbox[2]
+            b_h = b_bottom - b_top
+            b_w = b_right - b_left
+
+            # Match rows
+            matched_rows = []
+            for r in row_intervals:
+                inter_top = max(b_top, r["top"])
+                inter_bottom = min(b_bottom, r["bottom"])
+                inter_h = max(0, inter_bottom - inter_top)
+
+                if r["height"] > 0 and (inter_h / r["height"] > 0.5 or inter_h / b_h > 0.5):
+                    matched_rows.append(r["index"])
+
+            if not matched_rows:
+                cy = (b_top + b_bottom) / 2
+                closest_r = min(row_intervals, key=lambda r: abs((r["top"] + r["bottom"]) / 2 - cy))
+                matched_rows = [closest_r["index"]]
+
+            row_start = min(matched_rows)
+            row_end = max(matched_rows)
+            rowspan = row_end - row_start + 1
+
+            # Match columns
+            matched_cols = []
+            for c in col_intervals:
+                inter_left = max(b_left, c["left"])
+                inter_right = min(b_right, c["right"])
+                inter_w = max(0, inter_right - inter_left)
+
+                if c["width"] > 0 and (inter_w / c["width"] > 0.5 or inter_w / b_w > 0.5):
+                    matched_cols.append(c["index"])
+
+            if not matched_cols:
+                cx = (b_left + b_right) / 2
+                closest_c = min(col_intervals, key=lambda c: abs((c["left"] + c["right"]) / 2 - cx))
+                matched_cols = [closest_c["index"]]
+
+            col_start = min(matched_cols)
+            col_end = max(matched_cols)
+            colspan = col_end - col_start + 1
+
+            structured_cells.append({
+                "bbox": bbox,
+                "row": row_start,
+                "col": col_start,
+                "rowspan": rowspan,
+                "colspan": colspan
+            })
+
+        # Sort by row, then by column
+        structured_cells.sort(key=lambda c: (c["row"], c["col"]))
+
+        # Compress the grid (remove empty rows and columns)
+        structured_cells = GridRecovery.compress_grid(structured_cells)
+
+        return structured_cells
+
+    @staticmethod
+    def compress_grid(cells: List[Dict]) -> List[Dict]:
+        """
+        Compress grid indices by removing empty rows and columns.
+
+        Args:
+            cells: List of structured cells.
+
+        Returns:
+            List of cells with compressed row/column indices and spans.
+        """
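+        # Illustrative example: if cells only ever start in rows {0, 2}, row 1 is empty,
+        # so a cell at row 2 is remapped to row 1 and spans are shrunk accordingly.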
+        if not cells:
+            return []
+
+        # 1. Compute the current grid extent (maximum row/column index)
+        max_row = 0
+        max_col = 0
+        for cell in cells:
+            max_row = max(max_row, cell["row"] + cell.get("rowspan", 1))
+            max_col = max(max_col, cell["col"] + cell.get("colspan", 1))
+
+        # 2. Mark which rows/columns have at least one cell starting in them
+        row_occupied = [False] * max_row
+        col_occupied = [False] * max_col
+
+        for cell in cells:
+            if cell["row"] < max_row:
+                row_occupied[cell["row"]] = True
+            if cell["col"] < max_col:
+                col_occupied[cell["col"]] = True
+
+        # 3. Build index mapping tables (old index -> compressed index)
+        row_map = [0] * (max_row + 1)
+        current_row = 0
+        for r in range(max_row):
+            if row_occupied[r]:
+                current_row += 1
+            row_map[r + 1] = current_row
+
+        col_map = [0] * (max_col + 1)
+        current_col = 0
+        for c in range(max_col):
+            if col_occupied[c]:
+                current_col += 1
+            col_map[c + 1] = current_col
+
+        # 4. Update cell indices and spans
+        new_cells = []
+        for cell in cells:
+            new_cell = cell.copy()
+
+            old_r1 = cell["row"]
+            old_r2 = old_r1 + cell.get("rowspan", 1)
+            new_r1 = row_map[old_r1]
+            new_r2 = row_map[old_r2]
+
+            old_c1 = cell["col"]
+            old_c2 = old_c1 + cell.get("colspan", 1)
+            new_c1 = col_map[old_c1]
+            new_c2 = col_map[old_c2]
+
+            new_span_r = max(1, new_r2 - new_r1)
+            new_span_c = max(1, new_c2 - new_c1)
+
+            new_cell["row"] = new_r1
+            new_cell["col"] = new_c1
+            new_cell["rowspan"] = new_span_r
+            new_cell["colspan"] = new_span_c
+
+            new_cells.append(new_cell)
+
+        return new_cells