4 days ago · 7b6a80f651
--- a/table_line_generator/README.md
+++ b/table_line_generator/README.md
@@ -0,0 +1,347 @@
 
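+# 📏 Table Line Generator
+
+> Automatically generates table lines from OCR bbox results to improve recognition accuracy on borderless tables
+
+## 🎯 Overview
+
+The tool infers the table structure from the positions of OCR text boxes (bbox), draws standard table lines onto the image, and then runs PPStructure's bordered-table recognition on the result. The aim is higher accuracy than recognizing the borderless table directly.
+
+### Core features
+
+- ✅ **Automatic detection**: derives the row and column structure from OCR bboxes
+- ✅ **Visual adjustment**: a Streamlit UI for fine-tuning line positions by hand
+- ✅ **Template reuse**: apply an annotated table structure to other pages of the same type
+- ✅ **Batch processing**: apply a template to an entire folder in one step
+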
			
 
+## 📋 Full workflow
+
+```
+OCR (bbox) → automatic analysis → manual adjustment → save template → batch apply → PPStructure recognition
+```
			
 
				+
			
 
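+## 🚀 Quick start
+
+### 1️⃣ Prepare the data
+
+Make sure you have the following files:
+- **OCR result JSON**: contains the text box coordinates (bbox)
+- **Matching image**: PNG or JPG
+
+**Supported OCR formats:**
+- PaddleOCR PPStructure V3 format
+- Standard OCR format (a list of items with `text` and `bbox` fields)
+
+### 2️⃣ Open the visual editor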
			
 
				+
			
 
				+```bash
			
 
				+streamlit run streamlit_table_line_editor.py
			
 
				+```
			
 
				+
			
 
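+### 3️⃣ Create a new annotation (first page)
+
+#### **Select the mode**
+In the sidebar, select **🆕 新建标注** (new annotation).
+
+#### **Upload files**
+- Upload the **OCR result JSON** (e.g. `康强_page_001.json`)
+- Upload the **matching image** (e.g. `康强_page_001.png`)
+
+#### **Tune the parameters**
+- **Y-axis clustering tolerance**: controls the sensitivity of row detection (default 5)
+- **X-axis clustering tolerance**: controls the sensitivity of column detection (default 10)
+- **Minimum row height**: filters out rows that are too short (default 20)
+
+#### **Analyze and adjust**
+1. Click **🔍 分析表格结构** (analyze table structure)
+2. Review the detection result (number of rows, columns, horizontal lines, and vertical lines)
+3. Use **🛠️ 手动调整** (manual adjustment) to fine-tune:
+   - Move horizontal/vertical lines
+   - Add horizontal/vertical lines
+   - Delete horizontal/vertical lines
+
+#### **Save the configuration**
+1. Check **保存表格结构配置** (save table structure configuration)
+2. Check **保存表格线图片** (save table line image), optional
+3. Choose the line color (black/blue/red)
+4. Click **💾 保存** (save)
+
+**Generated files:**
+- `康强_page_001_structure.json`: table structure configuration
+- `康强_page_001_with_lines.png`: image with the table lines drawn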
			
 
				+
			
 
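+### 4️⃣ Load an existing annotation (continue adjusting)
+
+#### **Select the mode**
+In the sidebar, select **📂 加载已有标注** (load existing annotation).
+
+#### **Upload files**
+- Upload the **configuration file** (e.g. `康强_page_001_structure.json`)
+- Upload the **matching image** (optional, used to preview the result)
+
+#### **Continue adjusting**
+- Modify the table lines with the same adjustment tools
+- Save the updated configuration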
			
 
				+
			
 
+### 5️⃣ Apply the template in batch
+
+Apply the annotated table structure to other pages of the same type.
+
+#### **Command line**
+
+**Single-file mode:**
			
 
				+```bash
			
 
				+python table_template_applier.py \
			
 
				+  --template output/table_structures/康强_page_001_structure.json \
			
 
				+  --image-file data/康强_page_002.png \
			
 
				+  --json-file data/康强_page_002.json \
			
 
				+  --output-dir output/batch_results \
			
 
				+  --width 2 \
			
 
				+  --color black
			
 
				+```
			
 
				+
			
 
+**Batch mode:**
			
 
				+```bash
			
 
				+python table_template_applier.py \
			
 
				+  --template output/table_structures/康强_page_001_structure.json \
			
 
				+  --image-dir data/images \
			
 
				+  --json-dir data/jsons \
			
 
				+  --output-dir output/batch_results \
			
 
				+  --width 2 \
			
 
				+  --color black
			
 
				+```
			
 
				+
			
 
+#### **Quick run without arguments**
+
+Running the script with no arguments uses the default configuration:
			
 
				+```bash
			
 
				+python table_template_applier.py
			
 
				+```
			
 
				+
			
 
+### 6️⃣ Recognize the bordered table with PPStructure
			
 
				+
			
 
				+```bash
			
 
				+python batch_ocr/batch_process_pdf.py \
			
 
				+  -p ppstructv3 \
			
 
				+  -f output/batch_results/image_list.txt \
			
 
				+  -o output/ppstructv3_results
			
 
				+```
			
 
				+
			
 
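+## 📊 Parameters
+
+### Table structure analysis parameters
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| Y-axis clustering tolerance | 5 | Tolerance for grouping nearby Y coordinates into one row (pixels) |
+| X-axis clustering tolerance | 10 | Tolerance for grouping nearby X coordinates into one column (pixels) |
+| Minimum row height | 20 | Rows shorter than this are filtered out (pixels) |
+
+### Drawing parameters
+
+| Parameter | Options | Description |
+|-----------|---------|-------------|
+| Line width | 1-5 | Thickness of the table lines |
+| Line color | black/blue/red | Line color used when saving |
+
+### Display parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| Display mode | Side-by-side comparison / lined image only / original image only |
+| Image zoom | 0.25x - 2.0x |
+| Show line numbers | Whether to show labels such as R1, C1 |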
			
 
				+
			
 
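+## 🎨 Feature details
+
+### Automatic detection algorithm
+
+**Row detection:**
+1. Extract the Y coordinates of all text boxes
+2. Sort by Y coordinate
+3. Cluster Y coordinates that lie within the tolerance of each other
+4. Filter out rows that are too short
+
+**Column detection:**
+1. Extract the X coordinates of all text boxes (left and right edges)
+2. Cluster nearby X coordinates
+3. Generate the column boundary lines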
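
Both detection passes reduce to clustering coordinates along one axis. The sketch below illustrates that idea with a simple chained-neighbour rule. The clustering inside `TableLineGenerator` is not shown in this diff, so treat this as an assumption about the approach; `cluster_1d` and `cluster_centers` are hypothetical names.

```python
from typing import List


def cluster_1d(values: List[float], tolerance: float) -> List[List[float]]:
    """Group values so that sorted neighbours within `tolerance` share a cluster."""
    clusters: List[List[float]] = []
    for value in sorted(values):
        # Compare against the last member of the current cluster (chained rule).
        if clusters and value - clusters[-1][-1] <= tolerance:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return clusters


def cluster_centers(values: List[float], tolerance: float) -> List[float]:
    """Return the mean of each cluster, one candidate row or column position each."""
    return [sum(c) / len(c) for c in cluster_1d(values, tolerance)]


if __name__ == "__main__":
    # Top edges of six text boxes; a tolerance of 5 matches the documented Y default.
    y_tops = [100, 102, 104, 160, 163, 220]
    print(cluster_centers(y_tops, tolerance=5))  # [102.0, 161.5, 220.0]
```

With a chained rule, a long run of closely spaced values can merge into one cluster. If that happens, lowering the tolerance splits it again.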
			
 
				+
			
 
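+**Row height calculation:**
+- Exclude the header row (it is usually shorter than the data rows)
+- Take the **median** of the data row heights
+- Use it to estimate the number of rows during batch application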
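
The row height rule can be sketched as follows. This assumes the header occupies the first row interval; `estimate_row_height` and its `header_rows` parameter are hypothetical and not part of the tool's shown API.

```python
from statistics import median
from typing import List


def estimate_row_height(horizontal_lines: List[float], header_rows: int = 1) -> float:
    """Median height of the data rows, skipping the first `header_rows` intervals."""
    heights = [bottom - top for top, bottom in zip(horizontal_lines, horizontal_lines[1:])]
    data_heights = heights[header_rows:]
    if not data_heights:
        raise ValueError("no data rows left after excluding the header")
    return median(data_heights)


if __name__ == "__main__":
    # Intervals are 35 (header), 59, 59 and 62; the median of the data rows is 59.
    print(estimate_row_height([700, 735, 794, 853, 915]))  # 59
```

The median is less sensitive than the mean to a single unusually tall row, such as one containing wrapped text.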
			
 
				+
			
 
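+### How template application works
+
+**Fixed parameters (inherited from the template):**
+- ✅ Header height
+- ✅ Data row height
+- ✅ List of column widths
+- ✅ Relative column positions
+
+**Variable parameters (detected per page):**
+- 🔄 Table start position (anchor)
+- 🔄 Number of rows (computed from the page content)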
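
One way to picture the split between fixed and variable parameters is the sketch below. It assumes the anchor and the bottom of the page content have already been detected from OCR. `apply_template` is a hypothetical helper, not the implementation in `table_template_applier.py`, which this diff does not show.

```python
import math
from typing import Dict, List


def apply_template(template: Dict, anchor_x: float, anchor_y: float,
                   content_bottom: float) -> Dict[str, List[float]]:
    """Rebuild line coordinates for a new page from fixed template sizes."""
    header_h = template["header_height"]
    row_h = template["row_height"]
    if row_h <= 0:
        raise ValueError("row_height must be positive")

    # Variable part: how many data rows fit between the header and the content bottom.
    body_height = max(0, content_bottom - (anchor_y + header_h))
    num_rows = math.ceil(body_height / row_h)

    # Fixed part: header height and row height come straight from the template.
    horizontal = [anchor_y, anchor_y + header_h]
    horizontal += [anchor_y + header_h + row_h * i for i in range(1, num_rows + 1)]

    # Column widths are fixed; only the left edge moves with the anchor.
    vertical = [anchor_x]
    for width in template["col_widths"]:
        vertical.append(vertical[-1] + width)

    return {"horizontal_lines": horizontal, "vertical_lines": vertical}


if __name__ == "__main__":
    template = {"header_height": 35, "row_height": 59, "col_widths": [130, 128]}
    print(apply_template(template, anchor_x=50, anchor_y=700, content_bottom=900))
```

Rounding the row count up means the last partially filled row still gets a closing line. Rounding down would be the alternative if trailing text should be excluded.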
			
 
				+
			
 
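+### Manual adjustment
+
+| Feature | Description |
+|---------|-------------|
+| **Move horizontal line** | Change the Y coordinate of a specific horizontal line |
+| **Move vertical line** | Change the X coordinate of a specific vertical line |
+| **Add horizontal line** | Insert a new horizontal line at a given position |
+| **Delete horizontal lines** | Delete the selected horizontal lines in one step |
+| **Add vertical line** | Insert a new vertical line at a given position |
+| **Delete vertical lines** | Delete the selected vertical lines in one step |
+| **Undo/redo** | Step back and forward through multiple operations |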
			
 
				+
			
 
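+## 📁 Output files
+
+### Configuration file (`*_structure.json`)
+
+```json
+{
+  "rows": [...],                    // list of row intervals
+  "columns": [...],                 // list of column intervals
+  "horizontal_lines": [700, 735, ...],  // Y coordinates of the horizontal lines
+  "vertical_lines": [50, 180, ...],     // X coordinates of the vertical lines
+  "header_height": 35,              // header height
+  "row_height": 59,                 // data row height
+  "col_widths": [130, 128, ...],    // width of each column
+  "table_bbox": [x1, y1, x2, y2],   // table bounding box
+  "modified_h_lines": [0, 5],       // indices of modified horizontal lines
+  "modified_v_lines": [2]           // indices of modified vertical lines
+}
+```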
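
The sample above is annotated for reading: the `//` comments and `...` elisions are not valid JSON, so a real file contains neither. The editor keeps the modified-line indices as Python sets in memory, while the file stores them as lists. Below is a minimal round-trip sketch; `structure_to_json` and `structure_from_json` are hypothetical helper names.

```python
import json
from typing import Dict

SET_FIELDS = ("modified_h_lines", "modified_v_lines")


def structure_to_json(structure: Dict) -> str:
    """Serialize a structure, turning set-valued fields into sorted lists."""
    serializable = dict(structure)
    for key in SET_FIELDS:
        serializable[key] = sorted(structure.get(key, []))
    return json.dumps(serializable, ensure_ascii=False, indent=2)


def structure_from_json(text: str) -> Dict:
    """Parse a structure and restore the set-valued fields."""
    structure = json.loads(text)
    for key in SET_FIELDS:
        structure[key] = set(structure.get(key, []))
    return structure


if __name__ == "__main__":
    original = {
        "horizontal_lines": [700, 735, 794],
        "vertical_lines": [50, 180, 308],
        "modified_h_lines": {0, 2},
        "modified_v_lines": set(),
    }
    restored = structure_from_json(structure_to_json(original))
    print(restored == original)  # True
```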
			
 
				+
			
 
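+### Table line image (`*_with_lines.png`)
+
+- **Preview mode**: colored lines with labels (red = modified, blue = original)
+- **Save mode**: lines in a single color (black by default), no labels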
			
 
				+
			
 
+### Batch results (`batch_results.json`)
			
 
				+
			
 
				+```json
			
 
				+[
			
 
				+  {
			
 
				+    "source": "/path/to/page_002.png",
			
 
				+    "output": "/path/to/page_002_with_lines.png",
			
 
				+    "structure": "/path/to/page_002_structure.json",
			
 
				+    "num_rows": 15,
			
 
				+    "status": "success"
			
 
				+  }
			
 
				+]
			
 
				+```
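
After a batch run it is often useful to see which pages failed. The sketch below summarizes a list shaped like the example above. It relies only on the `source` and `status` fields; `summarize_batch_results` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple


def summarize_batch_results(results: List[Dict]) -> Tuple[Dict[str, int], List[str]]:
    """Count entries per status and list the sources that did not succeed."""
    counts = Counter(entry.get("status", "unknown") for entry in results)
    failed = [entry.get("source", "?") for entry in results
              if entry.get("status") != "success"]
    return dict(counts), failed


if __name__ == "__main__":
    sample = json.loads("""
    [
      {"source": "/path/to/page_002.png", "status": "success"},
      {"source": "/path/to/page_003.png", "status": "error", "error": "no first row"}
    ]
    """)
    counts, failed = summarize_batch_results(sample)
    print(counts)   # {'success': 1, 'error': 1}
    print(failed)   # ['/path/to/page_003.png']
```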
			
 
				+
			
 
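+## 🎯 Use cases
+
+### Case 1: Single-page annotation
+
+For documents with a fixed table structure (e.g. bank statements, reports)
+
+```
+1. Annotate page 1 → save the configuration
+2. Apply it to the other pages
+3. Run batch recognition
+```
+
+### Case 2: Managing multiple templates
+
+For documents that contain several table formats
+
+```
+1. Annotate one template per format
+2. Pick the matching template for each page type
+3. Apply each template in its own batch
+```
+
+### Case 3: Iterative refinement
+
+For cases where the first annotation is not good enough
+
+```
+1. Load the existing configuration
+2. Fine-tune the table line positions
+3. Save the updated configuration
+4. Re-run the batch application
+```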
			
 
				+
			
 
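+## 🔧 Troubleshooting
+
+### Issue 1: No rows or columns detected
+
+**Cause:**
+- The Y/X tolerance is set inappropriately
+- The OCR result is of poor quality
+
+**Fix:**
+- Increase the tolerance values
+- Use a higher-quality OCR model
+
+### Issue 2: Wrong header height
+
+**Cause:**
+- The header row is merged with a data row
+
+**Fix:**
+- Use "调整横线" (move horizontal line) to separate the header by hand
+- Adjust the minimum row height parameter
+
+### Issue 3: Lines are offset after batch application
+
+**Cause:**
+- The table starts at a different position on each page
+
+**Fix:**
+- The template re-anchors to each page's detected start position, so an offset usually means the anchor was detected wrongly
+- Check that the OCR result correctly detects the first row of the page
+
+### Issue 4: Inconsistent column widths
+
+**Cause:**
+- Text box widths differ slightly between pages
+
+**Fix:**
+- Increase the X-axis clustering tolerance moderately
+- Move the affected vertical lines by hand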
			
 
				+
			
 
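+## 📚 Advanced tips
+
+### Tip 1: Spot adjusted lines quickly
+
+In the visual editor, modified lines are drawn in red, which makes the adjustment history easy to follow.
+
+### Tip 2: Use undo
+
+Multi-step undo/redo is supported, so you can try different adjustments safely.
+
+### Tip 3: Save in sections
+
+For long documents, the configuration can be saved in sections:
+- `template_section1.json`: header part
+- `template_section2.json`: data part
+
+### Tip 4: Virtual canvas mode
+
+Even without an image, you can load a configuration file and inspect its coordinates, which is handy for checking that the configuration is correct.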
			
 
				+
			
 
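+## 🤝 Contributing
+
+Issues and pull requests are welcome!
+
+### Planned improvements
+
+- [ ] Detect merged cells
+- [ ] Handle diagonal-split header cells
+- [ ] Support irregular (non-rectangular) tables
+- [ ] Export to Excel
+- [ ] Improve command-line batch processing
+
+## 📄 License
+
+MIT License
+
+---
+
+**Author**: [Your Name]  
+**Last updated**: 2025-01-13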
			
--- a/table_line_generator/batch_apply_table_lines.py
+++ b/table_line_generator/batch_apply_table_lines.py
@@ -0,0 +1,113 @@
 
				+"""
			
 
+Apply a table structure to every page in a batch.
			
 
				+"""
			
 
				+
			
 
+import argparse
+import json
+from pathlib import Path
+from typing import Optional
+
+from table_line_generator import TableLineGenerator
+
+
+def batch_apply_table_structure(
+    source_json_path: str,
+    target_image_dir: str,
+    output_dir: str,
+    structure_config_path: Optional[str] = None
+):
			
 
				+    """
			
 
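+    Apply a table structure to a batch of images.
+
+    Args:
+        source_json_path: path to the source OCR result JSON (used to generate the initial structure)
+        target_image_dir: directory containing the target images
+        output_dir: output directory
+        structure_config_path: path to a table structure configuration (optional)
+    """
+    # 1. Load or generate the table structure
+    if structure_config_path and Path(structure_config_path).exists():
+        # Load an existing configuration
+        with open(structure_config_path, 'r', encoding='utf-8') as f:
+            structure = json.load(f)
+        print(f"📂 Loaded table structure: {structure_config_path}")
+    else:
+        # Generate a new configuration from the source OCR result
+        with open(source_json_path, 'r', encoding='utf-8') as f:
+            ocr_data = json.load(f)
+
+        # The source image is assumed to sit next to the JSON as .jpg or .png
+        source_image_path = Path(source_json_path).with_suffix('.jpg')
+        if not source_image_path.exists():
+            source_image_path = Path(source_json_path).with_suffix('.png')
+        generator = TableLineGenerator(str(source_image_path), ocr_data)
+
+        generator.analyze_table_structure()
+        # The output directory may not exist yet at this point
+        Path(output_dir).mkdir(parents=True, exist_ok=True)
+        structure = generator.save_table_structure(
+            f"{output_dir}/table_structure.json"
+        )
+        print("✅ Generated table structure configuration")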
			
 
				+    
			
 
+    # 2. Find all target images
+    target_images = list(Path(target_image_dir).glob("*.jpg"))
+    target_images.extend(list(Path(target_image_dir).glob("*.png")))
+    target_images = sorted(target_images)
+
+    print(f"📁 Found {len(target_images)} image files")
+
+    # 3. Apply the structure to each image
+    output_path = Path(output_dir)
+    output_path.mkdir(parents=True, exist_ok=True)
			
 
				+    
			
 
				+    results = []
			
 
+    for image_path in target_images:
+        try:
+            # Create a temporary generator (only used to apply the structure)
+            generator = TableLineGenerator(str(image_path), [])
+            generator.rows = structure.get('rows', [])
+            generator.columns = structure.get('columns', [])
+            generator.row_height = structure.get('row_height', 30)
+
+            # Apply the structure
+            output_file = output_path / f"{image_path.stem}_with_lines.jpg"
+            generator.apply_structure_to_image(
+                str(image_path),
+                structure,
+                str(output_file)
+            )
+
+            results.append({
+                'source': str(image_path),
+                'output': str(output_file),
+                'status': 'success'
+            })
+            print(f"✅ {image_path.name} → {output_file.name}")
+
+        except Exception as e:
+            # Record the failure and continue with the next image
+            results.append({
+                'source': str(image_path),
+                'status': 'error',
+                'error': str(e)
+            })
+            print(f"❌ {image_path.name} failed: {e}")
+
+    # Save the results (UTF-8, because file names may contain non-ASCII characters)
+    with open(output_path / "batch_results.json", 'w', encoding='utf-8') as f:
+        json.dump(results, f, indent=2, ensure_ascii=False)
+
+    success_count = sum(1 for r in results if r['status'] == 'success')
+    print(f"\n🎉 Done! Succeeded: {success_count}/{len(results)}")
			
 
				+
			
 
				+
			
 
				+if __name__ == "__main__":
			
 
+    parser = argparse.ArgumentParser(description="Apply a table structure to a batch of images")
+    parser.add_argument('-s', '--source', required=True, help="path to the source OCR result JSON")
+    parser.add_argument('-t', '--target', required=True, help="directory containing the target images")
+    parser.add_argument('-o', '--output', required=True, help="output directory")
+    parser.add_argument('-c', '--config', help="path to a table structure configuration")
			
 
				+    
			
 
				+    args = parser.parse_args()
			
 
				+    
			
 
				+    batch_apply_table_structure(
			
 
				+        args.source,
			
 
				+        args.target,
			
 
				+        args.output,
			
 
				+        args.config
			
 
				+    )
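
The script above collects only `*.jpg` and `*.png`. It would also re-process its own `*_with_lines.jpg` outputs if the output directory were the same as the target directory. A possible filter is sketched below; `is_source_image` and `find_source_images` are hypothetical helpers, and support for `.jpeg` is an assumption.

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def is_source_image(path: Path) -> bool:
    """True for image files that are not outputs of a previous run."""
    if path.suffix.lower() not in IMAGE_SUFFIXES:
        return False
    # Outputs are named "<stem>_with_lines.jpg", so skip anything with that stem suffix.
    return not path.stem.endswith("_with_lines")


def find_source_images(directory: str) -> List[Path]:
    """Return the sorted source images found directly inside `directory`."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and is_source_image(p))


if __name__ == "__main__":
    for name in ["page_001.JPG", "page_001_with_lines.jpg", "notes.txt"]:
        print(name, is_source_image(Path(name)))
```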
			
--- a/table_line_generator/streamlit_table_line_editor.py
+++ b/table_line_generator/streamlit_table_line_editor.py
@@ -0,0 +1,1001 @@
 
				+"""
			
 
+Visual table line editor.
+Supports manual adjustment of table line positions.
			
 
				+"""
			
 
				+
			
 
				+import streamlit as st
			
 
				+from pathlib import Path
			
 
				+import json
			
 
				+from PIL import Image, ImageDraw, ImageFont
			
 
				+import numpy as np
			
 
				+import copy
			
 
				+
			
 
				+try:
			
 
				+    from .table_line_generator import TableLineGenerator
			
 
				+except ImportError:
			
 
				+    from table_line_generator import TableLineGenerator
			
 
				+
			
 
				+
			
 
+def parse_ocr_data(ocr_data):
+    """Parse OCR data; several input formats are supported."""
+    # If the input is a string, try to parse it as JSON
			
 
				+    if isinstance(ocr_data, str):
			
 
				+        try:
			
 
				+            ocr_data = json.loads(ocr_data)
			
 
				+        except json.JSONDecodeError:
			
 
				+            st.error("❌ JSON 格式错误，无法解析")
			
 
				+            return []
			
 
				+    
			
 
+    # Check for the PPStructure V3 format
			
 
				+    if isinstance(ocr_data, dict) and 'parsing_res_list' in ocr_data and 'overall_ocr_res' in ocr_data:
			
 
				+        st.info("🔍 检测到 PPStructure V3 格式")
			
 
				+        
			
 
				+        try:
			
 
				+            table_bbox, text_boxes = TableLineGenerator.parse_ppstructure_result(ocr_data)
			
 
				+            st.success(f"✅ 表格区域: {table_bbox}")
			
 
				+            st.success(f"✅ 表格内文本框: {len(text_boxes)} 个")
			
 
				+            return text_boxes
			
 
				+        except Exception as e:
			
 
				+            st.error(f"❌ 解析 PPStructure 结果失败: {e}")
			
 
				+            return []
			
 
				+    
			
 
+    # Make sure the data is a list
			
 
				+    if not isinstance(ocr_data, list):
			
 
				+        st.error(f"❌ OCR 数据应该是列表，实际类型: {type(ocr_data)}")
			
 
				+        return []
			
 
				+    
			
 
				+    if not ocr_data:
			
 
				+        st.warning("⚠️ OCR 数据为空")
			
 
				+        return []
			
 
				+    
			
 
				+    first_item = ocr_data[0]
			
 
				+    if not isinstance(first_item, dict):
			
 
				+        st.error(f"❌ OCR 数据项应该是字典，实际类型: {type(first_item)}")
			
 
				+        return []
			
 
				+    
			
 
				+    if 'bbox' not in first_item:
			
 
				+        st.error("❌ OCR 数据缺少 'bbox' 字段")
			
 
				+        st.info("💡 支持的格式示例:\n```json\n[\n  {\n    \"text\": \"文本\",\n    \"bbox\": [x1, y1, x2, y2]\n  }\n]\n```")
			
 
				+        return []
			
 
				+    
			
 
				+    return ocr_data
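
`parse_ocr_data` inspects only the first item of a standard-format list. A stricter check that looks at every item, and that can run outside Streamlit, might look like the sketch below. `validate_ocr_items` is a hypothetical helper; the `[x1, y1, x2, y2]` bbox shape is taken from the format hint in the function above.

```python
from typing import Any, List


def validate_ocr_items(items: Any) -> List[str]:
    """Return a list of problems; an empty list means the data looks usable."""
    if not isinstance(items, list):
        return [f"expected a list, got {type(items).__name__}"]

    problems: List[str] = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: expected a dict, got {type(item).__name__}")
            continue
        bbox = item.get("bbox")
        if bbox is None:
            problems.append(f"item {index}: missing 'bbox'")
        elif not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
            problems.append(f"item {index}: bbox should be [x1, y1, x2, y2]")
    return problems


if __name__ == "__main__":
    print(validate_ocr_items([{"text": "a", "bbox": [0, 0, 10, 10]}]))  # []
    print(validate_ocr_items([{"text": "a"}]))  # ["item 0: missing 'bbox'"]
```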
			
 
				+
			
 
				+
			
 
+def draw_table_lines_with_numbers(image, structure, line_width=2, show_numbers=True):
+    """
+    Draw numbered table lines from the line coordinate lists.
+
+    Args:
+        image: PIL Image object
+        structure: table structure dict (contains horizontal_lines and vertical_lines)
+        line_width: line width
+        show_numbers: whether to draw the line labels
+
+    Returns:
+        A copy of the image with the table lines and labels drawn on it
+    """
+    img_with_lines = image.copy()
+    draw = ImageDraw.Draw(img_with_lines)
+
+    # Try a TrueType font (this path exists only on macOS); otherwise use the default font
+    try:
+        font = ImageFont.truetype("/System/Library/Fonts/Helvetica.ttc", 20)
+    except OSError:
+        font = ImageFont.load_default()
+
+    # Use the line coordinate lists
+    horizontal_lines = structure.get('horizontal_lines', [])
+    vertical_lines = structure.get('vertical_lines', [])
+    modified_h_lines = structure.get('modified_h_lines', set())
+    modified_v_lines = structure.get('modified_v_lines', set())
+
+    # Compute the drawing extent
+    x_start = vertical_lines[0] if vertical_lines else 0
+    x_end = vertical_lines[-1] if vertical_lines else img_with_lines.width
+    y_start = horizontal_lines[0] if horizontal_lines else 0
+    y_end = horizontal_lines[-1] if horizontal_lines else img_with_lines.height
+
+    # Draw the horizontal lines (red = modified, blue = original)
+    for idx, y in enumerate(horizontal_lines):
+        color = (255, 0, 0) if idx in modified_h_lines else (0, 0, 255)
+        draw.line([(x_start, y), (x_end, y)], fill=color, width=line_width)
+
+        # Draw the row label to the left of the line
+        if show_numbers:
+            text = f"R{idx+1}"
+            bbox = draw.textbbox((x_start - 35, y - 10), text, font=font)
+            draw.rectangle(bbox, fill='white', outline='black')
+            draw.text((x_start - 35, y - 10), text, fill=color, font=font)
+
+    # Draw the vertical lines (red = modified, blue = original)
+    for idx, x in enumerate(vertical_lines):
+        color = (255, 0, 0) if idx in modified_v_lines else (0, 0, 255)
+        draw.line([(x, y_start), (x, y_end)], fill=color, width=line_width)
+
+        # Draw the column label above and below the line
+        if show_numbers:
+            text = f"C{idx+1}"
+            bbox = draw.textbbox((x - 10, y_start - 25), text, font=font)
+            draw.rectangle(bbox, fill='white', outline='black')
+            draw.text((x - 10, y_start - 25), text, fill=color, font=font)
+            bbox = draw.textbbox((x - 10, y_end + 25), text, font=font)
+            draw.rectangle(bbox, fill='white', outline='black')
+            draw.text((x - 10, y_end + 25), text, fill=color, font=font)
			
 
				+    
			
 
				+    return img_with_lines
			
 
				+
			
 
				+
			
 
+# Clean table line drawing function, used when saving
+def draw_clean_table_lines(image, structure, line_width=2, line_color=(0, 0, 0)):
+    """
+    Draw clean table lines (used when saving).
+    - All lines use a single color (black by default)
+    - No labels are drawn
+
+    Args:
+        image: PIL Image object
+        structure: table structure dict
+        line_width: line width
+        line_color: line color, black (0, 0, 0) by default
+
+    Returns:
+        A copy of the image with the clean table lines drawn on it
+    """
+    img_with_lines = image.copy()
+    draw = ImageDraw.Draw(img_with_lines)
+
+    horizontal_lines = structure.get('horizontal_lines', [])
+    vertical_lines = structure.get('vertical_lines', [])
+
+    # Nothing to draw unless both directions have at least one line
+    if not horizontal_lines or not vertical_lines:
+        return img_with_lines
+
+    # Compute the drawing extent
+    x_start = vertical_lines[0]
+    x_end = vertical_lines[-1]
+    y_start = horizontal_lines[0]
+    y_end = horizontal_lines[-1]
+
+    # Draw the horizontal lines in the chosen color
+    for y in horizontal_lines:
+        draw.line([(x_start, y), (x_end, y)], fill=line_color, width=line_width)
+
+    # Draw the vertical lines in the chosen color
+    for x in vertical_lines:
+        draw.line([(x, y_start), (x, y_end)], fill=line_color, width=line_width)
+
+    return img_with_lines
			
 
				+
			
 
				+
			
 
+def init_undo_stack():
+    """Initialize the undo/redo stacks."""
+    if 'undo_stack' not in st.session_state:
+        st.session_state.undo_stack = []
+    if 'redo_stack' not in st.session_state:
+        st.session_state.redo_stack = []
+
+
+def save_state_for_undo(structure):
+    """Push the current state onto the undo stack."""
+    # Deep-copy the current structure
+    state = copy.deepcopy(structure)
+    st.session_state.undo_stack.append(state)
+    # A new action invalidates the redo history
+    st.session_state.redo_stack = []
+
+    # Limit the stack depth (keep at most 20 history states)
+    if len(st.session_state.undo_stack) > 20:
+        st.session_state.undo_stack.pop(0)
+
+
+def undo_last_action():
+    """Undo the last action."""
+    if st.session_state.undo_stack:
+        # Save the current state onto the redo stack
+        current_state = copy.deepcopy(st.session_state.structure)
+        st.session_state.redo_stack.append(current_state)
+
+        # Restore the previous state
+        st.session_state.structure = st.session_state.undo_stack.pop()
+        return True
+    return False
+
+
+def redo_last_action():
+    """Redo the last undone action."""
+    if st.session_state.redo_stack:
+        # Save the current state onto the undo stack
+        current_state = copy.deepcopy(st.session_state.structure)
+        st.session_state.undo_stack.append(current_state)
+
+        # Restore the redone state
+        st.session_state.structure = st.session_state.redo_stack.pop()
+        return True
+    return False
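
The undo/redo logic above is tied to `st.session_state`. The same two-stack pattern can be exercised without Streamlit, as in the sketch below. The `History` class is hypothetical and exists only to illustrate the behaviour; the depth limit of 20 mirrors the value used above.

```python
import copy
from typing import Any, List


class History:
    """Two-stack undo/redo over deep-copied snapshots."""

    def __init__(self, max_depth: int = 20) -> None:
        self.undo_stack: List[Any] = []
        self.redo_stack: List[Any] = []
        self.max_depth = max_depth

    def record(self, state: Any) -> None:
        """Snapshot `state` before it is modified; this clears the redo history."""
        self.undo_stack.append(copy.deepcopy(state))
        self.redo_stack.clear()
        if len(self.undo_stack) > self.max_depth:
            self.undo_stack.pop(0)  # drop the oldest snapshot

    def undo(self, current: Any) -> Any:
        """Return the previous state, or `current` if there is nothing to undo."""
        if not self.undo_stack:
            return current
        self.redo_stack.append(copy.deepcopy(current))
        return self.undo_stack.pop()

    def redo(self, current: Any) -> Any:
        """Return the next state, or `current` if there is nothing to redo."""
        if not self.redo_stack:
            return current
        self.undo_stack.append(copy.deepcopy(current))
        return self.redo_stack.pop()


if __name__ == "__main__":
    history = History()
    state = {"horizontal_lines": [700, 735]}
    history.record(state)                 # snapshot before editing
    state["horizontal_lines"][1] = 740    # move a line
    state = history.undo(state)
    print(state)                          # {'horizontal_lines': [700, 735]}
    state = history.redo(state)
    print(state)                          # {'horizontal_lines': [700, 740]}
```

Deep copies matter here because the structure holds nested lists. A shallow copy would share them with the live state, and later edits would silently alter the stored snapshots.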
			
 
				+
			
 
				+
			
 
+def get_structure_hash(structure, line_width, show_numbers):
+    """Hash the structure and drawing options to decide whether a redraw is needed."""
+    import hashlib
+
+    # Build the key from the line coordinate lists; sets are sorted so the key is stable
+    key_data = {
+        'horizontal_lines': structure.get('horizontal_lines', []),
+        'vertical_lines': structure.get('vertical_lines', []),
+        'modified_h_lines': sorted(list(structure.get('modified_h_lines', set()))),
+        'modified_v_lines': sorted(list(structure.get('modified_v_lines', set()))),
+        'line_width': line_width,
+        'show_numbers': show_numbers
+    }
+
+    key_str = json.dumps(key_data, sort_keys=True)
+    return hashlib.md5(key_str.encode()).hexdigest()
			
 
				+
			
 
				+
			
 
+def get_cached_table_lines_image(image, structure, line_width, show_numbers):
+    """
+    Return the cached table line image, redrawing it if the cache is missing or stale.
+
+    Args:
+        image: PIL Image object
+        structure: table structure dict
+        line_width: line width
+        show_numbers: whether to draw the line labels
+
+    Returns:
+        The image with the table lines and labels drawn on it
+    """
+    # Initialize the cache
+    if 'cached_table_image' not in st.session_state:
+        st.session_state.cached_table_image = None
+    if 'cached_table_hash' not in st.session_state:
+        st.session_state.cached_table_hash = None
+
+    # Hash the current structure
+    current_hash = get_structure_hash(structure, line_width, show_numbers)
+
+    # Check whether the cache is still valid
+    if (st.session_state.cached_table_hash == current_hash and 
+        st.session_state.cached_table_image is not None):
+        # Cache hit: return the stored image
+        return st.session_state.cached_table_image
+
+    # Cache miss: redraw
+    img_with_lines = draw_table_lines_with_numbers(
+        image, 
+        structure, 
+        line_width=line_width,
+        show_numbers=show_numbers
+    )
+
+    # Update the cache
+    st.session_state.cached_table_image = img_with_lines
+    st.session_state.cached_table_hash = current_hash
+
+    return img_with_lines
+
+
+def clear_table_image_cache():
+    """Clear the cached table image."""
+    if 'cached_table_image' in st.session_state:
+        st.session_state.cached_table_image = None
+    if 'cached_table_hash' in st.session_state:
+        st.session_state.cached_table_hash = None
			
 
				+
			
 
				+
			
 
+def load_structure_from_config(config_path: Path) -> dict:
+    """
+    Load a table structure from a configuration file.
+
+    Args:
+        config_path: path to the configuration file
+
+    Returns:
+        The table structure dict
+    """
+    with open(config_path, 'r', encoding='utf-8') as f:
+        structure = json.load(f)
+
+    # Backward compatibility: fill in fields missing from older configs
+    if 'horizontal_lines' not in structure:
+        # Derive the horizontal line coordinates from rows
+        horizontal_lines = []
+        for row in structure.get('rows', []):
+            horizontal_lines.append(row['y_start'])
+        if structure.get('rows'):
+            horizontal_lines.append(structure['rows'][-1]['y_end'])
+        structure['horizontal_lines'] = horizontal_lines
+
+    if 'vertical_lines' not in structure:
+        # Derive the vertical line coordinates from columns
+        vertical_lines = []
+        for col in structure.get('columns', []):
+            vertical_lines.append(col['x_start'])
+        if structure.get('columns'):
+            vertical_lines.append(structure['columns'][-1]['x_end'])
+        structure['vertical_lines'] = vertical_lines
+
+    # Convert the modification markers from lists to sets (empty set if absent)
+    structure['modified_h_lines'] = set(structure.get('modified_h_lines', []))
+    structure['modified_v_lines'] = set(structure.get('modified_v_lines', []))
+
+    # Convert the legacy modified_rows/modified_cols fields, if present
+    if 'modified_rows' in structure and not structure['modified_h_lines']:
+        structure['modified_h_lines'] = set(structure.get('modified_rows', []))
+    if 'modified_cols' in structure and not structure['modified_v_lines']:
+        structure['modified_v_lines'] = set(structure.get('modified_cols', []))
+
+    return structure
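
The backward-compatibility branch above turns a list of intervals into a list of boundary lines: every start coordinate, plus the end of the last interval. The sketch below isolates that conversion. `lines_from_intervals` is a hypothetical helper; the key names `y_start`/`y_end` and `x_start`/`x_end` come from the loader above.

```python
from typing import Dict, List


def lines_from_intervals(intervals: List[Dict], start_key: str, end_key: str) -> List[float]:
    """Return each interval's start, followed by the end of the last interval."""
    if not intervals:
        return []
    lines = [interval[start_key] for interval in intervals]
    lines.append(intervals[-1][end_key])
    return lines


if __name__ == "__main__":
    rows = [{"y_start": 700, "y_end": 735}, {"y_start": 735, "y_end": 794}]
    print(lines_from_intervals(rows, "y_start", "y_end"))  # [700, 735, 794]
```

This assumes adjacent intervals share a boundary. If an older config has gaps between rows, only the start of each row is kept and the gap is absorbed into the row above.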
			
 
				+
			
 
				+
			
 
+def create_table_line_editor():
+    """Build the table line editor UI."""
+    # Configure the page for wide layout
			
 
				+    st.set_page_config(
			
 
				+        page_title="表格线编辑器",
			
 
				+        page_icon="📏",
			
 
				+        layout="wide",
			
 
				+        initial_sidebar_state="expanded"
			
 
				+    )
			
 
				+    
			
 
				+    st.title("📏 表格线编辑器")
			
 
				+    
			
 
+    # Initialize session_state
+    for key in ('loaded_json_name', 'loaded_image_name', 'loaded_config_name',
+                'ocr_data', 'image'):
+        if key not in st.session_state:
+            st.session_state[key] = None
+
+    # Initialize the undo/redo stacks
+    init_undo_stack()
+
+    # Working mode selection
			
 
				+    st.sidebar.header("📂 工作模式")
			
 
				+    work_mode = st.sidebar.radio(
			
 
				+        "选择模式",
			
 
				+        ["🆕 新建标注", "📂 加载已有标注"],
			
 
				+        index=0
			
 
				+    )
			
 
				+    
			
 
				+    if work_mode == "🆕 新建标注":
			
 
				+        # 原有的上传流程
			
 
				+        st.sidebar.subheader("上传文件")
			
 
				+        uploaded_json = st.sidebar.file_uploader("上传OCR结果JSON", type=['json'], key="new_json")
			
 
				+        uploaded_image = st.sidebar.file_uploader("上传对应图片", type=['jpg', 'png'], key="new_image")
			
 
				+        
			
 
				+        # 检查是否需要重新加载 JSON
			
 
				+        if uploaded_json is not None:
			
 
				+            if st.session_state.loaded_json_name != uploaded_json.name:
			
 
				+                try:
			
 
				+                    raw_data = json.load(uploaded_json)
			
 
				+                    
			
 
				+                    with st.expander("🔍 原始数据结构"):
			
 
				+                        if isinstance(raw_data, dict):
			
 
				+                            st.json({k: f"<{type(v).__name__}>" if not isinstance(v, (str, int, float, bool, type(None))) else v 
			
 
				+                                    for k, v in list(raw_data.items())[:5]})
			
 
				+                        else:
			
 
				+                            st.json(raw_data[:3] if len(raw_data) > 3 else raw_data)
			
 
				+                    
			
 
				+                    ocr_data = parse_ocr_data(raw_data)
			
 
				+                    
			
 
				+                    if not ocr_data:
			
 
				+                        st.error("❌ 无法解析 OCR 数据，请检查 JSON 格式")
			
 
				+                        st.stop()
			
 
				+                    
			
 
				+                    st.session_state.ocr_data = ocr_data
			
 
				+                    st.session_state.loaded_json_name = uploaded_json.name
			
 
				+                    st.session_state.loaded_config_name = None  # 清除配置文件标记
			
 
				+                    
			
 
				+                    # 清除旧的分析结果、历史记录和缓存
			
 
				+                    if 'structure' in st.session_state:
			
 
				+                        del st.session_state.structure
			
 
				+                    if 'generator' in st.session_state:
			
 
				+                        del st.session_state.generator
			
 
				+                    st.session_state.undo_stack = []
			
 
				+                    st.session_state.redo_stack = []
			
 
				+                    clear_table_image_cache()
			
 
				+                    
			
 
				+                    st.success(f"✅ 成功加载 {len(ocr_data)} 条 OCR 记录")
			
 
				+                    
			
 
				+                except Exception as e:
			
 
				+                    st.error(f"❌ 加载数据失败: {e}")
			
 
				+                    st.stop()
			
 
				+        
			
 
				+        # 检查是否需要重新加载图片
			
 
				+        if uploaded_image is not None:
			
 
				+            if st.session_state.loaded_image_name != uploaded_image.name:
			
 
				+                try:
			
 
				+                    image = Image.open(uploaded_image)
			
 
				+                    
			
 
				+                    st.session_state.image = image
			
 
				+                    st.session_state.loaded_image_name = uploaded_image.name
			
 
				+                    
			
 
				+                    if 'structure' in st.session_state:
			
 
				+                        del st.session_state.structure
			
 
				+                    if 'generator' in st.session_state:
			
 
				+                        del st.session_state.generator
			
 
				+                    st.session_state.undo_stack = []
			
 
				+                    st.session_state.redo_stack = []
			
 
				+                    clear_table_image_cache()
			
 
				+                    
			
 
				+                    st.success(f"✅ 成功加载图片: {uploaded_image.name}")
			
 
				+                    
			
 
				+                except Exception as e:
			
 
				+                    st.error(f"❌ 加载图片失败: {e}")
			
 
				+                    st.stop()
			
 
				+    
			
 
				+    else:  # "📂 加载已有标注" mode (load an existing annotation)
			
 
				+        st.sidebar.subheader("加载已保存的标注")
			
 
				+        
			
 
				+        # 🆕 Upload the config file
			
 
				+        uploaded_config = st.sidebar.file_uploader(
			
 
				+            "上传配置文件 (*_structure.json)",
			
 
				+            type=['json'],
			
 
				+            key="load_config"
			
 
				+        )
			
 
				+        
			
 
				+        # 🆕 Upload the matching image (optional, used for re-annotation)
			
 
				+        uploaded_image_for_config = st.sidebar.file_uploader(
			
 
				+            "上传对应图片（可选）",
			
 
				+            type=['jpg', 'png'],
			
 
				+            key="load_image"
			
 
				+        )
			
 
				+        
			
 
				+        # Handle config file loading
			
 
				+        if uploaded_config is not None:
			
 
				+            if st.session_state.loaded_config_name != uploaded_config.name:
			
 
				+                try:
			
 
				+                    # 🔧 Load via a file path: write the upload to a temp file first
				+                    import tempfile
				+                    
				+                    # Create the temp file
				+                    with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False, encoding='utf-8') as tmp:
				+                        tmp.write(uploaded_config.getvalue().decode('utf-8'))
				+                        tmp_path = tmp.name
				+                    
				+                    # Load the structure; remove the temp file even if loading fails
				+                    try:
				+                        structure = load_structure_from_config(Path(tmp_path))
				+                    finally:
				+                        Path(tmp_path).unlink(missing_ok=True)
			
 
				+                    
			
 
				+                    st.session_state.structure = structure
			
 
				+                    st.session_state.loaded_config_name = uploaded_config.name
			
 
				+                    
			
 
				+                    # Clear undo/redo history and the image cache
			
 
				+                    st.session_state.undo_stack = []
			
 
				+                    st.session_state.redo_stack = []
			
 
				+                    clear_table_image_cache()
			
 
				+                    
			
 
				+                    st.success(f"✅ 成功加载配置: {uploaded_config.name}")
			
 
				+                    st.info(
			
 
				+                        f"📊 表格结构: {len(structure['rows'])}行 x {len(structure['columns'])}列\n\n"
			
 
				+                        f"📏 横线数: {len(structure.get('horizontal_lines', []))}\n\n"
			
 
				+                        f"📏 竖线数: {len(structure.get('vertical_lines', []))}"
			
 
				+                    )
			
 
				+                    
			
 
				+                    # 🆕 Show config file details
			
 
				+                    with st.expander("📋 配置详情"):
			
 
				+                        st.json({
			
 
				+                            "行数": len(structure['rows']),
			
 
				+                            "列数": len(structure['columns']),
			
 
				+                            "横线数": len(structure.get('horizontal_lines', [])),
			
 
				+                            "竖线数": len(structure.get('vertical_lines', [])),
			
 
				+                            "行高": structure.get('row_height'),
			
 
				+                            "列宽": structure.get('col_widths'),
			
 
				+                            "已修改的横线": list(structure.get('modified_h_lines', set())),
			
 
				+                            "已修改的竖线": list(structure.get('modified_v_lines', set()))
			
 
				+                        })
			
 
				+                    
			
 
				+                except Exception as e:
			
 
				+                    st.error(f"❌ 加载配置失败: {e}")
			
 
				+                    import traceback
			
 
				+                    st.code(traceback.format_exc())
			
 
				+                    st.stop()
			
 
				+        
			
 
				+        # Handle image loading (for display)
			
 
				+        if uploaded_image_for_config is not None:
			
 
				+            if st.session_state.loaded_image_name != uploaded_image_for_config.name:
			
 
				+                try:
			
 
				+                    image = Image.open(uploaded_image_for_config)
			
 
				+                    st.session_state.image = image
			
 
				+                    st.session_state.loaded_image_name = uploaded_image_for_config.name
			
 
				+                    
			
 
				+                    clear_table_image_cache()
			
 
				+                    
			
 
				+                    st.success(f"✅ 成功加载图片: {uploaded_image_for_config.name}")
			
 
				+                    
			
 
				+                except Exception as e:
			
 
				+                    st.error(f"❌ 加载图片失败: {e}")
			
 
				+                    st.stop()
			
 
				+        
			
 
				+        # 🆕 If a config is loaded but no image is, prompt the user
			
 
				+        if 'structure' in st.session_state and st.session_state.image is None:
			
 
				+            st.warning("⚠️ 已加载配置，但未加载对应图片。请上传图片以查看效果。")
			
 
				+            st.info("💡 提示：配置文件已加载，您可以：\n1. 上传对应图片查看效果\n2. 直接编辑配置并保存")
			
 
				+    
			
 
				+    # Check prerequisites
			
 
				+    if work_mode == "🆕 新建标注":
			
 
				+        if st.session_state.ocr_data is None or st.session_state.image is None:
			
 
				+            st.info("👆 请在左侧上传 OCR 结果 JSON 文件和对应的图片")
			
 
				+            
			
 
				+            with st.expander("📖 使用说明"):
			
 
				+                st.markdown("""
			
 
				+                ### 🆕 新建标注模式
			
 
				+                
			
 
				+                **支持的OCR格式**
			
 
				+                
			
 
				+                **1. PPStructure V3 格式 (推荐)**
			
 
				+                ```json
			
 
				+                {
			
 
				+                  "parsing_res_list": [...],
			
 
				+                  "overall_ocr_res": {
			
 
				+                    "rec_boxes": [[x1, y1, x2, y2], ...],
			
 
				+                    "rec_texts": ["文本1", "文本2", ...]
			
 
				+                  }
			
 
				+                }
			
 
				+                ```
			
 
				+                
			
 
				+                **2. 标准格式**
			
 
				+                ```json
			
 
				+                [
			
 
				+                  {
			
 
				+                    "text": "文本内容",
			
 
				+                    "bbox": [x1, y1, x2, y2]
			
 
				+                  }
			
 
				+                ]
			
 
				+                ```
			
 
				+                
			
 
				+                ### 📂 加载已有标注模式
			
 
				+                
			
 
				+                1. 上传之前保存的 `*_structure.json` 配置文件
			
 
				+                2. 上传对应的图片（可选）
			
 
				+                3. 继续调整表格线位置
			
 
				+                4. 保存更新后的配置
			
 
				+                """)
			
 
				+            return
			
 
				+        
			
 
				+        ocr_data = st.session_state.ocr_data
			
 
				+        image = st.session_state.image
			
 
				+        
			
 
				+        st.info(f"📂 已加载: {st.session_state.loaded_json_name} + {st.session_state.loaded_image_name}")
			
 
				+        
			
 
				+        if 'generator' not in st.session_state or st.session_state.generator is None:
			
 
				+            try:
			
 
				+                generator = TableLineGenerator(image, ocr_data)
			
 
				+                st.session_state.generator = generator
			
 
				+            except Exception as e:
			
 
				+                st.error(f"❌ 初始化失败: {e}")
			
 
				+                st.stop()
			
 
				+    
			
 
				+    else:  # load-existing-annotation mode
			
 
				+        if 'structure' not in st.session_state:
			
 
				+            st.info("👆 请在左侧上传配置文件 (*_structure.json)")
			
 
				+            
			
 
				+            with st.expander("📖 使用说明"):
			
 
				+                st.markdown("""
			
 
				+                ### 📂 加载已有标注
			
 
				+                
			
 
				+                **步骤：**
			
 
				+                
			
 
				+                1. **上传配置文件**：选择之前保存的 `*_structure.json`
			
 
				+                2. **上传图片**（可选）：上传对应的图片以查看效果
			
 
				+                3. **调整表格线**：使用下方的工具调整横线/竖线位置
			
 
				+                4. **保存更新**：保存修改后的配置
			
 
				+                
			
 
				+                **提示：**
			
 
				+                - 即使没有图片，也可以直接编辑配置文件中的坐标
			
 
				+                - 配置文件包含完整的表格结构信息
			
 
				+                - 可以应用到同类型的其他页面
			
 
				+                """)
			
 
				+            return
			
 
				+        
			
 
				+        if st.session_state.image is None:
			
 
				+            st.warning("⚠️ 仅加载了配置，未加载图片。部分功能受限。")
			
 
				+        
			
 
				+        # 🆕 Use the information from the config
			
 
				+        structure = st.session_state.structure
			
 
				+        image = st.session_state.image
			
 
				+        
			
 
				+        if image is None:
			
 
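				+            # Without an image, create a blank dummy canvas to show the coordinates
				+            if 'table_bbox' in structure:
				+                bbox = structure['table_bbox']
				+                # Image.new needs integer sizes; bbox values may be floats
				+                dummy_width = int(bbox[2]) + 100
				+                dummy_height = int(bbox[3]) + 100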
			
 
				+            else:
			
 
				+                dummy_width = 2000
			
 
				+                dummy_height = 2000
			
 
				+            
			
 
				+            image = Image.new('RGB', (dummy_width, dummy_height), color='white')
			
 
				+            st.info(f"💡 使用虚拟画布 ({dummy_width}x{dummy_height}) 显示表格结构")
			
 
				+    
			
 
				+    # Display settings
			
 
				+    st.sidebar.divider()
			
 
				+    st.sidebar.subheader("🖼️ 显示设置")
			
 
				+    
			
 
				+    line_width = st.sidebar.slider("线条宽度", 1, 5, 2)
			
 
				+    display_mode = st.sidebar.radio("显示模式", ["对比显示", "仅显示划线图", "仅显示原图"], index=1)
			
 
				+    zoom_level = st.sidebar.slider("图片缩放", 0.25, 2.0, 1.0, 0.25)
			
 
				+    show_line_numbers = st.sidebar.checkbox("显示线条编号", value=True)
			
 
				+    
			
 
				+    # Undo/redo buttons
			
 
				+    st.sidebar.divider()
			
 
				+    st.sidebar.subheader("↩️ 撤销/重做")
			
 
				+    
			
 
				+    col1, col2 = st.sidebar.columns(2)
			
 
				+    with col1:
			
 
				+        if st.button("↩️ 撤销", disabled=len(st.session_state.undo_stack) == 0):
			
 
				+            if undo_last_action():
			
 
				+                clear_table_image_cache()
			
 
				+                st.success("✅ 已撤销")
			
 
				+                st.rerun()
			
 
				+    
			
 
				+    with col2:
			
 
				+        if st.button("↪️ 重做", disabled=len(st.session_state.redo_stack) == 0):
			
 
				+            if redo_last_action():
			
 
				+                clear_table_image_cache()
			
 
				+                st.success("✅ 已重做")
			
 
				+                st.rerun()
			
 
				+    
			
 
				+    st.sidebar.info(f"📚 历史记录: {len(st.session_state.undo_stack)} 条")
			
 
				+    
			
 
				+    # Analyze the table structure (shown only in new-annotation mode)
			
 
				+    if work_mode == "🆕 新建标注" and st.button("🔍 分析表格结构"):
			
 
				+        with st.spinner("分析中..."):
			
 
				+            try:
			
 
				+                generator = st.session_state.generator
			
 
				+                structure = generator.analyze_table_structure(
			
 
				+                    y_tolerance=y_tolerance,
			
 
				+                    x_tolerance=x_tolerance,
			
 
				+                    min_row_height=min_row_height
			
 
				+                )
			
 
				+                
			
 
				+                if not structure:
			
 
				+                    st.warning("⚠️ 未检测到表格结构")
			
 
				+                    st.stop()
			
 
				+                
			
 
				+                structure['modified_h_lines'] = set()
			
 
				+                structure['modified_v_lines'] = set()
			
 
				+                
			
 
				+                st.session_state.structure = structure
			
 
				+                
			
 
				+                st.session_state.undo_stack = []
			
 
				+                st.session_state.redo_stack = []
			
 
				+                clear_table_image_cache()
			
 
				+                
			
 
				+                st.success(
			
 
				+                    f"✅ 检测到 {len(structure['rows'])} 行（{len(structure['horizontal_lines'])} 条横线），"
			
 
				+                    f"{len(structure['columns'])} 列（{len(structure['vertical_lines'])} 条竖线）"
			
 
				+                )
			
 
				+                
			
 
				+                col1, col2, col3, col4 = st.columns(4)
			
 
				+                with col1:
			
 
				+                    st.metric("行数", len(structure['rows']))
			
 
				+                with col2:
			
 
				+                    st.metric("横线数", len(structure['horizontal_lines']))
			
 
				+                with col3:
			
 
				+                    st.metric("列数", len(structure['columns']))
			
 
				+                with col4:
			
 
				+                    st.metric("竖线数", len(structure['vertical_lines']))
			
 
				+            
			
 
				+            except Exception as e:
			
 
				+                st.error(f"❌ 分析失败: {e}")
			
 
				+                import traceback
			
 
				+                st.code(traceback.format_exc())
			
 
				+                st.stop()
			
 
				+    
			
 
				+    # Show the results (shared by both modes)
			
 
				+    if 'structure' in st.session_state and st.session_state.structure:
			
 
				+        structure = st.session_state.structure
			
 
				+        
			
 
				+        # Draw the table lines through the image cache
			
 
				+        img_with_lines = get_cached_table_lines_image(
			
 
				+            image, 
			
 
				+            structure, 
			
 
				+            line_width=line_width,
			
 
				+            show_numbers=show_line_numbers
			
 
				+        )
			
 
				+        
			
 
				+        # Show the image according to the display mode
			
 
				+        if display_mode == "对比显示":
			
 
				+            col1, col2 = st.columns(2)
			
 
				+            with col1:
			
 
				+                st.subheader("原图")
			
 
				+                st.image(image, use_container_width=True)
			
 
				+            
			
 
				+            with col2:
			
 
				+                st.subheader("添加表格线")
			
 
				+                st.image(img_with_lines, use_container_width=True)
			
 
				+                
			
 
				+        elif display_mode == "仅显示划线图":
			
 
				+            display_width = int(img_with_lines.width * zoom_level)
			
 
				+            
			
 
				+            st.subheader(f"表格线图 (缩放: {zoom_level:.0%})")
			
 
				+            st.image(img_with_lines, width=display_width)
			
 
				+            
			
 
				+        else:
			
 
				+            display_width = int(image.width * zoom_level)
			
 
				+            
			
 
				+            st.subheader(f"原图 (缩放: {zoom_level:.0%})")
			
 
				+            st.image(image, width=display_width)
			
 
				+        
			
 
				+        # Show details
			
 
				+        with st.expander("📊 表格结构详情"):
			
 
				+            st.json({
			
 
				+                "行数": len(structure['rows']),
			
 
				+                "列数": len(structure['columns']),
			
 
				+                "横线数": len(structure.get('horizontal_lines', [])),
			
 
				+                "竖线数": len(structure.get('vertical_lines', [])),
			
 
				+                "横线坐标": structure.get('horizontal_lines', []),
			
 
				+                "竖线坐标": structure.get('vertical_lines', []),
			
 
				+                "标准行高": structure.get('row_height'),
			
 
				+                "列宽度": structure.get('col_widths'),
			
 
				+                "修改的横线": list(structure.get('modified_h_lines', set())),
			
 
				+                "修改的竖线": list(structure.get('modified_v_lines', set()))
			
 
				+            })
			
 
				+        
			
 
				+        # 🆕 Manual adjustment, based on the line coordinate lists
			
 
				+        st.subheader("🛠️ 手动调整")
			
 
				+        
			
 
				+        adjust_type = st.radio(
			
 
				+            "调整类型",
			
 
				+            ["调整横线", "调整竖线", "添加横线", "删除横线", "添加竖线", "删除竖线"],
			
 
				+            horizontal=True
			
 
				+        )
			
 
				+        
			
 
				+        if adjust_type == "调整横线":
			
 
				+            horizontal_lines = structure.get('horizontal_lines', [])
			
 
				+            if len(horizontal_lines) > 0:
			
 
				+                line_index = st.selectbox(
			
 
				+                    "选择横线",
			
 
				+                    range(len(horizontal_lines)),
			
 
				+                    format_func=lambda x: f"第 {x+1} 条横线 (Y: {horizontal_lines[x]}) {'🔴已修改' if x in structure.get('modified_h_lines', set()) else ''}"
			
 
				+                )
			
 
				+                
			
 
				+                new_y = st.number_input(
			
 
				+                    "新的Y坐标",
			
 
				+                    value=int(horizontal_lines[line_index]),
			
 
				+                    step=1
			
 
				+                )
			
 
				+                
			
 
				+                if st.button("应用调整"):
			
 
				+                    save_state_for_undo(structure)
			
 
				+                    
			
 
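				+                    structure['horizontal_lines'][line_index] = new_y
				+                    # Tolerate a missing key or a list loaded from JSON
				+                    structure['modified_h_lines'] = set(structure.get('modified_h_lines', ())) | {line_index}
				+                    
				+                    # 🔧 Keep rows in sync: line i starts row i and ends row i-1
				+                    if line_index < len(structure['rows']):
				+                        structure['rows'][line_index]['y_start'] = new_y
				+                    if 0 < line_index <= len(structure['rows']):
				+                        structure['rows'][line_index - 1]['y_end'] = new_y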
			
 
				+                    
			
 
				+                    clear_table_image_cache()
			
 
				+                    st.success("✅ 已调整")
			
 
				+                    st.rerun()
			
 
				+            else:
			
 
				+                st.warning("⚠️ 没有检测到横线")
			
 
				+        
			
 
				+        elif adjust_type == "调整竖线":
			
 
				+            vertical_lines = structure.get('vertical_lines', [])
			
 
				+            if len(vertical_lines) > 0:
			
 
				+                line_index = st.selectbox(
			
 
				+                    "选择竖线",
			
 
				+                    range(len(vertical_lines)),
			
 
				+                    format_func=lambda x: f"第 {x+1} 条竖线 (X: {vertical_lines[x]}) {'🔴已修改' if x in structure.get('modified_v_lines', set()) else ''}"
			
 
				+                )
			
 
				+                
			
 
				+                new_x = st.number_input(
			
 
				+                    "新的X坐标",
			
 
				+                    value=int(vertical_lines[line_index]),
			
 
				+                    step=1
			
 
				+                )
			
 
				+                
			
 
				+                if st.button("应用调整"):
			
 
				+                    save_state_for_undo(structure)
			
 
				+                    
			
 
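				+                    structure['vertical_lines'][line_index] = new_x
				+                    # Tolerate a missing key or a list loaded from JSON
				+                    structure['modified_v_lines'] = set(structure.get('modified_v_lines', ())) | {line_index}
				+                    
				+                    # 🔧 Keep columns in sync: line i starts column i and ends column i-1
				+                    if line_index < len(structure['columns']):
				+                        structure['columns'][line_index]['x_start'] = new_x
				+                    if 0 < line_index <= len(structure['columns']):
				+                        structure['columns'][line_index - 1]['x_end'] = new_x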
			
 
				+                    
			
 
				+                    clear_table_image_cache()
			
 
				+                    st.success("✅ 已调整")
			
 
				+                    st.rerun()
			
 
				+            else:
			
 
				+                st.warning("⚠️ 没有检测到竖线")
			
 
				+        
			
 
				+        elif adjust_type == "删除横线":
			
 
				+            horizontal_lines = structure.get('horizontal_lines', [])
			
 
				+            if len(horizontal_lines) > 0:
			
 
				+                lines_to_delete = st.multiselect(
			
 
				+                    "选择要删除的横线（可多选）",
			
 
				+                    range(len(horizontal_lines)),
			
 
				+                    format_func=lambda x: f"第 {x+1} 条横线 (Y: {horizontal_lines[x]}) {'🔴已修改' if x in structure.get('modified_h_lines', set()) else ''}"
			
 
				+                )
			
 
				+                
			
 
				+                if lines_to_delete and st.button("🗑️ 批量删除", type="primary"):
			
 
				+                    save_state_for_undo(structure)
			
 
				+                    
			
 
				+                    # 🔧 Delete line coordinates, highest index first so the remaining indices stay valid
			
 
				+                    for idx in sorted(lines_to_delete, reverse=True):
			
 
				+                        del structure['horizontal_lines'][idx]
			
 
				+                    
			
 
				+                    # 🔧 Recompute rows (rebuild the row intervals after deleting lines)
			
 
				+                    new_rows = []
			
 
				+                    for i in range(len(structure['horizontal_lines']) - 1):
			
 
				+                        new_rows.append({
			
 
				+                            'y_start': structure['horizontal_lines'][i],
			
 
				+                            'y_end': structure['horizontal_lines'][i + 1],
			
 
				+                            # 'bboxes': []
			
 
				+                        })
			
 
				+                    structure['rows'] = new_rows
			
 
				+                    
			
 
				+                    # Reset the modified markers (line indices shift after deletion)
			
 
				+                    structure['modified_h_lines'] = set()
			
 
				+                    
			
 
				+                    clear_table_image_cache()
			
 
				+                    st.success(f"✅ 已删除 {len(lines_to_delete)} 条横线")
			
 
				+                    st.rerun()
			
 
				+                
			
 
				+                st.info(f"💡 当前有 {len(horizontal_lines)} 条横线，已选择 {len(lines_to_delete)} 条")
			
 
				+            else:
			
 
				+                st.warning("⚠️ 没有可删除的横线")
			
 
				+        
			
 
				+        elif adjust_type == "删除竖线":
			
 
				+            vertical_lines = structure.get('vertical_lines', [])
			
 
				+            if len(vertical_lines) > 0:
			
 
				+                lines_to_delete = st.multiselect(
			
 
				+                    "选择要删除的竖线（可多选）",
			
 
				+                    range(len(vertical_lines)),
			
 
				+                    format_func=lambda x: f"第 {x+1} 条竖线 (X: {vertical_lines[x]}) {'🔴已修改' if x in structure.get('modified_v_lines', set()) else ''}"
			
 
				+                )
			
 
				+                
			
 
				+                if lines_to_delete and st.button("🗑️ 批量删除", type="primary"):
			
 
				+                    save_state_for_undo(structure)
			
 
				+                    
			
 
				+                    # 🔧 Delete line coordinates, highest index first so the remaining indices stay valid
			
 
				+                    for idx in sorted(lines_to_delete, reverse=True):
			
 
				+                        del structure['vertical_lines'][idx]
			
 
				+                    
			
 
				+                    # 🔧 Recompute columns
			
 
				+                    new_columns = []
			
 
				+                    for i in range(len(structure['vertical_lines']) - 1):
			
 
				+                        new_columns.append({
			
 
				+                            'x_start': structure['vertical_lines'][i],
			
 
				+                            'x_end': structure['vertical_lines'][i + 1]
			
 
				+                        })
			
 
				+                    structure['columns'] = new_columns
			
 
				+                    
			
 
				+                    # Recompute column widths
			
 
				+                    structure['col_widths'] = [
			
 
				+                        col['x_end'] - col['x_start'] 
			
 
				+                        for col in new_columns
			
 
				+                    ]
			
 
				+                    
			
 
				+                    # Reset the modified markers (line indices shift after deletion)
			
 
				+                    structure['modified_v_lines'] = set()
			
 
				+                    
			
 
				+                    clear_table_image_cache()
			
 
				+                    st.success(f"✅ 已删除 {len(lines_to_delete)} 条竖线")
			
 
				+                    st.rerun()
			
 
				+                
			
 
				+                st.info(f"💡 当前有 {len(vertical_lines)} 条竖线，已选择 {len(lines_to_delete)} 条")
			
 
				+            else:
			
 
				+                st.warning("⚠️ 没有可删除的竖线")
			
 
				+        
			
 
				+        # Save the config
			
 
				+        st.divider()
			
 
				+        
			
 
				+        save_col1, save_col2, save_col3 = st.columns(3)
			
 
				+        
			
 
				+        with save_col1:
			
 
				+            save_structure = st.checkbox("保存表格结构配置", value=True)
			
 
				+        
			
 
				+        with save_col2:
			
 
				+            save_image = st.checkbox("保存表格线图片", value=True)
			
 
				+        
			
 
				+        with save_col3:
			
 
				+            # 🆕 Line color selection
			
 
				+            line_color_option = st.selectbox(
			
 
				+                "保存时线条颜色",
			
 
				+                ["黑色", "蓝色", "红色"],
			
 
				+                index=0
			
 
				+            )
			
 
				+        
			
 
				+        if st.button("💾 保存", type="primary"):
			
 
				+            output_dir = Path("output/table_structures")
			
 
				+            output_dir.mkdir(parents=True, exist_ok=True)
			
 
				+            
			
 
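				+            # loaded_image_name is None when only a config was loaded; fall back to the config name
				+            source_name = st.session_state.loaded_image_name or st.session_state.loaded_config_name or "table"
				+            base_name = Path(source_name).stem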
			
 
				+            saved_files = []
			
 
				+            
			
 
				+            if save_structure:
			
 
				+                structure_path = output_dir / f"{base_name}_structure.json"
			
 
				+                
			
 
				+                # 🔧 Save the line coordinate lists
			
 
				+                save_structure_data = {
			
 
				+                    'rows': structure['rows'],
			
 
				+                    'columns': structure['columns'],
			
 
				+                    'horizontal_lines': structure.get('horizontal_lines', []),
			
 
				+                    'vertical_lines': structure.get('vertical_lines', []),
			
 
				+                    'row_height': structure['row_height'],
			
 
				+                    'col_widths': structure['col_widths'],
			
 
				+                    'table_bbox': structure['table_bbox'],
			
 
				+                    'modified_h_lines': list(structure.get('modified_h_lines', set())),
			
 
				+                    'modified_v_lines': list(structure.get('modified_v_lines', set()))
			
 
				+                }
			
 
				+                
			
 
				+                with open(structure_path, 'w', encoding='utf-8') as f:
			
 
				+                    json.dump(save_structure_data, f, indent=2, ensure_ascii=False)
			
 
				+                
			
 
				+                saved_files.append(("配置文件", structure_path))
			
 
				+                
			
 
				+                with open(structure_path, 'r', encoding='utf-8') as f:
			
 
				+                    st.download_button(
			
 
				+                        "📥 下载配置文件",
			
 
				+                        f.read(),
			
 
				+                        file_name=f"{base_name}_structure.json",
			
 
				+                        mime="application/json"
			
 
				+                    )
			
 
				+            
			
 
				+            if save_image:
			
 
				+                # 🆕 Draw clean table lines in the selected color
			
 
				+                color_map = {
			
 
				+                    "黑色": (0, 0, 0),
			
 
				+                    "蓝色": (0, 0, 255),
			
 
				+                    "红色": (255, 0, 0)
			
 
				+                }
			
 
				+                selected_color = color_map[line_color_option]
			
 
				+                
			
 
				+                # 🎯 Use the clean drawing function
			
 
				+                clean_img = draw_clean_table_lines(
			
 
				+                    image,
			
 
				+                    structure,
			
 
				+                    line_width=line_width,
			
 
				+                    line_color=selected_color
			
 
				+                )
			
 
				+                
			
 
				+                output_image_path = output_dir / f"{base_name}_with_lines.png"
			
 
				+                clean_img.save(output_image_path)
			
 
				+                saved_files.append(("表格线图片", output_image_path))
			
 
				+                
			
 
				+                # 🆕 Provide a download button
			
 
				+                import io
			
 
				+                buf = io.BytesIO()
			
 
				+                clean_img.save(buf, format='PNG')
			
 
				+                buf.seek(0)
			
 
				+                
			
 
				+                st.download_button(
			
 
				+                    "📥 下载表格线图片",
			
 
				+                    buf,
			
 
				+                    file_name=f"{base_name}_with_lines.png",
			
 
				+                    mime="image/png"
			
 
				+                )
			
 
				+            
			
 
				+            if saved_files:
			
 
				+                st.success(f"✅ 已保存 {len(saved_files)} 个文件:")
			
 
				+                for file_type, file_path in saved_files:
			
 
				+                    st.info(f"  • {file_type}: {file_path}")
			
 
				+
			
 
				+
			
 
				+if __name__ == "__main__":
			
 
				+    create_table_line_editor()
			
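Review note: the delete branches above rebuild `rows` and `columns` from the line coordinate lists, while the "添加横线" and "添加竖线" options offered by the `adjust_type` radio have no matching branch in this hunk. The sketch below shows one way both operations could share a single helper. The names `rebuild_intervals` and `add_line` are hypothetical and not part of this PR; the sketch assumes the line lists hold plain integer coordinates, as in the handlers above.

```python
from bisect import insort
from typing import Dict, List


def rebuild_intervals(lines: List[int], start_key: str, end_key: str) -> List[Dict[str, int]]:
    """Turn n+1 sorted line coordinates into n {start, end} intervals."""
    return [
        {start_key: lines[i], end_key: lines[i + 1]}
        for i in range(len(lines) - 1)
    ]


def add_line(lines: List[int], coord: int) -> List[int]:
    """Return a new sorted list with coord inserted; exact duplicates are ignored."""
    updated = sorted(lines)
    if coord not in updated:
        insort(updated, coord)
    return updated


# Example: add a horizontal line at y=180, then rebuild the row intervals.
horizontal_lines = add_line([100, 140, 220], 180)
rows = rebuild_intervals(horizontal_lines, "y_start", "y_end")
```

In the editor, a branch built on this idea would presumably still call `save_state_for_undo(structure)` first and `clear_table_image_cache()` afterwards, as the existing branches do.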
--- a/table_line_generator/table_line_generator.py
+++ b/table_line_generator/table_line_generator.py
@@ -0,0 +1,541 @@
 
				+"""
			
 
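				+Table line generation module based on OCR bboxes.
				+Automatically analyzes the row/column structure of borderless tables and generates table lines.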
			
 
				+"""
			
 
				+
			
 
				+import cv2
			
 
				+import numpy as np
			
 
				+from PIL import Image, ImageDraw
			
 
				+from pathlib import Path
			
 
				+from typing import List, Dict, Tuple, Optional, Union
			
 
				+import json
			
 
				+
			
 
				+
			
 
				+class TableLineGenerator:
			
 
				+    """Table line generator."""
			
 
				+    
			
 
				+    def __init__(self, image: Union[str, Image.Image], ocr_data: List[Dict]):
			
 
				+        """
			
 
				+        Initialize the table line generator.
				+        
				+        Args:
				+            image: image path (str) or a PIL.Image object
				+            ocr_data: OCR results (each entry includes a bbox)
			
 
				+        """
			
 
				+        if isinstance(image, str):
			
 
				+            # A path was passed in
				+            self.image_path = image
				+            self.image = Image.open(image)
				+        elif isinstance(image, Image.Image):
				+            # A PIL Image object was passed in
				+            self.image_path = None  # no path available
				+            self.image = image
			
 
				+        else:
			
 
				+            raise TypeError(
			
 
				+                f"image 参数必须是 str (路径) 或 PIL.Image.Image 对象，"
			
 
				+                f"实际类型: {type(image)}"
			
 
				+            )
			
 
				+        
			
 
				+        self.ocr_data = ocr_data
			
 
				+        
			
 
				+        # Table structure parameters
				+        self.rows = []          # row intervals [(y_start, y_end), ...]
				+        self.columns = []       # column intervals [(x_start, x_end), ...]
				+        self.row_height = 0     # standard row height
				+        self.col_widths = []    # width of each column
			
 
				+    
			
 
				+    @staticmethod
			
 
				+    def parse_ppstructure_result(ocr_result: Dict) -> Tuple[List[int], List[Dict]]:
			
 
				+        """
			
 
				+        Parse a PPStructure V3 OCR result.
				+        
				+        Args:
				+            ocr_result: the complete PPStructure V3 JSON result
				+        
				+        Returns:
				+            (table_bbox, text_boxes): the table bounding box and the list of text boxes
			
 
				+        """
			
 
				+        # 1. Find the table region in parsing_res_list
			
 
				+        table_bbox = None
			
 
				+        if 'parsing_res_list' in ocr_result:
			
 
				+            for block in ocr_result['parsing_res_list']:
			
 
				+                if block.get('block_label') == 'table':
			
 
				+                    table_bbox = block.get('block_bbox')
			
 
				+                    break
			
 
				+        
			
 
				+        if not table_bbox:
			
 
				+            raise ValueError("未找到表格区域 (block_label='table')")
			
 
				+        
			
 
				+        # 2. Extract the text boxes from overall_ocr_res (using rec_boxes)
			
 
				+        text_boxes = []
			
 
				+        if 'overall_ocr_res' in ocr_result:
			
 
				+            rec_boxes = ocr_result['overall_ocr_res'].get('rec_boxes', [])
			
 
				+            rec_texts = ocr_result['overall_ocr_res'].get('rec_texts', [])
			
 
				+            
			
 
				+            # 过滤出表格区域内的文本框
			
 
				+            for i, bbox in enumerate(rec_boxes):
			
 
				+                if len(bbox) >= 4:
			
 
				+                    # bbox 格式: [x1, y1, x2, y2]
			
 
				+                    x1, y1, x2, y2 = bbox[:4]
			
 
				+                    
			
 
				+                    # 判断文本框是否在表格区域内
			
 
				+                    if (x1 >= table_bbox[0] and y1 >= table_bbox[1] and
			
 
				+                        x2 <= table_bbox[2] and y2 <= table_bbox[3]):
			
 
				+                        text_boxes.append({
			
 
				+                            'bbox': [int(x1), int(y1), int(x2), int(y2)],
			
 
				+                            'text': rec_texts[i] if i < len(rec_texts) else ''
			
 
				+                        })
			
 
				+            # 对text_boxes从上到下，从左到右排序
			
 
				+            text_boxes.sort(key=lambda x: (x['bbox'][1], x['bbox'][0]))
			
 
				+        
			
 
				+        return table_bbox, text_boxes
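The containment filter used by `parse_ppstructure_result` can be tried out in isolation. The following is a minimal sketch, not the project's code: `boxes_inside` and the sample coordinates are hypothetical, and only the "fully inside the table bbox, then sort by (y1, x1)" rule is taken from the method above.

```python
from typing import Dict, List


def boxes_inside(table_bbox: List[int],
                 boxes: List[List[int]],
                 texts: List[str]) -> List[Dict]:
    """Keep boxes fully inside table_bbox, sorted top-to-bottom, then left-to-right."""
    tx1, ty1, tx2, ty2 = table_bbox
    kept = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x1 >= tx1 and y1 >= ty1 and x2 <= tx2 and y2 <= ty2:
            kept.append({
                'bbox': [x1, y1, x2, y2],
                'text': texts[i] if i < len(texts) else '',
            })
    kept.sort(key=lambda b: (b['bbox'][1], b['bbox'][0]))
    return kept


# Hypothetical data: one table region and three recognised boxes.
sample = boxes_inside(
    [100, 200, 900, 800],
    [[500, 210, 600, 240], [120, 210, 300, 240], [50, 100, 300, 130]],
    ['Amount', 'Date', 'Page title'],
)
print([b['text'] for b in sample])  # the box starting at x=50 lies outside the table
```

A box that only partially overlaps the table region is dropped by this rule, so text that straddles the detected table border does not reach the row and column analysis.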
			
 
				+        
			
 
+    def analyze_table_structure(self,
+                               y_tolerance: int = 5,
+                               x_tolerance: int = 10,
+                               min_row_height: int = 20) -> Dict:
+        """
+        Analyze the table structure (row and column layout).
+
+        Args:
+            y_tolerance: Y-axis clustering tolerance (pixels)
+            x_tolerance: X-axis clustering tolerance (pixels)
+            min_row_height: minimum row height (pixels)
+
+        Returns:
+            Table structure info (an empty dict when there is no OCR data), containing:
+            - rows: list of row regions
+            - columns: list of column regions
+            - horizontal_lines: Y coordinates of the horizontal lines [y1, y2, ..., y_{n+1}]
+            - vertical_lines: X coordinates of the vertical lines [x1, x2, ..., x_{m+1}]
+            - row_height: standard row height
+            - col_widths: width of each column
+            - table_bbox: bounding box of the table
+        """
+        if not self.ocr_data:
+            return {}
+
+        # 1. Collect the Y extent of every bbox (for row detection)
+        y_coords = []
+        for item in self.ocr_data:
+            bbox = item.get('bbox', [])
+            if len(bbox) >= 4:
+                y1, y2 = bbox[1], bbox[3]
+                y_coords.append((y1, y2, bbox))
+
+        # Sort by top Y coordinate
+        y_coords.sort(key=lambda x: x[0])
+
+        # 2. Cluster bboxes with similar Y coordinates into rows
+        self.rows = self._cluster_rows(y_coords, y_tolerance, min_row_height)
+
+        # 3. Standard row height (median; falls back to 30 when no row was found)
+        row_heights = [row['y_end'] - row['y_start'] for row in self.rows]
+        self.row_height = int(np.median(row_heights)) if row_heights else 30
+
+        # 4. Collect the X extent of every bbox (for column detection)
+        x_coords = []
+        for item in self.ocr_data:
+            bbox = item.get('bbox', [])
+            if len(bbox) >= 4:
+                x1, x2 = bbox[0], bbox[2]
+                x_coords.append((x1, x2))
+
+        # 5. Cluster similar X coordinates into columns
+        self.columns = self._cluster_columns(x_coords, x_tolerance)
+
+        # 6. Width of each column
+        self.col_widths = [col['x_end'] - col['x_start'] for col in self.columns]
+
+        # 🆕 7. Y coordinates of the horizontal lines (n + 1 lines for n rows)
+        horizontal_lines = []
+        for row in self.rows:
+            horizontal_lines.append(row['y_start'])
+        # Add the last horizontal line
+        if self.rows:
+            horizontal_lines.append(self.rows[-1]['y_end'])
+
+        # 🆕 8. X coordinates of the vertical lines (m + 1 lines for m columns)
+        vertical_lines = []
+        for col in self.columns:
+            vertical_lines.append(col['x_start'])
+        # Add the last vertical line
+        if self.columns:
+            vertical_lines.append(self.columns[-1]['x_end'])
+
+        return {
+            'rows': self.rows,
+            'columns': self.columns,
+            'horizontal_lines': horizontal_lines,  # 🆕 Y coordinates of the horizontal lines
+            'vertical_lines': vertical_lines,      # 🆕 X coordinates of the vertical lines
+            'row_height': self.row_height,
+            'col_widths': self.col_widths,
+            'table_bbox': self._get_table_bbox()
+        }
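Steps 7 and 8 of `analyze_table_structure` turn n regions into n + 1 line coordinates: the start of every region plus the end of the last one. A small sketch of that rule follows; the helper name `boundaries` and the sample rows are hypothetical.

```python
from typing import Dict, List


def boundaries(regions: List[Dict], start_key: str, end_key: str) -> List[int]:
    """Return the start of every region plus the end of the last one."""
    lines = [region[start_key] for region in regions]
    if regions:
        lines.append(regions[-1][end_key])
    return lines


# Hypothetical rows with small gaps between them.
rows = [
    {'y_start': 100, 'y_end': 130},
    {'y_start': 140, 'y_end': 170},
    {'y_start': 180, 'y_end': 212},
]
horizontal = boundaries(rows, 'y_start', 'y_end')
print(horizontal)  # 3 rows give 4 horizontal lines
```

Because only the last region contributes its end coordinate, a gap between two detected rows is absorbed into the upper row when the lines are drawn.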
			
 
				+    
			
 
+    def _cluster_rows(self, y_coords: List[Tuple], tolerance: int, min_height: int) -> List[Dict]:
+        """
+        Detect rows by clustering.
+
+        Strategy:
+        1. Sort by Y coordinate (done by the caller)
+        2. Put boxes whose top Y is within the tolerance into the same row
+        3. Drop rows whose height is too small
+        """
+        if not y_coords:
+            return []
+
+        rows = []
+        current_row = {
+            'y_start': y_coords[0][0],
+            'y_end': y_coords[0][1],
+            'bboxes': [y_coords[0][2]]
+        }
+
+        for i in range(1, len(y_coords)):
+            y1, y2, bbox = y_coords[i]
+
+            # Does the box belong to the current row (similar top Y)?
+            if abs(y1 - current_row['y_start']) <= tolerance:
+                # Extend the Y range of the row
+                current_row['y_start'] = min(current_row['y_start'], y1)
+                current_row['y_end'] = max(current_row['y_end'], y2)
+                current_row['bboxes'].append(bbox)
+            else:
+                # Save the current row (if it is tall enough)
+                if current_row['y_end'] - current_row['y_start'] >= min_height:
+                    rows.append(current_row)
+
+                # Start a new row
+                current_row = {
+                    'y_start': y1,
+                    'y_end': y2,
+                    'bboxes': [bbox]
+                }
+
+        # Save the last row
+        if current_row['y_end'] - current_row['y_start'] >= min_height:
+            rows.append(current_row)
+
+        return rows
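To see how the tolerance and the minimum height interact, here is a stripped-down sketch of the same row clustering on made-up spans. `cluster_rows` and the sample data are hypothetical; the per-row `bboxes` list is left out for brevity.

```python
from typing import Dict, List, Optional, Tuple


def cluster_rows(spans: List[Tuple[int, int]],
                 tolerance: int = 5,
                 min_height: int = 20) -> List[Dict]:
    """Group (y1, y2) spans whose y1 lies within `tolerance` of the row's start."""
    rows: List[Dict] = []
    current: Optional[Dict] = None
    for y1, y2 in sorted(spans):
        if current is not None and abs(y1 - current['y_start']) <= tolerance:
            current['y_end'] = max(current['y_end'], y2)
        else:
            # Close the previous row, keeping it only if it is tall enough
            if current is not None and current['y_end'] - current['y_start'] >= min_height:
                rows.append(current)
            current = {'y_start': y1, 'y_end': y2}
    if current is not None and current['y_end'] - current['y_start'] >= min_height:
        rows.append(current)
    return rows


# Two text rows plus one 10 px high fragment that is filtered out.
spans = [(100, 128), (103, 130), (160, 190), (162, 188), (220, 230)]
rows = cluster_rows(spans)
print(rows)
```

Each box is compared with the start of the current row rather than with the previous box, so a slowly drifting baseline eventually opens a new row instead of growing one row without bound.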
			
 
				+    
			
 
+    def _cluster_columns(self, x_coords: List[Tuple], tolerance: int) -> List[Dict]:
+        """
+        Detect columns by clustering.
+
+        Strategy:
+        1. Collect the left and right edge of every bbox
+        2. Cluster X coordinates that are close to each other
+        3. Turn consecutive cluster positions into column regions
+        """
+        if not x_coords:
+            return []
+
+        # Collect all X coordinates (left and right edges)
+        all_x = []
+        for x1, x2 in x_coords:
+            all_x.append(x1)
+            all_x.append(x2)
+
+        all_x = sorted(set(all_x))
+
+        # Cluster the X coordinates; each cluster is represented by its smallest value
+        columns = []
+        current_x = all_x[0]
+
+        for x in all_x[1:]:
+            if x - current_x > tolerance:
+                # A new cluster starts
+                columns.append(current_x)
+                current_x = x
+
+        columns.append(current_x)
+
+        # Build the column regions
+        column_regions = []
+        for i in range(len(columns) - 1):
+            column_regions.append({
+                'x_start': columns[i],
+                'x_end': columns[i + 1]
+            })
+
+        return column_regions
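The column clustering can be reproduced on a handful of edges. This sketch assumes hypothetical names (`cluster_boundaries`, `edges`) and made-up coordinates; it keeps the smallest coordinate of each cluster, as the method above does.

```python
from typing import Dict, List


def cluster_boundaries(xs: List[int], tolerance: int = 10) -> List[int]:
    """Collapse X coordinates closer than `tolerance` to the start of their cluster."""
    unique = sorted(set(xs))
    if not unique:
        return []
    kept = [unique[0]]
    for x in unique[1:]:
        if x - kept[-1] > tolerance:
            kept.append(x)
    return kept


# Left and right edges of five hypothetical text boxes in three visual columns.
edges = [100, 180, 104, 176, 200, 290, 203, 285, 310, 400]
bounds = cluster_boundaries(edges)
regions: List[Dict] = [
    {'x_start': a, 'x_end': b} for a, b in zip(bounds, bounds[1:])
]
print(bounds)
print(len(regions))
```

Since left and right edges go into the same pool, the gaps between text boxes also come out as regions, which is one reason the README describes a manual adjustment step for vertical lines.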
			
 
				+    
			
 
+    def _get_table_bbox(self) -> List[int]:
+        """Return the overall table bounding box (the full image if no rows or columns exist)."""
+        if not self.rows or not self.columns:
+            return [0, 0, self.image.width, self.image.height]
+
+        y_min = min(row['y_start'] for row in self.rows)
+        y_max = max(row['y_end'] for row in self.rows)
+        x_min = min(col['x_start'] for col in self.columns)
+        x_max = max(col['x_end'] for col in self.columns)
+
+        return [x_min, y_min, x_max, y_max]
			
 
				+    
			
 
+    def generate_table_lines(self,
+                            line_color: Tuple[int, int, int] = (0, 0, 255),
+                            line_width: int = 2) -> Image.Image:
+        """
+        Draw the table lines on a copy of the original image.
+
+        Args:
+            line_color: line color (R, G, B)
+            line_width: line width
+
+        Returns:
+            The image with the table lines drawn on it
+        """
+        # Copy the original image
+        img_with_lines = self.image.copy()
+        draw = ImageDraw.Draw(img_with_lines)
+
+        # 🔧 Simplified: use the row/column regions instead of recomputing them
+        x_start = self.columns[0]['x_start'] if self.columns else 0
+        x_end = self.columns[-1]['x_end'] if self.columns else img_with_lines.width
+        y_start = self.rows[0]['y_start'] if self.rows else 0
+        y_end = self.rows[-1]['y_end'] if self.rows else img_with_lines.height
+
+        # Draw the top line of every row
+        for row in self.rows:
+            y = row['y_start']
+            draw.line([(x_start, y), (x_end, y)], fill=line_color, width=line_width)
+
+        # Draw the last horizontal line (bottom of the last row)
+        if self.rows:
+            y = self.rows[-1]['y_end']
+            draw.line([(x_start, y), (x_end, y)], fill=line_color, width=line_width)
+
+        # Draw the left line of every column
+        for col in self.columns:
+            x = col['x_start']
+            draw.line([(x, y_start), (x, y_end)], fill=line_color, width=line_width)
+
+        # Draw the last vertical line (right edge of the last column)
+        if self.columns:
+            x = self.columns[-1]['x_end']
+            draw.line([(x, y_start), (x, y_end)], fill=line_color, width=line_width)
+
+        return img_with_lines
			
 
				+    
			
 
+    def save_table_structure(self, output_path: str) -> Dict:
+        """Save the table structure config (for applying it to other pages)."""
+        structure = {
+            'row_height': self.row_height,
+            'col_widths': self.col_widths,
+            'columns': self.columns,
+            'first_row_y': self.rows[0]['y_start'] if self.rows else 0,
+            'table_bbox': self._get_table_bbox()
+        }
+
+        with open(output_path, 'w', encoding='utf-8') as f:
+            json.dump(structure, f, indent=2, ensure_ascii=False)
+
+        return structure
			
 
				+    
			
 
+    def apply_structure_to_image(self,
+                                target_image: Union[str, Image.Image],
+                                structure: Dict,
+                                output_path: str) -> str:
+        """
+        Apply a table structure to another page.
+
+        Args:
+            target_image: target image path (str) or PIL.Image object
+            structure: table structure config
+            output_path: output path
+
+        Returns:
+            Path of the generated image with table lines
+        """
+        # 🔧 Changed: accept either an Image object or a path
+        if isinstance(target_image, str):
+            target_img = Image.open(target_image)
+        elif isinstance(target_image, Image.Image):
+            # Work on a copy so the caller's image is not drawn on
+            target_img = target_image.copy()
+        else:
+            raise TypeError(
+                "target_image must be a str (path) or a PIL.Image.Image object, "
+                f"got: {type(target_image)}"
+            )
+
+        draw = ImageDraw.Draw(target_img)
+
+        row_height = structure['row_height']
+        columns = structure['columns']
+        first_row_y = structure['first_row_y']
+        table_bbox = structure['table_bbox']
+
+        # Guard against a zero row height (e.g. a structure saved before analysis)
+        if row_height <= 0:
+            raise ValueError(f"row_height must be positive, got: {row_height}")
+
+        # Number of rows (derived from the image height)
+        num_rows = int((target_img.height - first_row_y) / row_height)
+
+        # Draw the horizontal lines
+        for i in range(num_rows + 1):
+            y = first_row_y + i * row_height
+            draw.line([(table_bbox[0], y), (table_bbox[2], y)],
+                     fill=(0, 0, 255), width=2)
+
+        # Draw the vertical lines
+        for col in columns:
+            x = col['x_start']
+            draw.line([(x, first_row_y), (x, first_row_y + num_rows * row_height)],
+                     fill=(0, 0, 255), width=2)
+
+        # Draw the last vertical line (skipped when there are no columns)
+        if columns:
+            x = columns[-1]['x_end']
+            draw.line([(x, first_row_y), (x, first_row_y + num_rows * row_height)],
+                     fill=(0, 0, 255), width=2)
+
+        # Save
+        target_img.save(output_path)
+        return output_path
			
 
				+
			
 
				+
			
 
+def generate_table_lines_from_ppstructure(
+    json_path: str,
+    output_dir: str,
+    config: Dict
+) -> Dict:
+    """
+    Generate table lines from a PPStructure V3 result.
+
+    Args:
+        json_path: path of the PPStructure V3 result JSON
+        output_dir: output directory
+        config: config dict
+
+    Returns:
+        Information about the generated files
+    """
+    # 1. Load the PPStructure V3 result
+    with open(json_path, 'r', encoding='utf-8') as f:
+        ppstructure_result = json.load(f)
+
+    # 2. Parse the table region and the text boxes
+    table_bbox, text_boxes = TableLineGenerator.parse_ppstructure_result(ppstructure_result)
+
+    print(f"✅ Table region: {table_bbox}")
+    print(f"✅ Text boxes inside the table: {len(text_boxes)}")
+
+    # 3. Locate the matching image
+    json_file = Path(json_path)
+
+    # Use the original image path recorded in the PPStructure result, if it exists
+    input_path = ppstructure_result.get('input_path')
+    if input_path and Path(input_path).exists():
+        image_path = Path(input_path)
+    else:
+        # Otherwise look for an image named after the JSON file
+        image_path = json_file.with_suffix('.png')
+        if not image_path.exists():
+            image_path = json_file.with_suffix('.jpg')
+
+    if not image_path.exists():
+        raise FileNotFoundError(f"Image not found: {image_path}")
+
+    print(f"✅ Image path: {image_path}")
+
+    # 4. Create the table line generator
+    generator = TableLineGenerator(str(image_path), text_boxes)
+
+    # 5. Analyze the table structure
+    structure = generator.analyze_table_structure(
+        y_tolerance=config.get('y_tolerance', 5),
+        x_tolerance=config.get('x_tolerance', 10),
+        min_row_height=config.get('min_row_height', 20)
+    )
+
+    print(f"✅ Detected {len(structure['rows'])} rows, {len(structure['columns'])} columns")
+    print(f"✅ Standard row height: {structure['row_height']}px")
+
+    # 6. Render the image with table lines
+    img_with_lines = generator.generate_table_lines(
+        line_color=tuple(config.get('line_color', [0, 0, 255])),
+        line_width=config.get('line_width', 2)
+    )
+
+    # 7. Save the results
+    output_path = Path(output_dir)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    output_image_path = output_path / f"{json_file.stem}_with_lines.jpg"
+    # JPEG cannot store an alpha channel or a palette, so normalise the mode first
+    if img_with_lines.mode not in ('RGB', 'L'):
+        img_with_lines = img_with_lines.convert('RGB')
+    img_with_lines.save(output_image_path)
+
+    # Save the table structure config
+    structure_path = output_path / f"{json_file.stem}_structure.json"
+    generator.save_table_structure(str(structure_path))
+
+    return {
+        'image_with_lines': str(output_image_path),
+        'structure_config': str(structure_path),
+        'structure': structure,
+        'table_bbox': table_bbox,
+        'text_boxes_count': len(text_boxes)
+    }
			
 
				+
			
 
				+
			
 
+def generate_table_lines_for_page(json_path: str,
+                                  output_dir: str,
+                                  config: Dict) -> Dict:
+    """
+    Generate table lines for a single page (kept for compatibility with the old interface).
+
+    Args:
+        json_path: path of the OCR result JSON
+        output_dir: output directory
+        config: config dict
+
+    Returns:
+        Information about the generated files
+    """
+    # Load the OCR data
+    with open(json_path, 'r', encoding='utf-8') as f:
+        ocr_data = json.load(f)
+
+    # Is this a PPStructure result?
+    if 'parsing_res_list' in ocr_data and 'overall_ocr_res' in ocr_data:
+        # Use the newer PPStructure parsing function
+        return generate_table_lines_from_ppstructure(json_path, output_dir, config)
+
+    # Locate the matching image
+    json_file = Path(json_path)
+    image_path = json_file.with_suffix('.jpg')
+    if not image_path.exists():
+        image_path = json_file.with_suffix('.png')
+
+    if not image_path.exists():
+        raise FileNotFoundError(f"Image not found: {image_path}")
+
+    # Create the table line generator
+    generator = TableLineGenerator(str(image_path), ocr_data)
+
+    # Analyze the table structure
+    structure = generator.analyze_table_structure(
+        y_tolerance=config.get('y_tolerance', 5),
+        x_tolerance=config.get('x_tolerance', 10),
+        min_row_height=config.get('min_row_height', 20)
+    )
+
+    # Render the image with table lines
+    img_with_lines = generator.generate_table_lines(
+        line_color=tuple(config.get('line_color', [0, 0, 255])),
+        line_width=config.get('line_width', 2)
+    )
+
+    # Save
+    output_path = Path(output_dir)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    output_image_path = output_path / f"{json_file.stem}_with_lines.jpg"
+    # JPEG cannot store an alpha channel or a palette, so normalise the mode first
+    if img_with_lines.mode not in ('RGB', 'L'):
+        img_with_lines = img_with_lines.convert('RGB')
+    img_with_lines.save(output_image_path)
+
+    # Save the table structure config
+    structure_path = output_path / f"{json_file.stem}_structure.json"
+    generator.save_table_structure(str(structure_path))
+
+    return {
+        'image_with_lines': str(output_image_path),
+        'structure_config': str(structure_path),
+        'structure': structure
+    }
			
--- a/table_line_generator/table_template_applier.py
+++ b/table_line_generator/table_template_applier.py
@@ -0,0 +1,653 @@
 
+"""
+Table template applier
+Applies a manually annotated table structure to other pages
+"""
+
+import json
+from pathlib import Path
+from PIL import Image, ImageDraw
+from typing import Dict, List, Tuple
+import numpy as np
+import argparse
+
+try:
+    from table_line_generator import TableLineGenerator
+except ImportError:
+    from .table_line_generator import TableLineGenerator
			
 
				+
			
 
				+
			
 
+class TableTemplateApplier:
+    """Table template applier"""
+
+    def __init__(self, template_config_path: str):
+        """
+        Initialize the template applier.
+
+        Args:
+            template_config_path: path of the template config file (the manual annotation result)
+        """
+        with open(template_config_path, 'r', encoding='utf-8') as f:
+            self.template = json.load(f)
+
+        # 🎯 Fixed parameters taken from the annotation result
+        self.col_widths = self.template['col_widths']
+
+        # 🔧 Standard height of the data rows (header excluded).
+        # A config without 'rows' falls through to the fallback branch below.
+        rows = self.template.get('rows', [])
+        if len(rows) > 1:
+            # Actual height of every row
+            row_heights = [row['y_end'] - row['y_start'] for row in rows]
+
+            # 🎯 Assume the first row is the header; compute from the second row on
+            data_row_heights = row_heights[1:]
+
+            # Use the median as the standard row height (more robust)
+            self.row_height = int(np.median(data_row_heights))
+            self.header_height = row_heights[0]
+
+            print(f"📏 Header height: {self.header_height}px")
+            print(f"📏 Data row height: {self.row_height}px")
+            print(f"   (median over {len(data_row_heights)} data rows)")
+        else:
+            # Fallback
+            self.row_height = self.template.get('row_height', 60)
+            self.header_height = self.row_height
+
+        # 🎯 Relative column positions (offsets from the start of the first column)
+        self.col_offsets = [0]
+        for width in self.col_widths:
+            self.col_offsets.append(self.col_offsets[-1] + width)
+
+        # 🎯 Y coordinate of the header (kept as a reference)
+        self.template_header_y = rows[0]['y_start'] if rows else 0
+
+        print("\n✅ Template config loaded:")
+        print(f"   Header height: {self.header_height}px")
+        print(f"   Data row height: {self.row_height}px")
+        print(f"   Columns: {len(self.col_widths)}")
+        print(f"   Column widths: {self.col_widths}")
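The parameters derived in `__init__` can be checked by hand on a small template. The sketch below uses hypothetical values and `statistics.median` from the standard library in place of `np.median`, only to stay dependency-free.

```python
from itertools import accumulate
from statistics import median

# Hypothetical template: one 80 px header row and three data rows.
rows = [
    {'y_start': 50, 'y_end': 130},
    {'y_start': 130, 'y_end': 190},
    {'y_start': 190, 'y_end': 252},
    {'y_start': 252, 'y_end': 312},
]
col_widths = [120, 80, 200]

heights = [row['y_end'] - row['y_start'] for row in rows]
header_height = heights[0]               # the first row is assumed to be the header
row_height = int(median(heights[1:]))    # median of the data rows only
col_offsets = [0, *accumulate(col_widths)]

print(header_height, row_height, col_offsets)
```

Storing offsets instead of absolute X coordinates is what lets the same template be re-anchored on a page where the table is shifted horizontally.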
			
 
				+    
			
 
+    def detect_table_anchor(self, ocr_data: List[Dict]) -> Tuple[int, int]:
+        """
+        Detect the anchor of the table (top-left corner of the header).
+
+        Strategy:
+        1. Take the smallest Y coordinate of all text boxes (first header row)
+        2. Take the smallest X coordinate of all text boxes (first column)
+
+        Args:
+            ocr_data: OCR result
+
+        Returns:
+            (anchor_x, anchor_y): top-left corner of the table
+        """
+        if not ocr_data:
+            return (0, 0)
+
+        # Smallest X and Y coordinates
+        min_x = min(item['bbox'][0] for item in ocr_data)
+        min_y = min(item['bbox'][1] for item in ocr_data)
+
+        return (min_x, min_y)
			
 
				+    
			
 
+    def detect_table_rows(self, ocr_data: List[Dict], header_y: int) -> int:
+        """
+        Detect the number of table rows (header included).
+
+        Strategy:
+        1. Find the largest bottom Y coordinate of all text boxes
+        2. Derive the number of data rows from the data row height
+        3. Add the header row
+
+        Args:
+            ocr_data: OCR result
+            header_y: Y coordinate where the header starts
+
+        Returns:
+            Total number of rows (header included)
+        """
+        if not ocr_data:
+            return 1  # there is at least the header
+
+        max_y = max(item['bbox'][3] for item in ocr_data)
+
+        # 🔧 Height of the data area (header excluded)
+        data_start_y = header_y + self.header_height
+        data_height = max_y - data_start_y
+
+        # Number of data rows
+        num_data_rows = max(int(data_height / self.row_height), 0)
+
+        # Total rows = 1 header row + n data rows
+        total_rows = 1 + num_data_rows
+
+        print("📊 Row count:")
+        print(f"   Header Y: {header_y}, data area start Y: {data_start_y}")
+        print(f"   Max Y: {max_y}, data area height: {data_height}px")
+        print(f"   Data rows: {num_data_rows}, total rows: {total_rows}")
+
+        return total_rows
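The row-count arithmetic in `detect_table_rows` is easy to follow with concrete numbers. This is a sketch with hypothetical values; `count_rows` is not part of the module.

```python
def count_rows(max_y: int, header_y: int, header_height: int, row_height: int) -> int:
    """One header row plus as many whole data rows as fit above max_y."""
    data_height = max_y - (header_y + header_height)
    return 1 + max(int(data_height / row_height), 0)


# Header from y=100 to y=180, 60 px data rows.
full = count_rows(max_y=500, header_y=100, header_height=80, row_height=60)
short = count_rows(max_y=470, header_y=100, header_height=80, row_height=60)
header_only = count_rows(max_y=150, header_y=100, header_height=80, row_height=60)
print(full, short, header_only)
```

Because the quotient is truncated, a last row whose text ends well above its bottom line is not counted (the `short` case); passing `num_rows` explicitly to `apply_to_image` is one way around that.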
			
 
				+    
			
 
+    def apply_to_image(self,
+                       image: Image.Image,
+                       ocr_data: List[Dict],
+                       anchor_x: int = None,
+                       anchor_y: int = None,
+                       num_rows: int = None,
+                       line_width: int = 2,
+                       line_color: Tuple[int, int, int] = (0, 0, 0)) -> Image.Image:
+        """
+        Apply the template to an image.
+
+        Args:
+            image: target image
+            ocr_data: OCR result (used to auto-detect the anchor)
+            anchor_x: X coordinate where the table starts (None = auto-detect)
+            anchor_y: Y coordinate where the header starts (None = auto-detect)
+            num_rows: total number of rows (None = auto-detect)
+            line_width: line width
+            line_color: line color
+
+        Returns:
+            The image with the table lines drawn on it
+        """
+        img_with_lines = image.copy()
+        draw = ImageDraw.Draw(img_with_lines)
+
+        # 🔍 Auto-detect the anchor (an explicit 0 is a valid coordinate and is kept)
+        if anchor_x is None or anchor_y is None:
+            detected_x, detected_y = self.detect_table_anchor(ocr_data)
+            anchor_x = detected_x if anchor_x is None else anchor_x
+            anchor_y = detected_y if anchor_y is None else anchor_y
+
+        # 🔍 Auto-detect the number of rows
+        if num_rows is None:
+            num_rows = self.detect_table_rows(ocr_data, anchor_y)
+
+        print(f"\n📍 Table anchor: ({anchor_x}, {anchor_y})")
+        print(f"📊 Total rows: {num_rows} (1 header + {num_rows - 1} data)")
+
+        # 🎨 Y coordinates of the horizontal lines
+        horizontal_lines = []
+
+        # Line 1: top of the header
+        horizontal_lines.append(anchor_y)
+
+        # Line 2: bottom of the header / top of the data area
+        horizontal_lines.append(anchor_y + self.header_height)
+
+        # Remaining lines: separators between data rows
+        current_y = anchor_y + self.header_height
+        for i in range(num_rows - 1):  # minus 1 because the header already takes one row
+            current_y += self.row_height
+            horizontal_lines.append(current_y)
+
+        # 🎨 X coordinates of the vertical lines
+        vertical_lines = []
+        for offset in self.col_offsets:
+            x = anchor_x + offset
+            vertical_lines.append(x)
+
+        print(f"📏 Horizontal lines: {horizontal_lines[:3]}... ({len(horizontal_lines)} in total)")
+        print(f"📏 Vertical lines: {vertical_lines[:3]}... ({len(vertical_lines)} in total)")
+
+        # 🖊️ Draw the horizontal lines
+        x_start = vertical_lines[0]
+        x_end = vertical_lines[-1]
+        for y in horizontal_lines:
+            draw.line([(x_start, y), (x_end, y)], fill=line_color, width=line_width)
+
+        # 🖊️ Draw the vertical lines
+        y_start = horizontal_lines[0]
+        y_end = horizontal_lines[-1]
+        for x in vertical_lines:
+            draw.line([(x, y_start), (x, y_end)], fill=line_color, width=line_width)
+
+        return img_with_lines
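The grid that `apply_to_image` draws is fully determined by the anchor, the two heights, the row count and the column offsets. A minimal sketch of that computation, with a hypothetical helper `grid_lines` and made-up numbers:

```python
from typing import List, Tuple


def grid_lines(anchor_x: int, anchor_y: int,
               header_height: int, row_height: int,
               num_rows: int, col_offsets: List[int]) -> Tuple[List[int], List[int]]:
    """Return (horizontal Y coordinates, vertical X coordinates) of the grid."""
    horizontal = [anchor_y, anchor_y + header_height]  # top and bottom of the header
    for _ in range(num_rows - 1):                      # one more line per data row
        horizontal.append(horizontal[-1] + row_height)
    vertical = [anchor_x + offset for offset in col_offsets]
    return horizontal, vertical


h_lines, v_lines = grid_lines(
    anchor_x=40, anchor_y=100,
    header_height=80, row_height=60,
    num_rows=4, col_offsets=[0, 120, 200, 400],
)
print(h_lines)
print(v_lines)
```

With 4 rows (1 header and 3 data rows) the grid has 5 horizontal lines, and with 3 columns it has 4 vertical lines, matching the n + 1 and m + 1 counts used elsewhere in this change.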
			
 
				+    
			
 
+    def generate_structure_for_image(self,
+                                    ocr_data: List[Dict],
+                                    anchor_x: int = None,
+                                    anchor_y: int = None,
+                                    num_rows: int = None) -> Dict:
+        """
+        Generate the table structure config for a new image.
+
+        Args:
+            ocr_data: OCR result
+            anchor_x: X coordinate where the table starts (None = auto-detect)
+            anchor_y: Y coordinate where the header starts (None = auto-detect)
+            num_rows: total number of rows (None = auto-detect)
+
+        Returns:
+            Table structure config
+        """
+        # 🔍 Auto-detect the anchor (an explicit 0 is a valid coordinate and is kept)
+        if anchor_x is None or anchor_y is None:
+            detected_x, detected_y = self.detect_table_anchor(ocr_data)
+            anchor_x = detected_x if anchor_x is None else anchor_x
+            anchor_y = detected_y if anchor_y is None else anchor_y
+
+        # 🔍 Auto-detect the number of rows
+        if num_rows is None:
+            num_rows = self.detect_table_rows(ocr_data, anchor_y)
+
+        # 🎨 Y coordinates of the horizontal lines
+        horizontal_lines = []
+        horizontal_lines.append(anchor_y)
+        horizontal_lines.append(anchor_y + self.header_height)
+
+        current_y = anchor_y + self.header_height
+        for i in range(num_rows - 1):
+            current_y += self.row_height
+            horizontal_lines.append(current_y)
+
+        # 🎨 X coordinates of the vertical lines
+        vertical_lines = []
+        for offset in self.col_offsets:
+            x = anchor_x + offset
+            vertical_lines.append(x)
+
+        # 🎨 Row regions
+        rows = []
+        for i in range(num_rows):
+            rows.append({
+                'y_start': horizontal_lines[i],
+                'y_end': horizontal_lines[i + 1],
+                'bboxes': []
+            })
+
+        # 🎨 Column regions
+        columns = []
+        for i in range(len(vertical_lines) - 1):
+            columns.append({
+                'x_start': vertical_lines[i],
+                'x_end': vertical_lines[i + 1]
+            })
+
+        return {
+            'rows': rows,
+            'columns': columns,
+            'horizontal_lines': horizontal_lines,
+            'vertical_lines': vertical_lines,
+            'header_height': self.header_height,
+            'row_height': self.row_height,
+            'col_widths': self.col_widths,
+            'table_bbox': [
+                vertical_lines[0],
+                horizontal_lines[0],
+                vertical_lines[-1],
+                horizontal_lines[-1]
+            ],
+            'anchor': {'x': anchor_x, 'y': anchor_y},
+            'num_rows': num_rows
+        }
			
 
				+
			
 
				+
			
 
				+def apply_template_to_single_file(
			
 
				+    applier: TableTemplateApplier,
			
 
				+    image_file: Path,
			
 
				+    json_file: Path,
			
 
				+    output_dir: Path,
			
 
				+    line_width: int = 2,
			
 
				+    line_color: Tuple[int, int, int] = (0, 0, 0)
			
 
				+) -> bool:
			
 
				+    """
			
 
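				+    Apply the template to a single file.
			
 
				+    
			
 
				+    Args:
			
 
				+        applier: Template applier instance
			
 
				+        image_file: Path to the image file
			
 
				+        json_file: Path to the OCR JSON file
			
 
				+        output_dir: Output directory
			
 
				+        line_width: Line width
			
 
				+        line_color: Line color
			
 
				+    
			
 
				+    Returns:
			
 
				+        True on success, False otherwise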
			
 
				+    """
			
 
				+    print(f"📄 Processing: {image_file.name}")
			
 
				+    
			
 
				+    try:
			
 
				+        # Load the OCR data
			
 
				+        with open(json_file, 'r', encoding='utf-8') as f:
			
 
				+            raw_data = json.load(f)
			
 
				+        
			
 
				+        # 🔧 Parse the OCR data (only the PPStructure format is accepted here)
			
 
				+        if 'parsing_res_list' in raw_data and 'overall_ocr_res' in raw_data:
			
 
				+            table_bbox, ocr_data = TableLineGenerator.parse_ppstructure_result(raw_data)
			
 
				+        else:
			
 
				+            raise ValueError("OCR result is not in PPStructure format")
			
 
				+        
			
 
				+        print(f"  ✅ Loaded OCR data: {len(ocr_data)} text boxes")
			
 
				+        
			
 
				+        # Load the image
			
 
				+        image = Image.open(image_file)
			
 
				+        print(f"  ✅ Loaded image: {image.size}")
			
 
				+        
			
 
				+        # 🎯 Apply the template
			
 
				+        img_with_lines = applier.apply_to_image(
			
 
				+            image,
			
 
				+            ocr_data,
			
 
				+            line_width=line_width,
			
 
				+            line_color=line_color
			
 
				+        )
			
 
				+        
			
 
				+        # Save the image
			
 
				+        output_file = output_dir / f"{image_file.stem}_with_lines.png"
			
 
				+        img_with_lines.save(output_file)
			
 
				+        
			
 
				+        # 🆕 Generate and save the structure config
			
 
				+        structure = applier.generate_structure_for_image(ocr_data)
			
 
				+        structure_file = output_dir / f"{image_file.stem}_structure.json"
			
 
				+        with open(structure_file, 'w', encoding='utf-8') as f:
			
 
				+            json.dump(structure, f, indent=2, ensure_ascii=False)
			
 
				+        
			
 
				+        print(f"  ✅ Saved image: {output_file.name}")
			
 
				+        print(f"  ✅ Saved config: {structure_file.name}")
			
 
				+        print(f"  📊 Table: {structure['num_rows']} rows x {len(structure['columns'])} columns")
			
 
				+        
			
 
				+        return True
			
 
				+        
			
 
				+    except Exception as e:
			
 
				+        print(f"  ❌ Processing failed: {e}")
			
 
				+        import traceback
			
 
				+        traceback.print_exc()
			
 
				+        return False
			
 
				+
			
 
				+
			
 
				+def apply_template_batch(
			
 
				+    template_config_path: str,
			
 
				+    image_dir: str,
			
 
				+    json_dir: str,
			
 
				+    output_dir: str,
			
 
				+    line_width: int = 2,
			
 
				+    line_color: Tuple[int, int, int] = (0, 0, 0)
			
 
				+):
			
 
				+    """
			
 
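				+    Apply the template to every image in a directory.
			
 
				+    
			
 
				+    Args:
			
 
				+        template_config_path: Path to the template config
			
 
				+        image_dir: Image directory
			
 
				+        json_dir: OCR JSON directory
			
 
				+        output_dir: Output directory
			
 
				+        line_width: Line width
			
 
				+        line_color: Line color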
			
 
				+    """
			
 
				+    applier = TableTemplateApplier(template_config_path)
			
 
				+    
			
 
				+    image_path = Path(image_dir)
			
 
				+    json_path = Path(json_dir)
			
 
				+    output_path = Path(output_dir)
			
 
				+    output_path.mkdir(parents=True, exist_ok=True)
			
 
				+    
			
 
				+    # Find all images (*.jpg and *.png)
			
 
				+    image_files = list(image_path.glob("*.jpg")) + list(image_path.glob("*.png"))
			
 
				+    image_files.sort()
			
 
				+    
			
 
				+    print(f"\n🔍 Found {len(image_files)} image files")
			
 
				+    print(f"📂 Image directory: {image_dir}")
			
 
				+    print(f"📂 JSON directory: {json_dir}")
			
 
				+    print(f"📂 Output directory: {output_dir}\n")
			
 
				+    
			
 
				+    results = []
			
 
				+    success_count = 0
			
 
				+    failed_count = 0
			
 
				+    
			
 
				+    for idx, image_file in enumerate(image_files, 1):
			
 
				+        print(f"\n{'='*60}")
			
 
				+        print(f"[{idx}/{len(image_files)}] Processing: {image_file.name}")
			
 
				+        print(f"{'='*60}")
			
 
				+        
			
 
				+        # Find the matching JSON file
			
 
				+        json_file = json_path / f"{image_file.stem}.json"
			
 
				+        
			
 
				+        if not json_file.exists():
			
 
				+            print(f"⚠️  OCR result not found: {json_file.name}")
			
 
				+            results.append({
			
 
				+                'source': str(image_file),
			
 
				+                'status': 'skipped',
			
 
				+                'reason': 'no_json'
			
 
				+            })
			
 
				+            # Skipped files are tallied separately in the summary, not as failures
			
 
				+            continue
			
 
				+        
			
 
				+        if apply_template_to_single_file(
			
 
				+            applier, image_file, json_file, output_path, 
			
 
				+            line_width, line_color
			
 
				+        ):
			
 
				+            results.append({
			
 
				+                'source': str(image_file),
			
 
				+                'json': str(json_file),
			
 
				+                'status': 'success'
			
 
				+            })
			
 
				+            success_count += 1
			
 
				+        else:
			
 
				+            results.append({
			
 
				+                'source': str(image_file),
			
 
				+                'json': str(json_file),
			
 
				+                'status': 'error'
			
 
				+            })
			
 
				+            failed_count += 1
			
 
				+        
			
 
				+        print()
			
 
				+    
			
 
				+    # Save the batch results
			
 
				+    result_file = output_path / "batch_results.json"
			
 
				+    with open(result_file, 'w', encoding='utf-8') as f:
			
 
				+        json.dump(results, f, indent=2, ensure_ascii=False)
			
 
				+    
			
 
				+    # Summary statistics
			
 
				+    skipped_count = sum(1 for r in results if r['status'] == 'skipped')
			
 
				+    
			
 
				+    print(f"\n{'='*60}")
			
 
				+    print("🎉 Batch processing complete!")
			
 
				+    print(f"{'='*60}")
			
 
				+    print(f"✅ Succeeded: {success_count}")
			
 
				+    print(f"❌ Failed: {failed_count}")
			
 
				+    print(f"⚠️  Skipped: {skipped_count}")
			
 
				+    print(f"📊 Total: {len(results)}")
			
 
				+    print(f"📄 Results saved to: {result_file}")
			
 
				+
			
 
				+
			
 
				+def main():
			
 
				+    """Main entry point."""
			
 
				+    parser = argparse.ArgumentParser(
			
 
				+        description='Apply a table template to other pages',
			
 
				+        formatter_class=argparse.RawDescriptionHelpFormatter,
			
 
				+        epilog="""
			
 
				+Example usage:
			
 
				+
			
 
				+  1. Batch-process a whole directory:
			
 
				+     python table_template_applier.py \\
			
 
				+         --template output/康强_北京农村商业银行_page_001_structure.json \\
			
 
				+         --image-dir /path/to/images \\
			
 
				+         --json-dir /path/to/jsons \\
			
 
				+         --output-dir /path/to/output
			
 
				+
			
 
				+  2. Process a single file:
			
 
				+     python table_template_applier.py \\
			
 
				+         --template output/康强_北京农村商业银行_page_001_structure.json \\
			
 
				+         --image-file /path/to/page_002.png \\
			
 
				+         --json-file /path/to/page_002.json \\
			
 
				+         --output-dir /path/to/output
			
 
				+
			
 
				+Outputs:
			
 
				+  - {name}_with_lines.png: image with the table lines drawn
			
 
				+  - {name}_structure.json: table structure config
			
 
				+  - batch_results.json: batch processing results
			
 
				+        """
			
 
				+    )
			
 
				+    
			
 
				+    # Template argument
			
 
				+    parser.add_argument(
			
 
				+        '-t', '--template',
			
 
				+        type=str,
			
 
				+        required=True,
			
 
				+        help='Path to the template config file (the manually annotated structure of the first page)'
			
 
				+    )
			
 
				+    
			
 
				+    # File argument group
			
 
				+    file_group = parser.add_argument_group('File arguments (single-file mode)')
			
 
				+    file_group.add_argument(
			
 
				+        '--image-file',
			
 
				+        type=str,
			
 
				+        help='Path to the image file'
			
 
				+    )
			
 
				+    file_group.add_argument(
			
 
				+        '--json-file',
			
 
				+        type=str,
			
 
				+        help='Path to the OCR JSON file'
			
 
				+    )
			
 
				+    
			
 
				+    # Directory argument group
			
 
				+    dir_group = parser.add_argument_group('Directory arguments (batch mode)')
			
 
				+    dir_group.add_argument(
			
 
				+        '--image-dir',
			
 
				+        type=str,
			
 
				+        help='Image directory'
			
 
				+    )
			
 
				+    dir_group.add_argument(
			
 
				+        '--json-dir',
			
 
				+        type=str,
			
 
				+        help='OCR JSON directory'
			
 
				+    )
			
 
				+    
			
 
				+    # Output argument group
			
 
				+    output_group = parser.add_argument_group('Output arguments')
			
 
				+    output_group.add_argument(
			
 
				+        '-o', '--output-dir',
			
 
				+        type=str,
			
 
				+        required=True,
			
 
				+        help='Output directory (required)'
			
 
				+    )
			
 
				+    
			
 
				+    # Drawing argument group
			
 
				+    draw_group = parser.add_argument_group('Drawing arguments')
			
 
				+    draw_group.add_argument(
			
 
				+        '-w', '--width',
			
 
				+        type=int,
			
 
				+        default=2,
			
 
				+        help='Line width (default: 2)'
			
 
				+    )
			
 
				+    draw_group.add_argument(
			
 
				+        '-c', '--color',
			
 
				+        default='black',
			
 
				+        choices=['black', 'blue', 'red'],
			
 
				+        help='Line color (default: black)'
			
 
				+    )
			
 
				+    
			
 
				+    args = parser.parse_args()
			
 
				+    
			
 
				+    # Map color names to RGB tuples
			
 
				+    color_map = {
			
 
				+        'black': (0, 0, 0),
			
 
				+        'blue': (0, 0, 255),
			
 
				+        'red': (255, 0, 0)
			
 
				+    }
			
 
				+    line_color = color_map[args.color]
			
 
				+    
			
 
				+    # Validate the template file
			
 
				+    template_path = Path(args.template)
			
 
				+    if not template_path.exists():
			
 
				+        print(f"❌ Error: template file does not exist: {template_path}")
			
 
				+        return 1  # non-zero exit status, since the result is passed to sys.exit()
			
 
				+    
			
 
				+    output_path = Path(args.output_dir)
			
 
				+    output_path.mkdir(parents=True, exist_ok=True)
			
 
				+    
			
 
				+    # Choose the mode
			
 
				+    if args.image_file and args.json_file:
			
 
				+        # Single-file mode
			
 
				+        image_file = Path(args.image_file)
			
 
				+        json_file = Path(args.json_file)
			
 
				+        
			
 
				+        if not image_file.exists():
			
 
				+            print(f"❌ Error: image file does not exist: {image_file}")
			
 
				+            return 1
			
 
				+        
			
 
				+        if not json_file.exists():
			
 
				+            print(f"❌ Error: JSON file does not exist: {json_file}")
			
 
				+            return 1
			
 
				+        
			
 
				+        print("\n🔧 Single-file mode")
			
 
				+        print(f"📄 Template: {template_path.name}")
			
 
				+        print(f"📄 Image: {image_file.name}")
			
 
				+        print(f"📄 JSON: {json_file.name}")
			
 
				+        print(f"📂 Output: {output_path}\n")
			
 
				+        
			
 
				+        applier = TableTemplateApplier(str(template_path))
			
 
				+        
			
 
				+        success = apply_template_to_single_file(
			
 
				+            applier, image_file, json_file, output_path,
			
 
				+            args.width, line_color
			
 
				+        )
			
 
				+        
			
 
				+        if success:
			
 
				+            print("\n✅ Processing complete!")
			
 
				+        else:
			
 
				+            print("\n❌ Processing failed!")
			
 
				+            return 1
			
 
				+    
			
 
				+    elif args.image_dir and args.json_dir:
			
 
				+        # Batch mode
			
 
				+        image_dir = Path(args.image_dir)
			
 
				+        json_dir = Path(args.json_dir)
			
 
				+        
			
 
				+        if not image_dir.exists():
			
 
				+            print(f"❌ Error: image directory does not exist: {image_dir}")
			
 
				+            return 1
			
 
				+        
			
 
				+        if not json_dir.exists():
			
 
				+            print(f"❌ Error: JSON directory does not exist: {json_dir}")
			
 
				+            return 1
			
 
				+        
			
 
				+        print("\n🔧 Batch mode")
			
 
				+        print(f"📄 Template: {template_path.name}")
			
 
				+        
			
 
				+        apply_template_batch(
			
 
				+            str(template_path),
			
 
				+            str(image_dir),
			
 
				+            str(json_dir),
			
 
				+            str(output_path),
			
 
				+            args.width,
			
 
				+            line_color
			
 
				+        )
			
 
				+    
			
 
				+    else:
			
 
				+        parser.print_help()
			
 
				+        print("\n❌ Error: specify the arguments for either single-file mode or batch mode")
			
 
				+        print("\nHint:")
			
 
				+        print("  Single-file mode: --image-file + --json-file")
			
 
				+        print("  Batch mode:       --image-dir + --json-dir")
			
 
				+        return 1
			
 
				+
			
 
				+
			
 
				+if __name__ == "__main__":
			
 
				+    print("🚀 Starting the table template batch applier...")
			
 
				+    
			
 
				+    import sys
			
 
				+    
			
 
				+    if len(sys.argv) == 1:
			
 
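				+        # With no command-line arguments, fall back to the default configuration
			
 
				+        print("ℹ️  No command-line arguments given; running with the default configuration...")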
			
 
				+        
			
 
				+        # Default configuration
			
 
				+        default_config = {
			
 
				+            "template": "output/table_structures/康强_北京农村商业银行_page_001_structure.json",
			
 
				+            "image-file": "/Users/zhch158/workspace/data/流水分析/康强_北京农村商业银行/ppstructurev3_client_results/康强_北京农村商业银行/康强_北京农村商业银行_page_002.png",
			
 
				+            "json-file": "/Users/zhch158/workspace/data/流水分析/康强_北京农村商业银行/ppstructurev3_client_results/康强_北京农村商业银行_page_002.json",
			
 
				+            "output-dir": "output/batch_results",
			
 
				+            "width": "2",
			
 
				+            "color": "black"
			
 
				+        }
			
 
				+        
			
 
				+        print("⚙️  Default arguments:")
			
 
				+        for key, value in default_config.items():
			
 
				+            print(f"  --{key}: {value}")
			
 
				+        
			
 
				+        # Build the argument list from the defaults
			
 
				+        sys.argv = [sys.argv[0]]
			
 
				+        for key, value in default_config.items():
			
 
				+            sys.argv.extend([f"--{key}", str(value)])
			
 
				+    
			
 
				+    sys.exit(main())