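The batch-processing module applies the table template learned from the first page to multiple files. It is suited to scenarios such as multi-page bank statements.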
| Feature | Old system (batch_processor.py) | New system (batch_service.py) |
|---|---|---|
| Dependency | SmartTableLineGenerator | TableAnalyzer |
| Data structure | TableStructure dataclass | Dict structure |
| Column detection | Standalone ColumnDetector module | Built-in clustering algorithm |
| Row detection | AdaptiveRowSplitter | TableAnalyzer.analyze() |
| Interface | Command-line tool | FastAPI REST API |
Endpoint: POST /api/batch/process
Request body:
{
  "template_structure": {
    "vertical_lines": [100, 200, 300, 400],
    "table_bbox": [50, 100, 800, 2000],
    "total_cols": 5,
    "mode": "cluster"
  },
  "file_pairs": [
    {
      "json_path": "/path/to/page_001.json",
      "image_path": "/path/to/page_001.png"
    },
    {
      "json_path": "/path/to/page_002.json",
      "image_path": "/path/to/page_002.png"
    }
  ],
  "output_dir": "/path/to/output",
  "parallel": true,
  "adjust_rows": true
}
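The request body maps naturally onto a few TypeScript types. The sketch below mirrors the field names from the request shown above and adds a hypothetical `checkRequest` helper for client-side sanity checks. Reading `table_bbox` as `[x1, y1, x2, y2]` and expecting `total_cols` to equal `vertical_lines.length + 1` are assumptions inferred from the sample values, not documented behavior of the service.

```typescript
// Field names follow the sample request body; the types are inferred from it.
interface TemplateStructure {
  vertical_lines: number[];
  table_bbox: [number, number, number, number]; // assumed to be [x1, y1, x2, y2]
  total_cols: number;
  mode: string; // "cluster" in the sample; other values are not documented here
}

interface FilePair {
  json_path: string;
  image_path: string;
}

interface BatchProcessRequest {
  template_structure: TemplateStructure;
  file_pairs: FilePair[];
  output_dir: string;
  parallel: boolean;
  adjust_rows: boolean;
}

// Hypothetical helper: returns a list of problems (empty when none are found).
function checkRequest(req: BatchProcessRequest): string[] {
  const problems: string[] = [];
  const t = req.template_structure;
  const [x1, , x2] = t.table_bbox;

  for (let i = 0; i < t.vertical_lines.length; i++) {
    const x = t.vertical_lines[i];
    if (x < x1 || x > x2) {
      problems.push(`vertical line ${x} lies outside the table bbox`);
    }
    if (i > 0 && x <= t.vertical_lines[i - 1]) {
      problems.push(`vertical line ${x} is not in ascending order`);
    }
  }

  // Assumption: N interior lines split the table into N + 1 columns.
  if (t.total_cols !== t.vertical_lines.length + 1) {
    problems.push("total_cols does not match the number of vertical lines");
  }
  if (req.file_pairs.length === 0) {
    problems.push("file_pairs is empty");
  }
  return problems;
}
```

Running such a check before sending the request can catch an obviously malformed template early; the server remains the authority on what it accepts.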
Response:
{
  "success": true,
  "total": 20,
  "processed": 18,
  "failed": 2,
  "results": [
    {
      "success": true,
      "json_path": "/path/to/page_001.json",
      "image_path": "/path/to/page_001.png",
      "structure_path": "/path/to/output/page_001_structure.json",
      "filename": "page_001.png",
      "rows": 45,
      "cols": 5
    }
  ],
  "message": "批量处理完成: 成功 18/20"
}
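A caller will usually want to know which pages failed and whether the detected column count matches the template. The following is a minimal sketch of a hypothetical `summarizeBatch` helper. It assumes that failed pages also appear in `results` with `success: false` and that `processed` counts successful pages (as the sample message suggests); neither is confirmed by the sample response, which lists only one successful entry.

```typescript
// Types inferred from the sample response; optional fields may be absent on failures.
interface BatchResult {
  success: boolean;
  filename: string;
  image_path: string;
  json_path?: string;
  structure_path?: string;
  rows?: number;
  cols?: number;
}

interface BatchProcessResponse {
  success: boolean;
  total: number;
  processed: number;
  failed: number;
  results: BatchResult[];
  message: string;
}

// Hypothetical helper: condenses a response into the facts a UI is likely to show.
function summarizeBatch(resp: BatchProcessResponse, expectedCols: number) {
  // Assumption: failed pages are reported as entries with success === false.
  const failedFiles = resp.results
    .filter((r) => !r.success)
    .map((r) => r.filename);

  // Pages that were processed but came back with a different column count.
  const colMismatches = resp.results
    .filter((r) => r.success && r.cols !== undefined && r.cols !== expectedCols)
    .map((r) => r.filename);

  return {
    countsAddUp: resp.processed + resp.failed === resp.total,
    failedFiles,
    colMismatches,
  };
}
```

Passing `template_structure.total_cols` as `expectedCols` gives a cheap check that every page kept the template's column layout.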
Endpoint: POST /api/batch/draw
Request body:
{
  "results": [
    {
      "success": true,
      "image_path": "/path/to/page_001.png",
      "structure_path": "/path/to/output/page_001_structure.json",
      "filename": "page_001.png"
    }
  ],
  "line_width": 2,
  "line_color": [0, 0, 0]
}
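The draw request reuses entries from the processing response, so it can be derived mechanically. Below is a sketch of a hypothetical `toDrawRequest` helper that keeps only successful results and copies the four fields shown in the sample. Treating `line_color` as three 0 to 255 integer channels is an assumption based on the `[0, 0, 0]` example; the channel order is not stated in this document.

```typescript
// Shape of one entry in the processing response, reduced to what drawing needs.
interface ProcessedPage {
  success: boolean;
  image_path: string;
  filename: string;
  structure_path?: string;
}

interface DrawRequest {
  results: { success: boolean; image_path: string; structure_path: string; filename: string }[];
  line_width: number;
  line_color: [number, number, number];
}

// Hypothetical helper: builds the body for POST /api/batch/draw.
function toDrawRequest(
  pages: ProcessedPage[],
  lineWidth = 2,
  lineColor: [number, number, number] = [0, 0, 0],
): DrawRequest {
  // Assumption: each channel is an integer in the range 0..255.
  for (const channel of lineColor) {
    if (!Number.isInteger(channel) || channel < 0 || channel > 255) {
      throw new RangeError(`invalid color channel: ${channel}`);
    }
  }

  const results: DrawRequest["results"] = [];
  for (const page of pages) {
    // Skip failed pages and pages without a saved structure file.
    if (page.success && page.structure_path !== undefined) {
      results.push({
        success: true,
        image_path: page.image_path,
        structure_path: page.structure_path,
        filename: page.filename,
      });
    }
  }
  return { results, line_width: lineWidth, line_color: lineColor };
}
```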
import { batchApi } from '@/api'

// editorStore and templateStore are assumed to be in scope
// (application stores defined elsewhere in the frontend).

// Run batch processing, then optionally draw the table lines.
async function processBatch() {
  try {
    const response = await batchApi.batchProcess({
      template_structure: editorStore.structure,
      file_pairs: templateStore.filePairs.map(pair => ({
        json_path: pair.json_path,
        image_path: pair.image_path
      })),
      output_dir: templateStore.scanConfig.outputDir,
      parallel: true,
      adjust_rows: true
    })
    console.log(`Processing finished: ${response.processed}/${response.total}`)
    // Optional: batch drawing, only for pages that were processed successfully
    if (response.success && response.processed > 0) {
      const drawResponse = await batchApi.batchDraw({
        results: response.results.filter(r => r.success),
        line_width: 2,
        line_color: [0, 0, 0]
      })
      console.log(`Drawing finished: ${drawResponse.drawn}/${drawResponse.total}`)
    }
  } catch (error) {
    console.error('Batch processing failed:', error)
  }
}
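The implementation of `batchApi` is not shown here. As a rough sketch, a client exposing `batchProcess` and `batchDraw` could be a thin wrapper over the two POST endpoints. `createBatchApi`, `FetchLike`, and the `baseUrl` parameter are hypothetical names introduced for illustration, the payloads are left loosely typed, and the real client in `batch.ts` may well be built differently (for example on an HTTP library).

```typescript
// Minimal shape of the fetch function this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical factory: the fetch function is injected so the client is easy to test.
function createBatchApi(fetchFn: FetchLike, baseUrl = "") {
  async function post<T>(path: string, payload: unknown): Promise<T> {
    const res = await fetchFn(baseUrl + path, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
    if (!res.ok) {
      throw new Error(`POST ${path} failed with HTTP ${res.status}`);
    }
    // The response is trusted to match T; no runtime validation is done here.
    return (await res.json()) as T;
  }

  return {
    // Response fields are limited to the ones the example above reads.
    batchProcess: (req: unknown) =>
      post<{ success: boolean; processed: number; total: number }>("/api/batch/process", req),
    batchDraw: (req: unknown) =>
      post<{ drawn: number; total: number }>("/api/batch/draw", req),
  };
}
```

In a browser, `createBatchApi(fetch)` should be enough to obtain a client under these assumptions; in tests, a stub function can stand in for `fetch`.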
Manually annotate the first page
Select the data source
Apply the template in batch
Review the results
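The _structure.json files are saved to the output directory.
adjust_rows:
true (recommended): adaptively adjusts the row splits on each page to fit the differing content heights of different pages
false: reuses the template's row structure unchanged; suitable when row heights are exactly the same on every page
parallel:
true (recommended): processes files in parallel, which is faster
false: processes files serially, which is easier to debug
backend/
├── services/
│   └── batch_service.py    # Core batch-processing logic
├── api/
│   └── batch.py            # REST API endpoints
└── main.py                 # Registers the batch router
frontend/
└── src/
    └── api/
        └── batch.ts        # Frontend API client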
The old table_line_generator/batch_processor.py is kept for now as a reference, but the new system is recommended: