zhengchun

zhengchun pushed to main at zhengchun/ocr_verify

  • 02ee248bbd fix: remove redundant default-value settings and refine the row/column editing logic of manual adjustment

12 hours ago

zhengchun pushed to main at zhengchun/ocr_verify

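  • 74c95e92f5 feat: add an image-free mode that analyzes table structure only; refine row/column boundary calculation
  • adb6af311f feat: support hybrid mode, refine template application logic, and strengthen OCR data handling
  • 3716bf591e refactor: remove the unused structure-saving function and streamline structure data handling
  • 3327051a35 fix: improve the directory selector to avoid repeated loading of the data source configuration and reset the selection index
  • 446cf46bcb fix: correct spacing handling when computing horizontal line positions, ensuring a minimum spacing of 2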
  • View comparison for these 15 commits »

12 hours ago

zhengchun pushed to main at zhengchun/ocr_verify

2 days ago

zhengchun pushed to main at zhengchun/ocr_verify

  • 7b6a80f651 feat: Add table line generator and template applier modules - Implemented TableLineGenerator for generating table lines based on OCR bounding boxes. - Added functionality to analyze table structure, cluster rows and columns, and generate images with table lines. - Created TableTemplateApplier to apply predefined table structures to new images. - Included methods for detecting table anchors, calculating row counts, and generating structure configurations. - Enhanced batch processing capabilities for applying templates to multiple images. - Added command-line interface for single and batch processing modes.
  • 3cf0f6e4da feat: add a table_line_generator debug configuration to support the streamlit app
  • View comparison for these 2 commits »

3 days ago

zhengchun pushed to main at zhengchun/ocr_verify

  • b2ff96cc83 fix: fix the chart width setting by using container width in place of the deprecated parameter

3 days ago

zhengchun pushed to main at zhengchun/ocr_verify

  • cc3e15d2d7 feat: update OCR result comparison, adding date/time format detection and parsing logic

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

  • 62190e9d59 feat: add detailed documentation for the OCR result comparison module

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

  • ad61e0ace2 fix: correct the display text of OCR tool statistics
  • 672d58aaf3 feat: improve table header detection, add category-row detection, and refine score calculation
  • 6414c446cf Update file paths for OCR results comparison in compare_ocr_results.py - Changed file1_path to point to the 2023年度报告母公司 paddleocr results. - Changed file2_path to point to the 2023年度报告母公司 mineru results.
  • View comparison for these 3 commits »

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

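  • 038666f9ed feat: streamline PaddleOCR_VL data handling and remove unnecessary format conversion
  • 6e15bf3df4 feat: update the default config file path to point to the new dataset location
  • 7930c6cd71 feat: handle rotation angle and original image size for PaddleOCR_VL data; refine bbox coordinate conversion
  • 2ec53f5194 feat: add rotation angle handling and original image size retrieval, with support for inverse rotation of coordinates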
  • View comparison for these 4 commits »

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

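  • 788e93532b feat: add file path checks to ensure paths are valid when switching data sources
  • d451e66d4c feat: add a DotsOCR and PaddleOCR merge program supporting single-file and batch processing, with output in a unified MinerU format
  • 6e82eedf30 feat: add a DotsOCR and PaddleOCR merge module supporting JSON data merging and Markdown generation
  • 7018b3372e feat: add DotsOCR data processing, supporting conversion to MinerU format with added bbox information
  • 810f8e84a7 feat: add a DotsOCR (with cell bbox) tool configuration, supporting result directory and description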
  • View comparison for these 7 commits »

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

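  • a59da04cec feat: refactor the data source selector, refine document and OCR tool selection logic, and support a three-column layout
  • 9dd6fc4a73 feat: modify the main function to read the validator configuration directly from session state
  • 4a914d9089 feat: modify find_available_ocr_files_multi_source to simplify generation of unique data source identifiers
  • a8b6eabc3a feat: refine data source name generation, using result_dir to improve uniqueness and clarity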
  • View comparison for these 4 commits »

1 week ago

zhengchun pushed to main at zhengchun/ocr_verify

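  • 19be083b28 feat: modify the initializer to accept configuration via a config dict, removing the dependency on load_config
  • 2643734c43 feat: initialize the config manager and improve the displayed information for documents and OCR tools
  • afc9e3d481 feat: add a config manager supporting layered configuration and automatic data source discovery, with Jinja2 template variables integrated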
  • 206d52f443 Implement code changes to enhance functionality and improve performance
  • 776f9654da feat: delete the script that merges MinerU and PaddleOCR results; tidy up code structure
  • View comparison for these 9 commits »

1 week ago

zhengchun pushed to zhch158 at zhengchun/PaddleX

1 week ago

zhengchun pushed to zhch158 at zhengchun/dots.ocr

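  • d186d547fa feat: update the examples in README.md to use the new multithreaded script name
  • c8a368430f feat: delete OmniDocBench_DotsOCR_multthreads.py, removing the multithreaded processing that is no longer used
  • 9690c0822b feat: add batch processing for PDFs and images, producing output files that meet OmniDocBench evaluation requirements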
  • 5453729121 !6 Merge pull request #186 from PKUHPC/master Merge pull request !6 from zhch158/master
  • e56cb76c67 feat: update the examples in README.md, adjusting multithreading parameters and adding new examples

1 week ago

zhengchun created new branch zhch158 at zhengchun/dots.ocr

1 week ago

zhengchun pushed to zhch158 at zhengchun/MinerU

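  • e0f16c0f68 feat: update the default configuration, changing the input file path and output directory to fit new bank statement processing requirements
  • b7ddaa6c92 fix: correct the parameter name and logic of the image rotation method so rotation angles are handled correctly
  • b0dea4a516 feat: add PaddleVL recognizer support and update the retrieval logic to fit different modules
  • a93802d89b feat: refine image processing logic, fix the return value of orientation correction, and update the layout detection model name and initialization
  • 1627cea010 feat: add a PaddleVL recognizer supporting the PaddleOCR-VL-0.9B model and the http-client backend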

1 week ago