Commit History

Author SHA1 Message Date
  icecraft a3a720ea87 refactor: isolate inference and pipeline 1 year ago
  myhloli 309be741e8 refactor(txt_parse): improve text extraction accuracy with new algorithm 1 year ago
  icecraft 283b597a6e feat: add [figure | table] match [caption | footnote] match algorithm v2 1 year ago
  myhloli 1efebe421c refactor(pdf_parse_union): integrate LayoutLMv3 for block ordering. Replace the heuristic-based block ordering algorithm with LayoutLMv3 model predictions to improve the accuracy of block ordering on PDF pages. Additionally, refactor the span 1 year ago
  赵小蒙 959b8d82d8 renamed pipeline file name 1 year ago
  赵小蒙 c9af3457f5 delete useless files 1 year ago
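  赵小蒙 d438b97a0a refactor image-cropping logic 1 year ago
  赵小蒙 709a65008a adjust the intermediate-state dict structure 1 year ago
  赵小蒙 1b9d65b3d3 1. add a leading underscore to the keys of the Trace class 1 year ago
  赵小蒙 88f5b9325c refactor parse_pdf_by_txt and cut_image to use an abstract class for write operations 1 year ago
  赵小蒙 0e2d0b8b4f refactor parse_pdf_by_ocr and cut_image to use an abstract class for write operations 1 year ago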
  赵小蒙 f65be6e094 pdf_parse_by_model.py ---> pdf_parse_by_txt.py 1 year ago