Commit History

Author SHA1 Message Date
  myhloli 309be741e8 refactor(txt_parse): improve text extraction accuracy with new algorithm 1 year ago
  icecraft 283b597a6e feat: add [figure | table] match [caption | footnote] match algorithm v2 1 year ago
  myhloli 1efebe421c refactor(pdf_parse_union): integrate LayoutLMv3 for block ordering. Replace the heuristic-based block ordering algorithm with LayoutLMv3 model predictions to improve the accuracy of block ordering on PDF pages. Additionally, refactor the span 1 year ago
  赵小蒙 959b8d82d8 renamed pipeline file name 1 year ago
  赵小蒙 c9af3457f5 delete useless files 1 year ago
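  赵小蒙 d438b97a0a refactor image-cropping logic 1 year ago
  赵小蒙 709a65008a adjust intermediate-state dict structure 1 year ago
  赵小蒙 1b9d65b3d3 1. Add a leading underscore to the keys of the Trace class 1 year ago
  赵小蒙 88f5b9325c refactor parse_pdf_by_txt and cut_image to use an abstract class for write operations 1 year ago
  赵小蒙 0e2d0b8b4f refactor parse_pdf_by_ocr and cut_image to use an abstract class for write operations 1 year ago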
  赵小蒙 f65be6e094 pdf_parse_by_model.py ---> pdf_parse_by_txt.py 1 year ago