Commit History

Author SHA1 Message Date
  myhloli a46b12e967 refactor(pre_proc): clean up OCR processing code 1 year ago
  myhloli 21fa78195e refactor(pre_proc): remove unused functions and simplify code 1 year ago
  icecraft b492c19c4c refactor: move some constants or enums defs to config folder 1 year ago
  myhloli c34c9d21ef refactor(ocr): improve image and table block handling 1 year ago
  myhloli 1279f2cd0f feat(model): add support for DocLayout-YOLO model 1 year ago
  myhloli 1f1dd3538d feat(list&index block): detect and merge list and index blocks 1 year ago
  myhloli 34f8965007 refactor(draw_bbox): add line sorting visualization 1 year ago
  myhloli 1efebe421c refactor(pdf_parse_union): integrate LayoutLMv3 for block ordering. Replace the heuristic-based block ordering algorithm with LayoutLMv3 model predictions to improve the accuracy of block ordering on PDF pages. Additionally, refactor the span… 1 year ago
  Xiaomeng Zhao 9067cd31ca fix(detect_all_bboxes): remove small overlapping blocks by merging (#501) 1 year ago
  myhloli e831df807a fix(magic_pdf): use interline_equations instead of interline_equation_blocks 1 year ago
  赵小蒙 e92de75844 add todo about interline_equation 1 year ago
  赵小蒙 2f13b3a87c add new drop scene 1 year ago
  赵小蒙 3ec3a38456 fix: all_bboxes with score 1 year ago
  赵小蒙 deb98fd0b1 fix footnote overlap error 1 year ago
  赵小蒙 eebd976715 remove overlap between with all blocks 1 year ago
  赵小蒙 a817075b3c update discarded block and spans build logic 1 year ago
  赵小蒙 f70289f99e fix remove error 1 year ago
  赵小蒙 1936703b71 fix remove error 1 year ago
  赵小蒙 91ee991150 change some remove logic 1 year ago
  赵小蒙 83641d3d97 when a text box overlaps a title box, trust the text box first 1 year ago
  赵小蒙 55f358d1c5 fix block overlap and nesting issues 1 year ago
  赵小蒙 45ce99bf87 fix block type field name 1 year ago
  赵小蒙 f5341e162f refactor parse_by_ocr_v2.py 1 year ago
  赵小蒙 7e8e9cabee refactor parse_by_ocr_v2 1 year ago