Commit History

Author SHA1 Message Date
  myhloli 6b296ee2b5 fix(pdf_parse): improve OCR result handling 1 year ago
  myhloli 5d6cbcb123 refactor(para): improve line stop flag and remove unused debug mode 1 year ago
  myhloli ae3b0a1e60 fix(pdf_parse): improve line stop flag detection accuracy 1 year ago
  myhloli 309be741e8 refactor(txt_parse): improve text extraction accuracy with new algorithm 1 year ago
  icecraft b492c19c4c refactor: move some constants or enums defs to config folder 1 year ago
  myhloli 08f46125a0 refactor(model): rename and restructure model modules 1 year ago
  myhloli 5936684fd8 refactor(pdf_parse): adjust line count threshold for layoutreader 1 year ago
  myhloli 5468e56fba refactor(pdf_parse): adjust line count limit for layoutreader 1 year ago
  myhloli 7d5850e3ce feat(model): add xycut algorithm for block sorting 1 year ago
  myhloli 149132d608 feat(pdf_parse): improve span filtering and add new block types 1 year ago
  myhloli ad0d06b6a0 fix(pdf_parse): improve span removal logic for all content types 1 year ago
  myhloli 509128d505 fix(pdf_parse): improve span removal logic for all content types 1 year ago
  myhloli eeda90af31 fix(pdf_parse): improve span removal logic for all content types 1 year ago
  myhloli 6b9f816f9e fix(pdf_parse): optimize span processing by removing outside spans 1 year ago
  myhloli 4cf7e9a224 refactor(pdf_parse): adjust block splitting logic for wide blocks 1 year ago
  myhloli c34c9d21ef refactor(ocr): improve image and table block handling 1 year ago
  icecraft 283b597a6e feat: add [figure | table] match [caption | footnote] match algorithm v2 1 year ago
  myhloli 7e301b849b refactor(pdf): adjust span filling threshold in block construction 1 year ago
  myhloli 6f63e70e94 feat(pdf_parse_union_core_v2): reintegrate para_split_v3 and add page range support 1 year ago
  myhloli ded2818ac2 feat(layoutreader): support local model directory and improve model loading 1 year ago
  myhloli a71db70314 feat: add arXiv paper link to header and adjust PDF parsing logic - Add arXiv paper link to the header template for easy access to the latest research paper. 1 year ago
  myhloli 564c4ce1e3 refactor(magic_pdf): improve line sorting and block indexing 1 year ago
  myhloli 4c9bf8abd5 refactor(memory management): remove unused clean_memory function 1 year ago
  myhloli 42a7d792c3 refactor(magic_pdf): import model helpers directly for clarity 1 year ago
  myhloli 5522d0a36c refactor(pdf_parse_union_core_v2): update import paths to use new package structure 1 year ago
  myhloli 2145a8b6d2 fix(pdf_parse): handle blocks without lines and enable bf16 on compatible devices 1 year ago
  myhloli 177ab08e9f refactor(pdf_parse): remove redundant sorting and optimize block indexing 1 year ago
  myhloli b9dfdea3cb refactor(pdf_parse_union_core_v2): implement model initialization within class - Refactored model initialization to be handled by a singleton class to ensure that model ... 1 year ago
  myhloli b2790f6f45 refactor(drawing): simplify draw bbox functions and adjust debug config 1 year ago
  myhloli 34f8965007 refactor(draw_bbox): add line sorting visualization 1 year ago