myhloli
|
6b296ee2b5
fix(pdf_parse): improve OCR result handling
|
1 year ago |
myhloli
|
5d6cbcb123
refactor(para): improve line stop flag and remove unused debug mode
|
1 year ago |
myhloli
|
ae3b0a1e60
fix(pdf_parse): improve line stop flag detection accuracy
|
1 year ago |
myhloli
|
309be741e8
refactor(txt_parse): improve text extraction accuracy with new algorithm
|
1 year ago |
icecraft
|
b492c19c4c
refactor: move some constants or enums defs to config folder
|
1 year ago |
myhloli
|
08f46125a0
refactor(model): rename and restructure model modules
|
1 year ago |
myhloli
|
5936684fd8
refactor(pdf_parse): adjust line count threshold for layoutreader
|
1 year ago |
myhloli
|
5468e56fba
refactor(pdf_parse): adjust line count limit for layoutreader
|
1 year ago |
myhloli
|
7d5850e3ce
feat(model): add xycut algorithm for block sorting
|
1 year ago |
myhloli
|
149132d608
feat(pdf_parse): improve span filtering and add new block types
|
1 year ago |
myhloli
|
ad0d06b6a0
fix(pdf_parse): improve span removal logic for all content types
|
1 year ago |
myhloli
|
509128d505
fix(pdf_parse): improve span removal logic for all content types
|
1 year ago |
myhloli
|
eeda90af31
fix(pdf_parse): improve span removal logic for all content types
|
1 year ago |
myhloli
|
6b9f816f9e
fix(pdf_parse): optimize span processing by removing outside spans
|
1 year ago |
myhloli
|
4cf7e9a224
refactor(pdf_parse): adjust block splitting logic for wide blocks
|
1 year ago |
myhloli
|
c34c9d21ef
refactor(ocr): improve image and table block handling
|
1 year ago |
icecraft
|
283b597a6e
feat: add [figure | table] match [caption | footnote] match algorithm v2
|
1 year ago |
myhloli
|
7e301b849b
refactor(pdf): adjust span filling threshold in block construction
|
1 year ago |
myhloli
|
6f63e70e94
feat(pdf_parse_union_core_v2): reintegrate para_split_v3 and add page range support
|
1 year ago |
myhloli
|
ded2818ac2
feat(layoutreader): support local model directory and improve model loading
|
1 year ago |
myhloli
|
a71db70314
feat: add arXiv paper link to header and adjust PDF parsing logic - Add arXiv paper link to the header template for easy access to the latest research paper.
|
1 year ago |
myhloli
|
564c4ce1e3
refactor(magic_pdf): improve line sorting and block indexing
|
1 year ago |
myhloli
|
4c9bf8abd5
refactor(memory management): remove unused clean_memory function
|
1 year ago |
myhloli
|
42a7d792c3
refactor(magic_pdf): import model helpers directly for clarity
|
1 year ago |
myhloli
|
5522d0a36c
refactor(pdf_parse_union_core_v2): update import paths to use new package structure
|
1 year ago |
myhloli
|
2145a8b6d2
fix(pdf_parse): handle blocks without lines and enable bf16 on compatible devices
|
1 year ago |
myhloli
|
177ab08e9f
refactor(pdf_parse): remove redundant sorting and optimize block indexing
|
1 year ago |
myhloli
|
b9dfdea3cb
refactor(pdf_parse_union_core_v2): implement model initialization within class - Refactored model initialization to be handled by a singleton class to ensure that model
|
1 year ago |
myhloli
|
b2790f6f45
refactor(drawing): simplify draw bbox functions and adjust debug config
|
1 year ago |
myhloli
|
34f8965007
refactor(draw_bbox): add line sorting visualization
|
1 year ago |