myhloli
|
21fa78195e
refactor(pre_proc): remove unused functions and simplify code
|
11 months ago |
myhloli
|
ecdaa49aee
refactor(magic_pdf): remove unused functions and simplify code
|
11 months ago |
myhloli
|
8163506295
feat(pdf_parse): improve text extraction for vertical spans
|
11 months ago |
myhloli
|
7d4dfca253
feat(pdf_parse): add OCR score to span data
|
11 months ago |
myhloli
|
14656085f5
refactor(pdf_parse): improve text content extraction from PDF spans
|
11 months ago |
myhloli
|
7964ae45d2
refactor(pdf_parse): improve code readability and maintainability
|
11 months ago |
myhloli
|
97bcc8b23b
refactor(pdf_parse): improve code readability and maintainability
|
11 months ago |
myhloli
|
034c59a887
refactor(txt_spans_extract_v2): optimize span processing and OCR logic
|
11 months ago |
myhloli
|
0d3ef89fb9
fix(pdf_parse): Move the logic for filling text content into spans before the discarded_block recognition to fix the issue of empty text blocks in discarded_block.
|
11 months ago |
myhloli
|
6b296ee2b5
fix(pdf_parse): improve OCR result handling
|
1 year ago |
myhloli
|
5d6cbcb123
refactor(para): improve line stop flag and remove unused debug mode
|
1 year ago |
myhloli
|
ae3b0a1e60
fix(pdf_parse): improve line stop flag detection accuracy
|
1 year ago |
myhloli
|
309be741e8
refactor(txt_parse): improve text extraction accuracy with new algorithm
|
1 year ago |
icecraft
|
b492c19c4c
refactor: move some constants or enums defs to config folder
|
1 year ago |
myhloli
|
08f46125a0
refactor(model): rename and restructure model modules
|
1 year ago |
myhloli
|
5936684fd8
refactor(pdf_parse): adjust line count threshold for layoutreader
|
1 year ago |
myhloli
|
5468e56fba
refactor(pdf_parse): adjust line count limit for layoutreader
|
1 year ago |
myhloli
|
7d5850e3ce
feat(model): add xycut algorithm for block sorting
|
1 year ago |
myhloli
|
149132d608
feat(pdf_parse): improve span filtering and add new block types
|
1 year ago |
myhloli
|
ad0d06b6a0
fix(pdf_parse): improve span removal logic for all content types
|
1 year ago |
myhloli
|
509128d505
fix(pdf_parse): improve span removal logic for all content types
|
1 year ago |
myhloli
|
eeda90af31
fix(pdf_parse): improve span removal logic for all content types
|
1 year ago |
myhloli
|
6b9f816f9e
fix(pdf_parse): optimize span processing by removing outside spans
|
1 year ago |
myhloli
|
4cf7e9a224
refactor(pdf_parse): adjust block splitting logic for wide blocks
|
1 year ago |
myhloli
|
c34c9d21ef
refactor(ocr): improve image and table block handling
|
1 year ago |
icecraft
|
283b597a6e
feat: add [figure | table] match [caption | footnote] match algorithm v2
|
1 year ago |
myhloli
|
7e301b849b
refactor(pdf): adjust span filling threshold in block construction
|
1 year ago |
myhloli
|
6f63e70e94
feat(pdf_parse_union_core_v2): reintegrate para_split_v3 and add page range support
|
1 year ago |
myhloli
|
ded2818ac2
feat(layoutreader): support local model directory and improve model loading
|
1 year ago |
myhloli
|
a71db70314
feat: add arXiv paper link to header and adjust PDF parsing logic - Add arXiv paper link to the header template for easy access to the latest research paper.
|
1 year ago |