myhloli
|
d4345b6e39
refactor(pdf_parse): adjust character-axis alignment algorithm
|
11 months ago |
myhloli
|
949d0867fb
feat(pdf_parse): add line start flag detection and optimize line stop flag logic
|
11 months ago |
myhloli
|
ac88815620
refactor(pdf_check): improve character detection using PyMuPDF
|
11 months ago |
myhloli
|
88c0854a65
refactor(ocr): improve text processing and span handling
|
11 months ago |
myhloli
|
37da8c44c4
feat(pdf_parse): filter out skewed text lines
|
11 months ago |
myhloli
|
08392d63a0
fix(Hybrid OCR):Enable Hybrid OCR for Empty Spans That Contain a Certain Number of Placeholders but No Actual Text
|
11 months ago |
myhloli
|
1d2eb70aa0
refactor(pdf_parse_union_core_v2): optimize page processing time logging
|
11 months ago |
myhloli
|
2db3c26374
refactor(libs): remove unused imports and functions
|
11 months ago |
myhloli
|
21fa78195e
refactor(pre_proc): remove unused functions and simplify code
|
11 months ago |
myhloli
|
ecdaa49aee
refactor(magic_pdf): remove unused functions and simplify code
|
11 months ago |
myhloli
|
8163506295
feat(pdf_parse): improve text extraction for vertical spans
|
11 months ago |
myhloli
|
7d4dfca253
feat(pdf_parse): add OCR score to span data
|
11 months ago |
myhloli
|
14656085f5
refactor(pdf_parse): improve text content extraction from PDF spans
|
11 months ago |
myhloli
|
7964ae45d2
refactor(pdf_parse): improve code readability and maintainability
|
11 months ago |
myhloli
|
97bcc8b23b
refactor(pdf_parse): improve code readability and maintainability
|
11 months ago |
myhloli
|
034c59a887
refactor(txt_spans_extract_v2): optimize span processing and OCR logic
|
11 months ago |
myhloli
|
0d3ef89fb9
fix(pdf_parse): Move the logic for filling text content into spans before the discarded_block recognition to fix the issue of empty text blocks in discarded_block.
|
11 months ago |
myhloli
|
6b296ee2b5
fix(pdf_parse): improve OCR result handling
|
1 year ago |
myhloli
|
5d6cbcb123
refactor(para): improve line stop flag and remove unused debug mode
|
1 year ago |
myhloli
|
ae3b0a1e60
fix(pdf_parse): improve line stop flag detection accuracy
|
1 year ago |
myhloli
|
309be741e8
refactor(txt_parse): improve text extraction accuracy with new algorithm
|
1 year ago |
icecraft
|
b492c19c4c
refactor: move some constants or enums defs to config folder
|
1 year ago |
myhloli
|
08f46125a0
refactor(model): rename and restructure model modules
|
1 year ago |
myhloli
|
5936684fd8
refactor(pdf_parse): adjust line count threshold for layoutreader
|
1 year ago |
myhloli
|
5468e56fba
refactor(pdf_parse): adjust line count limit for layoutreader
|
1 year ago |
myhloli
|
7d5850e3ce
feat(model): add xycut algorithm for block sorting
|
1 year ago |
myhloli
|
149132d608
feat(pdf_parse): improve span filtering and add new block types
|
1 year ago |
myhloli
|
ad0d06b6a0
fix(pdf_parse): improve span removal logic for all content types
|
1 year ago |
myhloli
|
509128d505
fix(pdf_parse): improve span removal logic for all content types
|
1 year ago |
myhloli
|
eeda90af31
fix(pdf_parse): improve span removal logic for all content types
|
1 year ago |