myhloli
|
c46d3373de
refactor(ocr_mkcontent): improve title level handling and formatting
|
8 months ago |
myhloli
|
df1b8f598f
refactor(ocr_mkcontent): optimize full-width character handling
|
9 months ago |
myhloli
|
315adbce38
feat(ocr_mkcontent): add full-width to half-width character conversion
|
9 months ago |
myhloli
|
0a468eca6e
feat(llm_aided): add title optimization feature
|
11 months ago |
myhloli
|
c660fdc8f0
feat(llm): add LLM-aided formula and text correction
|
11 months ago |
myhloli
|
c638fc5d1f
fix(pdf): improve ligature handling and text extraction
|
11 months ago |
myhloli
|
74ee428bbb
fix(dict2md): add space for inline equations in CJK contexts
|
11 months ago |
myhloli
|
b80befe9cf
refactor(mkcontent): optimize paragraph text merging and language detection
|
1 year ago |
myhloli
|
c8cabb3cf6
feat(ocr_mkcontent): add language detection for line spacing
|
1 year ago |
myhloli
|
782e6571bc
fix(ocr_mkcontent): handle empty paragraphs on pages
|
1 year ago |
myhloli
|
88c0854a65
refactor(ocr): improve text processing and span handling
|
1 year ago |
Xiaomeng Zhao
|
23c8436ef9
Merge pull request #1047 from myhloli/dev
|
1 year ago |
myhloli
|
a07007e5e1
fix(ocr_mkcontent): improve hyphen handling at line ends
|
1 year ago |
icecraft
|
b492c19c4c
refactor: move some constants or enums defs to config folder
|
1 year ago |
myhloli
|
2de1d0ef05
fix(ocr_mkcontent): improve handling of single-character content
|
1 year ago |
myhloli
|
bd75596219
fix(merge_text): add ligature replacement functionality
|
1 year ago |
myhloli
|
99cf160d1c
fix(dict2md): improve text concatenation logic
|
1 year ago |
myhloli
|
87b9eeee59
fix(ocr): handle inline equations consistently with text content
|
1 year ago |
myhloli
|
7c03014c2a
fix(ocr_mkcontent): improve content handling for different languages and equation types - Adjust content formatting for Chinese, Japanese, Korean, and Western languages
|
1 year ago |
myhloli
|
faf8c286fb
fix(magic_pdf): handle missing image_path in spans
|
1 year ago |
myhloli
|
0e8d5893eb
feat(draw_bbox): update bounding box drawing for tables and images
|
1 year ago |
myhloli
|
c34c9d21ef
refactor(ocr): improve image and table block handling
|
1 year ago |
myhloli
|
644085760b
fix(ocr_mkcontent): expand para_to_standard_format_v2 to handle list and index blocks
|
1 year ago |
myhloli
|
fc49f5c446
refactor(magic_pdf): remove unused parameters and simplify functions
|
1 year ago |
myhloli
|
011a1b973b
refactor(ocr): Increase the dilation factor in OCR to address the issue of word concatenation.
|
1 year ago |
myhloli
|
1f1dd3538d
feat(list&index block): detect and merge list and index blocks
|
1 year ago |
Xiaomeng Zhao
|
98313d4a25
Merge branch 'dev' into content-list-not-drop
|
1 year ago |
myhloli
|
16699a9a70
fix(ocr_mkcontent): streamline drop reason handling
|
1 year ago |
myhloli
|
196de029a3
fix(ocr_mkcontent): correct drop mode handling for pages with drop reasons
|
1 year ago |
myhloli
|
37fbe998ac
feat(ocr_mkcontent): support drop reason in none_with_reason mode - Enable the `NONE_WITH_REASON` drop mode in `para_to_standard_format_v2` by updating the
|
1 year ago |