|
config
|
02b7999299
add init to magic_pdf.config
|
1 year ago |
|
data
|
338c681455
feat: add more unittest
|
1 year ago |
|
dict2md
|
2de1d0ef05
fix(ocr_mkcontent): improve handling of single-character content
|
1 year ago |
|
filter
|
df14c61f6f
update: Enhance the capability to detect garbled document issues
|
1 year ago |
|
integrations
|
b72d4ebd94
Feat/support rag (#510)
|
1 year ago |
|
layout
|
03469909bb
Feat/support footnote in figure (#532)
|
1 year ago |
|
libs
|
e78edb193e
refactor(table): update default table model to Rapid Table
|
1 year ago |
|
model
|
fe2c2c0d8e
feat(table): add RapidOCR support for RapidTable model
|
1 year ago |
|
para
|
220a24cd4c
Update para_split_v3.py
|
1 year ago |
|
pipe
|
1279f2cd0f
feat(model): add support for DocLayout-YOLO model
|
1 year ago |
|
post_proc
|
1b9d65b3d3
1. Add a leading underscore to the keys of the Trace class
|
1 year ago |
|
pre_proc
|
1807126e7f
refactor(ocr): adjust OCR processing parameters
|
1 year ago |
|
resources
|
240fe99e3c
feat(table): integrate RapidTable model for table recognition
|
1 year ago |
|
rw
|
40e0827e60
Feat/impl cli (#264)
|
1 year ago |
|
spark
|
c9af3457f5
delete useless files
|
1 year ago |
|
tools
|
918ed65bd5
fix(parse_pipeline): Resolve post-processing exceptions caused by partial PDFs due to file corruption or non-standard format by forcing a re-print.
|
1 year ago |
|
utils
|
9cda7051c6
add init to magic_pdf.utils
|
1 year ago |
|
__init__.py
|
d5dbed7325
Restructure directories
|
1 year ago |
|
pdf_parse_by_ocr.py
|
283b597a6e
feat: add [figure | table] match [caption | footnote] match algorithm v2
|
1 year ago |
|
pdf_parse_by_txt.py
|
283b597a6e
feat: add [figure | table] match [caption | footnote] match algorithm v2
|
1 year ago |
|
pdf_parse_union_core.py
|
068fab7f81
fix(end_page_id):Fix the issue where end_page_id is corrected to len-1 when its input is 0. (#518)
|
1 year ago |
|
pdf_parse_union_core_v2.py
|
5936684fd8
refactor(pdf_parse): adjust line count threshold for layoutreader
|
1 year ago |
|
user_api.py
|
1279f2cd0f
feat(model): add support for DocLayout-YOLO model
|
1 year ago |