icecraft
|
a3a720ea87
refactor: isolate inference and pipeline
|
1 year ago |
myhloli
|
309be741e8
refactor(txt_parse): improve text extraction accuracy with new algorithm
|
1 year ago |
icecraft
|
283b597a6e
feat: add [figure | table] match [caption | footnote] match algorithm v2
|
1 year ago |
myhloli
|
1efebe421c
refactor(pdf_parse_union): integrate LayoutLMv3 for block ordering. Replace the heuristic-based block ordering algorithm with LayoutLMv3 model predictions to improve the accuracy of block ordering on PDF pages. Additionally, refactor the span
|
1 year ago |
赵小蒙
|
959b8d82d8
renamed pipeline file name
|
1 year ago |
赵小蒙
|
c9af3457f5
delete useless files
|
1 year ago |
赵小蒙
|
d438b97a0a
refactor image-cropping logic
|
1 year ago |
赵小蒙
|
709a65008a
adjust the intermediate-state dict structure
|
1 year ago |
赵小蒙
|
1b9d65b3d3
1. add a leading underscore to the keys of the Trace class
|
1 year ago |
赵小蒙
|
88f5b9325c
refactor parse_pdf_by_txt and cut_image to use an abstract class for write operations
|
1 year ago |
赵小蒙
|
0e2d0b8b4f
refactor parse_pdf_by_ocr and cut_image to use an abstract class for write operations
|
1 year ago |
赵小蒙
|
f65be6e094
pdf_parse_by_model.py ---> pdf_parse_by_txt.py
|
1 year ago |