zhengchun/MinerU: https://github.com/opendatalab/MinerU.git @ d0a3058ba8049f7fd8214fc126e4bc0d8b5fa38b

icecraft 440fd0c75b fix: projects	11 months ago
config	87af738ab1 fix: 1. ocr txt mode error 2. lose pdf_parse_type field	1 year ago
data	b04867f90a docs: check links in doc	11 months ago
dict2md	74ee428bbb fix(dict2md): add space for inline equations in CJK contexts	1 year ago
filter	e1be7da644 refactor(magic_pdf): switch to pdfminer for invalid character detection	11 months ago
integrations	b492c19c4c refactor: move some constants or enums defs to config folder	1 year ago
libs	391a99860d Update version.py with new version	11 months ago
model	6a75d7dce5 perf(layout): optimize layout detection for PDF extraction	11 months ago
para	41545a13c6 refactor(para): adjust line height multiplier for block splitting	1 year ago
pipe	440fd0c75b fix: projects	11 months ago
post_proc	6a75d7dce5 perf(layout): optimize layout detection for PDF extraction	11 months ago
pre_proc	7f8dc353b0 fix(pre_proc): prevent errors when imageWriter is None	1 year ago
resources	240fe99e3c feat(table): integrate RapidTable model for table recognition	1 year ago
rw	2db3c26374 refactor(libs): remove unused imports and functions	1 year ago
spark	b492c19c4c refactor: move some constants or enums defs to config folder	1 year ago
tools	712d7d4a8d fix: classif pdf type	11 months ago
utils	f6af67eb11 feat: support convert ppt/pptx/doc/docx	11 months ago
__init__.py	d5dbed7325 Directory restructuring	1 year ago
pdf_parse_by_ocr.py	a3a720ea87 refactor: isolate inference and pipeline	1 year ago
pdf_parse_by_txt.py	a3a720ea87 refactor: isolate inference and pipeline	1 year ago
pdf_parse_union_core_v2.py	9efc35ecaa refactor(magic_pdf): remove unused import in pdf_parse_union_core_v2.py	11 months ago
user_api.py	87af738ab1 fix: 1. ocr txt mode error 2. lose pdf_parse_type field	1 year ago