zhengchun/MinerU: https://github.com/opendatalab/MinerU.git @ 35cb414f1c36b194df97beb521b0b9dbe8f58a8c

myhloli 35cb414f1c feat: integrate LLM optimization for title enhancement in PDF processing		4 months ago
__init__.py	bd9279198c refactor: rename init file and update app.py to enable parsing method	5 months ago
block_pre_proc.py	236a6033f1 refactor: improve block processing logic and enhance span handling	5 months ago
block_sort.py	284cec041a refactor: replace get_file_from_repos with auto_download_and_get_model_root_path in multiple files	5 months ago
boxbase.py	236a6033f1 refactor: improve block processing logic and enhance span handling	5 months ago
config_reader.py	bd5252d946 fix: add conditional import for torch and torch_npu in config_reader.py	5 months ago
cut_image.py	38ace5dc61 refactor: streamline document analysis and enhance image handling in processing pipeline	5 months ago
draw_bbox.py	41ecaedc0c feat: disable logging for invalid overlay PDF generation in draw_bbox.py	5 months ago
enum_class.py	58b8e8a912 fix: add new enum values and improve MIN_BATCH_INFERENCE_SIZE documentation in pipeline_analyze.py	5 months ago
format_utils.py	0031981e60 Fix otsl to html conversion	5 months ago
hash_utils.py	cbba27b4f5 refactor: reorganize project structure and update import paths	5 months ago
language.py	8f1f9abec5 refactor: enhance bounding box utilities and add configuration reader for S3 integration	5 months ago
llm_aided.py	35cb414f1c feat: integrate LLM optimization for title enhancement in PDF processing	4 months ago
model_utils.py	d58b24b5dd fix: add conditional imports for torch and torch_npu in model_utils.py	5 months ago
models_download_utils.py	fa9aaaa7b7 fix: update model path handling in model.py and models_download_utils.py	5 months ago
ocr_utils.py	962a3453ca fix: adjust OCR confidence threshold and refine category assignment logic	4 months ago
pdf_classify.py	84fa04e22d feat: enhance PDF image coverage analysis with improved parsing and coverage calculation	5 months ago
pdf_image_tools.py	38ace5dc61 refactor: streamline document analysis and enhance image handling in processing pipeline	5 months ago
pdf_reader.py	4243b0eaed refactor: increase YOLO layout base batch size and improve progress tracking in predictions	4 months ago
pdf_text_tool.py	1ed61cb5d6 refactor: update OCR span extraction logic and improve PDF page processing	5 months ago
run_async.py	8e55a52693 feat: add mineru-vlm backend.	5 months ago
span_block_fix.py	f211554137 refactor: improve text processing by adding ligature and unicode replacement functions	5 months ago
span_pre_proc.py	a3ae57bf20 refactor: streamline text span extraction and remove unused functions	5 months ago