[![stars](https://img.shields.io/github/stars/opendatalab/MinerU.svg)](https://github.com/opendatalab/MinerU)
[![forks](https://img.shields.io/github/forks/opendatalab/MinerU.svg)](https://github.com/opendatalab/MinerU)
[![open issues](https://img.shields.io/github/issues-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
[![issue resolution](https://img.shields.io/github/issues-closed-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
[![PyPI version](https://img.shields.io/pypi/v/mineru)](https://pypi.org/project/mineru/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mineru)](https://pypi.org/project/mineru/)
[![Downloads](https://static.pepy.tech/badge/mineru)](https://pepy.tech/project/mineru)
[![Downloads](https://static.pepy.tech/badge/mineru/month)](https://pepy.tech/project/mineru)
[![OpenDataLab](https://img.shields.io/badge/Demo_on_OpenDataLab-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![HuggingFace](https://img.shields.io/badge/VLM_Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/mineru2)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/3b3a00a4a0a61577b6c30f989092d20d/mineru_demo.ipynb)
[![Paper](https://img.shields.io/badge/Paper-arXiv-green)](https://arxiv.org/abs/2409.18839)

[English](README.md) | [简体中文](README_zh-CN.md)

PDF-Extract-Kit: High-Quality PDF Extraction Toolkit🔥🔥🔥

Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple interface and smooth interactions. Enjoy it without any fuss!🚀🚀🚀

👋 join us on Discord and WeChat

# Changelog

- 2025/06/17 2.0.4 released
  - Fixed the issue where models were still required to be downloaded in the `sglang-client` mode
  - Fixed the issue where only the first instance would take effect when attempting to launch multiple `sglang-client` instances via multiple URLs within the same process
- 2025/06/15 2.0.3 released
  - Fixed a configuration file key-value update error that occurred when downloading model type was set to `all`
  - Fixed the issue where the formula and table feature toggle switches were not working in `command line mode`, causing the features to remain enabled.
  - Fixed compatibility issues with sglang version 0.4.7 in the `sglang-engine` mode.
  - Updated Dockerfile and installation documentation for deploying the full version of MinerU in sglang environment
- 2025/06/13 2.0.0 released
  - MinerU 2.0 represents a comprehensive reconstruction and upgrade from architecture to functionality, delivering a more streamlined design, enhanced performance, and more flexible user experience.
  - **New Architecture**: MinerU 2.0 has been deeply restructured in code organization and interaction methods, significantly improving system usability, maintainability, and extensibility.
    - **Removal of Third-party Dependency Limitations**: Completely eliminated the dependency on `pymupdf`, moving the project toward a more open and compliant open-source direction.
    - **Ready-to-use, Easy Configuration**: No need to manually edit JSON configuration files; most parameters can now be set directly via command line or API.
    - **Automatic Model Management**: Added automatic model download and update mechanisms, allowing users to complete model deployment without manual intervention.
    - **Offline Deployment Friendly**: Provides built-in model download commands, supporting deployment requirements in completely offline environments.
    - **Streamlined Code Structure**: Removed thousands of lines of redundant code, simplified class inheritance logic, significantly improving code readability and development efficiency.
    - **Unified Intermediate Format Output**: Adopted standardized `middle_json` format, compatible with most secondary development scenarios based on this format, ensuring seamless ecosystem business migration.
  - **New Model**: MinerU 2.0 integrates our latest small-parameter, high-performance multimodal document parsing model, achieving end-to-end high-speed, high-precision document understanding.
    - **Small Model, Big Capabilities**: With parameters under 1B, yet surpassing traditional 72B-level vision-language models (VLMs) in parsing accuracy.
    - **Multiple Functions in One**: A single model covers multilingual recognition, handwriting recognition, layout analysis, table parsing, formula recognition, reading order sorting, and other core tasks.
    - **Ultimate Inference Speed**: Achieves peak throughput exceeding 10,000 tokens/s through `sglang` acceleration on a single NVIDIA 4090 card, easily handling large-scale document processing requirements.
    - **Online Experience**: You can experience this model online on our Hugging Face demo: [![HuggingFace](https://img.shields.io/badge/VLM_Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/mineru2)
  - **Incompatible Changes Notice**: To improve overall architectural rationality and long-term maintainability, this version contains some incompatible changes:
    - Python package name changed from `magic-pdf` to `mineru`, and the command-line tool changed from `magic-pdf` to `mineru`. Please update your scripts and command calls accordingly.
    - For modular system design and ecosystem consistency considerations, MinerU 2.0 no longer includes the LibreOffice document conversion module. If you need to process Office documents, we recommend converting them to PDF format through an independently deployed LibreOffice service before proceeding with subsequent parsing operations.

History Log
2025/05/24 Release 1.3.12
2025/04/29 Release 1.3.10
2025/04/27 Release 1.3.9
2025/04/23 Release 1.3.8
2025/04/22 Release 1.3.7
2025/04/16 Release 1.3.4
2025/04/12 Release 1.3.2
2025/04/08 Release 1.3.1
2025/04/03 Release 1.3.0
2025/03/03 1.2.1 released
2025/02/24 1.2.0 released

This version includes several fixes and improvements to enhance parsing efficiency and accuracy:

2025/01/22 1.1.0 released

In this version we have focused on improving parsing accuracy and efficiency:

2025/01/10 1.0.1 released

This is our first official release, where we have introduced a completely new API interface and enhanced compatibility through extensive refactoring, as well as a brand new automatic language identification feature:

2024/11/22 0.10.0 released

Introducing hybrid OCR text extraction capabilities:

2024/11/15 0.9.3 released

Integrated RapidTable for table recognition, improving single-table parsing speed by more than 10 times, with higher accuracy and lower GPU memory usage.

2024/11/06 0.9.2 released

Integrated the StructTable-InternVL2-1B model for table recognition functionality.

2024/10/31 0.9.0 released

This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:

2024/09/27 Version 0.8.1 released

Fixed some bugs and provided a localized deployment version of the online demo and the front-end interface.

2024/09/09 Version 0.8.0 released

Added support for fast deployment with a Dockerfile and launched demos on Hugging Face and ModelScope.

2024/08/30 Version 0.7.1 released

Added the Paddle TableMaster table recognition option.

2024/08/09 Version 0.7.0b1 released

Simplified installation process, added table recognition functionality

2024/08/01 Version 0.6.2b1 released

Optimized dependency conflict issues and installation documentation

2024/07/05 Initial open-source release

Table of Contents

  1. MinerU
  2. TODO
  3. Known Issues
  4. FAQ
  5. All Thanks To Our Contributors
  6. License Information
  7. Acknowledgments
  8. Citation
  9. Star History
  10. Magic-doc
  11. Magic-html
  12. Links

# MinerU

## Project Introduction

MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format.
MinerU was born during the pre-training process of [InternLM](https://github.com/InternLM/InternLM). We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models.
Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please submit an [issue](https://github.com/opendatalab/MinerU/issues) and **attach the relevant PDF**.

https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c

## Key Features

- Remove headers, footers, footnotes, page numbers, etc., to ensure semantic coherence.
- Output text in human-readable order, suitable for single-column, multi-column, and complex layouts.
- Preserve the structure of the original document, including headings, paragraphs, lists, etc.
- Extract images, image descriptions, tables, table titles, and footnotes.
- Automatically recognize and convert formulas in the document to LaTeX format.
- Automatically recognize and convert tables in the document to HTML format.
- Automatically detect scanned PDFs and garbled PDFs and enable OCR functionality.
- OCR supports detection and recognition of 84 languages.
- Supports multiple output formats, such as multimodal and NLP Markdown, JSON sorted by reading order, and rich intermediate formats.
- Supports various visualization results, including layout visualization and span visualization, for efficient confirmation of output quality.
- Supports running in a pure CPU environment, and also supports GPU(CUDA)/NPU(CANN)/MPS acceleration.
- Compatible with Windows, Linux, and Mac platforms.

## Quick Start

If you encounter any installation issues, please first consult the FAQ.
If the parsing results are not as expected, refer to the Known Issues.

There are two ways to experience MinerU:

- [Online Demo](#online-demo)
- [Local Deployment](#local-deployment)

> [!WARNING]
> **Pre-installation Notice: Hardware and Software Environment Support**
>
> To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.
>
> By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.
>
> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.

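
| Parsing Backend | pipeline | vlm-transformers | vlm-sglang |
| --- | --- | --- | --- |
| Operating System | windows/linux/mac | windows/linux | windows(wsl2)/linux |
| Memory Requirements | Minimum 16GB+, 32GB+ recommended | Minimum 16GB+, 32GB+ recommended | Minimum 16GB+, 32GB+ recommended |
| Disk Space Requirements | 20GB+, SSD recommended | 20GB+, SSD recommended | 20GB+, SSD recommended |
| Python Version | 3.10-3.13 | 3.10-3.13 | 3.10-3.13 |
| CPU Inference Support | ✅ | ❌ | ❌ |
| GPU Requirements | Turing architecture or later, 6GB+ VRAM or Apple Silicon | Ampere architecture or later, 8GB+ VRAM | Ampere architecture or later, 24GB+ VRAM |
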
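
The support matrix above can be encoded as a small lookup when you need to pick a backend programmatically. The sketch below is illustrative only: `BackendRequirement`, `usable_backends`, and `python_supported` are hypothetical helpers and not part of MinerU. It considers only the VRAM floor, CPU inference support, and Python version from the table, and ignores GPU architecture and operating system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendRequirement:
    name: str            # backend name as written in the table header
    min_vram_gb: int     # VRAM floor from the "GPU Requirements" row
    cpu_inference: bool  # value of the "CPU Inference Support" row


# Values copied from the table; the data structure itself is an assumption.
REQUIREMENTS = (
    BackendRequirement("vlm-sglang", 24, False),
    BackendRequirement("vlm-transformers", 8, False),
    BackendRequirement("pipeline", 6, True),
)


def usable_backends(vram_gb: float) -> list[str]:
    """Return backends that can run with the given VRAM (0 means CPU only)."""
    return [
        req.name
        for req in REQUIREMENTS
        if req.cpu_inference or vram_gb >= req.min_vram_gb
    ]


def python_supported(major: int, minor: int) -> bool:
    """Check the interpreter version against the 3.10-3.13 range in the table."""
    return (3, 10) <= (major, minor) <= (3, 13)
```

For example, `usable_backends(8)` keeps `vlm-transformers` and `pipeline` but drops `vlm-sglang`, because 8 GB is below its 24 GB floor.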

## Online Demo

[![OpenDataLab](https://img.shields.io/badge/Demo_on_OpenDataLab-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)

### 🚀🚀🚀VLM demo

[![HuggingFace](https://img.shields.io/badge/VLM_Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/mineru2)

## Local Deployment

### 1. Install MinerU

#### 1.1 Install via pip or uv

```bash
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"
```

#### 1.2 Install from source

```bash
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e .[core]
```

#### 1.3 Install the Full Version (Supports sglang Acceleration)

If you need to use **sglang to accelerate VLM model inference**, you can choose any of the following methods to install the full version:

- Install using uv or pip:

  ```bash
  uv pip install -U "mineru[all]"
  ```

- Install from source:

  ```bash
  uv pip install -e .[all]
  ```

- Build image using Dockerfile:

  ```bash
  wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
  docker build -t mineru-sglang:latest -f Dockerfile .
  ```

  Start Docker container:

  ```bash
  docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    --ipc=host \
    mineru-sglang:latest \
    mineru-sglang-server --host 0.0.0.0 --port 30000
  ```

  Or start using Docker Compose:

  ```bash
  wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
  docker compose -f compose.yaml up -d
  ```

> [!TIP]
> The Dockerfile uses `lmsysorg/sglang:v0.4.7-cu124` as the default base image. If necessary, you can modify it to another platform version.

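
After starting the container, it can be useful to confirm that the published port (30000 in the commands above) accepts connections before pointing a client at it. The following is a minimal sketch using only the Python standard library. `wait_for_port` is a hypothetical helper, not part of MinerU, and it only checks that a TCP connection can be opened. It does not verify that the model has finished loading or that the server answers requests correctly.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError if the port is closed or unreachable
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    # Host and port assume the `docker run -p 30000:30000` example on this machine.
    ready = wait_for_port("127.0.0.1", 30000, timeout_s=120.0)
    print("server port is open" if ready else "server port did not open in time")
```

Polling with a deadline keeps the check bounded, so a container that fails to start is reported instead of blocking a deployment script indefinitely.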

#### 1.4 Install client (for connecting to sglang-server on edge devices that require only CPU and network connectivity)

```bash
uv pip install -U mineru
mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<host_ip>:<port>
```

---

### 2. Using MinerU

#### 2.1 Command Line Usage

##### Basic Usage

The simplest command line invocation is:

```bash
mineru -p <input_path> -o <output_path>
```

- `<input_path>`: Local PDF file or directory (supports pdf/png/jpg/jpeg)
- `<output_path>`: Output directory

##### View Help Information

Get all available parameter descriptions:

```bash
mineru --help
```

##### Parameter Details

```text
Usage: mineru [OPTIONS]

Options:
  -v, --version                   Show version and exit
  -p, --path PATH                 Input file path or directory (required)
  -o, --output PATH               Output directory (required)
  -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
                                  Parsing backend (default: pipeline)
  -l, --lang [ch|ch_server|... ]  Specify document language (improves OCR accuracy, pipeline backend only)
  -u, --url TEXT                  Service address when using sglang-client
  -s, --start INTEGER             Starting page number (0-based)
  -e, --end INTEGER               Ending page number (0-based)
  -f, --formula BOOLEAN           Enable formula parsing (default: on, pipeline backend only)
  -t, --table BOOLEAN             Enable table parsing (default: on, pipeline backend only)
  -d, --device TEXT               Inference device (e.g., cpu/cuda/cuda:0/npu/mps, pipeline backend only)
  --vram INTEGER                  Maximum GPU VRAM usage per process (pipeline backend only)
  --source [huggingface|modelscope|local]
                                  Model source, default: huggingface
  --help                          Show help information
```

---

#### 2.2 Model Source Configuration

MinerU automatically downloads required models from HuggingFace on first run.
If HuggingFace is inaccessible, you can switch model sources:

##### Switch to ModelScope Source

```bash
mineru -p <input_path> -o <output_path> --source modelscope
```

Or set the environment variable:

```bash
export MINERU_MODEL_SOURCE=modelscope
mineru -p <input_path> -o <output_path>
```

##### Using Local Models

###### 1. Download Models Locally

```bash
mineru-models-download --help
```

Or use the interactive command-line tool to select models:

```bash
mineru-models-download
```

After the download, the model paths are displayed in the current terminal and automatically written to `mineru.json` in the user directory.

###### 2. Parse Using Local Models

```bash
mineru -p <input_path> -o <output_path> --source local
```

Or enable via environment variable:

```bash
export MINERU_MODEL_SOURCE=local
mineru -p <input_path> -o <output_path>
```

---

#### 2.3 Using sglang to Accelerate VLM Model Inference

##### Start sglang-engine Mode

```bash
mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
```

##### Start sglang-server/client Mode

1. Start Server:

   ```bash
   mineru-sglang-server --port 30000
   ```

2. Use Client in another terminal:

   ```bash
   mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
   ```

> [!TIP]
> For more information about output files, please refer to [Output File Documentation](docs/output_file_en_us.md)

---

### 3. API Usage

You can also call MinerU through Python code; see the example code at:

👉 [Python Usage Example](demo/demo.py)

---

### 4. Deploy Derivative Projects

Community developers have created various extensions based on MinerU, including:

- Graphical interface based on Gradio
- Web API based on FastAPI
- Client/server architecture with multi-GPU load balancing
- MCP Server based on the official API

These projects typically offer better user experience and additional features.
For detailed deployment instructions, please refer to:

👉 [Derivative Projects Documentation](projects/README.md)

---

# TODO

- [x] Reading order based on the model
- [x] Recognition of `index` and `list` in the main text
- [x] Table recognition
- [x] Heading Classification
- [ ] Code block recognition in the main text
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- [ ] Geometric shape recognition

# Known Issues

- Reading order is determined by the model based on the spatial distribution of readable content, and may be out of order in some areas under extremely complex layouts.
- Vertical text is not supported.
- Tables of contents and lists are recognized through rules, and some uncommon list formats may not be recognized.
- Code blocks are not yet supported in the layout model.
- Comic books, art albums, primary school textbooks, and exercises cannot be parsed well.
- Table recognition may result in row/column recognition errors in complex tables.
- OCR recognition may produce inaccurate characters in PDFs of lesser-known languages (e.g., diacritical marks in Latin script, easily confused characters in Arabic script).
- Some formulas may not render correctly in Markdown.

# FAQ

[FAQ in Chinese](docs/FAQ_zh_cn.md)

[FAQ in English](docs/FAQ_en_us.md)

# All Thanks To Our Contributors

# License Information

[LICENSE.md](LICENSE.md)

Currently, some models in this project are trained based on YOLO. However, since YOLO follows the AGPL license, it may impose restrictions on certain use cases. In future iterations, we plan to explore and replace these with models under more permissive licenses to enhance user-friendliness and flexibility.

# Acknowledgments

- [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit)
- [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- [UniMERNet](https://github.com/opendatalab/UniMERNet)
- [RapidTable](https://github.com/RapidAI/RapidTable)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
- [layoutreader](https://github.com/ppaanngggg/layoutreader)
- [xy-cut](https://github.com/Sanster/xy-cut)
- [fast-langdetect](https://github.com/LlmKira/fast-langdetect)
- [pypdfium2](https://github.com/pypdfium2-team/pypdfium2)
- [pdftext](https://github.com/datalab-to/pdftext)
- [pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- [pypdf](https://github.com/py-pdf/pypdf)

# Citation

```bibtex
@misc{wang2024mineruopensourcesolutionprecise,
      title={MinerU: An Open-Source Solution for Precise Document Content Extraction},
      author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
      year={2024},
      eprint={2409.18839},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.18839},
}

@article{he2024opendatalab,
  title={Opendatalab: Empowering general artificial intelligence with open datasets},
  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},
  journal={arXiv preprint arXiv:2407.13773},
  year={2024}
}
```

# Star History

# Magic-doc

[Magic-Doc](https://github.com/InternLM/magic-doc) Fast speed ppt/pptx/doc/docx/pdf extraction tool

# Magic-html

[Magic-HTML](https://github.com/opendatalab/magic-html) Mixed web page extraction tool

# Links

- [LabelU (A Lightweight Multi-modal Data Annotation Tool)](https://github.com/opendatalab/labelU)
- [LabelLLM (An Open-source LLM Dialogue Annotation Platform)](https://github.com/opendatalab/LabelLLM)
- [PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)](https://github.com/opendatalab/PDF-Extract-Kit)