[![stars](https://img.shields.io/github/stars/opendatalab/MinerU.svg)](https://github.com/opendatalab/MinerU)
[![forks](https://img.shields.io/github/forks/opendatalab/MinerU.svg)](https://github.com/opendatalab/MinerU)
[![open issues](https://img.shields.io/github/issues-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
[![issue resolution](https://img.shields.io/github/issues-closed-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
[![PyPI version](https://img.shields.io/pypi/v/mineru)](https://pypi.org/project/mineru/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mineru)](https://pypi.org/project/mineru/)
[![Downloads](https://static.pepy.tech/badge/mineru)](https://pepy.tech/project/mineru)
[![Downloads](https://static.pepy.tech/badge/mineru/month)](https://pepy.tech/project/mineru)
[![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/MinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.22186)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)

[English](README.md) | [简体中文](README_zh-CN.md)

🚀 Access MinerU now → ✅ Zero-install web version ✅ Full-featured desktop client ✅ Instant API access. Skip deployment headaches and get all product formats in one click. Developers, dive in!

👋 Join us on [Discord](https://discord.gg/Tdedn9GTXq) and [WeChat](https://mineru.net/community-portal/?aliasId=3c430f94)

# Changelog

- 2025/10/24 2.6.2 Released
  - `pipeline` backend optimizations
    - Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may cause a slight decrease in MFR speed and failures in recognizing some long formulas. It is recommended to enable it only when parsing Chinese formulas is needed. To disable this feature, set the environment variable to `0`.
    - `OCR` speed significantly improved by 200%~300%, thanks to the optimization solution provided by [@cjsdurj](https://github.com/cjsdurj)
    - `OCR` models optimized for improved accuracy and coverage of Latin script recognition, and updated Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) language systems to `ppocr-v5` version, with accuracy improved by over 40% compared to previous models
  - `vlm` backend optimizations
    - `table_caption` and `table_footnote` matching logic optimized to improve the accuracy of table caption and footnote matching and reading order rationality in scenarios with multiple consecutive tables on a page
    - Optimized CPU resource usage during high concurrency when using `vllm` backend, reducing server pressure
    - Adapted to `vllm` version 0.11.0
  - General optimizations
    - Cross-page table merging effect optimized, added support for cross-page continuation table merging, improving table merging effectiveness in multi-column merge scenarios
    - Added environment variable configuration option `MINERU_TABLE_MERGE_ENABLE` for table merging feature. Table merging is enabled by default and can be disabled by setting this variable to `0`

- 2025/09/26 2.5.4 Released
  - 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available! We welcome you to read it for a comprehensive overview of its model architecture, training strategy, data engineering and evaluation results.
  - Fixed an issue where some `PDF` files were mistakenly identified as `AI` files, causing parsing failures

- 2025/09/20 2.5.3 Released
  - Dependency version range adjustment to enable Turing and earlier architecture GPUs to use vLLM acceleration for MinerU2.5 model inference.
  - `pipeline` backend compatibility fixes for torch 2.8.0.
  - Reduced default concurrency for vLLM async backend to lower server pressure and avoid connection closure issues caused by high load.
  - More compatibility-related details can be found in the [announcement](https://github.com/opendatalab/MinerU/discussions/3548)

- 2025/09/19 2.5.2 Released

  We are officially releasing MinerU2.5, currently the most powerful multimodal large model for document parsing.
  With only 1.2B parameters, MinerU2.5's accuracy on the OmniDocBench benchmark comprehensively surpasses top-tier multimodal models like Gemini 2.5 Pro, GPT-4o, and Qwen2.5-VL-72B. It also significantly outperforms leading specialized models such as dots.ocr, MonkeyOCR, and PP-StructureV3.
  The model has been released on [HuggingFace](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B) and [ModelScope](https://modelscope.cn/models/opendatalab/MinerU2.5-2509-1.2B) platforms. Welcome to download and use!
  - Core Highlights:
    - SOTA Performance with Extreme Efficiency: As a 1.2B model, it achieves State-of-the-Art (SOTA) results that exceed models in the 10B and 100B+ classes, redefining the performance-per-parameter standard in document AI.
    - Advanced Architecture for Across-the-Board Leadership: By combining a two-stage inference pipeline (decoupling layout analysis from content recognition) with a native high-resolution architecture, it achieves SOTA performance across five key areas: layout analysis, text recognition, formula recognition, table recognition, and reading order.
  - Key Capability Enhancements:
    - Layout Detection: Delivers more complete results by accurately covering non-body content like headers, footers, and page numbers. It also provides more precise element localization and natural format reconstruction for lists and references.
    - Table Parsing: Drastically improves parsing for challenging cases, including rotated tables, borderless/semi-structured tables, and long/complex tables.
    - Formula Recognition: Significantly boosts accuracy for complex, long-form, and hybrid Chinese-English formulas, greatly enhancing the parsing capability for mathematical documents.

  Additionally, with the release of vlm 2.5, we have made some adjustments to the repository:
  - The vlm backend has been upgraded to version 2.5, supporting the MinerU2.5 model and no longer compatible with the MinerU2.0-2505-0.9B model. The last version supporting the 2.0 model is mineru-2.2.2.
  - VLM inference-related code has been moved to [mineru_vl_utils](https://github.com/opendatalab/mineru-vl-utils), reducing coupling with the main mineru repository and facilitating independent iteration in the future.
  - The vlm accelerated inference framework has been switched from `sglang` to `vllm`, achieving full compatibility with the vllm ecosystem, allowing users to use the MinerU2.5 model and accelerated inference on any platform that supports the vllm framework.
  - Due to major upgrades in the vlm model supporting more layout types, we have made some adjustments to the structure of the parsing intermediate file `middle.json` and result file `content_list.json`. Please refer to the [documentation](https://opendatalab.github.io/MinerU/reference/output_files/) for details.

  Other repository optimizations:
  - Removed file extension whitelist validation for input files. When input files are PDF documents or images, there are no longer requirements for file extensions, improving usability.

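
The two environment variables introduced in 2.6.2 can also be set per run from Python instead of being exported in the shell. The following is a minimal sketch, not part of MinerU: the helper name `mineru_env` is hypothetical, and it assumes only what the notes above state, namely that `MINERU_FORMULA_CH_SUPPORT` is enabled with `1` and disabled with `0`, and that `MINERU_TABLE_MERGE_ENABLE=0` turns table merging off while leaving it unset keeps the default.

```python
import os


def mineru_env(formula_ch_support=False, table_merge=True, base_env=None):
    """Return a copy of the environment with the two MinerU toggles applied.

    Hypothetical helper for illustration; only the variable names and the
    0/1 values come from the changelog.
    """
    env = dict(os.environ if base_env is None else base_env)

    # Experimental Chinese formula support: "1" enables, "0" disables.
    env["MINERU_FORMULA_CH_SUPPORT"] = "1" if formula_ch_support else "0"

    # Table merging is on by default; only "0" is documented, so the
    # variable is removed (default behavior) when merging should stay on.
    if table_merge:
        env.pop("MINERU_TABLE_MERGE_ENABLE", None)
    else:
        env["MINERU_TABLE_MERGE_ENABLE"] = "0"
    return env


if __name__ == "__main__":
    demo = mineru_env(formula_ch_support=True, table_merge=False, base_env={})
    for name, value in sorted(demo.items()):
        print(f"{name}={value}")
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the `mineru` command, so the toggles apply to that run only.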
## History Log

- 2025/09/10 2.2.2 Released
- 2025/09/08 2.2.1 Released
- 2025/09/05 2.2.0 Released
- 2025/08/01 2.1.10 Released
- 2025/07/30 2.1.9 Released
- 2025/07/28 2.1.8 Released
- 2025/07/27 2.1.7 Released
- 2025/07/26 2.1.6 Released
- 2025/07/24 2.1.5 Released
- 2025/07/23 2.1.4 Released
- 2025/07/16 2.1.1 Released
- 2025/07/05 2.1.0 Released
- 2025/06/20 2.0.6 Released
- 2025/06/17 2.0.5 Released
- 2025/06/15 2.0.3 Released
- 2025/06/13 2.0.0 Released
- 2025/05/24 1.3.12 Released
- 2025/04/29 1.3.10 Released
- 2025/04/27 1.3.9 Released
- 2025/04/23 1.3.8 Released
- 2025/04/22 1.3.7 Released
- 2025/04/16 1.3.4 Released
- 2025/04/12 1.3.2 Released
- 2025/04/08 1.3.1 Released
- 2025/04/03 1.3.0 Released
- 2025/03/03 1.2.1 Released
- 2025/02/24 1.2.0 Released
  - This version includes several fixes and improvements to enhance parsing efficiency and accuracy.
- 2025/01/22 1.1.0 Released
  - In this version we focused on improving parsing accuracy and efficiency.
- 2025/01/10 1.0.1 Released
  - This is our first official release. It introduces a completely new API interface, enhanced compatibility through extensive refactoring, and a brand new automatic language identification feature.
- 2024/11/22 0.10.0 Released
  - Introduced hybrid OCR text extraction capabilities.
- 2024/11/15 0.9.3 Released
  - Integrated RapidTable for table recognition, improving single-table parsing speed by more than 10 times, with higher accuracy and lower GPU memory usage.
- 2024/11/06 0.9.2 Released
  - Integrated the StructTable-InternVL2-1B model for table recognition functionality.
- 2024/10/31 0.9.0 Released
  - This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability.
- 2024/09/27 0.8.1 Released
  - Fixed some bugs and provided a localized deployment version of the online demo and the front-end interface.
- 2024/09/09 0.8.0 Released
  - Added support for fast deployment with a Dockerfile and launched demos on Huggingface and Modelscope.
- 2024/08/30 0.7.1 Released
  - Added the paddle tablemaster table recognition option.
- 2024/08/09 0.7.0b1 Released
  - Simplified the installation process and added table recognition functionality.
- 2024/08/01 0.6.2b1 Released
  - Resolved dependency conflict issues and improved the installation documentation.
- 2024/07/05 Initial open-source release

# MinerU

## Project Introduction

MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format.
MinerU was born during the pre-training process of [InternLM](https://github.com/InternLM/InternLM). We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models.
Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please submit an [issue](https://github.com/opendatalab/MinerU/issues) and **attach the relevant PDF**.

https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c

## Key Features

- Remove headers, footers, footnotes, page numbers, etc., to ensure semantic coherence.
- Output text in human-readable order, suitable for single-column, multi-column, and complex layouts.
- Preserve the structure of the original document, including headings, paragraphs, lists, etc.
- Extract images, image descriptions, tables, table titles, and footnotes.
- Automatically recognize and convert formulas in the document to LaTeX format.
- Automatically recognize and convert tables in the document to HTML format.
- Automatically detect scanned PDFs and garbled PDFs and enable OCR functionality.
- OCR supports detection and recognition of 84 languages.
- Supports multiple output formats, such as multimodal and NLP Markdown, JSON sorted by reading order, and rich intermediate formats.
- Supports various visualization results, including layout visualization and span visualization, for efficient confirmation of output quality.
- Supports running in a pure CPU environment, and also supports GPU(CUDA)/NPU(CANN)/MPS acceleration
- Compatible with Windows, Linux, and Mac platforms.

# Quick Start

If you encounter any installation issues, please first consult the FAQ.
If the parsing results are not as expected, refer to the Known Issues.

## Online Experience

### Official online web application

The official online version has the same functionality as the client, with a beautiful interface and rich features. Login is required.

- [![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)

### Gradio-based online demo

A WebUI developed based on Gradio, with a simple interface and only core parsing functionality. No login is required.

- [![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
- [![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)

## Local Deployment

> [!WARNING]
> **Pre-installation Notice: Hardware and Software Environment Support**
>
> To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.
>
> By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.
>
> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
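
| Parsing Backend | pipeline | vlm: transformers | vlm: mlx-engine | vlm: vllm-engine / vllm-async-engine | vlm: http-client |
| --- | --- | --- | --- | --- | --- |
| Accuracy<sup>1</sup> | 82+ | 90+ | 90+ | 90+ | 90+ |
| Backend Features | Fast, no hallucinations | Good compatibility, slower | Faster than transformers | Fast, compatible with the vLLM ecosystem | No configuration required, suitable for OpenAI-compatible servers |
| Operating System | Linux<sup>2</sup> / Windows / macOS | Linux<sup>2</sup> / Windows / macOS | macOS<sup>3</sup> | Linux<sup>2</sup> / Windows<sup>4</sup> | Any |
| CPU inference support | | | | | Not required |
| GPU Requirements | Volta or later architectures, 6 GB VRAM or more, or Apple Silicon | Volta or later architectures, 6 GB VRAM or more, or Apple Silicon | Apple Silicon | Volta or later architectures, 8 GB VRAM or more | Not required |
| Memory Requirements | Minimum 16 GB, 32 GB recommended | Minimum 16 GB, 32 GB recommended | Minimum 16 GB, 32 GB recommended | Minimum 16 GB, 32 GB recommended | 8 GB |
| Disk Space Requirements | 20 GB or more, SSD recommended | 20 GB or more, SSD recommended | 20 GB or more, SSD recommended | 20 GB or more, SSD recommended | 2 GB |
| Python Version | 3.10-3.13 | 3.10-3.13 | 3.10-3.13 | 3.10-3.13 | 3.10-3.13 |
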
1. Accuracy metric is the End-to-End Evaluation Overall score of OmniDocBench (v1.5)
2. Linux supports only distributions released in 2019 or later
3. Requires macOS 13.5 or later
4. Windows vLLM support via WSL2

### Install MinerU

#### Install MinerU using pip or uv

```bash
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"
```

#### Install MinerU from source code

```bash
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e .[core]
```

> [!TIP]
> `mineru[core]` includes all core features except `vLLM` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
> If you need to use `vLLM` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/quick_start/extension_modules/).

---

#### Deploy MinerU using Docker

MinerU provides a convenient Docker deployment method, which helps quickly set up the environment and solve some tricky environment compatibility issues.
You can get the [Docker Deployment Instructions](https://opendatalab.github.io/MinerU/quick_start/docker_deployment/) in the documentation.

---

### Using MinerU

The simplest command line invocation is:

```bash
mineru -p <input_path> -o <output_path>
```

You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](https://opendatalab.github.io/MinerU/usage/).

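
For scripted use, the command line call can be wrapped from Python. The sketch below is illustrative and not part of MinerU: the helper names are hypothetical, and it assumes that `-p` takes the input path and `-o` the output directory, as in the invocation above. It also checks the interpreter against the 3.10-3.13 range listed in the requirements table.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_mineru_command(input_path, output_dir):
    """Build the argument list for `mineru -p <input_path> -o <output_path>`."""
    return ["mineru", "-p", str(Path(input_path)), "-o", str(Path(output_dir))]


def python_version_supported(version_info=sys.version_info):
    """Return True when the interpreter falls in the documented 3.10-3.13 range."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 13)


def run_mineru(input_path, output_dir):
    """Run the CLI if it is on PATH; return its exit code, or None if it is missing."""
    if shutil.which("mineru") is None:
        return None
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_mineru_command(input_path, output_dir)).returncode


if __name__ == "__main__":
    if not python_version_supported():
        print("Warning: Python version is outside the documented 3.10-3.13 range")
    # "demo.pdf" and "output" are placeholder paths for this sketch.
    print(" ".join(build_mineru_command("demo.pdf", "output")))
```

Building the argument list separately from running it keeps the command easy to inspect or log before any parsing starts.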
# TODO

- [x] Reading order based on the model
- [x] Recognition of `index` and `list` in the main text
- [x] Table recognition
- [x] Heading Classification
- [x] Handwritten Text Recognition
- [x] Vertical Text Recognition
- [x] Latin Accent Mark Recognition
- [x] Code block recognition in the main text
- [x] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf) (mineru.net)
- [ ] Geometric shape recognition

# Known Issues

- Reading order is determined by the model based on the spatial distribution of readable content, and may be out of order in some areas under extremely complex layouts.
- Limited support for vertical text.
- Tables of contents and lists are recognized through rules, and some uncommon list formats may not be recognized.
- Code blocks are not yet supported in the layout model.
- Comic books, art albums, primary school textbooks, and exercises cannot be parsed well.
- Table recognition may result in row/column recognition errors in complex tables.
- OCR recognition may produce inaccurate characters in PDFs of lesser-known languages (e.g., diacritical marks in Latin script, easily confused characters in Arabic script).
- Some formulas may not render correctly in Markdown.

# FAQ

- If you encounter any issues during usage, you can first check the [FAQ](https://opendatalab.github.io/MinerU/faq/) for solutions.
- If your issue remains unresolved, you may also use [DeepWiki](https://deepwiki.com/opendatalab/MinerU) to interact with an AI assistant, which can address most common problems.
- If you still cannot resolve the issue, you are welcome to join our community via [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](https://mineru.net/community-portal/?aliasId=3c430f94) to discuss with other users and developers.

# All Thanks To Our Contributors

# License Information

[LICENSE.md](LICENSE.md)

Currently, some models in this project are trained based on YOLO. However, since YOLO follows the AGPL license, it may impose restrictions on certain use cases. In future iterations, we plan to explore and replace these with models under more permissive licenses to enhance user-friendliness and flexibility.

# Acknowledgments

- [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit)
- [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- [UniMERNet](https://github.com/opendatalab/UniMERNet)
- [RapidTable](https://github.com/RapidAI/RapidTable)
- [TableStructureRec](https://github.com/RapidAI/TableStructureRec)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
- [layoutreader](https://github.com/ppaanngggg/layoutreader)
- [xy-cut](https://github.com/Sanster/xy-cut)
- [fast-langdetect](https://github.com/LlmKira/fast-langdetect)
- [pypdfium2](https://github.com/pypdfium2-team/pypdfium2)
- [pdftext](https://github.com/datalab-to/pdftext)
- [pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- [pypdf](https://github.com/py-pdf/pypdf)
- [magika](https://github.com/google/magika)

# Citation

```bibtex
@misc{niu2025mineru25decoupledvisionlanguagemodel,
      title={MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing},
      author={Junbo Niu and Zheng Liu and Zhuangcheng Gu and Bin Wang and Linke Ouyang and Zhiyuan Zhao and Tao Chu and Tianyao He and Fan Wu and Qintong Zhang and Zhenjiang Jin and Guang Liang and Rui Zhang and Wenzheng Zhang and Yuan Qu and Zhifei Ren and Yuefeng Sun and Yuanhong Zheng and Dongsheng Ma and Zirui Tang and Boyu Niu and Ziyang Miao and Hejun Dong and Siyi Qian and Junyuan Zhang and Jingzhou Chen and Fangdong Wang and Xiaomeng Zhao and Liqun Wei and Wei Li and Shasha Wang and Ruiliang Xu and Yuanyuan Cao and Lu Chen and Qianqian Wu and Huaiyu Gu and Lindong Lu and Keming Wang and Dechen Lin and Guanlin Shen and Xuanhe Zhou and Linfeng Zhang and Yuhang Zang and Xiaoyi Dong and Jiaqi Wang and Bo Zhang and Lei Bai and Pei Chu and Weijia Li and Jiang Wu and Lijun Wu and Zhenxiang Li and Guangyu Wang and Zhongying Tu and Chao Xu and Kai Chen and Yu Qiao and Bowen Zhou and Dahua Lin and Wentao Zhang and Conghui He},
      year={2025},
      eprint={2509.22186},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.22186},
}

@misc{wang2024mineruopensourcesolutionprecise,
      title={MinerU: An Open-Source Solution for Precise Document Content Extraction},
      author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
      year={2024},
      eprint={2409.18839},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.18839},
}

@article{he2024opendatalab,
  title={Opendatalab: Empowering general artificial intelligence with open datasets},
  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},
  journal={arXiv preprint arXiv:2407.13773},
  year={2024}
}
```

# Star History

Star History Chart

# Links

- [Easy Data Preparation with latest LLMs-based Operators and Pipelines](https://github.com/OpenDCAI/DataFlow)
- [Vis3 (OSS browser based on s3)](https://github.com/opendatalab/Vis3)
- [LabelU (A Lightweight Multi-modal Data Annotation Tool)](https://github.com/opendatalab/labelU)
- [LabelLLM (An Open-source LLM Dialogue Annotation Platform)](https://github.com/opendatalab/LabelLLM)
- [PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)](https://github.com/opendatalab/PDF-Extract-Kit)
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)