.. xtuner documentation master file, created by
sphinx-quickstart on Tue Jan 9 16:33:06 2024.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to the MinerU Documentation
==============================================
.. figure:: ./_static/image/logo.png
:align: center
:alt: mineru
:class: no-scaled-link
.. raw:: html
A one-stop, open-source, high-quality data extraction tool
Star
Watch
Fork
Project Introduction
--------------------
MinerU is a tool that converts PDFs into machine-readable formats (e.g.,
markdown, JSON), allowing for easy extraction into any format. MinerU
was born during the pre-training process of
`InternLM `__. We focus on solving
symbol conversion issues in scientific literature and hope to contribute
to technological development in the era of large models. Compared to
well-known commercial products, MinerU is still young. If you encounter
any issues or if the results are not as expected, please submit an issue
on `issue `__ and **attach
the relevant PDF**.
.. video:: https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
Key Features
------------
- Removes elements such as headers, footers, footnotes, and page
numbers while maintaining semantic continuity
- Outputs text in a human-readable order from multi-column documents
- Retains the original structure of the document, including titles,
paragraphs, and lists
- Extracts images, image captions, tables, and table captions
- Automatically recognizes formulas in the document and converts them
to LaTeX
- Automatically recognizes tables in the document and converts them to
LaTeX
- Automatically detects and enables OCR for corrupted PDFs
- Supports both CPU and GPU environments
- Supports Windows, Linux, and Mac platforms
User Guide
-------------
.. toctree::
:maxdepth: 2
:caption: User Guide
user_guide
API Reference
-------------
If you are looking for information on a specific function, class or
method, this part of the documentation is for you.
.. toctree::
:maxdepth: 2
:caption: API
api
Additional Notes
------------------
.. toctree::
:maxdepth: 1
:caption: Additional Notes
additional_notes/known_issues
additional_notes/faq
additional_notes/changelog
additional_notes/glossary
Projects
---------
.. toctree::
:maxdepth: 1
:caption: Projects
projects