Browse source

docs: complete the English documentation

Sidney233 4 months ago
Parent
Commit
136d26e513

+ 2 - 2
docs/en/FAQ.md → docs/en/FAQ/index.md

@@ -1,6 +1,6 @@
 # Frequently Asked Questions
 
-### 1. Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2
+## 1. Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2
 
 The `libgl` library is missing in Ubuntu 22.04 on WSL2. You can install the `libgl` library with the following command to resolve the issue:
 
@@ -11,7 +11,7 @@ sudo apt-get install libgl1-mesa-glx
 Reference: https://github.com/opendatalab/MinerU/issues/388
 
 
-### 2. Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`
+## 2. Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`
 
 The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the pre-built package of simsimd for Linux requires a glibc version greater than or equal to 2.28, this causes installation issues on some Linux distributions released before 2019. You can resolve this issue by using the following command:
 ```

File diff suppressed because it is too large
+ 15 - 2
docs/en/index.md


+ 10 - 0
docs/en/known_issues.md

@@ -0,0 +1,10 @@
+# Known Issues
+
+- Reading order is determined by the model based on the spatial distribution of readable content, and may be out of order in some areas under extremely complex layouts.
+- Limited support for vertical text.
+- Tables of contents and lists are recognized through rules, and some uncommon list formats may not be recognized.
+- Code blocks are not yet supported in the layout model.
+- Comic books, art albums, primary school textbooks, and exercises cannot be parsed well.
+- Table recognition may result in row/column recognition errors in complex tables.
+- OCR recognition may produce inaccurate characters in PDFs of lesser-known languages (e.g., diacritical marks in Latin script, easily confused characters in Arabic script).
+- Some formulas may not render correctly in Markdown.

+ 11 - 11
docs/en/output_file.md

@@ -1,21 +1,21 @@
-## Overview
+# Overview
 
 After executing the `mineru` command, in addition to outputting files related to markdown, several other files unrelated to markdown will also be generated. These files will be introduced one by one.
 
-### some_pdf_layout.pdf
+## some_pdf_layout.pdf
 
 Each page's layout consists of one or more bounding boxes. The number in the top-right corner of each box indicates the reading order. Additionally, different content blocks are highlighted with distinct background colors within the layout.pdf.
 ![layout example](../images/layout_example.png)
 
-### some_pdf_spans.pdf(Applicable only to the pipeline backend)
+## some_pdf_spans.pdf (Applicable only to the pipeline backend)
 
 All spans on the page are drawn with different colored line frames according to the span type. This file can be used for quality control, allowing for quick identification of issues such as missing text or unrecognized inline formulas.
 
 ![spans example](../images/spans_example.png)
 
-### some_pdf_model.json(Applicable only to the pipeline backend)
+## some_pdf_model.json (Applicable only to the pipeline backend)
 
-#### Structure Definition
+### Structure Definition
 
 ```python
 from pydantic import BaseModel, Field
@@ -63,7 +63,7 @@ inference_result: list[PageInferenceResults] = []
 The format of the poly coordinates is \[x0, y0, x1, y1, x2, y2, x3, y3\], representing the coordinates of the top-left, top-right, bottom-right, and bottom-left points respectively.
 ![Poly Coordinate Diagram](../images/poly.png)
 
-#### example
+### Example
 
 ```json
 [
@@ -116,7 +116,7 @@ The format of the poly coordinates is \[x0, y0, x1, y1, x2, y2, x3, y3\], repres
 ]
 ```
 
-### some_pdf_model_output.txt (Applicable only to the VLM backend)
+## some_pdf_model_output.txt (Applicable only to the VLM backend)
 
 This file contains the output of the VLM model, with each page's output separated by `----`.  
 Each page's output consists of text blocks starting with `<|box_start|>` and ending with `<|md_end|>`.  
@@ -142,7 +142,7 @@ The meaning of each field is as follows:
   This field contains the Markdown content of the block. If `type` is `text`, the end of the text may contain the `<|txt_contd|>` tag, indicating that this block can be connected with the following `text` block(s).
   If `type` is `table`, the content is in `otsl` format and needs to be converted into HTML for rendering in Markdown.
 
-### some_pdf_middle.json
+## some_pdf_middle.json
 
 | Field Name     | Description                                                                                                    |
 |:---------------| :------------------------------------------------------------------------------------------------------------- |
@@ -251,7 +251,7 @@ The block structure is as follows:
 
 First-level block (if any) -> Second-level block -> Line -> Span
 
-#### example
+### Example
 
 ```json
 {
@@ -355,7 +355,7 @@ First-level block (if any) -> Second-level block -> Line -> Span
 ```
 
 
-### some_pdf_content_list.json
+## some_pdf_content_list.json
 
 This file is a JSON array where each element is a dict storing all readable content blocks in the document in reading order.  
 `content_list` can be viewed as a simplified version of `middle.json`. The content block types are mostly consistent with those in `middle.json`, but layout information is not included.  
@@ -376,7 +376,7 @@ Please note that both `title` and text blocks in `content_list` are uniformly re
 
 Each content contains the `page_idx` field, indicating the page number (starting from 0) where the content block resides.
 
-#### example
+### Example
 
 ```json
 [

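Since `content_list.json` (described in the diff above) is a flat JSON array in which every block carries a 0-based `page_idx`, it can be consumed with a few lines of Python. The sketch below relies only on the documented `page_idx` and `type` fields; the file name is illustrative:

```python
import json
from collections import defaultdict
from pathlib import Path

def group_by_page(content_list):
    """Group content blocks by the 0-based page_idx field, preserving reading order."""
    pages = defaultdict(list)
    for block in content_list:
        pages[block["page_idx"]].append(block)
    return dict(pages)

# Illustrative file name; mineru writes <stem>_content_list.json into the output directory.
path = Path("some_pdf_content_list.json")
if path.exists():
    content_list = json.loads(path.read_text(encoding="utf-8"))
    for page_idx, blocks in sorted(group_by_page(content_list).items()):
        print(page_idx, [b.get("type") for b in blocks])
```

Because blocks are already stored in reading order, grouping by page is enough to reassemble a per-page view without consulting the heavier `middle.json`.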
+ 8 - 4
docs/en/quick_start/quick_start.md → docs/en/quick_start/index.md

@@ -1,11 +1,15 @@
 # Quick Start
 
-If you encounter any installation issues, please first consult the <a href="#faq">FAQ</a>. </br>
-If the parsing results are not as expected, refer to the <a href="#known-issues">Known Issues</a>. </br>
+If you encounter any installation issues, please first consult the [FAQ](../FAQ/index.md).
+
+
+If the parsing results are not as expected, refer to the [Known Issues](../known_issues.md).
+
+
 There are two different ways to experience MinerU:
 
-- [Online Demo](#online-demo)
-- [Local Deployment](#local-deployment)
+- [Online Demo](online_demo.md)
+- [Local Deployment](local_deployment.md)
 
 
 > [!WARNING]

+ 72 - 0
docs/en/quick_start/local_deployment.md

@@ -0,0 +1,72 @@
+# Local Deployment
+
+## Install MinerU
+
+### Install via pip or uv
+
+```bash
+pip install --upgrade pip
+pip install uv
+uv pip install -U "mineru[core]"
+```
+
+### Install from source
+
+```bash
+git clone https://github.com/opendatalab/MinerU.git
+cd MinerU
+uv pip install -e .[core]
+```
+
+> [!NOTE]  
+> Linux and macOS systems automatically support CUDA/MPS acceleration after installation. For Windows users who want to use CUDA acceleration, 
+> please visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to install PyTorch with the appropriate CUDA version.
+
+### Install Full Version (supports sglang acceleration) (requires device with Turing or newer architecture and at least 8GB GPU memory)
+
+If you need to use **sglang to accelerate VLM model inference**, you can choose any of the following methods to install the full version:
+
+- Install using uv or pip:
+  ```bash
+  uv pip install -U "mineru[all]"
+  ```
+- Install from source:
+  ```bash
+  uv pip install -e .[all]
+  ```
+
+> [!TIP]  
+> If any exceptions occur during the installation of `sglang`, please refer to the [official sglang documentation](https://docs.sglang.ai/start/install.html) for troubleshooting and solutions, or directly use Docker-based installation.
+
+- Build image using Dockerfile:
+  ```bash
+  wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
+  docker build -t mineru-sglang:latest -f Dockerfile .
+  ```
+  Start Docker container:
+  ```bash
+  docker run --gpus all \
+    --shm-size 32g \
+    -p 30000:30000 \
+    --ipc=host \
+    mineru-sglang:latest \
+    mineru-sglang-server --host 0.0.0.0 --port 30000
+  ```
+  Or start using Docker Compose:
+  ```bash
+  wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
+  docker compose -f compose.yaml up -d
+  ```
+  
+> [!TIP]
+> The Dockerfile uses `lmsysorg/sglang:v0.4.8.post1-cu126` as the default base image, which supports the Turing/Ampere/Ada Lovelace/Hopper platforms.  
+> If you are using the newer Blackwell platform, please change the base image to `lmsysorg/sglang:v0.4.8.post1-cu128-b200`.
+
+### Install Client (for connecting to sglang-server on edge devices that require only CPU and network connectivity)
+
+```bash
+uv pip install -U mineru
+mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<host_ip>:<port>
+```
+
+---

File diff suppressed because it is too large
+ 2 - 0
docs/en/quick_start/online_demo.md


+ 9 - 0
docs/en/todo.md

@@ -0,0 +1,9 @@
+# TODO
+
+- [x] Reading order based on the model  
+- [x] Recognition of `index` and `list` in the main text  
+- [x] Table recognition
+- [x] Heading classification
+- [ ] Code block recognition in the main text
+- [ ] [Chemical formula recognition](../chemical_knowledge_introduction/introduction.pdf)
+- [ ] Geometric shape recognition

+ 58 - 0
docs/en/usage/api.md

@@ -0,0 +1,58 @@
+# API Calls or WebUI Invocation
+
+1. Directly invoke using Python API: [Python Invocation Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
+2. Invoke using FastAPI:
+   ```bash
+   mineru-api --host 127.0.0.1 --port 8000
+   ```
+   Visit http://127.0.0.1:8000/docs in your browser to view the API documentation.
+
+3. Use Gradio WebUI or Gradio API:
+   ```bash
+   # Using pipeline/vlm-transformers/vlm-sglang-client backend
+   mineru-gradio --server-name 127.0.0.1 --server-port 7860
+   # Or using vlm-sglang-engine/pipeline backend
+   mineru-gradio --server-name 127.0.0.1 --server-port 7860 --enable-sglang-engine true
+   ```
+   Access http://127.0.0.1:7860 in your browser to use the Gradio WebUI, or visit http://127.0.0.1:7860/?view=api to use the Gradio API.
+
+
+> [!TIP]  
+> - Below are some suggestions and notes for using the sglang acceleration mode:  
+> - The sglang acceleration mode currently supports operation on Turing architecture GPUs with a minimum of 8GB VRAM, but you may encounter VRAM shortages on GPUs with less than 24GB VRAM. You can optimize VRAM usage with the following parameters:  
+>   - If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by setting `--mem-fraction-static 0.5`. If VRAM issues persist, try lowering it further to `0.4` or below.  
+>   - If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode: `--tp-size 2`  
+> - If you are already successfully using sglang to accelerate VLM inference but wish to further improve inference speed, consider the following parameters:  
+>   - If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode: `--dp-size 2`  
+>   - You can also enable `torch.compile` to accelerate inference speed by about 15%: `--enable-torch-compile`  
+> - For more information on using sglang parameters, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)  
+> - All sglang-supported parameters can be passed to MinerU via command-line arguments, including those used with the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
+
+> [!TIP]  
+> - In any case, you can specify visible GPU devices at the start of a command line by adding the `CUDA_VISIBLE_DEVICES` environment variable. For example:  
+>   ```bash
+>   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
+>   ```
+> - This method works for all command-line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.  
+> - Below are some common `CUDA_VISIBLE_DEVICES` settings:  
+>   ```bash
>   CUDA_VISIBLE_DEVICES=1       # only device 1 will be seen
>   CUDA_VISIBLE_DEVICES=0,1     # devices 0 and 1 will be visible
>   CUDA_VISIBLE_DEVICES="0,1"   # same as above, quotation marks are optional
>   CUDA_VISIBLE_DEVICES=0,2,3   # devices 0, 2, 3 will be visible; device 1 is masked
>   CUDA_VISIBLE_DEVICES=""      # no GPU will be visible
+>   ```
+> - Below are some possible use cases:  
>   - If you have multiple GPUs and need to specify GPU 0 and GPU 1 to launch `sglang-server` in multi-GPU mode, you can use the following command:  
+>   ```bash
+>   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
+>   ```
+>   - If you have multiple GPUs and need to launch two `fastapi` services on GPU 0 and GPU 1 respectively, listening on different ports, you can use the following commands:  
+>   ```bash
+>   # In terminal 1
+>   CUDA_VISIBLE_DEVICES=0 mineru-api --host 127.0.0.1 --port 8000
+>   # In terminal 2
+>   CUDA_VISIBLE_DEVICES=1 mineru-api --host 127.0.0.1 --port 8001
+>   ```
+
+---
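Because `mineru-api` is a FastAPI application, the exact request schema is best read from the interactive docs page mentioned above; the available routes can also be discovered programmatically via the `/openapi.json` schema that FastAPI serves by default. A minimal sketch (the host and port match the command above; the helper names are our own):

```python
import json
from urllib.request import urlopen

def routes_from_schema(schema: dict) -> list[str]:
    """Extract the sorted route paths from an OpenAPI schema dict."""
    return sorted(schema.get("paths", {}))

def list_routes(base_url: str) -> list[str]:
    """Fetch the schema FastAPI serves at /openapi.json and list its routes."""
    with urlopen(f"{base_url}/openapi.json") as resp:
        return routes_from_schema(json.load(resp))

if __name__ == "__main__":
    try:
        for route in list_routes("http://127.0.0.1:8000"):
            print(route)
    except OSError:
        print("mineru-api is not running on 127.0.0.1:8000")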

+ 10 - 0
docs/en/usage/config.md

@@ -0,0 +1,10 @@
+# Extending MinerU Functionality Through Configuration Files
+
+- MinerU is designed to work out-of-the-box, but also supports extending functionality through configuration files. You can create a `mineru.json` file in your home directory and add custom configurations.
+- The `mineru.json` file will be automatically generated when you use the built-in model download command `mineru-models-download`. Alternatively, you can create it by copying the [configuration template file](../../mineru.template.json) to your home directory and renaming it to `mineru.json`.
+- Below are some available configuration options:
+  - `latex-delimiter-config`: Used to configure LaTeX formula delimiters, defaults to the `$` symbol, and can be modified to other symbols or strings as needed.
+  - `llm-aided-config`: Used to configure related parameters for LLM-assisted heading level detection, compatible with all LLM models supporting the `OpenAI protocol`. It defaults to Alibaba Cloud Qwen's `qwen2.5-32b-instruct` model. You need to configure an API key yourself and set `enable` to `true` to activate this feature.
+  - `models-dir`: Used to specify local model storage directories. Please specify separate model directories for the `pipeline` and `vlm` backends. After specifying these directories, you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.
+
+---
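Putting the options above together, a `mineru.json` might look like the sketch below. The exact nesting of each option is an assumption here; copy `mineru.template.json` rather than typing this by hand, and keep only the keys you actually change:

```json
{
    "latex-delimiter-config": {
        "display": {"left": "$$", "right": "$$"},
        "inline": {"left": "$", "right": "$"}
    },
    "llm-aided-config": {
        "title_aided": {
            "api_key": "your_api_key",
            "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
            "model": "qwen2.5-32b-instruct",
            "enable": true
        }
    },
    "models-dir": {
        "pipeline": "/path/to/pipeline/models",
        "vlm": "/path/to/vlm/models"
    }
}
```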

+ 125 - 0
docs/en/usage/index.md

@@ -0,0 +1,125 @@
+# Using MinerU
+
+## Command Line Usage
+
+### Basic Usage
+
+The simplest command line invocation is:
+
+```bash
+mineru -p <input_path> -o <output_path>
+```
+
+- `<input_path>`: Local PDF/Image file or directory (supports pdf/png/jpg/jpeg/webp/gif)
+- `<output_path>`: Output directory
+
+### View Help Information
+
+Get all available parameter descriptions:
+
+```bash
+mineru --help
+```
+
+### Parameter Details
+
+```text
+Usage: mineru [OPTIONS]
+
+Options:
+  -v, --version                   Show version and exit
+  -p, --path PATH                 Input file path or directory (required)
+  -o, --output PATH               Output directory (required)
+  -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
+  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+                                  Parsing backend (default: pipeline)
+  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|latin|arabic|east_slavic|cyrillic|devanagari]
+                                  Specify document language (improves OCR accuracy, pipeline backend only)
+  -u, --url TEXT                  Service address when using sglang-client
+  -s, --start INTEGER             Starting page number (0-based)
+  -e, --end INTEGER               Ending page number (0-based)
+  -f, --formula BOOLEAN           Enable formula parsing (default: on)
+  -t, --table BOOLEAN             Enable table parsing (default: on)
+  -d, --device TEXT               Inference device (e.g., cpu/cuda/cuda:0/npu/mps, pipeline backend only)
+  --vram INTEGER                  Maximum GPU VRAM usage per process (GB)(pipeline backend only)
+  --source [huggingface|modelscope|local]
+                                  Model source, default: huggingface
+  --help                          Show help information
+```
+
+---
+
+## Model Source Configuration
+
+MinerU automatically downloads required models from HuggingFace on first run. If HuggingFace is inaccessible, you can switch model sources:
+
+### Switch to ModelScope Source
+
+```bash
+mineru -p <input_path> -o <output_path> --source modelscope
+```
+
+Or set environment variable:
+
+```bash
+export MINERU_MODEL_SOURCE=modelscope
+mineru -p <input_path> -o <output_path>
+```
+
+### Using Local Models
+
+#### 1. Download Models Locally
+
+```bash
+mineru-models-download --help
+```
+
+Or use interactive command-line tool to select models:
+
+```bash
+mineru-models-download
+```
+
+After the download completes, the model paths are displayed in the current terminal and automatically written to `mineru.json` in your home directory.
+
+#### 2. Parse Using Local Models
+
+```bash
+mineru -p <input_path> -o <output_path> --source local
+```
+
+Or enable via environment variable:
+
+```bash
+export MINERU_MODEL_SOURCE=local
+mineru -p <input_path> -o <output_path>
+```
+
+---
+
+## Using sglang to Accelerate VLM Model Inference
+
+### Through the sglang-engine Mode
+
+```bash
+mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
+```
+
+### Through the sglang-server/client Mode
+
+1. Start the server:
+
+```bash
+mineru-sglang-server --port 30000
+```
+
+2. Use the client in another terminal:
+
+```bash
+mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+```
+
+> [!TIP]
+> For more information about output files, please refer to [Output File Documentation](../output_file.md)
+
+---

+ 6 - 2
docs/zh/quick_start/index.md

@@ -1,7 +1,11 @@
 # Quick Start
 
-If you encounter any installation issues, please first consult the [FAQ](../FAQ.md) </br>
-If the parsing results are not as expected, refer to [Known Issues](../known_issues.md) </br>
+If you encounter any installation issues, please first consult the [FAQ](../FAQ/index.md) 
+
+
+If the parsing results are not as expected, refer to [Known Issues](../known_issues.md) 
+
+
 There are two different ways to experience MinerU:
 
 - [Online Demo](local_deployment.md)

+ 1 - 1
docs/zh/todo.md

@@ -5,5 +5,5 @@
 - [x] Table recognition
 - [x] Heading classification
 - [ ] Code block recognition in the main text
-- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
+- [ ] [Chemical formula recognition](../chemical_knowledge_introduction/introduction.pdf)
 - [ ] Geometric shape recognition

Some files were not shown because too many files have changed in this diff