
docs: update documentation for vllm integration and parameter optimization

myhloli 2 months ago
commit e120a90d11

+ 3 - 3
README.md

@@ -583,7 +583,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
        <td>Parsing Backend</td>
        <td>pipeline</td>
        <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
    </tr>
    <tr>
        <td>Operating System</td>
@@ -661,8 +661,8 @@ You can use MinerU for PDF parsing through various methods such as command line,
- [x] Handwritten Text Recognition  
- [x] Vertical Text Recognition  
- [x] Latin Accent Mark Recognition
-- [ ] Code block recognition in the main text
-- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
+- [x] Code block recognition in the main text
+- [x] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf) ([mineru.net](https://mineru.net))
- [ ] Geometric shape recognition

# Known Issues

+ 3 - 3
README_zh-CN.md

@@ -570,7 +570,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
        <td>解析后端</td>
        <td>pipeline</td>
        <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
    </tr>
    <tr>
        <td>操作系统</td>
@@ -648,8 +648,8 @@ mineru -p <input_path> -o <output_path>
- [x] 手写文本识别
- [x] 竖排文本识别
- [x] 拉丁字母重音符号识别
-- [ ] 正文中代码块识别
-- [ ] [化学式识别](docs/chemical_knowledge_introduction/introduction.pdf)
+- [x] 正文中代码块识别
+- [x] [化学式识别](docs/chemical_knowledge_introduction/introduction.pdf)([mineru.net](https://mineru.net))
- [ ] 图表内容识别

# Known Issues

+ 0 - 12
docs/en/faq/index.md

@@ -15,18 +15,6 @@ For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [W
    Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)


-??? question "Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`"
-
-    The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the pre-built package of simsimd for Linux requires a glibc version greater than or equal to 2.28, this causes installation issues on some Linux distributions released before 2019. You can resolve this issue by using the following command:
-    ```
-    conda create -n mineru python=3.11 -y
-    conda activate mineru
-    pip install -U "mineru[pipeline_old_linux]"
-    ```
-    
-    Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)
-
-
??? question "Missing text information in parsing results when installing and using on Linux systems."

    MinerU uses `pypdfium2` instead of `pymupdf` as the PDF page rendering engine in versions >=2.0 to resolve AGPLv3 license issues. On some Linux distributions, due to missing CJK fonts, some text may be lost during the process of rendering PDFs to images.

+ 12 - 15
docs/en/quick_start/docker_deployment.md

@@ -6,25 +6,22 @@ MinerU provides a convenient Docker deployment method, which helps quickly set u

```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
-docker build -t mineru-sglang:latest -f Dockerfile .
+docker build -t mineru-vllm:latest -f Dockerfile .
```

> [!TIP]
-> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.10.post2-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
-> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.10.post2-cu128-b200` before executing the build operation.
+> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms.

## Docker Description

-MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sglang` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `sglang` to accelerate VLM model inference.
+MinerU's Docker uses `vllm/vllm-openai` as the base image, so it includes the `vllm` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `vllm` to accelerate VLM model inference.

> [!NOTE]
-> Requirements for using `sglang` to accelerate VLM model inference:
+> Requirements for using `vllm` to accelerate VLM model inference:
> 
> - Device must have Turing architecture or later graphics cards with 8GB+ available VRAM.
-> - The host machine's graphics driver should support CUDA 12.6 or higher; `Blackwell` platform should support CUDA 12.8 or higher. You can check the driver version using the `nvidia-smi` command.
+> - The host machine's graphics driver should support CUDA 12.8 or higher; you can check the driver version using the `nvidia-smi` command.
> - Docker container must have access to the host machine's graphics devices.
->
-> If your device doesn't meet the above requirements, you can still use other features of MinerU, but cannot use `sglang` to accelerate VLM model inference, meaning you cannot use the `vlm-sglang-engine` backend or start the `vlm-sglang-server` service.

## Start Docker Container

@@ -33,7 +30,7 @@ docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 -p 7860:7860 -p 8000:8000 \
  --ipc=host \
-  -it mineru-sglang:latest \
+  -it mineru-vllm:latest \
  /bin/bash
```

@@ -53,19 +50,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
>
>- The `compose.yaml` file contains configurations for multiple services of MinerU, you can choose to start specific services as needed.
>- Different services might have additional parameter configurations, which you can view and edit in the `compose.yaml` file.
->- Due to the pre-allocation of GPU memory by the `sglang` inference acceleration framework, you may not be able to run multiple `sglang` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-sglang-server` service or using the `vlm-sglang-engine` backend.
+>- Due to the pre-allocation of GPU memory by the `vllm` inference acceleration framework, you may not be able to run multiple `vllm` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-vllm-server` service or using the `vlm-vllm-engine` backend.

---

-### Start sglang-server service
-connect to `sglang-server` via `vlm-sglang-client` backend
+### Start vllm-server service
+connect to `vllm-server` via `vlm-http-client` backend
  ```bash
-  docker compose -f compose.yaml --profile sglang-server up -d
+  docker compose -f compose.yaml --profile vllm-server up -d
  ```
  >[!TIP]
-  >In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
+  >In another terminal, connect to vllm server via http client (only requires CPU and network, no vllm environment needed)
  > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
  > ```

---

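For orientation, a quick sanity check after entering the container — this snippet is illustrative and not part of the commit; it only uses commands the updated docs themselves reference (`nvidia-smi`, `mineru-vllm-server`):

```bash
# Inside the running container: confirm the GPU is visible and the
# driver reports CUDA 12.8+, as the note above requires
nvidia-smi
# Then start the vllm-backed server on the port mapped by `docker run`
mineru-vllm-server --port 30000
```
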
+ 6 - 14
docs/en/quick_start/extension_modules.md

@@ -4,34 +4,26 @@ MinerU supports installing extension modules on demand based on different needs
## Common Scenarios

### Core Functionality Installation
-The `core` module is the core dependency of MinerU, containing all functional modules except `sglang`. Installing this module ensures the basic functionality of MinerU works properly.
+The `core` module is the core dependency of MinerU, containing all functional modules except `vllm`. Installing this module ensures the basic functionality of MinerU works properly.
```bash
uv pip install mineru[core]
```

---

-### Using `sglang` to Accelerate VLM Model Inference
-The `sglang` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
-In the configuration, `all` includes both `core` and `sglang` modules, so `mineru[all]` and `mineru[core,sglang]` are equivalent.
+### Using `vllm` to Accelerate VLM Model Inference
+The `vllm` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
+In the configuration, `all` includes both `core` and `vllm` modules, so `mineru[all]` and `mineru[core,vllm]` are equivalent.
```bash
uv pip install mineru[all]
```
> [!TIP]
-> If exceptions occur during installation of the complete package including sglang, please refer to the [sglang official documentation](https://docs.sglang.ai/start/install.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.
+> If exceptions occur during installation of the complete package including vllm, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.

---

-### Installing Lightweight Client to Connect to sglang-server
+### Installing Lightweight Client to Connect to vllm-server
-If you need to install a lightweight client on edge devices to connect to `sglang-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
+If you need to install a lightweight client on edge devices to connect to `vllm-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
```bash
uv pip install mineru
```
-
----
-
-### Using Pipeline Backend on Outdated Linux Systems
-If your system is too outdated to meet the dependency requirements of `mineru[core]`, this option can minimally meet MinerU's runtime requirements, suitable for old systems that cannot be upgraded and only need to use the pipeline backend.
-```bash
-uv pip install mineru[pipeline_old_linux]
-```

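As a side note on the equivalences documented above, a small illustrative install matrix (no new flags or extras beyond what the section already names):

```bash
# Full install: identical by the doc's own definition of `all`
uv pip install "mineru[all]"
uv pip install "mineru[core,vllm]"
# Lightweight client only (CPU + network), for pointing at a vllm-server
uv pip install mineru
```
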
+ 3 - 3
docs/en/quick_start/index.md

@@ -31,7 +31,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
        <td>Parsing Backend</td>
        <td>pipeline</td>
        <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
    </tr>
    <tr>
        <td>Operating System</td>
@@ -80,8 +80,8 @@ uv pip install -e .[core]
```

> [!TIP]
-> `mineru[core]` includes all core features except `sglang` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
-> If you need to use `sglang` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).
+> `mineru[core]` includes all core features except `vllm` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
+> If you need to use `vllm` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).

---
 

+ 30 - 25
docs/en/reference/output_files.md

@@ -165,49 +165,51 @@ inference_result: list[PageInferenceResults] = []
]
```

-### VLM Output Results (model_output.txt)
+### VLM Output Results (model.json)

> [!NOTE]
> Only applicable to VLM backend

-**File naming format**: `{original_filename}_model_output.txt`
+**File naming format**: `{original_filename}_model.json`

#### File Format Description

-- Uses `----` to separate output results for each page
-- Each page contains multiple text blocks starting with `<|box_start|>` and ending with `<|md_end|>`
-
-#### Field Meanings
+- This file contains the raw output results from the VLM model, with two nested list layers: the outer layer represents pages, and the inner layer represents content blocks for each page
+- Each content block is a dict containing `type`, `bbox`, `angle`, and `content` fields

-| Tag | Format | Description |
-|-----|--------|-------------|
-| Bounding box | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | Quadrilateral coordinates (top-left, bottom-right points), coordinate values after scaling page to 1000×1000 |
-| Type tag | `<\|ref_start\|>type<\|ref_end\|>` | Content block type identifier |
-| Content | `<\|md_start\|>markdown content<\|md_end\|>` | Markdown content of the block |

#### Supported Content Types

```json
-{
-    "text": "Text",
-    "title": "Title", 
-    "image": "Image",
-    "image_caption": "Image caption",
-    "image_footnote": "Image footnote",
-    "table": "Table",
-    "table_caption": "Table caption", 
-    "table_footnote": "Table footnote",
-    "equation": "Interline formula"
-}
+[
+    "text",
+    "title",
+    "equation",
+    "image",
+    "image_caption",
+    "image_footnote",
+    "table",
+    "table_caption",
+    "table_footnote",
+    "phonetic",
+    "code",
+    "code_caption",
+    "ref_text",
+    "algorithm",
+    "list",
+    "header",
+    "footer",
+    "page_number",
+    "aside_text",
+    "page_footnote"
+]
```

-#### Special Tags
-
-- `<|txt_contd|>`: Appears at the end of text, indicating that this text block can be connected with subsequent text blocks
-- Table content uses `otsl` format and needs to be converted to HTML for rendering in Markdown
-
### Intermediate Processing Results (middle.json)

+> [!NOTE]
+> Only applicable to pipeline backend
+
**File naming format**: `{original_filename}_middle.json`

#### Top-level Structure
@@ -390,6 +392,9 @@ Level 1 blocks (table | image)

### Content List (content_list.json)

+> [!NOTE]
+> Only applicable to pipeline backend
+
**File naming format**: `{original_filename}_content_list.json`

#### Functionality

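To make the new `model.json` shape concrete, here is a minimal hand-written sketch; the values are invented for illustration, and only the structure (pages → content blocks → dicts with `type`/`bbox`/`angle`/`content`) comes from the description above:

```json
[
    [
        {"type": "title", "bbox": [104, 68, 591, 95], "angle": 0, "content": "1 Introduction"},
        {"type": "text", "bbox": [104, 112, 591, 208], "angle": 0, "content": "..."}
    ]
]
```
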
+ 8 - 21
docs/en/usage/advanced_cli_parameters.md

@@ -1,25 +1,17 @@
# Advanced Command Line Parameters

-## SGLang Acceleration Parameter Optimization
-
-### Memory Optimization Parameters
-> [!TIP]
-> SGLang acceleration mode currently supports running on Turing architecture graphics cards with a minimum of 8GB VRAM, but graphics cards with <24GB VRAM may encounter insufficient memory issues. You can optimize memory usage with the following parameters:
-> 
-> - If you encounter insufficient VRAM when using a single graphics card, you may need to reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try reducing it further to `0.4` or lower.
-> - If you have two or more graphics cards, you can try using tensor parallelism (TP) mode to simply expand available VRAM: `--tp-size 2`
+## vllm Acceleration Parameter Optimization

### Performance Optimization Parameters
> [!TIP]
-> If you can already use SGLang normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
+> If you can already use vllm normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
> 
-> - If you have multiple graphics cards, you can use SGLang's multi-card parallel mode to increase throughput: `--dp-size 2`
-> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
+> - If you have multiple graphics cards, you can use vllm's multi-card parallel mode to increase throughput: `--data-parallel-size 2`

### Parameter Passing Instructions
> [!TIP]
-> - All officially supported SGLang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
-> - If you want to learn more about `sglang` parameter usage, please refer to the [SGLang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> - All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`
+> - If you want to learn more about `vllm` parameter usage, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/cli/serve.html)

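Since the commit drops the old `--mem-fraction-static` guidance without naming a vllm replacement, one hedged possibility — an assumption on our part, not something this diff documents — is vllm's standard VRAM knob, passed through as the note above allows:

```bash
# Sketch: --gpu-memory-utilization caps vllm's pre-allocated VRAM (default 0.9);
# lowering it may help on smaller cards, much like --mem-fraction-static did.
mineru -p <input_path> -o <output_path> -b vlm-vllm-engine --gpu-memory-utilization 0.5
```
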
## GPU Device Selection and Configuration

@@ -29,7 +21,7 @@
>   ```bash
>   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
>   ```
-> - This specification method is effective for all command line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
+> - This specification method is effective for all command line calls, including `mineru`, `mineru-vllm-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.

### Common Device Configuration Examples
> [!TIP]
@@ -46,14 +38,9 @@
> [!TIP]
> Here are some possible usage scenarios:
> 
-> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `sglang-server`, you can use the following command:
->   ```bash
->   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
->   ```
-> 
-> - If you have multiple GPUs and need to specify GPU 0–3, and start the `sglang-server` using multi-GPU data parallelism and tensor parallelism, you can use the following command:
+> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `vllm-server`, you can use the following command:
>   ```bash
->   CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
+>   CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
>   ```
>       
> - If you have multiple graphics cards and need to start two `fastapi` services on cards 0 and 1, listening on different ports respectively, you can use the following commands:

+ 3 - 3
docs/en/usage/cli_tools.md

@@ -11,11 +11,11 @@ Options:
  -p, --path PATH                 Input file path or directory (required)
  -o, --output PATH               Output directory (required)
  -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
-  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+  -b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
                                  Parsing backend (default: pipeline)
  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
                                  Specify document language (improves OCR accuracy, pipeline backend only)
-  -u, --url TEXT                  Service address when using sglang-client
+  -u, --url TEXT                  Service address when using http-client
  -s, --start INTEGER             Starting page number for parsing (0-based)
  -e, --end INTEGER               Ending page number for parsing (0-based)
  -f, --formula BOOLEAN           Enable formula parsing (default: enabled)
@@ -45,7 +45,7 @@ Options:
                                  files to be input need to be placed in the
                                  `example` folder within the directory where
                                  the command is currently executed.
-  --enable-sglang-engine BOOLEAN  Enable SgLang engine backend for faster
+  --enable-vllm-engine BOOLEAN    Enable vllm engine backend for faster
                                  processing.
  --enable-api BOOLEAN            Enable gradio API for serving the
                                  application.

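A couple of illustrative invocations of the renamed backends (paths are placeholders; the backend and flag names come straight from the help text above):

```bash
# Parse with the in-process vllm engine
mineru -p <input_path> -o <output_path> -b vlm-vllm-engine
# Or point the lightweight client at a running mineru-vllm-server
mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
```
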
+ 11 - 11
docs/en/usage/quick_usage.md

@@ -33,7 +33,7 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers

If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.

-## Advanced Usage via API, WebUI, sglang-client/server
+## Advanced Usage via API, WebUI, http-client/server

- Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- FastAPI calls:
@@ -44,29 +44,29 @@ If you need to adjust parsing options through custom parameters, you can also ch
  >Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start Gradio WebUI visual frontend:
  ```bash
-  # Using pipeline/vlm-transformers/vlm-sglang-client backends
+  # Using pipeline/vlm-transformers/vlm-http-client backends
  mineru-gradio --server-name 0.0.0.0 --server-port 7860
-  # Or using vlm-sglang-engine/pipeline backends (requires sglang environment)
+  # Or using vlm-vllm-engine/pipeline backends (requires vllm environment)
-  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
+  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
  ```
  >[!TIP]
  >
  >- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
  >- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
-- Using `sglang-client/server` method:
+- Using `http-client/server` method:
  ```bash
-  # Start sglang server (requires sglang environment)
-  mineru-sglang-server --port 30000
+  # Start vllm server (requires vllm environment)
+  mineru-vllm-server --port 30000
  ``` 
  >[!TIP]
-  >In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
+  >In another terminal, connect to vllm server via http client (only requires CPU and network, no vllm environment needed)
  > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
  > ```

> [!NOTE]
-> All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
-> We have compiled some commonly used parameters and usage methods for `sglang`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
+> All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`.
+> We have compiled some commonly used parameters and usage methods for `vllm`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).

## Extending MinerU Functionality with Configuration Files


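For completeness, a hedged sketch of the server/client round-trip this page describes; the `mineru-api` host/port flags are assumptions inferred from the tip's `http://127.0.0.1:8000/docs` URL, not shown in this diff:

```bash
# FastAPI service (flag names assumed; port matches the tip above)
mineru-api --host 0.0.0.0 --port 8000
# vllm server plus the lightweight client, exactly as the block above shows
mineru-vllm-server --port 30000
mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
```
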
+ 0 - 12
docs/zh/faq/index.md

@@ -14,18 +14,6 @@
    
    参考:[#388](https://github.com/opendatalab/MinerU/issues/388)

-
-??? question "在 CentOS 7 或 Ubuntu 18 系统安装MinerU时报错`ERROR: Failed building wheel for simsimd`"
-
-    新版本albumentations(1.4.21)引入了依赖simsimd,由于simsimd在linux的预编译包要求glibc的版本大于等于2.28,导致部分2019年之前发布的Linux发行版无法正常安装,可通过如下命令安装:
-    ```
-    conda create -n mineru python=3.11 -y
-    conda activate mineru
-    pip install -U "mineru[pipeline_old_linux]"
-    ```
-    
-    参考:[#1004](https://github.com/opendatalab/MinerU/issues/1004)
-
??? question "在 Linux 系统安装并使用时,解析结果缺失部份文字信息。"

    MinerU在>=2.0的版本中使用`pypdfium2`代替`pymupdf`作为PDF页面的渲染引擎,以解决AGPLv3的许可证问题,在某些Linux发行版,由于缺少CJK字体,可能会在将PDF渲染成图片的过程中丢失部份文字。

+ 13 - 15
docs/zh/quick_start/docker_deployment.md

@@ -6,24 +6,22 @@ MinerU提供了便捷的docker部署方式,这有助于快速搭建环境并

```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/Dockerfile
-docker build -t mineru-sglang:latest -f Dockerfile .
+docker build -t mineru-vllm:latest -f Dockerfile .
```

> [!TIP]
-> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`lmsysorg/sglang:v0.4.10.post2-cu126`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper平台,
-> 如您使用较新的`Blackwell`平台,请将基础镜像修改为`lmsysorg/sglang:v0.4.10.post2-cu128-b200` 再执行build操作。
+> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`vllm/vllm-openai:v0.10.1.1`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper/Blackwell平台。

## Docker说明

-Mineru的docker使用了`lmsysorg/sglang`作为基础镜像,因此在docker中默认集成了`sglang`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`sglang`加速VLM模型推理。
+Mineru的docker使用了`vllm/vllm-openai`作为基础镜像,因此在docker中默认集成了`vllm`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`vllm`加速VLM模型推理。
> [!NOTE]
-> 使用`sglang`加速VLM模型推理需要满足的条件是:
+> 使用`vllm`加速VLM模型推理需要满足的条件是:
> 
> - 设备包含Turing及以后架构的显卡,且可用显存大于等于8G。
-> - 物理机的显卡驱动应支持CUDA 12.6或更高版本,`Blackwell`平台应支持CUDA 12.8及更高版本,可通过`nvidia-smi`命令检查驱动版本。
+> - 物理机的显卡驱动应支持CUDA 12.8或更高版本,可通过`nvidia-smi`命令检查驱动版本。
> - docker中能够访问物理机的显卡设备。
->
-> 如果您的设备不满足上述条件,您仍然可以使用MinerU的其他功能,但无法使用`sglang`加速VLM模型推理,即无法使用`vlm-sglang-engine`后端和启动`vlm-sglang-server`服务。
+

## 启动 Docker 容器

@@ -32,7 +30,7 @@ docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 -p 7860:7860 -p 8000:8000 \
  --ipc=host \
-  -it mineru-sglang:latest \
+  -it mineru-vllm:latest \
  /bin/bash
```

@@ -51,19 +49,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
>  
>- `compose.yaml`文件中包含了MinerU的多个服务配置,您可以根据需要选择启动特定的服务。
>- 不同的服务可能会有额外的参数配置,您可以在`compose.yaml`文件中查看并编辑。
->- 由于`sglang`推理加速框架预分配显存的特性,您可能无法在同一台机器上同时运行多个`sglang`服务,因此请确保在启动`vlm-sglang-server`服务或使用`vlm-sglang-engine`后端时,其他可能使用显存的服务已停止。
+>- 由于`vllm`推理加速框架预分配显存的特性,您可能无法在同一台机器上同时运行多个`vllm`服务,因此请确保在启动`vlm-vllm-server`服务或使用`vlm-vllm-engine`后端时,其他可能使用显存的服务已停止。

---

-### 启动 sglang-server 服务
-并通过`vlm-sglang-client`后端连接`sglang-server`
+### 启动 vllm-server 服务
+并通过`vlm-http-client`后端连接`vllm-server`
  ```bash
-  docker compose -f compose.yaml --profile sglang-server up -d
+  docker compose -f compose.yaml --profile vllm-server up -d
  ```
  >[!TIP]
-  >在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境)
+  >在另一个终端中通过http client连接vllm server(只需cpu与网络,不需要vllm环境)
  > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
  > ```

---

+ 7 - 15
docs/zh/quick_start/extension_modules.md

@@ -4,34 +4,26 @@ MinerU 支持根据不同需求,按需安装扩展模块,以增强功能或
## 常见场景

### 核心功能安装
-`core` 模块是 MinerU 的核心依赖,包含了除`sglang`外的所有功能模块。安装此模块可以确保 MinerU 的基本功能正常运行。
+`core` 模块是 MinerU 的核心依赖,包含了除`vllm`外的所有功能模块。安装此模块可以确保 MinerU 的基本功能正常运行。
```bash
uv pip install mineru[core]
```

---

-### 使用`sglang`加速 VLM 模型推理
-`sglang` 模块提供了对 VLM 模型推理的加速支持,适用于具有 Turing 及以后架构的显卡(8G 显存及以上)。安装此模块可以显著提升模型推理速度。
-在配置中,`all`包含了`core`和`sglang`模块,因此`mineru[all]`和`mineru[core,sglang]`是等价的。
+### 使用`vllm`加速 VLM 模型推理
+`vllm` 模块提供了对 VLM 模型推理的加速支持,适用于具有 Turing 及以后架构的显卡(8G 显存及以上)。安装此模块可以显著提升模型推理速度。
+在配置中,`all`包含了`core`和`vllm`模块,因此`mineru[all]`和`mineru[core,vllm]`是等价的。
```bash
uv pip install mineru[all]
```
> [!TIP]
-> 如在安装包含sglang的完整包过程中发生异常,请参考 [sglang 官方文档](https://docs.sglang.ai/start/install.html) 尝试解决,或直接使用 [Docker](./docker_deployment.md) 方式部署镜像。
+> 如在安装包含vllm的完整包过程中发生异常,请参考 [vllm 官方文档](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) 尝试解决,或直接使用 [Docker](./docker_deployment.md) 方式部署镜像。

---

-### 安装轻量版client连接sglang-server使用
-如果您需要在边缘设备上安装轻量版的 client 端以连接 `sglang-server`,可以安装mineru的基础包,非常轻量,适合在只有cpu和网络连接的设备上使用。
+### 安装轻量版client连接vllm-server使用
+如果您需要在边缘设备上安装轻量版的 client 端以连接 `vllm-server`,可以安装mineru的基础包,非常轻量,适合在只有cpu和网络连接的设备上使用。
```bash
uv pip install mineru
```
-
----
-
-### 在过时的linux系统上使用pipeline后端
-如果您的系统过于陈旧,无法满足`mineru[core]`的依赖要求,该选项可以最低限度的满足 MinerU 的运行需求,适用于老旧系统无法升级且仅需使用 pipeline 后端的场景。
-```bash
-uv pip install mineru[pipeline_old_linux]
-```

+ 3 - 3
docs/zh/quick_start/index.md

@@ -31,7 +31,7 @@
        <td>解析后端</td>
        <td>pipeline</td>
        <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
    </tr>
    <tr>
        <td>操作系统</td>
@@ -80,8 +80,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```

> [!TIP]
-> `mineru[core]`包含除`sglang`加速外的所有核心功能,兼容Windows / Linux / macOS系统,适合绝大多数用户。
-> 如果您有使用`sglang`加速VLM模型推理,或是在边缘设备安装轻量版client端等需求,可以参考文档[扩展模块安装指南](./extension_modules.md)。
+> `mineru[core]`包含除`vllm`加速外的所有核心功能,兼容Windows / Linux / macOS系统,适合绝大多数用户。
+> 如果您有使用`vllm`加速VLM模型推理,或是在边缘设备安装轻量版client端等需求,可以参考文档[扩展模块安装指南](./extension_modules.md)。

---
 

+ 30 - 24
docs/zh/reference/output_files.md

@@ -165,49 +165,52 @@ inference_result: list[PageInferenceResults] = []
]
```

-### VLM 输出结果 (model_output.txt)
+### VLM 输出结果 (model.json)

> [!NOTE]
> 仅适用于 VLM 后端

-**文件命名格式**:`{原文件名}_model_output.txt`
+**文件命名格式**:`{原文件名}_model.json`

#### 文件格式说明

-- 使用 `----` 分割每一页的输出结果
-- 每页包含多个以 `<|box_start|>` 开头、`<|md_end|>` 结尾的文本块
-
-#### 字段含义
+- 该文件为 VLM 模型的原始输出结果,包含两层嵌套list,外层表示页面,内层表示该页的内容块
+- 每个内容块都是一个dict,包含 `type`、`bbox`、`angle`、`content` 字段

-| 标记 | 格式 | 说明 |
-|------|---|------|
-| 边界框 | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | 四边形坐标(左上、右下两点),页面缩放至 1000×1000 后的坐标值 |
-| 类型标记 | `<\|ref_start\|>type<\|ref_end\|>` | 内容块类型标识 |
-| 内容 | `<\|md_start\|>markdown内容<\|md_end\|>` | 该块的 Markdown 内容 |

#### 支持的内容类型

```json
-{
-    "text": "文本",
-    "title": "标题", 
-    "image": "图片",
-    "image_caption": "图片描述",
-    "image_footnote": "图片脚注",
-    "table": "表格",
-    "table_caption": "表格描述", 
-    "table_footnote": "表格脚注",
-    "equation": "行间公式"
-}
+[
+    "text",
+    "title",
+    "equation",
+    "image",
+    "image_caption",
+    "image_footnote",
+    "table",
+    "table_caption",
+    "table_footnote",
+    "phonetic",
+    "code",
+    "code_caption",
+    "ref_text",
+    "algorithm",
+    "list",
+    "header",
+    "footer",
+    "page_number",
+    "aside_text",
+    "page_footnote"
+]
```

-#### 特殊标记
-
-- `<|txt_contd|>`:出现在文本末尾,表示该文本块可与后续文本块连接
-- 表格内容采用 `otsl` 格式,需转换为 HTML 才能在 Markdown 中渲染

### 中间处理结果 (middle.json)

+> [!NOTE]
+> 仅适用于 pipeline 后端
+
**文件命名格式**:`{原文件名}_middle.json`

#### 顶层结构
@@ -390,6 +393,9 @@ inference_result: list[PageInferenceResults] = []

### 内容列表 (content_list.json)

+> [!NOTE]
+> 仅适用于 pipeline 后端
+
**文件命名格式**:`{原文件名}_content_list.json`

#### 功能说明

+ 8 - 21
docs/zh/usage/advanced_cli_parameters.md

@@ -1,25 +1,17 @@
# 命令行参数进阶

-## SGLang 加速参数优化
-
-### 显存优化参数
-> [!TIP]
-> sglang加速模式目前支持在最低8G显存的Turing架构显卡上运行,但在显存<24G的显卡上可能会遇到显存不足的问题, 可以通过使用以下参数来优化显存使用:
-> 
-> - 如果您使用单张显卡遇到显存不足的情况时,可能需要调低KV缓存大小,`--mem-fraction-static 0.5`,如仍出现显存不足问题,可尝试进一步降低到`0.4`或更低
-> - 如您有两张以上显卡,可尝试通过张量并行(TP)模式简单扩充可用显存:`--tp-size 2`
+## vllm 加速参数优化

### 性能优化参数
> [!TIP]
-> 如果您已经可以正常使用sglang对vlm模型进行加速推理,但仍然希望进一步提升推理速度,可以尝试以下参数:
+> 如果您已经可以正常使用vllm对vlm模型进行加速推理,但仍然希望进一步提升推理速度,可以尝试以下参数:
> 
-> - 如果您有超过多张显卡,可以使用sglang的多卡并行模式来增加吞吐量:`--dp-size 2`
-> - 同时您可以启用`torch.compile`来将推理速度加速约15%:`--enable-torch-compile`
+> - 如果您有多张显卡,可以使用vllm的多卡并行模式来增加吞吐量:`--data-parallel-size 2`

### 参数传递说明
> [!TIP]
-> - 所有sglang官方支持的参数都可用通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-sglang-server`、`mineru-gradio`、`mineru-api`
-> - 如果您想了解更多有关`sglang`的参数使用方法,请参考 [sglang官方文档](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> - 所有vllm官方支持的参数都可以通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-vllm-server`、`mineru-gradio`、`mineru-api`
+> - 如果您想了解更多有关`vllm`的参数使用方法,请参考 [vllm官方文档](https://docs.vllm.ai/en/latest/cli/serve.html)

## GPU 设备选择与配置

@@ -29,7 +21,7 @@
>   ```bash
>   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
>   ```
-> - 这种指定方式对所有的命令行调用都有效,包括 `mineru`、`mineru-sglang-server`、`mineru-gradio` 和 `mineru-api`,且对`pipeline`、`vlm`后端均适用。
+> - 这种指定方式对所有的命令行调用都有效,包括 `mineru`、`mineru-vllm-server`、`mineru-gradio` 和 `mineru-api`,且对`pipeline`、`vlm`后端均适用。

### 常见设备配置示例
> [!TIP]
@@ -47,14 +39,9 @@
> [!TIP]
> 以下是一些可能的使用场景:
> 
-> - 如果您有多张显卡,需要指定卡0和卡1,并使用多卡并行来启动`sglang-server`,可以使用以下命令: 
->   ```bash
->   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
->   ```
->   
-> - 如果您有多张显卡,需要指定卡0-3,并使用多卡数据并行和张量并行来启动`sglang-server`,可以使用以下命令: 
+> - 如果您有多张显卡,需要指定卡0和卡1,并使用多卡并行来启动`vllm-server`,可以使用以下命令: 
>   ```bash
->   CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
+>   CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
>   ```
>   
> - 如果您有多张显卡,需要在卡0和卡1上启动两个`fastapi`服务,并分别监听不同的端口,可以使用以下命令: 

+ 3 - 3
docs/zh/usage/cli_tools.md

@@ -11,11 +11,11 @@ Options:
  -p, --path PATH                 输入文件路径或目录(必填)
  -o, --output PATH               输出目录(必填)
  -m, --method [auto|txt|ocr]     解析方法:auto(默认)、txt、ocr(仅用于 pipeline 后端)
-  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+  -b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
                                  解析后端(默认为 pipeline)
  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
                                  指定文档语言(可提升 OCR 准确率,仅用于 pipeline 后端)
-  -u, --url TEXT                  当使用 sglang-client 时,需指定服务地址
+  -u, --url TEXT                  当使用 http-client 时,需指定服务地址
  -s, --start INTEGER             开始解析的页码(从 0 开始)
  -e, --end INTEGER               结束解析的页码(从 0 开始)
  -f, --formula BOOLEAN           是否启用公式解析(默认开启)
@@ -43,7 +43,7 @@ Usage: mineru-gradio [OPTIONS]
Options:
  --enable-example BOOLEAN        启用示例文件输入(需要将示例文件放置在当前
                                  执行命令目录下的 `example` 文件夹中)
-  --enable-sglang-engine BOOLEAN  启用 SgLang 引擎后端以提高处理速度
+  --enable-vllm-engine BOOLEAN    启用 vllm 引擎后端以提高处理速度
  --enable-api BOOLEAN            启用 Gradio API 以提供应用程序服务
  --max-convert-pages INTEGER     设置从 PDF 转换为 Markdown 的最大页数
  --server-name TEXT              设置 Gradio 应用程序的服务器主机名

+ 12 - 12
docs/zh/usage/quick_usage.md

@@ -28,11 +28,11 @@ mineru -p <input_path> -o <output_path>
mineru -p <input_path> -o <output_path> -b vlm-transformers
```
> [!TIP]
-> vlm后端另外支持`sglang`加速,与`transformers`后端相比,`sglang`的加速比可达20~30倍,可以在[扩展模块安装指南](../quick_start/extension_modules.md)中查看支持`sglang`加速的完整包安装方法。
+> vlm后端另外支持`vllm`加速,与`transformers`后端相比,`vllm`的加速比可达20~30倍,可以在[扩展模块安装指南](../quick_start/extension_modules.md)中查看支持`vllm`加速的完整包安装方法。

如果需要通过自定义参数调整解析选项,您也可以在文档中查看更详细的[命令行工具使用说明](./cli_tools.md)。

-## 通过api、webui、sglang-client/server进阶使用
+## 通过api、webui、http-client/server进阶使用

- 通过python api直接调用:[Python 调用示例](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- 通过fast api方式调用:
@@ -43,29 +43,29 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
  >在浏览器中访问 `http://127.0.0.1:8000/docs` 查看API文档。
- 启动gradio webui 可视化前端:
  ```bash
-  # 使用 pipeline/vlm-transformers/vlm-sglang-client 后端
+  # 使用 pipeline/vlm-transformers/vlm-http-client 后端
  mineru-gradio --server-name 0.0.0.0 --server-port 7860
-  # 或使用 vlm-sglang-engine/pipeline 后端(需安装sglang环境)
+  # 或使用 vlm-vllm-engine/pipeline 后端(需安装vllm环境)
-  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
+  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
  ```
  >[!TIP]
  > 
  >- 在浏览器中访问 `http://127.0.0.1:7860` 使用 Gradio WebUI。
  >- 访问 `http://127.0.0.1:7860/?view=api` 使用 Gradio API。
-- 使用`sglang-client/server`方式调用:
+- 使用`http-client/server`方式调用:
  ```bash
-  # 启动sglang server(需要安装sglang环境)
-  mineru-sglang-server --port 30000
+  # 启动vllm server(需要安装vllm环境)
+  mineru-vllm-server --port 30000
  ``` 
  >[!TIP]
-  >在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境)
+  >在另一个终端中通过http client连接vllm server(只需cpu与网络,不需要vllm环境)
  > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
  > ```

> [!NOTE]
-> 所有sglang官方支持的参数都可用通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-sglang-server`、`mineru-gradio`、`mineru-api`,
-> 我们整理了一些`sglang`使用中的常用参数和使用方法,可以在文档[命令行进阶参数](./advanced_cli_parameters.md)中获取。
+> 所有vllm官方支持的参数都可以通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-vllm-server`、`mineru-gradio`、`mineru-api`,
+> 我们整理了一些`vllm`使用中的常用参数和使用方法,可以在文档[命令行进阶参数](./advanced_cli_parameters.md)中获取。

## 基于配置文件扩展 MinerU 功能


+ 5 - 2
pyproject.toml

@@ -39,6 +39,7 @@ dependencies = [
    "openai>=1.70.0,<2",
    "beautifulsoup4>=4.13.5,<5",
    "Pygments",
+    "mineru_vl_utils",
]

[project.optional-dependencies]
@@ -50,10 +51,12 @@ test = [
    "fuzzywuzzy"
]
vlm = [
-    "mineru_vl_utils[transformers]",
+    "torch>=2.6.0,<2.8.0",
+    "transformers>=4.51.1,<5.0.0",
+    "accelerate>=1.5.1",
]
vllm = [
-    "mineru_vl_utils[vllm]",
+    "vllm==0.10.1.1",
]
pipeline = [
     "matplotlib>=3.10,<4",
     "matplotlib>=3.10,<4",