
docs: update documentation for vllm integration and parameter optimization

myhloli, 2 months ago
parent commit e120a90d11

+ 3 - 3
README.md

@@ -583,7 +583,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
         <td>Parsing Backend</td>
         <td>pipeline</td>
         <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
     </tr>
     <tr>
         <td>Operating System</td>
@@ -661,8 +661,8 @@ You can use MinerU for PDF parsing through various methods such as command line,
 - [x] Handwritten Text Recognition  
 - [x] Vertical Text Recognition  
 - [x] Latin Accent Mark Recognition
-- [ ] Code block recognition in the main text
-- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
+- [x] Code block recognition in the main text
+- [x] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf) (mineru.net)
 - [ ] Geometric shape recognition
 
 # Known Issues

+ 3 - 3
README_zh-CN.md

@@ -570,7 +570,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
         <td>解析后端</td>
         <td>pipeline</td>
         <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
     </tr>
     <tr>
         <td>操作系统</td>
@@ -648,8 +648,8 @@ mineru -p <input_path> -o <output_path>
 - [x] 手写文本识别
 - [x] 竖排文本识别
 - [x] 拉丁字母重音符号识别
-- [ ] 正文中代码块识别
-- [ ] [化学式识别](docs/chemical_knowledge_introduction/introduction.pdf)
+- [x] 正文中代码块识别
+- [x] [化学式识别](docs/chemical_knowledge_introduction/introduction.pdf) (https://mineru.net)
 - [ ] 图表内容识别
 
 # Known Issues

+ 0 - 12
docs/en/faq/index.md

@@ -15,18 +15,6 @@ For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [W
     Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)
 
 
-??? question "Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`"
-
-    The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the pre-built package of simsimd for Linux requires a glibc version greater than or equal to 2.28, this causes installation issues on some Linux distributions released before 2019. You can resolve this issue by using the following command:
-    ```
-    conda create -n mineru python=3.11 -y
-    conda activate mineru
-    pip install -U "mineru[pipeline_old_linux]"
-    ```
-    
-    Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)
-
-
 ??? question "Missing text information in parsing results when installing and using on Linux systems."
 
     MinerU uses `pypdfium2` instead of `pymupdf` as the PDF page rendering engine in versions >=2.0 to resolve AGPLv3 license issues. On some Linux distributions, due to missing CJK fonts, some text may be lost during the process of rendering PDFs to images.

+ 12 - 15
docs/en/quick_start/docker_deployment.md

@@ -6,25 +6,22 @@ MinerU provides a convenient Docker deployment method, which helps quickly set u
 
 ```bash
 wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
-docker build -t mineru-sglang:latest -f Dockerfile .
+docker build -t mineru-vllm:latest -f Dockerfile .
 ```
 
 > [!TIP]
-> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.10.post2-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
-> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.10.post2-cu128-b200` before executing the build operation.
+> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms.
 
 ## Docker Description
 
-MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sglang` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `sglang` to accelerate VLM model inference.
+MinerU's Docker uses `vllm/vllm-openai` as the base image, so it includes the `vllm` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `vllm` to accelerate VLM model inference.
 
 > [!NOTE]
-> Requirements for using `sglang` to accelerate VLM model inference:
+> Requirements for using `vllm` to accelerate VLM model inference:
 > 
 > - Device must have Turing architecture or later graphics cards with 8GB+ available VRAM.
-> - The host machine's graphics driver should support CUDA 12.6 or higher; `Blackwell` platform should support CUDA 12.8 or higher. You can check the driver version using the `nvidia-smi` command.
+> - The host machine's graphics driver should support CUDA 12.8 or higher; you can check the driver version using the `nvidia-smi` command.
 > - Docker container must have access to the host machine's graphics devices.
->
-> If your device doesn't meet the above requirements, you can still use other features of MinerU, but cannot use `sglang` to accelerate VLM model inference, meaning you cannot use the `vlm-sglang-engine` backend or start the `vlm-sglang-server` service.
 
 ## Start Docker Container
 
@@ -33,7 +30,7 @@ docker run --gpus all \
   --shm-size 32g \
   -p 30000:30000 -p 7860:7860 -p 8000:8000 \
   --ipc=host \
-  -it mineru-sglang:latest \
+  -it mineru-vllm:latest \
   /bin/bash
 ```
 
@@ -53,19 +50,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
 >
 >- The `compose.yaml` file contains configurations for multiple services of MinerU, you can choose to start specific services as needed.
 >- Different services might have additional parameter configurations, which you can view and edit in the `compose.yaml` file.
->- Due to the pre-allocation of GPU memory by the `sglang` inference acceleration framework, you may not be able to run multiple `sglang` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-sglang-server` service or using the `vlm-sglang-engine` backend.
+>- Due to the pre-allocation of GPU memory by the `vllm` inference acceleration framework, you may not be able to run multiple `vllm` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-vllm-server` service or using the `vlm-vllm-engine` backend.
 
 ---
 
-### Start sglang-server service
-connect to `sglang-server` via `vlm-sglang-client` backend
+### Start vllm-server service
+Connect to `vllm-server` via the `vlm-http-client` backend:
   ```bash
-  docker compose -f compose.yaml --profile sglang-server up -d
+  docker compose -f compose.yaml --profile vllm-server up -d
   ```
   >[!TIP]
-  >In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
+  >In another terminal, connect to the vllm server via the http client (only requires CPU and network, no vllm environment needed)
   > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
   > ```
 
 ---
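Taken together, the steps on this page reduce to a short end-to-end session. A minimal sketch, reusing only commands shown above; `demo.pdf`, `./output`, and the loopback server address are placeholder assumptions:

```bash
# Build the image and bring up the vllm server via docker compose
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
docker build -t mineru-vllm:latest -f Dockerfile .
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
docker compose -f compose.yaml --profile vllm-server up -d

# From any machine that can reach port 30000 (CPU and network only),
# parse a document through the vlm-http-client backend
mineru -p demo.pdf -o ./output -b vlm-http-client -u http://127.0.0.1:30000
```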

+ 7 - 15
docs/en/quick_start/extension_modules.md

@@ -4,34 +4,26 @@ MinerU supports installing extension modules on demand based on different needs
 ## Common Scenarios
 
 ### Core Functionality Installation
-The `core` module is the core dependency of MinerU, containing all functional modules except `sglang`. Installing this module ensures the basic functionality of MinerU works properly.
+The `core` module is the core dependency of MinerU, containing all functional modules except `vllm`. Installing this module ensures the basic functionality of MinerU works properly.
 ```bash
 uv pip install mineru[core]
 ```
 
 ---
 
-### Using `sglang` to Accelerate VLM Model Inference
-The `sglang` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
-In the configuration, `all` includes both `core` and `sglang` modules, so `mineru[all]` and `mineru[core,sglang]` are equivalent.
+### Using `vllm` to Accelerate VLM Model Inference
+The `vllm` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
+In the configuration, `all` includes both `core` and `vllm` modules, so `mineru[all]` and `mineru[core,vllm]` are equivalent.
 ```bash
 uv pip install mineru[all]
 ```
 > [!TIP]
-> If exceptions occur during installation of the complete package including sglang, please refer to the [sglang official documentation](https://docs.sglang.ai/start/install.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.
+> If exceptions occur during installation of the complete package including vllm, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.
 
 ---
 
-### Installing Lightweight Client to Connect to sglang-server
+### Installing Lightweight Client to Connect to vllm-server
-If you need to install a lightweight client on edge devices to connect to `sglang-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
+If you need to install a lightweight client on edge devices to connect to `vllm-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
 ```bash
 uv pip install mineru
 ```
-
----
-
-### Using Pipeline Backend on Outdated Linux Systems
-If your system is too outdated to meet the dependency requirements of `mineru[core]`, this option can minimally meet MinerU's runtime requirements, suitable for old systems that cannot be upgraded and only need to use the pipeline backend.
-```bash
-uv pip install mineru[pipeline_old_linux]
-```
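In practice, the lightweight-client scenario pairs this base install on an edge device with a remote `vllm-server`. A minimal sketch, assuming a server already running as in the [Docker deployment guide](./docker_deployment.md); the address and file paths are hypothetical:

```bash
# On the edge device: base package only (CPU-only, no vllm)
uv pip install mineru

# Delegate VLM inference to the remote server
mineru -p scan.pdf -o ./out -b vlm-http-client -u http://192.168.1.10:30000
```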

+ 3 - 3
docs/en/quick_start/index.md

@@ -31,7 +31,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
         <td>Parsing Backend</td>
         <td>pipeline</td>
         <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
     </tr>
     <tr>
         <td>Operating System</td>
@@ -80,8 +80,8 @@ uv pip install -e .[core]
 ```
 
 > [!TIP]
-> `mineru[core]` includes all core features except `sglang` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
-> If you need to use `sglang` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).
+> `mineru[core]` includes all core features except `vllm` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
+> If you need to use `vllm` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).
 
 ---
  

+ 32 - 27
docs/en/reference/output_files.md

@@ -165,49 +165,51 @@ inference_result: list[PageInferenceResults] = []
 ]
 ```
 
-### VLM Output Results (model_output.txt)
+### VLM Output Results (model.json)
 
 > [!NOTE]
 > Only applicable to VLM backend
 
-**File naming format**: `{original_filename}_model_output.txt`
+**File naming format**: `{original_filename}_model.json`
 
 #### File Format Description
 
-- Uses `----` to separate output results for each page
-- Each page contains multiple text blocks starting with `<|box_start|>` and ending with `<|md_end|>`
-
-#### Field Meanings
+- This file contains the raw output of the VLM model as two nested lists: the outer list represents pages, and the inner list holds the content blocks of each page
+- Each content block is a dict containing `type`, `bbox`, `angle`, and `content` fields
 
-| Tag | Format | Description |
-|-----|--------|-------------|
-| Bounding box | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | Quadrilateral coordinates (top-left, bottom-right points), coordinate values after scaling page to 1000×1000 |
-| Type tag | `<\|ref_start\|>type<\|ref_end\|>` | Content block type identifier |
-| Content | `<\|md_start\|>markdown content<\|md_end\|>` | Markdown content of the block |
 
 #### Supported Content Types
 
```json
-{
-    "text": "Text",
-    "title": "Title",
-    "image": "Image",
-    "image_caption": "Image caption",
-    "image_footnote": "Image footnote",
-    "table": "Table",
-    "table_caption": "Table caption",
-    "table_footnote": "Table footnote",
-    "equation": "Interline formula"
-}
+[
+    "text",
+    "title",
+    "equation",
+    "image",
+    "image_caption",
+    "image_footnote",
+    "table",
+    "table_caption",
+    "table_footnote",
+    "phonetic",
+    "code",
+    "code_caption",
+    "ref_text",
+    "algorithm",
+    "list",
+    "header",
+    "footer",
+    "page_number",
+    "aside_text",
+    "page_footnote"
+]
 ```
 
-#### Special Tags
-
-- `<|txt_contd|>`: Appears at the end of text, indicating that this text block can be connected with subsequent text blocks
-- Table content uses `otsl` format and needs to be converted to HTML for rendering in Markdown
-
 ### Intermediate Processing Results (middle.json)
 
+> [!NOTE]
+> Only applicable to pipeline backend
+
 **File naming format**: `{original_filename}_middle.json`
 
 #### Top-level Structure
@@ -390,6 +392,9 @@ Level 1 blocks (table | image)
 
 ### Content List (content_list.json)
 
+> [!NOTE]
+> Only applicable to pipeline backend
+
 **File naming format**: `{original_filename}_content_list.json`
 
 #### Functionality
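Returning to the `model.json` layout documented above (outer list = pages, inner list = content blocks), a quick way to inspect a parsed result is a one-liner like the following sketch; the filename is hypothetical and the printed values are illustrative only:

```bash
# Print the first content block of the first page
# (relies on the two nested lists described above)
jq '.[0][0]' demo_model.json
# Illustrative output shape (field values made up):
# { "type": "text", "bbox": [72, 105, 523, 180], "angle": 0, "content": "..." }
```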

+ 8 - 21
docs/en/usage/advanced_cli_parameters.md

@@ -1,25 +1,17 @@
 # Advanced Command Line Parameters
 
-## SGLang Acceleration Parameter Optimization
-
-### Memory Optimization Parameters
-> [!TIP]
-> SGLang acceleration mode currently supports running on Turing architecture graphics cards with a minimum of 8GB VRAM, but graphics cards with <24GB VRAM may encounter insufficient memory issues. You can optimize memory usage with the following parameters:
-> 
-> - If you encounter insufficient VRAM when using a single graphics card, you may need to reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try reducing it further to `0.4` or lower.
-> - If you have two or more graphics cards, you can try using tensor parallelism (TP) mode to simply expand available VRAM: `--tp-size 2`
+## vllm Acceleration Parameter Optimization
 
 ### Performance Optimization Parameters
 > [!TIP]
-> If you can already use SGLang normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
+> If you can already use vllm normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
 > 
-> - If you have multiple graphics cards, you can use SGLang's multi-card parallel mode to increase throughput: `--dp-size 2`
-> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
+> - If you have multiple graphics cards, you can use vllm's multi-card parallel mode to increase throughput: `--data-parallel-size 2`
 
 ### Parameter Passing Instructions
 > [!TIP]
-> - All officially supported SGLang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
-> - If you want to learn more about `sglang` parameter usage, please refer to the [SGLang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> - All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`
+> - If you want to learn more about `vllm` parameter usage, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/cli/serve.html)
 
 ## GPU Device Selection and Configuration
 
@@ -29,7 +21,7 @@
 >   ```bash
 >   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
 >   ```
-> - This specification method is effective for all command line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
+> - This specification method is effective for all command line calls, including `mineru`, `mineru-vllm-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
 
 ### Common Device Configuration Examples
 > [!TIP]
@@ -46,14 +38,9 @@
 > [!TIP]
 > Here are some possible usage scenarios:
 > 
-> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `sglang-server`, you can use the following command:
->   ```bash
->   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
->   ```
-> 
-> - If you have multiple GPUs and need to specify GPU 0–3, and start the `sglang-server` using multi-GPU data parallelism and tensor parallelism, you can use the following command:
+> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `vllm-server`, you can use the following command:
 >   ```bash
->   CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
+>   CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
 >   ```
 >       
 > - If you have multiple graphics cards and need to start two `fastapi` services on cards 0 and 1, listening on different ports respectively, you can use the following commands:
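Beyond device pinning, the pass-through described earlier on this page means engine-level vllm flags can ride along on any of the listed MinerU commands. A hedged sketch: `--gpu-memory-utilization` is a standard vllm engine option, and `0.5` is only an example value:

```bash
# Cap vllm's VRAM pre-allocation when the GPU is shared with other services
mineru -p <input_path> -o <output_path> -b vlm-vllm-engine --gpu-memory-utilization 0.5
```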

+ 3 - 3
docs/en/usage/cli_tools.md

@@ -11,11 +11,11 @@ Options:
   -p, --path PATH                 Input file path or directory (required)
   -o, --output PATH               Output directory (required)
   -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
-  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+  -b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
                                   Parsing backend (default: pipeline)
   -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
                                   Specify document language (improves OCR accuracy, pipeline backend only)
-  -u, --url TEXT                  Service address when using sglang-client
+  -u, --url TEXT                  Service address when using http-client
   -s, --start INTEGER             Starting page number for parsing (0-based)
   -e, --end INTEGER               Ending page number for parsing (0-based)
   -f, --formula BOOLEAN           Enable formula parsing (default: enabled)
@@ -45,7 +45,7 @@ Options:
                                   files to be input need to be placed in the
                                   `example` folder within the directory where
                                   the command is currently executed.
-  --enable-sglang-engine BOOLEAN  Enable SgLang engine backend for faster
+  --enable-vllm-engine BOOLEAN    Enable vllm engine backend for faster
                                   processing.
   --enable-api BOOLEAN            Enable gradio API for serving the
                                   application.
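To make the option list concrete, here is one possible invocation combining several of the documented flags; the file and directory names are placeholders:

```bash
# Parse pages 0-9 of a PDF with the pipeline backend, forcing OCR
# and giving a Chinese language hint to improve recognition
mineru -p paper.pdf -o ./out -b pipeline -m ocr -l ch -s 0 -e 9
```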

+ 11 - 11
docs/en/usage/quick_usage.md

@@ -33,7 +33,7 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
 
 If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.
 
-## Advanced Usage via API, WebUI, sglang-client/server
+## Advanced Usage via API, WebUI, http-client/server
 
 - Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
 - FastAPI calls:
@@ -44,29 +44,29 @@ If you need to adjust parsing options through custom parameters, you can also ch
   >Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
 - Start Gradio WebUI visual frontend:
   ```bash
-  # Using pipeline/vlm-transformers/vlm-sglang-client backends
+  # Using pipeline/vlm-transformers/vlm-http-client backends
   mineru-gradio --server-name 0.0.0.0 --server-port 7860
-  # Or using vlm-sglang-engine/pipeline backends (requires sglang environment)
-  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
+  # Or using vlm-vllm-engine/pipeline backends (requires vllm environment)
+  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
   ```
   >[!TIP]
   >
   >- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
   >- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
-- Using `sglang-client/server` method:
+- Using `http-client/server` method:
   ```bash
-  # Start sglang server (requires sglang environment)
-  mineru-sglang-server --port 30000
+  # Start vllm server (requires vllm environment)
+  mineru-vllm-server --port 30000
   ``` 
   >[!TIP]
-  >In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
+  >In another terminal, connect to the vllm server via the http client (only requires CPU and network, no vllm environment needed)
   > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
   > ```
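Combining the WebUI command above with the GPU selection covered in [Advanced Command Line Parameters](./advanced_cli_parameters.md), pinning the Gradio frontend to a single card might look like this sketch:

```bash
# Run the Gradio WebUI with the vllm engine backend on GPU 0 only
CUDA_VISIBLE_DEVICES=0 mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
```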
 
 > [!NOTE]
-> All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
-> We have compiled some commonly used parameters and usage methods for `sglang`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
+> All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`.
+> We have compiled some commonly used parameters and usage methods for `vllm`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
 
 ## Extending MinerU Functionality with Configuration Files
 

+ 0 - 12
docs/zh/faq/index.md

@@ -14,18 +14,6 @@
     
     参考:[#388](https://github.com/opendatalab/MinerU/issues/388)
 
-
-??? question "在 CentOS 7 或 Ubuntu 18 系统安装MinerU时报错`ERROR: Failed building wheel for simsimd`"
-
-    新版本albumentations(1.4.21)引入了依赖simsimd,由于simsimd在linux的预编译包要求glibc的版本大于等于2.28,导致部分2019年之前发布的Linux发行版无法正常安装,可通过如下命令安装:
-    ```
-    conda create -n mineru python=3.11 -y
-    conda activate mineru
-    pip install -U "mineru[pipeline_old_linux]"
-    ```
-    
-    参考:[#1004](https://github.com/opendatalab/MinerU/issues/1004)
-
 ??? question "在 Linux 系统安装并使用时,解析结果缺失部份文字信息。"
 
     MinerU在>=2.0的版本中使用`pypdfium2`代替`pymupdf`作为PDF页面的渲染引擎,以解决AGPLv3的许可证问题,在某些Linux发行版,由于缺少CJK字体,可能会在将PDF渲染成图片的过程中丢失部份文字。

+ 13 - 15
docs/zh/quick_start/docker_deployment.md

@@ -6,24 +6,22 @@ MinerU提供了便捷的docker部署方式,这有助于快速搭建环境并
 
 ```bash
 wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/Dockerfile
-docker build -t mineru-sglang:latest -f Dockerfile .
+docker build -t mineru-vllm:latest -f Dockerfile .
 ```
 
 > [!TIP]
-> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`lmsysorg/sglang:v0.4.10.post2-cu126`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper平台,
-> 如您使用较新的`Blackwell`平台,请将基础镜像修改为`lmsysorg/sglang:v0.4.10.post2-cu128-b200` 再执行build操作。
+> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`vllm/vllm-openai:v0.10.1.1`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper/Blackwell平台。
 
 ## Docker说明
 
-Mineru的docker使用了`lmsysorg/sglang`作为基础镜像,因此在docker中默认集成了`sglang`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`sglang`加速VLM模型推理。
+Mineru的docker使用了`vllm/vllm-openai`作为基础镜像,因此在docker中默认集成了`vllm`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`vllm`加速VLM模型推理。
 > [!NOTE]
-> 使用`sglang`加速VLM模型推理需要满足的条件是:
+> 使用`vllm`加速VLM模型推理需要满足的条件是:
 > 
 > - 设备包含Turing及以后架构的显卡,且可用显存大于等于8G。
-> - 物理机的显卡驱动应支持CUDA 12.6或更高版本,`Blackwell`平台应支持CUDA 12.8及更高版本,可通过`nvidia-smi`命令检查驱动版本。
+> - 物理机的显卡驱动应支持CUDA 12.8或更高版本,可通过`nvidia-smi`命令检查驱动版本。
 > - docker中能够访问物理机的显卡设备。
->
-> 如果您的设备不满足上述条件,您仍然可以使用MinerU的其他功能,但无法使用`sglang`加速VLM模型推理,即无法使用`vlm-sglang-engine`后端和启动`vlm-sglang-server`服务。
+
 
 ## 启动 Docker 容器
 
@@ -32,7 +30,7 @@ docker run --gpus all \
   --shm-size 32g \
   -p 30000:30000 -p 7860:7860 -p 8000:8000 \
   --ipc=host \
-  -it mineru-sglang:latest \
+  -it mineru-vllm:latest \
   /bin/bash
 ```
 
@@ -51,19 +49,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
 >  
 >- `compose.yaml`文件中包含了MinerU的多个服务配置,您可以根据需要选择启动特定的服务。
 >- 不同的服务可能会有额外的参数配置,您可以在`compose.yaml`文件中查看并编辑。
->- 由于`sglang`推理加速框架预分配显存的特性,您可能无法在同一台机器上同时运行多个`sglang`服务,因此请确保在启动`vlm-sglang-server`服务或使用`vlm-sglang-engine`后端时,其他可能使用显存的服务已停止。
+>- 由于`vllm`推理加速框架预分配显存的特性,您可能无法在同一台机器上同时运行多个`vllm`服务,因此请确保在启动`vlm-vllm-server`服务或使用`vlm-vllm-engine`后端时,其他可能使用显存的服务已停止。
 
 ---
 
-### 启动 sglang-server 服务
-并通过`vlm-sglang-client`后端连接`sglang-server`
+### 启动 vllm-server 服务
+并通过`vlm-http-client`后端连接`vllm-server`
   ```bash
-  docker compose -f compose.yaml --profile sglang-server up -d
+  docker compose -f compose.yaml --profile vllm-server up -d
   ```
   >[!TIP]
-  >在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境)
+  >在另一个终端中通过http client连接vllm server(只需cpu与网络,不需要vllm环境)
   > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
   > ```
 
 ---

+ 7 - 15
docs/zh/quick_start/extension_modules.md

@@ -4,34 +4,26 @@ MinerU 支持根据不同需求,按需安装扩展模块,以增强功能或
 ## 常见场景
 
 ### 核心功能安装
-`core` 模块是 MinerU 的核心依赖,包含了除`sglang`外的所有功能模块。安装此模块可以确保 MinerU 的基本功能正常运行。
+`core` 模块是 MinerU 的核心依赖,包含了除`vllm`外的所有功能模块。安装此模块可以确保 MinerU 的基本功能正常运行。
 ```bash
 uv pip install mineru[core]
 ```
 
 ---
 
-### 使用`sglang`加速 VLM 模型推理
-`sglang` 模块提供了对 VLM 模型推理的加速支持,适用于具有 Turing 及以后架构的显卡(8G 显存及以上)。安装此模块可以显著提升模型推理速度。
-在配置中,`all`包含了`core`和`sglang`模块,因此`mineru[all]`和`mineru[core,sglang]`是等价的。
+### 使用`vllm`加速 VLM 模型推理
+`vllm` 模块提供了对 VLM 模型推理的加速支持,适用于具有 Turing 及以后架构的显卡(8G 显存及以上)。安装此模块可以显著提升模型推理速度。
+在配置中,`all`包含了`core`和`vllm`模块,因此`mineru[all]`和`mineru[core,vllm]`是等价的。
 ```bash
 uv pip install mineru[all]
 ```
 > [!TIP]
-> 如在安装包含sglang的完整包过程中发生异常,请参考 [sglang 官方文档](https://docs.sglang.ai/start/install.html) 尝试解决,或直接使用 [Docker](./docker_deployment.md) 方式部署镜像。
+> 如在安装包含vllm的完整包过程中发生异常,请参考 [vllm 官方文档](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) 尝试解决,或直接使用 [Docker](./docker_deployment.md) 方式部署镜像。
 
 ---
 
-### 安装轻量版client连接sglang-server使用
-如果您需要在边缘设备上安装轻量版的 client 端以连接 `sglang-server`,可以安装mineru的基础包,非常轻量,适合在只有cpu和网络连接的设备上使用。
+### 安装轻量版client连接vllm-server使用
+如果您需要在边缘设备上安装轻量版的 client 端以连接 `vllm-server`,可以安装mineru的基础包,非常轻量,适合在只有cpu和网络连接的设备上使用。
 ```bash
 uv pip install mineru
 ```
-
----
-
-### 在过时的linux系统上使用pipeline后端
-如果您的系统过于陈旧,无法满足`mineru[core]`的依赖要求,该选项可以最低限度的满足 MinerU 的运行需求,适用于老旧系统无法升级且仅需使用 pipeline 后端的场景。
-```bash
-uv pip install mineru[pipeline_old_linux]
-```

+ 3 - 3
docs/zh/quick_start/index.md

@@ -31,7 +31,7 @@
         <td>解析后端</td>
         <td>pipeline</td>
         <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
     </tr>
     <tr>
         <td>操作系统</td>
@@ -80,8 +80,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
 ```
 
 > [!TIP]
-> `mineru[core]`包含除`sglang`加速外的所有核心功能,兼容Windows / Linux / macOS系统,适合绝大多数用户。
-> 如果您有使用`sglang`加速VLM模型推理,或是在边缘设备安装轻量版client端等需求,可以参考文档[扩展模块安装指南](./extension_modules.md)。
+> `mineru[core]`包含除`vllm`加速外的所有核心功能,兼容Windows / Linux / macOS系统,适合绝大多数用户。
+> 如果您有使用`vllm`加速VLM模型推理,或是在边缘设备安装轻量版client端等需求,可以参考文档[扩展模块安装指南](./extension_modules.md)。
 
 ---
  

+ 32 - 26
docs/zh/reference/output_files.md

@@ -165,49 +165,52 @@ inference_result: list[PageInferenceResults] = []
 ]
 ```
 
-### VLM 输出结果 (model_output.txt)
+### VLM 输出结果 (model.json)
 
 > [!NOTE]
 > 仅适用于 VLM 后端
 
-**文件命名格式**:`{原文件名}_model_output.txt`
+**文件命名格式**:`{原文件名}_model.json`
 
 #### 文件格式说明
 
-- 使用 `----` 分割每一页的输出结果
-- 每页包含多个以 `<|box_start|>` 开头、`<|md_end|>` 结尾的文本块
-
-#### 字段含义
+- 该文件为 VLM 模型的原始输出结果,包含两层嵌套list,外层表示页面,内层表示该页的内容块
+- 每个内容块都是一个dict,包含 `type`、`bbox`、`angle`、`content` 字段
 
-| 标记 | 格式 | 说明 |
-|------|---|------|
-| 边界框 | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | 四边形坐标(左上、右下两点),页面缩放至 1000×1000 后的坐标值 |
-| 类型标记 | `<\|ref_start\|>type<\|ref_end\|>` | 内容块类型标识 |
-| 内容 | `<\|md_start\|>markdown内容<\|md_end\|>` | 该块的 Markdown 内容 |
 
 #### 支持的内容类型
 
```json
-{
-    "text": "文本",
-    "title": "标题",
-    "image": "图片",
-    "image_caption": "图片描述",
-    "image_footnote": "图片脚注",
-    "table": "表格",
-    "table_caption": "表格描述",
-    "table_footnote": "表格脚注",
-    "equation": "行间公式"
-}
+[
+    "text",
+    "title",
+    "equation",
+    "image",
+    "image_caption",
+    "image_footnote",
+    "table",
+    "table_caption",
+    "table_footnote",
+    "phonetic",
+    "code",
+    "code_caption",
+    "ref_text",
+    "algorithm",
+    "list",
+    "header",
+    "footer",
+    "page_number",
+    "aside_text",
+    "page_footnote"
+]
 ```
 
-#### 特殊标记
-
-- `<|txt_contd|>`:出现在文本末尾,表示该文本块可与后续文本块连接
-- 表格内容采用 `otsl` 格式,需转换为 HTML 才能在 Markdown 中渲染
 
 ### 中间处理结果 (middle.json)
 
+> [!NOTE]
+> 仅适用于 pipeline 后端
+
 **文件命名格式**:`{原文件名}_middle.json`
 
 #### 顶层结构
@@ -390,6 +393,9 @@ inference_result: list[PageInferenceResults] = []
 
 ### 内容列表 (content_list.json)
 
+> [!NOTE]
+> 仅适用于 pipeline 后端
+
 **文件命名格式**:`{原文件名}_content_list.json`
 
 #### 功能说明

+ 8 - 21
docs/zh/usage/advanced_cli_parameters.md

@@ -1,25 +1,17 @@
 # 命令行参数进阶
 
-## SGLang 加速参数优化
-
-### 显存优化参数
-> [!TIP]
-> sglang加速模式目前支持在最低8G显存的Turing架构显卡上运行,但在显存<24G的显卡上可能会遇到显存不足的问题, 可以通过使用以下参数来优化显存使用:
-> 
-> - 如果您使用单张显卡遇到显存不足的情况时,可能需要调低KV缓存大小,`--mem-fraction-static 0.5`,如仍出现显存不足问题,可尝试进一步降低到`0.4`或更低
-> - 如您有两张以上显卡,可尝试通过张量并行(TP)模式简单扩充可用显存:`--tp-size 2`
+## vllm 加速参数优化
 
 ### 性能优化参数
 > [!TIP]
-> 如果您已经可以正常使用sglang对vlm模型进行加速推理,但仍然希望进一步提升推理速度,可以尝试以下参数:
+> 如果您已经可以正常使用vllm对vlm模型进行加速推理,但仍然希望进一步提升推理速度,可以尝试以下参数:
 > 
-> - 如果您有超过多张显卡,可以使用sglang的多卡并行模式来增加吞吐量:`--dp-size 2`
-> - 同时您可以启用`torch.compile`来将推理速度加速约15%:`--enable-torch-compile`
+> - 如果您有多张显卡,可以使用vllm的多卡并行模式来增加吞吐量:`--data-parallel-size 2`
 
 ### 参数传递说明
 > [!TIP]
-> - 所有sglang官方支持的参数都可用通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-sglang-server`、`mineru-gradio`、`mineru-api`
-> - 如果您想了解更多有关`sglang`的参数使用方法,请参考 [sglang官方文档](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> - 所有vllm官方支持的参数都可以通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-vllm-server`、`mineru-gradio`、`mineru-api`
+> - 如果您想了解更多有关`vllm`的参数使用方法,请参考 [vllm官方文档](https://docs.vllm.ai/en/latest/cli/serve.html)
 
 ## GPU 设备选择与配置
 
@@ -29,7 +21,7 @@
 >   ```bash
 >   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
 >   ```
-> - 这种指定方式对所有的命令行调用都有效,包括 `mineru`、`mineru-sglang-server`、`mineru-gradio` 和 `mineru-api`,且对`pipeline`、`vlm`后端均适用。
+> - 这种指定方式对所有的命令行调用都有效,包括 `mineru`、`mineru-vllm-server`、`mineru-gradio` 和 `mineru-api`,且对`pipeline`、`vlm`后端均适用。
 
 ### 常见设备配置示例
 > [!TIP]
@@ -47,14 +39,9 @@
 > [!TIP]
 > 以下是一些可能的使用场景:
 > 
-> - 如果您有多张显卡,需要指定卡0和卡1,并使用多卡并行来启动`sglang-server`,可以使用以下命令: 
->   ```bash
->   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
->   ```
->   
-> - 如果您有多张显卡,需要指定卡0-3,并使用多卡数据并行和张量并行来启动`sglang-server`,可以使用以下命令: 
+> - 如果您有多张显卡,需要指定卡0和卡1,并使用多卡并行来启动`vllm-server`,可以使用以下命令: 
 >   ```bash
->   CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
+>   CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
 >   ```
 >   
 > - 如果您有多张显卡,需要在卡0和卡1上启动两个`fastapi`服务,并分别监听不同的端口,可以使用以下命令: 

+ 3 - 3
docs/zh/usage/cli_tools.md

@@ -11,11 +11,11 @@ Options:
   -p, --path PATH                 输入文件路径或目录(必填)
   -o, --output PATH               输出目录(必填)
   -m, --method [auto|txt|ocr]     解析方法:auto(默认)、txt、ocr(仅用于 pipeline 后端)
-  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+  -b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
                                   解析后端(默认为 pipeline)
   -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
                                   指定文档语言(可提升 OCR 准确率,仅用于 pipeline 后端)
-  -u, --url TEXT                  当使用 sglang-client 时,需指定服务地址
+  -u, --url TEXT                  当使用 http-client 时,需指定服务地址
   -s, --start INTEGER             开始解析的页码(从 0 开始)
   -e, --end INTEGER               结束解析的页码(从 0 开始)
   -f, --formula BOOLEAN           是否启用公式解析(默认开启)
@@ -43,7 +43,7 @@ Usage: mineru-gradio [OPTIONS]
 Options:
   --enable-example BOOLEAN        启用示例文件输入(需要将示例文件放置在当前
                                   执行命令目录下的 `example` 文件夹中)
-  --enable-sglang-engine BOOLEAN  启用 SgLang 引擎后端以提高处理速度
+  --enable-vllm-engine BOOLEAN  启用 vllm 引擎后端以提高处理速度
   --enable-api BOOLEAN            启用 Gradio API 以提供应用程序服务
   --max-convert-pages INTEGER     设置从 PDF 转换为 Markdown 的最大页数
   --server-name TEXT              设置 Gradio 应用程序的服务器主机名

+ 12 - 12
docs/zh/usage/quick_usage.md

@@ -28,11 +28,11 @@ mineru -p <input_path> -o <output_path>
 mineru -p <input_path> -o <output_path> -b vlm-transformers
 ```
 > [!TIP]
-> vlm后端另外支持`sglang`加速,与`transformers`后端相比,`sglang`的加速比可达20~30倍,可以在[扩展模块安装指南](../quick_start/extension_modules.md)中查看支持`sglang`加速的完整包安装方法。
+> vlm后端另外支持`vllm`加速,与`transformers`后端相比,`vllm`的加速比可达20~30倍,可以在[扩展模块安装指南](../quick_start/extension_modules.md)中查看支持`vllm`加速的完整包安装方法。
 
 如果需要通过自定义参数调整解析选项,您也可以在文档中查看更详细的[命令行工具使用说明](./cli_tools.md)。
 
-## 通过api、webui、sglang-client/server进阶使用
+## 通过api、webui、http-client/server进阶使用
 
 - 通过python api直接调用:[Python 调用示例](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
 - 通过fast api方式调用:
@@ -43,29 +43,29 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
   >在浏览器中访问 `http://127.0.0.1:8000/docs` 查看API文档。
 - 启动gradio webui 可视化前端:
   ```bash
-  # 使用 pipeline/vlm-transformers/vlm-sglang-client 后端
+  # 使用 pipeline/vlm-transformers/vlm-http-client 后端
   mineru-gradio --server-name 0.0.0.0 --server-port 7860
-  # 或使用 vlm-sglang-engine/pipeline 后端(需安装sglang环境)
-  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
+  # 或使用 vlm-vllm-engine/pipeline 后端(需安装vllm环境)
+  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
   ```
   >[!TIP]
   > 
   >- 在浏览器中访问 `http://127.0.0.1:7860` 使用 Gradio WebUI。
   >- 访问 `http://127.0.0.1:7860/?view=api` 使用 Gradio API。
-- 使用`sglang-client/server`方式调用:
+- 使用`http-client/server`方式调用:
   ```bash
-  # 启动sglang server(需要安装sglang环境)
-  mineru-sglang-server --port 30000
+  # 启动vllm server(需要安装vllm环境)
+  mineru-vllm-server --port 30000
   ``` 
   >[!TIP]
-  >在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境)
+  >在另一个终端中通过http client连接vllm server(只需cpu与网络,不需要vllm环境)
   > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
   > ```
 
 > [!NOTE]
-> 所有sglang官方支持的参数都可用通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-sglang-server`、`mineru-gradio`、`mineru-api`,
-> 我们整理了一些`sglang`使用中的常用参数和使用方法,可以在文档[命令行进阶参数](./advanced_cli_parameters.md)中获取。
+> 所有vllm官方支持的参数都可以通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-vllm-server`、`mineru-gradio`、`mineru-api`,
+> 我们整理了一些`vllm`使用中的常用参数和使用方法,可以在文档[命令行进阶参数](./advanced_cli_parameters.md)中获取。
 
 ## 基于配置文件扩展 MinerU 功能
 

+ 5 - 2
pyproject.toml

@@ -39,6 +39,7 @@ dependencies = [
     "openai>=1.70.0,<2",
     "beautifulsoup4>=4.13.5,<5",
     "Pygments",
+    "mineru_vl_utils",
 ]
 
 [project.optional-dependencies]
@@ -50,10 +51,12 @@ test = [
     "fuzzywuzzy"
 ]
 vlm = [
-    "mineru_vl_utils[transformers]",
+    "torch>=2.6.0,<2.8.0",
+    "transformers>=4.51.1,<5.0.0",
+    "accelerate>=1.5.1",
 ]
 vllm = [
-    "mineru_vl_utils[vllm]",
+    "vllm==0.10.1.1",
 ]
 pipeline = [
     "matplotlib>=3.10,<4",