
docs: update documentation for vllm integration and parameter optimization

myhloli 2 months ago
commit e120a90d11

+ 3 - 3
README.md

@@ -583,7 +583,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
        <td>Parsing Backend</td>
        <td>pipeline</td>
        <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
    </tr>
    <tr>
        <td>Operating System</td>
@@ -661,8 +661,8 @@ You can use MinerU for PDF parsing through various methods such as command line,
- [x] Handwritten Text Recognition  
- [x] Vertical Text Recognition  
- [x] Latin Accent Mark Recognition
-- [ ] Code block recognition in the main text
-- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
+- [x] Code block recognition in the main text
+- [x] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf) ([mineru.net](https://mineru.net))
- [ ] Geometric shape recognition

# Known Issues

+ 3 - 3
README_zh-CN.md

@@ -570,7 +570,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
        <td>解析后端</td>
        <td>pipeline</td>
        <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
    </tr>
    <tr>
        <td>操作系统</td>
@@ -648,8 +648,8 @@ mineru -p <input_path> -o <output_path>
- [x] 手写文本识别
- [x] 竖排文本识别
- [x] 拉丁字母重音符号识别
-- [ ] 正文中代码块识别
-- [ ] [化学式识别](docs/chemical_knowledge_introduction/introduction.pdf)
+- [x] 正文中代码块识别
+- [x] [化学式识别](docs/chemical_knowledge_introduction/introduction.pdf)([mineru.net](https://mineru.net))
- [ ] 图表内容识别

# Known Issues

+ 0 - 12
docs/en/faq/index.md

@@ -15,18 +15,6 @@ For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [W
    Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)


-??? question "Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`"
-
-    The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the pre-built package of simsimd for Linux requires a glibc version greater than or equal to 2.28, this causes installation issues on some Linux distributions released before 2019. You can resolve this issue by using the following command:
-    ```
-    conda create -n mineru python=3.11 -y
-    conda activate mineru
-    pip install -U "mineru[pipeline_old_linux]"
-    ```
-    
-    Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)
-
-
??? question "Missing text information in parsing results when installing and using on Linux systems."

    MinerU uses `pypdfium2` instead of `pymupdf` as the PDF page rendering engine in versions >=2.0 to resolve AGPLv3 license issues. On some Linux distributions, due to missing CJK fonts, some text may be lost during the process of rendering PDFs to images.

+ 12 - 15
docs/en/quick_start/docker_deployment.md

@@ -6,25 +6,22 @@ MinerU provides a convenient Docker deployment method, which helps quickly set u

```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
-docker build -t mineru-sglang:latest -f Dockerfile .
+docker build -t mineru-vllm:latest -f Dockerfile .
```

> [!TIP]
-> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.10.post2-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
-> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.10.post2-cu128-b200` before executing the build operation.
+> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms.

## Docker Description

-MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sglang` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `sglang` to accelerate VLM model inference.
+MinerU's Docker uses `vllm/vllm-openai` as the base image, so it includes the `vllm` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `vllm` to accelerate VLM model inference.

> [!NOTE]
-> Requirements for using `sglang` to accelerate VLM model inference:
+> Requirements for using `vllm` to accelerate VLM model inference:
> 
> - Device must have Turing architecture or later graphics cards with 8GB+ available VRAM.
-> - The host machine's graphics driver should support CUDA 12.6 or higher; `Blackwell` platform should support CUDA 12.8 or higher. You can check the driver version using the `nvidia-smi` command.
+> - The host machine's graphics driver should support CUDA 12.8 or higher; you can check the driver version using the `nvidia-smi` command.
> - Docker container must have access to the host machine's graphics devices.
->
-> If your device doesn't meet the above requirements, you can still use other features of MinerU, but cannot use `sglang` to accelerate VLM model inference, meaning you cannot use the `vlm-sglang-engine` backend or start the `vlm-sglang-server` service.

## Start Docker Container

@@ -33,7 +30,7 @@ docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 -p 7860:7860 -p 8000:8000 \
  --ipc=host \
-  -it mineru-sglang:latest \
+  -it mineru-vllm:latest \
  /bin/bash
```

@@ -53,19 +50,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
>
>- The `compose.yaml` file contains configurations for multiple services of MinerU, you can choose to start specific services as needed.
>- Different services might have additional parameter configurations, which you can view and edit in the `compose.yaml` file.
->- Due to the pre-allocation of GPU memory by the `sglang` inference acceleration framework, you may not be able to run multiple `sglang` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-sglang-server` service or using the `vlm-sglang-engine` backend.
+>- Due to the pre-allocation of GPU memory by the `vllm` inference acceleration framework, you may not be able to run multiple `vllm` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-vllm-server` service or using the `vlm-vllm-engine` backend.

---

-### Start sglang-server service
-connect to `sglang-server` via `vlm-sglang-client` backend
+### Start vllm-server service
+connect to `vllm-server` via `vlm-http-client` backend
  ```bash
-  docker compose -f compose.yaml --profile sglang-server up -d
+  docker compose -f compose.yaml --profile vllm-server up -d
  ```
  >[!TIP]
-  >In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
+  >In another terminal, connect to vllm server via http client (only requires CPU and network, no vllm environment needed)
  > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
  > ```

---

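For orientation, a quick sanity check after entering the container — this snippet is illustrative and not part of the commit; it only uses commands the updated docs themselves reference (`nvidia-smi`, `mineru-vllm-server`):

```bash
# Inside the running container: confirm the GPU is visible and the
# driver reports CUDA 12.8+, as the note above requires
nvidia-smi
# Then start the vllm-backed server on the port mapped by `docker run`
mineru-vllm-server --port 30000
```
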
+ 6 - 14
docs/en/quick_start/extension_modules.md

@@ -4,34 +4,26 @@ MinerU supports installing extension modules on demand based on different needs
## Common Scenarios

### Core Functionality Installation
-The `core` module is the core dependency of MinerU, containing all functional modules except `sglang`. Installing this module ensures the basic functionality of MinerU works properly.
+The `core` module is the core dependency of MinerU, containing all functional modules except `vllm`. Installing this module ensures the basic functionality of MinerU works properly.
```bash
uv pip install mineru[core]
```

---

-### Using `sglang` to Accelerate VLM Model Inference
-The `sglang` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
-In the configuration, `all` includes both `core` and `sglang` modules, so `mineru[all]` and `mineru[core,sglang]` are equivalent.
+### Using `vllm` to Accelerate VLM Model Inference
+The `vllm` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
+In the configuration, `all` includes both `core` and `vllm` modules, so `mineru[all]` and `mineru[core,vllm]` are equivalent.
```bash
uv pip install mineru[all]
```
> [!TIP]
-> If exceptions occur during installation of the complete package including sglang, please refer to the [sglang official documentation](https://docs.sglang.ai/start/install.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.
+> If exceptions occur during installation of the complete package including vllm, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.

---

-### Installing Lightweight Client to Connect to sglang-server
+### Installing Lightweight Client to Connect to vllm-server
-If you need to install a lightweight client on edge devices to connect to `sglang-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
+If you need to install a lightweight client on edge devices to connect to `vllm-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
```bash
uv pip install mineru
```
-
----
-
-### Using Pipeline Backend on Outdated Linux Systems
-If your system is too outdated to meet the dependency requirements of `mineru[core]`, this option can minimally meet MinerU's runtime requirements, suitable for old systems that cannot be upgraded and only need to use the pipeline backend.
-```bash
-uv pip install mineru[pipeline_old_linux]
-```

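As a side note on the equivalences documented above, a small illustrative install matrix (no new flags or extras beyond what the section already names):

```bash
# Full install: identical by the doc's own definition of `all`
uv pip install "mineru[all]"
uv pip install "mineru[core,vllm]"
# Lightweight client only (CPU + network), for pointing at a vllm-server
uv pip install mineru
```
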
+ 3 - 3
docs/en/quick_start/index.md

@@ -31,7 +31,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
        <td>Parsing Backend</td>
        <td>pipeline</td>
        <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
    </tr>
    <tr>
        <td>Operating System</td>
@@ -80,8 +80,8 @@ uv pip install -e .[core]
```

> [!TIP]
-> `mineru[core]` includes all core features except `sglang` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
-> If you need to use `sglang` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).
+> `mineru[core]` includes all core features except `vllm` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
+> If you need to use `vllm` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).

---
 

+ 30 - 25
docs/en/reference/output_files.md

@@ -165,49 +165,51 @@ inference_result: list[PageInferenceResults] = []
]
```

-### VLM Output Results (model_output.txt)
+### VLM Output Results (model.json)

> [!NOTE]
> Only applicable to VLM backend

-**File naming format**: `{original_filename}_model_output.txt`
+**File naming format**: `{original_filename}_model.json`

#### File Format Description

-- Uses `----` to separate output results for each page
-- Each page contains multiple text blocks starting with `<|box_start|>` and ending with `<|md_end|>`
-
-#### Field Meanings
+- This file contains the raw output results from the VLM model, with two nested list layers: the outer layer represents pages, and the inner layer represents content blocks for each page
+- Each content block is a dict containing `type`, `bbox`, `angle`, and `content` fields

-| Tag | Format | Description |
-|-----|--------|-------------|
-| Bounding box | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | Quadrilateral coordinates (top-left, bottom-right points), coordinate values after scaling page to 1000×1000 |
-| Type tag | `<\|ref_start\|>type<\|ref_end\|>` | Content block type identifier |
-| Content | `<\|md_start\|>markdown content<\|md_end\|>` | Markdown content of the block |

#### Supported Content Types

```json
-{
-    "text": "Text",
-    "title": "Title", 
-    "image": "Image",
-    "image_caption": "Image caption",
-    "image_footnote": "Image footnote",
-    "table": "Table",
-    "table_caption": "Table caption", 
-    "table_footnote": "Table footnote",
-    "equation": "Interline formula"
-}
+[
+    "text",
+    "title",
+    "equation",
+    "image",
+    "image_caption",
+    "image_footnote",
+    "table",
+    "table_caption",
+    "table_footnote",
+    "phonetic",
+    "code",
+    "code_caption",
+    "ref_text",
+    "algorithm",
+    "list",
+    "header",
+    "footer",
+    "page_number",
+    "aside_text",
+    "page_footnote"
+]
```

-#### Special Tags
-
-- `<|txt_contd|>`: Appears at the end of text, indicating that this text block can be connected with subsequent text blocks
-- Table content uses `otsl` format and needs to be converted to HTML for rendering in Markdown
-
### Intermediate Processing Results (middle.json)

+> [!NOTE]
+> Only applicable to pipeline backend
+
**File naming format**: `{original_filename}_middle.json`

#### Top-level Structure
@@ -390,6 +392,9 @@ Level 1 blocks (table | image)

### Content List (content_list.json)

+> [!NOTE]
+> Only applicable to pipeline backend
+
**File naming format**: `{original_filename}_content_list.json`

#### Functionality

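To make the new `model.json` shape concrete, here is a minimal hand-written sketch; the values are invented for illustration, and only the structure (pages → content blocks → dicts with `type`/`bbox`/`angle`/`content`) comes from the description above:

```json
[
    [
        {"type": "title", "bbox": [104, 68, 591, 95], "angle": 0, "content": "1 Introduction"},
        {"type": "text", "bbox": [104, 112, 591, 208], "angle": 0, "content": "..."}
    ]
]
```
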
+ 8 - 21
docs/en/usage/advanced_cli_parameters.md

@@ -1,25 +1,17 @@
# Advanced Command Line Parameters

-## SGLang Acceleration Parameter Optimization
-
-### Memory Optimization Parameters
-> [!TIP]
-> SGLang acceleration mode currently supports running on Turing architecture graphics cards with a minimum of 8GB VRAM, but graphics cards with <24GB VRAM may encounter insufficient memory issues. You can optimize memory usage with the following parameters:
-> 
-> - If you encounter insufficient VRAM when using a single graphics card, you may need to reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try reducing it further to `0.4` or lower.
-> - If you have two or more graphics cards, you can try using tensor parallelism (TP) mode to simply expand available VRAM: `--tp-size 2`
+## vllm Acceleration Parameter Optimization

### Performance Optimization Parameters
> [!TIP]
-> If you can already use SGLang normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
+> If you can already use vllm normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
> 
-> - If you have multiple graphics cards, you can use SGLang's multi-card parallel mode to increase throughput: `--dp-size 2`
-> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
+> - If you have multiple graphics cards, you can use vllm's multi-card parallel mode to increase throughput: `--data-parallel-size 2`

### Parameter Passing Instructions
> [!TIP]
-> - All officially supported SGLang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
-> - If you want to learn more about `sglang` parameter usage, please refer to the [SGLang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> - All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`
+> - If you want to learn more about `vllm` parameter usage, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/cli/serve.html)

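Since the commit drops the old `--mem-fraction-static` guidance without naming a vllm replacement, one hedged possibility — an assumption on our part, not something this diff documents — is vllm's standard VRAM knob, passed through as the note above allows:

```bash
# Sketch: --gpu-memory-utilization caps vllm's pre-allocated VRAM (default 0.9);
# lowering it may help on smaller cards, much like --mem-fraction-static did.
mineru -p <input_path> -o <output_path> -b vlm-vllm-engine --gpu-memory-utilization 0.5
```
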
## GPU Device Selection and Configuration

@@ -29,7 +21,7 @@
>   ```bash
>   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
>   ```
-> - This specification method is effective for all command line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
+> - This specification method is effective for all command line calls, including `mineru`, `mineru-vllm-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.

### Common Device Configuration Examples
> [!TIP]
@@ -46,14 +38,9 @@
> [!TIP]
> Here are some possible usage scenarios:
> 
-> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `sglang-server`, you can use the following command:
->   ```bash
->   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
->   ```
-> 
-> - If you have multiple GPUs and need to specify GPU 0–3, and start the `sglang-server` using multi-GPU data parallelism and tensor parallelism, you can use the following command:
+> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `vllm-server`, you can use the following command:
>   ```bash
->   CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
+>   CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
>   ```
>       
> - If you have multiple graphics cards and need to start two `fastapi` services on cards 0 and 1, listening on different ports respectively, you can use the following commands:

+ 3 - 3
docs/en/usage/cli_tools.md

@@ -11,11 +11,11 @@ Options:
  -p, --path PATH                 Input file path or directory (required)
  -o, --output PATH               Output directory (required)
  -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
-  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+  -b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
                                  Parsing backend (default: pipeline)
  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
                                  Specify document language (improves OCR accuracy, pipeline backend only)
-  -u, --url TEXT                  Service address when using sglang-client
+  -u, --url TEXT                  Service address when using http-client
  -s, --start INTEGER             Starting page number for parsing (0-based)
  -e, --end INTEGER               Ending page number for parsing (0-based)
  -f, --formula BOOLEAN           Enable formula parsing (default: enabled)
@@ -45,7 +45,7 @@ Options:
                                  files to be input need to be placed in the
                                  `example` folder within the directory where
                                  the command is currently executed.
-  --enable-sglang-engine BOOLEAN  Enable SgLang engine backend for faster
+  --enable-vllm-engine BOOLEAN    Enable vllm engine backend for faster
                                  processing.
  --enable-api BOOLEAN            Enable gradio API for serving the
                                  application.

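A couple of illustrative invocations of the renamed backends (paths are placeholders; the backend and flag names come straight from the help text above):

```bash
# Parse with the in-process vllm engine
mineru -p <input_path> -o <output_path> -b vlm-vllm-engine
# Or point the lightweight client at a running mineru-vllm-server
mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
```
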
+ 11 - 11
docs/en/usage/quick_usage.md

@@ -33,7 +33,7 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers

If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.

-## Advanced Usage via API, WebUI, sglang-client/server
+## Advanced Usage via API, WebUI, http-client/server

- Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- FastAPI calls:
@@ -44,29 +44,29 @@ If you need to adjust parsing options through custom parameters, you can also ch
  >Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start Gradio WebUI visual frontend:
  ```bash
-  # Using pipeline/vlm-transformers/vlm-sglang-client backends
+  # Using pipeline/vlm-transformers/vlm-http-client backends
  mineru-gradio --server-name 0.0.0.0 --server-port 7860
-  # Or using vlm-sglang-engine/pipeline backends (requires sglang environment)
+  # Or using vlm-vllm-engine/pipeline backends (requires vllm environment)
-  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
+  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
  ```
  >[!TIP]
  >
  >- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
  >- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
-- Using `sglang-client/server` method:
+- Using `http-client/server` method:
  ```bash
-  # Start sglang server (requires sglang environment)
-  mineru-sglang-server --port 30000
+  # Start vllm server (requires vllm environment)
+  mineru-vllm-server --port 30000
  ``` 
  >[!TIP]
-  >In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
+  >In another terminal, connect to vllm server via http client (only requires CPU and network, no vllm environment needed)
  > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
  > ```

> [!NOTE]
-> All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
-> We have compiled some commonly used parameters and usage methods for `sglang`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
+> All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`.
+> We have compiled some commonly used parameters and usage methods for `vllm`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).

## Extending MinerU Functionality with Configuration Files


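For completeness, a hedged sketch of the server/client round-trip this page describes; the `mineru-api` host/port flags are assumptions inferred from the tip's `http://127.0.0.1:8000/docs` URL, not shown in this diff:

```bash
# FastAPI service (flag names assumed; port matches the tip above)
mineru-api --host 0.0.0.0 --port 8000
# vllm server plus the lightweight client, exactly as the block above shows
mineru-vllm-server --port 30000
mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
```
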
+ 0 - 12
docs/zh/faq/index.md

@@ -14,18 +14,6 @@
    
    参考:[#388](https://github.com/opendatalab/MinerU/issues/388)

-
-??? question "在 CentOS 7 或 Ubuntu 18 系统安装MinerU时报错`ERROR: Failed building wheel for simsimd`"
-
-    新版本albumentations(1.4.21)引入了依赖simsimd,由于simsimd在linux的预编译包要求glibc的版本大于等于2.28,导致部分2019年之前发布的Linux发行版无法正常安装,可通过如下命令安装:
-    ```
-    conda create -n mineru python=3.11 -y
-    conda activate mineru
-    pip install -U "mineru[pipeline_old_linux]"
-    ```
-    
-    参考:[#1004](https://github.com/opendatalab/MinerU/issues/1004)
-
??? question "在 Linux 系统安装并使用时,解析结果缺失部份文字信息。"

    MinerU在>=2.0的版本中使用`pypdfium2`代替`pymupdf`作为PDF页面的渲染引擎,以解决AGPLv3的许可证问题,在某些Linux发行版,由于缺少CJK字体,可能会在将PDF渲染成图片的过程中丢失部份文字。

+ 13 - 15
docs/zh/quick_start/docker_deployment.md

@@ -6,24 +6,22 @@ MinerU提供了便捷的docker部署方式,这有助于快速搭建环境并

```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/Dockerfile
-docker build -t mineru-sglang:latest -f Dockerfile .
+docker build -t mineru-vllm:latest -f Dockerfile .
```

> [!TIP]
-> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`lmsysorg/sglang:v0.4.10.post2-cu126`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper平台,
-> 如您使用较新的`Blackwell`平台,请将基础镜像修改为`lmsysorg/sglang:v0.4.10.post2-cu128-b200` 再执行build操作。
+> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`vllm/vllm-openai:v0.10.1.1`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper/Blackwell平台。

## Docker说明

-Mineru的docker使用了`lmsysorg/sglang`作为基础镜像,因此在docker中默认集成了`sglang`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`sglang`加速VLM模型推理。
+Mineru的docker使用了`vllm/vllm-openai`作为基础镜像,因此在docker中默认集成了`vllm`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`vllm`加速VLM模型推理。
> [!NOTE]
-> 使用`sglang`加速VLM模型推理需要满足的条件是:
+> 使用`vllm`加速VLM模型推理需要满足的条件是:
> 
> - 设备包含Turing及以后架构的显卡,且可用显存大于等于8G。
-> - 物理机的显卡驱动应支持CUDA 12.6或更高版本,`Blackwell`平台应支持CUDA 12.8及更高版本,可通过`nvidia-smi`命令检查驱动版本。
+> - 物理机的显卡驱动应支持CUDA 12.8或更高版本,可通过`nvidia-smi`命令检查驱动版本。
> - docker中能够访问物理机的显卡设备。
->
-> 如果您的设备不满足上述条件,您仍然可以使用MinerU的其他功能,但无法使用`sglang`加速VLM模型推理,即无法使用`vlm-sglang-engine`后端和启动`vlm-sglang-server`服务。
+

## 启动 Docker 容器

@@ -32,7 +30,7 @@ docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 -p 7860:7860 -p 8000:8000 \
  --ipc=host \
-  -it mineru-sglang:latest \
+  -it mineru-vllm:latest \
  /bin/bash
```

@@ -51,19 +49,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
>  
>- `compose.yaml`文件中包含了MinerU的多个服务配置,您可以根据需要选择启动特定的服务。
>- 不同的服务可能会有额外的参数配置,您可以在`compose.yaml`文件中查看并编辑。
->- 由于`sglang`推理加速框架预分配显存的特性,您可能无法在同一台机器上同时运行多个`sglang`服务,因此请确保在启动`vlm-sglang-server`服务或使用`vlm-sglang-engine`后端时,其他可能使用显存的服务已停止。
+>- 由于`vllm`推理加速框架预分配显存的特性,您可能无法在同一台机器上同时运行多个`vllm`服务,因此请确保在启动`vlm-vllm-server`服务或使用`vlm-vllm-engine`后端时,其他可能使用显存的服务已停止。

---

-### 启动 sglang-server 服务
-并通过`vlm-sglang-client`后端连接`sglang-server`
+### 启动 vllm-server 服务
+并通过`vlm-http-client`后端连接`vllm-server`
  ```bash
-  docker compose -f compose.yaml --profile sglang-server up -d
+  docker compose -f compose.yaml --profile vllm-server up -d
  ```
  >[!TIP]
-  >在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境)
+  >在另一个终端中通过http client连接vllm server(只需cpu与网络,不需要vllm环境)
  > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
  > ```

---

+ 7 - 15
docs/zh/quick_start/extension_modules.md

@@ -4,34 +4,26 @@ MinerU 支持根据不同需求,按需安装扩展模块,以增强功能或
## 常见场景

### 核心功能安装
-`core` 模块是 MinerU 的核心依赖,包含了除`sglang`外的所有功能模块。安装此模块可以确保 MinerU 的基本功能正常运行。
+`core` 模块是 MinerU 的核心依赖,包含了除`vllm`外的所有功能模块。安装此模块可以确保 MinerU 的基本功能正常运行。
```bash
uv pip install mineru[core]
```

---

-### 使用`sglang`加速 VLM 模型推理
-`sglang` 模块提供了对 VLM 模型推理的加速支持,适用于具有 Turing 及以后架构的显卡(8G 显存及以上)。安装此模块可以显著提升模型推理速度。
-在配置中,`all`包含了`core`和`sglang`模块,因此`mineru[all]`和`mineru[core,sglang]`是等价的。
+### 使用`vllm`加速 VLM 模型推理
+`vllm` 模块提供了对 VLM 模型推理的加速支持,适用于具有 Turing 及以后架构的显卡(8G 显存及以上)。安装此模块可以显著提升模型推理速度。
+在配置中,`all`包含了`core`和`vllm`模块,因此`mineru[all]`和`mineru[core,vllm]`是等价的。
```bash
uv pip install mineru[all]
```
> [!TIP]
-> 如在安装包含sglang的完整包过程中发生异常,请参考 [sglang 官方文档](https://docs.sglang.ai/start/install.html) 尝试解决,或直接使用 [Docker](./docker_deployment.md) 方式部署镜像。
+> 如在安装包含vllm的完整包过程中发生异常,请参考 [vllm 官方文档](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) 尝试解决,或直接使用 [Docker](./docker_deployment.md) 方式部署镜像。

---

-### 安装轻量版client连接sglang-server使用
-如果您需要在边缘设备上安装轻量版的 client 端以连接 `sglang-server`,可以安装mineru的基础包,非常轻量,适合在只有cpu和网络连接的设备上使用。
+### 安装轻量版client连接vllm-server使用
+如果您需要在边缘设备上安装轻量版的 client 端以连接 `vllm-server`,可以安装mineru的基础包,非常轻量,适合在只有cpu和网络连接的设备上使用。
```bash
uv pip install mineru
```
-
----
-
-### 在过时的linux系统上使用pipeline后端
-如果您的系统过于陈旧,无法满足`mineru[core]`的依赖要求,该选项可以最低限度的满足 MinerU 的运行需求,适用于老旧系统无法升级且仅需使用 pipeline 后端的场景。
-```bash
-uv pip install mineru[pipeline_old_linux]
-```

+ 3 - 3
docs/zh/quick_start/index.md

@@ -31,7 +31,7 @@
        <td>解析后端</td>
        <td>pipeline</td>
        <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
    </tr>
    <tr>
        <td>操作系统</td>
@@ -80,8 +80,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```

> [!TIP]
-> `mineru[core]`包含除`sglang`加速外的所有核心功能,兼容Windows / Linux / macOS系统,适合绝大多数用户。
-> 如果您有使用`sglang`加速VLM模型推理,或是在边缘设备安装轻量版client端等需求,可以参考文档[扩展模块安装指南](./extension_modules.md)。
+> `mineru[core]`包含除`vllm`加速外的所有核心功能,兼容Windows / Linux / macOS系统,适合绝大多数用户。
+> 如果您有使用`vllm`加速VLM模型推理,或是在边缘设备安装轻量版client端等需求,可以参考文档[扩展模块安装指南](./extension_modules.md)。

---
 

+ 30 - 24
docs/zh/reference/output_files.md

@@ -165,49 +165,52 @@ inference_result: list[PageInferenceResults] = []
]
```

-### VLM 输出结果 (model_output.txt)
+### VLM 输出结果 (model.json)

> [!NOTE]
> 仅适用于 VLM 后端

-**文件命名格式**:`{原文件名}_model_output.txt`
+**文件命名格式**:`{原文件名}_model.json`

#### 文件格式说明

-- 使用 `----` 分割每一页的输出结果
-- 每页包含多个以 `<|box_start|>` 开头、`<|md_end|>` 结尾的文本块
-
-#### 字段含义
+- 该文件为 VLM 模型的原始输出结果,包含两层嵌套list,外层表示页面,内层表示该页的内容块
+- 每个内容块都是一个dict,包含 `type`、`bbox`、`angle`、`content` 字段

-| 标记 | 格式 | 说明 |
-|------|---|------|
-| 边界框 | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | 四边形坐标(左上、右下两点),页面缩放至 1000×1000 后的坐标值 |
-| 类型标记 | `<\|ref_start\|>type<\|ref_end\|>` | 内容块类型标识 |
-| 内容 | `<\|md_start\|>markdown内容<\|md_end\|>` | 该块的 Markdown 内容 |

#### 支持的内容类型

```json
-{
-    "text": "文本",
-    "title": "标题", 
-    "image": "图片",
-    "image_caption": "图片描述",
-    "image_footnote": "图片脚注",
-    "table": "表格",
-    "table_caption": "表格描述", 
-    "table_footnote": "表格脚注",
-    "equation": "行间公式"
-}
+[
+    "text",
+    "title",
+    "equation",
+    "image",
+    "image_caption",
+    "image_footnote",
+    "table",
+    "table_caption",
+    "table_footnote",
+    "phonetic",
+    "code",
+    "code_caption",
+    "ref_text",
+    "algorithm",
+    "list",
+    "header",
+    "footer",
+    "page_number",
+    "aside_text",
+    "page_footnote"
+]
```

-#### 特殊标记
-
-- `<|txt_contd|>`:出现在文本末尾,表示该文本块可与后续文本块连接
-- 表格内容采用 `otsl` 格式,需转换为 HTML 才能在 Markdown 中渲染

### 中间处理结果 (middle.json)

+> [!NOTE]
+> 仅适用于 pipeline 后端
+
**文件命名格式**:`{原文件名}_middle.json`

#### 顶层结构
@@ -390,6 +393,9 @@ inference_result: list[PageInferenceResults] = []

### 内容列表 (content_list.json)

+> [!NOTE]
+> 仅适用于 pipeline 后端
+
**文件命名格式**:`{原文件名}_content_list.json`

#### 功能说明

+ 8 - 21
docs/zh/usage/advanced_cli_parameters.md

@@ -1,25 +1,17 @@
# 命令行参数进阶

-## SGLang 加速参数优化
-
-### 显存优化参数
-> [!TIP]
-> sglang加速模式目前支持在最低8G显存的Turing架构显卡上运行,但在显存<24G的显卡上可能会遇到显存不足的问题, 可以通过使用以下参数来优化显存使用:
-> 
-> - 如果您使用单张显卡遇到显存不足的情况时,可能需要调低KV缓存大小,`--mem-fraction-static 0.5`,如仍出现显存不足问题,可尝试进一步降低到`0.4`或更低
-> - 如您有两张以上显卡,可尝试通过张量并行(TP)模式简单扩充可用显存:`--tp-size 2`
+## vllm 加速参数优化

### 性能优化参数
> [!TIP]
-> 如果您已经可以正常使用sglang对vlm模型进行加速推理,但仍然希望进一步提升推理速度,可以尝试以下参数:
+> 如果您已经可以正常使用vllm对vlm模型进行加速推理,但仍然希望进一步提升推理速度,可以尝试以下参数:
> 
-> - 如果您有超过多张显卡,可以使用sglang的多卡并行模式来增加吞吐量:`--dp-size 2`
-> - 同时您可以启用`torch.compile`来将推理速度加速约15%:`--enable-torch-compile`
+> - 如果您有多张显卡,可以使用vllm的多卡并行模式来增加吞吐量:`--data-parallel-size 2`

### 参数传递说明
> [!TIP]
-> - 所有sglang官方支持的参数都可用通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-sglang-server`、`mineru-gradio`、`mineru-api`
-> - 如果您想了解更多有关`sglang`的参数使用方法,请参考 [sglang官方文档](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> - 所有vllm官方支持的参数都可以通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-vllm-server`、`mineru-gradio`、`mineru-api`
+> - 如果您想了解更多有关`vllm`的参数使用方法,请参考 [vllm官方文档](https://docs.vllm.ai/en/latest/cli/serve.html)

## GPU 设备选择与配置

@@ -29,7 +21,7 @@
>   ```bash
>   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
>   ```
-> - 这种指定方式对所有的命令行调用都有效,包括 `mineru`、`mineru-sglang-server`、`mineru-gradio` 和 `mineru-api`,且对`pipeline`、`vlm`后端均适用。
+> - 这种指定方式对所有的命令行调用都有效,包括 `mineru`、`mineru-vllm-server`、`mineru-gradio` 和 `mineru-api`,且对`pipeline`、`vlm`后端均适用。

### 常见设备配置示例
> [!TIP]
@@ -47,14 +39,9 @@
> [!TIP]
> 以下是一些可能的使用场景:
> 
-> - 如果您有多张显卡,需要指定卡0和卡1,并使用多卡并行来启动`sglang-server`,可以使用以下命令: 
->   ```bash
->   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
->   ```
->   
-> - 如果您有多张显卡,需要指定卡0-3,并使用多卡数据并行和张量并行来启动`sglang-server`,可以使用以下命令: 
+> - 如果您有多张显卡,需要指定卡0和卡1,并使用多卡并行来启动`vllm-server`,可以使用以下命令: 
>   ```bash
->   CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
+>   CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
>   ```
>   
> - 如果您有多张显卡,需要在卡0和卡1上启动两个`fastapi`服务,并分别监听不同的端口,可以使用以下命令: 

+ 3 - 3
docs/zh/usage/cli_tools.md

@@ -11,11 +11,11 @@ Options:
  -p, --path PATH                 输入文件路径或目录(必填)
  -o, --output PATH               输出目录(必填)
  -m, --method [auto|txt|ocr]     解析方法:auto(默认)、txt、ocr(仅用于 pipeline 后端)
-  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+  -b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
                                  解析后端(默认为 pipeline)
  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
                                  指定文档语言(可提升 OCR 准确率,仅用于 pipeline 后端)
-  -u, --url TEXT                  当使用 sglang-client 时,需指定服务地址
+  -u, --url TEXT                  当使用 http-client 时,需指定服务地址
  -s, --start INTEGER             开始解析的页码(从 0 开始)
  -e, --end INTEGER               结束解析的页码(从 0 开始)
  -f, --formula BOOLEAN           是否启用公式解析(默认开启)
@@ -43,7 +43,7 @@ Usage: mineru-gradio [OPTIONS]
Options:
  --enable-example BOOLEAN        启用示例文件输入(需要将示例文件放置在当前
                                  执行命令目录下的 `example` 文件夹中)
-  --enable-sglang-engine BOOLEAN  启用 SgLang 引擎后端以提高处理速度
+  --enable-vllm-engine BOOLEAN    启用 vllm 引擎后端以提高处理速度
  --enable-api BOOLEAN            启用 Gradio API 以提供应用程序服务
  --max-convert-pages INTEGER     设置从 PDF 转换为 Markdown 的最大页数
  --server-name TEXT              设置 Gradio 应用程序的服务器主机名

+ 12 - 12
docs/zh/usage/quick_usage.md

@@ -28,11 +28,11 @@ mineru -p <input_path> -o <output_path>
mineru -p <input_path> -o <output_path> -b vlm-transformers
```
> [!TIP]
-> vlm后端另外支持`sglang`加速,与`transformers`后端相比,`sglang`的加速比可达20~30倍,可以在[扩展模块安装指南](../quick_start/extension_modules.md)中查看支持`sglang`加速的完整包安装方法。
+> vlm后端另外支持`vllm`加速,与`transformers`后端相比,`vllm`的加速比可达20~30倍,可以在[扩展模块安装指南](../quick_start/extension_modules.md)中查看支持`vllm`加速的完整包安装方法。

如果需要通过自定义参数调整解析选项,您也可以在文档中查看更详细的[命令行工具使用说明](./cli_tools.md)。

-## 通过api、webui、sglang-client/server进阶使用
+## 通过api、webui、http-client/server进阶使用

- 通过python api直接调用:[Python 调用示例](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- 通过fast api方式调用:
@@ -43,29 +43,29 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
  >在浏览器中访问 `http://127.0.0.1:8000/docs` 查看API文档。
- 启动gradio webui 可视化前端:
  ```bash
-  # 使用 pipeline/vlm-transformers/vlm-sglang-client 后端
+  # 使用 pipeline/vlm-transformers/vlm-http-client 后端
  mineru-gradio --server-name 0.0.0.0 --server-port 7860
-  # 或使用 vlm-sglang-engine/pipeline 后端(需安装sglang环境)
+  # 或使用 vlm-vllm-engine/pipeline 后端(需安装vllm环境)
-  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
+  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
  ```
  >[!TIP]
  > 
  >- 在浏览器中访问 `http://127.0.0.1:7860` 使用 Gradio WebUI。
  >- 访问 `http://127.0.0.1:7860/?view=api` 使用 Gradio API。
-- 使用`sglang-client/server`方式调用:
+- 使用`http-client/server`方式调用:
  ```bash
-  # 启动sglang server(需要安装sglang环境)
-  mineru-sglang-server --port 30000
+  # 启动vllm server(需要安装vllm环境)
+  mineru-vllm-server --port 30000
  ``` 
  >[!TIP]
-  >在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境)
+  >在另一个终端中通过http client连接vllm server(只需cpu与网络,不需要vllm环境)
  > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
  > ```

> [!NOTE]
-> 所有sglang官方支持的参数都可用通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-sglang-server`、`mineru-gradio`、`mineru-api`,
-> 我们整理了一些`sglang`使用中的常用参数和使用方法,可以在文档[命令行进阶参数](./advanced_cli_parameters.md)中获取。
+> 所有vllm官方支持的参数都可以通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-vllm-server`、`mineru-gradio`、`mineru-api`,
+> 我们整理了一些`vllm`使用中的常用参数和使用方法,可以在文档[命令行进阶参数](./advanced_cli_parameters.md)中获取。

## 基于配置文件扩展 MinerU 功能


+ 5 - 2
pyproject.toml

@@ -39,6 +39,7 @@ dependencies = [
    "openai>=1.70.0,<2",
    "beautifulsoup4>=4.13.5,<5",
    "Pygments",
+    "mineru_vl_utils",
]

[project.optional-dependencies]
@@ -50,10 +51,12 @@ test = [
    "fuzzywuzzy"
]
vlm = [
-    "mineru_vl_utils[transformers]",
+    "torch>=2.6.0,<2.8.0",
+    "transformers>=4.51.1,<5.0.0",
+    "accelerate>=1.5.1",
]
vllm = [
-    "mineru_vl_utils[vllm]",
+    "vllm==0.10.1.1",
]
pipeline = [
     "matplotlib>=3.10,<4",
     "matplotlib>=3.10,<4",