
docs: update documentation for vllm integration and parameter optimization

myhloli, 2 months ago
parent commit e120a90d11

+ 3 - 3
README.md

@@ -583,7 +583,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
         <td>Parsing Backend</td>
         <td>pipeline</td>
         <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
     </tr>
     <tr>
         <td>Operating System</td>
@@ -661,8 +661,8 @@ You can use MinerU for PDF parsing through various methods such as command line,
 - [x] Handwritten Text Recognition  
 - [x] Vertical Text Recognition  
 - [x] Latin Accent Mark Recognition
-- [ ] Code block recognition in the main text
-- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
+- [x] Code block recognition in the main text
+- [x] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf) (mineru.net)
 - [ ] Geometric shape recognition
 
 # Known Issues

+ 3 - 3
README_zh-CN.md

@@ -570,7 +570,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
         <td>解析后端</td>
         <td>pipeline</td>
         <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
     </tr>
     <tr>
         <td>操作系统</td>
@@ -648,8 +648,8 @@ mineru -p <input_path> -o <output_path>
 - [x] 手写文本识别
 - [x] 竖排文本识别
 - [x] 拉丁字母重音符号识别
-- [ ] 正文中代码块识别
-- [ ] [化学式识别](docs/chemical_knowledge_introduction/introduction.pdf)
+- [x] 正文中代码块识别
+- [x] [化学式识别](docs/chemical_knowledge_introduction/introduction.pdf) (https://mineru.net)
 - [ ] 图表内容识别
 
 # Known Issues

+ 0 - 12
docs/en/faq/index.md

@@ -15,18 +15,6 @@ For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [W
     Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)
 
 
-??? question "Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`"
-
-    The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the pre-built package of simsimd for Linux requires a glibc version greater than or equal to 2.28, this causes installation issues on some Linux distributions released before 2019. You can resolve this issue by using the following command:
-    ```
-    conda create -n mineru python=3.11 -y
-    conda activate mineru
-    pip install -U "mineru[pipeline_old_linux]"
-    ```
-    
-    Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)
-
-
 ??? question "Missing text information in parsing results when installing and using on Linux systems."
 
     MinerU uses `pypdfium2` instead of `pymupdf` as the PDF page rendering engine in versions >=2.0 to resolve AGPLv3 license issues. On some Linux distributions, due to missing CJK fonts, some text may be lost during the process of rendering PDFs to images.

+ 12 - 15
docs/en/quick_start/docker_deployment.md

@@ -6,25 +6,22 @@ MinerU provides a convenient Docker deployment method, which helps quickly set u
 
 ```bash
 wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
-docker build -t mineru-sglang:latest -f Dockerfile .
+docker build -t mineru-vllm:latest -f Dockerfile .
 ```
 
 > [!TIP]
-> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.10.post2-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
-> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.10.post2-cu128-b200` before executing the build operation.
+> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms.
 
 ## Docker Description
 
-MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sglang` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `sglang` to accelerate VLM model inference.
+MinerU's Docker uses `vllm/vllm-openai` as the base image, so it includes the `vllm` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `vllm` to accelerate VLM model inference.
 
 > [!NOTE]
-> Requirements for using `sglang` to accelerate VLM model inference:
+> Requirements for using `vllm` to accelerate VLM model inference:
 > 
 > - Device must have Turing architecture or later graphics cards with 8GB+ available VRAM.
-> - The host machine's graphics driver should support CUDA 12.6 or higher; `Blackwell` platform should support CUDA 12.8 or higher. You can check the driver version using the `nvidia-smi` command.
+> - The host machine's graphics driver should support CUDA 12.8 or higher; you can check the driver version using the `nvidia-smi` command.
 > - Docker container must have access to the host machine's graphics devices.
->
-> If your device doesn't meet the above requirements, you can still use other features of MinerU, but cannot use `sglang` to accelerate VLM model inference, meaning you cannot use the `vlm-sglang-engine` backend or start the `vlm-sglang-server` service.
 
 ## Start Docker Container
 
@@ -33,7 +30,7 @@ docker run --gpus all \
   --shm-size 32g \
   -p 30000:30000 -p 7860:7860 -p 8000:8000 \
   --ipc=host \
-  -it mineru-sglang:latest \
+  -it mineru-vllm:latest \
   /bin/bash
 ```
 
@@ -53,19 +50,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
 >
 >- The `compose.yaml` file contains configurations for multiple services of MinerU, you can choose to start specific services as needed.
 >- Different services might have additional parameter configurations, which you can view and edit in the `compose.yaml` file.
->- Due to the pre-allocation of GPU memory by the `sglang` inference acceleration framework, you may not be able to run multiple `sglang` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-sglang-server` service or using the `vlm-sglang-engine` backend.
+>- Due to the pre-allocation of GPU memory by the `vllm` inference acceleration framework, you may not be able to run multiple `vllm` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-vllm-server` service or using the `vlm-vllm-engine` backend.
 
 ---
 
-### Start sglang-server service
-connect to `sglang-server` via `vlm-sglang-client` backend
+### Start vllm-server service
+Connect to `vllm-server` via the `vlm-http-client` backend:
   ```bash
-  docker compose -f compose.yaml --profile sglang-server up -d
+  docker compose -f compose.yaml --profile vllm-server up -d
   ```
   >[!TIP]
-  >In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
+  >In another terminal, connect to the vllm server via the http client (only requires CPU and network, no vllm environment needed)
   > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
   > ```
 
 ---
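Taken together, the steps on this page reduce to a short end-to-end session. A minimal sketch, reusing only commands shown above; `demo.pdf`, `./output`, and the loopback server address are placeholder assumptions:

```bash
# Build the image and bring up the vllm server via docker compose
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
docker build -t mineru-vllm:latest -f Dockerfile .
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
docker compose -f compose.yaml --profile vllm-server up -d

# From any machine that can reach port 30000 (CPU and network only),
# parse a document through the vlm-http-client backend
mineru -p demo.pdf -o ./output -b vlm-http-client -u http://127.0.0.1:30000
```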

+ 7 - 15
docs/en/quick_start/extension_modules.md

@@ -4,34 +4,26 @@ MinerU supports installing extension modules on demand based on different needs
 ## Common Scenarios
 
 ### Core Functionality Installation
-The `core` module is the core dependency of MinerU, containing all functional modules except `sglang`. Installing this module ensures the basic functionality of MinerU works properly.
+The `core` module is the core dependency of MinerU, containing all functional modules except `vllm`. Installing this module ensures the basic functionality of MinerU works properly.
 ```bash
 uv pip install mineru[core]
 ```
 
 ---
 
-### Using `sglang` to Accelerate VLM Model Inference
-The `sglang` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
-In the configuration, `all` includes both `core` and `sglang` modules, so `mineru[all]` and `mineru[core,sglang]` are equivalent.
+### Using `vllm` to Accelerate VLM Model Inference
+The `vllm` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
+In the configuration, `all` includes both `core` and `vllm` modules, so `mineru[all]` and `mineru[core,vllm]` are equivalent.
 ```bash
 uv pip install mineru[all]
 ```
 > [!TIP]
-> If exceptions occur during installation of the complete package including sglang, please refer to the [sglang official documentation](https://docs.sglang.ai/start/install.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.
+> If exceptions occur during installation of the complete package including vllm, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.
 
 ---
 
-### Installing Lightweight Client to Connect to sglang-server
+### Installing Lightweight Client to Connect to vllm-server
-If you need to install a lightweight client on edge devices to connect to `sglang-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
+If you need to install a lightweight client on edge devices to connect to `vllm-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
 ```bash
 uv pip install mineru
 ```
-
----
-
-### Using Pipeline Backend on Outdated Linux Systems
-If your system is too outdated to meet the dependency requirements of `mineru[core]`, this option can minimally meet MinerU's runtime requirements, suitable for old systems that cannot be upgraded and only need to use the pipeline backend.
-```bash
-uv pip install mineru[pipeline_old_linux]
-```
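In practice, the lightweight-client scenario pairs this base install on an edge device with a remote `vllm-server`. A minimal sketch, assuming a server already running as in the [Docker deployment guide](./docker_deployment.md); the address and file paths are hypothetical:

```bash
# On the edge device: base package only (CPU-only, no vllm)
uv pip install mineru

# Delegate VLM inference to the remote server
mineru -p scan.pdf -o ./out -b vlm-http-client -u http://192.168.1.10:30000
```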

+ 3 - 3
docs/en/quick_start/index.md

@@ -31,7 +31,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
         <td>Parsing Backend</td>
         <td>pipeline</td>
         <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
     </tr>
     <tr>
         <td>Operating System</td>
@@ -80,8 +80,8 @@ uv pip install -e .[core]
 ```
 
 > [!TIP]
-> `mineru[core]` includes all core features except `sglang` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
-> If you need to use `sglang` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).
+> `mineru[core]` includes all core features except `vllm` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
+> If you need to use `vllm` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).
 
 ---
  

+ 32 - 27
docs/en/reference/output_files.md

@@ -165,49 +165,51 @@ inference_result: list[PageInferenceResults] = []
 ]
 ```
 
-### VLM Output Results (model_output.txt)
+### VLM Output Results (model.json)
 
 > [!NOTE]
 > Only applicable to VLM backend
 
-**File naming format**: `{original_filename}_model_output.txt`
+**File naming format**: `{original_filename}_model.json`
 
 #### File Format Description
 
-- Uses `----` to separate output results for each page
-- Each page contains multiple text blocks starting with `<|box_start|>` and ending with `<|md_end|>`
-
-#### Field Meanings
+- This file contains the raw output of the VLM model as two nested lists: the outer list represents pages, and the inner list holds the content blocks of each page
+- Each content block is a dict containing `type`, `bbox`, `angle`, and `content` fields
 
-| Tag | Format | Description |
-|-----|--------|-------------|
-| Bounding box | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | Quadrilateral coordinates (top-left, bottom-right points), coordinate values after scaling page to 1000×1000 |
-| Type tag | `<\|ref_start\|>type<\|ref_end\|>` | Content block type identifier |
-| Content | `<\|md_start\|>markdown content<\|md_end\|>` | Markdown content of the block |
 
 #### Supported Content Types
 
```json
-{
-    "text": "Text",
-    "title": "Title",
-    "image": "Image",
-    "image_caption": "Image caption",
-    "image_footnote": "Image footnote",
-    "table": "Table",
-    "table_caption": "Table caption",
-    "table_footnote": "Table footnote",
-    "equation": "Interline formula"
-}
+[
+    "text",
+    "title",
+    "equation",
+    "image",
+    "image_caption",
+    "image_footnote",
+    "table",
+    "table_caption",
+    "table_footnote",
+    "phonetic",
+    "code",
+    "code_caption",
+    "ref_text",
+    "algorithm",
+    "list",
+    "header",
+    "footer",
+    "page_number",
+    "aside_text",
+    "page_footnote"
+]
 ```
 
-#### Special Tags
-
-- `<|txt_contd|>`: Appears at the end of text, indicating that this text block can be connected with subsequent text blocks
-- Table content uses `otsl` format and needs to be converted to HTML for rendering in Markdown
-
 ### Intermediate Processing Results (middle.json)
 
+> [!NOTE]
+> Only applicable to pipeline backend
+
 **File naming format**: `{original_filename}_middle.json`
 
 #### Top-level Structure
@@ -390,6 +392,9 @@ Level 1 blocks (table | image)
 
 ### Content List (content_list.json)
 
+> [!NOTE]
+> Only applicable to pipeline backend
+
 **File naming format**: `{original_filename}_content_list.json`
 
 #### Functionality
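Returning to the `model.json` layout documented above (outer list = pages, inner list = content blocks), a quick way to inspect a parsed result is a one-liner like the following sketch; the filename is hypothetical and the printed values are illustrative only:

```bash
# Print the first content block of the first page
# (relies on the two nested lists described above)
jq '.[0][0]' demo_model.json
# Illustrative output shape (field values made up):
# { "type": "text", "bbox": [72, 105, 523, 180], "angle": 0, "content": "..." }
```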

+ 8 - 21
docs/en/usage/advanced_cli_parameters.md

@@ -1,25 +1,17 @@
 # Advanced Command Line Parameters
 
-## SGLang Acceleration Parameter Optimization
-
-### Memory Optimization Parameters
-> [!TIP]
-> SGLang acceleration mode currently supports running on Turing architecture graphics cards with a minimum of 8GB VRAM, but graphics cards with <24GB VRAM may encounter insufficient memory issues. You can optimize memory usage with the following parameters:
-> 
-> - If you encounter insufficient VRAM when using a single graphics card, you may need to reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try reducing it further to `0.4` or lower.
-> - If you have two or more graphics cards, you can try using tensor parallelism (TP) mode to simply expand available VRAM: `--tp-size 2`
+## vllm Acceleration Parameter Optimization
 
 ### Performance Optimization Parameters
 > [!TIP]
-> If you can already use SGLang normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
+> If you can already use vllm normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
 > 
-> - If you have multiple graphics cards, you can use SGLang's multi-card parallel mode to increase throughput: `--dp-size 2`
-> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
+> - If you have multiple graphics cards, you can use vllm's multi-card parallel mode to increase throughput: `--data-parallel-size 2`
 
 ### Parameter Passing Instructions
 > [!TIP]
-> - All officially supported SGLang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
-> - If you want to learn more about `sglang` parameter usage, please refer to the [SGLang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> - All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`
+> - If you want to learn more about `vllm` parameter usage, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/cli/serve.html)
 
 ## GPU Device Selection and Configuration
 
@@ -29,7 +21,7 @@
 >   ```bash
 >   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
 >   ```
-> - This specification method is effective for all command line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
+> - This specification method is effective for all command line calls, including `mineru`, `mineru-vllm-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
 
 ### Common Device Configuration Examples
 > [!TIP]
@@ -46,14 +38,9 @@
 > [!TIP]
 > Here are some possible usage scenarios:
 > 
-> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `sglang-server`, you can use the following command:
->   ```bash
->   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
->   ```
-> 
-> - If you have multiple GPUs and need to specify GPU 0–3, and start the `sglang-server` using multi-GPU data parallelism and tensor parallelism, you can use the following command:
+> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `vllm-server`, you can use the following command:
 >   ```bash
->   CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
+>   CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
 >   ```
 >       
 > - If you have multiple graphics cards and need to start two `fastapi` services on cards 0 and 1, listening on different ports respectively, you can use the following commands:
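Beyond device pinning, the pass-through described earlier on this page means engine-level vllm flags can ride along on any of the listed MinerU commands. A hedged sketch: `--gpu-memory-utilization` is a standard vllm engine option, and `0.5` is only an example value:

```bash
# Cap vllm's VRAM pre-allocation when the GPU is shared with other services
mineru -p <input_path> -o <output_path> -b vlm-vllm-engine --gpu-memory-utilization 0.5
```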

+ 3 - 3
docs/en/usage/cli_tools.md

@@ -11,11 +11,11 @@ Options:
   -p, --path PATH                 Input file path or directory (required)
   -o, --output PATH               Output directory (required)
   -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
-  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+  -b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
                                   Parsing backend (default: pipeline)
   -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
                                   Specify document language (improves OCR accuracy, pipeline backend only)
-  -u, --url TEXT                  Service address when using sglang-client
+  -u, --url TEXT                  Service address when using http-client
   -s, --start INTEGER             Starting page number for parsing (0-based)
   -e, --end INTEGER               Ending page number for parsing (0-based)
   -f, --formula BOOLEAN           Enable formula parsing (default: enabled)
@@ -45,7 +45,7 @@ Options:
                                   files to be input need to be placed in the
                                   `example` folder within the directory where
                                   the command is currently executed.
-  --enable-sglang-engine BOOLEAN  Enable SgLang engine backend for faster
+  --enable-vllm-engine BOOLEAN    Enable vllm engine backend for faster
                                   processing.
   --enable-api BOOLEAN            Enable gradio API for serving the
                                   application.
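To make the option list concrete, here is one possible invocation combining several of the documented flags; the file and directory names are placeholders:

```bash
# Parse pages 0-9 of a PDF with the pipeline backend, forcing OCR
# and giving a Chinese language hint to improve recognition
mineru -p paper.pdf -o ./out -b pipeline -m ocr -l ch -s 0 -e 9
```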

+ 11 - 11
docs/en/usage/quick_usage.md

@@ -33,7 +33,7 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
 
 If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.
 
-## Advanced Usage via API, WebUI, sglang-client/server
+## Advanced Usage via API, WebUI, http-client/server
 
 - Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
 - FastAPI calls:
@@ -44,29 +44,29 @@ If you need to adjust parsing options through custom parameters, you can also ch
   >Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
 - Start Gradio WebUI visual frontend:
   ```bash
-  # Using pipeline/vlm-transformers/vlm-sglang-client backends
+  # Using pipeline/vlm-transformers/vlm-http-client backends
   mineru-gradio --server-name 0.0.0.0 --server-port 7860
-  # Or using vlm-sglang-engine/pipeline backends (requires sglang environment)
-  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
+  # Or using vlm-vllm-engine/pipeline backends (requires vllm environment)
+  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
   ```
   >[!TIP]
   >
   >- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
   >- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
-- Using `sglang-client/server` method:
+- Using `http-client/server` method:
   ```bash
-  # Start sglang server (requires sglang environment)
-  mineru-sglang-server --port 30000
+  # Start vllm server (requires vllm environment)
+  mineru-vllm-server --port 30000
   ``` 
   >[!TIP]
-  >In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
+  >In another terminal, connect to the vllm server via the http client (only requires CPU and network, no vllm environment needed)
   > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
   > ```
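Combining the WebUI command above with the GPU selection covered in [Advanced Command Line Parameters](./advanced_cli_parameters.md), pinning the Gradio frontend to a single card might look like this sketch:

```bash
# Run the Gradio WebUI with the vllm engine backend on GPU 0 only
CUDA_VISIBLE_DEVICES=0 mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
```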
 
 > [!NOTE]
-> All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
-> We have compiled some commonly used parameters and usage methods for `sglang`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
+> All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`.
+> We have compiled some commonly used parameters and usage methods for `vllm`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
 
 ## Extending MinerU Functionality with Configuration Files
 

+ 0 - 12
docs/zh/faq/index.md

@@ -14,18 +14,6 @@
     
     参考:[#388](https://github.com/opendatalab/MinerU/issues/388)
 
-
-??? question "在 CentOS 7 或 Ubuntu 18 系统安装MinerU时报错`ERROR: Failed building wheel for simsimd`"
-
-    新版本albumentations(1.4.21)引入了依赖simsimd,由于simsimd在linux的预编译包要求glibc的版本大于等于2.28,导致部分2019年之前发布的Linux发行版无法正常安装,可通过如下命令安装:
-    ```
-    conda create -n mineru python=3.11 -y
-    conda activate mineru
-    pip install -U "mineru[pipeline_old_linux]"
-    ```
-    
-    参考:[#1004](https://github.com/opendatalab/MinerU/issues/1004)
-
 ??? question "在 Linux 系统安装并使用时,解析结果缺失部份文字信息。"
 
     MinerU在>=2.0的版本中使用`pypdfium2`代替`pymupdf`作为PDF页面的渲染引擎,以解决AGPLv3的许可证问题,在某些Linux发行版,由于缺少CJK字体,可能会在将PDF渲染成图片的过程中丢失部份文字。

+ 13 - 15
docs/zh/quick_start/docker_deployment.md

@@ -6,24 +6,22 @@ MinerU提供了便捷的docker部署方式,这有助于快速搭建环境并
 
 ```bash
 wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/Dockerfile
-docker build -t mineru-sglang:latest -f Dockerfile .
+docker build -t mineru-vllm:latest -f Dockerfile .
 ```
 
 > [!TIP]
-> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`lmsysorg/sglang:v0.4.10.post2-cu126`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper平台,
-> 如您使用较新的`Blackwell`平台,请将基础镜像修改为`lmsysorg/sglang:v0.4.10.post2-cu128-b200` 再执行build操作。
+> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`vllm/vllm-openai:v0.10.1.1`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper/Blackwell平台。
 
 ## Docker说明
 
-Mineru的docker使用了`lmsysorg/sglang`作为基础镜像,因此在docker中默认集成了`sglang`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`sglang`加速VLM模型推理。
+Mineru的docker使用了`vllm/vllm-openai`作为基础镜像,因此在docker中默认集成了`vllm`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`vllm`加速VLM模型推理。
 > [!NOTE]
-> 使用`sglang`加速VLM模型推理需要满足的条件是:
+> 使用`vllm`加速VLM模型推理需要满足的条件是:
 > 
 > - 设备包含Turing及以后架构的显卡,且可用显存大于等于8G。
-> - 物理机的显卡驱动应支持CUDA 12.6或更高版本,`Blackwell`平台应支持CUDA 12.8及更高版本,可通过`nvidia-smi`命令检查驱动版本。
+> - 物理机的显卡驱动应支持CUDA 12.8或更高版本,可通过`nvidia-smi`命令检查驱动版本。
 > - docker中能够访问物理机的显卡设备。
->
-> 如果您的设备不满足上述条件,您仍然可以使用MinerU的其他功能,但无法使用`sglang`加速VLM模型推理,即无法使用`vlm-sglang-engine`后端和启动`vlm-sglang-server`服务。
+
 
 ## 启动 Docker 容器
 
@@ -32,7 +30,7 @@ docker run --gpus all \
   --shm-size 32g \
   -p 30000:30000 -p 7860:7860 -p 8000:8000 \
   --ipc=host \
-  -it mineru-sglang:latest \
+  -it mineru-vllm:latest \
   /bin/bash
 ```
 
@@ -51,19 +49,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
 >  
 >- `compose.yaml`文件中包含了MinerU的多个服务配置,您可以根据需要选择启动特定的服务。
 >- 不同的服务可能会有额外的参数配置,您可以在`compose.yaml`文件中查看并编辑。
->- 由于`sglang`推理加速框架预分配显存的特性,您可能无法在同一台机器上同时运行多个`sglang`服务,因此请确保在启动`vlm-sglang-server`服务或使用`vlm-sglang-engine`后端时,其他可能使用显存的服务已停止。
+>- 由于`vllm`推理加速框架预分配显存的特性,您可能无法在同一台机器上同时运行多个`vllm`服务,因此请确保在启动`vlm-vllm-server`服务或使用`vlm-vllm-engine`后端时,其他可能使用显存的服务已停止。
 
 ---
 
-### 启动 sglang-server 服务
-并通过`vlm-sglang-client`后端连接`sglang-server`
+### 启动 vllm-server 服务
+并通过`vlm-http-client`后端连接`vllm-server`
   ```bash
-  docker compose -f compose.yaml --profile sglang-server up -d
+  docker compose -f compose.yaml --profile vllm-server up -d
   ```
   >[!TIP]
-  >在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境)
+  >在另一个终端中通过http client连接vllm server(只需cpu与网络,不需要vllm环境)
   > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
   > ```
 
 ---

+ 7 - 15
docs/zh/quick_start/extension_modules.md

@@ -4,34 +4,26 @@ MinerU 支持根据不同需求,按需安装扩展模块,以增强功能或
 ## 常见场景
 
 ### 核心功能安装
-`core` 模块是 MinerU 的核心依赖,包含了除`sglang`外的所有功能模块。安装此模块可以确保 MinerU 的基本功能正常运行。
+`core` 模块是 MinerU 的核心依赖,包含了除`vllm`外的所有功能模块。安装此模块可以确保 MinerU 的基本功能正常运行。
 ```bash
 uv pip install mineru[core]
 ```
 
 ---
 
-### 使用`sglang`加速 VLM 模型推理
-`sglang` 模块提供了对 VLM 模型推理的加速支持,适用于具有 Turing 及以后架构的显卡(8G 显存及以上)。安装此模块可以显著提升模型推理速度。
-在配置中,`all`包含了`core`和`sglang`模块,因此`mineru[all]`和`mineru[core,sglang]`是等价的。
+### 使用`vllm`加速 VLM 模型推理
+`vllm` 模块提供了对 VLM 模型推理的加速支持,适用于具有 Turing 及以后架构的显卡(8G 显存及以上)。安装此模块可以显著提升模型推理速度。
+在配置中,`all`包含了`core`和`vllm`模块,因此`mineru[all]`和`mineru[core,vllm]`是等价的。
 ```bash
 uv pip install mineru[all]
 ```
 > [!TIP]
-> 如在安装包含sglang的完整包过程中发生异常,请参考 [sglang 官方文档](https://docs.sglang.ai/start/install.html) 尝试解决,或直接使用 [Docker](./docker_deployment.md) 方式部署镜像。
+> 如在安装包含vllm的完整包过程中发生异常,请参考 [vllm 官方文档](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) 尝试解决,或直接使用 [Docker](./docker_deployment.md) 方式部署镜像。
 
 ---
 
-### 安装轻量版client连接sglang-server使用
-如果您需要在边缘设备上安装轻量版的 client 端以连接 `sglang-server`,可以安装mineru的基础包,非常轻量,适合在只有cpu和网络连接的设备上使用。
+### 安装轻量版client连接vllm-server使用
+如果您需要在边缘设备上安装轻量版的 client 端以连接 `vllm-server`,可以安装mineru的基础包,非常轻量,适合在只有cpu和网络连接的设备上使用。
 ```bash
 uv pip install mineru
 ```
-
----
-
-### 在过时的linux系统上使用pipeline后端
-如果您的系统过于陈旧,无法满足`mineru[core]`的依赖要求,该选项可以最低限度的满足 MinerU 的运行需求,适用于老旧系统无法升级且仅需使用 pipeline 后端的场景。
-```bash
-uv pip install mineru[pipeline_old_linux]
-```

+ 3 - 3
docs/zh/quick_start/index.md

@@ -31,7 +31,7 @@
         <td>解析后端</td>
         <td>pipeline</td>
         <td>vlm-transformers</td>
-        <td>vlm-sglang</td>
+        <td>vlm-vllm</td>
     </tr>
     <tr>
         <td>操作系统</td>
@@ -80,8 +80,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
 ```
 
 > [!TIP]
-> `mineru[core]`包含除`sglang`加速外的所有核心功能,兼容Windows / Linux / macOS系统,适合绝大多数用户。
-> 如果您有使用`sglang`加速VLM模型推理,或是在边缘设备安装轻量版client端等需求,可以参考文档[扩展模块安装指南](./extension_modules.md)。
+> `mineru[core]`包含除`vllm`加速外的所有核心功能,兼容Windows / Linux / macOS系统,适合绝大多数用户。
+> 如果您有使用`vllm`加速VLM模型推理,或是在边缘设备安装轻量版client端等需求,可以参考文档[扩展模块安装指南](./extension_modules.md)。
 
 ---
  

+ 32 - 26
docs/zh/reference/output_files.md

@@ -165,49 +165,52 @@ inference_result: list[PageInferenceResults] = []
 ]
 ```
 
-### VLM 输出结果 (model_output.txt)
+### VLM 输出结果 (model.json)
 
 > [!NOTE]
 > 仅适用于 VLM 后端
 
-**文件命名格式**:`{原文件名}_model_output.txt`
+**文件命名格式**:`{原文件名}_model.json`
 
 #### 文件格式说明
 
-- 使用 `----` 分割每一页的输出结果
-- 每页包含多个以 `<|box_start|>` 开头、`<|md_end|>` 结尾的文本块
-
-#### 字段含义
+- 该文件为 VLM 模型的原始输出结果,包含两层嵌套list,外层表示页面,内层表示该页的内容块
+- 每个内容块都是一个dict,包含 `type`、`bbox`、`angle`、`content` 字段
 
-| 标记 | 格式 | 说明 |
-|------|---|------|
-| 边界框 | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | 四边形坐标(左上、右下两点),页面缩放至 1000×1000 后的坐标值 |
-| 类型标记 | `<\|ref_start\|>type<\|ref_end\|>` | 内容块类型标识 |
-| 内容 | `<\|md_start\|>markdown内容<\|md_end\|>` | 该块的 Markdown 内容 |
 
 #### 支持的内容类型
 
```json
-{
-    "text": "文本",
-    "title": "标题",
-    "image": "图片",
-    "image_caption": "图片描述",
-    "image_footnote": "图片脚注",
-    "table": "表格",
-    "table_caption": "表格描述",
-    "table_footnote": "表格脚注",
-    "equation": "行间公式"
-}
+[
+    "text",
+    "title",
+    "equation",
+    "image",
+    "image_caption",
+    "image_footnote",
+    "table",
+    "table_caption",
+    "table_footnote",
+    "phonetic",
+    "code",
+    "code_caption",
+    "ref_text",
+    "algorithm",
+    "list",
+    "header",
+    "footer",
+    "page_number",
+    "aside_text",
+    "page_footnote"
+]
 ```
 
-#### 特殊标记
-
-- `<|txt_contd|>`:出现在文本末尾,表示该文本块可与后续文本块连接
-- 表格内容采用 `otsl` 格式,需转换为 HTML 才能在 Markdown 中渲染
 
 ### 中间处理结果 (middle.json)
 
+> [!NOTE]
+> 仅适用于 pipeline 后端
+
 **文件命名格式**:`{原文件名}_middle.json`
 
 #### 顶层结构
@@ -390,6 +393,9 @@ inference_result: list[PageInferenceResults] = []
 
 ### 内容列表 (content_list.json)
 
+> [!NOTE]
+> 仅适用于 pipeline 后端
+
 **文件命名格式**:`{原文件名}_content_list.json`
 
 #### 功能说明

+ 8 - 21
docs/zh/usage/advanced_cli_parameters.md

@@ -1,25 +1,17 @@
 # 命令行参数进阶
 
-## SGLang 加速参数优化
-
-### 显存优化参数
-> [!TIP]
-> sglang加速模式目前支持在最低8G显存的Turing架构显卡上运行,但在显存<24G的显卡上可能会遇到显存不足的问题, 可以通过使用以下参数来优化显存使用:
-> 
-> - 如果您使用单张显卡遇到显存不足的情况时,可能需要调低KV缓存大小,`--mem-fraction-static 0.5`,如仍出现显存不足问题,可尝试进一步降低到`0.4`或更低
-> - 如您有两张以上显卡,可尝试通过张量并行(TP)模式简单扩充可用显存:`--tp-size 2`
+## vllm 加速参数优化
 
 ### 性能优化参数
 > [!TIP]
-> 如果您已经可以正常使用sglang对vlm模型进行加速推理,但仍然希望进一步提升推理速度,可以尝试以下参数:
+> 如果您已经可以正常使用vllm对vlm模型进行加速推理,但仍然希望进一步提升推理速度,可以尝试以下参数:
 > 
-> - 如果您有超过多张显卡,可以使用sglang的多卡并行模式来增加吞吐量:`--dp-size 2`
-> - 同时您可以启用`torch.compile`来将推理速度加速约15%:`--enable-torch-compile`
+> - 如果您有多张显卡,可以使用vllm的多卡并行模式来增加吞吐量:`--data-parallel-size 2`
 
 ### 参数传递说明
 > [!TIP]
-> - 所有sglang官方支持的参数都可用通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-sglang-server`、`mineru-gradio`、`mineru-api`
-> - 如果您想了解更多有关`sglang`的参数使用方法,请参考 [sglang官方文档](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> - 所有vllm官方支持的参数都可以通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-vllm-server`、`mineru-gradio`、`mineru-api`
+> - 如果您想了解更多有关`vllm`的参数使用方法,请参考 [vllm官方文档](https://docs.vllm.ai/en/latest/cli/serve.html)
 
 ## GPU 设备选择与配置
 
@@ -29,7 +21,7 @@
 >   ```bash
 >   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
 >   ```
-> - 这种指定方式对所有的命令行调用都有效,包括 `mineru`、`mineru-sglang-server`、`mineru-gradio` 和 `mineru-api`,且对`pipeline`、`vlm`后端均适用。
+> - 这种指定方式对所有的命令行调用都有效,包括 `mineru`、`mineru-vllm-server`、`mineru-gradio` 和 `mineru-api`,且对`pipeline`、`vlm`后端均适用。
 
 ### 常见设备配置示例
 > [!TIP]
@@ -47,14 +39,9 @@
 > [!TIP]
 > 以下是一些可能的使用场景:
 > 
-> - 如果您有多张显卡,需要指定卡0和卡1,并使用多卡并行来启动`sglang-server`,可以使用以下命令: 
->   ```bash
->   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
->   ```
->   
-> - 如果您有多张显卡,需要指定卡0-3,并使用多卡数据并行和张量并行来启动`sglang-server`,可以使用以下命令: 
+> - 如果您有多张显卡,需要指定卡0和卡1,并使用多卡并行来启动`vllm-server`,可以使用以下命令: 
 >   ```bash
->   CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
+>   CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
 >   ```
 >   
 > - 如果您有多张显卡,需要在卡0和卡1上启动两个`fastapi`服务,并分别监听不同的端口,可以使用以下命令: 

+ 3 - 3
docs/zh/usage/cli_tools.md

@@ -11,11 +11,11 @@ Options:
   -p, --path PATH                 输入文件路径或目录(必填)
   -o, --output PATH               输出目录(必填)
   -m, --method [auto|txt|ocr]     解析方法:auto(默认)、txt、ocr(仅用于 pipeline 后端)
-  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+  -b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
                                   解析后端(默认为 pipeline)
   -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
                                   指定文档语言(可提升 OCR 准确率,仅用于 pipeline 后端)
-  -u, --url TEXT                  当使用 sglang-client 时,需指定服务地址
+  -u, --url TEXT                  当使用 http-client 时,需指定服务地址
   -s, --start INTEGER             开始解析的页码(从 0 开始)
   -e, --end INTEGER               结束解析的页码(从 0 开始)
   -f, --formula BOOLEAN           是否启用公式解析(默认开启)
@@ -43,7 +43,7 @@ Usage: mineru-gradio [OPTIONS]
 Options:
   --enable-example BOOLEAN        启用示例文件输入(需要将示例文件放置在当前
                                   执行命令目录下的 `example` 文件夹中)
-  --enable-sglang-engine BOOLEAN  启用 SgLang 引擎后端以提高处理速度
+  --enable-vllm-engine BOOLEAN  启用 vllm 引擎后端以提高处理速度
   --enable-api BOOLEAN            启用 Gradio API 以提供应用程序服务
   --max-convert-pages INTEGER     设置从 PDF 转换为 Markdown 的最大页数
   --server-name TEXT              设置 Gradio 应用程序的服务器主机名

+ 12 - 12
docs/zh/usage/quick_usage.md

@@ -28,11 +28,11 @@ mineru -p <input_path> -o <output_path>
 mineru -p <input_path> -o <output_path> -b vlm-transformers
 ```
 > [!TIP]
-> vlm后端另外支持`sglang`加速,与`transformers`后端相比,`sglang`的加速比可达20~30倍,可以在[扩展模块安装指南](../quick_start/extension_modules.md)中查看支持`sglang`加速的完整包安装方法。
+> vlm后端另外支持`vllm`加速,与`transformers`后端相比,`vllm`的加速比可达20~30倍,可以在[扩展模块安装指南](../quick_start/extension_modules.md)中查看支持`vllm`加速的完整包安装方法。
 
 如果需要通过自定义参数调整解析选项,您也可以在文档中查看更详细的[命令行工具使用说明](./cli_tools.md)。
 
-## 通过api、webui、sglang-client/server进阶使用
+## 通过api、webui、http-client/server进阶使用
 
 - 通过python api直接调用:[Python 调用示例](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
 - 通过fast api方式调用:
@@ -43,29 +43,29 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
   >在浏览器中访问 `http://127.0.0.1:8000/docs` 查看API文档。
 - 启动gradio webui 可视化前端:
   ```bash
-  # 使用 pipeline/vlm-transformers/vlm-sglang-client 后端
+  # 使用 pipeline/vlm-transformers/vlm-http-client 后端
   mineru-gradio --server-name 0.0.0.0 --server-port 7860
-  # 或使用 vlm-sglang-engine/pipeline 后端(需安装sglang环境)
-  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
+  # 或使用 vlm-vllm-engine/pipeline 后端(需安装vllm环境)
+  mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
   ```
   >[!TIP]
   > 
   >- 在浏览器中访问 `http://127.0.0.1:7860` 使用 Gradio WebUI。
   >- 访问 `http://127.0.0.1:7860/?view=api` 使用 Gradio API。
-- 使用`sglang-client/server`方式调用:
+- 使用`http-client/server`方式调用:
   ```bash
-  # 启动sglang server(需要安装sglang环境)
-  mineru-sglang-server --port 30000
+  # 启动vllm server(需要安装vllm环境)
+  mineru-vllm-server --port 30000
   ``` 
   >[!TIP]
-  >在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境)
+  >在另一个终端中通过http client连接vllm server(只需cpu与网络,不需要vllm环境)
   > ```bash
-  > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+  > mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
   > ```
 
 > [!NOTE]
-> 所有sglang官方支持的参数都可用通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-sglang-server`、`mineru-gradio`、`mineru-api`,
-> 我们整理了一些`sglang`使用中的常用参数和使用方法,可以在文档[命令行进阶参数](./advanced_cli_parameters.md)中获取。
+> 所有vllm官方支持的参数都可以通过命令行参数传递给 MinerU,包括以下命令:`mineru`、`mineru-vllm-server`、`mineru-gradio`、`mineru-api`,
+> 我们整理了一些`vllm`使用中的常用参数和使用方法,可以在文档[命令行进阶参数](./advanced_cli_parameters.md)中获取。
 
 ## 基于配置文件扩展 MinerU 功能
 

+ 5 - 2
pyproject.toml

@@ -39,6 +39,7 @@ dependencies = [
     "openai>=1.70.0,<2",
     "beautifulsoup4>=4.13.5,<5",
     "Pygments",
+    "mineru_vl_utils",
 ]
 
 [project.optional-dependencies]
@@ -50,10 +51,12 @@ test = [
     "fuzzywuzzy"
 ]
 vlm = [
-    "mineru_vl_utils[transformers]",
+    "torch>=2.6.0,<2.8.0",
+    "transformers>=4.51.1,<5.0.0",
+    "accelerate>=1.5.1",
 ]
 vllm = [
-    "mineru_vl_utils[vllm]",
+    "vllm==0.10.1.1",
 ]
 pipeline = [
     "matplotlib>=3.10,<4",