
Merge pull request #3887 from opendatalab/release-2.6.3

Release 2.6.3
Xiaomeng Zhao, 2 weeks ago
Commit a33715c015

+ 3 - 3
.github/ISSUE_TEMPLATE/bug_report.yml

@@ -122,9 +122,9 @@ body:
       #multiple: false
       options:
         -
-        - "<2.2.0"
-        - "2.2.x"
-        - ">=2.5"
+        - "`<2.2.0`"
+        - "`2.2.x`"
+        - "`>=2.5`"
     validations:
       required: true
 

+ 68 - 35
README.md

@@ -44,6 +44,10 @@
 </div>
 
 # Changelog
+- 2025/10/31 2.6.3 Release
+  - Added support for a new backend `vlm-mlx-engine`, enabling MLX-accelerated inference for the MinerU2.5 model on Apple Silicon devices. Compared to the `vlm-transformers` backend, `vlm-mlx-engine` delivers a 100%–200% speed improvement.
+  - Bug fixes: #3849, #3859
+
 - 2025/10/24 2.6.2 Release
   - `pipeline` backend optimizations
     - Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may cause a slight decrease in MFR speed and failures in recognizing some long formulas. It is recommended to enable it only when parsing Chinese formulas is needed. To disable this feature, set the environment variable to `0`.
@@ -583,7 +587,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
 - Automatically recognize and convert formulas in the document to LaTeX format.
 - Automatically recognize and convert tables in the document to HTML format.
 - Automatically detect scanned PDFs and garbled PDFs and enable OCR functionality.
-- OCR supports detection and recognition of 84 languages.
+- OCR supports detection and recognition of 109 languages.
 - Supports multiple output formats, such as multimodal and NLP Markdown, JSON sorted by reading order, and rich intermediate formats.
 - Supports various visualization results, including layout visualization and span visualization, for efficient confirmation of output quality.
 - Supports running in a pure CPU environment, and also supports GPU(CUDA)/NPU(CANN)/MPS acceleration
@@ -620,41 +624,70 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
 > In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
 
 <table>
-    <tr>
-        <td>Parsing Backend</td>
-        <td>pipeline</td>
-        <td>vlm-transformers</td>
-        <td>vlm-vllm</td>
-    </tr>
-    <tr>
-        <td>Operating System</td>
-        <td>Linux / Windows / macOS</td>
-        <td>Linux / Windows</td>
-        <td>Linux / Windows (via WSL2)</td>
-    </tr>
-    <tr>
-        <td>CPU Inference Support</td>
-        <td>✅</td>
-        <td colspan="2">❌</td>
-    </tr>
-    <tr>
-        <td>GPU Requirements</td>
-        <td>Turing architecture and later, 6GB+ VRAM or Apple Silicon</td>
-        <td colspan="2">Turing architecture and later, 8GB+ VRAM</td>
-    </tr>
-    <tr>
-        <td>Memory Requirements</td>
-        <td colspan="3">Minimum 16GB+, recommended 32GB+</td>
-    </tr>
-    <tr>
-        <td>Disk Space Requirements</td>
-        <td colspan="3">20GB+, SSD recommended</td>
-    </tr>
-    <tr>
-        <td>Python Version</td>
-        <td colspan="3">3.10-3.13</td>
-    </tr>
+    <thead>
+        <tr>
+            <th rowspan="2">Parsing Backend</th>
+            <th rowspan="2">pipeline <br> (Accuracy<sup>1</sup> 82+)</th>
+            <th colspan="4">vlm (Accuracy<sup>1</sup> 90+)</th>
+        </tr>
+        <tr>
+            <th>transformers</th>
+            <th>mlx-engine</th>
+            <th>vllm-engine / <br>vllm-async-engine</th>
+            <th>http-client</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <th>Backend Features</th>
+            <td>Fast, no hallucinations</td>
+            <td>Good compatibility, <br>but slower</td>
+            <td>Faster than transformers</td>
+            <td>Fast, compatible with the vLLM ecosystem</td>
+            <td>Suitable for OpenAI-compatible servers<sup>5</sup></td>
+        </tr>
+        <tr>
+            <th>Operating System</th>
+            <td colspan="2" style="text-align:center;">Linux<sup>2</sup> / Windows / macOS</td>
+            <td style="text-align:center;">macOS<sup>3</sup></td>
+            <td style="text-align:center;">Linux<sup>2</sup> / Windows<sup>4</sup> </td>
+            <td>Any</td>
+        </tr>
+        <tr>
+            <th>CPU inference support</th>
+            <td colspan="2" style="text-align:center;">✅</td>
+            <td colspan="2" style="text-align:center;">❌</td>
+            <td>Not required</td>
+        </tr>
+        <tr>
+            <th>GPU Requirements</th><td colspan="2" style="text-align:center;">Volta or later architectures, 6 GB VRAM or more, or Apple Silicon</td>
+            <td>Apple Silicon</td>
+            <td>Volta or later architectures, 8 GB VRAM or more</td>
+            <td>Not required</td>
+        </tr>
+        <tr>
+            <th>Memory Requirements</th>
+            <td colspan="4" style="text-align:center;">Minimum 16 GB, 32 GB recommended</td>
+            <td>8 GB</td>
+        </tr>
+        <tr>
+            <th>Disk Space Requirements</th>
+            <td colspan="4" style="text-align:center;">20 GB or more, SSD recommended</td>
+            <td>2 GB</td>
+        </tr>
+        <tr>
+            <th>Python Version</th>
+            <td colspan="5" style="text-align:center;">3.10-3.13</td>
+        </tr>
+    </tbody>
 </table>
+ 
+<sup>1</sup> The accuracy metric is the End-to-End Evaluation Overall score of OmniDocBench (v1.5), tested on the latest `MinerU` version.  
+<sup>2</sup> Linux supports only distributions released in 2019 or later.  
+<sup>3</sup> MLX requires macOS 13.5 or later; macOS 14.0 or higher is recommended.  
+<sup>4</sup> Windows supports vLLM via WSL2 (Windows Subsystem for Linux).  
+<sup>5</sup> Servers compatible with the OpenAI API, such as local or remote model services deployed via inference frameworks like `vLLM`, `SGLang`, or `LMDeploy`.
+
 
 ### Install MinerU
 

+ 73 - 36
README_zh-CN.md

@@ -44,6 +44,10 @@
 </div>
 
 # 更新记录
+- 2025/10/31 2.6.3 发布
+  - 增加新后端`vlm-mlx-engine`支持,在Apple Silicon设备上支持使用`MLX`加速`MinerU2.5`模型推理,相比`vlm-transformers`后端,`vlm-mlx-engine`后端速度提升100%~200%。
+  - bug修复:  #3849  #3859
+
 - 2025/10/24 2.6.2 发布
   - `pipline`后端优化
     - 增加对中文公式的实验性支持,可通过配置环境变量`export MINERU_FORMULA_CH_SUPPORT=1`开启。该功能可能会导致MFR速率略微下降、部分长公式识别失败等问题,建议仅在需要解析中文公式的场景下开启。如需关闭该功能,可将环境变量设置为`0`。
@@ -570,7 +574,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
 - 自动识别并转换文档中的公式为LaTeX格式
 - 自动识别并转换文档中的表格为HTML格式
 - 自动检测扫描版PDF和乱码PDF,并启用OCR功能
-- OCR支持84种语言的检测与识别
+- OCR支持109种语言的检测与识别
 - 支持多种输出格式,如多模态与NLP的Markdown、按阅读顺序排序的JSON、含有丰富信息的中间格式等
 - 支持多种可视化结果,包括layout可视化、span可视化等,便于高效确认输出效果与质检
 - 支持纯CPU环境运行,并支持 GPU(CUDA)/NPU(CANN)/MPS 加速
@@ -605,42 +609,75 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
 >
 > 在非主线环境中,由于硬件、软件配置的多样性,以及第三方依赖项的兼容性问题,我们无法100%保证项目的完全可用性。因此,对于希望在非推荐环境中使用本项目的用户,我们建议先仔细阅读文档以及FAQ,大多数问题已经在FAQ中有对应的解决方案,除此之外我们鼓励社区反馈问题,以便我们能够逐步扩大支持范围。
 
+
 <table>
-    <tr>
-        <td>解析后端</td>
-        <td>pipeline</td>
-        <td>vlm-transformers</td>
-        <td>vlm-vllm</td>
-    </tr>
-    <tr>
-        <td>操作系统</td>
-        <td>Linux / Windows / macOS</td>
-        <td>Linux / Windows</td>
-        <td>Linux / Windows (via WSL2)</td>
-    </tr>
-    <tr>
-        <td>CPU推理支持</td>
-        <td>✅</td>
-        <td colspan="2">❌</td>
-    </tr>
-    <tr>
-        <td>GPU要求</td>
-        <td>Turing及以后架构,6G显存以上或Apple Silicon</td>
-        <td colspan="2">Turing及以后架构,8G显存以上</td>
-    </tr>
-    <tr>
-        <td>内存要求</td>
-        <td colspan="3">最低16G以上,推荐32G以上</td>
-    </tr>
-    <tr>
-        <td>磁盘空间要求</td>
-        <td colspan="3">20G以上,推荐使用SSD</td>
-    </tr>
-    <tr>
-        <td>python版本</td>
-        <td colspan="3">3.10-3.13</td>
-    </tr>
-</table>
+    <thead>
+        <tr>
+            <th rowspan="2">解析后端</th>
+            <th rowspan="2">pipeline <br> (精度<sup>1</sup> 82+)</th>
+            <th colspan="4">vlm (精度<sup>1</sup> 90+)</th>
+        </tr>
+        <tr>
+            <th>transformers</th>
+            <th>mlx-engine</th>
+            <th>vllm-engine / <br>vllm-async-engine</th>
+            <th>http-client</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <th>后端特性</th>
+            <td>速度快, 无幻觉</td>
+            <td>兼容性好, 速度较慢</td>
+            <td>比transformers快</td>
+            <td>速度快, 兼容vllm生态</td>
+            <td>适用于OpenAI兼容服务器<sup>5</sup></td>
+        </tr>
+        <tr>
+            <th>操作系统</th>
+            <td colspan="2" style="text-align:center;">Linux<sup>2</sup> / Windows / macOS</td>
+            <td style="text-align:center;">macOS<sup>3</sup></td>
+            <td style="text-align:center;">Linux<sup>2</sup> / Windows<sup>4</sup> </td>
+            <td>不限</td>
+        </tr>
+        <tr>
+            <th>CPU推理支持</th>
+            <td colspan="2" style="text-align:center;">✅</td>
+            <td colspan="2" style="text-align:center;">❌</td>
+            <td >不需要</td>
+        </tr>
+        <tr>
+            <th>GPU要求</th><td colspan="2" style="text-align:center;">Volta及以后架构, 6G显存以上或Apple Silicon</td>
+            <td>Apple Silicon</td>
+            <td>Volta及以后架构, 8G显存以上</td>
+            <td>不需要</td>
+        </tr>
+        <tr>
+            <th>内存要求</th>
+            <td colspan="4" style="text-align:center;">最低16GB以上, 推荐32GB以上</td>
+            <td>8GB</td>
+        </tr>
+        <tr>
+            <th>磁盘空间要求</th>
+            <td colspan="4" style="text-align:center;">20GB以上, 推荐使用SSD</td>
+            <td>2GB</td>
+        </tr>
+        <tr>
+            <th>python版本</th>
+            <td colspan="5" style="text-align:center;">3.10-3.13</td>
+        </tr>
+    </tbody>
+</table> 
+
+<sup>1</sup> 精度指标为OmniDocBench (v1.5)的End-to-End Evaluation Overall分数,基于`MinerU`最新版本测试  
+<sup>2</sup> Linux仅支持2019年及以后发行版
+<sup>3</sup> MLX需macOS 13.5及以上版本支持,推荐14.0以上版本使用
+<sup>4</sup> Windows vLLM通过WSL2(适用于 Linux 的 Windows 子系统)实现支持  
+<sup>5</sup> 兼容OpenAI API的服务器,如通过`vLLM`/`SGLang`/`LMDeploy`等推理框架部署的本地模型服务器或远程模型服务
+
+> [!TIP]
+> 除以上主流环境与平台外,我们也收录了一些社区用户反馈的其他平台支持情况,详情请参考[其他加速卡适配](https://opendatalab.github.io/MinerU/zh/usage/)。  
+> 如果您有意将自己的环境适配经验分享给社区,欢迎通过[show-and-tell](https://github.com/opendatalab/MinerU/discussions/categories/show-and-tell)提交或提交PR至[其他加速卡适配](https://github.com/opendatalab/MinerU/tree/master/docs/zh/usage/acceleration_cards)文档。
 
 ### 安装 MinerU
 

+ 1 - 0
demo/demo.py

@@ -235,5 +235,6 @@ if __name__ == '__main__':
 
     """To enable VLM mode, change the backend to 'vlm-xxx'"""
     # parse_doc(doc_path_list, output_dir, backend="vlm-transformers")  # more general.
+    # parse_doc(doc_path_list, output_dir, backend="vlm-mlx-engine")  # faster than transformers on macOS 13.5+.
     # parse_doc(doc_path_list, output_dir, backend="vlm-vllm-engine")  # faster(engine).
     # parse_doc(doc_path_list, output_dir, backend="vlm-http-client", server_url="http://127.0.0.1:30000")  # faster(client).
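For readers wiring this into their own scripts, a minimal sketch of picking the new backend at runtime, reusing `parse_doc`, `doc_path_list`, and `output_dir` from `demo.py` together with the `is_mac_os_version_supported` helper introduced in this release:

```python
from mineru.utils.check_mac_env import is_mac_os_version_supported

# Prefer the MLX-accelerated backend on Apple Silicon (macOS 13.5+); fall back
# to the more portable transformers backend everywhere else.
backend = "vlm-mlx-engine" if is_mac_os_version_supported() else "vlm-transformers"
parse_doc(doc_path_list, output_dir, backend=backend)
```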

+ 1 - 1
docs/en/index.md

@@ -57,7 +57,7 @@ Compared to well-known commercial products domestically and internationally, Min
 - Automatically identify and convert formulas in documents to LaTeX format
 - Automatically identify and convert tables in documents to HTML format
 - Automatically detect scanned PDFs and garbled PDFs, and enable OCR functionality
-- OCR supports detection and recognition of 84 languages
+- OCR supports detection and recognition of 109 languages
 - Support multiple output formats, such as multimodal and NLP Markdown, reading-order-sorted JSON, and information-rich intermediate formats
 - Support multiple visualization results, including layout visualization, span visualization, etc., for efficient confirmation of output effects and quality inspection
 - Support pure CPU environment operation, and support GPU(CUDA)/NPU(CANN)/MPS acceleration

+ 2 - 2
docs/en/quick_start/extension_modules.md

@@ -6,7 +6,7 @@ MinerU supports installing extension modules on demand based on different needs
 ### Core Functionality Installation
 The `core` module is the core dependency of MinerU, containing all functional modules except `vllm`. Installing this module ensures the basic functionality of MinerU works properly.
 ```bash
-uv pip install mineru[core]
+uv pip install "mineru[core]"
 ```
 
 ---
@@ -15,7 +15,7 @@ uv pip install mineru[core]
 The `vllm` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
 In the configuration, `all` includes both `core` and `vllm` modules, so `mineru[all]` and `mineru[core,vllm]` are equivalent.
 ```bash
-uv pip install mineru[all]
+uv pip install "mineru[all]"
 ```
 > [!TIP]
 > If exceptions occur during installation of the complete package including vllm, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.

+ 62 - 34
docs/en/quick_start/index.md

@@ -27,41 +27,69 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
 > In non-mainstream environments, due to the diversity of hardware and software configurations, as well as compatibility issues with third-party dependencies, we cannot guarantee 100% usability of the project. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first, as most issues have corresponding solutions in the FAQ. Additionally, we encourage community feedback on issues so that we can gradually expand our support range.
 
 <table border="1">
-    <tr>
-        <td>Parsing Backend</td>
-        <td>pipeline</td>
-        <td>vlm-transformers</td>
-        <td>vlm-vllm</td>
-    </tr>
-    <tr>
-        <td>Operating System</td>
-        <td>Linux / Windows / macOS</td>
-        <td>Linux / Windows</td>
-        <td>Linux / Windows (via WSL2)</td>
-    </tr>
-    <tr>
-        <td>CPU Inference Support</td>
-        <td>✅</td>
-        <td colspan="2">❌</td>
-    </tr>
-    <tr>
-        <td>GPU Requirements</td>
-        <td>Turing architecture and later, 6GB+ VRAM or Apple Silicon</td>
-        <td colspan="2">Turing architecture and later, 8GB+ VRAM</td>
-    </tr>
-    <tr>
-        <td>Memory Requirements</td>
-        <td colspan="3">Minimum 16GB+, recommended 32GB+</td>
-    </tr>
-    <tr>
-        <td>Disk Space Requirements</td>
-        <td colspan="3">20GB+, SSD recommended</td>
-    </tr>
-    <tr>
-        <td>Python Version</td>
-        <td colspan="3">3.10-3.13</td>
-    </tr>
+    <thead>
+        <tr>
+            <th rowspan="2">Parsing Backend</th>
+            <th rowspan="2">pipeline <br> (Accuracy<sup>1</sup> 82+)</th>
+            <th colspan="4">vlm (Accuracy<sup>1</sup> 90+)</th>
+        </tr>
+        <tr>
+            <th>transformers</th>
+            <th>mlx-engine</th>
+            <th>vllm-engine / <br>vllm-async-engine</th>
+            <th>http-client</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <th>Backend Features</th>
+            <td>Fast, no hallucinations</td>
+            <td>Good compatibility, <br>but slower</td>
+            <td>Faster than transformers</td>
+            <td>Fast, compatible with the vLLM ecosystem</td>
+            <td>Suitable for OpenAI-compatible servers<sup>5</sup></td>
+        </tr>
+        <tr>
+            <th>Operating System</th>
+            <td colspan="2" style="text-align:center;">Linux<sup>2</sup> / Windows / macOS</td>
+            <td style="text-align:center;">macOS<sup>3</sup></td>
+            <td style="text-align:center;">Linux<sup>2</sup> / Windows<sup>4</sup> </td>
+            <td>Any</td>
+        </tr>
+        <tr>
+            <th>CPU inference support</th>
+            <td colspan="2" style="text-align:center;">✅</td>
+            <td colspan="2" style="text-align:center;">❌</td>
+            <td>Not required</td>
+        </tr>
+        <tr>
+            <th>GPU Requirements</th><td colspan="2" style="text-align:center;">Volta or later architectures, 6 GB VRAM or more, or Apple Silicon</td>
+            <td>Apple Silicon</td>
+            <td>Volta or later architectures, 8 GB VRAM or more</td>
+            <td>Not required</td>
+        </tr>
+        <tr>
+            <th>Memory Requirements</th>
+            <td colspan="4" style="text-align:center;">Minimum 16 GB, 32 GB recommended</td>
+            <td>8 GB</td>
+        </tr>
+        <tr>
+            <th>Disk Space Requirements</th>
+            <td colspan="4" style="text-align:center;">20 GB or more, SSD recommended</td>
+            <td>2 GB</td>
+        </tr>
+        <tr>
+            <th>Python Version</th>
+            <td colspan="5" style="text-align:center;">3.10-3.13</td>
+        </tr>
+    </tbody>
 </table>
+ 
+<sup>1</sup> The accuracy metric is the End-to-End Evaluation Overall score of OmniDocBench (v1.5), tested on the latest `MinerU` version.  
+<sup>2</sup> Linux supports only distributions released in 2019 or later.  
+<sup>3</sup> MLX requires macOS 13.5 or later; macOS 14.0 or higher is recommended.  
+<sup>4</sup> Windows supports vLLM via WSL2 (Windows Subsystem for Linux).  
+<sup>5</sup> Servers compatible with the OpenAI API, such as local or remote model services deployed via inference frameworks like `vLLM`, `SGLang`, or `LMDeploy`.  
 
 ### Install MinerU
 

+ 2 - 2
docs/en/usage/cli_tools.md

@@ -20,7 +20,7 @@ Options:
   -e, --end INTEGER               Ending page number for parsing (0-based)
   -f, --formula BOOLEAN           Enable formula parsing (default: enabled)
   -t, --table BOOLEAN             Enable table parsing (default: enabled)
-  -d, --device TEXT               Inference device (e.g., cpu/cuda/cuda:0/npu/mps, pipeline backend only)
+  -d, --device TEXT               Inference device (e.g., cpu/cuda/cuda:0/npu/mps, pipeline and vlm-transformers backends only)
   --vram INTEGER                  Maximum GPU VRAM usage per process (GB) (pipeline backend only)
   --source [huggingface|modelscope|local]
                                   Model source, default: huggingface
@@ -68,7 +68,7 @@ Here are the environment variables and their descriptions:
 - `MINERU_DEVICE_MODE`:
     * Used to specify inference device
     * supports device types like `cpu/cuda/cuda:0/npu/mps`
-    * only effective for `pipeline` backend.
+    * only effective for `pipeline` and `vlm-transformers` backends.
   
 - `MINERU_VIRTUAL_VRAM_SIZE`: 
     * Used to specify maximum GPU VRAM usage per process (GB)
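A small sketch of driving the widened device support from Python; the `-p`/`-o` input and output flags are assumptions about the CLI for illustration and may need adjusting to your installation:

```python
import os
import subprocess

# MINERU_DEVICE_MODE now also applies to the vlm-transformers backend,
# not just pipeline. "mps" targets Apple GPUs; cpu/cuda/cuda:0/npu also work.
env = os.environ.copy()
env["MINERU_DEVICE_MODE"] = "mps"

# -p (input path) and -o (output dir) are assumed flag names for this sketch.
subprocess.run(
    ["mineru", "-p", "demo.pdf", "-o", "./output", "-b", "vlm-transformers"],
    env=env,
    check=True,
)
```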

+ 21 - 1
docs/en/usage/quick_usage.md

@@ -83,8 +83,28 @@ Here are some available configuration options:
   
 - `llm-aided-config`:
     * Used to configure parameters for LLM-assisted title hierarchy
-    * Compatible with all LLM models supporting `openai protocol`, defaults to using Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model. 
+    * Compatible with all LLM models supporting `openai protocol`, defaults to using Alibaba Cloud Bailian's `qwen3-next-80b-a3b-instruct` model. 
     * You need to configure your own API key and set `enable` to `true` to enable this feature.
+    * If your API provider does not support the `enable_thinking` parameter, please manually remove it.
+        * For example, in your configuration file, the `llm-aided-config` section may look like:
+          ```json
+          "llm-aided-config": {
+             "api_key": "your_api_key",
+             "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+             "model": "qwen3-next-80b-a3b-instruct",
+             "enable_thinking": false,
+             "enable": false
+          }
+          ```
+        * To remove the `enable_thinking` parameter, simply delete the line containing `"enable_thinking": false`, resulting in:
+          ```json
+          "llm-aided-config": {
+             "api_key": "your_api_key",
+             "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+             "model": "qwen3-next-80b-a3b-instruct",
+             "enable": false
+          }
+          ```
   
 - `models-dir`: 
     * Used to specify local model storage directory

+ 1 - 1
docs/zh/index.md

@@ -56,7 +56,7 @@ MinerU诞生于[书生-浦语](https://github.com/InternLM/InternLM)的预训练
 - 自动识别并转换文档中的公式为LaTeX格式
 - 自动识别并转换文档中的表格为HTML格式
 - 自动检测扫描版PDF和乱码PDF,并启用OCR功能
-- OCR支持84种语言的检测与识别
+- OCR支持109种语言的检测与识别
 - 支持多种输出格式,如多模态与NLP的Markdown、按阅读顺序排序的JSON、含有丰富信息的中间格式等
 - 支持多种可视化结果,包括layout可视化、span可视化等,便于高效确认输出效果与质检
 - 支持纯CPU环境运行,并支持 GPU(CUDA)/NPU(CANN)/MPS 加速

+ 2 - 2
docs/zh/quick_start/extension_modules.md

@@ -6,7 +6,7 @@ MinerU 支持根据不同需求,按需安装扩展模块,以增强功能或
 ### 核心功能安装
 `core` 模块是 MinerU 的核心依赖,包含了除`vllm`外的所有功能模块。安装此模块可以确保 MinerU 的基本功能正常运行。
 ```bash
-uv pip install mineru[core]
+uv pip install "mineru[core]"
 ```
 
 ---
@@ -15,7 +15,7 @@ uv pip install mineru[core]
 `vllm` 模块提供了对 VLM 模型推理的加速支持,适用于具有 Turing 及以后架构的显卡(8G 显存及以上)。安装此模块可以显著提升模型推理速度。
 在配置中,`all`包含了`core`和`vllm`模块,因此`mineru[all]`和`mineru[core,vllm]`是等价的。
 ```bash
-uv pip install mineru[all]
+uv pip install "mineru[all]"
 ```
 > [!TIP]
 > 如在安装包含vllm的完整包过程中发生异常,请参考 [vllm 官方文档](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) 尝试解决,或直接使用 [Docker](./docker_deployment.md) 方式部署镜像。

+ 69 - 36
docs/zh/quick_start/index.md

@@ -26,42 +26,75 @@
 >
 > 在非主线环境中,由于硬件、软件配置的多样性,以及第三方依赖项的兼容性问题,我们无法100%保证项目的完全可用性。因此,对于希望在非推荐环境中使用本项目的用户,我们建议先仔细阅读文档以及FAQ,大多数问题已经在FAQ中有对应的解决方案,除此之外我们鼓励社区反馈问题,以便我们能够逐步扩大支持范围。
 
-<table border="1">
-    <tr>
-        <td>解析后端</td>
-        <td>pipeline</td>
-        <td>vlm-transformers</td>
-        <td>vlm-vllm</td>
-    </tr>
-    <tr>
-        <td>操作系统</td>
-        <td>Linux / Windows / macOS</td>
-        <td>Linux / Windows</td>
-        <td>Linux / Windows (via WSL2)</td>
-    </tr>
-    <tr>
-        <td>CPU推理支持</td>
-        <td>✅</td>
-        <td colspan="2">❌</td>
-    </tr>
-    <tr>
-        <td>GPU要求</td>
-        <td>Turing及以后架构,6G显存以上或Apple Silicon</td>
-        <td colspan="2">Turing及以后架构,8G显存以上</td>
-    </tr>
-    <tr>
-        <td>内存要求</td>
-        <td colspan="3">最低16G以上,推荐32G以上</td>
-    </tr>
-    <tr>
-        <td>磁盘空间要求</td>
-        <td colspan="3">20G以上,推荐使用SSD</td>
-    </tr>
-    <tr>
-        <td>python版本</td>
-        <td colspan="3">3.10-3.13</td>
-    </tr>
-</table>
+<table>
+    <thead>
+        <tr>
+            <th rowspan="2">解析后端</th>
+            <th rowspan="2">pipeline <br> (精度<sup>1</sup> 82+)</th>
+            <th colspan="4">vlm (精度<sup>1</sup> 90+)</th>
+        </tr>
+        <tr>
+            <th>transformers</th>
+            <th>mlx-engine</th>
+            <th>vllm-engine / <br>vllm-async-engine</th>
+            <th>http-client</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <th>后端特性</th>
+            <td>速度快, 无幻觉</td>
+            <td>兼容性好, 速度较慢</td>
+            <td>比transformers快</td>
+            <td>速度快, 兼容vllm生态</td>
+            <td>适用于OpenAI兼容服务器<sup>5</sup></td>
+        </tr>
+        <tr>
+            <th>操作系统</th>
+            <td colspan="2" style="text-align:center;">Linux<sup>2</sup> / Windows / macOS</td>
+            <td style="text-align:center;">macOS<sup>3</sup></td>
+            <td style="text-align:center;">Linux<sup>2</sup> / Windows<sup>4</sup> </td>
+            <td>不限</td>
+        </tr>
+        <tr>
+            <th>CPU推理支持</th>
+            <td colspan="2" style="text-align:center;">✅</td>
+            <td colspan="2" style="text-align:center;">❌</td>
+            <td >不需要</td>
+        </tr>
+        <tr>
+            <th>GPU要求</th><td colspan="2" style="text-align:center;">Volta及以后架构, 6G显存以上或Apple Silicon</td>
+            <td>Apple Silicon</td>
+            <td>Volta及以后架构, 8G显存以上</td>
+            <td>不需要</td>
+        </tr>
+        <tr>
+            <th>内存要求</th>
+            <td colspan="4" style="text-align:center;">最低16GB以上, 推荐32GB以上</td>
+            <td>8GB</td>
+        </tr>
+        <tr>
+            <th>磁盘空间要求</th>
+            <td colspan="4" style="text-align:center;">20GB以上, 推荐使用SSD</td>
+            <td>2GB</td>
+        </tr>
+        <tr>
+            <th>python版本</th>
+            <td colspan="5" style="text-align:center;">3.10-3.13</td>
+        </tr>
+    </tbody>
+</table> 
+
+<sup>1</sup> 精度指标为OmniDocBench (v1.5)的End-to-End Evaluation Overall分数,基于`MinerU`最新版本测试  
+<sup>2</sup> Linux仅支持2019年及以后发行版  
+<sup>3</sup> MLX需macOS 13.5及以上版本支持,推荐14.0以上版本使用  
+<sup>4</sup> Windows vLLM通过WSL2(适用于 Linux 的 Windows 子系统)实现支持  
+<sup>5</sup> 兼容OpenAI API的服务器,如通过`vLLM`/`SGLang`/`LMDeploy`等推理框架部署的本地模型服务器或远程模型服务  
+
+> [!TIP]
+> 除以上主流环境与平台外,我们也收录了一些社区用户反馈的其他平台支持情况,详情请参考[其他加速卡适配](https://opendatalab.github.io/MinerU/zh/usage/)。  
+> 如果您有意将自己的环境适配经验分享给社区,欢迎通过[show-and-tell](https://github.com/opendatalab/MinerU/discussions/categories/show-and-tell)提交或提交PR至[其他加速卡适配](https://github.com/opendatalab/MinerU/tree/master/docs/zh/usage/acceleration_cards)文档。
+
 
 ### 安装 MinerU
 

+ 22 - 2
docs/zh/usage/quick_usage.md

@@ -82,8 +82,28 @@ MinerU 现已实现开箱即用,但也支持通过配置文件扩展功能。
   
 - `llm-aided-config`:
     * 用于配置 LLM 辅助标题分级的相关参数,兼容所有支持`openai协议`的 LLM 模型
-    * 默认使用`阿里云百炼`的`qwen2.5-32b-instruct`模型
-    * 您需要自行配置 API 密钥并将`enable`设置为`true`来启用此功能。
+    * 默认使用`阿里云百炼`的`qwen3-next-80b-a3b-instruct`模型
+    * 您需要自行配置 API 密钥并将`enable`设置为`true`来启用此功能
+    * 如果您的api供应商不支持`enable_thinking`参数,请手动将该参数删除
+        * 例如,在您的配置文件中,`llm-aided-config` 部分可能如下所示:
+          ```json
+          "llm-aided-config": {
+             "api_key": "your_api_key",
+             "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+             "model": "qwen3-next-80b-a3b-instruct",
+             "enable_thinking": false,
+             "enable": false
+          }
+          ```
+        * 要移除`enable_thinking`参数,只需删除包含`"enable_thinking": false`的那一行,结果如下:
+          ```json
+          "llm-aided-config": {
+             "api_key": "your_api_key",
+             "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+             "model": "qwen3-next-80b-a3b-instruct",
+             "enable": false
+          }
+          ```
   
 - `models-dir`:
     * 用于指定本地模型存储目录,请为`pipeline`和`vlm`后端分别指定模型目录,

+ 2 - 1
mineru.template.json

@@ -18,6 +18,7 @@
             "api_key": "your_api_key",
             "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
             "model": "qwen3-next-80b-a3b-instruct",
+            "enable_thinking": false,
             "enable": false
         }
     },
@@ -25,5 +26,5 @@
         "pipeline": "",
         "vlm": ""
     },
-    "config_version": "1.3.0"
+    "config_version": "1.3.1"
 }

+ 20 - 43
mineru/backend/pipeline/batch_analyze.py

@@ -281,28 +281,20 @@ class BatchAnalyze:
 
                 # 按分辨率分组并同时完成padding
                 # RESOLUTION_GROUP_STRIDE = 32
-                RESOLUTION_GROUP_STRIDE = 64  # 定义分辨率分组的步进值
+                RESOLUTION_GROUP_STRIDE = 64
 
                 resolution_groups = defaultdict(list)
                 for crop_info in lang_crop_list:
                     cropped_img = crop_info[0]
                     h, w = cropped_img.shape[:2]
-                    # 使用更大的分组容差,减少分组数量
-                    # 将尺寸标准化到32的倍数
-                    normalized_h = ((h + RESOLUTION_GROUP_STRIDE) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE  # 向上取整到32的倍数
-                    normalized_w = ((w + RESOLUTION_GROUP_STRIDE) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
-                    group_key = (normalized_h, normalized_w)
+                    # 直接计算目标尺寸并用作分组键
+                    target_h = ((h + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
+                    target_w = ((w + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
+                    group_key = (target_h, target_w)
                     resolution_groups[group_key].append(crop_info)
 
                 # 对每个分辨率组进行批处理
-                for group_key, group_crops in tqdm(resolution_groups.items(), desc=f"OCR-det {lang}"):
-
-                    # 计算目标尺寸(组内最大尺寸,向上取整到32的倍数)
-                    max_h = max(crop_info[0].shape[0] for crop_info in group_crops)
-                    max_w = max(crop_info[0].shape[1] for crop_info in group_crops)
-                    target_h = ((max_h + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
-                    target_w = ((max_w + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
-
+                for (target_h, target_w), group_crops in tqdm(resolution_groups.items(), desc=f"OCR-det {lang}"):
                     # 对所有图像进行padding到统一尺寸
                     batch_images = []
                     for crop_info in group_crops:
@@ -310,49 +302,34 @@ class BatchAnalyze:
                         h, w = img.shape[:2]
                         # 创建目标尺寸的白色背景
                         padded_img = np.ones((target_h, target_w, 3), dtype=np.uint8) * 255
-                        # 将原图像粘贴到左上角
                         padded_img[:h, :w] = img
                         batch_images.append(padded_img)
 
                     # 批处理检测
-                    det_batch_size = min(len(batch_images), self.batch_ratio * OCR_DET_BASE_BATCH_SIZE)  # 增加批处理大小
-                    # logger.debug(f"OCR-det batch: {det_batch_size} images, target size: {target_h}x{target_w}")
+                    det_batch_size = min(len(batch_images), self.batch_ratio * OCR_DET_BASE_BATCH_SIZE)
                     batch_results = ocr_model.text_detector.batch_predict(batch_images, det_batch_size)
 
                     # 处理批处理结果
-                    for i, (crop_info, (dt_boxes, elapse)) in enumerate(zip(group_crops, batch_results)):
+                    for crop_info, (dt_boxes, _) in zip(group_crops, batch_results):
                         bgr_image, useful_list, ocr_res_list_dict, res, adjusted_mfdetrec_res, _lang = crop_info
 
                         if dt_boxes is not None and len(dt_boxes) > 0:
-                            # 直接应用原始OCR流程中的关键处理步骤
-
-                            # 1. 排序检测框
-                            if len(dt_boxes) > 0:
-                                dt_boxes_sorted = sorted_boxes(dt_boxes)
-                            else:
-                                dt_boxes_sorted = []
-
-                            # 2. 合并相邻检测框
-                            if dt_boxes_sorted:
-                                dt_boxes_merged = merge_det_boxes(dt_boxes_sorted)
-                            else:
-                                dt_boxes_merged = []
-
-                            # 3. 根据公式位置更新检测框(关键步骤!)
-                            if dt_boxes_merged and adjusted_mfdetrec_res:
-                                dt_boxes_final = update_det_boxes(dt_boxes_merged, adjusted_mfdetrec_res)
-                            else:
-                                dt_boxes_final = dt_boxes_merged
-
-                            # 构造OCR结果格式
-                            ocr_res = [box.tolist() if hasattr(box, 'tolist') else box for box in dt_boxes_final]
-
-                            if ocr_res:
+                            # 处理检测框
+                            dt_boxes_sorted = sorted_boxes(dt_boxes)
+                            dt_boxes_merged = merge_det_boxes(dt_boxes_sorted) if dt_boxes_sorted else []
+
+                            # 根据公式位置更新检测框
+                            dt_boxes_final = (update_det_boxes(dt_boxes_merged, adjusted_mfdetrec_res)
+                                              if dt_boxes_merged and adjusted_mfdetrec_res
+                                              else dt_boxes_merged)
+
+                            if dt_boxes_final:
+                                ocr_res = [box.tolist() if hasattr(box, 'tolist') else box for box in dt_boxes_final]
                                 ocr_result_list = get_ocr_result_list(
                                     ocr_res, useful_list, ocr_res_list_dict['ocr_enable'], bgr_image, _lang
                                 )
-
                                 ocr_res_list_dict['layout_res'].extend(ocr_result_list)
+
         else:
             # 原始单张处理模式
             for ocr_res_list_dict in tqdm(ocr_res_list_all_page, desc="OCR-det Predict"):
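Stripped of the OCR-specific bookkeeping, the refactored grouping above amounts to the following self-contained sketch: each crop's size is rounded up to a multiple of `RESOLUTION_GROUP_STRIDE`, that rounded size doubles as the group key, and every image in a group is pasted onto a white canvas of the group's size so the whole group can go through detection as one batch.

```python
from collections import defaultdict

import numpy as np

RESOLUTION_GROUP_STRIDE = 64  # group crops by size rounded up to this stride


def group_and_pad(images):
    """Bucket HxWx3 uint8 images by ceil-to-stride size and pad each bucket."""
    groups = defaultdict(list)
    for img in images:
        h, w = img.shape[:2]
        target_h = ((h + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
        target_w = ((w + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
        groups[(target_h, target_w)].append(img)

    batches = {}
    for (target_h, target_w), imgs in groups.items():
        padded = []
        for img in imgs:
            h, w = img.shape[:2]
            canvas = np.full((target_h, target_w, 3), 255, dtype=np.uint8)  # white background
            canvas[:h, :w] = img  # paste the original into the top-left corner
            padded.append(canvas)
        batches[(target_h, target_w)] = padded
    return batches
```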

+ 1 - 1
mineru/backend/pipeline/model_init.py

@@ -8,7 +8,7 @@ from ...model.layout.doclayoutyolo import DocLayoutYOLOModel
 from ...model.mfd.yolo_v8 import YOLOv8MFDModel
 from ...model.mfr.unimernet.Unimernet import UnimernetModel
 from ...model.mfr.pp_formulanet_plus_m.predict_formula import FormulaRecognizer
-from ...model.ocr.paddleocr2pytorch.pytorch_paddle import PytorchPaddleOCR
+from mineru.model.ocr.pytorch_paddle import PytorchPaddleOCR
 from ...model.ori_cls.paddle_ori_cls import PaddleOrientationClsModel
 from ...model.table.cls.paddle_table_cls import PaddleTableClsModel
 # from ...model.table.rec.RapidTable import RapidTableModel

+ 1 - 1
mineru/backend/pipeline/model_json_to_middle_json.py

@@ -148,7 +148,7 @@ def page_model_info_to_page_info(page_model_info, image_dict, page, image_writer
     fix_discarded_blocks = fix_discarded_block(discarded_block_with_spans)
 
     """如果当前页面没有有效的bbox则跳过"""
-    if len(all_bboxes) == 0:
+    if len(all_bboxes) == 0 and len(fix_discarded_blocks) == 0:
         return None
 
     """对image/table/interline_equation截图"""

+ 17 - 4
mineru/backend/pipeline/pipeline_middle_json_mkcontent.py

@@ -191,11 +191,20 @@ def merge_para_with_text(para_block):
 def make_blocks_to_content_list(para_block, img_buket_path, page_idx, page_size):
     para_type = para_block['type']
     para_content = {}
-    if para_type in [BlockType.TEXT, BlockType.LIST, BlockType.INDEX]:
+    if para_type in [
+        BlockType.TEXT,
+        BlockType.LIST,
+        BlockType.INDEX,
+    ]:
         para_content = {
             'type': ContentType.TEXT,
             'text': merge_para_with_text(para_block),
         }
+    elif para_type == BlockType.DISCARDED:
+        para_content = {
+            'type': para_type,
+            'text': merge_para_with_text(para_block),
+        }
     elif para_type == BlockType.TITLE:
         para_content = {
             'type': ContentType.TEXT,
@@ -268,15 +277,19 @@ def union_make(pdf_info_dict: list,
     output_content = []
     for page_info in pdf_info_dict:
         paras_of_layout = page_info.get('para_blocks')
+        paras_of_discarded = page_info.get('discarded_blocks')
         page_idx = page_info.get('page_idx')
         page_size = page_info.get('page_size')
-        if not paras_of_layout:
-            continue
         if make_mode in [MakeMode.MM_MD, MakeMode.NLP_MD]:
+            if not paras_of_layout:
+                continue
             page_markdown = make_blocks_to_markdown(paras_of_layout, make_mode, img_buket_path)
             output_content.extend(page_markdown)
         elif make_mode == MakeMode.CONTENT_LIST:
-            for para_block in paras_of_layout:
+            para_blocks = (paras_of_layout or []) + (paras_of_discarded or [])
+            if not para_blocks:
+                continue
+            for para_block in para_blocks:
                 para_content = make_blocks_to_content_list(para_block, img_buket_path, page_idx, page_size)
                 if para_content:
                     output_content.append(para_content)

+ 11 - 1
mineru/backend/vlm/vlm_analyze.py

@@ -8,6 +8,7 @@ from .utils import enable_custom_logits_processors, set_default_gpu_memory_utili
 from .model_output_to_middle_json import result_to_middle_json
 from ...data.data_reader_writer import DataWriter
 from mineru.utils.pdf_image_tools import load_images_from_pdf
+from ...utils.check_mac_env import is_mac_os_version_supported
 from ...utils.config_reader import get_device
 
 from ...utils.enum_class import ImageType
@@ -47,7 +48,7 @@ class ModelSingleton:
             for param in ["batch_size", "max_concurrency", "http_timeout"]:
                 if param in kwargs:
                     del kwargs[param]
-            if backend in ['transformers', 'vllm-engine', "vllm-async-engine"] and not model_path:
+            if backend in ['transformers', 'vllm-engine', "vllm-async-engine", "mlx-engine"] and not model_path:
                 model_path = auto_download_and_get_model_root_path("/","vlm")
                 if backend == "transformers":
                     try:
@@ -75,6 +76,15 @@ class ModelSingleton:
                     )
                     if batch_size == 0:
                         batch_size = set_default_batch_size()
+                elif backend == "mlx-engine":
+                    mlx_supported = is_mac_os_version_supported()
+                    if not mlx_supported:
+                        raise EnvironmentError("mlx-engine backend is only supported on macOS 13.5+ with Apple Silicon.")
+                    try:
+                        from mlx_vlm import load as mlx_load
+                    except ImportError:
+                        raise ImportError("Please install mlx-vlm to use the mlx-engine backend.")
+                    model, processor = mlx_load(model_path)
                 else:
                     if os.getenv('OMP_NUM_THREADS') is None:
                         os.environ["OMP_NUM_THREADS"] = "1"

+ 6 - 3
mineru/backend/vlm/vlm_middle_json_mkcontent.py

@@ -248,13 +248,16 @@ def union_make(pdf_info_dict: list,
         paras_of_discarded = page_info.get('discarded_blocks')
         page_idx = page_info.get('page_idx')
         page_size = page_info.get('page_size')
-        if not paras_of_layout:
-            continue
         if make_mode in [MakeMode.MM_MD, MakeMode.NLP_MD]:
+            if not paras_of_layout:
+                continue
             page_markdown = mk_blocks_to_markdown(paras_of_layout, make_mode, formula_enable, table_enable, img_buket_path)
             output_content.extend(page_markdown)
         elif make_mode == MakeMode.CONTENT_LIST:
-            for para_block in paras_of_layout+paras_of_discarded:
+            para_blocks = (paras_of_layout or []) + (paras_of_discarded or [])
+            if not para_blocks:
+                continue
+            for para_block in para_blocks:
                 para_content = make_blocks_to_content_list(para_block, img_buket_path, page_idx, page_size)
                 output_content.append(para_content)
 

+ 24 - 14
mineru/cli/client.py

@@ -4,6 +4,7 @@ import click
 from pathlib import Path
 from loguru import logger
 
+from mineru.utils.check_mac_env import is_mac_os_version_supported
 from mineru.utils.cli_parser import arg_parse
 from mineru.utils.config_reader import get_device
 from mineru.utils.guess_suffix_or_lang import guess_suffix_by_path
@@ -11,6 +12,11 @@ from mineru.utils.model_utils import get_vram
 from ..version import __version__
 from .common import do_parse, read_fn, pdf_suffixes, image_suffixes
 
+
+backends = ['pipeline', 'vlm-transformers', 'vlm-vllm-engine', 'vlm-http-client']
+if is_mac_os_version_supported():
+    backends.append("vlm-mlx-engine")
+
 @click.command(context_settings=dict(ignore_unknown_options=True, allow_extra_args=True))
 @click.pass_context
 @click.version_option(__version__,
@@ -38,25 +44,28 @@ from .common import do_parse, read_fn, pdf_suffixes, image_suffixes
     '--method',
     'method',
     type=click.Choice(['auto', 'txt', 'ocr']),
-    help="""the method for parsing pdf:
-    auto: Automatically determine the method based on the file type.
-    txt: Use text extraction method.
-    ocr: Use OCR method for image-based PDFs.
+    help="""\b
+    the method for parsing pdf:
+      auto: Automatically determine the method based on the file type.
+      txt: Use text extraction method.
+      ocr: Use OCR method for image-based PDFs.
     Without method specified, 'auto' will be used by default.
-    Adapted only for the case where the backend is set to "pipeline".""",
+    Adapted only for the case where the backend is set to 'pipeline'.""",
     default='auto',
 )
 @click.option(
     '-b',
     '--backend',
     'backend',
-    type=click.Choice(['pipeline', 'vlm-transformers', 'vlm-vllm-engine', 'vlm-http-client']),
-    help="""the backend for parsing pdf:
-    pipeline: More general.
-    vlm-transformers: More general.
-    vlm-vllm-engine: Faster(engine).
-    vlm-http-client: Faster(client).
-    without method specified, pipeline will be used by default.""",
+    type=click.Choice(backends),
+    help="""\b
+    the backend for parsing pdf:
+      pipeline: More general.
+      vlm-transformers: More general, but slower.
+      vlm-mlx-engine: Faster than transformers.
+      vlm-vllm-engine: Faster(engine).
+      vlm-http-client: Faster(client).
+    Without backend specified, 'pipeline' will be used by default.""",
     default='pipeline',
 )
 @click.option(
@@ -66,7 +75,7 @@ from .common import do_parse, read_fn, pdf_suffixes, image_suffixes
     type=click.Choice(['ch', 'ch_server', 'ch_lite', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka', 'th', 'el',
                        'latin', 'arabic', 'east_slavic', 'cyrillic', 'devanagari']),
     help="""
-    Input the languages in the pdf (if known) to improve OCR accuracy.  Optional.
+    Input the languages in the pdf (if known) to improve OCR accuracy.
     Without languages specified, 'ch' will be used by default.
     Adapted only for the case where the backend is set to "pipeline".
     """,
@@ -119,7 +128,8 @@ from .common import do_parse, read_fn, pdf_suffixes, image_suffixes
     '--device',
     'device_mode',
     type=str,
-    help='Device mode for model inference, e.g., "cpu", "cuda", "cuda:0", "npu", "npu:0", "mps". Adapted only for the case where the backend is set to "pipeline". ',
+    help="""Device mode for model inference, e.g., "cpu", "cuda", "cuda:0", "npu", "npu:0", "mps".
+         Adapted only for the case where the backend is set to "pipeline" or "vlm-transformers".""",
     default=None,
 )
 @click.option(

+ 4 - 1
mineru/cli/gradio_app.py

@@ -13,6 +13,7 @@ from gradio_pdf import PDF
 from loguru import logger
 
 from mineru.cli.common import prepare_env, read_fn, aio_do_parse, pdf_suffixes, image_suffixes
+from mineru.utils.check_mac_env import is_mac_os_version_supported
 from mineru.utils.cli_parser import arg_parse
 from mineru.utils.hash_utils import str_sha256
 
@@ -273,7 +274,7 @@ def to_pdf(file_path):
 
 # 更新界面函数
 def update_interface(backend_choice):
-    if backend_choice in ["vlm-transformers", "vlm-vllm-async-engine"]:
+    if backend_choice in ["vlm-transformers", "vlm-vllm-async-engine", "vlm-mlx-engine"]:
         return gr.update(visible=False), gr.update(visible=False)
     elif backend_choice in ["vlm-http-client"]:
         return gr.update(visible=True), gr.update(visible=False)
@@ -381,6 +382,8 @@ def main(ctx,
                         preferred_option = "vlm-vllm-async-engine"
                     else:
                         drop_list = ["pipeline", "vlm-transformers", "vlm-http-client"]
+                        if is_mac_os_version_supported():
+                            drop_list.append("vlm-mlx-engine")
                         preferred_option = "pipeline"
                     backend = gr.Dropdown(drop_list, label="Backend", value=preferred_option)
                 with gr.Row(visible=False) as client_options:

+ 1 - 1
mineru/cli/models_download.py

@@ -21,7 +21,7 @@ def download_and_modify_json(url, local_filename, modifications):
     if os.path.exists(local_filename):
         data = json.load(open(local_filename))
         config_version = data.get('config_version', '0.0.0')
-        if config_version < '1.3.0':
+        if config_version < '1.3.1':
             data = download_json(url)
     else:
         data = download_json(url)
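The check above compares version strings lexicographically, which works for the `1.x.y` values involved here but is not order-safe in general (for example, `'1.10.0' < '1.3.1'` is true as strings). A hedged sketch of a version-aware comparison using `packaging`, which this release already imports in `check_mac_env.py`, reusing `data` and `download_json` from the function above:

```python
from packaging import version

config_version = data.get('config_version', '0.0.0')
if version.parse(config_version) < version.parse('1.3.1'):
    data = download_json(url)
```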

+ 0 - 1
mineru/model/ocr/paddleocr2pytorch/__init__.py

@@ -1 +0,0 @@
-# Copyright (c) Opendatalab. All rights reserved.

+ 1 - 1
mineru/model/ocr/paddleocr2pytorch/pytorch_paddle.py → mineru/model/ocr/pytorch_paddle.py

@@ -134,7 +134,7 @@ def get_model_params(lang, config):
         raise Exception (f'Language {lang} not supported')
 
 
-root_dir = os.path.join(Path(__file__).resolve().parent.parent.parent, 'utils')
+root_dir = os.path.join(Path(__file__).resolve().parent.parent, 'utils')
 
 
 class PytorchPaddleOCR(TextSystem):

+ 1 - 1
mineru/model/table/rec/RapidTable.py

@@ -11,7 +11,7 @@ from rapid_table import ModelType, RapidTable, RapidTableInput
 from rapid_table.utils import RapidTableOutput
 from tqdm import tqdm
 
-from mineru.model.ocr.paddleocr2pytorch.pytorch_paddle import PytorchPaddleOCR
+from mineru.model.ocr.pytorch_paddle import PytorchPaddleOCR
 from mineru.utils.enum_class import ModelPath
 from mineru.utils.models_download_utils import auto_download_and_get_model_root_path
 

+ 3 - 2
mineru/utils/block_sort.py

@@ -179,13 +179,14 @@ def insert_lines_into_block(block_bbox, line_height, page_w, page_h):
 def model_init(model_name: str):
     from transformers import LayoutLMv3ForTokenClassification
     device_name = get_device()
+    device = torch.device(device_name)
     bf_16_support = False
     if device_name.startswith("cuda"):
-        bf_16_support = torch.cuda.is_bf16_supported()
+        if torch.cuda.get_device_properties(device).major >= 8:
+            bf_16_support = True
     elif device_name.startswith("mps"):
         bf_16_support = True
 
-    device = torch.device(device_name)
     if model_name == 'layoutreader':
         # 检测modelscope的缓存目录是否存在
         layoutreader_model_dir = os.path.join(auto_download_and_get_model_root_path(ModelPath.layout_reader), ModelPath.layout_reader)

+ 30 - 0
mineru/utils/check_mac_env.py

@@ -0,0 +1,30 @@
+# Copyright (c) Opendatalab. All rights reserved.
+import platform
+
+from packaging import version
+
+
+# Detect if the current environment is a Mac computer
+def is_mac_environment() -> bool:
+    return platform.system() == "Darwin"
+
+
+# Detect if CPU is Apple Silicon architecture
+def is_apple_silicon_cpu() -> bool:
+    return platform.machine() in ["arm64", "aarch64"]
+
+
+# If Mac computer with Apple Silicon architecture, check if macOS version is 13.5 or above
+def is_mac_os_version_supported(min_version: str = "13.5") -> bool:
+    if not is_mac_environment() or not is_apple_silicon_cpu():
+        return False
+    mac_version = platform.mac_ver()[0]
+    if not mac_version:
+        return False
+    # print("Mac OS Version:", mac_version)
+    return version.parse(mac_version) >= version.parse(min_version)
+
+if __name__ == "__main__":
+    print("Is Mac Environment:", is_mac_environment())
+    print("Is Apple Silicon CPU:", is_apple_silicon_cpu())
+    print("Is Mac OS Version Supported (>=13.5):", is_mac_os_version_supported())

+ 13 - 8
mineru/utils/llm_aided.py

@@ -84,16 +84,21 @@ Corrected title list:
     max_retries = 3
     dict_completion = None
 
+    # Build API call parameters
+    api_params = {
+        "model": title_aided_config["model"],
+        "messages": [{'role': 'user', 'content': title_optimize_prompt}],
+        "temperature": 0.7,
+        "stream": True,
+    }
+
+    # Only add extra_body when explicitly specified in config
+    if "enable_thinking" in title_aided_config:
+        api_params["extra_body"] = {"enable_thinking": title_aided_config["enable_thinking"]}
+
     while retry_count < max_retries:
         try:
-            completion = client.chat.completions.create(
-                model=title_aided_config["model"],
-                messages=[
-                    {'role': 'user', 'content': title_optimize_prompt}],
-                extra_body={"enable_thinking": False},
-                temperature=0.7,
-                stream=True,
-            )
+            completion = client.chat.completions.create(**api_params)
             content_pieces = []
             for chunk in completion:
                 if chunk.choices and chunk.choices[0].delta.content is not None:
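To make the interaction between `mineru.json` and the request concrete, a small illustrative helper (hypothetical, mirroring the `api_params` logic above): when `enable_thinking` is present in the config it is forwarded through `extra_body`; once the line is deleted from the config, no `extra_body` is sent at all, which is what providers that reject the parameter need.

```python
def build_api_params(title_aided_config: dict, prompt: str) -> dict:
    # Hypothetical helper mirroring the logic above.
    api_params = {
        "model": title_aided_config["model"],
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "stream": True,
    }
    # extra_body is attached only when the config explicitly carries the key.
    if "enable_thinking" in title_aided_config:
        api_params["extra_body"] = {"enable_thinking": title_aided_config["enable_thinking"]}
    return api_params


# Default template: the key is present, so extra_body is sent.
build_api_params({"model": "qwen3-next-80b-a3b-instruct", "enable_thinking": False}, "prompt")
# Key deleted from mineru.json: extra_body is omitted entirely.
build_api_params({"model": "qwen3-next-80b-a3b-instruct"}, "prompt")
```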

+ 5 - 1
pyproject.toml

@@ -39,7 +39,7 @@ dependencies = [
     "openai>=1.70.0,<3",
     "beautifulsoup4>=4.13.5,<5",
     "magika>=0.6.2,<0.7.0",
-    "mineru-vl-utils>=0.1.14,<1",
+    "mineru-vl-utils>=0.1.15,<1",
 ]
 
 [project.optional-dependencies]
@@ -58,6 +58,9 @@ vlm = [
 vllm = [
     "vllm>=0.10.1.1,<0.12",
 ]
+mlx = [
+    "mlx-vlm>=0.3.3,<0.4",
+]
 pipeline = [
     "matplotlib>=3.10,<4",
     "ultralytics>=8.3.48,<9",
@@ -87,6 +90,7 @@ core = [
     "mineru[pipeline]",
     "mineru[api]",
     "mineru[gradio]",
+    "mineru[mlx] ; sys_platform == 'darwin'",
 ]
 all = [
     "mineru[core]",