
Merge pull request #3886 from myhloli/dev

Dev
Xiaomeng Zhao 2 weeks ago
parent
commit
ef71228e1a

+ 3 - 3
.github/ISSUE_TEMPLATE/bug_report.yml

@@ -122,9 +122,9 @@ body:
       #multiple: false
       options:
         -
-        - "<2.2.0"
-        - "2.2.x"
-        - ">=2.5"
+        - "`<2.2.0`"
+        - "`2.2.x`"
+        - "`>=2.5`"
     validations:
       required: true
 

+ 11 - 7
README.md

@@ -44,6 +44,10 @@
 </div>
 
 # Changelog
+- 2025/10/31 2.6.3 Release
+  - Added support for a new backend `vlm-mlx-engine`, enabling MLX-accelerated inference for the MinerU2.5 model on Apple Silicon devices. Compared to the `vlm-transformers` backend, `vlm-mlx-engine` delivers a 100%–200% speed improvement.
+  - Bug fixes: #3849, #3859
+
 - 2025/10/24 2.6.2 Release
   - `pipeline` backend optimizations
     - Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may cause a slight decrease in MFR speed and failures in recognizing some long formulas. It is recommended to enable it only when parsing Chinese formulas is needed. To disable this feature, set the environment variable to `0`.
@@ -583,7 +587,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
 - Automatically recognize and convert formulas in the document to LaTeX format.
 - Automatically recognize and convert tables in the document to HTML format.
 - Automatically detect scanned PDFs and garbled PDFs and enable OCR functionality.
-- OCR supports detection and recognition of 84 languages.
+- OCR supports detection and recognition of 109 languages.
 - Supports multiple output formats, such as multimodal and NLP Markdown, JSON sorted by reading order, and rich intermediate formats.
 - Supports various visualization results, including layout visualization and span visualization, for efficient confirmation of output quality.
 - Supports running in a pure CPU environment, and also supports GPU(CUDA)/NPU(CANN)/MPS acceleration
@@ -640,7 +644,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
             <td>Good compatibility, <br>but slower</td>
             <td>Faster than transformers</td>
             <td>Fast, compatible with the vLLM ecosystem</td>
-            <td>No configuration required, suitable for OpenAI-compatible servers<sup>5</sup></td>
+            <td>Suitable for OpenAI-compatible servers<sup>5</sup></td>
         </tr>
         <tr>
             <th>Operating System</th>
@@ -678,11 +682,11 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
     </tbody>
 </table>
  
-<sup>1</sup> Accuracy metric is the End-to-End Evaluation Overall score of OmniDocBench (v1.5)  
-<sup>2</sup> Linux supports only distributions released in 2019 or later  
-<sup>3</sup> Requires macOS 13.5 or later  
-<sup>4</sup> Windows vLLM support via WSL2  
-<sup>5</sup> Servers compatible with the OpenAI API, such as `vLLM`/`SGLang`/`LMDeploy`, etc.
+<sup>1</sup> Accuracy metric is the End-to-End Evaluation Overall score of OmniDocBench (v1.5), tested on the latest `MinerU` version.   
+<sup>2</sup> Linux supports only distributions released in 2019 or later.  
+<sup>3</sup> MLX requires macOS 13.5 or later; macOS 14.0 or later is recommended.  
+<sup>4</sup> Windows vLLM support via WSL2 (Windows Subsystem for Linux).  
+<sup>5</sup> Servers compatible with the OpenAI API, such as local or remote model services deployed via inference frameworks like `vLLM`, `SGLang`, or `LMDeploy`.
 
 
 ### Install MinerU
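
For readers skimming the 2.6.3 changelog above, here is a minimal sketch of driving the new MLX backend from Python. It assumes the `mineru` CLI accepts `-p`/`-o`/`-b` flags as in the existing documentation and that `-b vlm-mlx-engine` is the CLI spelling of the backend announced in the changelog; `demo.pdf` and `output` are placeholder paths.

```python
# Hedged sketch: run MinerU with the MLX backend announced in 2.6.3.
# The -p / -o / -b flags and the "vlm-mlx-engine" value are assumptions
# based on the existing CLI conventions; adjust to your installed version.
import subprocess

subprocess.run(
    [
        "mineru",
        "-p", "demo.pdf",        # placeholder input document
        "-o", "output",          # placeholder output directory
        "-b", "vlm-mlx-engine",  # MLX-accelerated backend for Apple Silicon
    ],
    check=True,
)
```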

+ 11 - 7
README_zh-CN.md

@@ -44,6 +44,10 @@
 </div>
 
 # 更新记录
+- 2025/10/31 2.6.3 发布
+  - 增加新后端`vlm-mlx-engine`支持,在Apple Silicon设备上支持使用`MLX`加速`MinerU2.5`模型推理,相比`vlm-transformers`后端,`vlm-mlx-engine`后端速度提升100%~200%。
+  - bug修复:  #3849  #3859
+
 - 2025/10/24 2.6.2 发布
   - `pipline`后端优化
     - 增加对中文公式的实验性支持,可通过配置环境变量`export MINERU_FORMULA_CH_SUPPORT=1`开启。该功能可能会导致MFR速率略微下降、部分长公式识别失败等问题,建议仅在需要解析中文公式的场景下开启。如需关闭该功能,可将环境变量设置为`0`。
@@ -570,7 +574,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
 - 自动识别并转换文档中的公式为LaTeX格式
 - 自动识别并转换文档中的表格为HTML格式
 - 自动检测扫描版PDF和乱码PDF,并启用OCR功能
-- OCR支持84种语言的检测与识别
+- OCR支持109种语言的检测与识别
 - 支持多种输出格式,如多模态与NLP的Markdown、按阅读顺序排序的JSON、含有丰富信息的中间格式等
 - 支持多种可视化结果,包括layout可视化、span可视化等,便于高效确认输出效果与质检
 - 支持纯CPU环境运行,并支持 GPU(CUDA)/NPU(CANN)/MPS 加速
@@ -627,7 +631,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
             <td>兼容性好, 速度较慢</td>
             <td>比transformers快</td>
             <td>速度快, 兼容vllm生态</td>
-            <td>无配置要求, 适用于OpenAI兼容服务器<sup>5</sup></td>
+            <td>适用于OpenAI兼容服务器<sup>5</sup></td>
         </tr>
         <tr>
             <th>操作系统</th>
@@ -665,11 +669,11 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
     </tbody>
 </table> 
 
-<sup>1</sup> 精度指标为OmniDocBench (v1.5)的End-to-End Evaluation Overall分数  
-<sup>2</sup> Linux仅支持2019年及以后发行版  
-<sup>3</sup> 需macOS 13.5及以上版本  
-<sup>4</sup> 通过WSL2实现Windows vLLM支持  
-<sup>5</sup> 兼容OpenAI API的服务器,如`vLLM`/`SGLang`/`LMDeploy`等
+<sup>1</sup> 精度指标为OmniDocBench (v1.5)的End-to-End Evaluation Overall分数,基于`MinerU`最新版本测试  
+<sup>2</sup> Linux仅支持2019年及以后发行版
+<sup>3</sup> MLX需macOS 13.5及以上版本支持,推荐14.0以上版本使用
+<sup>4</sup> Windows vLLM通过WSL2(适用于 Linux 的 Windows 子系统)实现支持  
+<sup>5</sup> 兼容OpenAI API的服务器,如通过`vLLM`/`SGLang`/`LMDeploy`等推理框架部署的本地模型服务器或远程模型服务
 
 > [!TIP]
 > 除以上主流环境与平台外,我们也收录了一些社区用户反馈的其他平台支持情况,详情请参考[其他加速卡适配](https://opendatalab.github.io/MinerU/zh/usage/)。  

+ 62 - 34
docs/en/quick_start/index.md

@@ -27,41 +27,69 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
 > In non-mainstream environments, due to the diversity of hardware and software configurations, as well as compatibility issues with third-party dependencies, we cannot guarantee 100% usability of the project. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first, as most issues have corresponding solutions in the FAQ. Additionally, we encourage community feedback on issues so that we can gradually expand our support range.
 
 <table border="1">
-    <tr>
-        <td>Parsing Backend</td>
-        <td>pipeline</td>
-        <td>vlm-transformers</td>
-        <td>vlm-vllm</td>
-    </tr>
-    <tr>
-        <td>Operating System</td>
-        <td>Linux / Windows / macOS</td>
-        <td>Linux / Windows</td>
-        <td>Linux / Windows (via WSL2)</td>
-    </tr>
-    <tr>
-        <td>CPU Inference Support</td>
-        <td>✅</td>
-        <td colspan="2">❌</td>
-    </tr>
-    <tr>
-        <td>GPU Requirements</td>
-        <td>Turing architecture and later, 6GB+ VRAM or Apple Silicon</td>
-        <td colspan="2">Turing architecture and later, 8GB+ VRAM</td>
-    </tr>
-    <tr>
-        <td>Memory Requirements</td>
-        <td colspan="3">Minimum 16GB+, recommended 32GB+</td>
-    </tr>
-    <tr>
-        <td>Disk Space Requirements</td>
-        <td colspan="3">20GB+, SSD recommended</td>
-    </tr>
-    <tr>
-        <td>Python Version</td>
-        <td colspan="3">3.10-3.13</td>
-    </tr>
+    <thead>
+        <tr>
+            <th rowspan="2">Parsing Backend</th>
+            <th rowspan="2">pipeline <br> (Accuracy<sup>1</sup> 82+)</th>
+            <th colspan="4">vlm (Accuracy<sup>1</sup> 90+)</th>
+        </tr>
+        <tr>
+            <th>transformers</th>
+            <th>mlx-engine</th>
+            <th>vllm-engine / <br>vllm-async-engine</th>
+            <th>http-client</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <th>Backend Features</th>
+            <td>Fast, no hallucinations</td>
+            <td>Good compatibility, <br>but slower</td>
+            <td>Faster than transformers</td>
+            <td>Fast, compatible with the vLLM ecosystem</td>
+            <td>Suitable for OpenAI-compatible servers<sup>5</sup></td>
+        </tr>
+        <tr>
+            <th>Operating System</th>
+            <td colspan="2" style="text-align:center;">Linux<sup>2</sup> / Windows / macOS</td>
+            <td style="text-align:center;">macOS<sup>3</sup></td>
+            <td style="text-align:center;">Linux<sup>2</sup> / Windows<sup>4</sup> </td>
+            <td>Any</td>
+        </tr>
+        <tr>
+            <th>CPU inference support</th>
+            <td colspan="2" style="text-align:center;">✅</td>
+            <td colspan="2" style="text-align:center;">❌</td>
+            <td>Not required</td>
+        </tr>
+        <tr>
+            <th>GPU Requirements</th><td colspan="2" style="text-align:center;">Volta or later architectures, 6 GB VRAM or more, or Apple Silicon</td>
+            <td>Apple Silicon</td>
+            <td>Volta or later architectures, 8 GB VRAM or more</td>
+            <td>Not required</td>
+        </tr>
+        <tr>
+            <th>Memory Requirements</th>
+            <td colspan="4" style="text-align:center;">Minimum 16 GB, 32 GB recommended</td>
+            <td>8 GB</td>
+        </tr>
+        <tr>
+            <th>Disk Space Requirements</th>
+            <td colspan="4" style="text-align:center;">20 GB or more, SSD recommended</td>
+            <td>2 GB</td>
+        </tr>
+        <tr>
+            <th>Python Version</th>
+            <td colspan="5" style="text-align:center;">3.10-3.13</td>
+        </tr>
+    </tbody>
 </table>
+ 
+<sup>1</sup> Accuracy metric is the End-to-End Evaluation Overall score of OmniDocBench (v1.5), tested on the latest `MinerU` version.   
+<sup>2</sup> Linux supports only distributions released in 2019 or later.  
+<sup>3</sup> MLX requires macOS 13.5 or later; macOS 14.0 or later is recommended.  
+<sup>4</sup> Windows vLLM support via WSL2 (Windows Subsystem for Linux).  
+<sup>5</sup> Servers compatible with the OpenAI API, such as local or remote model services deployed via inference frameworks like `vLLM`, `SGLang`, or `LMDeploy`.  
 
 ### Install MinerU
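
As a reading aid for the comparison table above, here is a small, purely illustrative helper that maps the local platform to one of the backend columns. It is not part of MinerU, and the exact backend strings other than `vlm-mlx-engine` and `vlm-transformers` (which appear in the changelog and docs) are assumptions.

```python
# Illustrative helper (not part of MinerU): pick a backend string from the
# table above based on the local platform. Exact CLI values may differ.
import platform


def pick_backend(has_cuda_gpu: bool, remote_server: bool = False) -> str:
    if remote_server:
        return "vlm-http-client"     # any OS; model served elsewhere via an OpenAI-compatible API
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "vlm-mlx-engine"      # Apple Silicon, macOS 13.5+
    if has_cuda_gpu:
        return "vlm-vllm-engine"     # Volta or later GPU with 8 GB+ VRAM
    return "pipeline"                # CPU-friendly fallback
```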
 

+ 2 - 1
docs/en/usage/quick_usage.md

@@ -83,8 +83,9 @@ Here are some available configuration options:
   
 - `llm-aided-config`:
     * Used to configure parameters for LLM-assisted title hierarchy
-    * Compatible with all LLM models supporting `openai protocol`, defaults to using Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model. 
+    * Compatible with all LLM models supporting `openai protocol`, defaults to using Alibaba Cloud Bailian's `qwen3-next-80b-a3b-instruct` model. 
     * You need to configure your own API key and set `enable` to `true` to enable this feature.
+    * If your API provider does not support the `enable_thinking` parameter, please manually remove it.
   
 - `models-dir`: 
     * Used to specify local model storage directory
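
To make the `llm-aided-config` change above concrete, here is an illustrative sketch of the relevant block of `mineru.json`, written as a Python dict. Only the `llm-aided-config` name, the default model, the API key, `enable`, and `enable_thinking` come from the text; the `title_aided` and `base_url` keys and the endpoint URL are assumptions, so check your generated config for the exact schema.

```python
# Hedged sketch of the llm-aided-config block, expressed as a Python dict
# mirroring mineru.json. Keys marked "assumed" may differ in the real file.
llm_aided_config = {
    "llm-aided-config": {
        "title_aided": {                                  # assumed sub-key
            "api_key": "sk-...",                          # your own API key
            "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
            "model": "qwen3-next-80b-a3b-instruct",       # new default model
            "enable_thinking": False,                     # remove this key if your provider rejects it
            "enable": True,                               # must be true to activate the feature
        }
    }
}
```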

+ 69 - 36
docs/zh/quick_start/index.md

@@ -26,42 +26,75 @@
 >
 > 在非主线环境中,由于硬件、软件配置的多样性,以及第三方依赖项的兼容性问题,我们无法100%保证项目的完全可用性。因此,对于希望在非推荐环境中使用本项目的用户,我们建议先仔细阅读文档以及FAQ,大多数问题已经在FAQ中有对应的解决方案,除此之外我们鼓励社区反馈问题,以便我们能够逐步扩大支持范围。
 
-<table border="1">
-    <tr>
-        <td>解析后端</td>
-        <td>pipeline</td>
-        <td>vlm-transformers</td>
-        <td>vlm-vllm</td>
-    </tr>
-    <tr>
-        <td>操作系统</td>
-        <td>Linux / Windows / macOS</td>
-        <td>Linux / Windows</td>
-        <td>Linux / Windows (via WSL2)</td>
-    </tr>
-    <tr>
-        <td>CPU推理支持</td>
-        <td>✅</td>
-        <td colspan="2">❌</td>
-    </tr>
-    <tr>
-        <td>GPU要求</td>
-        <td>Turing及以后架构,6G显存以上或Apple Silicon</td>
-        <td colspan="2">Turing及以后架构,8G显存以上</td>
-    </tr>
-    <tr>
-        <td>内存要求</td>
-        <td colspan="3">最低16G以上,推荐32G以上</td>
-    </tr>
-    <tr>
-        <td>磁盘空间要求</td>
-        <td colspan="3">20G以上,推荐使用SSD</td>
-    </tr>
-    <tr>
-        <td>python版本</td>
-        <td colspan="3">3.10-3.13</td>
-    </tr>
-</table>
+<table>
+    <thead>
+        <tr>
+            <th rowspan="2">解析后端</th>
+            <th rowspan="2">pipeline <br> (精度<sup>1</sup> 82+)</th>
+            <th colspan="4">vlm (精度<sup>1</sup> 90+)</th>
+        </tr>
+        <tr>
+            <th>transformers</th>
+            <th>mlx-engine</th>
+            <th>vllm-engine / <br>vllm-async-engine</th>
+            <th>http-client</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <th>后端特性</th>
+            <td>速度快, 无幻觉</td>
+            <td>兼容性好, 速度较慢</td>
+            <td>比transformers快</td>
+            <td>速度快, 兼容vllm生态</td>
+            <td>适用于OpenAI兼容服务器<sup>5</sup></td>
+        </tr>
+        <tr>
+            <th>操作系统</th>
+            <td colspan="2" style="text-align:center;">Linux<sup>2</sup> / Windows / macOS</td>
+            <td style="text-align:center;">macOS<sup>3</sup></td>
+            <td style="text-align:center;">Linux<sup>2</sup> / Windows<sup>4</sup> </td>
+            <td>不限</td>
+        </tr>
+        <tr>
+            <th>CPU推理支持</th>
+            <td colspan="2" style="text-align:center;">✅</td>
+            <td colspan="2" style="text-align:center;">❌</td>
+            <td >不需要</td>
+        </tr>
+        <tr>
+            <th>GPU要求</th><td colspan="2" style="text-align:center;">Volta及以后架构, 6G显存以上或Apple Silicon</td>
+            <td>Apple Silicon</td>
+            <td>Volta及以后架构, 8G显存以上</td>
+            <td>不需要</td>
+        </tr>
+        <tr>
+            <th>内存要求</th>
+            <td colspan="4" style="text-align:center;">最低16GB以上, 推荐32GB以上</td>
+            <td>8GB</td>
+        </tr>
+        <tr>
+            <th>磁盘空间要求</th>
+            <td colspan="4" style="text-align:center;">20GB以上, 推荐使用SSD</td>
+            <td>2GB</td>
+        </tr>
+        <tr>
+            <th>python版本</th>
+            <td colspan="5" style="text-align:center;">3.10-3.13</td>
+        </tr>
+    </tbody>
+</table> 
+
+<sup>1</sup> 精度指标为OmniDocBench (v1.5)的End-to-End Evaluation Overall分数,基于`MinerU`最新版本测试  
+<sup>2</sup> Linux仅支持2019年及以后发行版  
+<sup>3</sup> MLX需macOS 13.5及以上版本支持,推荐14.0以上版本使用  
+<sup>4</sup> Windows vLLM通过WSL2(适用于 Linux 的 Windows 子系统)实现支持  
+<sup>5</sup> 兼容OpenAI API的服务器,如通过`vLLM`/`SGLang`/`LMDeploy`等推理框架部署的本地模型服务器或远程模型服务  
+
+> [!TIP]
+> 除以上主流环境与平台外,我们也收录了一些社区用户反馈的其他平台支持情况,详情请参考[其他加速卡适配](https://opendatalab.github.io/MinerU/zh/usage/)。  
+> 如果您有意将自己的环境适配经验分享给社区,欢迎通过[show-and-tell](https://github.com/opendatalab/MinerU/discussions/categories/show-and-tell)提交或提交PR至[其他加速卡适配](https://github.com/opendatalab/MinerU/tree/master/docs/zh/usage/acceleration_cards)文档。
+
 
 ### 安装 MinerU
 

+ 3 - 2
docs/zh/usage/quick_usage.md

@@ -82,8 +82,9 @@ MinerU 现已实现开箱即用,但也支持通过配置文件扩展功能。
   
 - `llm-aided-config`:
     * 用于配置 LLM 辅助标题分级的相关参数,兼容所有支持`openai协议`的 LLM 模型
-    * 默认使用`阿里云百炼`的`qwen2.5-32b-instruct`模型
-    * 您需要自行配置 API 密钥并将`enable`设置为`true`来启用此功能。
+    * 默认使用`阿里云百炼`的`qwen3-next-80b-a3b-instruct`模型
+    * 您需要自行配置 API 密钥并将`enable`设置为`true`来启用此功能
+    * 如果您的api供应商不支持`enable_thinking`参数,请手动将该参数删除
   
 - `models-dir`:
     * 用于指定本地模型存储目录,请为`pipeline`和`vlm`后端分别指定模型目录,

+ 20 - 43
mineru/backend/pipeline/batch_analyze.py

@@ -281,28 +281,20 @@ class BatchAnalyze:
 
                 # 按分辨率分组并同时完成padding
                 # RESOLUTION_GROUP_STRIDE = 32
-                RESOLUTION_GROUP_STRIDE = 64  # 定义分辨率分组的步进值
+                RESOLUTION_GROUP_STRIDE = 64
 
                 resolution_groups = defaultdict(list)
                 for crop_info in lang_crop_list:
                     cropped_img = crop_info[0]
                     h, w = cropped_img.shape[:2]
-                    # 使用更大的分组容差,减少分组数量
-                    # 将尺寸标准化到32的倍数
-                    normalized_h = ((h + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE  # 向上取整到32的倍数
-                    normalized_w = ((w + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
-                    group_key = (normalized_h, normalized_w)
+                    # 直接计算目标尺寸并用作分组键
+                    target_h = ((h + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
+                    target_w = ((w + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
+                    group_key = (target_h, target_w)
                     resolution_groups[group_key].append(crop_info)
 
                 # 对每个分辨率组进行批处理
-                for group_key, group_crops in tqdm(resolution_groups.items(), desc=f"OCR-det {lang}"):
-
-                    # 计算目标尺寸(组内最大尺寸,向上取整到32的倍数)
-                    max_h = max(crop_info[0].shape[0] for crop_info in group_crops)
-                    max_w = max(crop_info[0].shape[1] for crop_info in group_crops)
-                    target_h = ((max_h + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
-                    target_w = ((max_w + RESOLUTION_GROUP_STRIDE - 1) // RESOLUTION_GROUP_STRIDE) * RESOLUTION_GROUP_STRIDE
-
+                for (target_h, target_w), group_crops in tqdm(resolution_groups.items(), desc=f"OCR-det {lang}"):
                     # 对所有图像进行padding到统一尺寸
                     batch_images = []
                     for crop_info in group_crops:
@@ -310,49 +302,34 @@ class BatchAnalyze:
                         h, w = img.shape[:2]
                         # 创建目标尺寸的白色背景
                         padded_img = np.ones((target_h, target_w, 3), dtype=np.uint8) * 255
-                        # 将原图像粘贴到左上角
                         padded_img[:h, :w] = img
                         batch_images.append(padded_img)
 
                     # 批处理检测
-                    det_batch_size = min(len(batch_images), self.batch_ratio * OCR_DET_BASE_BATCH_SIZE)  # 增加批处理大小
-                    # logger.debug(f"OCR-det batch: {det_batch_size} images, target size: {target_h}x{target_w}")
+                    det_batch_size = min(len(batch_images), self.batch_ratio * OCR_DET_BASE_BATCH_SIZE)
                     batch_results = ocr_model.text_detector.batch_predict(batch_images, det_batch_size)
 
                     # 处理批处理结果
-                    for i, (crop_info, (dt_boxes, elapse)) in enumerate(zip(group_crops, batch_results)):
+                    for crop_info, (dt_boxes, _) in zip(group_crops, batch_results):
                         bgr_image, useful_list, ocr_res_list_dict, res, adjusted_mfdetrec_res, _lang = crop_info
 
                         if dt_boxes is not None and len(dt_boxes) > 0:
-                            # 直接应用原始OCR流程中的关键处理步骤
-
-                            # 1. 排序检测框
-                            if len(dt_boxes) > 0:
-                                dt_boxes_sorted = sorted_boxes(dt_boxes)
-                            else:
-                                dt_boxes_sorted = []
-
-                            # 2. 合并相邻检测框
-                            if dt_boxes_sorted:
-                                dt_boxes_merged = merge_det_boxes(dt_boxes_sorted)
-                            else:
-                                dt_boxes_merged = []
-
-                            # 3. 根据公式位置更新检测框(关键步骤!)
-                            if dt_boxes_merged and adjusted_mfdetrec_res:
-                                dt_boxes_final = update_det_boxes(dt_boxes_merged, adjusted_mfdetrec_res)
-                            else:
-                                dt_boxes_final = dt_boxes_merged
-
-                            # 构造OCR结果格式
-                            ocr_res = [box.tolist() if hasattr(box, 'tolist') else box for box in dt_boxes_final]
-
-                            if ocr_res:
+                            # 处理检测框
+                            dt_boxes_sorted = sorted_boxes(dt_boxes)
+                            dt_boxes_merged = merge_det_boxes(dt_boxes_sorted) if dt_boxes_sorted else []
+
+                            # 根据公式位置更新检测框
+                            dt_boxes_final = (update_det_boxes(dt_boxes_merged, adjusted_mfdetrec_res)
+                                              if dt_boxes_merged and adjusted_mfdetrec_res
+                                              else dt_boxes_merged)
+
+                            if dt_boxes_final:
+                                ocr_res = [box.tolist() if hasattr(box, 'tolist') else box for box in dt_boxes_final]
                                 ocr_result_list = get_ocr_result_list(
                                     ocr_res, useful_list, ocr_res_list_dict['ocr_enable'], bgr_image, _lang
                                 )
-
                                 ocr_res_list_dict['layout_res'].extend(ocr_result_list)
+
         else:
             # 原始单张处理模式
             for ocr_res_list_dict in tqdm(ocr_res_list_all_page, desc="OCR-det Predict"):
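
The rewritten loop above derives each group's target size directly from the rounded-up crop dimensions instead of re-scanning every group for its maximum. A standalone sketch of that grouping-and-padding idea, using illustrative names rather than the module's actual API, might look like this:

```python
# Sketch of the grouping-and-padding step: crops are bucketed by their
# dimensions rounded up to the group stride, so every image in a bucket
# can be pasted onto the same white canvas and batched together.
from collections import defaultdict

import numpy as np

RESOLUTION_GROUP_STRIDE = 64


def round_up(x: int, stride: int = RESOLUTION_GROUP_STRIDE) -> int:
    return ((x + stride - 1) // stride) * stride


def group_and_pad(crops):
    """Group BGR crops by rounded-up size and pad each group to that size."""
    groups = defaultdict(list)
    for img in crops:
        h, w = img.shape[:2]
        groups[(round_up(h), round_up(w))].append(img)

    batches = {}
    for (target_h, target_w), imgs in groups.items():
        padded = []
        for img in imgs:
            h, w = img.shape[:2]
            canvas = np.full((target_h, target_w, 3), 255, dtype=np.uint8)  # white background
            canvas[:h, :w] = img  # paste the original crop at the top-left corner
            padded.append(canvas)
        batches[(target_h, target_w)] = padded
    return batches
```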

+ 4 - 0
mineru/backend/vlm/vlm_analyze.py

@@ -8,6 +8,7 @@ from .utils import enable_custom_logits_processors, set_default_gpu_memory_utili
 from .model_output_to_middle_json import result_to_middle_json
 from ...data.data_reader_writer import DataWriter
 from mineru.utils.pdf_image_tools import load_images_from_pdf
+from ...utils.check_mac_env import is_mac_os_version_supported
 from ...utils.config_reader import get_device
 
 from ...utils.enum_class import ImageType
@@ -76,6 +77,9 @@ class ModelSingleton:
                     if batch_size == 0:
                         batch_size = set_default_batch_size()
                 elif backend == "mlx-engine":
+                    mlx_supported = is_mac_os_version_supported()
+                    if not mlx_supported:
+                        raise EnvironmentError("mlx-engine backend is only supported on macOS 13.5+ with Apple Silicon.")
                     try:
                         from mlx_vlm import load as mlx_load
                     except ImportError:
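
The new guard calls `is_mac_os_version_supported()` from `mineru/utils/check_mac_env.py`, which is not shown in this diff. A hypothetical sketch of what such a check could look like (the real helper may differ) is:

```python
# Hypothetical sketch of is_mac_os_version_supported(); the actual helper
# in mineru/utils/check_mac_env.py may be implemented differently.
import platform


def is_mac_os_version_supported(min_version: tuple[int, int] = (13, 5)) -> bool:
    """Return True only on macOS at or above min_version (MLX needs 13.5+)."""
    if platform.system() != "Darwin":
        return False
    release = platform.mac_ver()[0]  # e.g. "14.4.1"; empty string off macOS
    if not release:
        return False
    parts = tuple(int(p) for p in release.split(".")[:2])
    return parts >= min_version
```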

+ 1 - 1
mineru/model/ocr/pytorch_paddle.py

@@ -134,7 +134,7 @@ def get_model_params(lang, config):
         raise Exception (f'Language {lang} not supported')
 
 
-root_dir = os.path.join(Path(__file__).resolve().parent.parent.parent, 'utils')
+root_dir = os.path.join(Path(__file__).resolve().parent.parent, 'utils')
 
 
 class PytorchPaddleOCR(TextSystem):
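
The one-line fix above only changes how far the `parent` chain climbs. Assuming the file layout implied by the diff header (`mineru/model/ocr/pytorch_paddle.py`), a quick illustration of what each chain resolves to:

```python
# Illustration only: how the parent chain resolves for this file, assuming
# the path shown in the diff header. Run from the repository root.
from pathlib import Path

p = Path("mineru/model/ocr/pytorch_paddle.py").resolve()
print(p.parent)                # .../mineru/model/ocr
print(p.parent.parent)         # .../mineru/model   -> joined with 'utils' after the fix
print(p.parent.parent.parent)  # .../mineru         -> the directory the old code pointed at
```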

+ 1 - 1
pyproject.toml

@@ -39,7 +39,7 @@ dependencies = [
     "openai>=1.70.0,<3",
     "beautifulsoup4>=4.13.5,<5",
     "magika>=0.6.2,<0.7.0",
-    "mineru-vl-utils>=0.1.14,<1",
+    "mineru-vl-utils>=0.1.15,<1",
 ]
 
 [project.optional-dependencies]