Merge remote-tracking branch 'origin/dev' into dev

myhloli, 2 weeks ago
commit 1c0d4b8bc6

+ 4 - 0
README.md

@@ -44,6 +44,10 @@
 </div>
 
 # Changelog
+- 2025/10/31 2.6.3 Release
+  - Added support for a new backend `vlm-mlx-engine`, enabling MLX-accelerated inference for the MinerU2.5 model on Apple Silicon devices. Compared to the `vlm-transformers` backend, `vlm-mlx-engine` delivers a 100%–200% speed improvement.
+  - Bug fixes: #3849, #3859
+
 - 2025/10/24 2.6.2 Release
   - `pipeline` backend optimizations
     - Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may cause a slight decrease in MFR speed and failures in recognizing some long formulas. It is recommended to enable it only when parsing Chinese formulas is needed. To disable this feature, set the environment variable to `0`.

+ 1 - 0
README_zh-CN.md

@@ -46,6 +46,7 @@
 # 更新记录
 - 2025/10/31 2.6.3 发布
   - 增加新后端`vlm-mlx-engine`支持,在Apple Silicon设备上支持使用`MLX`加速`MinerU2.5`模型推理,相比`vlm-transformers`后端,`vlm-mlx-engine`后端速度提升100%~200%。
+  - bug修复:  #3849  #3859
 
 - 2025/10/24 2.6.2 发布
   - `pipline`后端优化

+ 1 - 1
docs/en/index.md

@@ -57,7 +57,7 @@ Compared to well-known commercial products domestically and internationally, Min
 - Automatically identify and convert formulas in documents to LaTeX format
 - Automatically identify and convert tables in documents to HTML format
 - Automatically detect scanned PDFs and garbled PDFs, and enable OCR functionality
-- OCR supports detection and recognition of 84 languages
+- OCR supports detection and recognition of 109 languages
 - Support multiple output formats, such as multimodal and NLP Markdown, reading-order-sorted JSON, and information-rich intermediate formats
 - Support multiple visualization results, including layout visualization, span visualization, etc., for efficient confirmation of output effects and quality inspection
 - Support pure CPU environment operation, and support GPU(CUDA)/NPU(CANN)/MPS acceleration

+ 20 - 1
docs/en/usage/quick_usage.md

@@ -85,7 +85,26 @@ Here are some available configuration options:
     * Used to configure parameters for LLM-assisted title hierarchy
     * Compatible with all LLM models supporting `openai protocol`, defaults to using Alibaba Cloud Bailian's `qwen3-next-80b-a3b-instruct` model. 
     * You need to configure your own API key and set `enable` to `true` to enable this feature.
-    * If your API provider does not support the enable_thinking parameter, please manually remove it.
+    * If your API provider does not support the `enable_thinking` parameter, please manually remove it.
+        * For example, in your configuration file, the `llm-aided-config` section may look like:
+          ```json
+          "llm-aided-config": {
+             "api_key": "your_api_key",
+             "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+             "model": "qwen3-next-80b-a3b-instruct",
+             "enable_thinking": false,
+             "enable": false
+          }
+          ```
+        * To remove the `enable_thinking` parameter, simply delete the line containing `"enable_thinking": false`, resulting in:
+          ```json
+          "llm-aided-config": {
+             "api_key": "your_api_key",
+             "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+             "model": "qwen3-next-80b-a3b-instruct",
+             "enable": false
+          }
+          ```
   
 - `models-dir`: 
     * Used to specify local model storage directory

+ 1 - 1
docs/zh/index.md

@@ -56,7 +56,7 @@ MinerU诞生于[书生-浦语](https://github.com/InternLM/InternLM)的预训练
 - 自动识别并转换文档中的公式为LaTeX格式
 - 自动识别并转换文档中的表格为HTML格式
 - 自动检测扫描版PDF和乱码PDF,并启用OCR功能
-- OCR支持84种语言的检测与识别
+- OCR支持109种语言的检测与识别
 - 支持多种输出格式,如多模态与NLP的Markdown、按阅读顺序排序的JSON、含有丰富信息的中间格式等
 - 支持多种可视化结果,包括layout可视化、span可视化等,便于高效确认输出效果与质检
 - 支持纯CPU环境运行,并支持 GPU(CUDA)/NPU(CANN)/MPS 加速

+ 19 - 0
docs/zh/usage/quick_usage.md

@@ -85,6 +85,25 @@ MinerU 现已实现开箱即用,但也支持通过配置文件扩展功能。
     * 默认使用`阿里云百炼`的`qwen3-next-80b-a3b-instruct`模型
     * 您需要自行配置 API 密钥并将`enable`设置为`true`来启用此功能
     * 如果您的api供应商不支持`enable_thinking`参数,请手动将该参数删除
+        * 例如,在您的配置文件中,`llm-aided-config` 部分可能如下所示:
+          ```json
+          "llm-aided-config": {
+             "api_key": "your_api_key",
+             "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+             "model": "qwen3-next-80b-a3b-instruct",
+             "enable_thinking": false,
+             "enable": false
+          }
+          ```
+        * 要移除`enable_thinking`参数,只需删除包含`"enable_thinking": false`的那一行,结果如下:
+          ```json
+          "llm-aided-config": {
+             "api_key": "your_api_key",
+             "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+             "model": "qwen3-next-80b-a3b-instruct",
+             "enable": false
+          }
+          ```
   
 - `models-dir`:
     * 用于指定本地模型存储目录,请为`pipeline`和`vlm`后端分别指定模型目录,

+ 3 - 2
mineru/backend/pipeline/pipeline_middle_json_mkcontent.py

@@ -286,9 +286,10 @@ def union_make(pdf_info_dict: list,
             page_markdown = make_blocks_to_markdown(paras_of_layout, make_mode, img_buket_path)
             output_content.extend(page_markdown)
         elif make_mode == MakeMode.CONTENT_LIST:
-            if not paras_of_layout + paras_of_discarded:
+            para_blocks = (paras_of_layout or []) + (paras_of_discarded or [])
+            if not para_blocks:
                 continue
-            for para_block in paras_of_layout + paras_of_discarded:
+            for para_block in para_blocks:
                 para_content = make_blocks_to_content_list(para_block, img_buket_path, page_idx, page_size)
                 if para_content:
                     output_content.append(para_content)
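
Review note: the same guard is applied in both the `pipeline` and `vlm` versions of `union_make`. Presumably either `paras_of_layout` or `paras_of_discarded` can be `None` for some pages, in which case the old `paras_of_layout + paras_of_discarded` expression would raise a `TypeError`. The sketch below isolates that pattern; `merge_blocks` is a hypothetical helper written for illustration and is not part of MinerU.

```python
from typing import Optional


def merge_blocks(layout: Optional[list], discarded: Optional[list]) -> list:
    """Concatenate two block lists, treating None (or any falsy value) as empty."""
    # `x or []` substitutes an empty list when x is None, so `+` always sees two lists.
    return (layout or []) + (discarded or [])


# Unguarded concatenation fails as soon as one operand is None.
try:
    None + [{"type": "text"}]
except TypeError as exc:
    print(f"unguarded concatenation failed: {exc}")

# The guarded form keeps working and yields an empty list when both are missing.
print(merge_blocks(None, [{"type": "text"}]))  # [{'type': 'text'}]
print(merge_blocks(None, None))                # []
```

Building the merged list once also avoids concatenating the two lists twice, once for the emptiness check and once for the loop.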

+ 3 - 2
mineru/backend/vlm/vlm_middle_json_mkcontent.py

@@ -254,9 +254,10 @@ def union_make(pdf_info_dict: list,
             page_markdown = mk_blocks_to_markdown(paras_of_layout, make_mode, formula_enable, table_enable, img_buket_path)
             output_content.extend(page_markdown)
         elif make_mode == MakeMode.CONTENT_LIST:
-            if not paras_of_layout + paras_of_discarded:
+            para_blocks = (paras_of_layout or []) + (paras_of_discarded or [])
+            if not para_blocks:
                 continue
-            for para_block in paras_of_layout + paras_of_discarded:
+            for para_block in para_blocks:
                 para_content = make_blocks_to_content_list(para_block, img_buket_path, page_idx, page_size)
                 output_content.append(para_content)
 

+ 2 - 0
mineru/utils/check_mac_env.py

@@ -19,6 +19,8 @@ def is_mac_os_version_supported(min_version: str = "13.5") -> bool:
     if not is_mac_environment() or not is_apple_silicon_cpu():
         return False
     mac_version = platform.mac_ver()[0]
+    if not mac_version:
+        return False
     # print("Mac OS Version:", mac_version)
     return version.parse(mac_version) >= version.parse(min_version)
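
Review note: `platform.mac_ver()[0]` is an empty string when the version cannot be determined, and recent releases of `packaging` are expected to raise `InvalidVersion` when asked to parse an empty string, so the early return avoids that failure. The sketch below shows the same guard using only the standard library. `parse_version` and `is_version_at_least` are hypothetical names, and the tuple comparison is a simplified stand-in for `packaging.version.parse`, assuming purely numeric dotted versions.

```python
import platform


def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version such as '13.5.1' into (13, 5, 1)."""
    return tuple(int(part) for part in text.split("."))


def is_version_at_least(current: str, minimum: str = "13.5") -> bool:
    """Return True if `current` is at least `minimum`; an empty version counts as unsupported."""
    if not current:
        # Mirrors the added guard: no version information means "not supported".
        return False
    return parse_version(current) >= parse_version(minimum)


# platform.mac_ver()[0] is '' on non-macOS systems, so this prints False there.
print(is_version_at_least(platform.mac_ver()[0]))
```

Returning `False` for an unknown version is the conservative choice: the caller falls back to the non-MLX path instead of crashing during environment detection.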