3 weeks ago · 6b2f414438
--- a/.github/ISSUE_TEMPLATE/bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -122,7 +122,21 @@ body:
 
				       #multiple: false
			
 
				       options:
			
 
				         -
			
 
				-        - "2.0.x"
			
 
				+        - "<2.2.0"
			
 
				+        - "2.2.x"
			
 
				+        - ">=2.5"
			
 
				+    validations:
			
 
				+      required: true
			
 
				+
			
 
				+  - type: dropdown
			
 
				+    id: backend_name
			
 
				+    attributes:
			
 
				+      label: Backend name | 解析后端
			
 
				+      #multiple: false
			
 
				+      options:
			
 
				+        -
			
 
				+        - "vlm"
			
 
				+        - "pipeline"
			
 
				     validations:
			
 
				       required: true
			
 
				 
			
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 
				 <div align="center" xmlns="http://www.w3.org/1999/html">
			
 
				 <!-- logo -->
			
 
				 <p align="center">
			
 
				-  <img src="docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
			
 
				+  <img src="https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
			
 
				 </p>
			
 
				 
			
 
				 <!-- icon -->
			
@@ -44,6 +44,18 @@
 
				 </div>
			
 
				 
			
 
				 # Changelog
			
 
				+- 2025/10/24 2.6.0 Release
			
 
				+  - `pipeline` backend optimizations
			
 
				+    - Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may cause a slight decrease in MFR speed and failures in recognizing some long formulas. It is recommended to enable it only when parsing Chinese formulas is needed. To disable this feature, set the environment variable to `0`.
			
 
				+    - `OCR` speed significantly improved by 200%~300%, thanks to the optimization solution provided by @cjsdurj
			
 
				+    - `OCR` models updated to `ppocr-v5` version for Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) languages, with accuracy improved by over 40% compared to previous models
			
 
				+  - `vlm` backend optimizations
			
 
				+    - `table_caption` and `table_footnote` matching logic optimized, improving caption and footnote matching accuracy and giving a more sensible reading order when a page contains multiple consecutive tables
			
 
				+    - Optimized CPU resource usage during high concurrency when using `vllm` backend, reducing server pressure
			
 
				+    - Adapted to `vllm` version 0.11.0
			
 
				+  - General optimizations
			
 
				+    - Cross-page table merging improved: added support for merging continuation tables across pages, with better results in multi-column merge scenarios
			
 
				+    - Added environment variable configuration option `MINERU_TABLE_MERGE_ENABLE` for table merging feature. Table merging is enabled by default and can be disabled by setting this variable to `0`
			
 
				 
			
 
				 - 2025/09/26 2.5.4 released
			
 
				   - 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available! We welcome you to read it for a comprehensive overview of its model architecture, training strategy, data engineering and evaluation results.
			
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -1,7 +1,7 @@
 
				 <div align="center" xmlns="http://www.w3.org/1999/html">
			
 
				 <!-- logo -->
			
 
				 <p align="center">
			
 
				-  <img src="docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
			
 
				+  <img src="https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
			
 
				 </p>
			
 
				 
			
 
				 <!-- icon -->
			
@@ -44,7 +44,19 @@
 
				 </div>
			
 
				 
			
 
				 # Changelog
			
 
				-
			
 
				+- 2025/10/24 2.6.0 released
			
 
				+  - `pipeline` backend optimizations
			
 
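				+    - Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may slightly reduce MFR speed and cause some long formulas to fail recognition, so enabling it is recommended only when Chinese formulas need to be parsed. To disable the feature, set the environment variable to `0`.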
			
 
				+    - `OCR` speed significantly improved by 200%~300%, thanks to the optimization solution provided by @cjsdurj
			
 
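				+    - `OCR` models for the Cyrillic (cyrillic), Arabic (arabic), Devanagari (devanagari), Telugu (te), and Tamil (ta) language families updated to the `ppocr-v5` version, with accuracy improved by more than 40% over the previous generation of models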
			
 
				+  - `vlm` backend optimizations
			
 
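				+    - `table_caption` and `table_footnote` matching logic optimized, improving the matching accuracy of table captions and footnotes and giving a more sensible reading order when a page contains multiple consecutive tables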
			
 
				+    - Optimized CPU resource usage under high concurrency when using the `vllm` backend, reducing server load
			
 
				+    - Adapted to `vllm` version 0.11.0
			
 
				+  - General optimizations
			
 
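				+    - Cross-page table merging improved: added support for merging continuation tables across pages, with better results in multi-column merge scenarios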
			
 
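				+    - Added the environment variable configuration option `MINERU_TABLE_MERGE_ENABLE` for the table merging feature. Table merging is enabled by default and can be disabled by setting this variable to `0`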
			
 
				+    
			
 
				 - 2025/09/26 2.5.4 released
			
 
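				   - 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available; read it for a comprehensive overview of the model architecture, training strategy, data engineering and evaluation results.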
			
 
				   - Fixed an issue where some `pdf` files were misidentified as `ai` files and could not be parsed
			
--- a/docs/assets/images/BISHENG_01.png
+++ b/docs/assets/images/BISHENG_01.png
--- a/docs/assets/images/Cherry_Studio_1.png
+++ b/docs/assets/images/Cherry_Studio_1.png
--- a/docs/assets/images/Cherry_Studio_2.png
+++ b/docs/assets/images/Cherry_Studio_2.png
--- a/docs/assets/images/Cherry_Studio_3.png
+++ b/docs/assets/images/Cherry_Studio_3.png
--- a/docs/assets/images/Cherry_Studio_4.png
+++ b/docs/assets/images/Cherry_Studio_4.png
--- a/docs/assets/images/Cherry_Studio_5.png
+++ b/docs/assets/images/Cherry_Studio_5.png
--- a/docs/assets/images/Cherry_Studio_6.png
+++ b/docs/assets/images/Cherry_Studio_6.png
--- a/docs/assets/images/Cherry_Studio_7.png
+++ b/docs/assets/images/Cherry_Studio_7.png
--- a/docs/assets/images/Cherry_Studio_8.png
+++ b/docs/assets/images/Cherry_Studio_8.png
--- a/docs/assets/images/Coze_1.png
+++ b/docs/assets/images/Coze_1.png
--- a/docs/assets/images/Coze_10.png
+++ b/docs/assets/images/Coze_10.png
--- a/docs/assets/images/Coze_11.png
+++ b/docs/assets/images/Coze_11.png
--- a/docs/assets/images/Coze_12.png
+++ b/docs/assets/images/Coze_12.png
--- a/docs/assets/images/Coze_13.png
+++ b/docs/assets/images/Coze_13.png
--- a/docs/assets/images/Coze_14.png
+++ b/docs/assets/images/Coze_14.png
--- a/docs/assets/images/Coze_15.png
+++ b/docs/assets/images/Coze_15.png
--- a/docs/assets/images/Coze_16.png
+++ b/docs/assets/images/Coze_16.png
--- a/docs/assets/images/Coze_17.png
+++ b/docs/assets/images/Coze_17.png
--- a/docs/assets/images/Coze_18.png
+++ b/docs/assets/images/Coze_18.png
--- a/docs/assets/images/Coze_19.png
+++ b/docs/assets/images/Coze_19.png
--- a/docs/assets/images/Coze_2.png
+++ b/docs/assets/images/Coze_2.png
--- a/docs/assets/images/Coze_20.png
+++ b/docs/assets/images/Coze_20.png
--- a/docs/assets/images/Coze_21.png
+++ b/docs/assets/images/Coze_21.png
--- a/docs/assets/images/Coze_3.png
+++ b/docs/assets/images/Coze_3.png
--- a/docs/assets/images/Coze_4.png
+++ b/docs/assets/images/Coze_4.png
--- a/docs/assets/images/Coze_5.png
+++ b/docs/assets/images/Coze_5.png
--- a/docs/assets/images/Coze_6.png
+++ b/docs/assets/images/Coze_6.png
--- a/docs/assets/images/Coze_7.png
+++ b/docs/assets/images/Coze_7.png
--- a/docs/assets/images/Coze_8.png
+++ b/docs/assets/images/Coze_8.png
--- a/docs/assets/images/Coze_9.png
+++ b/docs/assets/images/Coze_9.png
--- a/docs/assets/images/DataFLow_01.png
+++ b/docs/assets/images/DataFLow_01.png
--- a/docs/assets/images/DataFlow_02.png
+++ b/docs/assets/images/DataFlow_02.png
--- a/docs/assets/images/Dify_1.png
+++ b/docs/assets/images/Dify_1.png
--- a/docs/assets/images/Dify_10.png
+++ b/docs/assets/images/Dify_10.png
--- a/docs/assets/images/Dify_11.png
+++ b/docs/assets/images/Dify_11.png
--- a/docs/assets/images/Dify_12.png
+++ b/docs/assets/images/Dify_12.png
--- a/docs/assets/images/Dify_13.png
+++ b/docs/assets/images/Dify_13.png
--- a/docs/assets/images/Dify_14.png
+++ b/docs/assets/images/Dify_14.png
--- a/docs/assets/images/Dify_15.png
+++ b/docs/assets/images/Dify_15.png
--- a/docs/assets/images/Dify_16.png
+++ b/docs/assets/images/Dify_16.png
--- a/docs/assets/images/Dify_17.png
+++ b/docs/assets/images/Dify_17.png
--- a/docs/assets/images/Dify_18.png
+++ b/docs/assets/images/Dify_18.png
--- a/docs/assets/images/Dify_19.png
+++ b/docs/assets/images/Dify_19.png
--- a/docs/assets/images/Dify_2.png
+++ b/docs/assets/images/Dify_2.png
--- a/docs/assets/images/Dify_20.png
+++ b/docs/assets/images/Dify_20.png
--- a/docs/assets/images/Dify_21.png
+++ b/docs/assets/images/Dify_21.png
--- a/docs/assets/images/Dify_22.png
+++ b/docs/assets/images/Dify_22.png
--- a/docs/assets/images/Dify_23.png
+++ b/docs/assets/images/Dify_23.png
--- a/docs/assets/images/Dify_24.png
+++ b/docs/assets/images/Dify_24.png
--- a/docs/assets/images/Dify_25.png
+++ b/docs/assets/images/Dify_25.png
--- a/docs/assets/images/Dify_26.png
+++ b/docs/assets/images/Dify_26.png
--- a/docs/assets/images/Dify_3.png
+++ b/docs/assets/images/Dify_3.png
--- a/docs/assets/images/Dify_4.png
+++ b/docs/assets/images/Dify_4.png
--- a/docs/assets/images/Dify_5.png
+++ b/docs/assets/images/Dify_5.png
--- a/docs/assets/images/Dify_6.png
+++ b/docs/assets/images/Dify_6.png
--- a/docs/assets/images/Dify_7.png
+++ b/docs/assets/images/Dify_7.png
--- a/docs/assets/images/Dify_8.png
+++ b/docs/assets/images/Dify_8.png
--- a/docs/assets/images/Dify_9.png
+++ b/docs/assets/images/Dify_9.png
--- a/docs/assets/images/DingTalk_01.png
+++ b/docs/assets/images/DingTalk_01.png
--- a/docs/assets/images/FastGPT_01.png
+++ b/docs/assets/images/FastGPT_01.png
--- a/docs/assets/images/FastGPT_02.png
+++ b/docs/assets/images/FastGPT_02.png
--- a/docs/assets/images/ModelWhale_01.png
+++ b/docs/assets/images/ModelWhale_01.png
--- a/docs/assets/images/ModelWhale_02.png
+++ b/docs/assets/images/ModelWhale_02.png
--- a/docs/assets/images/ModelWhale_1.png
+++ b/docs/assets/images/ModelWhale_1.png
--- a/docs/assets/images/RagFlow_01.png
+++ b/docs/assets/images/RagFlow_01.png
--- a/docs/assets/images/Sider_1.png
+++ b/docs/assets/images/Sider_1.png
--- a/docs/assets/images/coze_0.png
+++ b/docs/assets/images/coze_0.png
--- a/docs/assets/images/n8n_0.png
+++ b/docs/assets/images/n8n_0.png
--- a/docs/assets/images/n8n_1.png
+++ b/docs/assets/images/n8n_1.png
--- a/docs/assets/images/n8n_10.png
+++ b/docs/assets/images/n8n_10.png
--- a/docs/assets/images/n8n_2.png
+++ b/docs/assets/images/n8n_2.png
--- a/docs/assets/images/n8n_3.png
+++ b/docs/assets/images/n8n_3.png
--- a/docs/assets/images/n8n_4.png
+++ b/docs/assets/images/n8n_4.png
--- a/docs/assets/images/n8n_5.png
+++ b/docs/assets/images/n8n_5.png
--- a/docs/assets/images/n8n_6.png
+++ b/docs/assets/images/n8n_6.png
--- a/docs/assets/images/n8n_7.png
+++ b/docs/assets/images/n8n_7.png
--- a/docs/assets/images/n8n_8.png
+++ b/docs/assets/images/n8n_8.png
--- a/docs/assets/images/n8n_9.png
+++ b/docs/assets/images/n8n_9.png
--- a/docs/en/index.md
+++ b/docs/en/index.md
--- a/docs/en/reference/output_files.md
+++ b/docs/en/reference/output_files.md
@@ -397,10 +397,10 @@ Text levels are distinguished through the `text_level` field:
 
				     {
			
 
				         "type": "image",
			
 
				         "img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
			
 
				-        "img_caption": [
			
 
				+        "image_caption": [
			
 
				             "Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
			
 
				         ],
			
 
				-        "img_footnote": [],
			
 
				+        "image_footnote": [],
			
 
				         "bbox": [
			
 
				             62,
			
 
				             480,
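The hunk above renames the documented keys from `img_caption`/`img_footnote` to `image_caption`/`image_footnote`. Code written against the earlier wording may still look for the old names, so a reader that accepts both spellings can be a safe transition step. This is a minimal sketch, assuming entries are plain dicts shaped like the example in this hunk; `get_image_caption` is a hypothetical helper, not part of MinerU.

```python
def get_image_caption(entry: dict) -> list:
    """Return the caption list of an image entry, accepting both key spellings."""
    # Prefer the key name documented after this change, then fall back
    # to the name used in the earlier version of the document.
    for key in ("image_caption", "img_caption"):
        if key in entry:
            return list(entry[key])
    return []


# Two illustrative entries, one per spelling (made-up data).
old_style = {"type": "image", "img_caption": ["Fig. 1."], "img_footnote": []}
new_style = {"type": "image", "image_caption": ["Fig. 1."], "image_footnote": []}

print(get_image_caption(old_style))
print(get_image_caption(new_style))
```

The same pattern applies to `image_footnote`/`img_footnote`.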
			
--- a/docs/en/usage/cli_tools.md
+++ b/docs/en/usage/cli_tools.md
@@ -87,6 +87,16 @@ Here are the environment variables and their descriptions:
 
				     * Used to enable formula parsing
			
 
				     * defaults to `true`, can be set to `false` through environment variables to disable formula parsing.
			
 
				   
			
 
				-- `MINERU_TABLE_ENABLE`: 
			
 
				+- `MINERU_FORMULA_CH_SUPPORT`:
			
 
				+    * Used to enable Chinese formula parsing optimization (experimental feature)
			
 
				+    * Default is `false`, can be set to `true` via environment variable to enable Chinese formula parsing optimization.
			
 
				+    * Only effective for `pipeline` backend.
			
 
				+  
			
 
				+- `MINERU_TABLE_ENABLE`:
			
 
				     * Used to enable table parsing
			
 
				-    * defaults to `true`, can be set to `false` through environment variables to disable table parsing.
			
 
				+    * Default is `true`, can be set to `false` via environment variable to disable table parsing.
			
 
				+
			
 
				+- `MINERU_TABLE_MERGE_ENABLE`:
			
 
				+    * Used to enable table merging functionality
			
 
				+    * Default is `true`, can be set to `false` via environment variable to disable table merging functionality.
			
 
				+
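Note that the README changelog in this same commit describes these switches with `1`/`0`, while this page uses `true`/`false`. The sketch below shows one way a boolean environment switch could be read so that both spellings behave identically. `env_flag` and the set of accepted spellings are assumptions for illustration only; they are not MinerU's actual parsing code.

```python
import os

# Spellings treated as "on" and "off" (an assumption, not MinerU's list).
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_flag(name: str, default: bool, environ=os.environ) -> bool:
    """Read a boolean switch from the environment, falling back to `default`."""
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Unrecognized value: keep the documented default rather than guessing.
    return default


# Defaults taken from the descriptions above: table merging on, Chinese formulas off.
table_merge = env_flag("MINERU_TABLE_MERGE_ENABLE", default=True)
formula_ch = env_flag("MINERU_FORMULA_CH_SUPPORT", default=False)
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.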
			
--- a/docs/en/usage/quick_usage.md
+++ b/docs/en/usage/quick_usage.md
@@ -52,7 +52,7 @@ If you need to adjust parsing options through custom parameters, you can also ch
 
				   >[!TIP]
			
 
				   >
			
 
				   >- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
			
 
				-  >- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
			
 
				+
			
 
				 - Using `http-client/server` method:
			
 
				   ```bash
			
 
				   # Start vllm server (requires vllm environment)
			
--- a/docs/zh/index.md
+++ b/docs/zh/index.md
--- a/docs/zh/reference/output_files.md
+++ b/docs/zh/reference/output_files.md
@@ -397,10 +397,10 @@ inference_result: list[PageInferenceResults] = []
 
				     {
			
 
				         "type": "image",
			
 
				         "img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
			
 
				-        "img_caption": [
			
 
				+        "image_caption": [
			
 
				             "Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
			
 
				         ],
			
 
				-        "img_footnote": [],
			
 
				+        "image_footnote": [],
			
 
				         "bbox": [
			
 
				             62,
			
 
				             480,
			
--- a/docs/zh/usage/acceleration_cards/AMD.md
+++ b/docs/zh/usage/acceleration_cards/AMD.md
@@ -0,0 +1,365 @@
 
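				+## Triton-based optimizations for the different backends on ROCm: the vllm backend now runs inference normally, as does DocLayout-YOLO, the layout model used in the first step of the pipeline backend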
			
 
				+
			
 
				+**If you already have a complete Python environment with vllm and mineru, skip straight to step 5.**
			
 
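				+**The same approach can be used for execution problems on other GPUs: profile first to find the problematic operator, then reimplement it with Triton.**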
			
 
				+In my tests the results are roughly on par with the official MinerU site. Not many people use AMD, so I am sharing this in the comments.
			
 
				+
			
 
				+### 1. Results
			
 
				+**Additional speed test on a 200-page PDF of a Python programming book, reaching 1.99 it/s:**
			
 
				+Two Step Extraction: 100%|████████████████████████████████████████| 200/200 [01:40<00:00,  1.99it/s]
			
 
				+
			
 
				+**Earlier results on a 14-page academic paper:**
			
 
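				+On a 7900 XTX, `mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true` runs at roughly **1.6-1.8 s/it**. This was not measured carefully, only tried on two documents. The second approach, which replaces the original dot products with a matrix multiplication, speeds this up further to 1.3 s/it. After optimization, most of the operator time is spent in hipblast (which cannot be improved further) and in the vllm Triton backend, at roughly 25% each. The vllm Triton backend can only be improved upstream.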
			
 
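				+The doclayout-yolo layout speed rose from 1.6 it/s to 15 it/s. Note that this speed is reached only after the input PDF size has been cached; Triton has to cache per input size, which cannot be avoided. The approach was chosen mainly to keep the model's input/output interface unchanged with minimal code changes.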
			
 
				+An example using the `-b vlm-vllm-engine` mode:
			
 
				+
			
 
				+---
			
 
				+**Results with the 5D-tensor matrix multiplication replacing the original dot product:**
			
 
				+2025-10-05 15:45:12.985 | INFO     | mineru.backend.vlm.vlm_analyze:get_model:128 - get vllm-engine predictor cost: 18.45s
			
 
				+Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████| 14/14 [00:01<00:00, 12.20it/s]
			
 
				+Processed prompts: 100%|█████████████████████| 14/14 [00:08<00:00,  1.56it/s, est. speed input: 2174.18 toks/s, output: 791.87 toks/s]
			
 
				+Adding requests: 100%|█████████████████████████████████████████████████████████████████████████████| 278/278 [00:00<00:00, 323.03it/s]
			
 
				+Processed prompts: 100%|██████████████████| 278/278 [00:07<00:00, 37.63it/s, est. speed input: 5264.66 toks/s, output: 2733.31 toks/s]
			
 
				+
			
 
				+Test with `mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true`:
			
 
				+2025-10-05 15:46:55.953 | WARNING  | mineru.cli.common:convert_pdf_bytes_to_bytes_by_pypdfium2:54 - end_page_id is out of range, use pdf_docs length
			
 
				+Two Step Extraction: 100%|████████████████████████████████████████████████████████████████████████████| 14/14 [00:18<00:00,  1.30s/it]
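The logs above mix two units. tqdm-style progress bars print `it/s` when more than one item finishes per second and switch to `s/it` otherwise, so 1.99 it/s and 1.30 s/it are not directly comparable without converting. A small sketch for cross-checking the reported rates against the item counts and elapsed times shown in the bars; the helper names are illustrative only.

```python
def items_per_second(items: int, elapsed_seconds: float) -> float:
    """Throughput in it/s."""
    return items / elapsed_seconds


def seconds_per_item(items: int, elapsed_seconds: float) -> float:
    """Pace in s/it (the reciprocal of it/s)."""
    return elapsed_seconds / items


# 200-page run: 200 items in 01:40, i.e. 100 s as displayed.
rate_200 = items_per_second(200, 100)

# 14-page run: 14 items in 00:18 as displayed.
pace_14 = seconds_per_item(14, 18)

print(f"200-page run: {rate_200:.2f} it/s")
print(f"14-page run:  {pace_14:.2f} s/it")
```

The displayed elapsed time is rounded to whole seconds, so small differences from the logged 1.99 it/s and 1.30 s/it are expected.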
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### 2. Root cause
			
 
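				+On AMD RDNA, the vllm backend has a severe performance problem: one operator in vllm's **qwen2_vl.py** has no corresponding ROCm kernel, so the convolution falls back to a very slow path and a single execution takes 12 s. In other words, **the MIOpen library lacks an optimized kernel for the specific Conv3d (bfloat16) used by the model**.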
			
 
				+The dilated convolution in DocLayout-YOLO's **g2l_crm.py** has the same problem, and even the professional CDNA MI210 does not solve it.
			
 
				+Both are handled here together.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### 3. Environment
			
 
				+System: Ubuntu 24.04.3        Kernel: Linux 6.14.0-33-generic      ROCm version: 7.0.1
			
 
				+Python environment:
			
 
				+python 3.12
			
 
				+pytorch-triton-rocm   3.5.0+gitbbb06c03 
			
 
				+torch                            2.10.0.dev20251001+rocm7.0
			
 
				+torchvision                  0.25.0.dev20251003+rocm7.0
			
 
				+vllm                              0.11.0rc2.dev198+g736fbf4c8.rocm701
			
 
				+The exact versions do not matter; the procedure is the same.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### 4. Installing the prerequisites
			
 
				+```
			
 
				+uv venv --python python3.12
			
 
				+source .venv/bin/activate
			
 
				+uv pip install --pre torch torchvision   -i https://pypi.tuna.tsinghua.edu.cn/simple/   --extra-index-url https://download.pytorch.org/whl/nightly/rocm7.0
			
 
				+uv pip install pip
			
 
				+# Use pip rather than uv pip from here on, to avoid overwriting the local PyTorch build
			
 
				+pip install -U "mineru[core]" -i https://pypi.mirrors.ustc.edu.cn/simple/
			
 
				+```
			
 
				+To install vllm, follow the official manual: [Vllm](https://docs.vllm.com.cn/en/latest/getting_started/installation/gpu.html#amd-rocm)
			
 
				+```
			
 
				+# Manually install aiter, vllm, amd-smi, etc.; choose a directory to clone into and cd there first
			
 
				+git clone --recursive https://github.com/ROCm/aiter.git
			
 
				+cd aiter
			
 
				+git submodule sync; git submodule update --init --recursive
			
 
				+python setup.py develop
			
 
				+cd ..
			
 
				+git clone https://github.com/vllm-project/vllm.git
			
 
				+cd vllm/
			
 
				+cp -r /opt/rocm/share/amd_smi ./   # copy into the vllm checkout (the author's checkout was ~/Pytorch/vllm/)
			
 
				+pip install amd_smi/
			
 
				+pip install --upgrade numba \
			
 
				+    scipy \
			
 
				+    huggingface-hub[cli,hf_transfer] \
			
 
				+    setuptools_scm
			
 
				+pip install -r requirements/rocm.txt
			
 
				+export PYTORCH_ROCM_ARCH="gfx1100"   # set to your own GPU architecture; check with: rocminfo | grep gfx
			
 
				+python setup.py develop
			
 
				+```
			
 
				+---
			
 
				+
			
 
				+### 5. Adding the key Triton operator to vllm
			
 
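				+#### Two approaches are given here. Approach 1 is the optimization mentioned above, reaching 1.5 to 1.8 s/it. Approach 2 hand-optimizes the operator into a matrix multiplication; it definitely works on the 7900 XTX at about 1.3 s/it. On other AMD GPUs it is also faster than approach 1, but it may not be the fastest possible implementation, and the hand-tuned parts may need adjusting.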
			
 
				+**Note: uninstall the Triton-backend flash_attn with pip. After many attempts it still raised errors; the problems are significant, so simply do not use it.**
			
 
				+```
			
 
				+# Locate your vllm installation path (referred to as XXX below)
			
 
				+pip show vllm
			
 
				+```
			
 
				+**Key changes**
			
 
				+In the file XXX/vllm/model_executor/models/qwen2_vl.py:
			
 
				+**1. In qwen2_vl.py, below line 33, add `from .qwen2_vl_vision_kernels import triton_conv3d_patchify`**
			
 
				+```
			
 
				+from collections.abc import Iterable, Mapping, Sequence
			
 
				+from functools import partial
			
 
				+from typing import Annotated, Any, Callable, Literal, Optional, Union
			
 
				+
			
 
				+import torch
			
 
				+import torch.nn as nn
			
 
				+import torch.nn.functional as F
			
 
				+from .qwen2_vl_vision_kernels import triton_conv3d_patchify
			
 
				+```
			
 
				+**The remaining steps split into approach 1 (2.1 and 3.1) and approach 2 (2.2 and 3.2); implement only one of them.**
			
 
				+
			
 
				+---
			
 
				+**Approach 1**
			
 
				+**2.1 In qwen2_vl.py at line 498, `class Qwen2VisionPatchEmbed(nn.Module)`. P.S. This is the module for which AMD has no ready-made kernel, which causes the fallback.**
			
 
				+```
			
 
				+class Qwen2VisionPatchEmbed(nn.Module):
			
 
				+
			
 
				+    def __init__(
			
 
				+        self,
			
 
				+        patch_size: int = 14,
			
 
				+        temporal_patch_size: int = 2,
			
 
				+        in_channels: int = 3,
			
 
				+        embed_dim: int = 1152,
			
 
				+    ) -> None:
			
 
				+        super().__init__()
			
 
				+        self.patch_size = patch_size
			
 
				+        self.temporal_patch_size = temporal_patch_size
			
 
				+        self.embed_dim = embed_dim
			
 
				+
			
 
				+        kernel_size = (temporal_patch_size, patch_size, patch_size)
			
 
				+        self.proj = nn.Conv3d(in_channels,
			
 
				+                              embed_dim,
			
 
				+                              kernel_size=kernel_size,
			
 
				+                              stride=kernel_size,
			
 
				+                              bias=False)
			
 
				+    def forward(self, x: torch.Tensor) -> torch.Tensor:
			
 
				+        L, C = x.shape
			
 
				+        x_reshaped = x.view(L, -1, self.temporal_patch_size, self.patch_size,
			
 
				+                            self.patch_size)
			
 
				+        
			
 
				+        # Call your custom Triton kernel instead of self.proj
			
 
				+        x_out = triton_conv3d_patchify(x_reshaped, self.proj.weight)
			
 
				+        
			
 
				+        # The output of our kernel is already the correct shape [L, embed_dim]
			
 
				+        return x_out
			
 
				+```
			
 
				+**3.1 Create the file qwen2_vl_vision_kernels.py under XXX/vllm/model_executor/models/ and implement the operator in Triton**
			
 
				+```
			
 
				+import torch
			
 
				+from vllm.triton_utils import tl, triton
			
 
				+
			
 
				+@triton.jit
			
 
				+def _conv3d_patchify_kernel(
			
 
				+    # Pointers to tensors
			
 
				+    X, W, Y,
			
 
				+    # Tensor dimensions
			
 
				+    N, C_in, D_in, H_in, W_in,
			
 
				+    C_out, KD, KH, KW,
			
 
				+    # Strides for memory access (this kernel uses no padding)
			
 
				+    stride_xn, stride_xc, stride_xd, stride_xh, stride_xw,
			
 
				+    stride_wn, stride_wc, stride_wd, stride_wh, stride_ww,
			
 
				+    stride_yn, stride_yc,
			
 
				+    # Triton-specific metaparameters
			
 
				+    BLOCK_SIZE: tl.constexpr,
			
 
				+):
			
 
				+    """
			
 
				+    Triton kernel for a non-overlapping 3D patching convolution.
			
 
				+    Each kernel instance computes one output value for one patch.
			
 
				+    """
			
 
				+    # Get the program IDs for the N (patch) and C_out (output channel) dimensions
			
 
				+    pid_n = tl.program_id(0)  # The index of the patch we are processing
			
 
				+    pid_cout = tl.program_id(1) # The index of the output channel we are computing
			
 
				+
			
 
				+    # --- Calculate memory pointers ---
			
 
				+    # Pointer to the start of the current input patch
			
 
				+    x_ptr = X + (pid_n * stride_xn)
			
 
				+    # Pointer to the start of the current filter (weight)
			
 
				+    w_ptr = W + (pid_cout * stride_wn)
			
 
				+    # Pointer to where the output will be stored
			
 
				+    y_ptr = Y + (pid_n * stride_yn + pid_cout * stride_yc)
			
 
				+
			
 
				+    # --- Perform the convolution (element-wise product and sum) ---
			
 
				+    # This is a dot product between the flattened patch and the flattened filter.
			
 
				+    accumulator = tl.zeros((BLOCK_SIZE,), dtype=tl.float32)
			
 
				+
			
 
				+    # Iterate over the elements of the patch/filter
			
 
				+    for c_offset in range(0, C_in):
			
 
				+        for d_offset in range(0, KD):
			
 
				+            for h_offset in range(0, KH):
			
 
				+                # Vectorized over the innermost dimension (width) in chunks of BLOCK_SIZE
			
 
				+                for w_offset in range(0, KW, BLOCK_SIZE):
			
 
				+                    # Create masks to handle cases where KW is not a multiple of BLOCK_SIZE
			
 
				+                    w_range = w_offset + tl.arange(0, BLOCK_SIZE)
			
 
				+                    w_mask = w_range < KW
			
 
				+
			
 
				+                    # Calculate offsets to load data
			
 
				+                    patch_offset = (c_offset * stride_xc + d_offset * stride_xd +
			
 
				+                                    h_offset * stride_xh + w_range * stride_xw)
			
 
				+                    filter_offset = (c_offset * stride_wc + d_offset * stride_wd +
			
 
				+                                     h_offset * stride_wh + w_range * stride_ww)
			
 
				+
			
 
				+                    # Load patch and filter data, applying masks
			
 
				+                    patch_vals = tl.load(x_ptr + patch_offset, mask=w_mask, other=0.0)
			
 
				+                    filter_vals = tl.load(w_ptr + filter_offset, mask=w_mask, other=0.0)
			
 
				+
			
 
				+                    # Multiply and accumulate
			
 
				+                    accumulator += patch_vals.to(tl.float32) * filter_vals.to(tl.float32)
			
 
				+
			
 
				+    # Sum the accumulator block and store the single output value
			
 
				+    output_val = tl.sum(accumulator, axis=0)
			
 
				+    tl.store(y_ptr, output_val)
			
 
				+
			
 
				+
			
 
				+def triton_conv3d_patchify(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
			
 
				+    """
			
 
				+    Python wrapper for the 3D patching convolution Triton kernel.
			
 
				+    """
			
 
				+    # Get tensor dimensions
			
 
				+    N, C_in, D_in, H_in, W_in = x.shape
			
 
				+    C_out, _, KD, KH, KW = weight.shape
			
 
				+
			
 
				+    # Create the output tensor
			
 
				+    # The output of this specific conv is (N, C_out, 1, 1, 1), which we squeeze
			
 
				+    Y = torch.empty((N, C_out), dtype=x.dtype, device=x.device)
			
 
				+
			
 
				+    # Define the grid for launching the Triton kernel
			
 
				+    # Each kernel instance handles one patch (N) for one output channel (C_out)
			
 
				+    grid = (N, C_out)
			
 
				+
			
 
				+    # Launch the kernel
			
 
				+    # We pass all strides to make the kernel flexible
			
 
				+    _conv3d_patchify_kernel[grid](
			
 
				+        x, weight, Y,
			
 
				+        N, C_in, D_in, H_in, W_in,
			
 
				+        C_out, KD, KH, KW,
			
 
				+        x.stride(0), x.stride(1), x.stride(2), x.stride(3), x.stride(4),
			
 
				+        weight.stride(0), weight.stride(1), weight.stride(2), weight.stride(3), weight.stride(4),
			
 
				+        Y.stride(0), Y.stride(1),
			
 
				+        BLOCK_SIZE=16, # A reasonable default, can be tuned
			
 
				+    )
			
 
				+
			
 
				+    return Y
			
 
				+```
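The kernel above relies on one observation: because the Conv3d uses `stride == kernel_size` and every input row holds exactly one patch, each output value is just a dot product between a flattened patch and a flattened filter. Stacking those dot products gives a single `(N, K) x (K, C_out)` matrix product with `K = C*D*H*W`. The dependency-free sketch below checks that equivalence on tiny random data. It is a CPU illustration with made-up helper names and sizes; it does not reproduce the Triton kernel's tiling or dtype handling.

```python
import itertools
import math
import random


def flatten(block):
    """Flatten a nested (C, D, H, W) list in row-major order, like Tensor.view(-1)."""
    return [v for c in block for d in c for h in d for v in h]


def conv_one_output(patch, filt):
    """Explicit 4-level loop: one patch, one filter, one output value."""
    C, D, H, W = len(patch), len(patch[0]), len(patch[0][0]), len(patch[0][0][0])
    total = 0.0
    for c, d, h, w in itertools.product(range(C), range(D), range(H), range(W)):
        total += patch[c][d][h][w] * filt[c][d][h][w]
    return total


def patchify_as_matmul(patches, filters):
    """Same result as an (N, K) x (K, C_out) product with K = C*D*H*W."""
    a = [flatten(p) for p in patches]   # N rows of length K
    b = [flatten(f) for f in filters]   # C_out rows of length K (B transposed)
    return [[sum(x * y for x, y in zip(row, col)) for col in b] for row in a]


def random_block(rng, C, D, H, W):
    """Nested list of shape (C, D, H, W) filled with values in [-1, 1]."""
    return [[[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
             for _ in range(D)] for _ in range(C)]


rng = random.Random(0)
patches = [random_block(rng, 2, 2, 3, 3) for _ in range(3)]   # N = 3
filters = [random_block(rng, 2, 2, 3, 3) for _ in range(4)]   # C_out = 4

looped = [[conv_one_output(p, f) for f in filters] for p in patches]
matmul = patchify_as_matmul(patches, filters)
same = all(math.isclose(u, v, rel_tol=1e-9, abs_tol=1e-12)
           for ru, rv in zip(looped, matmul) for u, v in zip(ru, rv))
print("loop and matmul agree:", same)
```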
			
 
				+---
			
 
				+**Approach 2**
			
 
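				+**2.2 In qwen2_vl.py at line 498, `class Qwen2VisionPatchEmbed(nn.Module)`. P.S. This is the module for which AMD has no ready-made kernel, which causes the fallback; here the 5D tensor is turned directly into a matrix multiplication.**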
			
 
				+```
			
 
				+class Qwen2VisionPatchEmbed(nn.Module):
			
 
				+
			
 
				+    def __init__(
			
 
				+        self,
			
 
				+        patch_size: int = 14,
			
 
				+        temporal_patch_size: int = 2,
			
 
				+        in_channels: int = 3,
			
 
				+        embed_dim: int = 1152,
			
 
				+    ) -> None:
			
 
				+        super().__init__()
			
 
				+        self.patch_size = patch_size
			
 
				+        self.temporal_patch_size = temporal_patch_size
			
 
				+        self.embed_dim = embed_dim
			
 
				+
			
 
				+        kernel_size = (temporal_patch_size, patch_size, patch_size)
			
 
				+
			
 
				+        self.proj = nn.Conv3d(in_channels,
			
 
				+                              embed_dim,
			
 
				+                              kernel_size=kernel_size,
			
 
				+                              stride=kernel_size,
			
 
				+                              bias=False)
			
 
				+
			
 
				+    def forward(self, x: torch.Tensor) -> torch.Tensor:
			
 
				+        L, C = x.shape
			
 
				+        x_reshaped_5d = x.view(L, -1, self.temporal_patch_size, self.patch_size,
			
 
				+                               self.patch_size)
			
 
				+
			
 
				+        return triton_conv3d_patchify(x_reshaped_5d, self.proj.weight)
			
 
				+```
			
 
				+**3.2 Create the file qwen2_vl_vision_kernels.py under XXX/vllm/model_executor/models/ and implement the operator in Triton**
			
 
				+```
			
 
				+import torch
			
 
				+from vllm.triton_utils import tl, triton
			
 
				+
			
 
				+@triton.jit
			
 
				+def _conv_gemm_kernel(
			
 
				+    A, B, C, M, N, K,
			
 
				+    stride_am, stride_ak,
			
 
				+    stride_bk, stride_bn,
			
 
				+    stride_cm, stride_cn,
			
 
				+    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
			
 
				+):
			
 
				+    pid_m = tl.program_id(0)
			
 
				+    pid_n = tl.program_id(1)
			
 
				+    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
			
 
				+    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
			
 
				+    offs_k = tl.arange(0, BLOCK_K)
			
 
				+    a_ptrs = A + (offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak)
			
 
				+    b_ptrs = B + (offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn)
			
 
				+    accumulator = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
			
 
				+    for k in range(0, K, BLOCK_K):
			
 
				+        a = tl.load(a_ptrs, mask=(offs_m[:, None] < M) & (offs_k[None, :] < K), other=0.0)
			
 
				+        b = tl.load(b_ptrs, mask=(offs_k[:, None] < K) & (offs_n[None, :] < N), other=0.0)
			
 
				+        accumulator += tl.dot(a, b)
			
 
				+        a_ptrs += BLOCK_K * stride_ak
			
 
				+        b_ptrs += BLOCK_K * stride_bk
			
 
				+        offs_k += BLOCK_K
			
 
				+    c = accumulator.to(C.dtype.element_ty)
			
 
				+    offs_cm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
			
 
				+    offs_cn = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
			
 
				+    c_ptrs = C + stride_cm * offs_cm[:, None] + stride_cn * offs_cn[None, :]
			
 
				+    c_mask = (offs_cm[:, None] < M) & (offs_cn[None, :] < N)
			
 
				+    tl.store(c_ptrs, c, mask=c_mask)
			
 
				+
			
 
				+def triton_conv3d_patchify(x_5d: torch.Tensor, weight_5d: torch.Tensor) -> torch.Tensor:
			
 
				+    N_patches, _, _, _, _ = x_5d.shape
			
 
				+    C_out, _, _, _, _ = weight_5d.shape
			
 
				+    A = x_5d.view(N_patches, -1)
			
 
				+    B = weight_5d.view(C_out, -1).transpose(0, 1).contiguous()
			
 
				+    M, K = A.shape
			
 
				+    _K, N = B.shape
			
 
				+    assert K == _K
			
 
				+    C = torch.empty((M, N), device=A.device, dtype=A.dtype)
			
 
				+
			
 
				+    # --- Hand-tuned configuration for the 7900 XTX; the best combination for other GPUs may need to be found manually, since autotune had no effect on AMD ---
			
 
				+    best_config = {
			
 
				+        'BLOCK_M': 128,
			
 
				+        'BLOCK_N': 128,
			
 
				+        'BLOCK_K': 32,
			
 
				+    }
			
 
				+    num_stages = 4
			
 
				+    num_warps = 8
			
 
				+
			
 
				+    grid = (triton.cdiv(M, best_config['BLOCK_M']),
			
 
				+            triton.cdiv(N, best_config['BLOCK_N']))
			
 
				+
			
 
				+    _conv_gemm_kernel[grid](
			
 
				+        A, B, C,
			
 
				+        M, N, K,
			
 
				+        A.stride(0), A.stride(1),
			
 
				+        B.stride(0), B.stride(1),
			
 
				+        C.stride(0), C.stride(1),
			
 
				+        **best_config,
			
 
				+        num_stages=num_stages,
			
 
				+        num_warps=num_warps
			
 
				+    )
			
 
				+
			
 
				+    return C
			
 
				+```
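One way to see the difference between the two variants is to count kernel programs. Approach 1 launches one program per (patch, output channel) pair, while the GEMM variant launches one program per `BLOCK_M x BLOCK_N` output tile. The sketch below does that arithmetic using the defaults from `Qwen2VisionPatchEmbed` and the block sizes in `best_config`. The patch count of 1000 is an arbitrary illustrative value, and program count is only a rough proxy for speed, not a measurement.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division, the same arithmetic as triton.cdiv."""
    return (a + b - 1) // b


def approach1_programs(n_patches: int, c_out: int) -> int:
    """Approach 1 grid: (N, C_out), one output value per program."""
    return n_patches * c_out


def approach2_programs(m: int, n: int, block_m: int = 128, block_n: int = 128) -> int:
    """Approach 2 grid: one program per BLOCK_M x BLOCK_N output tile."""
    return cdiv(m, block_m) * cdiv(n, block_n)


# Defaults from the class above: in_channels=3, temporal_patch_size=2,
# patch_size=14, embed_dim=1152.
K = 3 * 2 * 14 * 14      # reduction length of each dot product
C_OUT = 1152
N_PATCHES = 1000         # hypothetical number of patches

print("K =", K)
print("approach 1 programs:", approach1_programs(N_PATCHES, C_OUT))
print("approach 2 programs:", approach2_programs(N_PATCHES, C_OUT))
```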
			
 
				+---
			
 
				+**4. After closing the terminal, running mineru-gradio again raises a LoRA error; patch the code to skip it**
			
 
				+```
			
 
				+pip show mineru_vl_utils
			
 
				+```
			
 
				+
			
 
				+Open XXX/mineru_vl_utils/vlm_client/vllm_async_engine_client.py and change line 58, `self.tokenizer = vllm_async_llm.tokenizer.get_lora_tokenizer()`, to:
			
 
				+```
			
 
				+        try:
			
 
				+            self.tokenizer = vllm_async_llm.tokenizer.get_lora_tokenizer()
			
 
				+        except AttributeError:
			
 
				+            # If get_lora_tokenizer is missing, use the original tokenizer directly
			
 
				+            self.tokenizer = vllm_async_llm.tokenizer
			
 
				+```
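The `try`/`except AttributeError` patch above also swallows any `AttributeError` raised inside `get_lora_tokenizer()` itself. A slightly narrower variant checks for the method first. This is a sketch with stand-in classes; `resolve_tokenizer`, `_WithLora` and `_WithoutLora` are hypothetical names and do not model vllm's real tokenizer types.

```python
def resolve_tokenizer(tokenizer):
    """Return tokenizer.get_lora_tokenizer() when that method exists, else the tokenizer itself."""
    get_lora = getattr(tokenizer, "get_lora_tokenizer", None)
    if callable(get_lora):
        return get_lora()
    return tokenizer


class _WithLora:
    """Stand-in for a tokenizer wrapper that still exposes get_lora_tokenizer."""

    def get_lora_tokenizer(self):
        return "inner-tokenizer"


class _WithoutLora:
    """Stand-in for a tokenizer object without that method."""


wrapped = _WithLora()
plain = _WithoutLora()
print(resolve_tokenizer(wrapped))
print(resolve_tokenizer(plain) is plain)
```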
			
 
				+
			
 
				+**Finally, set two environment variables and you are ready to go**
			
 
				+```
			
 
				+export MINERU_MODEL_SOURCE=modelscope
			
 
				+export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
			
 
				+```
			
 
				+---
			
 
				+
			
 
				+### 6. With the vllm backend working, the remaining issue is the dilated convolution in the doclayout-yolo layout model used by the pipeline backend
			
 
				+### I posted an answer under [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO/issues/120#issuecomment-3368144275), so the pipeline dilated-convolution issue is not repeated here; follow the link for details.
			
 
				+Find your doclayout-yolo installation path as shown below, then edit the files described in the linked reply.
			
 
				+```
			
 
				+pip show doclayout-yolo
			
 
				+```
			
 
				+
			
--- a/docs/zh/usage/acceleration_cards/Ascend.md
+++ b/docs/zh/usage/acceleration_cards/Ascend.md
@@ -0,0 +1,64 @@
 
				+#### 1 System
			
 
				+NAME="Ubuntu"
			
 
				+VERSION="20.04.6 LTS (Focal Fossa)"
			
 
				+Ascend 910B2
			
 
				+Driver 23.0.6.2
			
 
				+CANN 7.5.X
			
 
				+MinerU 2.1.9
			
 
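				+#### 2 Pitfalls encountered
			
 
				+Pitfall 1: **Graphics-library problems: a shared library fails to allocate memory in the static TLS block (an OpenCV compatibility issue on ARM64)**
			
 
				+⭐ The error `ImportError: /lib/aarch64-linux-gnu/libGLdispatch.so.0: cannot allocate memory in static TLS block` is caused by an OpenCV compatibility issue on ARM64. The stack trace shows that it is raised when the `cv2` module is imported, which happens while MinerU initializes its VLM backend.
			
 
				+Solutions:
			
 
				+1 Install an OpenCV build that avoids the memory problem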
			
 
				+```
			
 
				+pip install --upgrade albumentations albucore simsimd
			
 
				+# Uninstall current opencv
			
 
				+pip uninstall opencv-python opencv-contrib-python
			
 
				+
			
 
				+# Install headless version (no GUI dependencies)
			
 
				+pip install opencv-python-headless
			
 
				+
			
 
				+python -c "import cv2; print(cv2.__version__)"
			
 
				+```
			
 
				+2 Install some packages with apt-get
			
 
				+Switch to the Tsinghua (TUNA) mirror: save the entries below as sources.list.tuna and move the file to the root directory
			
 
				+```
			
 
				+deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal main restricted universe multiverse
			
 
				+deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-updates main restricted universe multiverse
			
 
				+deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-backports main restricted universe multiverse
			
 
				+deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-security main restricted universe multiverse
			
 
				+sudo apt-get update -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
			
 
				+sudo apt-get install libgl1-mesa-glx -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
			
 
				+sudo apt-get install libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1 -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
			
 
				+sudo apt-get install libgl1-mesa-dev libgles2-mesa-dev -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
			
 
				+sudo apt-get install libgomp1 -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
			
 
				+export OPENCV_IO_ENABLE_OPENEXR=0
			
 
				+export QT_QPA_PLATFORM=offscreen
			
 
				+```
			
 
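				+↑ It is unclear which of the steps above actually helped, or whether any of them did
			
 
				+
			
 
				+3 Force the system shared libraries to override the ones bundled with the conda environment (the conda copies conflict with the system copies)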
			
 
				+```
			
 
				+# Locate the libraries
			
 
				+find /usr/lib /lib /root/.local/conda -name "libgomp.so*" 2>/dev/null
			
 
				+export LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libstdc++.so.6:/usr/lib/aarch64-linux-gnu/libgomp.so.1"
			
 
				+export LD_PRELOAD=/lib/aarch64-linux-gnu/libGLdispatch.so.0:$LD_PRELOAD
			
 
				+```
			
 
				+In addition, the copies bundled with the conda environment can be moved out of the way
			
 
				+```
			
 
				+mv $CONDA_PREFIX/lib/libstdc++.so.6 $CONDA_PREFIX/lib/libstdc++.so.6.bak
			
 
				+mv $CONDA_PREFIX/lib/libgomp.so.1 $CONDA_PREFIX/lib/libgomp.so.1.bak
			
 
				+mv $CONDA_PREFIX/lib/libGLdispatch.so.0 $CONDA_PREFIX/lib/libGLdispatch.so.0.bak  # if present
			
 
				+# For the simsimd package:
			
 
				+mv /root/.local/conda/envs/pdfparser/lib/python3.10/site-packages/simsimd./libgomp-947d5fa1.so.1.0.0 /root/.local/conda/envs/pdfparser/lib/python3.10/site-packages/simsimd./libgomp-947d5fa1.so.1.0.0.bak
			
 
				+```
			
 
				+Or:
			
 
				+- Downgrade simsimd to 3.7.2
			
 
				+- Downgrade albumentations to 1.3.1
			
 
				+For the scikit-learn package:
			
 
				+```
			
 
				+# Path of the libgomp bundled inside scikit-learn
			
 
				+SKLEARN_LIBGOMP="/root/.local/conda/envs/pdfparser/lib/python3.10/site-packages/scikit_learn.libs/libgomp-947d5fa1.so.1.0.0"
			
 
				+
			
 
				+# Preload this specific libgomp version
			
 
				+export LD_PRELOAD="$SKLEARN_LIBGOMP:$LD_PRELOAD"
			
 
				+```
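The `LD_PRELOAD` lines above all follow the same pattern: put the preferred library paths in front of whatever is already set. A minimal Python sketch of that pattern follows; `prepend_ld_preload` is a hypothetical helper written for illustration, and the paths passed to it are whatever `find` reported on your machine. `LD_PRELOAD` only affects processes started after it is set, so the value is meant to be passed in the environment of a child process.

```python
import os


def prepend_ld_preload(paths, current=""):
    """Return an LD_PRELOAD value with `paths` placed before the entries of `current`.

    Duplicates are dropped while the first occurrence keeps its position.
    """
    existing = [p for p in current.split(":") if p]
    ordered = []
    for path in list(paths) + existing:
        if path not in ordered:
            ordered.append(path)
    return ":".join(ordered)


def child_env(paths):
    """Build an environment dict for a child process (e.g. for subprocess.run(..., env=...))."""
    env = dict(os.environ)
    env["LD_PRELOAD"] = prepend_ld_preload(paths, env.get("LD_PRELOAD", ""))
    return env
```

Building the value in one place avoids the repeated `export LD_PRELOAD=...:$LD_PRELOAD` lines accumulating duplicate entries.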
			
 
				+4 Other
			
 
				+torch / torch_npu 2.5.1
			
 
				+pip install "numpy<2.0" (NumPy 2.0 is incompatible with Ascend)
			
 
				+export MINERU_MODEL_SOURCE=modelscope
			
--- a/docs/zh/usage/acceleration_cards/METAX.md
+++ b/docs/zh/usage/acceleration_cards/METAX.md
@@ -0,0 +1,117 @@
 
				+## Deploying and using MinerU on C500 + MACA
			
 
				+
			
 
				+### Get the MACA image (includes torch-maca, maca and sglang-maca)
			
 
				+
			
 
				+Image download page: https://developer.metax-tech.com/softnova/docker
			
 
				+Select maca-c500-pytorch:2.33.0.6-ubuntu22.04-amd64
			
 
				+
			
 
				+If you deploy the image with Docker, the container must be started with access to the GPU devices
			
 
				+```bash
			
 
				+docker run --device=/dev/dri --device=/dev/mxcd....
			
 
				+```
			
 
				+
			
 
				+#### Notes
			
 
				+
			
 
				+This image enables TORCH_ALLOW_TF32_CUBLAS_OVERRIDE by default, which causes incorrect inference results with the `vlm-transformers` backend. Unset it before running:
			
 
				+
			
 
				+```bash
			
 
				+unset TORCH_ALLOW_TF32_CUBLAS_OVERRIDE
			
 
				+```
			
 
				+
			
 
				+### Install MinerU
			
 
				+
			
 
				+Install with `--no-deps` to avoid pulling in CUDA-specific packages, then install the remaining dependencies with `pip install -r requirements.txt`
			
 
				+```bash
			
 
				+pip install -U "mineru[core]" --no-deps
			
 
				+```
			
 
				+
			
 
				+```text
			
 
				+boto3>=1.28.43
			
 
				+click>=8.1.7
			
 
				+loguru>=0.7.2
			
 
				+numpy==1.26.4
			
 
				+pdfminer.six==20250506
			
 
				+tqdm>=4.67.1
			
 
				+requests
			
 
				+httpx
			
 
				+pillow>=11.0.0
			
 
				+pypdfium2>=4.30.0
			
 
				+pypdf>=5.6.0
			
 
				+reportlab
			
 
				+pdftext>=0.6.2
			
 
				+modelscope>=1.26.0
			
 
				+huggingface-hub>=0.32.4
			
 
				+json-repair>=0.46.2
			
 
				+opencv-python>=4.11.0.86
			
 
				+fast-langdetect>=0.2.3,<0.3.0
			
 
				+transformers>=4.51.1
			
 
				+accelerate>=1.5.1
			
 
				+pydantic
			
 
				+matplotlib>=3.10,<4
			
 
				+ultralytics>=8.3.48,<9
			
 
				+dill>=0.3.8,<1
			
 
				+rapid_table>=1.0.5,<2.0.0
			
 
				+PyYAML>=6.0.2,<7 
			
 
				+ftfy>=6.3.1,<7
			
 
				+openai>=1.70.0,<2
			
 
				+shapely>=2.0.7,<3
			
 
				+pyclipper>=1.3.0,<2
			
 
				+omegaconf>=2.3.0,<3
			
 
				+transformers>=4.49.0,!=4.51.0,<5.0.0
			
 
				+fastapi
			
 
				+python-multipart
			
 
				+uvicorn
			
 
				+gradio>=5.34,<6
			
 
				+gradio-pdf>=0.0.22
			
 
				+albumentations
			
 
				+beautifulsoup4
			
 
				+scikit-image==0.25.0
			
 
				+outlines==0.1.11
			
 
				+magika>=0.6.2,<0.7.0
			
 
				+mineru-vl-utils>=0.1.6,<1
			
 
				+```
			
 
				+Save the list above as requirements.txt and install it
			
 
				+```bash
			
 
				+pip install -r requirements.txt
			
 
				+```
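The dependency list above names `transformers` twice with different version specifiers. Before installing, it can help to check a requirements file for such repeated entries. The sketch below is an illustration only: `duplicated_requirements` is a hypothetical helper, and it uses a simple regular expression rather than a full requirement parser, so it assumes one plain `name<specifier>` entry per line.

```python
import re
from collections import Counter


def duplicated_requirements(text):
    """Return the sorted, normalized names of packages listed more than once."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # Normalize the name the way package indexes do (case and -_. are equivalent).
            names.append(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


SAMPLE = """\
transformers>=4.51.1
accelerate>=1.5.1
transformers>=4.49.0,!=4.51.0,<5.0.0
pdfminer.six==20250506
"""
```

Calling `duplicated_requirements(SAMPLE)` reports `transformers`; keeping a single line with the combined constraints makes the intended version range explicit.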
			
 
				+Install doclayout_yolo. It depends on the CUDA build of torch, so use `--no-deps`
			
 
				+```bash
			
 
				+pip install doclayout-yolo --no-deps
			
 
				+```
			
 
				+### Online usage
			
 
				+**Basic command:** `mineru -p <input_path> -o <output_path> -b vlm-transformers`
			
 
				+
			
 
				+- `<input_path>`: Local PDF/image file or directory
			
 
				+- `<output_path>`: Output directory
			
 
				+- `-b`, `--backend`: one of `pipeline`, `vlm-transformers`, `vlm-vllm-engine`, `vlm-http-client` (default: `pipeline`)<br/>
			
 
				+
			
 
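				+For other commands, see the official documentation: [Quick Usage - MinerU](https://opendatalab.github.io/MinerU/usage/quick_usage/#quick-model-source-configuration)
			
 
				+
			
 
				+### Offline usage
			
 
				+
			
 
				+**Offline usage relies on local models, which requires setting environment variables and a config file**<br/>
			
 
				+#### Download the models locally
			
 
				+Download them with MinerU's interactive command-line tool; it updates the mineru.json config file automatically once the download finishes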
			
 
				+```bash
			
 
				+mineru-models-download
			
 
				+```
			
 
				+The required models (PDF-Extract-Kit-1.0 and MinerU2.5-2509-1.2B) can also be downloaded from [HuggingFace](http://www.huggingface.co.) or [ModelScope](https://www.modelscope.cn/home).
			
 
				+After downloading, create a mineru.json file with the following content
			
 
				+```json
			
 
				+{
			
 
				+    "models-dir": {
			
 
				+        "pipeline": "/path/pdf-extract-kit-1.0/",
			
 
				+        "vlm": "/path/MinerU2.5-2509-1.2B"
			
 
				+    },
			
 
				+    "config_version": "1.3.0"
			
 
				+}
			
 
				+```
			
 
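				+Replace `/path` with the directory where the models are stored locally. `models-dir` holds the local model paths: `pipeline` is the model path used when the backend is `pipeline`, and `vlm` is the model path used when the backend name starts with `vlm-`
			
 
				+
			
 
				+#### Set the environment variables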
			
 
				+
			
 
				+```bash
			
 
				+export MINERU_MODEL_SOURCE=local
			
 
				+export MINERU_TOOLS_CONFIG_JSON=/path/mineru.json   # path to the config file
			
 
				+```
			
 
				+After these changes MinerU can be used as usual<br/>
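The offline setup above (a `mineru.json` file plus two environment variables) can also be prepared from a script. The following is a minimal sketch under stated assumptions: `write_mineru_config` and `offline_env` are hypothetical helper names, the JSON keys and variable names are the ones shown in this section, and the model directories are placeholders you supply.

```python
import json
import os
from pathlib import Path


def write_mineru_config(config_path, pipeline_dir, vlm_dir):
    """Write a mineru.json with the structure shown above and return the config dict."""
    config = {
        "models-dir": {
            "pipeline": str(pipeline_dir),  # models used by the pipeline backend
            "vlm": str(vlm_dir),            # models used by the vlm-* backends
        },
        "config_version": "1.3.0",
    }
    Path(config_path).write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


def offline_env(config_path, base_env=None):
    """Return an environment dict with the two offline variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MINERU_MODEL_SOURCE"] = "local"
    env["MINERU_TOOLS_CONFIG_JSON"] = str(config_path)
    return env
```

The returned environment can be handed to a child process (for example through the `env` argument of `subprocess.run`) so the `mineru` command picks up the local models.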
			
--- a/docs/zh/usage/acceleration_cards/Tecorigin.md
+++ b/docs/zh/usage/acceleration_cards/Tecorigin.md
@@ -0,0 +1,73 @@
 
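				+# TECO adaptation
			
 
				+
			
 
				+## Quick start
			
 
				+The main steps for running inference with this tool are:
			
 
				+1. Base environment setup: the environment checks and installation required before inference.
			
 
				+2. Build the Docker environment: how to create the Docker environment required for model inference.
			
 
				+3. Install MinerU: how to install MinerU inside the container.
			
 
				+4. Run inference: how to start inference.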
			
 
				+
			
 
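				+### 1 Base environment setup
			
 
				+Follow the [Installation Preparation chapter of the Teco user manual](http://docs.tecorigin.com/release/torch_2.4/v2.2.0/#fc980a30f1125aa88bad4246ff0cedcc) to complete the basic environment checks and installation.
			
 
				+
			
 
				+### 2 Build the Docker environment
			
 
				+#### 2.1 Download the Docker image locally with the following command (image archive: pytorch-3.0.0-torch_sdaa3.0.0.tar)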
			
 
				+
			
 
				+    wget <image download link>   # contact Tecorigin staff to obtain the link
			
 
				+
			
 
				+#### 2.2 Verify the Docker image archive: run the following command and check that the MD5 checksum matches the official value b2a7f60508c0d199a99b8b6b35da3954:
			
 
				+
			
 
				+    md5sum pytorch-3.0.0-torch_sdaa3.0.0.tar
			
 
				+
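The same check can be scripted where `md5sum` is not available. This is a minimal sketch using Python's standard `hashlib`; `md5_of_file` and `image_is_intact` are hypothetical helper names, and the expected checksum is the official value quoted in step 2.2.

```python
import hashlib

# Official MD5 quoted in step 2.2 for pytorch-3.0.0-torch_sdaa3.0.0.tar
EXPECTED_MD5 = "b2a7f60508c0d199a99b8b6b35da3954"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks matters here because the image archive is large; loading it into memory in one call would be wasteful.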
			
 
				+#### 2.3 Load the Docker image with the following command
			
 
				+
			
 
				+    docker load < pytorch-3.0.0-torch_sdaa3.0.0.tar
			
 
				+
			
 
				+#### 2.4 Create a Docker container named MinerU with the following command
			
 
				+
			
 
				+    docker run -itd --name="MinerU" --net=host --device=/dev/tcaicard0 --device=/dev/tcaicard1 --device=/dev/tcaicard2 --device=/dev/tcaicard3 --cap-add SYS_PTRACE --cap-add SYS_ADMIN --shm-size 64g jfrog.tecorigin.net/tecotp-docker/release/ubuntu22.04/x86_64/pytorch:3.0.0-torch_sdaa3.0.0 /bin/bash
			
 
				+
			
 
				+#### 2.5 Enter the Docker container named MinerU with the following command
			
 
				+
			
 
				+    docker exec -it MinerU bash
			
 
				+
			
 
				+
			
 
				+### 3 Install MinerU
			
 
				+- Preparation before installing
			
 
				+    ```
			
 
				+    cd <MinerU>
			
 
				+    pip install --upgrade pip
			
 
				+    pip install uv
			
 
				+    ```    
			
 
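				+- The image already ships with torch, and packages such as nvidia-nccl-cu12 and nvidia-cudnn-cu12 are not needed, so some dependencies must be commented out.
			
 
				+- In <MinerU>/pyproject.toml, comment out every "doclayout_yolo==0.0.4" dependency as well as every package whose name starts with torch.
			
 
				+- Install MinerU with the following command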
			
 
				+    ```
			
 
				+    uv pip install -e .[core]
			
 
				+    ``` 
			
 
				+- Download and install doclayout_yolo==0.0.4
			
 
				+    ```
			
 
				+    pip install doclayout_yolo==0.0.4 --no-deps
			
 
				+    ``` 
			
 
				+- Download and install the remaining packages (dependencies of doclayout_yolo==0.0.4)
			
 
				+    ```
			
 
				+    pip install albumentations py-cpuinfo seaborn thop numpy==1.24.4
			
 
				+    ``` 
			
 
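				+- Some tensors are not contiguous in memory, so the following two files need to be modified
			
 
				+    <ultralytics install path>/ultralytics/utils/tal.py (around line 330: replace view with reshape)
			
 
				+    <doclayout_yolo install path>/doclayout_yolo/utils/tal.py (around line 375: replace view with reshape)
			
 
				+### 4 Run inference
			
 
				+- Enable the sdaa environment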
			
 
				+    ```
			
 
				+    export TORCH_SDAA_AUTOLOAD=cuda_migrate
			
 
				+    ```
			
 
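				+- Before running the inference command for the first time, set the following environment variable so the model weights can be downloaded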
			
 
				+    ```
			
 
				+    export HF_ENDPOINT=https://hf-mirror.com
			
 
				+    ```
			
 
				+- Run inference with the following command
			
 
				+    ```
			
 
				+    mineru -p 'input_path' -o 'output_path' --lang 'model_name'
			
 
				+    ```
			
 
				+where model_name is one of 'ch', 'ch_server', 'ch_lite', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka', 'latin', 'arabic', 'east_slavic', 'cyrillic', 'devanagari'
			
 
				+### 5 Software stack versions used for this adaptation
			
 
				+The adaptation uses software stack v3.0.0; contact Tecorigin staff to obtain it
			
--- a/docs/zh/usage/cli_tools.md
+++ b/docs/zh/usage/cli_tools.md
@@ -81,7 +81,16 @@ MinerU命令行工具的某些参数存在相同功能的环境变量配置，
 
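				 - `MINERU_FORMULA_ENABLE`:
			
 
				     * Enables formula parsing.
			
 
				     * Defaults to `true`; set it to `false` to disable formula parsing.
			
 
				+
			
 
				+- `MINERU_FORMULA_CH_SUPPORT`:
			
 
				+    * Enables the Chinese formula parsing optimization (experimental).
			
 
				+    * Defaults to `false`; set it to `true` to enable the optimization.
			
 
				+    * Only takes effect with the `pipeline` backend.
			
 
				   
			
 
				 - `MINERU_TABLE_ENABLE`:
			
 
				     * Enables table parsing.
			
 
				     * Defaults to `true`; set it to `false` to disable table parsing.
			
 
				+
			
 
				+- `MINERU_TABLE_MERGE_ENABLE`:
			
 
				+    * Enables table merging.
			
 
				+    * Defaults to `true`; set it to `false` to disable table merging.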
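The switches above are boolean environment variables with documented defaults. How MinerU itself parses them is not shown in this diff, so the following is only an illustrative sketch: `env_flag` is a hypothetical helper, and the accepted spellings (`1`/`true`/`yes`/`on` and their negatives) are an assumption, not MinerU's actual parsing rules.

```python
import os

# Assumed spellings; MinerU's own parser may accept a different set.
TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


def env_flag(name, default, environ=None):
    """Read a boolean switch from the environment, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return default  # unrecognized values keep the documented default


# Documented defaults from the list above.
DEFAULTS = {
    "MINERU_FORMULA_ENABLE": True,
    "MINERU_FORMULA_CH_SUPPORT": False,
    "MINERU_TABLE_ENABLE": True,
    "MINERU_TABLE_MERGE_ENABLE": True,
}
```

A wrapper script could use this to print the effective settings before launching a long parsing job.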
			
--- a/docs/zh/usage/index.md
+++ b/docs/zh/usage/index.md
@@ -3,11 +3,28 @@
 
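				 This chapter provides complete usage instructions for the project. The sections below take you step by step from the basics to advanced usage:
			
 
				 
			
 
				 ## Table of contents
			
 
				-
			
 
				-- [Quick Usage](./quick_usage.md) - Getting started and basic usage
			
 
				-- [Model Source Configuration](./model_source.md) - Detailed configuration of model sources  
			
 
				-- [Command Line Tools](./cli_tools.md) - Detailed parameter reference for the command-line tools
			
 
				-- [Advanced Optimization Parameters](./advanced_cli_parameters.md) - Advanced parameters for the command-line tools
			
 
				+- Local deployment
			
 
				+    * [Quick Usage](./quick_usage.md) - Getting started and basic usage
			
 
				+    * [Model Source Configuration](./model_source.md) - Detailed configuration of model sources  
			
 
				+    * [Command Line Tools](./cli_tools.md) - Detailed parameter reference for the command-line tools
			
 
				+    * [Advanced Optimization Parameters](./advanced_cli_parameters.md) - Advanced parameters for the command-line tools
			
 
				+- Plugins and ecosystem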
			
 
				+    * [Cherry Studio](plugin/Cherry_Studio.md)
			
 
				+    * [Sider](plugin/Sider.md)
			
 
				+    * [Dify](plugin/Dify.md)
			
 
				+    * [n8n](plugin/n8n.md)
			
 
				+    * [Coze](plugin/Coze.md)
			
 
				+    * [FastGPT](plugin/FastGPT.md)
			
 
				+    * [ModelWhale](plugin/ModelWhale.md)
			
 
				+    * [DingTalk](plugin/DingTalk.md)
			
 
				+    * [DataFlow](plugin/DataFlow.md)
			
 
				+    * [BISHENG](plugin/BISHENG.md)
			
 
				+    * [RagFlow](plugin/RagFlow.md)
			
 
				+- Other accelerator cards (community contributed)
			
 
				+    * [昇腾 Ascend](acceleration_cards/Ascend.md) [#3233](https://github.com/opendatalab/MinerU/discussions/3233)
			
 
				+    * [沐曦 METAX](acceleration_cards/METAX.md) [#3477](https://github.com/opendatalab/MinerU/pull/3477)
			
 
				+    * [AMD](acceleration_cards/AMD.md)  [#3662](https://github.com/opendatalab/MinerU/discussions/3662)
			
 
				+    * [太初元碁 Tecorigin](acceleration_cards/Tecorigin.md) [#3767](https://github.com/opendatalab/MinerU/pull/3767)
			
 
				 
			
 
				 ## Getting started
			
 
				 
			
--- a/docs/zh/usage/plugin/BISHENG.md
+++ b/docs/zh/usage/plugin/BISHENG.md
@@ -0,0 +1,11 @@
 
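				+# About BISHENG
			
 
				+
			
 
				+BISHENG (毕昇) is an open-source LLM application development platform focused on enterprise scenarios; it is already used by many industry-leading organizations and Fortune Global 500 companies. Bi Sheng was the inventor of movable-type printing, which greatly advanced the spread of human knowledge. The BISHENG team hopes that BISHENG can likewise provide strong support for the broad adoption of intelligent applications.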
			
 
				+
			
 
				+![](../../../assets/Images/BISHENG_01.png)
			
 
				+
			
 
				+
			
 
				+- Official website: https://bisheng.dataelem.com/
			
 
				+- MinerU plugin work in the BISHENG project: https://github.com/dataelement/bisheng/pulls
			
 
				+
			
 
				+Special thanks to [@pzc163](https://github.com/pzc163)
			
--- a/docs/zh/usage/plugin/Cherry_Studio.md
+++ b/docs/zh/usage/plugin/Cherry_Studio.md
@@ -0,0 +1,238 @@
 
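				+# About Cherry Studio
			
 
				+
			
 
				+Cherry Studio is a powerful multi-model AI client that runs on Windows, macOS and Linux. It integrates mainstream AI cloud services such as OpenAI, DeepSeek, Gemini and Anthropic, also supports running local models, and lets users switch flexibly between AI models.
			
 
				+
			
 
				+MinerU's document parsing capability is now deeply integrated into Cherry Studio's knowledge base and chat interaction, making document processing and information retrieval more convenient.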
			
 
				+
			
 
				+![img](../../../assets/images/Cherry_Studio_1.png)
			
 
				+
			
 
				+- Cherry Studio official website: https://www.cherry-ai.com/
			
 
				+
			
 
				+
			
 
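				+# Using MinerU in Cherry Studio
			
 
				+
			
 
				+## Open the Cherry Studio settings
			
 
				+
			
 
				+a. Open the Cherry Studio application
			
 
				+
			
 
				+b. Click the "Settings" button in the lower-left corner to open the settings page
			
 
				+
			
 
				+c. In the left-hand menu, select "MCP Servers"
			
 
				+
			
 
				+The MCP server configuration panel on the right lists the existing MCP servers. Click "Add Server" in the upper-right corner to create a new MCP service, or click an existing service to edit its configuration.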
			
 
				+
			
 
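				+## Add the MinerU-MCP configuration
			
 
				+
			
 
				+After clicking "Add Server" you will see a configuration form. Fill it in as follows:
			
 
				+
			
 
				+**a. Name**: enter "MinerU-MCP" or any name you prefer
			
 
				+
			
 
				+**b. Description**: optional, e.g. "Tool for converting documents to Markdown"
			
 
				+
			
 
				+**c. Type**: select "Standard input/output (stdio)"
			
 
				+
			
 
				+**d. Command**: enter `uvx`
			
 
				+
			
 
				+**e. Arguments**: enter `mineru-mcp`
			
 
				+
			
 
				+**f. Environment variables**: add the following environment variables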
			
 
				+
			
 
				+```Plain
			
 
				+MINERU_API_BASE=https://mineru.net
			
 
				+MINERU_API_KEY=<your API key>
			
 
				+OUTPUT_DIR=./downloads
			
 
				+USE_LOCAL_API=false
			
 
				+LOCAL_MINERU_API_BASE=http://localhost:8888
			
 
				+```
			
 
				+
			
 
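				+The *`uvx`* command installs and runs mineru-mcp automatically, so **there is no need to install the mineru-mcp package manually beforehand**. This is the simplest way to configure it.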
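For clients that are configured through a JSON file rather than a form, the same settings can be generated programmatically. The sketch below is an assumption-laden illustration: `mineru_mcp_server_config` is a hypothetical helper, and the `mcpServers` layout is a convention used by several MCP clients, not something this document confirms for Cherry Studio. The command, argument and environment variable names are the ones listed above.

```python
import json


def mineru_mcp_server_config(api_key, use_local_api=False):
    """Build an MCP server entry that launches mineru-mcp through uvx over stdio."""
    return {
        "mcpServers": {  # assumed top-level key; check your client's documentation
            "MinerU-MCP": {
                "command": "uvx",
                "args": ["mineru-mcp"],
                "env": {
                    "MINERU_API_BASE": "https://mineru.net",
                    "MINERU_API_KEY": api_key,
                    "OUTPUT_DIR": "./downloads",
                    "USE_LOCAL_API": "true" if use_local_api else "false",
                    "LOCAL_MINERU_API_BASE": "http://localhost:8888",
                },
            }
        }
    }


if __name__ == "__main__":
    # Print a config with a placeholder key; replace it with your own API key.
    print(json.dumps(mineru_mcp_server_config("<your API key>"), indent=2))
```

Environment values are kept as strings because process environments only carry text.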
			
 
				+
			
 
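				+## Save the configuration
			
 
				+
			
 
				+After checking the settings, click "Save" in the upper-right corner to finish. The MinerU-MCP service you just added then appears in the MCP server list.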
			
 
				+
			
 
				+![img](../../../assets/images/Cherry_Studio_2.png)
			
 
				+
			
 
				+![img](../../../assets/images/Cherry_Studio_3.png)
			
 
				+
			
 
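				+## Using MinerU MCP in Cherry Studio
			
 
				+
			
 
				+Once configured, the MinerU MCP tools can be used in Cherry Studio conversations. Prompts such as the following make the model call the MinerU MCP tools; the model recognizes the task and calls the appropriate tool automatically.
			
 
				+
			
 
				+## Example 1: Convert a document from a URL
			
 
				+
			
 
				+**User input:**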
			
 
				+
			
 
				+```Plain
			
 
				+Please use MinerU MCP to convert the PDF document at the following URL to Markdown: https://example.com/sample.pdf
			
 
				+```
			
 
				+
			
 
				+**Steps the model performs:**
			
 
				+
			
 
				+The model recognizes this as a document conversion task and calls the *`parse_documents`* tool with the arguments:
			
 
				+
			
 
				+```Plain
			
 
				+{"file_sources": "https://example.com/sample.pdf"}
			
 
				+```
			
 
				+
			
 
				+When the tool finishes, the model reports the conversion result.
			
 
				+
			
 
				+![img](../../../assets/images/Cherry_Studio_4.png)
			
 
				+
			
 
				+## Example 2: Convert a local document
			
 
				+
			
 
				+**User input:**
			
 
				+
			
 
				+```Plain
			
 
				+Please use MinerU-MCP to convert the local file D://sample.pdf to Markdown
			
 
				+```
			
 
				+
			
 
				+**Steps the model performs:**
			
 
				+
			
 
				+The model recognizes this as a local document conversion task and calls the `parse_documents` tool with the arguments:
			
 
				+
			
 
				+```Plain
			
 
				+{"file_sources": "D://sample.pdf"}
			
 
				+```
			
 
				+
			
 
				+![img](../../../assets/images/Cherry_Studio_5.png)
			
 
				+
			
 
				+## Example 3: Enable OCR for scanned documents
			
 
				+
			
 
				+**User input:**
			
 
				+
			
 
				+```Plain
			
 
				+Please use MinerU-MCP to convert the scanned PDF at the following URL to Markdown, with OCR enabled:
			
 
				+https://example.com/scanned.pdf
			
 
				+```
			
 
				+
			
 
				+**Steps the model performs:**
			
 
				+
			
 
				+The model recognizes this as a document conversion task that needs OCR and calls the `parse_documents` tool with the OCR parameter enabled:
			
 
				+
			
 
				+```Plain
			
 
				+{"file_sources": "https://example.com/scanned.pdf","enable_ocr": true}
			
 
				+```
			
 
				+
			
 
				+![img](../../../assets/images/Cherry_Studio_6.png)
			
 
				+
			
 
				+## Example 4: A complete conversation
			
 
				+
			
 
				+The following is an example of a complete conversation:
			
 
				+
			
 
				+**User:**
			
 
				+
			
 
				+```Plain
			
 
				+I have an academic paper in PDF at https://arxiv.org/pdf/2303.08774.pdf. Could you convert it to Markdown for me?
			
 
				+```
			
 
				+
			
 
				+**Model:**
			
 
				+
			
 
				+```Plain
			
 
				+I can convert this academic paper to Markdown for you. I will use the MinerU-MCP tool for this task.
			
 
				+[The model calls the parse_documents tool; processing...]
			
 
				+The paper has been converted to Markdown. The result is:
			
 
				+# GPT-4 Technical Report
			
 
				+## Abstract
			
 
				+We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs...
			
 
				+[Paper content shown...]
			
 
				+```
			
 
				+
			
 
				+![img](../../../assets/images/Cherry_Studio_7.png)
			
 
				+
			
 
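				+## Tool parameters
			
 
				+
			
 
				+During use, the model selects the appropriate tool and parameters automatically based on your instructions. The parameters of the main tools are described below:
			
 
				+
			
 
				+### ● *parse_documents* tool parameters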
			
 
				+
			
 
				+![img](../../../assets/images/Cherry_Studio_8.png)
			
 
				+
			
 
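				+### ● *get_ocr_languages* tool parameters
			
 
				+
			
 
				+Takes no parameters; returns the list of languages supported by OCR.
			
 
				+
			
 
				+## Advanced usage
			
 
				+
			
 
				+### Specify the language and page range
			
 
				+
			
 
				+**User input:**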
			
 
				+
			
 
				+```Plain
			
 
				+Please use MinerU MCP to convert the document at the following URL to Markdown, processing only pages 5-10 and setting the language to Chinese: https://example.com/document.pdf
			
 
				+```
			
 
				+
			
 
				+The model calls the *`parse_documents`* tool with *`language`* set to "ch" and *`page_ranges`* set to "5-10".
			
 
				+
			
 
				+### Batch-process multiple documents
			
 
				+
			
 
				+**User input:**
			
 
				+
			
 
				+```Plain
			
 
				+Please use MinerU-MCP to convert the documents at the following URLs to Markdown:
			
 
				+https://example.com/doc1.pdf
			
 
				+https://example.com/doc2.pdf
			
 
				+https://example.com/doc3.pdf
			
 
				+```
			
 
				+
			
 
				+The model calls the *`parse_documents`* tool and passes the URLs, separated by commas, in the *`file_sources`* parameter.
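The arguments the model sends to `parse_documents` in the examples of this section can be assembled as a plain dictionary. The following is a minimal sketch: `build_parse_arguments` is a hypothetical helper, and it only uses the parameter names that appear in this document (`file_sources`, `enable_ocr`, `language`, `page_ranges`); the tool may accept further parameters not covered here.

```python
def build_parse_arguments(sources, enable_ocr=False, language=None, page_ranges=None):
    """Build the argument dict for parse_documents from URLs or local paths."""
    if not sources:
        raise ValueError("at least one file source is required")
    arguments = {"file_sources": ",".join(sources)}  # multiple sources are comma-separated
    if enable_ocr:
        arguments["enable_ocr"] = True
    if language:
        arguments["language"] = language
    if page_ranges:
        arguments["page_ranges"] = page_ranges
    return arguments


if __name__ == "__main__":
    print(build_parse_arguments(
        ["https://example.com/doc1.pdf", "https://example.com/doc2.pdf"],
        language="ch",
        page_ranges="5-10",
    ))
```

Optional keys are omitted unless set, which mirrors the shorter argument objects shown in the earlier examples.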
			
 
				+
			
 
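				+## Notes
			
 
				+
			
 
				+● With *`USE_LOCAL_API=true`*, parsing uses the locally configured API
			
 
				+
			
 
				+● With *`USE_LOCAL_API=false`*, parsing uses the API of the official MinerU website
			
 
				+
			
 
				+● Large documents may take a long time to process; please be patient
			
 
				+
			
 
				+● If you run into timeouts, consider processing documents in batches or using local API mode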
			
 
				+
			
 
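				+## Common problems and solutions
			
 
				+
			
 
				+### The MCP service fails to start
			
 
				+
			
 
				+**Problem**: running *`uv run -m mineru.cli`* reports an error.
			
 
				+
			
 
				+**Solution**:
			
 
				+
			
 
				+● Make sure the virtual environment is activated
			
 
				+
			
 
				+● Check that all dependencies are installed
			
 
				+
			
 
				+● Try the *`python -m mineru.cli`* command instead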
			
 
				+
			
 
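				+### File conversion fails
			
 
				+
			
 
				+**Problem**: the file uploads successfully but the conversion fails.
			
 
				+
			
 
				+**Solution**:
			
 
				+
			
 
				+● Check that the file format is supported
			
 
				+
			
 
				+● Confirm that the API key is correct
			
 
				+
			
 
				+● Check the MCP service log for detailed error information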
			
 
				+
			
 
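				+### File path problems
			
 
				+
			
 
				+**Problem**: a file-not-found error is reported when processing a local file with the `parse_documents` tool.
			
 
				+
			
 
				+**Solution**: use an absolute path, or a correct path relative to the directory the server runs in.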
			
 
				+
			
 
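				+### MCP service call timeouts
			
 
				+
			
 
				+**Problem**: calling the *`parse_documents`* tool fails with *`Error calling tool 'parse_documents': MCP error -32001: Request timed out`*.
			
 
				+
			
 
				+**Solution**: this typically happens with large documents or an unstable network. In some MCP clients (such as Cursor), a timeout can make further MCP calls impossible until the client is restarted. Recent versions of Cursor may show that an MCP call is in progress even though the call never actually succeeds. Suggestions:
			
 
				+
			
 
				+**● Wait for an official fix**: this is a known issue in the Cursor client; wait for Cursor to fix it
			
 
				+
			
 
				+**● Process small files**: process only a few small files where possible, and avoid large documents that lead to timeouts
			
 
				+
			
 
				+**● Process in batches**: split multiple files across several requests, handling only one or two files per request
			
 
				+
			
 
				+● Increase the timeout setting (if the client supports it)
			
 
				+
			
 
				+● If calls fail after a timeout, restart the MCP client
			
 
				+
			
 
				+● If timeouts keep occurring, check the network connection or consider using local API mode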
			
--- a/docs/zh/usage/plugin/Coze.md
+++ b/docs/zh/usage/plugin/Coze.md
@@ -0,0 +1,92 @@
 
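				+# About Coze
			
 
				+
			
 
				+Coze (Chinese name: 扣子) is a no-code AI application development platform from ByteDance. With or without programming experience, users can quickly create chatbots, agents, AI applications and plugins on the platform and deploy them to social platforms and instant-messaging apps.
			
 
				+
			
 
				+The MinerU plugin is now available in the Coze plugin store. It provides document parsing for users building agents and workflows, speeding up AI application development.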
			
 
				+
			
 
				+![img](../../../assets/images/coze_0.png)
			
 
				+
			
 
				+- Coze official website: https://www.coze.cn/
			
 
				+- MinerU Coze plugin page: https://www.coze.cn/store/plugin/7527957359730360354
			
 
				+
			
 
				+# Using MinerU in Coze
			
 
				+
			
 
				+## **Coze: integrating the plugin**
			
 
				+
			
 
				+- Open the Coze development platform at https://www.coze.cn/home
			
 
				+
			
 
				+## Agent
			
 
				+
			
 
				+### Workspace -> Project development -> Create -> Create agent -> Create -> Enter a project name
			
 
				+
			
 
				+![img](../../../assets/images/Coze_1.png)
			
 
				+
			
 
				+![img](../../../assets/images/Coze_2.png)
			
 
				+
			
 
				+### Plugin configuration -> Add `Plugin` -> Search for `MinerU`
			
 
				+
			
 
				+![img](../../../assets/images/Coze_3.png)
			
 
				+
			
 
				+### Add the `parse_file` tool (online version)
			
 
				+
			
 
				+![img](../../../assets/images/Coze_4.png)
			
 
				+
			
 
				+### Select the `MinerU` plugin -> Edit parameters -> Enter the API key
			
 
				+
			
 
				+![img](../../../assets/images/Coze_5.png)
			
 
				+
			
 
				+![img](../../../assets/images/Coze_6.png)
			
 
				+
			
 
				+> Remember to turn off the display of the url and token
			
 
				+
			
 
				+### Debug the `agent`
			
 
				+
			
 
				+![img](../../../assets/images/Coze_7.png)
			
 
				+
			
 
				+## Workflow
			
 
				+
			
 
				+> Use MinerU as a workflow
			
 
				+
			
 
				+### Workflow -> Create workflow
			
 
				+
			
 
				+![img](../../../assets/images/Coze_8.png)
			
 
				+
			
 
				+![img](../../../assets/images/Coze_9.png)
			
 
				+
			
 
				+### Workflow plugin configuration -> Add `Plugin` -> Search for `MinerU` -> Add
			
 
				+
			
 
				+![img](../../../assets/images/Coze_10.png)
			
 
				+
			
 
				+![img](../../../assets/images/Coze_11.png)
			
 
				+
			
 
				+### Select the `MinerU` plugin -> Edit parameters -> Enter the API key
			
 
				+
			
 
				+![img](../../../assets/images/Coze_12.png)
			
 
				+
			
 
				+### Select the Start node -> Set the `input` type to file -> Connect it to the `mineru` node
			
 
				+
			
 
				+![img](../../../assets/images/Coze_13.png)
			
 
				+
			
 
				+![img](../../../assets/images/Coze_14.png)
			
 
				+
			
 
				+### Select the End node -> Connect it to the `mineru` node -> Set `output` to the `parse_file.text` of the `mineru` node
			
 
				+
			
 
				+![img](../../../assets/images/Coze_15.png)
			
 
				+
			
 
				+![img](../../../assets/images/Coze_16.png)
			
 
				+
			
 
				+### Upload a file -> Test run
			
 
				+
			
 
				+![img](../../../assets/images/Coze_17.png)
			
 
				+
			
 
				+![img](../../../assets/images/Coze_18.png)
			
 
				+
			
 
				+### Publish -> Add to the current agent
			
 
				+
			
 
				+![img](../../../assets/images/Coze_19.png)
			
 
				+
			
 
				+![img](../../../assets/images/Coze_20.png)
			
 
				+
			
 
				+### Remove the `mineru` plugin -> Debug
			
 
				+
			
 
				+![img](../../../assets/images/Coze_21.png)
			
--- a/docs/zh/usage/plugin/DataFlow.md
+++ b/docs/zh/usage/plugin/DataFlow.md
@@ -0,0 +1,11 @@
 
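				+# About the 元枢智汇 ADP intelligent data platform
			
 
				+
			
 
				+The 元枢智汇 ADP intelligent data platform is built on a self-developed AI database and the DataFlow data preparation framework. It helps enterprises manage, search and process massive amounts of data efficiently, lowers the expertise needed for model and agent training through systematic, automated data governance, and helps enterprises unlock the value of their private data in real business scenarios so that AI applications can actually be deployed.
			
 
				+
			
 
				+MinerU is now deeply integrated into the DataFlow module of the ADP platform, whose data parsing service is powered by the MinerU document extraction engine.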
			
 
				+
			
 
				+![](../../../assets/images/DataFLow_01.png)
			
 
				+![](../../../assets/images/DataFLow_02.png)
			
 
				+
			
 
				+- Official website: https://adp.originhub.tech/agent
			
 
				+- MinerU FastGPT plugin download page: https://cloud.fastgpt.io/dashboard/systemPlugin?type=productivity
			
--- a/docs/zh/usage/plugin/Dify.md
+++ b/docs/zh/usage/plugin/Dify.md
@@ -0,0 +1,171 @@
 
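				+# About Dify
			
 
				+
			
 
				+**Dify** is an open-source platform for developing large language model (LLM) applications, designed to simplify and speed up building and deploying generative AI applications. It combines the ideas of Backend-as-a-Service (BaaS) and LLMOps, giving developers a user-friendly interface and powerful tools that lower the barrier to AI application development.
			
 
				+
			
 
				+The MinerU plugin, developed jointly by MinerU and Dify, is now available on the Dify Marketplace. It helps users build workflows and provides document parsing.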
			
 
				+
			
 
				+![img](../../../assets/images/Dify_2.png)
			
 
				+
			
 
				+- Dify official website: https://dify.ai/zh
			
 
				+- MinerU Dify plugin download page: https://marketplace.dify.ai/plugins/langgenius/mineru
			
 
				+
			
 
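				+# Using MinerU in Dify
			
 
				+
			
 
				+## 1. **Highlights of the new MinerU Dify plugin (v0.4.0)**
			
 
				+
			
 
				+- **Full MinerU2 support**: compatible with the latest MinerU2 features for top-tier document parsing.
			
 
				+- **Highly flexible**: supports both the official online API and locally deployed APIs (backward compatible with 1.x).
			
 
				+- **Workflow-ready**: gives Dify agents strong document “read/write” capabilities for handling complex tasks.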
			
 
				+
			
 
				+
			
 
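				+## **2. Hands-on: two cases to get you started**
			
 
				+
			
 
				+Practice beats theory. The two typical scenarios below show what the new plugin can do.
			
 
				+
			
 
				+### Preparation
			
 
				+
			
 
				+1. Install the MinerU plugin from the Dify plugin page (the same applies to self-hosted Dify)
			
 
				+
			
 
				+
			
 
				+2. Fill in the API URL and related information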
			
 
				+
			
 
				+![img](../../../assets/images/Dify_3.png)
			
 
				+
			
 
				+A token is required when using the official API 👆; it can be left empty when using a locally deployed API 👇
			
 
				+
			
 
				+![img](../../../assets/images/Dify_4.png)
			
 
				+
			
 
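				+### **Case 1: Parse a single file and build a Chat PDF application**
			
 
				+
			
 
				+Want to chat with your documents with the help of AI? Follow the steps below.
			
 
				+
			
 
				+#### Step 1: Create a blank application and select “Chatflow”
			
 
				+
			
 
				+Enter the application name and description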
			
 
				+
			
 
				+![img](../../../assets/images/Dify_5.png)
			
 
				+
			
 
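				+#### Step 2: In the initial template, select the “Start” node
			
 
				+
			
 
				+Set the field type to single file, enter a variable name (`input_file` here), and set the supported file types to documents and images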
			
 
				+
			
 
				+![img](../../../assets/images/Dify_6.png)
			
 
				+
			
 
				+#### Step 3: Add a tool node, the MinerU plugin, to parse the file uploaded in the Start node
			
 
				+
			
 
				+![img](../../../assets/images/Dify_7.png)
			
 
				+
			
 
				+#### Step 4: Set MinerU's input variable to the `input_file` added in the Start node
			
 
				+
			
 
				+![img](../../../assets/images/Dify_8.png)
			
 
				+
			
 
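				+#### Step 5: Configure the LLM model
			
 
				+
			
 
				+After selecting the “LLM” node, if no model is available, install one separately from the plugin marketplace (DeepSeek is used here as an example)
			
 
				+
			
 
				+For “Context”, select MinerU's output variable `text` (the Markdown produced by MinerU from the document)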
			
 
				+
			
 
				+![img](../../../assets/images/Dify_9.png)
			
 
				+
			
 
				+Fill in the “SYSTEM” prompt according to your needs; as shown in the figure, it can be “Extract the answer to the user's question from Parse File `text`”
			
 
				+
			
 
				+![img](../../../assets/images/Dify_10.png)
			
 
				+
			
 
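				+#### Step 6: Preview: upload a file and ask the bot about the document
			
 
				+
			
 
				+A simple document Q&A application, Chat PDF, is now complete. Click “Preview” to see how it works 👇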
			
 
				+
			
 
				+![img](../../../assets/images/Dify_11.png)
			
 
				+
			
 
				+Result:
			
 
				+
			
 
				+![img](../../../assets/images/Dify_12.png)
			
 
				+
			
 
				+#### **Step 7: Publish and test**
			
 
				+
			
 
				+Save and publish your application. Now upload a PDF or an image and chat with it freely!
			
 
				+
			
 
				+![img](../../../assets/images/Dify_13.png)
			
 
				+
			
 
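				+### **Case 2: Batch-process documents automatically and upload them to S3**
			
 
				+
			
 
				+Need to process and archive a large number of documents? The MinerU plugin can handle that too.
			
 
				+
			
 
				+#### Step 1: Install the botos3 plugin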
			
 
				+
			
 
				+![img](../../../assets/images/Dify_14.png)
			
 
				+
			
 
				+#### Step 2: Configure the S3 bucket
			
 
				+
			
 
				+![img](../../../assets/images/Dify_15.png)
			
 
				+
			
 
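				+#### Step 3: Create the workflow
			
 
				+
			
 
				+Set the field type to “File list”, enter a variable name (`input_files` here), and set the supported file types to documents and images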
			
 
				+
			
 
				+![img](../../../assets/images/Dify_16.png)
			
 
				+
			
 
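				+#### Step 4: Add an “Iteration”
			
 
				+
			
 
				+Add an “Iteration” after the “Start” node and configure the MinerU node inside it. Set the iteration input to the `upload_files` of the Start node. Leave the output empty for now; once the whole iteration is configured, select the `full_zip_url` of the MinerU Parse File node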
			
 
				+
			
 
				+![img](../../../assets/images/Dify_17.png)
			
 
				+
			
 
				+Set MinerU's input parameter `file` to the iterator's `item`
			
 
				+
			
 
				+![img](../../../assets/images/Dify_18.png)
			
 
				+
			
 
				+![img](../../../assets/images/Dify_19.png)
			
 
				+
			
 
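				+#### Step 5: Add an intermediate “Code” node to convert MinerU's parsing result
			
 
				+
			
 
				+**Input variables (names must match those defined in the code)**
			
 
				+
			
 
				+- **text:** select the output variable `text` of MinerU Parse File
			
 
				+- **uploadFiles:** select the file list `upload_files` of the “Start” node; it is used to look up the original file name by the iteration index
			
 
				+- **index:** the iteration index; select the iterator's `index`
			
 
				+
			
 
				+**Output variables (names must match those defined in the code)**
			
 
				+
			
 
				+- **fileName:** String
			
 
				+- **base64:** String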
			
 
				+
			
 
				+![img](../../../assets/images/Dify_20.png)
			
 
				+
			
 
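				+Select JavaScript as the code language and write the conversion code:
			
 
				+
			
 
				+[This content cannot currently be displayed outside the Feishu document]
			
 
				+
			
 
				+The Python version:
			
 
				+
			
 
				+[This content cannot currently be displayed outside the Feishu document]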
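The original conversion code is not available in this copy of the document, so the following is only a sketch of what a Python version of the Code node could look like, not the authors' code. It assumes the node's entry point is a `main` function whose parameters match the input variables listed above and whose returned dict matches the output variables (`fileName`, `base64`). It also assumes each entry of `uploadFiles` is a dict with a `filename` key; adjust that lookup to the actual structure of your file list.

```python
import base64


def main(text: str, uploadFiles: list, index: int) -> dict:
    """Turn MinerU's Markdown output into a file name and a base64 payload for upload."""
    original = uploadFiles[index]
    # Assumption: file entries are dicts carrying the original name under "filename".
    if isinstance(original, dict):
        name = original.get("filename") or f"document_{index}"
    else:
        name = f"document_{index}"
    # Replace the original extension (e.g. .pdf) with .md
    stem = name.rsplit(".", 1)[0] if "." in name else name
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {"fileName": f"{stem}.md", "base64": encoded}
```

The Markdown is encoded as UTF-8 before base64 encoding so that non-ASCII text such as Chinese survives the upload unchanged.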
			
 
				+
			
 
				+#### Step 6: Configure the Botos3 plugin to upload the content
			
 
				+
			
 
				+Add a Botos3 tool node and select “Upload base64 via S3”
			
 
				+
			
 
				+![img](../../../assets/images/Dify_21.png)
			
 
				+
			
 
				+For the file base64, select the base64 field output by the Code node (named **转换MINERU MD文本** in the figure)
			
 
				+
			
 
				+![img](../../../assets/images/Dify_22.png)
			
 
				+
			
 
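				+For the S3 object key, enter the path where the file is stored. The bucket name was already entered in the botos3 plugin configuration, so only the directory inside the bucket is needed here. Select the `fileName` output by the Code node (named **转换MINERU MD文本** in the figure)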
			
 
				+
			
 
				+![img](../../../assets/images/Dify_23.png)
			
 
				+
			
 
				+#### Step 7: Preview the result
			
 
				+
			
 
				+Connect the End node. A simple workflow that uploads to S3 is now configured; click “Run” to see the result 👇:
			
 
				+
			
 
				+![img](../../../assets/images/Dify_24.png)
			
 
				+
			
 
				+![img](../../../assets/images/Dify_25.png)
			
 
				+
			
 
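				+#### Step 8: View the documents in Vis3
			
 
				+
			
 
				+After the run finishes, use [vis3](https://github.com/opendatalab/Vis3?tab=readme-ov-file#features) to check whether the parsed md files were uploaded to the S3 bucket. For Vis3 usage, see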
			
 
				+
			
 
				+[新工具开源！Vis3大模型数据可视化利器：填 AK/SK 直接预览 S3 数据，JSON/视频/图片秒开！本地文件也可用](https://mp.weixin.qq.com/s/p3rH4EaoJB-AK7RWeDvOhg)
			
 
				+
			
 
				+![img](../../../assets/images/Dify_26.png)
			
--- a/docs/zh/usage/plugin/DingTalk.md
+++ b/docs/zh/usage/plugin/DingTalk.md
@@ -0,0 +1,12 @@
 
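				+# About DingTalk
			
 
				+
			
 
				+DingTalk (钉钉) is an enterprise-grade intelligent mobile office platform built by Alibaba Group, serving as a collaboration and application development platform for organizations in the digital economy. It integrates IM, DingTalk Docs, 钉闪会, 钉盘, Teambition, OA approval, smart HR, 钉工牌, the workbench and other features, aiming for a simple, efficient, secure and intelligent digital way of working. It supports the digitalization of both organizations and business, covering end-to-end management of people, finance, assets, affairs, production, supply, sales and inventory.
			
 
				+
			
 
				+With SaaS software on the DingTalk open platform, enterprises can build digital applications at low cost and integrate all their digital systems. DingTalk also provides more than 2,000 API endpoints, offering an open and compatible environment for digital transformation. Users who cannot code can use low-code tools to build CRM, ERP, OA, project management, inventory and other systems.
			
 
				+
			
 
				+DingTalk Docs, AI Tables and other products have already deeply integrated MinerU's capabilities and expose document parsing to ecosystem developers through the open platform, providing a solid technical and scenario foundation for the joint development of DLU.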
			
 
				+
			
 
				+![](../../../assets/images/DingTalk_01.png)
			
 
				+
			
 
				+
			
 
				+- DingTalk official website: https://www.dingtalk.com/
			
--- a/docs/zh/usage/plugin/FastGPT.md
+++ b/docs/zh/usage/plugin/FastGPT.md
@@ -0,0 +1,13 @@
 
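				+# About FastGPT
			
 
				+
			
 
				+FastGPT is a knowledge-base question-answering system built on large language models. It combines intelligent conversation with visual orchestration, making AI application development simple and natural. Developers and business users alike can easily build their own AI applications.
			
 
				+
			
 
				+The MinerU plugin is now available in the FastGPT plugin marketplace. It provides document parsing for users building agents and workflows, speeding up AI application development.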
			
 
				+
			
 
				+
			
 
				+![img](../../../assets/images/FastGPT_01.png)
			
 
				+
			
 
				+![img](../../../assets/images/FastGPT_02.png)
			
 
				+
			
 
				+- Official website: https://fastgpt.cn
			
 
				+- MinerU FastGPT plugin download page: https://cloud.fastgpt.io/dashboard/systemPlugin?type=productivity