2 months ago · 05a9920ffe
--- a/README.md
+++ b/README.md
@@ -43,48 +43,122 @@
 
				 </div>
			
 
				 
			
 
				 # Changelog
			
 
				-- 2025/08/01 2.1.10 Released
			
 
				-  - Fixed an issue in the `pipeline` backend where block overlap caused the parsing results to deviate from expectations #3232
			
 
				-- 2025/07/30 2.1.9 Released
			
 
				-  - `transformers` 4.54.1 version adaptation
			
 
				-- 2025/07/28 2.1.8 Released
			
 
				-  - `sglang` 0.4.9.post5 version adaptation
			
 
				-- 2025/07/27 2.1.7 Released
			
 
				-  - `transformers` 4.54.0 version adaptation
			
 
				-- 2025/07/26 2.1.6 Released
			
 
				-  - Fixed table parsing issues in handwritten documents when using `vlm` backend
			
 
				-  - Fixed visualization box position drift issue when document is rotated #3175
			
 
				-- 2025/07/24 2.1.5 Released
			
 
				-  - `sglang` 0.4.9 version adaptation, synchronously upgrading the dockerfile base image to sglang 0.4.9.post3
			
 
				-- 2025/07/23 2.1.4 Released
			
 
				-  - Bug Fixes
			
 
				-    - Fixed the issue of excessive memory consumption during the `MFR` step in the `pipeline` backend under certain scenarios #2771
			
 
				-    - Fixed the inaccurate matching between `image`/`table` and `caption`/`footnote` under certain conditions #3129
			
 
				-- 2025/07/16 2.1.1 Released
			
 
				-  - Bug fixes
			
 
				-    - Fixed text block content loss issue that could occur in certain `pipeline` scenarios #3005
			
 
				-    - Fixed issue where `sglang-client` required unnecessary packages like `torch` #2968
			
 
				-    - Updated `dockerfile` to fix incomplete text content parsing due to missing fonts in Linux #2915
			
 
				-  - Usability improvements
			
 
				-    - Updated `compose.yaml` to facilitate direct startup of `sglang-server`, `mineru-api`, and `mineru-gradio` services
			
 
				-    - Launched brand new [online documentation site](https://opendatalab.github.io/MinerU/), simplified readme, providing better documentation experience
			
 
				-- 2025/07/05 Version 2.1.0 Released
			
 
				-  - This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:
			
 
				-  - **Performance Optimizations:**
			
 
				-    - Significantly improved preprocessing speed for documents with specific resolutions (around 2000 pixels on the long side).
			
 
				-    - Greatly enhanced post-processing speed when the `pipeline` backend handles batch processing of documents with fewer pages (<10 pages).
			
 
				-    - Layout analysis speed of the `pipeline` backend has been increased by approximately 20%.
			
 
				-  - **Experience Enhancements:**
			
 
				-    - Built-in ready-to-use `fastapi service` and `gradio webui`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
			
 
				-    - Adapted to `sglang` version `0.4.8`, significantly reducing the GPU memory requirements for the `vlm-sglang` backend. It can now run on graphics cards with as little as `8GB GPU memory` (Turing architecture or newer).
			
 
				-    - Added transparent parameter passing for all commands related to `sglang`, allowing the `sglang-engine` backend to receive all `sglang` parameters consistently with the `sglang-server`.
			
 
				-    - Supports feature extensions based on configuration files, including `custom formula delimiters`, `enabling heading classification`, and `customizing local model directories`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files).
			
 
				-  - **New Features:**
			
 
				-    - Updated the `pipeline` backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
			
 
				-    - Introduced limited support for vertical text layout in the `pipeline` backend.
			
 
				+
			
 
				+- 2025/09/05 2.2.0 Released
			
 
				+  - Major Updates
			
 
				+    - In this version, we focused on improving table parsing accuracy by introducing a new [wired table recognition model](https://github.com/RapidAI/TableStructureRec) and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the `pipeline` backend.
			
 
				+    - We also added cross-page table merging, available in both the `pipeline` and `vlm` backends, further improving the completeness and accuracy of table parsing.
			
 
				+  - Other Updates
			
 
				+    - The `pipeline` backend now parses tables rotated by 270 degrees, so table parsing is supported in 0/90/270-degree orientations
			
 
				+    - The `pipeline` backend now supports OCR for Thai and Greek, and its English OCR model has been updated to the latest version: English recognition accuracy improved by 11%, the Thai recognition model reaches 82.68% accuracy, and the Greek recognition model reaches 89.28% (by PPOCRv5)
			
 
				+    - Added `bbox` field (mapped to 0-1000 range) in the output `content_list.json`, making it convenient for users to directly obtain position information for each content block
			
 
				+
			
 
				 
			
 
				 <details>
			
 
				   <summary>History Log</summary>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/08/01 2.1.10 Released</summary>
			
 
				+    <ul>
			
 
				+      <li>Fixed an issue in the <code>pipeline</code> backend where block overlap caused the parsing results to deviate from expectations #3232</li>
			
 
				+    </ul>
			
 
				+  </details>  
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/30 2.1.9 Released</summary>
			
 
				+    <ul>
			
 
				+      <li><code>transformers</code> 4.54.1 version adaptation</li>
			
 
				+    </ul>
			
 
				+  </details>  
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/28 2.1.8 Released</summary>
			
 
				+    <ul>
			
 
				+      <li><code>sglang</code> 0.4.9.post5 version adaptation</li>
			
 
				+    </ul>
			
 
				+  </details>  
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/27 2.1.7 Released</summary>
			
 
				+    <ul>
			
 
				+      <li><code>transformers</code> 4.54.0 version adaptation</li>
			
 
				+    </ul>
			
 
				+  </details>  
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/26 2.1.6 Released</summary>
			
 
				+    <ul>
			
 
				+      <li>Fixed table parsing issues in handwritten documents when using <code>vlm</code> backend</li>
			
 
				+      <li>Fixed visualization box position drift issue when document is rotated #3175</li>
			
 
				+    </ul>
			
 
				+  </details>  
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/24 2.1.5 Released</summary>
			
 
				+    <ul>
			
 
				+      <li><code>sglang</code> 0.4.9 version adaptation, synchronously upgrading the dockerfile base image to sglang 0.4.9.post3</li>
			
 
				+    </ul>
			
 
				+  </details>  
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/23 2.1.4 Released</summary>
			
 
				+    <ul>
			
 
				+      <li><strong>Bug Fixes</strong>
			
 
				+        <ul>
			
 
				+          <li>Fixed the issue of excessive memory consumption during the <code>MFR</code> step in the <code>pipeline</code> backend under certain scenarios #2771</li>
			
 
				+          <li>Fixed the inaccurate matching between <code>image</code>/<code>table</code> and <code>caption</code>/<code>footnote</code> under certain conditions #3129</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+    </ul>
			
 
				+  </details>  
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/16 2.1.1 Released</summary>
			
 
				+    <ul>
			
 
				+      <li><strong>Bug fixes</strong>
			
 
				+        <ul>
			
 
				+          <li>Fixed text block content loss issue that could occur in certain <code>pipeline</code> scenarios #3005</li>
			
 
				+          <li>Fixed issue where <code>sglang-client</code> required unnecessary packages like <code>torch</code> #2968</li>
			
 
				+          <li>Updated <code>dockerfile</code> to fix incomplete text content parsing due to missing fonts in Linux #2915</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+      <li><strong>Usability improvements</strong>
			
 
				+        <ul>
			
 
				+          <li>Updated <code>compose.yaml</code> to facilitate direct startup of <code>sglang-server</code>, <code>mineru-api</code>, and <code>mineru-gradio</code> services</li>
			
 
				+          <li>Launched brand new <a href="https://opendatalab.github.io/MinerU/">online documentation site</a>, simplified readme, providing better documentation experience</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+    </ul>
			
 
				+  </details>  
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/05 2.1.0 Released</summary>
			
 
				+    <ul>
			
 
				+      <li>This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:</li>
			
 
				+      <li><strong>Performance Optimizations:</strong>
			
 
				+        <ul>
			
 
				+          <li>Significantly improved preprocessing speed for documents with specific resolutions (around 2000 pixels on the long side).</li>
			
 
				+          <li>Greatly enhanced post-processing speed when the <code>pipeline</code> backend handles batch processing of documents with fewer pages (&lt;10 pages).</li>
			
 
				+          <li>Layout analysis speed of the <code>pipeline</code> backend has been increased by approximately 20%.</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+      <li><strong>Experience Enhancements:</strong>
			
 
				+        <ul>
			
 
				+          <li>Built-in ready-to-use <code>fastapi service</code> and <code>gradio webui</code>. For detailed usage instructions, please refer to <a href="https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver">Documentation</a>.</li>
			
 
				+          <li>Adapted to <code>sglang</code> version <code>0.4.8</code>, significantly reducing the GPU memory requirements for the <code>vlm-sglang</code> backend. It can now run on graphics cards with as little as <code>8GB GPU memory</code> (Turing architecture or newer).</li>
			
 
				+          <li>Added transparent parameter passing for all commands related to <code>sglang</code>, allowing the <code>sglang-engine</code> backend to receive all <code>sglang</code> parameters consistently with the <code>sglang-server</code>.</li>
			
 
				+          <li>Supports feature extensions based on configuration files, including <code>custom formula delimiters</code>, <code>enabling heading classification</code>, and <code>customizing local model directories</code>. For detailed usage instructions, please refer to <a href="https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files">Documentation</a>.</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+      <li><strong>New Features:</strong>
			
 
				+        <ul>
			
 
				+          <li>Updated the <code>pipeline</code> backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. <a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html">Details</a></li>
			
 
				+          <li>Introduced limited support for vertical text layout in the <code>pipeline</code> backend.</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				   <details>
			
 
				     <summary>2025/06/20 2.0.6 Released</summary>
			
 
				     <ul>
			
@@ -596,6 +670,7 @@ Currently, some models in this project are trained based on YOLO. However, since
 
				 - [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
			
 
				 - [UniMERNet](https://github.com/opendatalab/UniMERNet)
			
 
				 - [RapidTable](https://github.com/RapidAI/RapidTable)
			
 
				+- [TableStructureRec](https://github.com/RapidAI/TableStructureRec)
			
 
				 - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
			
 
				 - [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
			
 
				 - [layoutreader](https://github.com/ppaanngggg/layoutreader)
			
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -43,48 +43,122 @@
 
				 </div>
			
 
				 
			
 
				 # 更新记录
			
 
				-- 2025/08/01 2.1.10 发布
			
 
				-  - 修复`pipeline`后端因block覆盖导致的解析结果与预期不符  #3232
			
 
				-- 2025/07/30 2.1.9 发布
			
 
				-  - `transformers` 4.54.1 版本适配
			
 
				-- 2025/07/28 2.1.8 发布
			
 
				-  - `sglang` 0.4.9.post5 版本适配
			
 
				-- 2025/07/27 2.1.7 发布
			
 
				-  - `transformers` 4.54.0 版本适配
			
 
				-- 2025/07/26 2.1.6 发布
			
 
				-  - 修复`vlm`后端解析部分手写文档时的表格异常问题
			
 
				-  - 修复文档旋转时可视化框位置漂移问题 #3175
			
 
				-- 2025/07/24 2.1.5 发布
			
 
				-  - `sglang` 0.4.9 版本适配，同步升级dockerfile基础镜像为sglang 0.4.9.post3
			
 
				-- 2025/07/23 2.1.4 发布
			
 
				-  - bug修复
			
 
				-    - 修复`pipeline`后端中`MFR`步骤在某些情况下显存消耗过大的问题 #2771
			
 
				-    - 修复某些情况下`image`/`table`与`caption`/`footnote`匹配不准确的问题 #3129
			
 
				-- 2025/07/16 2.1.1 发布
			
 
				-  - bug修复 
			
 
				-    - 修复`pipeline`在某些情况可能发生的文本块内容丢失问题 #3005
			
 
				-    - 修复`sglang-client`需要安装`torch`等不必要的包的问题 #2968
			
 
				-    - 更新`dockerfile`以修复linux字体缺失导致的解析文本内容不完整问题 #2915
			
 
				-  - 易用性更新
			
 
				-    - 更新`compose.yaml`，便于用户直接启动`sglang-server`、`mineru-api`、`mineru-gradio`服务
			
 
				-    - 启用全新的[在线文档站点](https://opendatalab.github.io/MinerU/zh/)，简化readme，提供更好的文档体验
			
 
				-- 2025/07/05 2.1.0 发布
			
 
				-  - 这是 MinerU 2 的第一个大版本更新，包含了大量新功能和改进，包含众多性能优化、体验优化和bug修复，具体更新内容如下： 
			
 
				-  - 性能优化： 
			
 
				-    - 大幅提升某些特定分辨率（长边2000像素左右）文档的预处理速度
			
 
				-    - 大幅提升`pipeline`后端批量处理大量页数较少（<10）文档时的后处理速度
			
 
				-    - `pipeline`后端的layout分析速度提升约20%
			
 
				-  - 体验优化：
			
 
				-    - 内置开箱即用的`fastapi服务`和`gradio webui`，详细使用方法请参考[文档](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver)
			
 
				-    - `sglang`适配`0.4.8`版本，大幅降低`vlm-sglang`后端的显存要求，最低可在`8G显存`(Turing及以后架构)的显卡上运行
			
 
				-    - 对所有命令增加`sglang`的参数透传，使得`sglang-engine`后端可以与`sglang-server`一致，接收`sglang`的所有参数
			
 
				-    - 支持基于配置文件的功能扩展，包含`自定义公式标识符`、`开启标题分级功能`、`自定义本地模型目录`，详细使用方法请参考[文档](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#mineru_1)
			
 
				-  - 新特性：  
			
 
				-    - `pipeline`后端更新 PP-OCRv5 多语种文本识别模型，支持法语、西班牙语、葡萄牙语、俄语、韩语等 37 种语言的文字识别，平均精度涨幅超30%。[详情](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
			
 
				-    - `pipeline`后端增加对竖排文本的有限支持
			
 
				+
			
 
				+- 2025/09/05 2.2.0 发布
			
 
				+  - 主要更新
			
 
				+    - 在这个版本我们重点提升了表格的解析精度，通过引入新的[有线表识别模型](https://github.com/RapidAI/TableStructureRec)和全新的混合表格结构解析算法，显著提升了`pipeline`后端的表格识别能力。
			
 
				+    - 另外我们增加了对跨页表格合并的支持，这一功能同时支持`pipeline`和`vlm`后端，进一步提升了表格解析的完整性和准确性。
			
 
				+  - 其他更新
			
 
				+    - `pipeline`后端增加270度旋转的表格解析能力，现已支持0/90/270度三个方向的表格解析
			
 
				+    - `pipeline`增加对泰文、希腊文的ocr能力支持，并更新了英文ocr模型至最新，英文识别精度提升11%，泰文识别模型精度 82.68%，希腊文识别模型精度 89.28%（by PPOCRv5）
			
 
				+    - 在输出的`content_list.json`中增加了`bbox`字段(映射至0-1000范围内)，方便用户直接获取每个内容块的位置信息
			
 
				+
			
 
				 
			
 
				 <details>
			
 
				   <summary>历史日志</summary>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/08/01 2.1.10 发布</summary>
			
 
				+    <ul>
			
 
				+      <li>修复<code>pipeline</code>后端因block覆盖导致的解析结果与预期不符 #3232</li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/30 2.1.9 发布</summary>
			
 
				+    <ul>
			
 
				+      <li><code>transformers</code> 4.54.1 版本适配</li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/28 2.1.8 发布</summary>
			
 
				+    <ul>
			
 
				+      <li><code>sglang</code> 0.4.9.post5 版本适配</li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/27 2.1.7 发布</summary>
			
 
				+    <ul>
			
 
				+      <li><code>transformers</code> 4.54.0 版本适配</li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/26 2.1.6 发布</summary>
			
 
				+    <ul>
			
 
				+      <li>修复<code>vlm</code>后端解析部分手写文档时的表格异常问题</li>
			
 
				+      <li>修复文档旋转时可视化框位置漂移问题 #3175</li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/24 2.1.5 发布</summary>
			
 
				+    <ul>
			
 
				+      <li><code>sglang</code> 0.4.9 版本适配，同步升级dockerfile基础镜像为sglang 0.4.9.post3</li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/23 2.1.4 发布</summary>
			
 
				+    <ul>
			
 
				+      <li><strong>bug修复</strong>
			
 
				+        <ul>
			
 
				+          <li>修复<code>pipeline</code>后端中<code>MFR</code>步骤在某些情况下显存消耗过大的问题 #2771</li>
			
 
				+          <li>修复某些情况下<code>image</code>/<code>table</code>与<code>caption</code>/<code>footnote</code>匹配不准确的问题 #3129</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/16 2.1.1 发布</summary>
			
 
				+    <ul>
			
 
				+      <li><strong>bug修复</strong>
			
 
				+        <ul>
			
 
				+          <li>修复<code>pipeline</code>在某些情况可能发生的文本块内容丢失问题 #3005</li>
			
 
				+          <li>修复<code>sglang-client</code>需要安装<code>torch</code>等不必要的包的问题 #2968</li>
			
 
				+          <li>更新<code>dockerfile</code>以修复linux字体缺失导致的解析文本内容不完整问题 #2915</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+      <li><strong>易用性更新</strong>
			
 
				+        <ul>
			
 
				+          <li>更新<code>compose.yaml</code>，便于用户直接启动<code>sglang-server</code>、<code>mineru-api</code>、<code>mineru-gradio</code>服务</li>
			
 
				+          <li>启用全新的<a href="https://opendatalab.github.io/MinerU/zh/">在线文档站点</a>，简化readme，提供更好的文档体验</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				+  <details>
			
 
				+    <summary>2025/07/05 2.1.0 发布</summary>
			
 
				+    <p>这是 MinerU 2 的第一个大版本更新，包含了大量新功能和改进，包含众多性能优化、体验优化和bug修复，具体更新内容如下：</p>
			
 
				+    <ul>
			
 
				+      <li><strong>性能优化：</strong>
			
 
				+        <ul>
			
 
				+          <li>大幅提升某些特定分辨率（长边2000像素左右）文档的预处理速度</li>
			
 
				+          <li>大幅提升<code>pipeline</code>后端批量处理大量页数较少（&lt;10）文档时的后处理速度</li>
			
 
				+          <li><code>pipeline</code>后端的layout分析速度提升约20%</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+      <li><strong>体验优化：</strong>
			
 
				+        <ul>
			
 
				+          <li>内置开箱即用的<code>fastapi服务</code>和<code>gradio webui</code>，详细使用方法请参考<a href="https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver">文档</a></li>
			
 
				+          <li><code>sglang</code>适配<code>0.4.8</code>版本，大幅降低<code>vlm-sglang</code>后端的显存要求，最低可在<code>8G显存</code>(Turing及以后架构)的显卡上运行</li>
			
 
				+          <li>对所有命令增加<code>sglang</code>的参数透传，使得<code>sglang-engine</code>后端可以与<code>sglang-server</code>一致，接收<code>sglang</code>的所有参数</li>
			
 
				+          <li>支持基于配置文件的功能扩展，包含<code>自定义公式标识符</code>、<code>开启标题分级功能</code>、<code>自定义本地模型目录</code>，详细使用方法请参考<a href="https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#mineru_1">文档</a></li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+      <li><strong>新特性：</strong>
			
 
				+        <ul>
			
 
				+          <li><code>pipeline</code>后端更新 PP-OCRv5 多语种文本识别模型，支持法语、西班牙语、葡萄牙语、俄语、韩语等 37 种语言的文字识别，平均精度涨幅超30%。<a href="https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html">详情</a></li>
			
 
				+          <li><code>pipeline</code>后端增加对竖排文本的有限支持</li>
			
 
				+        </ul>
			
 
				+      </li>
			
 
				+    </ul>
			
 
				+  </details>
			
 
				+
			
 
				   <details>
			
 
				     <summary>2025/06/20 2.0.6发布</summary>
			
 
				     <ul>
			
@@ -584,6 +658,7 @@ mineru -p <input_path> -o <output_path>
 
				 - [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
			
 
				 - [UniMERNet](https://github.com/opendatalab/UniMERNet)
			
 
				 - [RapidTable](https://github.com/RapidAI/RapidTable)
			
 
				+- [TableStructureRec](https://github.com/RapidAI/TableStructureRec)
			
 
				 - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
			
 
				 - [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
			
 
				 - [layoutreader](https://github.com/ppaanngggg/layoutreader)
			
--- a/docs/en/reference/output_files.md
+++ b/docs/en/reference/output_files.md
--- a/docs/en/usage/cli_tools.md
+++ b/docs/en/usage/cli_tools.md
@@ -13,7 +13,7 @@ Options:
 
				   -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
			
 
				   -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
			
 
				                                   Parsing backend (default: pipeline)
			
 
				-  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|latin|arabic|east_slavic|cyrillic|devanagari]
			
 
				+  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
			
 
				                                   Specify document language (improves OCR accuracy, pipeline backend only)
			
 
				   -u, --url TEXT                  Service address when using sglang-client
			
 
				   -s, --start INTEGER             Starting page number for parsing (0-based)
			
--- a/docs/zh/reference/output_files.md
+++ b/docs/zh/reference/output_files.md
--- a/docs/zh/usage/cli_tools.md
+++ b/docs/zh/usage/cli_tools.md
@@ -13,7 +13,7 @@ Options:
 
				   -m, --method [auto|txt|ocr]     解析方法：auto（默认）、txt、ocr（仅用于 pipeline 后端）
			
 
				   -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
			
 
				                                   解析后端（默认为 pipeline）
			
 
				-  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|latin|arabic|east_slavic|cyrillic|devanagari]
			
 
				+  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
			
 
				                                   指定文档语言（可提升 OCR 准确率，仅用于 pipeline 后端）
			
 
				   -u, --url TEXT                  当使用 sglang-client 时，需指定服务地址
			
 
				   -s, --start INTEGER             开始解析的页码（从 0 开始）
			
--- a/mineru/backend/pipeline/pipeline_middle_json_mkcontent.py
+++ b/mineru/backend/pipeline/pipeline_middle_json_mkcontent.py
@@ -188,7 +188,7 @@ def merge_para_with_text(para_block):
 
				     return para_text
			
 
				 
			
 
				 
			
 
				-def make_blocks_to_content_list(para_block, img_buket_path, page_idx):
			
 
				+def make_blocks_to_content_list(para_block, img_buket_path, page_idx, page_size):
			
 
				     para_type = para_block['type']
			
 
				     para_content = {}
			
 
				     if para_type in [BlockType.TEXT, BlockType.LIST, BlockType.INDEX]:
			
@@ -245,6 +245,17 @@ def make_blocks_to_content_list(para_block, img_buket_path, page_idx):
 
				             if block['type'] == BlockType.TABLE_FOOTNOTE:
			
 
				                 para_content[BlockType.TABLE_FOOTNOTE].append(merge_para_with_text(block))
			
 
				 
			
 
				+    page_width, page_height = page_size
			
 
				+    para_bbox = para_block.get('bbox')
			
 
				+    if para_bbox:
			
 
				+        x0, y0, x1, y1 = para_bbox
			
 
				+        para_content['bbox'] = [
			
 
				+            int(x0 * 1000 / page_width),
			
 
				+            int(y0 * 1000 / page_height),
			
 
				+            int(x1 * 1000 / page_width),
			
 
				+            int(y1 * 1000 / page_height),
			
 
				+        ]
			
 
				+
			
 
				     para_content['page_idx'] = page_idx
			
 
				 
			
 
				     return para_content
			
@@ -258,6 +269,7 @@ def union_make(pdf_info_dict: list,
 
				     for page_info in pdf_info_dict:
			
 
				         paras_of_layout = page_info.get('para_blocks')
			
 
				         page_idx = page_info.get('page_idx')
			
 
				+        page_size = page_info.get('page_size')
			
 
				         if not paras_of_layout:
			
 
				             continue
			
 
				         if make_mode in [MakeMode.MM_MD, MakeMode.NLP_MD]:
			
@@ -265,7 +277,7 @@ def union_make(pdf_info_dict: list,
 
				             output_content.extend(page_markdown)
			
 
				         elif make_mode == MakeMode.CONTENT_LIST:
			
 
				             for para_block in paras_of_layout:
			
 
				-                para_content = make_blocks_to_content_list(para_block, img_buket_path, page_idx)
			
 
				+                para_content = make_blocks_to_content_list(para_block, img_buket_path, page_idx, page_size)
			
 
				                 if para_content:
			
 
				                     output_content.append(para_content)
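The normalization added to `make_blocks_to_content_list` can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `normalize_bbox` and `denormalize_bbox` are hypothetical helper names, and the page size and box in the usage lines are made-up values. It assumes `page_size` is a `(width, height)` pair in the same units as the block's `bbox`, as the unpacking in the diff suggests.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], page_size: Sequence[float]) -> List[int]:
    """Map an absolute [x0, y0, x1, y1] box onto the 0-1000 range, as the diff does."""
    page_width, page_height = page_size
    x0, y0, x1, y1 = bbox
    return [
        int(x0 * 1000 / page_width),
        int(y0 * 1000 / page_height),
        int(x1 * 1000 / page_width),
        int(y1 * 1000 / page_height),
    ]


def denormalize_bbox(norm_bbox: Sequence[int], page_size: Sequence[float]) -> List[float]:
    """Hypothetical inverse: scale a 0-1000 box back to page units.

    The round trip is lossy in general because int() truncates.
    """
    page_width, page_height = page_size
    x0, y0, x1, y1 = norm_bbox
    return [
        x0 * page_width / 1000,
        y0 * page_height / 1000,
        x1 * page_width / 1000,
        y1 * page_height / 1000,
    ]


# Illustrative values only: a 600 x 800 page and one block on it.
page_size = (600, 800)
normalized = normalize_bbox([60, 80, 300, 400], page_size)
restored = denormalize_bbox(normalized, page_size)
```

A consumer of `content_list.json` would need the page dimensions from elsewhere to apply the inverse, since the `bbox` field itself carries only the normalized values.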
			
 
				 
			
--- a/mineru/backend/vlm/hf_predictor.py
+++ b/mineru/backend/vlm/hf_predictor.py
@@ -4,7 +4,7 @@ from typing import Iterable, List, Optional, Union
 
				 import torch
			
 
				 from PIL import Image
			
 
				 from tqdm import tqdm
			
 
				-from transformers import AutoTokenizer, BitsAndBytesConfig
			
 
				+from transformers import AutoTokenizer, BitsAndBytesConfig, __version__
			
 
				 
			
 
				 from ...model.vlm_hf_model import Mineru2QwenForCausalLM
			
 
				 from ...model.vlm_hf_model.image_processing_mineru2 import process_images
			
@@ -66,7 +66,11 @@ class HuggingfacePredictor(BasePredictor):
 
				                 bnb_4bit_quant_type="nf4",
			
 
				             )
			
 
				         else:
			
 
				-            kwargs["torch_dtype"] = torch_dtype
			
 
				+            from packaging import version
			
 
				+            if version.parse(__version__) >= version.parse("4.56.0"):
			
 
				+                kwargs["dtype"] = torch_dtype
			
 
				+            else:
			
 
				+                kwargs["torch_dtype"] = torch_dtype
			
 
				 
			
 
				         if use_flash_attn:
			
 
				             kwargs["attn_implementation"] = "flash_attention_2"
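The version gate in this hunk picks the keyword name for the model dtype based on the installed `transformers` version. A minimal standalone sketch of that decision follows; it is an illustration rather than the project's code. `dtype_kwarg_name` and `release_tuple` are hypothetical helpers, and the sketch compares only the numeric release segment using the standard library instead of `packaging.version` as the diff does. One consequence of that simplification: a pre-release string such as `4.56.0.dev0` is treated as `4.56.0` here, whereas `packaging` orders it before the final release.

```python
import re
from typing import Tuple


def release_tuple(version_string: str) -> Tuple[int, int, int]:
    """Extract the leading numeric release segment, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"not a version string: {version_string!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def dtype_kwarg_name(transformers_version: str) -> str:
    """Return the keyword the diff would set for a given transformers version."""
    if release_tuple(transformers_version) >= (4, 56, 0):
        return "dtype"
    return "torch_dtype"


# Build kwargs the same way the diff does, with a placeholder dtype value.
kwargs = {}
kwargs[dtype_kwarg_name("4.54.1")] = "float16"
```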
			
--- a/mineru/backend/vlm/token_to_middle_json.py
+++ b/mineru/backend/vlm/token_to_middle_json.py
@@ -1,8 +1,9 @@
 
				+import os
			
 
				 import time
			
 
				 from loguru import logger
			
 
				 import numpy as np
			
 
				 import cv2
			
 
				-from mineru.utils.config_reader import get_llm_aided_config
			
 
				+from mineru.utils.config_reader import get_llm_aided_config, get_table_enable
			
 
				 from mineru.utils.cut_image import cut_image_and_table
			
 
				 from mineru.utils.enum_class import ContentType
			
 
				 from mineru.utils.hash_utils import str_md5
			
@@ -94,7 +95,9 @@ def result_to_middle_json(token_list, images_list, pdf_doc, image_writer):
 
				         middle_json["pdf_info"].append(page_info)
			
 
				 
			
 
				     """Merge tables across pages"""
			
 
				-    merge_table(middle_json["pdf_info"])
			
 
				+    table_enable = get_table_enable(os.getenv('MINERU_VLM_TABLE_ENABLE', 'True').lower() == 'true')
			
 
				+    if table_enable:
			
 
				+        merge_table(middle_json["pdf_info"])
			
 
				 
			
 
				     """LLM-assisted heading level optimization"""
			
 
				     if heading_level_import_success:
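The new guard reads `MINERU_VLM_TABLE_ENABLE` and compares its lowercased value with `'true'`. The sketch below isolates that comparison to show which values leave table merging enabled. `env_flag_enabled` is a hypothetical name, and the sketch deliberately leaves out `get_table_enable`, whose behavior is not shown in this diff.

```python
import os
from typing import Mapping


def env_flag_enabled(
    environ: Mapping[str, str],
    name: str = "MINERU_VLM_TABLE_ENABLE",
) -> bool:
    """Mirror the diff's check: unset defaults to 'True'; only 'true' (any case) counts."""
    return environ.get(name, "True").lower() == "true"


# Plain dicts stand in for the process environment in these examples.
unset = env_flag_enabled({})
upper = env_flag_enabled({"MINERU_VLM_TABLE_ENABLE": "TRUE"})
off = env_flag_enabled({"MINERU_VLM_TABLE_ENABLE": "false"})
numeric = env_flag_enabled({"MINERU_VLM_TABLE_ENABLE": "1"})

# The same helper also accepts the real environment.
from_process = env_flag_enabled(os.environ)
```

Under this comparison, values such as `1` or `yes` count as disabled, which may matter to users who expect those spellings to turn the feature on.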
			
--- a/mineru/backend/vlm/vlm_middle_json_mkcontent.py
+++ b/mineru/backend/vlm/vlm_middle_json_mkcontent.py
@@ -125,7 +125,7 @@ def mk_blocks_to_markdown(para_blocks, make_mode, formula_enable, table_enable,
 
				 
			
 
				 
			
 
				 
			
 
				-def make_blocks_to_content_list(para_block, img_buket_path, page_idx):
			
 
				+def make_blocks_to_content_list(para_block, img_buket_path, page_idx, page_size):
			
 
				     para_type = para_block['type']
			
 
				     para_content = {}
			
 
				     if para_type in [BlockType.TEXT, BlockType.LIST, BlockType.INDEX]:
			
@@ -179,6 +179,17 @@ def make_blocks_to_content_list(para_block, img_buket_path, page_idx):
 
				             if block['type'] == BlockType.TABLE_FOOTNOTE:
			
 
				                 para_content[BlockType.TABLE_FOOTNOTE].append(merge_para_with_text(block))
			
 
				 
			
 
				+    page_width, page_height = page_size
			
 
				+    para_bbox = para_block.get('bbox')
			
 
				+    if para_bbox:
			
 
				+        x0, y0, x1, y1 = para_bbox
			
 
				+        para_content['bbox'] = [
			
 
				+            int(x0 * 1000 / page_width),
			
 
				+            int(y0 * 1000 / page_height),
			
 
				+            int(x1 * 1000 / page_width),
			
 
				+            int(y1 * 1000 / page_height),
			
 
				+        ]
			
 
				+
			
 
				     para_content['page_idx'] = page_idx
			
 
				 
			
 
				     return para_content
			
@@ -195,6 +206,7 @@ def union_make(pdf_info_dict: list,
 
				     for page_info in pdf_info_dict:
			
 
				         paras_of_layout = page_info.get('para_blocks')
			
 
				         page_idx = page_info.get('page_idx')
			
 
				+        page_size = page_info.get('page_size')
			
 
				         if not paras_of_layout:
			
 
				             continue
			
 
				         if make_mode in [MakeMode.MM_MD, MakeMode.NLP_MD]:
			
@@ -202,7 +214,7 @@ def union_make(pdf_info_dict: list,
 
				             output_content.extend(page_markdown)
			
 
				         elif make_mode == MakeMode.CONTENT_LIST:
			
 
				             for para_block in paras_of_layout:
			
 
				-                para_content = make_blocks_to_content_list(para_block, img_buket_path, page_idx)
			
 
				+                para_content = make_blocks_to_content_list(para_block, img_buket_path, page_idx, page_size)
			
 
				                 output_content.append(para_content)
			
 
				 
			
 
				     if make_mode in [MakeMode.MM_MD, MakeMode.NLP_MD]:
			
--- a/mineru/cli/client.py
+++ b/mineru/cli/client.py
@@ -62,7 +62,7 @@ from .common import do_parse, read_fn, pdf_suffixes, image_suffixes
 
				     '-l',
			
 
				     '--lang',
			
 
				     'lang',
			
 
				-    type=click.Choice(['ch', 'ch_server', 'ch_lite', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka',
			
 
				+    type=click.Choice(['ch', 'ch_server', 'ch_lite', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka', 'th', 'el',
			
 
				                        'latin', 'arabic', 'east_slavic', 'cyrillic', 'devanagari']),
			
 
				     help="""
			
 
				     Input the languages in the pdf (if known) to improve OCR accuracy.  Optional.