
feat: update changelog for version 2.2.0 with new table recognition model and OCR enhancements

myhloli 2 months ago
parent
commit
29e37933aa
2 changed files with 226 additions and 78 deletions
  1. README.md (+112 -39)
  2. README_zh-CN.md (+114 -39)

+ 112 - 39
README.md

@@ -43,48 +43,121 @@
 </div>
 
 # Changelog
-- 2025/08/01 2.1.10 Released
-  - Fixed an issue in the `pipeline` backend where block overlap caused the parsing results to deviate from expectations #3232
-- 2025/07/30 2.1.9 Released
-  - `transformers` 4.54.1 version adaptation
-- 2025/07/28 2.1.8 Released
-  - `sglang` 0.4.9.post5 version adaptation
-- 2025/07/27 2.1.7 Released
-  - `transformers` 4.54.0 version adaptation
-- 2025/07/26 2.1.6 Released
-  - Fixed table parsing issues in handwritten documents when using `vlm` backend
-  - Fixed visualization box position drift issue when document is rotated #3175
-- 2025/07/24 2.1.5 Released
-  - `sglang` 0.4.9 version adaptation, synchronously upgrading the dockerfile base image to sglang 0.4.9.post3
-- 2025/07/23 2.1.4 Released
-  - Bug Fixes
-    - Fixed the issue of excessive memory consumption during the `MFR` step in the `pipeline` backend under certain scenarios #2771
-    - Fixed the inaccurate matching between `image`/`table` and `caption`/`footnote` under certain conditions #3129
-- 2025/07/16 2.1.1 Released
-  - Bug fixes
-    - Fixed text block content loss issue that could occur in certain `pipeline` scenarios #3005
-    - Fixed issue where `sglang-client` required unnecessary packages like `torch` #2968
-    - Updated `dockerfile` to fix incomplete text content parsing due to missing fonts in Linux #2915
-  - Usability improvements
-    - Updated `compose.yaml` to facilitate direct startup of `sglang-server`, `mineru-api`, and `mineru-gradio` services
-    - Launched brand new [online documentation site](https://opendatalab.github.io/MinerU/), simplified readme, providing better documentation experience
-- 2025/07/05 Version 2.1.0 Released
-  - This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:
-  - **Performance Optimizations:**
-    - Significantly improved preprocessing speed for documents with specific resolutions (around 2000 pixels on the long side).
-    - Greatly enhanced post-processing speed when the `pipeline` backend handles batch processing of documents with fewer pages (<10 pages).
-    - Layout analysis speed of the `pipeline` backend has been increased by approximately 20%.
-  - **Experience Enhancements:**
-    - Built-in ready-to-use `fastapi service` and `gradio webui`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
-    - Adapted to `sglang` version `0.4.8`, significantly reducing the GPU memory requirements for the `vlm-sglang` backend. It can now run on graphics cards with as little as `8GB GPU memory` (Turing architecture or newer).
-    - Added transparent parameter passing for all commands related to `sglang`, allowing the `sglang-engine` backend to receive all `sglang` parameters consistently with the `sglang-server`.
-    - Supports feature extensions based on configuration files, including `custom formula delimiters`, `enabling heading classification`, and `customizing local model directories`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files).
-  - **New Features:**
-    - Updated the `pipeline` backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
-    - Introduced limited support for vertical text layout in the `pipeline` backend.
+- 2025/09/05 2.2.0 Released
+  - Major Updates
+    - In this version, we focused on improving table parsing accuracy by introducing a new [wired table recognition model](https://github.com/RapidAI/TableStructureRec) and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the `pipeline` backend.
+    - We also added support for cross-page table merging, which is supported by both `pipeline` and `vlm` backends, further improving the completeness and accuracy of table parsing.
+  - Other Updates
+    - The `pipeline` backend now supports 270-degree rotated table parsing, bringing support for table parsing in 0/90/270-degree orientations
+    - `pipeline` added OCR capability support for Thai and Greek, and updated the English OCR model to the latest version. English recognition accuracy improved by 11%, Thai recognition model accuracy is 82.68%, and Greek recognition model accuracy is 89.28% (by PPOCRv5)
+    - Added `bbox` field (mapped to 0-1000 range) in the output `content_list.json`, making it convenient for users to directly obtain position information for each content block
+
 
 <details>
   <summary>History Log</summary>
+
+  <details>
+    <summary>2025/08/01 2.1.10 Released</summary>
+    <ul>
+      <li>Fixed an issue in the <code>pipeline</code> backend where block overlap caused the parsing results to deviate from expectations #3232</li>
+    </ul>
+  </details>  
+
+  <details>
+    <summary>2025/07/30 2.1.9 Released</summary>
+    <ul>
+      <li><code>transformers</code> 4.54.1 version adaptation</li>
+    </ul>
+  </details>  
+
+  <details>
+    <summary>2025/07/28 2.1.8 Released</summary>
+    <ul>
+      <li><code>sglang</code> 0.4.9.post5 version adaptation</li>
+    </ul>
+  </details>  
+
+  <details>
+    <summary>2025/07/27 2.1.7 Released</summary>
+    <ul>
+      <li><code>transformers</code> 4.54.0 version adaptation</li>
+    </ul>
+  </details>  
+
+  <details>
+    <summary>2025/07/26 2.1.6 Released</summary>
+    <ul>
+      <li>Fixed table parsing issues in handwritten documents when using <code>vlm</code> backend</li>
+      <li>Fixed visualization box position drift issue when document is rotated #3175</li>
+    </ul>
+  </details>  
+
+  <details>
+    <summary>2025/07/24 2.1.5 Released</summary>
+    <ul>
+      <li><code>sglang</code> 0.4.9 version adaptation, synchronously upgrading the dockerfile base image to sglang 0.4.9.post3</li>
+    </ul>
+  </details>  
+
+  <details>
+    <summary>2025/07/23 2.1.4 Released</summary>
+    <ul>
+      <li><strong>Bug Fixes</strong>
+        <ul>
+          <li>Fixed the issue of excessive memory consumption during the <code>MFR</code> step in the <code>pipeline</code> backend under certain scenarios #2771</li>
+          <li>Fixed the inaccurate matching between <code>image</code>/<code>table</code> and <code>caption</code>/<code>footnote</code> under certain conditions #3129</li>
+        </ul>
+      </li>
+    </ul>
+  </details>  
+
+  <details>
+    <summary>2025/07/16 2.1.1 Released</summary>
+    <ul>
+      <li><strong>Bug fixes</strong>
+        <ul>
+          <li>Fixed text block content loss issue that could occur in certain <code>pipeline</code> scenarios #3005</li>
+          <li>Fixed issue where <code>sglang-client</code> required unnecessary packages like <code>torch</code> #2968</li>
+          <li>Updated <code>dockerfile</code> to fix incomplete text content parsing due to missing fonts in Linux #2915</li>
+        </ul>
+      </li>
+      <li><strong>Usability improvements</strong>
+        <ul>
+          <li>Updated <code>compose.yaml</code> to facilitate direct startup of <code>sglang-server</code>, <code>mineru-api</code>, and <code>mineru-gradio</code> services</li>
+          <li>Launched brand new <a href="https://opendatalab.github.io/MinerU/">online documentation site</a>, simplified readme, providing better documentation experience</li>
+        </ul>
+      </li>
+    </ul>
+  </details>  
+
+  <details>
+    <summary>2025/07/05 2.1.0 Released</summary>
+    <ul>
+      <li>This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:</li>
+      <li><strong>Performance Optimizations:</strong>
+        <ul>
+          <li>Significantly improved preprocessing speed for documents with specific resolutions (around 2000 pixels on the long side).</li>
+          <li>Greatly enhanced post-processing speed when the <code>pipeline</code> backend handles batch processing of documents with fewer pages (&lt;10 pages).</li>
+          <li>Layout analysis speed of the <code>pipeline</code> backend has been increased by approximately 20%.</li>
+        </ul>
+      </li>
+      <li><strong>Experience Enhancements:</strong>
+        <ul>
+          <li>Built-in ready-to-use <code>fastapi service</code> and <code>gradio webui</code>. For detailed usage instructions, please refer to <a href="https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver">Documentation</a>.</li>
+          <li>Adapted to <code>sglang</code> version <code>0.4.8</code>, significantly reducing the GPU memory requirements for the <code>vlm-sglang</code> backend. It can now run on graphics cards with as little as <code>8GB GPU memory</code> (Turing architecture or newer).</li>
+          <li>Added transparent parameter passing for all commands related to <code>sglang</code>, allowing the <code>sglang-engine</code> backend to receive all <code>sglang</code> parameters consistently with the <code>sglang-server</code>.</li>
+          <li>Supports feature extensions based on configuration files, including <code>custom formula delimiters</code>, <code>enabling heading classification</code>, and <code>customizing local model directories</code>. For detailed usage instructions, please refer to <a href="https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files">Documentation</a>.</li>
+        </ul>
+      </li>
+      <li><strong>New Features:</strong>
+        <ul>
+          <li>Updated the <code>pipeline</code> backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. <a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html">Details</a></li>
+          <li>Introduced limited support for vertical text layout in the <code>pipeline</code> backend.</li>
+        </ul>
+      </li>
+    </ul>
+  </details>
+
   <details>
     <summary>2025/06/20 2.0.6 Released</summary>
     <ul>
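
Reviewer note on the new `bbox` field: the 2.2.0 entry says `content_list.json` now carries a `bbox` mapped to the 0-1000 range. The sketch below shows one way a consumer might scale such a box back to page units. It is an illustration only, not MinerU code: the `[x0, y0, x1, y1]` ordering, the sample JSON record, the page size, and the helper name `scale_bbox` are all assumptions and do not come from this commit.

```python
import json


def scale_bbox(bbox, page_width, page_height, norm=1000):
    """Scale a box from the 0..norm range to page units.

    Assumes bbox is ordered [x0, y0, x1, y1]; this ordering is an
    assumption and is not confirmed by the changelog.
    """
    x0, y0, x1, y1 = bbox
    return (
        x0 / norm * page_width,
        y0 / norm * page_height,
        x1 / norm * page_width,
        y1 / norm * page_height,
    )


# Hypothetical record; the real content_list.json layout may differ.
sample = json.loads(
    '[{"type": "text", "text": "Hello", "bbox": [500, 250, 1000, 500]}]'
)

for block in sample:
    # 200 x 400 is an arbitrary page size chosen for the example.
    print(block["type"], scale_bbox(block["bbox"], page_width=200, page_height=400))
```

Keeping the normalization constant as a parameter means the same helper would still apply if the range were ever changed.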

+ 114 - 39
README_zh-CN.md

@@ -43,48 +43,122 @@
 </div>
 
 # 更新记录
-- 2025/08/01 2.1.10 发布
-  - 修复`pipeline`后端因block覆盖导致的解析结果与预期不符  #3232
-- 2025/07/30 2.1.9 发布
-  - `transformers` 4.54.1 版本适配
-- 2025/07/28 2.1.8 发布
-  - `sglang` 0.4.9.post5 版本适配
-- 2025/07/27 2.1.7 发布
-  - `transformers` 4.54.0 版本适配
-- 2025/07/26 2.1.6 发布
-  - 修复`vlm`后端解析部分手写文档时的表格异常问题
-  - 修复文档旋转时可视化框位置漂移问题 #3175
-- 2025/07/24 2.1.5 发布
-  - `sglang` 0.4.9 版本适配,同步升级dockerfile基础镜像为sglang 0.4.9.post3
-- 2025/07/23 2.1.4 发布
-  - bug修复
-    - 修复`pipeline`后端中`MFR`步骤在某些情况下显存消耗过大的问题 #2771
-    - 修复某些情况下`image`/`table`与`caption`/`footnote`匹配不准确的问题 #3129
-- 2025/07/16 2.1.1 发布
-  - bug修复 
-    - 修复`pipeline`在某些情况可能发生的文本块内容丢失问题 #3005
-    - 修复`sglang-client`需要安装`torch`等不必要的包的问题 #2968
-    - 更新`dockerfile`以修复linux字体缺失导致的解析文本内容不完整问题 #2915
-  - 易用性更新
-    - 更新`compose.yaml`,便于用户直接启动`sglang-server`、`mineru-api`、`mineru-gradio`服务
-    - 启用全新的[在线文档站点](https://opendatalab.github.io/MinerU/zh/),简化readme,提供更好的文档体验
-- 2025/07/05 2.1.0 发布
-  - 这是 MinerU 2 的第一个大版本更新,包含了大量新功能和改进,包含众多性能优化、体验优化和bug修复,具体更新内容如下: 
-  - 性能优化: 
-    - 大幅提升某些特定分辨率(长边2000像素左右)文档的预处理速度
-    - 大幅提升`pipeline`后端批量处理大量页数较少(<10)文档时的后处理速度
-    - `pipeline`后端的layout分析速度提升约20%
-  - 体验优化:
-    - 内置开箱即用的`fastapi服务`和`gradio webui`,详细使用方法请参考[文档](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver)
-    - `sglang`适配`0.4.8`版本,大幅降低`vlm-sglang`后端的显存要求,最低可在`8G显存`(Turing及以后架构)的显卡上运行
-    - 对所有命令增加`sglang`的参数透传,使得`sglang-engine`后端可以与`sglang-server`一致,接收`sglang`的所有参数
-    - 支持基于配置文件的功能扩展,包含`自定义公式标识符`、`开启标题分级功能`、`自定义本地模型目录`,详细使用方法请参考[文档](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#mineru_1)
-  - 新特性:  
-    - `pipeline`后端更新 PP-OCRv5 多语种文本识别模型,支持法语、西班牙语、葡萄牙语、俄语、韩语等 37 种语言的文字识别,平均精度涨幅超30%。[详情](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
-    - `pipeline`后端增加对竖排文本的有限支持
+
+- 2025/09/05 2.2.0 发布
+  - 主要更新
+    - 在这个版本我们重点提升了表格的解析精度,通过引入新的[有线表识别模型](https://github.com/RapidAI/TableStructureRec)和全新的混合表格结构解析算法,显著提升了`pipeline`后端的表格识别能力。
+    - 另外我们增加了对跨页表格合并的支持,这一功能同时支持`pipeline`和`vlm`后端,进一步提升了表格解析的完整性和准确性。
+  - 其他更新
+    - `pipeline`后端增加270度旋转的表格解析能力,现已支持0/90/270度三个方向的表格解析
+    - `pipeline`增加对泰文、希腊文的ocr能力支持,并更新了英文ocr模型至最新,英文识别精度提升11%,泰文识别模型精度 82.68%,希腊文识别模型精度 89.28%(by PPOCRv5)
+    - 在输出的`content_list.json`中增加了`bbox`字段(映射至0-1000范围内),方便用户直接获取每个内容块的位置信息
+
 
 <details>
   <summary>历史日志</summary>
+
+  <details>
+    <summary>2025/08/01 2.1.10 发布</summary>
+    <ul>
+      <li>修复<code>pipeline</code>后端因block覆盖导致的解析结果与预期不符 #3232</li>
+    </ul>
+  </details>
+
+  <details>
+    <summary>2025/07/30 2.1.9 发布</summary>
+    <ul>
+      <li><code>transformers</code> 4.54.1 版本适配</li>
+    </ul>
+  </details>
+
+  <details>
+    <summary>2025/07/28 2.1.8 发布</summary>
+    <ul>
+      <li><code>sglang</code> 0.4.9.post5 版本适配</li>
+    </ul>
+  </details>
+
+  <details>
+    <summary>2025/07/27 2.1.7 发布</summary>
+    <ul>
+      <li><code>transformers</code> 4.54.0 版本适配</li>
+    </ul>
+  </details>
+
+  <details>
+    <summary>2025/07/26 2.1.6 发布</summary>
+    <ul>
+      <li>修复<code>vlm</code>后端解析部分手写文档时的表格异常问题</li>
+      <li>修复文档旋转时可视化框位置漂移问题 #3175</li>
+    </ul>
+  </details>
+
+  <details>
+    <summary>2025/07/24 2.1.5 发布</summary>
+    <ul>
+      <li><code>sglang</code> 0.4.9 版本适配,同步升级dockerfile基础镜像为sglang 0.4.9.post3</li>
+    </ul>
+  </details>
+
+  <details>
+    <summary>2025/07/23 2.1.4 发布</summary>
+    <ul>
+      <li><strong>bug修复</strong>
+        <ul>
+          <li>修复<code>pipeline</code>后端中<code>MFR</code>步骤在某些情况下显存消耗过大的问题 #2771</li>
+          <li>修复某些情况下<code>image</code>/<code>table</code>与<code>caption</code>/<code>footnote</code>匹配不准确的问题 #3129</li>
+        </ul>
+      </li>
+    </ul>
+  </details>
+
+  <details>
+    <summary>2025/07/16 2.1.1 发布</summary>
+    <ul>
+      <li><strong>bug修复</strong>
+        <ul>
+          <li>修复<code>pipeline</code>在某些情况可能发生的文本块内容丢失问题 #3005</li>
+          <li>修复<code>sglang-client</code>需要安装<code>torch</code>等不必要的包的问题 #2968</li>
+          <li>更新<code>dockerfile</code>以修复linux字体缺失导致的解析文本内容不完整问题 #2915</li>
+        </ul>
+      </li>
+      <li><strong>易用性更新</strong>
+        <ul>
+          <li>更新<code>compose.yaml</code>,便于用户直接启动<code>sglang-server</code>、<code>mineru-api</code>、<code>mineru-gradio</code>服务</li>
+          <li>启用全新的<a href="https://opendatalab.github.io/MinerU/zh/">在线文档站点</a>,简化readme,提供更好的文档体验</li>
+        </ul>
+      </li>
+    </ul>
+  </details>
+
+  <details>
+    <summary>2025/07/05 2.1.0 发布</summary>
+    <p>这是 MinerU 2 的第一个大版本更新,包含了大量新功能和改进,包含众多性能优化、体验优化和bug修复,具体更新内容如下:</p>
+    <ul>
+      <li><strong>性能优化:</strong>
+        <ul>
+          <li>大幅提升某些特定分辨率(长边2000像素左右)文档的预处理速度</li>
+          <li>大幅提升<code>pipeline</code>后端批量处理大量页数较少(&lt;10)文档时的后处理速度</li>
+          <li><code>pipeline</code>后端的layout分析速度提升约20%</li>
+        </ul>
+      </li>
+      <li><strong>体验优化:</strong>
+        <ul>
+          <li>内置开箱即用的<code>fastapi服务</code>和<code>gradio webui</code>,详细使用方法请参考<a href="https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver">文档</a></li>
+          <li><code>sglang</code>适配<code>0.4.8</code>版本,大幅降低<code>vlm-sglang</code>后端的显存要求,最低可在<code>8G显存</code>(Turing及以后架构)的显卡上运行</li>
+          <li>对所有命令增加<code>sglang</code>的参数透传,使得<code>sglang-engine</code>后端可以与<code>sglang-server</code>一致,接收<code>sglang</code>的所有参数</li>
+          <li>支持基于配置文件的功能扩展,包含<code>自定义公式标识符</code>、<code>开启标题分级功能</code>、<code>自定义本地模型目录</code>,详细使用方法请参考<a href="https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#mineru_1">文档</a></li>
+        </ul>
+      </li>
+      <li><strong>新特性:</strong>
+        <ul>
+          <li><code>pipeline</code>后端更新 PP-OCRv5 多语种文本识别模型,支持法语、西班牙语、葡萄牙语、俄语、韩语等 37 种语言的文字识别,平均精度涨幅超30%。<a href="https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html">详情</a></li>
+          <li><code>pipeline</code>后端增加对竖排文本的有限支持</li>
+        </ul>
+      </li>
+    </ul>
+  </details>
+
   <details>
     <summary>2025/06/20 2.0.6发布</summary>
     <ul>
@@ -584,6 +658,7 @@ mineru -p <input_path> -o <output_path>
 - [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
 - [UniMERNet](https://github.com/opendatalab/UniMERNet)
 - [RapidTable](https://github.com/RapidAI/RapidTable)
+- [TableStructureRec](https://github.com/RapidAI/TableStructureRec)
 - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
 - [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
 - [layoutreader](https://github.com/ppaanngggg/layoutreader)