
docs(install): update Python version requirements and simplify torch installation

- Update Python version requirements to >=3.10
- Simplify torch installation command
- Remove numpy version restriction
- Update CUDA compatibility information
- Adjust environment creation commands across multiple documentation files
myhloli 7 months ago
Parent
Commit
4fd8d626c4

+ 11 - 6
README.md

@@ -11,6 +11,7 @@
 [![open issues](https://img.shields.io/github/issues-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
 [![issue resolution](https://img.shields.io/github/issues-closed-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
 [![PyPI version](https://badge.fury.io/py/magic-pdf.svg)](https://badge.fury.io/py/magic-pdf)
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/magic-pdf)](https://pypi.org/project/magic-pdf/)
 [![Downloads](https://static.pepy.tech/badge/magic-pdf)](https://pepy.tech/project/magic-pdf)
 [![Downloads](https://static.pepy.tech/badge/magic-pdf/month)](https://pepy.tech/project/magic-pdf)
 
@@ -47,11 +48,15 @@ Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple inte
 </div>
 
 # Changelog
-- 2025/04/03 Release of 1.3.0, in this version we made many optimizations and improvements:
+- 2025/04/08 1.3.1 released, fixed some compatibility issues
+  - Supported Python 3.13
+  - Resolved errors caused by `transformers 4.51.0`
+  - Made the final adaptation for some outdated Linux systems (e.g., CentOS 7), and no further support will be guaranteed for subsequent versions. [Installation Instructions](https://github.com/opendatalab/MinerU/issues/1004)
+- 2025/04/03 1.3.0 released, in this version we made many optimizations and improvements:
   - Installation and compatibility optimization
     - By removing the use of `layoutlmv3` in layout, resolved compatibility issues caused by `detectron2`.
     - Torch version compatibility extended to 2.2~2.6 (excluding 2.5).
-    - CUDA compatibility supports 11.8/12.4/12.6 (CUDA version determined by torch), resolving compatibility issues for some users with 50-series and H-series GPUs.
+    - CUDA compatibility supports 11.8/12.4/12.6/12.8 (CUDA version determined by torch), resolving compatibility issues for some users with 50-series and H-series GPUs.
     - Python compatible versions expanded to 3.10~3.12, solving the problem of automatic downgrade to 0.6.1 during installation in non-3.10 environments.
     - Offline deployment process optimized; no internet connection required after successful deployment to download any model files.
   - Performance optimization
@@ -232,7 +237,7 @@ There are three different ways to experience MinerU:
     </tr>
     <tr>
         <td colspan="3">Python Version</td>
-        <td colspan="3">3.10~3.12</td>
+        <td colspan="3">>=3.10</td>
     </tr>
     <tr>
         <td colspan="3">Nvidia Driver Version</td>
@@ -242,8 +247,8 @@ There are three different ways to experience MinerU:
     </tr>
     <tr>
         <td colspan="3">CUDA Environment</td>
-        <td>11.8/12.4/12.6</td>
-        <td>11.8/12.4/12.6</td>
+        <td>11.8/12.4/12.6/12.8</td>
+        <td>11.8/12.4/12.6/12.8</td>
         <td>None</td>
     </tr>
     <tr>
@@ -274,7 +279,7 @@ Synced with dev branch updates:
 #### 1. Install magic-pdf
 
 ```bash
-conda create -n mineru 'python<3.13' -y
+conda create -n mineru 'python>=3.10' -y
 conda activate mineru
 pip install -U "magic-pdf[full]"
 ```
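The relaxed constraint can also be checked from inside the interpreter before installing. The following is a minimal sketch, assuming the `>=3.10,<4` bound that this commit sets in `setup.py`; the helper name `python_is_supported` is illustrative and not part of magic-pdf.

```python
import sys

# Bounds taken from python_requires=">=3.10,<4" in setup.py.
MIN_VERSION = (3, 10)
MAX_MAJOR_EXCLUSIVE = 4


def python_is_supported(version=None):
    """Return True if (major, minor) satisfies >=3.10,<4."""
    if version is None:
        version = sys.version_info[:2]
    major, minor = version[0], version[1]
    return (major, minor) >= MIN_VERSION and major < MAX_MAJOR_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_is_supported() else "not supported"
    print(f"Python {current} is {status}")
```

Passing this check only means the interpreter falls inside the declared range; it does not confirm that every dependency ships a wheel for that Python version.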

+ 10 - 6
README_zh-CN.md

@@ -11,6 +11,7 @@
 [![open issues](https://img.shields.io/github/issues-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
 [![issue resolution](https://img.shields.io/github/issues-closed-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
 [![PyPI version](https://badge.fury.io/py/magic-pdf.svg)](https://badge.fury.io/py/magic-pdf)
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/magic-pdf)](https://pypi.org/project/magic-pdf/)
 [![Downloads](https://static.pepy.tech/badge/magic-pdf)](https://pepy.tech/project/magic-pdf)
 [![Downloads](https://static.pepy.tech/badge/magic-pdf/month)](https://pepy.tech/project/magic-pdf)
 
@@ -46,11 +47,15 @@
 </div>
 
 # 更新记录
+- 2025/04/08 1.3.1发布,修复了一些兼容问题
+  - 支持python 3.13
+  - 解决因`transformers 4.51.0` 导致的报错
+  - 为部分过时的linux系统(如centos7)做出最后适配,并不再保证后续版本的继续支持,[安装说明](https://github.com/opendatalab/MinerU/issues/1004)
 - 2025/04/03 1.3.0 发布,在这个版本我们做出了许多优化和改进:
   - 安装与兼容性优化
     - 通过移除layout中`layoutlmv3`的使用,解决了由`detectron2`导致的兼容问题
     - torch版本兼容扩展到2.2~2.6(2.5除外)
-    - cuda兼容支持11.8/12.4/12.6(cuda版本由torch决定),解决部分用户50系显卡与H系显卡的兼容问题
+    - cuda兼容支持11.8/12.4/12.6/12.8(cuda版本由torch决定),解决部分用户50系显卡与H系显卡的兼容问题
     - python兼容版本扩展到3.10~3.12,解决了在非3.10环境下安装时自动降级到0.6.1的问题
     - 优化离线部署流程,部署成功后不需要联网下载任何模型文件
   - 性能优化
@@ -70,7 +75,6 @@
 - 2025/02/24 1.2.0 发布,这个版本我们修复了一些问题,提升了解析的效率与精度:
   - 性能优化 
     - auto模式下pdf文档的分类速度提升
-    - 在华为昇腾 NPU 加速模式下,添加高性能插件支持,常见场景下端到端加速可达 300% [申请链接](https://aicarrier.feishu.cn/share/base/form/shrcnb10VaoNQB8kQPA8DEfZC6d)
   - 解析优化
     - 优化对包含水印文档的解析逻辑,显著提升包含水印文档的解析效果
     - 改进了单页内多个图像/表格与caption的匹配逻辑,提升了复杂布局下图文匹配的准确性
@@ -233,7 +237,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
     </tr>
     <tr>
         <td colspan="3">python版本</td>
-        <td colspan="3">>=3.9,<=3.12</td>
+        <td colspan="3">>=3.10</td>
     </tr>
     <tr>
         <td colspan="3">Nvidia Driver 版本</td>
@@ -243,8 +247,8 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
     </tr>
     <tr>
         <td colspan="3">CUDA环境</td>
-        <td>11.8/12.4/12.6</td>
-        <td>11.8/12.4/12.6</td>
+        <td>11.8/12.4/12.6/12.8</td>
+        <td>11.8/12.4/12.6/12.8</td>
         <td>None</td>
     </tr>
     <tr>
@@ -279,7 +283,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
 > 最新版本国内镜像源同步可能会有延迟,请耐心等待
 
 ```bash
-conda create -n mineru 'python<3.13' -y
+conda create -n mineru 'python>=3.10' -y
 conda activate mineru
 pip install -U "magic-pdf[full]" -i https://mirrors.aliyun.com/pypi/simple
 ```

+ 0 - 22
docs/README_Ascend_NPU_Acceleration_zh_CN.md

@@ -49,25 +49,3 @@ docker run -it -u root --name mineru-npu --privileged=true \
 
 magic-pdf --help
 ```
-
-
-## 已知问题
-
-- paddleocr使用内嵌onnx模型,仅在默认语言配置下能以较快速度对中英文进行识别
-- 自定义lang参数时,paddleocr速度会存在明显下降情况
-- layout模型使用layoutlmv3时会发生间歇性崩溃,建议使用默认配置的doclayout_yolo模型
-- 表格解析仅适配了rapid_table模型,其他模型可能会无法使用
-
-
-## 高性能模式
-
-- 在特定硬件环境可以通过插件开启高性能模式,整体速度相比默认模式提升300%以上
-
-| 系统要求           | 版本/型号        |
-|----------------|--------------|
-| 芯片类型           | 昇腾910B       |
-| CANN版本         | CANN 8.0.RC2 |
-| 驱动版本           | 24.1.rc2.1   |
-| magic-pdf 软件版本 | \> = 1.2.0   |
-
-- 高性能插件需满足一定的硬件条件和资质要求,如需申请使用请填写以下表单[MinerU高性能版本合作申请表](https://aicarrier.feishu.cn/share/base/form/shrcnb10VaoNQB8kQPA8DEfZC6d)

+ 4 - 5
docs/README_Ubuntu_CUDA_Acceleration_en_US.md

@@ -54,7 +54,7 @@ In the final step, enter `yes`, close the terminal, and reopen it.
 ### 4. Create an Environment Using Conda
 
 ```bash
-conda create -n mineru 'python<3.13' -y
+conda create -n mineru 'python>=3.10' -y
 conda activate mineru
 ```
 
@@ -63,14 +63,13 @@ conda activate mineru
 ```sh
 pip install -U magic-pdf[full]
 ```
-> [!IMPORTANT]
-> After installation, make sure to check the version of `magic-pdf` using the following command:
+> [!TIP]
+> After installation, you can check the version of `magic-pdf` using the following command:
 >
 > ```sh
 > magic-pdf --version
 > ```
->
-> If the version number is less than 1.3.0, please report the issue.
+
 
 ### 6. Download Models
 

+ 4 - 5
docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md

@@ -54,7 +54,7 @@ bash Anaconda3-2024.06-1-Linux-x86_64.sh
 ## 4. 使用conda 创建环境
 
 ```bash
-conda create -n mineru 'python<3.13' -y
+conda create -n mineru 'python>=3.10' -y
 conda activate mineru
 ```
 
@@ -64,14 +64,13 @@ conda activate mineru
 pip install -U magic-pdf[full] -i https://mirrors.aliyun.com/pypi/simple
 ```
 
-> [!IMPORTANT]
-> 下载完成后,务必通过以下命令确认magic-pdf的版本是否正确
+> [!TIP]
+> 下载完成后,您可以通过以下命令检查`magic-pdf`的版本:
 >
 > ```bash
 > magic-pdf --version
 > ```
->
-> 如果版本号小于1.3.0,请到issue中向我们反馈
+
 
 ## 6. 下载模型
 

+ 4 - 5
docs/README_Windows_CUDA_Acceleration_en_US.md

@@ -17,7 +17,7 @@ Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86
 ### 3. Create an Environment Using Conda
 
 ```bash
-conda create -n mineru 'python<3.13' -y
+conda create -n mineru 'python>=3.10' -y
 conda activate mineru
 ```
 
@@ -28,13 +28,12 @@ pip install -U magic-pdf[full]
 ```
 
 > [!IMPORTANT]
-> After installation, verify the version of `magic-pdf`:
+> After installation, you can check the version of `magic-pdf` using the following command:
 >
 > ```bash
 > magic-pdf --version
 > ```
->
-> If the version number is less than 1.3.0, please report it in the issues section.
+
 
 ### 5. Download Models
 
@@ -64,7 +63,7 @@ If your graphics card has at least 6GB of VRAM, follow these steps to test CUDA-
 1. **Overwrite the installation of torch and torchvision** supporting CUDA.(Please select the appropriate index-url based on your CUDA version. For more details, refer to the [PyTorch official website](https://pytorch.org/get-started/locally/).)
 
    ```
-   pip install --force-reinstall torch==2.6.0 torchvision==0.21.0 "numpy<2.0.0" --index-url https://download.pytorch.org/whl/cu124
+   pip install --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/cu124
    ```
 
 2. **Modify the value of `"device-mode"`** in the `magic-pdf.json` configuration file located in your user directory.
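Because the reinstall command no longer pins `torch` and `torchvision`, pip resolves whatever the chosen index currently offers. A small sketch such as the following could confirm which build ended up in the environment, so it can be compared against the 2.2~2.6 (excluding 2.5) range stated in the changelog. The function name `describe_torch` is hypothetical; the attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`) are part of PyTorch.

```python
def describe_torch():
    """Report the installed torch build, or note that torch is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is `None` after the reinstall, the CPU-only wheel was probably selected, which usually points to a mismatched `--index-url`.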

+ 4 - 5
docs/README_Windows_CUDA_Acceleration_zh_CN.md

@@ -18,7 +18,7 @@ https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2024.06-1-Window
 ## 3. 使用conda 创建环境
 
 ```bash
-conda create -n mineru 'python<3.13' -y
+conda create -n mineru 'python>=3.10' -y
 conda activate mineru
 ```
 
@@ -29,13 +29,12 @@ pip install -U magic-pdf[full] -i https://mirrors.aliyun.com/pypi/simple
 ```
 
 > [!IMPORTANT]
-> 下载完成后,务必通过以下命令确认magic-pdf的版本是否正确
+> 下载完成后,您可以通过以下命令检查magic-pdf的版本
 >
 > ```bash
 > magic-pdf --version
 > ```
->
-> 如果版本号小于 1.3.0 ,请到issue中向我们反馈
+
 
 ## 5. 下载模型
 
@@ -65,7 +64,7 @@ pip install -U magic-pdf[full] -i https://mirrors.aliyun.com/pypi/simple
 **1.覆盖安装支持cuda的torch和torchvision**(请根据cuda版本选择合适的index-url,具体可参考[torch官网](https://pytorch.org/get-started/locally/))
 
 ```bash
-pip install --force-reinstall torch==2.6.0 torchvision==0.21.0 "numpy<2.0.0" --index-url https://download.pytorch.org/whl/cu124
+pip install --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/cu124
 ```
 
 **2.修改【用户目录】中配置文件magic-pdf.json中"device-mode"的值**

+ 1 - 10
docs/how_to_download_models_en.md

@@ -18,15 +18,6 @@ The configuration file can be found in the user directory, with the filename `ma
 
 # How to update models previously downloaded
 
-## 1. Models downloaded via Git LFS
-
-> [!IMPORTANT]
-> Due to feedback from some users that downloading model files using git lfs was incomplete or resulted in corrupted model files, this method is no longer recommended.
->
-> For versions 0.9.x and later, due to the repository change and the addition of the layout sorting model in PDF-Extract-Kit 1.0, the models cannot be updated using the `git pull` command. Instead, a Python script must be used for one-click updates.
-
-When magic-pdf <= 0.8.1, if you have previously downloaded the model files via git lfs, you can navigate to the previous download directory and update the models using the `git pull` command.
-
-## 2. Models downloaded via Hugging Face or Model Scope
+## 1. Models downloaded via Hugging Face or Model Scope
 
 If you previously downloaded models via Hugging Face or Model Scope, you can rerun the Python script used for the initial download. This will automatically update the model directory to the latest version.

+ 1 - 11
docs/how_to_download_models_zh_cn.md

@@ -32,16 +32,6 @@ python脚本会自动下载模型文件并配置好配置文件中的模型目
 
 # 此前下载过模型,如何更新
 
-## 1. 通过git lfs下载过模型
-
-> [!IMPORTANT]
-> 由于部分用户反馈通过git lfs下载模型文件遇到下载不全和模型文件损坏情况,现已不推荐使用该方式下载。
-> 
-> 0.9.x及以后版本由于PDF-Extract-Kit 1.0更换仓库和新增layout排序模型,不能通过`git pull`命令更新,需要使用python脚本一键更新。
-
-当magic-pdf <= 0.8.1时,如此前通过 git lfs 下载过模型文件,可以进入到之前的下载目录中,通过`git pull`命令更新模型。
-
-
-## 2. 通过 Hugging Face 或 Model Scope 下载过模型
+## 1. 通过 Hugging Face 或 Model Scope 下载过模型
 
 如此前通过 HuggingFace 或 Model Scope 下载过模型,可以重复执行此前的模型下载python脚本,将会自动将模型目录更新到最新版本。

+ 7 - 9
next_docs/en/user_guide/install/boost_with_cuda.rst

@@ -80,7 +80,7 @@ Specify Python version 3.10.
 
 .. code:: sh
 
-    conda create -n mineru 'python<3.13' -y
+    conda create -n mineru 'python>=3.10' -y
     conda activate mineru
 
 5. Install Applications
@@ -90,16 +90,15 @@ Specify Python version 3.10.
 
    pip install -U magic-pdf[full]
 
-.. admonition:: Important
+.. admonition:: TIP
     :class: tip
 
-    ❗ After installation, make sure to check the version of ``magic-pdf`` using the following command:
+    After installation, you can check the version of ``magic-pdf`` using the following command:
 
 .. code:: sh
 
    magic-pdf --version
 
-If the version number is less than 1.3.0, please report the issue.
 
 6. Download Models
 ~~~~~~~~~~~~~~~~~~
@@ -178,7 +177,7 @@ Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86
 
 ::
 
-    conda create -n mineru 'python<3.13' -y
+    conda create -n mineru 'python>=3.10' -y
     conda activate mineru
 
 4. Install Applications
@@ -188,16 +187,15 @@ Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86
 
    pip install -U magic-pdf[full]
 
-.. admonition:: Important
+.. admonition:: Tip
     :class: tip
 
-    ❗️After installation, verify the version of ``magic-pdf``:
+    After installation, you can check the version of ``magic-pdf``:
 
     .. code:: bash
 
       magic-pdf --version
 
-    If the version number is less than 1.3.0, please report it in the issues section.
 
 5. Download Models
 ~~~~~~~~~~~~~~~~~~
@@ -237,7 +235,7 @@ test CUDA-accelerated parsing performance.
 
 .. code:: sh
 
-   pip install --force-reinstall torch==2.6.0 torchvision==0.21.1 "numpy<2.0.0" --index-url https://download.pytorch.org/whl/cu124
+   pip install --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/cu124
 
 
 2. **Modify the value of ``"device-mode"``** in the ``magic-pdf.json``

+ 6 - 6
next_docs/en/user_guide/install/config.rst

@@ -28,7 +28,7 @@ magic-pdf.json
         "layoutreader-model-dir":"/tmp/layoutreader",
         "device-mode":"cpu",
         "layout-config": {
-            "model": "layoutlmv3"
+            "model": "doclayout_yolo"
         },
         "formula-config": {
             "mfd_model": "yolo_v8_mfd",
@@ -37,7 +37,7 @@ magic-pdf.json
         },
         "table-config": {
             "model": "rapid_table",
-            "enable": false,
+            "enable": true,
             "max_time": 400    
         },
         "config_version": "1.0.0"
@@ -88,10 +88,10 @@ layout-config
 .. code:: json
 
     {
-        "model": "layoutlmv3"  
+        "model": "doclayout_yolo"
     }
 
-layout model can not be disabled now, And we have only kind of layout model currently.
+layout model can not be disabled now.
 
 
 formula-config
@@ -132,14 +132,14 @@ table-config
 
    {
         "model": "rapid_table",
-        "enable": false,
+        "enable": true,
         "max_time": 400    
     }
 
 model
 """"""""
 
-Specify the table inference model, options are ['rapid_table', 'tablemaster', 'struct_eqtable']
+Specify the table inference model, options are ['rapid_table']
 
 
 max_time
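To see whether a local `magic-pdf.json` already matches the new defaults (`doclayout_yolo` for layout, `rapid_table` enabled for tables), the file can be read and summarized. This is a sketch under assumptions: the path `Path.home() / "magic-pdf.json"` follows the docs' statement that the file lives in the user directory, and `summarize_config` is an illustrative helper, not a magic-pdf API.

```python
import json
from pathlib import Path

# Assumed location: the docs place magic-pdf.json in the user directory.
CONFIG_PATH = Path.home() / "magic-pdf.json"


def summarize_config(config):
    """Extract the layout and table settings shown in the example config."""
    layout = config.get("layout-config", {})
    table = config.get("table-config", {})
    return {
        "device-mode": config.get("device-mode"),
        "layout model": layout.get("model"),
        "table model": table.get("model"),
        "table enabled": table.get("enable"),
        "table max_time": table.get("max_time"),
    }


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        with CONFIG_PATH.open(encoding="utf-8") as handle:
            summary = summarize_config(json.load(handle))
        for key, value in summary.items():
            print(f"{key}: {value}")
    else:
        print(f"No config found at {CONFIG_PATH}")
```

A config written by an older release may still show `layoutlmv3` as the layout model, which the 1.3.0 changelog says was removed from layout.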

+ 1 - 12
next_docs/en/user_guide/install/download_model_weight_files.rst

@@ -29,18 +29,7 @@ filename ``magic-pdf.json``.
 How to update models previously downloaded
 -----------------------------------------
 
-1. Models downloaded via Git LFS
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-   Due to feedback from some users that downloading model files using
-   git lfs was incomplete or resulted in corrupted model files, this
-   method is no longer recommended.
-
-If you previously downloaded model files via git lfs, you can navigate
-to the previous download directory and use the ``git pull`` command to
-update the model.
-
-2. Models downloaded via Hugging Face or Model Scope
+1. Models downloaded via Hugging Face or Model Scope
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 If you previously downloaded models via Hugging Face or Model Scope, you

+ 3 - 3
next_docs/en/user_guide/install/install.rst

@@ -71,8 +71,8 @@ Also you can try `online demo <https://www.modelscope.cn/studios/OpenDataLab/Min
     </tr>
     <tr>
         <td colspan="3">CUDA Environment</td>
-        <td>11.8/12.4/12.6</td>
-        <td>11.8/12.4/12.6</td>
+        <td>11.8/12.4/12.6/12.8</td>
+        <td>11.8/12.4/12.6/12.8</td>
         <td>None</td>
     </tr>
     <tr>
@@ -97,7 +97,7 @@ Create an environment
 
 .. code-block:: shell
 
-    conda create -n mineru 'python<3.13' -y
+    conda create -n mineru 'python>=3.10' -y
     conda activate mineru
     pip install -U "magic-pdf[full]"
 

+ 19 - 11
setup.py

@@ -26,6 +26,7 @@ if __name__ == '__main__':
     setup(
         name="magic_pdf",  # 项目名
         version=__version__,  # 自动从tag中获取版本号
+        license="AGPL-3.0",
         packages=find_packages() + ["magic_pdf.resources"] + ["magic_pdf.model.sub_modules.ocr.paddleocr2pytorch.pytorchocr.utils.resources"],  # 包含所有的包
         package_data={
             "magic_pdf.resources": ["**"],  # 包含magic_pdf.resources目录下的所有文件
@@ -53,17 +54,17 @@ if __name__ == '__main__':
                      "omegaconf>=2.3.0,<3",  # paddleocr2pytorch
             ],
             "full_old_linux":[
-                    "matplotlib>=3.10",
-                    "ultralytics>=8.3.48",  # yolov8,公式检测
+                    "matplotlib>=3.10,<=3.10.1",
+                    "ultralytics>=8.3.48,<=8.3.104",  # yolov8,公式检测
                     "doclayout_yolo==0.0.2b1",  # doclayout_yolo
-                    "dill>=0.3.9,<1",  # doclayout_yolo
-                    "PyYAML>=6.0.2,<7",  # yaml
-                    "ftfy>=6.3.1,<7",  # unimernet_hf
-                    "openai>=1.70.0,<2",  # openai SDK
-                    "shapely>=2.0.7,<3",  # imgaug-paddleocr2pytorch
-                    "pyclipper>=1.3.0,<2",  # paddleocr2pytorch
-                    "omegaconf>=2.3.0,<3",  # paddleocr2pytorch
-                    "albumentations<=1.4.20", # 1.4.21引入的simsimd不支持2019年及更早的linux系统
+                    "dill==0.3.9",  # doclayout_yolo
+                    "PyYAML==6.0.2",  # yaml
+                    "ftfy==6.3.1",  # unimernet_hf
+                    "openai==1.71.0",  # openai SDK
+                    "shapely==2.1.0",  # imgaug-paddleocr2pytorch
+                    "pyclipper==1.3.0.post6",  # paddleocr2pytorch
+                    "omegaconf==2.3.0",  # paddleocr2pytorch
+                    "albumentations==1.4.20", # 1.4.21引入的simsimd不支持2019年及更早的linux系统
                     "rapid_table==1.0.3",  # rapid_table新版本依赖的onnxruntime不支持2019年及更早的linux系统
             ],
         },
@@ -71,7 +72,14 @@ if __name__ == '__main__':
         long_description=long_description,  # 详细描述
         long_description_content_type="text/markdown",  # 如果README是Markdown格式
         url="https://github.com/opendatalab/MinerU",
-        python_requires=">=3.9",  # 项目依赖的 Python 版本
+        keywords=["magic-pdf, mineru, MinerU, convert, pdf, markdown"],
+        classifiers=[
+            "Programming Language :: Python :: 3.10",
+            "Programming Language :: Python :: 3.11",
+            "Programming Language :: Python :: 3.12",
+            "Programming Language :: Python :: 3.13",
+        ],
+        python_requires=">=3.10,<4",  # 项目依赖的 Python 版本
         entry_points={
             "console_scripts": [
                 "magic-pdf = magic_pdf.tools.cli:cli",