Browse Source Code

Merge pull request #1066 from opendatalab/master

master -> dev
Xiaomeng Zhao 1 year ago
parent
commit
4e0b3a8f15

+ 0 - 3
.github/workflows/daily.yml

@@ -2,9 +2,6 @@
 # For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
 
 name: mineru
-on:
-  schedule:
-    - cron: '0 22 * * *'  # runs every day at 10 PM
 jobs:
   cli-test:
     runs-on: pdf

+ 0 - 1
Dockerfile

@@ -44,7 +44,6 @@ RUN /bin/bash -c "wget https://gitee.com/myhloli/MinerU/raw/master/magic-pdf.tem
 RUN /bin/bash -c "pip3 install modelscope && \
     wget https://gitee.com/myhloli/MinerU/raw/master/scripts/download_models.py && \
     python3 download_models.py && \
-    sed -i 's|/tmp/models|/root/.cache/modelscope/hub/opendatalab/PDF-Extract-Kit/models|g' /root/magic-pdf.json && \
     sed -i 's|cpu|cuda|g' /root/magic-pdf.json"
 
 # Set the entry point to activate the virtual environment and run the command line tool
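
Note on the hunk above: the remaining `sed -i 's|cpu|cuda|g'` rewrites every occurrence of the substring `cpu` in `/root/magic-pdf.json`, not only the device setting. A minimal Python sketch of a narrower edit follows. It assumes the device setting lives under a top-level key named `device-mode`; that key name and the helper `set_device_mode` are assumptions for illustration and are not shown in this diff.

```python
import json


def set_device_mode(config_text, device="cuda"):
    """Return the JSON config text with only the device setting changed.

    Assumption: the device setting is stored under the top-level key
    "device-mode". The key name is not visible in this diff.
    """
    config = json.loads(config_text)
    config["device-mode"] = device  # leave every other key untouched
    return json.dumps(config, indent=4)


# Hypothetical sample config, used only to exercise the helper.
sample = '{"models-dir": "/tmp/models", "device-mode": "cpu"}'
print(set_device_mode(sample))
```

Unlike the text substitution, this leaves paths or other values that happen to contain `cpu` unchanged.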

+ 9 - 0
docs/FAQ_en_us.md

@@ -64,3 +64,12 @@ This might be because the server's CPU does not support the AVX/AVX2 instruction
 
 References: https://github.com/opendatalab/MinerU/issues/591 , https://github.com/opendatalab/MinerU/issues/736
 
+
+### 8. Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`
+
+The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the pre-built package of simsimd for Linux requires a glibc version greater than or equal to 2.28, this causes installation issues on some Linux distributions released before 2019. You can resolve this issue by using the following command:
+```
+pip install -U magic-pdf[full,old_linux] --extra-index-url https://wheels.myhloli.com
+```
+
+Reference: https://github.com/opendatalab/MinerU/issues/1004
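
Note on the FAQ entry above: the `old_linux` extra is only relevant when the system glibc is older than 2.28. A minimal sketch of how one might check this before installing, using the standard-library `platform.libc_ver()`. The helper names `parse_version` and `needs_old_linux_extra` are hypothetical and are not part of MinerU.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def needs_old_linux_extra(libc_name, libc_version, minimum=(2, 28)):
    """Return True when the C library is glibc and older than `minimum`."""
    if libc_name != "glibc" or not libc_version:
        return False  # not glibc, or version unknown: the check does not apply
    return parse_version(libc_version) < minimum


if __name__ == "__main__":
    # platform.libc_ver() returns a (name, version) pair; both may be empty.
    name, version = platform.libc_ver()
    if needs_old_linux_extra(name, version):
        print(f"glibc {version} is older than 2.28: use magic-pdf[full,old_linux]")
    else:
        print("no old_linux extra needed according to this check")
```

The comparison is done on integer tuples rather than strings, so that `2.9` would sort below `2.28` correctly.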

+ 10 - 1
docs/FAQ_zh_cn.md

@@ -64,4 +64,13 @@ pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/package
 
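 This may be because the server's CPU does not support the AVX/AVX2 instruction set, or because the CPU supports it but operations staff have disabled it. Try contacting operations to lift the restriction, or switch to a different server.
 
-References: https://github.com/opendatalab/MinerU/issues/591 , https://github.com/opendatalab/MinerU/issues/736
+References: https://github.com/opendatalab/MinerU/issues/591 , https://github.com/opendatalab/MinerU/issues/736
+
+### 8. Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`
+
+The new version of albumentations (1.4.21) introduces a dependency on simsimd. Because the pre-built simsimd package for Linux requires glibc 2.28 or later, installation fails on some Linux distributions released before 2019. Install with the following command instead:
+```
+pip install -U magic-pdf[full,old_linux] --extra-index-url https://wheels.myhloli.com
+```
+
+Reference: https://github.com/opendatalab/MinerU/issues/1004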

+ 3 - 0
docs/README_Ubuntu_CUDA_Acceleration_en_US.md

@@ -76,8 +76,10 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
 
 ### 6. Download Models
 
+
 Refer to detailed instructions on [how to download model files](how_to_download_models_en.md).
 
+
 ## 7. Understand the Location of the Configuration File
 
 After completing the [6. Download Models](#6-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
@@ -86,6 +88,7 @@ You can find the `magic-pdf.json` file in your user directory.
 > [!TIP]
 > The user directory for Linux is "/home/username".
 
+
 ### 8. First Run
 
 Download a sample file from the repository and test it.

+ 1 - 0
docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md

@@ -77,6 +77,7 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
 
 ## 6. Download Models
 
+
 For details, see [how to download model files](how_to_download_models_zh_cn.md)
 
 ## 7. Understand the Location of the Configuration File

+ 1 - 0
docs/README_Windows_CUDA_Acceleration_en_US.md

@@ -84,6 +84,7 @@ If your graphics card has at least 8GB of VRAM, follow these steps to test CUDA-
    }
    ```
 
+
 3. **Run the following command to test CUDA acceleration**:
 
    ```

+ 3 - 3
docs/how_to_download_models_en.md

@@ -1,21 +1,21 @@
 Model downloads are divided into initial downloads and updates to the model directory. Please refer to the corresponding documentation for instructions on how to proceed.
 
+
 # Initial download of model files
 
-### 1. Download the Model from Hugging Face
+### Download the Model from Hugging Face
 
 Use a Python Script to Download Model Files from Hugging Face
-
 ```bash
 pip install huggingface_hub
 wget https://github.com/opendatalab/MinerU/raw/master/scripts/download_models_hf.py -O download_models_hf.py
 python download_models_hf.py
 ```
-
 The Python script will automatically download the model files and configure the model directory in the configuration file.
 
 The configuration file can be found in the user directory, with the filename `magic-pdf.json`.
 
+
 # How to update models previously downloaded
 
 ## 1. Models downloaded via Git LFS

+ 2 - 1
docs/how_to_download_models_zh_cn.md

@@ -10,6 +10,7 @@
   <pre><code>pip install huggingface_hub
 wget https://gitee.com/myhloli/MinerU/raw/master/scripts/download_models_hf.py -O download_models_hf.py
 python download_models_hf.py</code></pre>
+  <p>The Python script automatically downloads the model files and configures the model directory in the configuration file.</p>
 </details>
 
 ## Method 2: Download Models from ModelScope
@@ -21,7 +22,6 @@ pip install modelscope
 wget https://gitee.com/myhloli/MinerU/raw/master/scripts/download_models.py -O download_models.py
 python download_models.py
 ```
-
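 The Python script automatically downloads the model files and configures the model directory in the configuration file.
 
 The configuration file can be found in the user directory, with the filename `magic-pdf.json`.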
@@ -29,6 +29,7 @@ python脚本会自动下载模型文件并配置好配置文件中的模型目
 > [!TIP]
 > The user directory is "C:\\Users\\username" on Windows, "/home/username" on Linux, and "/Users/username" on macOS.
 
+
 # How to update models previously downloaded
 
 ## 1. Models downloaded via Git LFS

+ 1 - 1
magic_pdf/libs/version.py

@@ -1 +1 @@
-__version__ = "0.9.0"
+__version__ = "0.10.0"

+ 56 - 0
signatures/version1/cla.json

@@ -47,6 +47,62 @@
       "created_at": "2024-08-26T07:01:49Z",
       "repoId": 765083837,
       "pullRequestNo": 487
+    },
+    {
+      "name": "hamirmahal",
+      "id": 43425812,
+      "comment_id": 2395141155,
+      "created_at": "2024-10-05T18:22:47Z",
+      "repoId": 765083837,
+      "pullRequestNo": 687
+    },
+    {
+      "name": "wmpscc",
+      "id": 29891793,
+      "comment_id": 2416780426,
+      "created_at": "2024-10-16T13:02:13Z",
+      "repoId": 765083837,
+      "pullRequestNo": 682
+    },
+    {
+      "name": "randydl",
+      "id": 36127931,
+      "comment_id": 2439668779,
+      "created_at": "2024-10-26T17:39:26Z",
+      "repoId": 765083837,
+      "pullRequestNo": 793
+    },
+    {
+      "name": "hyastar",
+      "id": 117415976,
+      "comment_id": 2466539016,
+      "created_at": "2024-11-10T01:32:42Z",
+      "repoId": 765083837,
+      "pullRequestNo": 916
+    },
+    {
+      "name": "kimi360",
+      "id": 3158007,
+      "comment_id": 2472266659,
+      "created_at": "2024-11-13T02:57:34Z",
+      "repoId": 765083837,
+      "pullRequestNo": 938
+    },
+    {
+      "name": "ProseGuys",
+      "id": 45124798,
+      "comment_id": 2472990455,
+      "created_at": "2024-11-13T09:37:42Z",
+      "repoId": 765083837,
+      "pullRequestNo": 945
+    },
+    {
+      "name": "liugongjian",
+      "id": 9069358,
+      "comment_id": 2484888409,
+      "created_at": "2024-11-19T07:28:12Z",
+      "repoId": 765083837,
+      "pullRequestNo": 1024
     }
   ]
 }