
docs: update model download instructions to use python script (#560)

Xiaomeng Zhao, 1 year ago

Commit 77596ad079
2 changed files with 25 additions and 57 deletions:
  1. docs/how_to_download_models_en.md (+10 -17)
  2. docs/how_to_download_models_zh_cn.md (+15 -40)

+ 10 - 17
docs/how_to_download_models_en.md

@@ -1,20 +1,12 @@
-### 1. Install Git LFS
-Before you begin, make sure Git Large File Storage (Git LFS) is installed on your system. Install it using the following command:
-
-```bash
-git lfs install
-```
-
-### 2. Download the Model from Hugging Face
-To download the `PDF-Extract-Kit` model from Hugging Face, use the following command:
-
+### 1. Download the Model from Hugging Face
+Use a Python script to download the model files from Hugging Face:
 ```bash
-git lfs clone https://huggingface.co/opendatalab/PDF-Extract-Kit
+pip install huggingface_hub
+wget https://github.com/opendatalab/MinerU/raw/master/docs/download_models_hf.py
+python download_models_hf.py
 ```
-
-Ensure that Git LFS is enabled during the clone to properly download all large files.
-
-### 3. Additional steps
+After the Python script finishes executing, it will output the directory where the models are downloaded.
+### 2. Additional steps
 
 #### 1. Check whether the model directory is downloaded completely.
 
@@ -65,6 +57,7 @@ The structure of the model folder is as follows, including configuration files a
 
 Please check whether the size of the model file in the directory is consistent with the description on the web page. If possible, it is best to check whether the model is downloaded completely through sha256.
 
-#### 3. Move the model to the solid-state drive
+#### 3. Update the model path in `~/magic-pdf.json`
+
+In `~/magic-pdf.json`, set the model directory to the absolute path of the `models` directory printed by the Python script above. Otherwise, loading the models fails with an error.
 
-Move the 'models' directory to a directory with large disk space, preferably on a solid-state drive (SSD). In addition, modify the model directory in `~/magic-pdf.json` to point to the final model storage location, otherwise the model cannot be loaded.
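The English instructions above suggest verifying the downloaded models with sha256 but do not show how. The following is a minimal sketch that uses only Python's standard `hashlib`. The helper names `sha256_of` and `verify` are hypothetical, and you need to take the expected digest for each file from the model's web page.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file.

    The file is read in chunks (1 MiB by default), so large model
    weights never have to be loaded into memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published value (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `verify("<path to a model file>", "<sha256 from the model page>")` returns `True` only if the file's contents match the published digest. To check the whole download, run it over every file in the `models` directory.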

+ 15 - 40
docs/how_to_download_models_zh_cn.md

@@ -1,50 +1,26 @@
 # How to download model files
 
-The model files can be downloaded from Hugging Face or Model Scope. Because of network issues, users in China may be unable to access HF; please use ModelScope.
-
-
-Method 1: [Download the models from Hugging Face](#方法一从-hugging-face-下载模型)
-
-Method 2: [Download the models from ModelScope](#方法二从-modelscope-下载模型)
-
-## Method 1: Download the models from Hugging Face
-
-Use Git LFS to download the model files from Hugging Face
-
-```bash
-git lfs install # Install the Git Large File Storage extension (Git LFS)
-git lfs clone https://huggingface.co/opendatalab/PDF-Extract-Kit # Download the PDF-Extract-Kit model from Hugging Face
-```
+The model files can be downloaded from Hugging Face or Model Scope. Because of network issues, users in China may be unable to access HF; please use ModelScope.
 
+<details>
+  <summary>Method 1: Download the models from Hugging Face</summary>
+  <p>Use a Python script to download the model files from Hugging Face</p>
+  <pre><code>pip install huggingface_hub
+wget https://gitee.com/myhloli/MinerU/raw/master/docs/download_models_hf.py
+python download_models_hf.py</code></pre>
+  <p>After the Python script finishes, it prints the directory the models were downloaded to.</p>
+</details>
 
 ## Method 2: Download the models from ModelScope
-ModelScope supports downloading via the SDK or via Git; either method works.
-
-[Download with Git lsf](#1利用git-lsf下载)
 
-[Download with the SDK](#2利用sdk下载)
-
-### 1) Download with Git lsf
-
-```bash
-git lfs install
-git lfs clone https://www.modelscope.cn/opendatalab/PDF-Extract-Kit.git
-```
-
-### 2) Download with the SDK
+### Download the model files from ModelScope with a Python script
 
 ```bash
-# First, install modelscope
 pip install modelscope
+wget https://gitee.com/myhloli/MinerU/raw/master/docs/download_models.py
+python download_models.py
 ```
-
-```python
-# Download the model with the modelscope SDK
-from modelscope import snapshot_download
-model_dir = snapshot_download('opendatalab/PDF-Extract-Kit')
-print(f"Model files downloaded to: {model_dir}/models")
-```
-
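+After the Python script finishes, it prints the directory the models were downloaded to.
 ## 【❗️Required❗️】 Additional steps (be sure to complete the following once the models are downloaded)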
 
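 ### 1. Check whether the model directory is downloaded completely
@@ -95,6 +71,5 @@ print(f"Model files downloaded to: {model_dir}/models")
 ### 2. Check whether the model files are downloaded completely
 Please check whether the sizes of the model files in the directory match the description on the web page. If possible, verify with sha256 that the models were downloaded completely.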
 
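-### 3. Move the models to a solid-state drive
-Move the 'models' directory to a directory with plenty of disk space, preferably on a solid-state drive (SSD).
-Also, in `~/magic-pdf.json`, change the model directory to point to the final model storage location; otherwise a model-loading error will be reported.
+### 3. Modify the model path in magic-pdf.json
+Also, in `~/magic-pdf.json`, set the model directory to the absolute path of the models directory printed earlier by the Python script; otherwise a model-loading error will be reported.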
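The final step, pointing `~/magic-pdf.json` at the downloaded models, can also be scripted. The following is a sketch using only Python's standard library. It assumes the model path is stored under a top-level `models-dir` key. That key name is an assumption because the diff does not name it, so check your own config file. `set_models_dir` is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path

# Assumption: the model path lives under this key in magic-pdf.json.
# Check your own config file and change MODELS_KEY if it differs.
MODELS_KEY = "models-dir"


def set_models_dir(models_dir: str | Path,
                   config_path: str | Path = Path.home() / "magic-pdf.json",
                   key: str = MODELS_KEY) -> str:
    """Point the config's model directory at the absolute path of `models_dir`.

    Returns the absolute path that was written. Other keys are preserved.
    """
    abs_dir = Path(models_dir).expanduser().resolve()
    if not abs_dir.is_dir():
        raise FileNotFoundError(f"models directory not found: {abs_dir}")

    config_path = Path(config_path).expanduser()
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config[key] = str(abs_dir)  # store an absolute path, as the docs require
    config_path.write_text(json.dumps(config, indent=4, ensure_ascii=False),
                           encoding="utf-8")
    return str(abs_dir)
```

Pass in the `models` directory printed by the download script. The helper refuses to write a path that does not exist, which catches typos before they surface as a model-loading error.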