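Before you begin, make sure Git Large File Storage (Git LFS) is installed on your system. Once it is installed, initialize it for your user account with the following command: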
git lfs install
Use the following command to download the PDF-Extract-Kit models from Hugging Face:
git lfs clone https://huggingface.co/wanderkid/PDF-Extract-Kit
Make sure Git LFS is enabled during cloning so that all large files are downloaded correctly.
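If Git LFS was not active during the clone, the large files are typically left behind as small text pointer stubs rather than real weights. The following is a minimal sketch for spotting that situation; the helper name `find_lfs_pointers` is hypothetical and not part of PDF-Extract-Kit. It relies only on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root):
    """Return files under `root` that are still Git LFS pointer stubs.

    Hypothetical helper for illustration; not part of PDF-Extract-Kit.
    """
    root = Path(root)
    stubs = []
    for path in sorted(root.rglob("*")):
        # Skip directories and Git's own metadata under .git
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        # Only the first bytes are read, so large weight files stay cheap to check.
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            stubs.append(path)
    return stubs
```

An empty result suggests the large files were fetched. If any paths are returned, running `git lfs pull` inside the cloned repository should replace the stubs with the real files.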
# First, install modelscope
pip install modelscope
# Download the models with the modelscope SDK
from modelscope import snapshot_download
model_dir = snapshot_download('wanderkid/PDF-Extract-Kit')
Alternatively, you can download the models from ModelScope with git clone.
Git LFS must be installed first:
On Linux
Debian and RPM packages are available from packagecloud; see the Linux installation instructions.
On macOS
Homebrew bottles are distributed and can be installed via
brew install git-lfs
On Windows
Git LFS is included in the distribution of Git for Windows. Alternatively, you can install a recent version of Git LFS from the Chocolatey package manager.
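Whichever installation route is used, it can help to confirm that Git LFS is reachable before cloning. A minimal sketch, assuming `git` is expected on `PATH`; the function name `git_lfs_available` is hypothetical:

```python
import subprocess


def git_lfs_available():
    """Return True if `git lfs version` runs successfully.

    Hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(
            ["git", "lfs", "version"],
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit is reported as False, not raised
        )
    except FileNotFoundError:
        # git itself is not installed or not on PATH
        return False
    return result.returncode == 0
```

If this returns `False`, install Git LFS with one of the methods above and then run `git lfs install` once.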
Then download the models with git clone:
git lfs clone https://www.modelscope.cn/wanderkid/PDF-Extract-Kit.git
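Move the 'models' directory to a location with ample disk space, preferably on a solid-state drive (SSD).
The model folder is structured as follows and contains the configuration and weight files for each component: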
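The move can also be scripted. The sketch below is one possible approach, not a step the project prescribes: it checks the free space at the destination before moving. The helper names `dir_size` and `move_models` are hypothetical, and the paths in the usage note are placeholders.

```python
import shutil
from pathlib import Path


def dir_size(path):
    """Total size in bytes of all regular files under `path`."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def move_models(src, dest_parent):
    """Move the directory `src` into `dest_parent` if there is enough free space.

    Hypothetical helper for illustration; returns the new location.
    """
    src = Path(src)
    dest_parent = Path(dest_parent)
    dest_parent.mkdir(parents=True, exist_ok=True)

    target = dest_parent / src.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    needed = dir_size(src)
    free = shutil.disk_usage(dest_parent).free
    if free < needed:
        raise OSError(f"need {needed} bytes but only {free} are free at {dest_parent}")

    # shutil.move falls back to copy-then-delete when crossing filesystems.
    return Path(shutil.move(str(src), str(target)))
```

For example, `move_models("PDF-Extract-Kit/models", "/mnt/ssd")` would place the directory at `/mnt/ssd/models`, where `/mnt/ssd` stands in for whatever large, fast disk is available.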
./
├── Layout
│   ├── config.json
│   └── model_final.pth
├── MFD
│   └── weights.pt
├── MFR
│   └── UniMERNet
│       ├── config.json
│       ├── preprocessor_config.json
│       ├── pytorch_model.bin
│       ├── README.md
│       ├── tokenizer_config.json
│       └── tokenizer.json
├── TabRec
│   └── StructEqTable
│       ├── config.json
│       ├── generation_config.json
│       ├── model.safetensors
│       ├── preprocessor_config.json
│       ├── special_tokens_map.json
│       ├── spiece.model
│       ├── tokenizer.json
│       └── tokenizer_config.json
└── README.md
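To confirm that a download matches this layout, the tree can be checked programmatically. This is a minimal sketch: the file list is copied from the structure shown above, and the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# Relative paths taken from the directory tree in this document.
EXPECTED_FILES = [
    "Layout/config.json",
    "Layout/model_final.pth",
    "MFD/weights.pt",
    "MFR/UniMERNet/config.json",
    "MFR/UniMERNet/preprocessor_config.json",
    "MFR/UniMERNet/pytorch_model.bin",
    "MFR/UniMERNet/README.md",
    "MFR/UniMERNet/tokenizer_config.json",
    "MFR/UniMERNet/tokenizer.json",
    "TabRec/StructEqTable/config.json",
    "TabRec/StructEqTable/generation_config.json",
    "TabRec/StructEqTable/model.safetensors",
    "TabRec/StructEqTable/preprocessor_config.json",
    "TabRec/StructEqTable/special_tokens_map.json",
    "TabRec/StructEqTable/spiece.model",
    "TabRec/StructEqTable/tokenizer.json",
    "TabRec/StructEqTable/tokenizer_config.json",
    "README.md",
]


def missing_model_files(models_dir):
    """Return the expected relative paths that are absent under `models_dir`.

    Hypothetical helper for illustration; not part of PDF-Extract-Kit.
    """
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list means every file in the tree is present. Any returned paths point to an incomplete download. The check covers only file presence, not file contents or integrity.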