### 1. Install CUDA and cuDNN

Required versions: CUDA 11.8 + cuDNN 8.7.0
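Once a Python interpreter is available, the following sketch can confirm which CUDA toolkit is on the `PATH`. It assumes that `nvcc` is on the `PATH` and that `nvcc --version` prints a line such as `Cuda compilation tools, release 11.8, V11.8.89`. The helper names are hypothetical and are not part of MinerU. The sketch does not check the cuDNN version.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

REQUIRED_CUDA = (11, 8)  # version required by this guide


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Run nvcc if it is on the PATH and return its release, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found on PATH; is the CUDA toolkit installed?")
    elif release != REQUIRED_CUDA:
        print(f"Found CUDA {release[0]}.{release[1]}, but this guide requires 11.8")
    else:
        print("CUDA 11.8 found")
```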
### 2. Install Anaconda

If Anaconda is already installed, you can skip this step.

Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86_64.exe
### 3. Create an Environment Using Conda

Python version must be 3.10.

```
conda create -n MinerU python=3.10
conda activate MinerU
```
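After activating the `MinerU` environment, a quick way to confirm that its interpreter is the required 3.10 is to compare `sys.version_info`. This is a minimal sketch; `is_required_python` is a hypothetical helper name.

```python
import sys
from typing import Sequence

REQUIRED_PYTHON = (3, 10)  # this guide requires Python 3.10


def is_required_python(version_info: Sequence[int]) -> bool:
    """Return True if the (major, minor) prefix matches the required version."""
    return tuple(version_info[:2]) == REQUIRED_PYTHON


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if is_required_python(sys.version_info):
        print(f"Python {major}.{minor}: OK")
    else:
        print(f"Python {major}.{minor}: expected 3.10; is the MinerU environment active?")
```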
### 4. Install Applications

```
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
```
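The installed version can also be read from the package metadata instead of through the CLI. The sketch below uses the standard-library `importlib.metadata` and assumes that the distribution is named `magic-pdf`, as in the `pip install` command. The 0.7.0 threshold comes from this guide. The helper names are hypothetical, and the comparison deliberately ignores pre-release suffixes, so it is coarser than full PEP 440 ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MINIMUM = "0.7.0"  # threshold taken from the note in this guide


def numeric_prefix(v: str) -> Tuple[int, ...]:
    """Turn '0.7.1' into (0, 7, 1); a suffix such as 'b1' ends the parse."""
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at the first piece with a non-numeric suffix
    return tuple(parts)


def at_least(installed: str, minimum: str = MINIMUM) -> bool:
    """Compare numeric prefixes, padding the shorter one with zeros."""
    a, b = numeric_prefix(installed), numeric_prefix(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(dist: str = "magic-pdf") -> Optional[str]:
    """Return the installed version of `dist`, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("magic-pdf is not installed in this environment")
    elif not at_least(found):
        print(f"magic-pdf {found} is older than {MINIMUM}; please report it in the issues")
    else:
        print(f"magic-pdf {found}: OK")
```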
> [!IMPORTANT]
> After installation, verify the version of `magic-pdf`:
>
> ```
> magic-pdf --version
> ```
>
> If the version number is less than 0.7.0, please report it in the issues section.

### 5. Download Models

Refer to the detailed instructions on [how to download model files](how_to_download_models_en.md).

### 6. Understand the Location of the Configuration File

After you complete the [5. Download Models](#5-download-models) step, the download script automatically generates a `magic-pdf.json` file in your user directory and configures the default model path in it.

> [!TIP]
> The user directory on Windows is "C:/Users/username".

### 7. First Run

Download a sample file from the repository and test it:

```powershell
# Download the sample PDF; -OutFile sets the local file name
Invoke-WebRequest https://github.com/opendatalab/MinerU/raw/master/demo/small_ocr.pdf -OutFile small_ocr.pdf
magic-pdf -p small_ocr.pdf -o ./output
```
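The configuration file location from step 6 can also be checked from Python. This sketch assumes only what this guide states: the file is named `magic-pdf.json`, it sits in the user directory (which `Path.home()` returns, e.g. `C:/Users/username` on Windows), and it has a top-level `"device-mode"` key. The helper names are hypothetical. `set_device_mode` rewrites the whole file with `json.dump`, so it keeps the other settings but not the original whitespace.

```python
import json
from pathlib import Path
from typing import Optional

CONFIG_NAME = "magic-pdf.json"  # file name given in this guide


def config_path(home: Optional[Path] = None) -> Path:
    """Return <user directory>/magic-pdf.json."""
    return (home or Path.home()) / CONFIG_NAME


def read_device_mode(path: Path) -> Optional[str]:
    """Return the "device-mode" value, or None if the file or key is missing."""
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        return json.load(f).get("device-mode")


def set_device_mode(path: Path, mode: str) -> None:
    """Change only the "device-mode" key, keeping every other setting."""
    with path.open(encoding="utf-8") as f:
        config = json.load(f)
    config["device-mode"] = mode
    with path.open("w", encoding="utf-8") as f:
        json.dump(config, f, indent=4, ensure_ascii=False)


if __name__ == "__main__":
    path = config_path()
    print(f"Config file: {path} (exists: {path.is_file()})")
    print(f"device-mode: {read_device_mode(path)}")
```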
### 8. Test CUDA Acceleration

If your graphics card has at least 8 GB of VRAM, follow these steps to test CUDA-accelerated parsing performance.

1. **Reinstall torch and torchvision with CUDA support**, overwriting the existing installation:

   ```
   pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
   ```
2. **Set `"device-mode"` to `"cuda"`** in the `magic-pdf.json` configuration file in your user directory, leaving the other settings unchanged:

   ```json
   {
     "device-mode": "cuda"
   }
   ```
3. **Run the following command to test CUDA acceleration**:

   ```
   magic-pdf -p small_ocr.pdf -o ./output
   ```
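To confirm that the reinstalled torch build can actually see the GPU, the following sketch reports the build and device information. `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_properties()` are PyTorch APIs. The report function, the choice of GPU 0, and the rounding used for the 8 GB check are assumptions of this sketch, since the reported total memory can be slightly below a card's nominal size.

```python
from typing import Any, Dict

MIN_VRAM_GB = 8  # threshold suggested in this guide


def torch_cuda_report() -> Dict[str, Any]:
    """Summarize whether the installed torch build can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report: Dict[str, Any] = {
        "torch_installed": True,
        "torch_version": torch.__version__,  # e.g. "2.3.1+cu118" for the CUDA wheel
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)  # first visible GPU
        vram_gb = props.total_memory / 1024**3
        report["gpu_name"] = props.name
        report["vram_gb"] = round(vram_gb, 1)
        # Round before comparing, since the reported size may fall just under 8.
        report["enough_vram"] = round(vram_gb) >= MIN_VRAM_GB
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, the CPU-only torch wheel is still the one being imported.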
### 9. Enable CUDA Acceleration for OCR

1. **Install paddlepaddle-gpu**; OCR acceleration is enabled automatically once it is installed:

   ```
   pip install paddlepaddle-gpu==2.6.1
   ```
2. **Run the following command to test OCR acceleration**:

   ```
   magic-pdf -p small_ocr.pdf -o ./output
   ```
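To check whether the installed PaddlePaddle build can use CUDA, the following sketch queries `paddle.device.is_compiled_with_cuda()` and `paddle.device.cuda.device_count()`. It assumes the PaddlePaddle 2.x API and uses a hypothetical helper name. It only reports what the installed build supports; it does not exercise MinerU's OCR path.

```python
from typing import Any, Dict


def paddle_gpu_report() -> Dict[str, Any]:
    """Report whether the installed PaddlePaddle build can use CUDA."""
    try:
        import paddle
    except ImportError:
        return {"paddle_installed": False}
    compiled = paddle.device.is_compiled_with_cuda()
    return {
        "paddle_installed": True,
        "paddle_version": paddle.__version__,
        "compiled_with_cuda": compiled,  # False for a CPU-only build
        # Only query devices when the build supports CUDA.
        "gpu_count": paddle.device.cuda.device_count() if compiled else 0,
    }


if __name__ == "__main__":
    for key, value in paddle_gpu_report().items():
        print(f"{key}: {value}")
```

For a more thorough self-test, PaddlePaddle also provides `paddle.utils.run_check()`.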