build(docker): update docker build step (#471)

* build(docker): update base image to Ubuntu 22.04 and install PaddlePaddle

Upgrade the Docker base image from ubuntu:latest to ubuntu:22.04 for improved
performance and stability.

Additionally, integrate PaddlePaddle GPU version 3.0.0b1 into the Docker build
for enhanced AI capabilities. The MinIO configuration file has also been updated
to the latest version.

* build(dockerfile): Updated the Dockerfile

* build(Dockerfile): update Dockerfile

* docs(docker): add instructions for quick deployment with Docker

Include Docker-based deployment instructions in the README for both English and
Chinese locales. This update provides users a quick-start guide to using Docker for
deployment, with notes on GPU VRAM requirements and default acceleration features.

* build(docker): Layer the installation of dependencies, downloading the model, and the setup of the program itself.

Xiaomeng Zhao, 1 year ago
parent
commit
1fc0b76de8
4 changed files with 53 additions and 12 deletions
  1. Dockerfile (+18, -12)
  2. README.md (+8, -0)
  3. README_zh-CN.md (+9, -0)
  4. requirements-docker.txt (+18, -0)

+ 18 - 12
Dockerfile

@@ -1,5 +1,5 @@
 # Use the official Ubuntu base image
-FROM ubuntu:latest
+FROM ubuntu:22.04
 
 # Set environment variables to non-interactive to avoid prompts during installation
 ENV DEBIAN_FRONTEND=noninteractive
@@ -29,17 +29,23 @@ RUN python3 -m venv /opt/mineru_venv
 
 # Activate the virtual environment and install necessary Python packages
 RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
-    pip install --upgrade pip && \
-    pip install magic-pdf[full-cpu] detectron2 --extra-index-url https://myhloli.github.io/wheels/"
-
-# Copy the configuration file template and set up the model directory
-COPY magic-pdf.template.json /root/magic-pdf.json
-
-# Set the models directory in the configuration file (adjust the path as needed)
-RUN sed -i 's|/tmp/models|/opt/models|g' /root/magic-pdf.json
-
-# Create the models directory
-RUN mkdir -p /opt/models
+    pip3 install --upgrade pip && \
+    wget https://gitee.com/myhloli/MinerU/raw/master/requirements-docker.txt && \
+    pip3 install -r requirements-docker.txt --extra-index-url https://wheels.myhloli.com -i https://pypi.tuna.tsinghua.edu.cn/simple && \
+    pip3 install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/"
+
+# Copy the configuration file template and install magic-pdf latest
+RUN /bin/bash -c "wget https://gitee.com/myhloli/MinerU/raw/master/magic-pdf.template.json && \
+    cp magic-pdf.template.json /root/magic-pdf.json && \
+    source /opt/mineru_venv/bin/activate && \
+    pip3 install magic-pdf==0.7.0b1"
+
+# Download models and update the configuration file
+RUN /bin/bash -c "pip3 install modelscope && \
+    wget https://gitee.com/myhloli/MinerU/raw/master/docs/download_models.py && \
+    python3 download_models.py && \
+    sed -i 's|/tmp/models|/root/.cache/modelscope/hub/wanderkid/PDF-Extract-Kit/models|g' /root/magic-pdf.json && \
+    sed -i 's|cpu|cuda|g' /root/magic-pdf.json"
 
 # Set the entry point to activate the virtual environment and run the command line tool
 ENTRYPOINT ["/bin/bash", "-c", "source /opt/mineru_venv/bin/activate && exec \"$@\"", "--"]
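
The two `sed` calls in the last `RUN` step are plain global text substitutions on `/root/magic-pdf.json`. A minimal Python sketch of the same rewrite is shown below. The sample config is a hypothetical stand-in: the key names `models-dir` and `device-mode` are assumptions about the template, not taken from this diff.

```python
import json

# Hypothetical stand-in for magic-pdf.template.json; the real template's
# keys and values may differ.
template_text = json.dumps(
    {"models-dir": "/tmp/models", "device-mode": "cpu"}, indent=4
)

# Target path taken from the sed command in the Dockerfile.
MODELS_DIR = "/root/.cache/modelscope/hub/wanderkid/PDF-Extract-Kit/models"


def apply_docker_substitutions(text: str) -> str:
    """Mirror the two `sed -i 's|old|new|g'` calls as plain string replaces."""
    text = text.replace("/tmp/models", MODELS_DIR)
    # Like the sed pattern, this rewrites every occurrence of "cpu",
    # not only the value of a single key.
    text = text.replace("cpu", "cuda")
    return text


config = json.loads(apply_docker_substitutions(template_text))
print(config)
```

Because the second substitution is global, any other string in the file that happens to contain `cpu` would also be rewritten; a key-based edit with `json.load` and `json.dump` would be a narrower alternative.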

+ 8 - 0
README.md

@@ -227,6 +227,14 @@ If your device supports CUDA and meets the GPU requirements of the mainline envi
 
 - [Ubuntu 22.04 LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_en_US.md)
 - [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_en_US.md)
+- Quick Deployment with Docker
+    > Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
+  ```bash
+  wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
+  docker build -t mineru:0.7.0b1 .
+  docker run --rm -it --gpus=all mineru:0.7.0b1 /bin/bash
+  magic-pdf --help
+  ```
 
 ## Usage
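
The README note above states a 16GB VRAM requirement for the Docker image. A host could be checked before building with something like the following sketch. It assumes `nvidia-smi` is on the PATH and reads "16GB" as 16 GiB (16384 MiB); the function names are hypothetical helpers, not part of MinerU.

```python
import subprocess

MIN_VRAM_MIB = 16 * 1024  # "at least 16GB", read here as 16 GiB (an assumption)


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_meeting_requirement(vram_mib: list[int], minimum: int = MIN_VRAM_MIB) -> list[int]:
    """Return the indices of GPUs whose total memory reaches the minimum."""
    return [i for i, mib in enumerate(vram_mib) if mib >= minimum]


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vram_mib(out)


if __name__ == "__main__":
    try:
        ok = gpus_meeting_requirement(query_vram_mib())
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        ok = []  # no driver, no GPU, or unparseable output
    print("GPUs with enough VRAM:", ok)
```

Some cards sold as 16 GB report slightly less than 16384 MiB of total memory, so the threshold may need lowering on such hardware.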
 

+ 9 - 0
README_zh-CN.md

@@ -230,6 +230,15 @@ cp magic-pdf.template.json ~/magic-pdf.json
 
 - [Ubuntu22.04LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md)
 - [Windows10/11 + GPU](docs/README_Windows_CUDA_Acceleration_zh_CN.md)
+- 使用Docker快速部署
+    > Docker 需设备gpu显存大于等于16GB,默认开启所有加速功能
+  ```bash
+  wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
+  docker build -t mineru:0.7.0b1 .
+  docker run --rm -it --gpus=all mineru:0.7.0b1 /bin/bash
+  magic-pdf --help
+  ```
+    
 
 ## 使用
 

+ 18 - 0
requirements-docker.txt

@@ -0,0 +1,18 @@
+boto3>=1.28.43
+Brotli>=1.1.0
+click>=8.1.7
+PyMuPDF>=1.24.9
+loguru>=0.6.0
+numpy>=1.21.6,<2.0.0
+fast-langdetect==0.2.0
+wordninja>=2.0.0
+scikit-learn>=1.0.2
+pdfminer.six==20231228
+unimernet==0.1.6
+matplotlib
+ultralytics
+paddleocr==2.7.3
+paddlepaddle==3.0.0b1
+pypandoc
+struct-eqtable==0.1.0
+detectron2
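
The new requirements file mixes exact pins, version ranges, and unpinned packages, which affects how reproducible the image build is. The sketch below groups the entries by constraint type. It uses a deliberately simple regex rather than a full requirements parser, so it only covers the specifier forms that appear in this file.

```python
import re

# Contents of requirements-docker.txt as added in this commit.
REQUIREMENTS = """\
boto3>=1.28.43
Brotli>=1.1.0
click>=8.1.7
PyMuPDF>=1.24.9
loguru>=0.6.0
numpy>=1.21.6,<2.0.0
fast-langdetect==0.2.0
wordninja>=2.0.0
scikit-learn>=1.0.2
pdfminer.six==20231228
unimernet==0.1.6
matplotlib
ultralytics
paddleocr==2.7.3
paddlepaddle==3.0.0b1
pypandoc
struct-eqtable==0.1.0
detectron2
"""


def classify(line: str) -> tuple[str, str]:
    """Return (package name, 'exact' | 'range' | 'unpinned') for one entry."""
    match = re.match(r"^([A-Za-z0-9._-]+)\s*(.*)$", line.strip())
    name, spec = match.group(1), match.group(2)
    if not spec:
        return name, "unpinned"
    if spec.startswith("=="):
        return name, "exact"
    return name, "range"


groups: dict[str, list[str]] = {"exact": [], "range": [], "unpinned": []}
for entry in REQUIREMENTS.splitlines():
    if entry.strip():
        name, kind = classify(entry)
        groups[kind].append(name)

for kind, names in groups.items():
    print(f"{kind}: {', '.join(names)}")
```

The unpinned entries resolve to whatever version the configured package index serves at build time, so two builds of the same Dockerfile may end up with different versions of those packages.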