myhloli 44fdeb663f Refactor async function and improve output directory handling in prediction (1 month ago)
README.md 3f32f2a587 Update README.md (5 months ago)
README_zh.md 3f32f2a587 Update README.md (5 months ago)
_config_endpoint.py 1671e68367 fix error logs for multi_gpu endpoint. (2 months ago)
client.py 44fdeb663f Refactor async function and improve output directory handling in prediction (1 month ago)
server.py 44fdeb663f Refactor async function and improve output directory handling in prediction (1 month ago)


MinerU v2.0 Multi-GPU Server

Simplified Chinese version (简体中文): README_zh.md

A streamlined multi-GPU server for MinerU, built on LitServe.

Quick Start

1. Install MinerU and the server dependencies

pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"
uv pip install litserve aiohttp loguru
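Before starting the server, a quick import check can confirm the installation. This is a minimal sketch that assumes the import names match the package names installed above (`mineru`, `litserve`, `aiohttp`, `loguru`); `check_dependencies` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Import names assumed to match the packages installed above
REQUIRED_MODULES = ("mineru", "litserve", "aiohttp", "loguru")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a mapping of module name -> True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:10s} {'ok' if ok else 'MISSING'}")
```

Any module reported as MISSING points to a failed install step.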

2. Start the Server

python server.py
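Loading models can take a while, so it helps to wait until the server answers before starting the client. A minimal sketch, assuming the server keeps LitServe's built-in `/health` route and listens on port 8000 (the port used in the Customize example); `wait_for_server` is a hypothetical helper that uses only the standard library.

```python
import http.client
import time
import urllib.request


def wait_for_server(url="http://localhost:8000/health", timeout=60.0, interval=1.0):
    """Poll `url` until it answers HTTP 200 or `timeout` seconds pass.

    Assumes the server exposes LitServe's default /health route.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # Not reachable yet (refused, timed out, or non-200); retry
        time.sleep(interval)
    return False
```

For example, a launcher script could call `wait_for_server()` after starting `server.py` and only run `client.py` once it returns True.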

3. Start the Client

python client.py

The client then processes the PDF files in the demo folder in parallel. For example, with 2 GPUs and workers_per_device set to 2, up to 4 PDF files are processed at the same time.
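The arithmetic behind that example: if each worker handles one file at a time, the number of files in flight is the GPU count multiplied by workers_per_device. A minimal sketch under that one-file-per-worker assumption; `max_parallel_files` is a hypothetical helper.

```python
def max_parallel_files(num_gpus: int, workers_per_device: int) -> int:
    """Upper bound on PDFs processed at once, assuming one file per worker."""
    if num_gpus < 1 or workers_per_device < 1:
        raise ValueError("num_gpus and workers_per_device must be >= 1")
    return num_gpus * workers_per_device


print(max_parallel_files(2, 1))  # 2 GPUs, one worker each → 2
print(max_parallel_files(2, 2))  # 2 GPUs, two workers each → 4
```

Raising workers_per_device likely means each worker holds its own copy of the models, so available GPU memory is the practical limit.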

Customize

Server

Example showing how to start the server with custom settings:

import litserve as ls
from server import MinerUAPI  # Defined in server.py; omit this import when editing server.py itself

server = ls.LitServer(
    MinerUAPI(output_dir='/tmp/mineru_output'),
    accelerator='auto',  # Or 'cuda' to require NVIDIA GPUs
    devices='auto',  # "auto" uses all available GPUs
    workers_per_device=1,  # One worker instance per GPU
    timeout=False  # Disable timeout for long processing
)
server.run(port=8000, generate_client_file=False)
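To change these settings without editing code, one option is to read them from environment variables. A minimal sketch: the variable names (`MINERU_OUTPUT_DIR`, `MINERU_ACCELERATOR`, `MINERU_DEVICES`, `MINERU_WORKERS_PER_DEVICE`, `MINERU_PORT`) and the helper `server_settings_from_env` are hypothetical and are not read by server.py today. Defaults mirror the example above.

```python
import os


def server_settings_from_env(env=os.environ):
    """Collect server settings from hypothetical MINERU_* environment variables."""
    devices = env.get("MINERU_DEVICES", "auto")
    return {
        "output_dir": env.get("MINERU_OUTPUT_DIR", "/tmp/mineru_output"),
        "accelerator": env.get("MINERU_ACCELERATOR", "auto"),
        # "auto" uses every visible GPU; otherwise expect a GPU count such as "2"
        "devices": devices if devices == "auto" else int(devices),
        "workers_per_device": int(env.get("MINERU_WORKERS_PER_DEVICE", "1")),
        "port": int(env.get("MINERU_PORT", "8000")),
    }
```

The resulting values would be passed to `MinerUAPI(output_dir=...)`, to the `accelerator`, `devices` and `workers_per_device` arguments of `ls.LitServer`, and to `server.run(port=...)`.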

Client

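The client supports both synchronous and asynchronous processing. The example below uses the asynchronous helper mineru_parse_async, which takes an aiohttp session, a file path, and optional parsing options: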

import asyncio
import aiohttp
from client import mineru_parse_async

async def process_documents():
    async with aiohttp.ClientSession() as session:
        # Basic usage
        result = await mineru_parse_async(session, 'document.pdf')
        
        # With custom options
        result = await mineru_parse_async(
            session, 
            'document.pdf',
            backend='pipeline',
            lang='ch',
            formula_enable=True,
            table_enable=True
        )

# Run async processing
asyncio.run(process_documents())
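For callers that are not already inside an event loop, an async helper can be wrapped in a plain function with `asyncio.run`. This is a minimal sketch, not the synchronous API of client.py: `parse_sync` is hypothetical, and it takes the async function (for example `mineru_parse_async`) as an argument, so it assumes only the `(session, path, **options)` call shape used above. The session factory defaults to `aiohttp.ClientSession` and can be swapped out for testing.

```python
import asyncio


def parse_sync(parse_async, path, session_factory=None, **options):
    """Run one async parse call to completion from synchronous code.

    parse_async is expected to accept (session, path, **options),
    like mineru_parse_async in the example above.
    """

    async def _run():
        factory = session_factory
        if factory is None:
            import aiohttp  # Imported lazily so the wrapper has no hard dependency

            factory = aiohttp.ClientSession
        async with factory() as session:
            return await parse_async(session, path, **options)

    # asyncio.run creates a fresh event loop and closes it afterwards,
    # so this must not be called from code that is already running in a loop
    return asyncio.run(_run())
```

Usage would look like `result = parse_sync(mineru_parse_async, 'document.pdf', backend='pipeline', lang='ch')`.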

Concurrent Processing

Process multiple files simultaneously:

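import asyncio
import aiohttp
from client import mineru_parse_async

async def process_multiple_files():
    files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf']

    async with aiohttp.ClientSession() as session:
        # Send all requests at once; the server spreads them across its workers
        tasks = [mineru_parse_async(session, file) for file in files]
        results = await asyncio.gather(*tasks)

    return results

results = asyncio.run(process_multiple_files())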
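Plain `asyncio.gather` submits every file at once, and the first exception it sees is raised without returning the other files' results. For large folders, it can help to cap how many requests are in flight (for example at GPUs × workers_per_device) and to collect failures per file. A minimal sketch using `asyncio.Semaphore` and `gather(return_exceptions=True)`; `process_with_limit` is a hypothetical helper, and `parse` stands for any coroutine function that takes a file path.

```python
import asyncio


async def process_with_limit(files, parse, limit):
    """Parse files concurrently with at most `limit` calls in flight.

    Returns results in the same order as `files`; a file that fails yields
    its exception object instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def _one(path):
        async with semaphore:  # Waits here while `limit` calls are already running
            return await parse(path)

    return await asyncio.gather(*(_one(f) for f in files), return_exceptions=True)
```

Inside the `async with aiohttp.ClientSession() as session:` block, this could be called as `await process_with_limit(files, lambda f: mineru_parse_async(session, f), limit=4)`, then each result checked with `isinstance(r, Exception)`.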