
MinerU v2.0 Multi-GPU Server

Simplified Chinese: README_zh.md

A streamlined multi-GPU server implementation.

Quick Start

1. Install MinerU

pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"
uv pip install litserve aiohttp loguru
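After the install commands above, a quick check can confirm that the packages are importable before starting the server. This is a minimal sketch: the helper name `missing_modules` is hypothetical, and the top-level module names are assumed to match the package names.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules expected after the install step (assumed module names).
required = ["mineru", "litserve", "aiohttp", "loguru"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```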

2. Start the Server

python server.py

3. Start the Client

python client.py

The PDF files in the demo folder are now processed in parallel. The number of files processed at once is the number of GPUs multiplied by workers_per_device: with 2 GPUs and workers_per_device set to 2, 4 PDF files are processed at the same time.
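The arithmetic in the paragraph above can be written as a small helper. The function name `max_parallel_files` is hypothetical and only illustrates the calculation. Each worker is a separate instance, so GPU memory use presumably grows with workers_per_device as well.

```python
def max_parallel_files(num_gpus: int, workers_per_device: int) -> int:
    """Upper bound on files processed at once: one file per worker."""
    if num_gpus < 1 or workers_per_device < 1:
        raise ValueError("num_gpus and workers_per_device must be at least 1")
    return num_gpus * workers_per_device


# 2 GPUs with 2 workers each
print(max_parallel_files(2, 2))  # 4
```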

Customize

Server

Example showing how to start the server with custom settings:

import litserve as ls
from server import MinerUAPI  # Assumes MinerUAPI is defined in server.py

server = ls.LitServer(
    MinerUAPI(output_dir='/tmp/mineru_output'),
    accelerator='auto',  # Or set explicitly, e.g. 'cuda'
    devices='auto',  # 'auto' uses all available GPUs
    workers_per_device=1,  # One worker instance per GPU
    timeout=False,  # Disable the request timeout for long-running jobs
)
server.run(port=8000, generate_client_file=False)
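If the settings change between runs, one option is to read them from the command line instead of editing the file. The sketch below is not part of server.py. The flag names and the helpers `devices_arg` and `parse_server_args` are hypothetical, and it assumes that `devices` accepts either 'auto' or an integer count.

```python
import argparse


def devices_arg(value):
    """Keep 'auto' as a string; treat anything else as a device count."""
    return value if value == "auto" else int(value)


def parse_server_args(argv=None):
    parser = argparse.ArgumentParser(description="Launch options for the multi-GPU server")
    parser.add_argument("--output-dir", default="/tmp/mineru_output")
    parser.add_argument("--accelerator", default="auto")
    parser.add_argument("--devices", type=devices_arg, default="auto")
    parser.add_argument("--workers-per-device", type=int, default=1)
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args(argv)
    if args.workers_per_device < 1:
        parser.error("--workers-per-device must be at least 1")
    return args


args = parse_server_args(["--workers-per-device", "2", "--port", "8001"])
print(args)

# The parsed values would then be passed on to the server, for example:
# server = ls.LitServer(
#     MinerUAPI(output_dir=args.output_dir),
#     accelerator=args.accelerator,
#     devices=args.devices,
#     workers_per_device=args.workers_per_device,
#     timeout=False,
# )
# server.run(port=args.port, generate_client_file=False)
```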

Client

The client supports both synchronous and asynchronous processing. The asynchronous interface is used as follows:

import asyncio
import aiohttp
from client import mineru_parse_async

async def process_documents():
    # Reuse one HTTP session for all requests
    async with aiohttp.ClientSession() as session:
        # Basic usage with default options
        basic_result = await mineru_parse_async(session, 'document.pdf')

        # With custom options
        custom_result = await mineru_parse_async(
            session,
            'document.pdf',
            backend='pipeline',
            lang='ch',
            formula_enable=True,
            table_enable=True,
        )
    return basic_result, custom_result

# Run async processing
basic_result, custom_result = asyncio.run(process_documents())
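When the calling code is not asynchronous, the async function can be wrapped so that it blocks until the result is ready. The following is a minimal sketch, not a function from client.py. The name `parse_sync` is hypothetical, and the parse function and session factory are passed in as arguments so the wrapper itself depends only on the standard library.

```python
import asyncio
from typing import Any, Awaitable, Callable


def parse_sync(
    parse_async: Callable[..., Awaitable[Any]],
    session_factory: Callable[[], Any],
    file_path: str,
    **options: Any,
) -> Any:
    """Run one asynchronous parse call to completion and return its result."""

    async def _run() -> Any:
        # The session is created inside the running event loop and closed on exit.
        async with session_factory() as session:
            return await parse_async(session, file_path, **options)

    return asyncio.run(_run())


# Intended usage (requires aiohttp, client.py, and a running server):
# import aiohttp
# from client import mineru_parse_async
# result = parse_sync(mineru_parse_async, aiohttp.ClientSession,
#                     'document.pdf', backend='pipeline')
```

Note that `asyncio.run` cannot be called from inside an already running event loop (for example in a notebook), so this wrapper only suits plain synchronous scripts.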

Concurrent Processing

Process multiple files simultaneously:

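import asyncio
import aiohttp
from client import mineru_parse_async

async def process_multiple_files():
    files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf']

    # Share one session and send all requests concurrently
    async with aiohttp.ClientSession() as session:
        tasks = [mineru_parse_async(session, file) for file in files]
        results = await asyncio.gather(*tasks)

    return results

results = asyncio.run(process_multiple_files())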
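With many files, sending every request at once may queue far more work than the server has workers for. One possible refinement is to cap the number of in-flight requests with a semaphore. The helper `gather_bounded` below is a hypothetical sketch, not part of client.py. It takes any coroutine function as `worker`, and it returns exceptions as values so that one failed file does not discard the other results.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def gather_bounded(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    limit: int,
) -> List[Any]:
    """Run worker(item) for every item with at most `limit` calls in flight."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: Any) -> Any:
        async with semaphore:
            return await worker(item)

    # Results keep the order of `items`; failures appear as exception objects.
    return await asyncio.gather(
        *(run_one(item) for item in items), return_exceptions=True
    )


# Intended usage inside an async function (requires aiohttp and client.py):
# async with aiohttp.ClientSession() as session:
#     results = await gather_bounded(
#         files, lambda f: mineru_parse_async(session, f), limit=4
#     )
```

A reasonable starting value for `limit` is the number of GPUs multiplied by workers_per_device, since that is how many files the server processes at once.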