@@ -737,9 +737,9 @@ mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1
> Below are some suggestions and notes for using the sglang acceleration mode:
> - The sglang acceleration mode currently supports operation on Turing architecture GPUs with a minimum of 8GB VRAM, but you may encounter VRAM shortages on GPUs with less than 24GB VRAM. You can optimize VRAM usage with the following parameters:
> - If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by setting `--mem-fraction-static 0.5`. If VRAM issues persist, try lowering it further to `0.4` or below.
-> - If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode: `--tp 2`
+> - If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode: `--tp-size 2`
> - If you are already successfully using sglang to accelerate VLM inference but wish to further improve inference speed, consider the following parameters:
-> - If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode: `--dp 2`
+> - If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode: `--dp-size 2`
> - You can also enable `torch.compile` to accelerate inference speed by about 15%: `--enable-torch-compile`
> - For more information on using sglang parameters, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
> - All sglang-supported parameters can be passed to MinerU via command-line arguments, including those used with the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
@@ -761,7 +761,7 @@ mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1
> - Below are some possible use cases:
> - If you have multiple GPUs and need to specify GPU 0 and GPU 1 to launch 'sglang-server' in multi-GPU mode, you can use the following command:
> ```bash
-> CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp 2
+> CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
> ```
> - If you have multiple GPUs and need to launch two `fastapi` services on GPU 0 and GPU 1 respectively, listening on different ports, you can use the following commands:
> ```bash