Can A100 run this model?

#4
by traphix - opened

Can A100 run this model?

Yes, I have done

Yes, I have done

Can you share the deployment command?

docker run -d --gpus all
--name vllm-minimax-m3
--restart unless-stopped
--privileged --ipc=host -p 8088:8088
-v /home/models/minimax3:/models/minimax3
docker.m.daocloud.io/vllm/vllm-openai:minimax-m3
/models/minimax3
--host 0.0.0.0
--port 8088
--served-model-name minimax-m3
--block-size 128
--max-model-len 131072
--gpu-memory-utilization 0.975
--tensor-parallel-size 8
--tool-call-parser minimax_m3
--enable-auto-tool-choice
--reasoning-parser minimax_m3

I am using 8 A100 80GB GPUs, but the context length can only reach 128K; anything beyond that causes out-of-memory (OOM) errors.

Sign up or log in to comment