Mistral Nemo -> CUDA out of memory on 2 x 80GB H100 #1437
-
Hello everybody, I am unable to start SGLang with Mistral Nemo. Details below.

```shell
python3 -m sglang.check_env
gpustat
python3 -m sglang.launch_server --model-path mistralai/Mistral-Nemo-Instruct-2407 --tp-size 2 --enable-p2p-check
```
Replies: 4 comments
-
You can set a smaller value for `--mem-fraction-static`, e.g. `--mem-fraction-static 0.8`.
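As a rough illustration of what `--mem-fraction-static` controls (the fraction of each GPU's memory that SGLang budgets for model weights plus the KV-cache pool), here is a back-of-the-envelope sketch. The numbers are hypothetical examples, not values quoted in this thread:

```python
# Hypothetical sketch: how --mem-fraction-static bounds the memory SGLang
# may plan against on each GPU. Not SGLang's actual internal logic.
GPU_MEMORY_GB = 80  # one H100, as in this thread


def static_budget_gb(mem_fraction_static: float,
                     gpu_memory_gb: float = GPU_MEMORY_GB) -> float:
    """Memory (GB) available for weights + KV-cache pool at this fraction."""
    return mem_fraction_static * gpu_memory_gb


# Lowering the fraction shrinks the planned memory pool, which can avoid
# an OOM caused by an over-estimated context length.
print(static_budget_gb(0.9))  # example larger fraction -> 72.0 GB
print(static_budget_gb(0.8))  # suggested smaller value -> 64.0 GB
```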
-
Thank you! Any idea on how to get SGLang to use the 2 available GPUs instead of just one? This way, I wouldn't need to use `--mem-fraction-static`.
-
It turns out that there is something wrong with the model config of mistralai/Mistral-Nemo-Instruct-2407. SGLang reads the model's context length from `max_position_embeddings` in the model config. However, in this model it is set to 1024k, while the model was only trained with 128k: https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/blob/main/config.json#L13
This makes SGLang incorrectly estimate the memory usage of the memory pool, so we need to correct it with either `--mem-fraction-static` or `--context-length`. For example, if your use case only needs a 32k context length, you can pass `--context-length 32768`.
You can also try the model's full 128k context length. This also works.
In all cases (32k, 128k, or `--mem-fraction-static 0.8`), SGLang will use 2 GPUs because you set `--tp-size 2`.
-
Wow, thanks a lot! Just for "history":