Unable to Load Gemma2 Model with SGLANG #1869

hahmad2008 · 2024-11-01T10:45:04Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

I am unable to load the Gemma2-9B model on a 24GB GPU using SGLang. I have tried various values for mem_fraction_static: raising it consistently leads to out-of-memory errors, while lowering it produces a negative value for self.max_total_num_tokens.

In contrast, I can load the same model on the same 24GB GPU using vLLM with a GPU memory utilization of 0.95. Any guidance on resolving this would be appreciated.
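To illustrate why both failure modes can show up, here is a rough sketch of the memory arithmetic as I understand it. It is not SGLang's actual code. It assumes the KV cache pool is sized from the memory left after loading the weights, minus a reserve of total * (1 - mem_fraction_static). The function name, the 5 GB "free after weights" figure, and the per-token KV size (derived from the public Gemma2-9B config: 42 layers, 8 KV heads, head dim 256, bf16) are all assumptions for illustration.

```python
def estimate_max_total_num_tokens(
    total_gb: float,
    free_after_weights_gb: float,
    mem_fraction_static: float,
    kv_bytes_per_token: int,
) -> int:
    """Hypothetical estimate of the KV cache capacity in tokens.

    Assumption: memory outside the static fraction is held back for
    activations and other runtime buffers, and only what remains after
    that reserve can be used for the KV cache.
    """
    reserved_gb = total_gb * (1.0 - mem_fraction_static)
    rest_gb = free_after_weights_gb - reserved_gb
    # A negative result means the reserve alone exceeds the free memory.
    return int(rest_gb * (1 << 30) // kv_bytes_per_token)


# Assumed per-token KV size for Gemma2-9B in bf16:
# 2 (K and V) * 42 layers * 8 KV heads * 256 head dim * 2 bytes.
KV_BYTES_PER_TOKEN = 2 * 42 * 8 * 256 * 2

# Assumed: about 5 GB is still free on a 24 GB card once weights are loaded.
low = estimate_max_total_num_tokens(24.0, 5.0, 0.70, KV_BYTES_PER_TOKEN)
high = estimate_max_total_num_tokens(24.0, 5.0, 0.90, KV_BYTES_PER_TOKEN)
print("mem_fraction_static=0.70 ->", low)
print("mem_fraction_static=0.90 ->", high)
```

Under these assumptions a low fraction makes the token count negative, while a high fraction leaves only a small reserve for activations, which would be consistent with the out-of-memory errors at the other end.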

Reproduction

Environment

sglang: 0.3.3.post1

hahmad2008 · 2024-11-04T19:09:14Z

@yileld sorry for tagging. Any idea?