Does SGLang automatically manage memory when a large batch (e.g. a batch size of 100 with an average of 6,000 tokens per sequence) is submitted for processing? I have been getting OOM errors when increasing my batch sizes, which is surprising because I thought the memory management was automatic.
Answered by hnyls2002 on Feb 12, 2024 · 1 reply
@pj-ml, could you try reducing the `--mem-fraction-static` argument? When the batch size increases, the backend may require more GPU memory for temporary usage. See the additional-arguments section of the README: https://github.com/sgl-project/sglang/?tab=readme-ov-file#additional-arguments
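As a minimal sketch, lowering the static memory fraction when launching the SGLang server might look like the following; the model path, port, and the 0.7 value are placeholders to tune for your own setup:

```bash
# Reserve a smaller static fraction of GPU memory for the KV cache/weights,
# leaving more headroom for per-batch temporary buffers at larger batch sizes.
# Placeholder values: adjust the model path, port, and fraction as needed,
# lowering the fraction further if OOM errors persist.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-2-7b-chat-hf \
  --port 30000 \
  --mem-fraction-static 0.7
```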
Answer selected by pj-ml