Some questions about TTFT and TPOT benchmarks #1229
-
It's great to see an easy-to-use, high-performance LLM inference tool. I tested a Llama 8B model on an L40 GPU, but the results were somewhat confusing, and I hope someone can explain them.
-
When streaming is enabled, sglang won't use `num_continue_decode_steps`, according to this line: `sglang/python/sglang/srt/managers/tp_worker.py`, line 282 (commit 13f1357).
These configs are set with simple heuristics, so you can tune these arguments for your own workloads.
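Because TTFT and TPOT are usually measured from a streaming response, it can help to be explicit about how they are derived. The sketch below uses the common definitions: TTFT is the time from sending the request to receiving the first output token, and TPOT is the average gap between subsequent tokens. It is not sglang's benchmark code. The function name `ttft_and_tpot` is hypothetical, and it assumes each streamed chunk carries exactly one token, so per-chunk arrival times equal per-token arrival times.

```python
from typing import Sequence, Tuple


def ttft_and_tpot(request_start: float, token_times: Sequence[float]) -> Tuple[float, float]:
    """Return (TTFT, TPOT) in seconds.

    request_start: wall-clock time the request was sent.
    token_times: arrival time of each streamed output token, in order
                 (assumes one token per streamed chunk).
    """
    if not token_times:
        raise ValueError("need at least one output token")
    # Time to first token: latency until the first streamed token arrives.
    ttft = token_times[0] - request_start
    if len(token_times) == 1:
        # Only one token was produced, so there are no inter-token gaps to average.
        tpot = 0.0
    else:
        # Time per output token: mean gap over the tokens after the first one.
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot
```

For example, a request sent at t=0.0 whose tokens arrive at 0.25, 0.30, 0.35 and 0.40 s gives a TTFT of about 0.25 s and a TPOT of about 0.05 s. If the server groups several tokens into one streamed chunk, per-chunk timestamps would understate or distort TPOT, which is one reason streaming and decode-step settings matter when interpreting these numbers.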