Some questions about TTFT and TPOT benchmarks #1229

sitabulaixizawaluduo · 2024-08-27T09:01:43Z

It's great to see an easy-to-use, high-performance LLM inference tool. I tested a Llama 8B model on an L40 GPU, but the results were somewhat confusing, and I hope someone can explain them.
I tested with 200 inputs, each of length 800, and forced the output length to 32 using ignore_eos = True. At the same request rate, sglang's P99 TTFT is 50% higher than lmdeploy's, but when I set the output length to 1, the two are basically the same, and sglang is even better. From the code, sglang uses a prefill-priority scheduling policy. After a decode step, it keeps decoding for the number of steps set by num_continue_decode_steps in global_config, and only then fetches new requests from the request pool and runs prefill on them. This strategy seems to have some impact on TTFT. What is the main basis for the value of this parameter, and how does it affect the results? Thanks.
I used lmdeploy 0.5.3 and sglang 0.2.9. Chunked prefill and prefix caching were not enabled in sglang.
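
To make the suspected effect concrete, here is a toy simulation of the loop described above. It is only a sketch under simplifying assumptions, not sglang's actual scheduler: `simulate_ttft` and all timings are hypothetical, prefills run one request at a time at a fixed cost, and every decode step costs the same regardless of batch size.

```python
from collections import deque


def simulate_ttft(arrivals_ms, prefill_ms, decode_ms, output_len,
                  num_continue_decode_steps):
    """Toy prefill-priority scheduler; returns TTFT per request in arrival order.

    Hypothetical model: after admitting all arrived requests, the scheduler
    runs up to `num_continue_decode_steps` decode steps before it looks at
    the waiting queue again.
    """
    if num_continue_decode_steps < 1:
        raise ValueError("num_continue_decode_steps must be >= 1")
    waiting = deque(sorted(arrivals_ms))
    running = []  # remaining decode tokens for each running request
    clock = 0.0
    ttfts = []
    while waiting or running:
        # Nothing to decode and nothing has arrived yet: idle until next arrival.
        if not running and waiting[0] > clock:
            clock = waiting[0]
        # Prefill priority: admit every request that has already arrived.
        while waiting and waiting[0] <= clock:
            arrival = waiting.popleft()
            clock += prefill_ms  # prefill emits the first token
            ttfts.append(clock - arrival)
            if output_len > 1:
                running.append(output_len - 1)
        # Keep decoding for up to N steps before checking the queue again.
        for _ in range(num_continue_decode_steps):
            if not running:
                break
            clock += decode_ms
            running = [r - 1 for r in running if r > 1]
    return ttfts


# Second request arrives just after the first prefill finishes.
print(simulate_ttft([0, 110], 100, 20, 32, 1))   # [100.0, 110.0]
print(simulate_ttft([0, 110], 100, 20, 32, 10))  # [100.0, 290.0]
print(simulate_ttft([0, 110], 100, 20, 1, 10))   # [100.0, 100.0]
```

In this toy model, a larger step count delays the second request's prefill only when there is decode work to do, which is consistent with the gap disappearing when the output length is 1.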

Answered by merrymercy

merrymercy · 2024-09-22T09:53:08Z

Maintainer

When streaming is enabled, sglang does not use num_continue_decode_steps, as this line shows:

sglang/python/sglang/srt/managers/tp_worker.py

Line 282 in 13f1357

if self.out_pyobjs and self.running_batch.has_stream:

These configs are set with simple heuristics, so feel free to tune these arguments for your own workloads.
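
One way to read the quoted condition, as a toy sketch rather than the actual tp_worker.py code: the decode loop exits early once there are outputs waiting to be sent and the running batch contains a streaming request. The function name and its boolean parameters are hypothetical stand-ins for `self.out_pyobjs` and `self.running_batch.has_stream`, treated here as constant for the whole loop.

```python
def decode_steps_before_next_prefill_check(num_continue_decode_steps,
                                           has_pending_outputs,
                                           batch_has_stream):
    """Count decode steps taken before the scheduler looks for new requests.

    Hypothetical model of the early exit: the loop body stands in for one
    decode forward pass, and the flags do not change between steps.
    """
    steps = 0
    for _ in range(num_continue_decode_steps):
        steps += 1  # one decode forward pass
        # Mirrors: if self.out_pyobjs and self.running_batch.has_stream: break
        if has_pending_outputs and batch_has_stream:
            break
    return steps


print(decode_steps_before_next_prefill_check(10, True, True))   # 1
print(decode_steps_before_next_prefill_check(10, True, False))  # 10
```

Under this reading, a benchmark that streams its responses returns to the waiting queue after every decode step, so num_continue_decode_steps would only matter for non-streaming requests.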
