[Question]Why is the default value of max_prefill_tokens 16384? #1468
Why is the default value of max_prefill_tokens 16384? How is it set? Is this parameter similar to Thanks~
Replies: 2 comments
@zhyncs Do you have any suggestions? Thanks~
You do not need to worry about this.
By default, sglang uses chunked prefill with a chunk size of 8k (see sglang/python/sglang/srt/server_args.py, lines 323 to 328 in 13f1357).
If you have a 32k model, it will process the prompt chunk by chunk, so no additional settings are required to support it.
max_prefill_tokens is a legacy config for the cases where chunked prefill is not enabled. In that case, you need to update it to 32k for your 32k model.
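
To illustrate the "chunk by chunk" behaviour, here is a minimal sketch of how a long prompt could be split into fixed-size prefill chunks. It is not sglang's actual scheduler code: the helper name split_into_chunks is hypothetical, and it assumes "8k" means 8192 tokens and "32k" means 32768 tokens.

```python
def split_into_chunks(num_prompt_tokens: int, chunk_size: int = 8192) -> list[tuple[int, int]]:
    """Return (start, end) token ranges, each covering at most chunk_size tokens.

    Hypothetical helper for illustration only; chunk_size=8192 is an assumed
    reading of the "8k" default mentioned above.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, num_prompt_tokens))
        for start in range(0, num_prompt_tokens, chunk_size)
    ]


# A full 32k prompt (assumed to be 32768 tokens) is prefilled in several passes.
chunks = split_into_chunks(32 * 1024)
for start, end in chunks:
    print(f"prefill tokens [{start}, {end})")
```

Under these assumptions each pass stays within the chunk size, which is why the prompt length does not have to fit in a single prefill budget when chunked prefill is enabled.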