
[Question] Why is the default value of max_prefill_tokens 16384? #1468

Closed. Answered by merrymercy
wjj19950828 asked this question in Q&A

You do not need to worry about this.
By default, sglang uses chunked prefill with a chunk size of 8k tokens, controlled by this server argument:

```python
# From sglang's server argument definitions; `parser` is an argparse.ArgumentParser
parser.add_argument(
    "--chunked-prefill-size",
    type=int,
    default=ServerArgs.chunked_prefill_size,
    help="The maximum number of tokens in a chunk for the chunked prefill. Setting this to -1 means disabling chunked prefill",
)
```
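To see how this flag behaves on its own, here is a minimal standalone sketch that mirrors the argument above with plain `argparse`. `DEFAULT_CHUNKED_PREFILL_SIZE` is a hypothetical stand-in for `ServerArgs.chunked_prefill_size`, and its value of 8192 assumes that "8k" means 8192 tokens; check your installed sglang version for the actual default.

```python
import argparse

# Hypothetical stand-in for ServerArgs.chunked_prefill_size (assumes "8k" = 8192 tokens).
DEFAULT_CHUNKED_PREFILL_SIZE = 8192

parser = argparse.ArgumentParser()
parser.add_argument(
    "--chunked-prefill-size",
    type=int,
    default=DEFAULT_CHUNKED_PREFILL_SIZE,
    help="The maximum number of tokens in a chunk for the chunked prefill. "
    "Setting this to -1 means disabling chunked prefill",
)

# No flag given: the default chunk size is used.
default_args = parser.parse_args([])
# argparse accepts "-1" as a value here because no option looks like a negative number.
disabled_args = parser.parse_args(["--chunked-prefill-size", "-1"])

print(default_args.chunked_prefill_size)
print(disabled_args.chunked_prefill_size)
```

Passing `-1` is how the help text says chunked prefill is turned off; any positive value sets the per-chunk token limit.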

If you have a model with a 32k context length, sglang will prefill long prompts chunk by chunk, so no additional settings are required to support it.
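As a rough illustration of "chunk by chunk", the hypothetical helper `split_into_chunks` below splits a prompt's token count into prefill chunks of at most `chunked_prefill_size` tokens, treating -1 as "disabled". It is a simplified model under those assumptions, not sglang's scheduler code, and it ignores how requests are batched together.

```python
def split_into_chunks(num_prompt_tokens: int, chunked_prefill_size: int) -> list:
    """Return the token count of each prefill chunk (hypothetical helper)."""
    if chunked_prefill_size == -1:
        # -1 disables chunked prefill: the whole prompt is prefilled in one step.
        return [num_prompt_tokens]
    if chunked_prefill_size <= 0:
        raise ValueError("chunked_prefill_size must be positive or -1")
    chunks = []
    remaining = num_prompt_tokens
    while remaining > 0:
        # Each chunk takes at most chunked_prefill_size tokens.
        step = min(chunked_prefill_size, remaining)
        chunks.append(step)
        remaining -= step
    return chunks


print(split_into_chunks(32768, 8192))  # [8192, 8192, 8192, 8192]
print(split_into_chunks(20000, 8192))  # [8192, 8192, 3616]
print(split_into_chunks(32768, -1))    # [32768]
```

With an 8k chunk size, a full 32k prompt becomes four prefill steps, none of which exceeds the chunk limit, which is why the context length itself needs no extra configuration.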

`max_prefill_tokens` is a legacy setting for cases where chunked prefill is disabled. In that case, you need to raise it to 32k for your 32k model.
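Without chunking, the whole prompt has to fit in a single prefill, so the limit must cover the full prompt length. The hypothetical check `fits_without_chunking` below models only that rule (it is not sglang's actual admission logic), using the 16384 default from the question title and a 32k (32768-token) prompt.

```python
def fits_without_chunking(num_prompt_tokens: int, max_prefill_tokens: int) -> bool:
    """Hypothetical check: with chunked prefill disabled, the prompt must fit in one prefill."""
    return num_prompt_tokens <= max_prefill_tokens


# The 16384 default is too small for a full 32k prompt...
print(fits_without_chunking(32768, 16384))  # False
# ...but raising the limit to 32k makes it fit.
print(fits_without_chunking(32768, 32768))  # True
```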

Replies: 2 comments

Answer selected by merrymercy
Category: Q&A
Labels: None yet
2 participants

This discussion was converted from issue #1460 on September 19, 2024 08:28.