[Question]Why is the default value of max_prefill_tokens 16384? #1468

wjj19950828 · 2024-09-19T02:56:50Z


Why is the default value of max_prefill_tokens 16384? How is it set?

Is this parameter similar to max_num_batched_tokens in vLLM? For example, for a model with a 32k context length, does it need to be raised to 32k? If I raise it to 32k, could that cause an out-of-memory (OOM) error?

Thanks~

Answered by merrymercy

Sep 22, 2024


wjj19950828 · 2024-09-19T11:36:40Z

Author

@zhyncs Do you have any suggestions? Thanks~


merrymercy · 2024-09-22T09:28:15Z

Maintainer

You do not need to worry about this.
By default, sglang uses chunked prefill with a chunk size of 8k tokens:

sglang/python/sglang/srt/server_args.py

Lines 323 to 328 in 13f1357

    
    parser.add_argument(
        "--chunked-prefill-size",
        type=int,
        default=ServerArgs.chunked_prefill_size,
        help="The maximum number of tokens in a chunk for the chunked prefill. Setting this to -1 means disabling chunked prefill",
    )
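For readers unfamiliar with the pattern in the quoted snippet, here is a minimal, self-contained sketch of an argparse option that takes its default from a dataclass field. ToyServerArgs and build_parser are hypothetical names, and 8192 is an assumption standing in for the "8k" default mentioned above; this is an illustration, not sglang's actual code.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ToyServerArgs:
    # Assumed value for illustration: "8k" is taken to mean 8192 tokens.
    chunked_prefill_size: int = 8192


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser whose default comes from the dataclass field."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--chunked-prefill-size",
        type=int,
        # A dataclass field with a plain default is also a class attribute,
        # so it can be read without creating an instance.
        default=ToyServerArgs.chunked_prefill_size,
        help="Maximum number of tokens per prefill chunk; -1 disables chunked prefill",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.chunked_prefill_size)
```

Running the script with no flags prints the dataclass default, while passing --chunked-prefill-size -1 would select the "disabled" value described in the help text.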

If you have a 32k model, sglang will prefill the prompt chunk by chunk, so no additional settings are required to support it.
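To make "chunk by chunk" concrete, here is a rough sketch of how a long prompt could be split into token ranges no larger than the chunk size. split_into_chunks is a hypothetical helper, and 8192 is assumed for the "8k" chunk size; the real scheduler in sglang is more involved than this.

```python
from typing import List, Tuple


def split_into_chunks(prompt_len: int, chunk_size: int = 8192) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges covering a prompt of prompt_len tokens.

    chunk_size == -1 is treated as "chunking disabled", so the whole prompt
    is returned as a single range.
    """
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    if chunk_size == -1:
        return [(0, prompt_len)] if prompt_len else []
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive or -1")
    return [
        (start, min(start + chunk_size, prompt_len))
        for start in range(0, prompt_len, chunk_size)
    ]


if __name__ == "__main__":
    # A 32k-token prompt with 8k chunks is covered by 4 ranges.
    for start, end in split_into_chunks(32 * 1024):
        print(start, end)
```

Under these assumptions a 32k prompt is processed in four passes of at most 8192 tokens each, which is why the per-step token budget does not have to grow with the context length.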

max_prefill_tokens is a legacy setting for cases where chunked prefill is not enabled (for example, when --chunked-prefill-size is set to -1). In that case, you need to raise it to 32k for your 32k model.
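The rule of thumb in this answer can be written down as a small check. This is only a toy model of the advice given here, not sglang's scheduling logic: needs_larger_max_prefill_tokens is a hypothetical helper, and the 16384 default is taken from the question title.

```python
def needs_larger_max_prefill_tokens(
    prompt_len: int,
    chunked_prefill_size: int,
    max_prefill_tokens: int = 16384,
) -> bool:
    """Return True if max_prefill_tokens should be raised for this prompt.

    Toy rule from the discussion: with chunked prefill enabled, long prompts
    are processed in chunks and nothing needs to change. With chunked prefill
    disabled (-1), the prompt has to fit within max_prefill_tokens.
    """
    if chunked_prefill_size != -1:
        return False
    return prompt_len > max_prefill_tokens


if __name__ == "__main__":
    print(needs_larger_max_prefill_tokens(32 * 1024, chunked_prefill_size=8192))
    print(needs_larger_max_prefill_tokens(32 * 1024, chunked_prefill_size=-1))
```

Whether raising the limit to 32k fits in memory depends on the model and the GPU, so it is worth testing on the target hardware before relying on it.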
