Update of VLLM_PROMPT_BS_BUCKET_MAX logic, real bs change, not only linear warmup #348
base: main
7a56062
6e44955
1b7a399
63fb078
d46a7a7
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,6 +20,7 @@ | |
| - `VLLM_GRAPH_PROMPT_STRATEGY`: strategy determining order of prompt graph capture, `min_tokens` or `max_bs`. The default is `min_tokens`. | ||
| - `VLLM_GRAPH_DECODE_STRATEGY`: strategy determining order of decode graph capture, `min_tokens` or `max_bs`. The default is `max_bs`. | ||
| - `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `true`. | ||
| - `VLLM_PROMPT_BS_BUCKET_MAX`: prefill batch size max. The default is `1`. | ||
| - `VLLM_{phase}_{dim}_BUCKET_{param}`: collection of 12 environment variables configuring ranges of bucketing mechanism (linear bucketing only). | ||
|
||
| - `{phase}` is either `PROMPT` or `DECODE` | ||
| - `{dim}` is either `BS`, `SEQ` or `BLOCK` | ||
|
||
|
|
@@ -28,7 +29,6 @@ | |
| - Prompt: | ||
| - batch size min (`VLLM_PROMPT_BS_BUCKET_MIN`): `1` | ||
| - batch size step (`VLLM_PROMPT_BS_BUCKET_STEP`): `min(max_num_seqs, 32)` | ||
| - batch size max (`VLLM_PROMPT_BS_BUCKET_MAX`): `min(max_num_seqs, 64)` | ||
| - sequence length min (`VLLM_PROMPT_SEQ_BUCKET_MIN`): `block_size` | ||
| - sequence length step (`VLLM_PROMPT_SEQ_BUCKET_STEP`): `block_size` | ||
| - sequence length max (`VLLM_PROMPT_SEQ_BUCKET_MAX`): `1024` | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -23,6 +23,8 @@ def check_for_user_flags(self, phase): | |
| if len(user_flags) > 0: | ||
| logger().warning("*******************************************************") | ||
| for flag in user_flags: | ||
| if flag in ("VLLM_PROMPT_BS_BUCKET_MAX",): | ||
| continue | ||
|
||
| logger().warning( | ||
| f"Using Exponential Strategy - Your configuration {flag}={getattr(get_config(), flag)} will be overwritten!" | ||
| ) | ||
|
|
||
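A note on the membership test used for the skip check in `check_for_user_flags`: in Python, parentheses alone do not create a tuple, so `flag in ("X")` is a substring test against the string `"X"`, whereas `flag in ("X",)` is a tuple membership test. The standalone sketch below illustrates the difference. The flag list is made up for illustration, and `BUCKET_MAX` is not a real vLLM variable.

```python
# Illustrative only: these names are sample inputs, not a list taken from vLLM.
flags = [
    "VLLM_PROMPT_BS_BUCKET_MAX",
    "VLLM_PROMPT_BS_BUCKET_MIN",
    "BUCKET_MAX",  # hypothetical name, used to show the substring effect
]

# One-element tuple (note the trailing comma): exact-match membership.
tuple_result = [f for f in flags if f in ("VLLM_PROMPT_BS_BUCKET_MAX",)]

# Parenthesised string (no comma): substring test against the string.
string_result = [f for f in flags if f in ("VLLM_PROMPT_BS_BUCKET_MAX")]

print(tuple_result)
print(string_result)
```

With the tuple form only the exact flag name matches; with the string form any name that happens to be a substring of `VLLM_PROMPT_BS_BUCKET_MAX` would also be skipped.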
Should we add a recommendation that `VLLM_PROMPT_BS_BUCKET_MAX` be no larger than `max_num_batched_tokens`, given the issue seen in #331?
Yes. @iboiko-habana
Thanks. Updated
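The recommendation discussed in this thread could also be enforced with a small startup check. The following is only a sketch under stated assumptions: `read_prompt_bs_bucket_max` is a hypothetical helper, not part of this PR or of vLLM, and it assumes the variable is read from the environment with the documented default of `1`.

```python
import os


def read_prompt_bs_bucket_max(max_num_batched_tokens, env=None):
    """Hypothetical helper: read VLLM_PROMPT_BS_BUCKET_MAX and validate it.

    Assumes the documented default of 1 and the review suggestion that the
    value should not exceed max_num_batched_tokens.
    """
    env = os.environ if env is None else env
    value = int(env.get("VLLM_PROMPT_BS_BUCKET_MAX", "1"))
    if value < 1:
        raise ValueError("VLLM_PROMPT_BS_BUCKET_MAX must be at least 1")
    if value > max_num_batched_tokens:
        raise ValueError(
            f"VLLM_PROMPT_BS_BUCKET_MAX={value} exceeds "
            f"max_num_batched_tokens={max_num_batched_tokens}"
        )
    return value


# Usage with an explicit mapping instead of the real environment.
print(read_prompt_bs_bucket_max(2048, env={"VLLM_PROMPT_BS_BUCKET_MAX": "8"}))
```

Raising an error is one possible design; logging a warning and clamping the value would be an alternative if a hard failure is too strict.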