Update of VLLM_PROMPT_BS_BUCKET_MAX logic, real bs change, not only linear warmup #348
…inear warmup Signed-off-by: Iryna Boiko <[email protected]>
| logger().warning("*******************************************************") | ||
| for flag in user_flags: | ||
| if flag in ("VLLM_PROMPT_BS_BUCKET_MAX"): | ||
| continue |
If I set only this flag, wouldn't I get the rows of stars (`****`) with no warning between them?
Thanks. Updated
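One way to address the comment above is to filter out the exempt flag before printing anything, so the starred banner appears only when at least one warning follows. This is a minimal sketch, not the code that was merged: `SILENT_FLAGS`, `flags_to_warn_about`, `warn_about_flags`, the logger name, and the message text are all hypothetical. Note also that `("VLLM_PROMPT_BS_BUCKET_MAX")` in the quoted hunk is a parenthesized string rather than a tuple, so `in` performs a substring test; the sketch uses a one-element tuple instead.

```python
import logging

logger = logging.getLogger("bucketing_flags")  # hypothetical logger name

# Hypothetical: flags that should not trigger the warning banner.
# The trailing comma makes this a one-element tuple, not a plain string.
SILENT_FLAGS = ("VLLM_PROMPT_BS_BUCKET_MAX",)


def flags_to_warn_about(user_flags):
    """Return the user-set flags that still need a warning."""
    return [flag for flag in user_flags if flag not in SILENT_FLAGS]


def warn_about_flags(user_flags):
    """Log the starred banner only when at least one flag needs a warning."""
    flagged = flags_to_warn_about(user_flags)
    if not flagged:
        return flagged  # nothing to report, so no banner either
    banner = "*" * 55
    logger.warning(banner)
    for flag in flagged:
        logger.warning("%s is set by the user", flag)  # placeholder message text
    logger.warning(banner)
    return flagged
```

With only `VLLM_PROMPT_BS_BUCKET_MAX` set, `warn_about_flags` returns an empty list and logs nothing.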
docs/configuration/env_vars.md
Outdated
| - `VLLM_GRAPH_DECODE_STRATEGY`: strategy determining order of decode graph capture, `min_tokens` or `max_bs`. The default is `max_bs`. | ||
| - `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `true`. | ||
| - `VLLM_PROMPT_BS_BUCKET_MAX`: prefill batch size max. The default is `1`. | ||
| - `VLLM_{phase}_{dim}_BUCKET_{param}`: collection of 12 environment variables configuring ranges of bucketing mechanism (linear bucketing only). |
We no longer have 12 environment variables here (there were already more before your change). Let's either drop the number or correct it.
Thanks. Updated
docs/configuration/env_vars.md
Outdated
| - `VLLM_PROMPT_BS_BUCKET_MAX`: prefill batch size max. The default is `1`. | ||
| - `VLLM_{phase}_{dim}_BUCKET_{param}`: collection of 12 environment variables configuring ranges of bucketing mechanism (linear bucketing only). | ||
| - `{phase}` is either `PROMPT` or `DECODE` | ||
| - `{dim}` is either `BS`, `SEQ` or `BLOCK` |
According to the environment variables we list here, `{dim}` can also be `CTX`.
Thanks. Updated
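To see why a fixed count such as 12 goes stale, the `VLLM_{phase}_{dim}_BUCKET_{param}` template can be expanded mechanically. The sketch below is illustrative only: the `{phase}` and `{dim}` values come from the quoted diff plus the `CTX` addition suggested above, while the `{param}` values (`MIN`, `STEP`, `MAX`) are an assumption, since the quoted hunk does not list them. Not every generated name is necessarily a variable the plugin reads.

```python
from itertools import product

PHASES = ("PROMPT", "DECODE")
DIMS = ("BS", "SEQ", "BLOCK", "CTX")  # CTX added per the review comment
PARAMS = ("MIN", "STEP", "MAX")       # assumption: not listed in the quoted diff


def candidate_bucket_vars():
    """Expand the name template into every phase/dim/param combination."""
    return [
        f"VLLM_{phase}_{dim}_BUCKET_{param}"
        for phase, dim, param in product(PHASES, DIMS, PARAMS)
    ]


names = candidate_bucket_vars()
print(len(names))  # 24 combinations under these assumptions
```

Under these assumptions the template yields 24 candidate names, which is one reason the documentation is safer without a hard-coded count.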
docs/configuration/env_vars.md
Outdated
| - `VLLM_GRAPH_PROMPT_STRATEGY`: strategy determining order of prompt graph capture, `min_tokens` or `max_bs`. The default is `min_tokens`. | ||
| - `VLLM_GRAPH_DECODE_STRATEGY`: strategy determining order of decode graph capture, `min_tokens` or `max_bs`. The default is `max_bs`. | ||
| - `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `true`. | ||
| - `VLLM_PROMPT_BS_BUCKET_MAX`: prefill batch size max. The default is `1`. |
Should we add a recommendation that `VLLM_PROMPT_BS_BUCKET_MAX` be no larger than `max_num_batched_tokens`, given the issue seen in #331?
Yes > @iboiko-habana
Thanks. Updated
Signed-off-by: Iryna Boiko <[email protected]>
| - `VLLM_GRAPH_DECODE_STRATEGY`: strategy determining order of decode graph capture, `min_tokens` or `max_bs`. The default is `max_bs`. | ||
| - `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `true`. | ||
| - `VLLM_{phase}_{dim}_BUCKET_{param}`: collection of 12 environment variables configuring ranges of bucketing mechanism (linear bucketing only). | ||
| - `VLLM_PROMPT_BS_BUCKET_MAX`: `(VLLM_PROMPT_BS_BUCKET_MAX * query) <=max_num_batched_tokens`- prefill batch size max. The default is`1`. |
Please resolve the conflicts, and then we can merge.
| - `VLLM_GRAPH_DECODE_STRATEGY`: strategy determining order of decode graph capture, `min_tokens` or `max_bs`. The default is `max_bs`. | ||
| - `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `true`. | ||
| - `VLLM_{phase}_{dim}_BUCKET_{param}`: collection of 12 environment variables configuring ranges of bucketing mechanism (linear bucketing only). | ||
| - `VLLM_PROMPT_BS_BUCKET_MAX`: `(VLLM_PROMPT_BS_BUCKET_MAX * query) <=max_num_batched_tokens`- prefill batch size max. The default is`1`. |
What is `query` here? Is it the input sequence length?
With the default of 1, `max_num_batched_tokens < query` (chunked prefill) is allowed and is a commonly used mode.
Does this break chunked prefill (where the chunk is smaller than the query) and all of its advantages?
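The concern can be made concrete by evaluating the documented inequality directly. This is a minimal sketch that assumes `query` means the prompt length in tokens, which is exactly what the question above asks and is not confirmed in this thread; `fits_token_budget` and `largest_prompt_bs` are hypothetical helpers, not part of the plugin.

```python
def fits_token_budget(prompt_bs_bucket_max, query_len, max_num_batched_tokens):
    """Check (VLLM_PROMPT_BS_BUCKET_MAX * query) <= max_num_batched_tokens.

    Assumes `query_len` is the prompt length in tokens (unconfirmed).
    """
    return prompt_bs_bucket_max * query_len <= max_num_batched_tokens


def largest_prompt_bs(query_len, max_num_batched_tokens):
    """Largest batch size satisfying the inequality for a given prompt length."""
    return max_num_batched_tokens // query_len


# A batch of 4 prompts of 1024 tokens fits an 8192-token budget.
print(fits_token_budget(4, 1024, 8192))   # True

# Chunked prefill: one 4096-token prompt with a 2048-token budget.
# Read literally, the inequality rejects this even at the default of 1.
print(fits_token_budget(1, 4096, 2048))   # False
print(largest_prompt_bs(4096, 2048))      # 0
```

If the inequality is meant as a hard requirement, the second case suggests the documentation should state whether it applies when chunked prefill is enabled.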