Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions docs/configuration/env_vars.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,15 +20,15 @@
- `VLLM_GRAPH_PROMPT_STRATEGY`: strategy determining order of prompt graph capture, `min_tokens` or `max_bs`. The default is `min_tokens`.
- `VLLM_GRAPH_DECODE_STRATEGY`: strategy determining order of decode graph capture, `min_tokens` or `max_bs`. The default is `max_bs`.
- `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `true`.
- `VLLM_{phase}_{dim}_BUCKET_{param}`: collection of 12 environment variables configuring ranges of bucketing mechanism (linear bucketing only).
- `VLLM_PROMPT_BS_BUCKET_MAX`: `(VLLM_PROMPT_BS_BUCKET_MAX * query) <=max_num_batched_tokens`- prefill batch size max. The default is`1`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please resolve conflicts and we can merge

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is query here? Is query the input seq length?
When default is 1, max_num_batched_tokens < query (chunked prefill) is allowed and often used mode.
Does this breaks chunked prefill (chunk being smaller than query) and all advantages of it?

- `VLLM_{phase}_{dim}_BUCKET_{param}`: collection of environment variables configuring ranges of bucketing mechanism (linear bucketing only).
- `{phase}` is either `PROMPT` or `DECODE`
- `{dim}` is either `BS`, `SEQ` or `BLOCK`
- `{dim}` is either `BS`, `SEQ`, `CTX` or `BLOCK`
- `{param}` is either `MIN`, `STEP` or `MAX`
- Default values:
- Prompt:
- batch size min (`VLLM_PROMPT_BS_BUCKET_MIN`): `1`
- batch size step (`VLLM_PROMPT_BS_BUCKET_STEP`): `min(max_num_seqs, 32)`
- batch size max (`VLLM_PROMPT_BS_BUCKET_MAX`): `min(max_num_seqs, 64)`
- sequence length min (`VLLM_PROMPT_SEQ_BUCKET_MIN`): `block_size`
- sequence length step (`VLLM_PROMPT_SEQ_BUCKET_STEP`): `block_size`
- sequence length max (`VLLM_PROMPT_SEQ_BUCKET_MAX`): `1024`
Expand Down
3 changes: 2 additions & 1 deletion vllm_gaudi/extension/bucketing/exponential.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,9 @@ def check_for_user_flags(self, phase):
params = ['min', 'step', 'max', 'limit']
env_vars = [f'VLLM_{phase}_{dim}_BUCKET_{p}'.upper() for dim in dim for p in params]
user_flags = []
overwritten_user_flags = ["VLLM_PROMPT_BS_BUCKET_MAX"]
for e in env_vars:
if getattr(get_config(), e) is not None:
if getattr(get_config(), e) is not None and e not in overwritten_user_flags:
user_flags.append(e)
if len(user_flags) > 0:
logger().warning("*******************************************************")
Expand Down