
Commit c2ed069

[BugFix] Fix mixed penalties batch with async scheduling (vllm-project#27910)
Signed-off-by: Nick Hill <[email protected]>
1 parent af6e19f commit c2ed069

File tree

1 file changed: +8 −0 lines changed


vllm/v1/sample/ops/penalties.py

Lines changed: 8 additions & 0 deletions
@@ -21,6 +21,14 @@ def apply_all_penalties(
     """
     _, vocab_size = logits.shape
     output_tokens_t = _convert_to_tensors(output_token_ids, vocab_size, logits.device)
+
+    # In the async scheduling case, rows that won't have penalties applied may contain
+    # -1 placeholder token ids. We must replace these with valid token ids so that the
+    # scatter done in apply_penalties is valid.
+    # NOTE(nick): The penalties implementation is currently quite inefficient and
+    # will be reworked anyhow.
+    output_tokens_t.masked_fill_(output_tokens_t == -1, vocab_size)
+
     return apply_penalties(
         logits,
         prompt_token_ids,
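
For context, here is a minimal, self-contained sketch (not the actual vLLM implementation) of why the -1 placeholders are a problem: the penalty path counts output-token occurrences by scattering into a per-row histogram, and a -1 index is out of range for that scatter. Masking the placeholders to vocab_size redirects them into an extra padding column that is discarded afterwards. The buffer shape, toy values, and the counting code below are illustrative assumptions, not code from the repository.

```python
import torch

# Toy values; the real function derives these from logits.shape and the sampler state.
vocab_size = 8
output_tokens_t = torch.tensor(
    [
        [3, 5, 5, -1],  # async-scheduling row padded with -1 placeholder token ids
        [1, 2, 2, 2],   # ordinary row
    ]
)

# The fix from this commit: send placeholders to index `vocab_size` instead of -1.
output_tokens_t.masked_fill_(output_tokens_t == -1, vocab_size)

# Hypothetical occurrence count, sketching the kind of scatter apply_penalties relies on:
# one extra column at index `vocab_size` acts as a scratch bin for the placeholders
# and is sliced off, so they never affect the real vocabulary counts.
counts = torch.zeros(output_tokens_t.shape[0], vocab_size + 1, dtype=torch.long)
counts.scatter_add_(1, output_tokens_t, torch.ones_like(output_tokens_t))
counts = counts[:, :vocab_size]
print(counts)
# tensor([[0, 0, 0, 1, 0, 2, 0, 0],
#         [0, 1, 3, 0, 0, 0, 0, 0]])
```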
