
vLLM v1 #62

Open

yahya010 wants to merge 12 commits into main from yahya/v1

Conversation

@yahya010 yahya010 commented Feb 3, 2026

Update to vLLM v1

@yahya010 yahya010 marked this pull request as draft February 3, 2026 16:53
codecov bot commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@yahya010 yahya010 requested a review from shepardxia February 3, 2026 21:00
Contributor

@shepardxia shepardxia left a comment


The formatting is a bit odd; are you using pre-commit hooks? It might be worth checking the Codecov report too. Otherwise the backend itself looks good to me!

Comment on lines +11 to +12
os.environ.setdefault("VLLM_USE_V1", "1")
os.environ.setdefault("VLLM_ENABLE_V1_MULTIPROCESSING", "0")
Contributor


cleaner to move this inside the try block
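The suggestion above might look like the following minimal sketch, which groups the v1 environment variables with the vLLM import so they read as one setup step. The variable names come from the diff; the import-guard structure and the `vllm = None` fallback are assumptions for illustration, not the PR's code:

```python
import os

try:
    # Opt in to the vLLM v1 engine before importing vLLM; setdefault keeps
    # any values the user has already exported in their shell.
    os.environ.setdefault("VLLM_USE_V1", "1")
    os.environ.setdefault("VLLM_ENABLE_V1_MULTIPROCESSING", "0")
    import vllm  # noqa: F401
except ImportError:
    # Hypothetical fallback: the module stays importable without vLLM.
    vllm = None
```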

@yahya010 yahya010 requested a review from benlebrun February 6, 2026 20:25
@postylem postylem removed the request for review from benlebrun April 2, 2026 16:20
@yahya010 yahya010 marked this pull request as ready for review April 3, 2026 15:36
"disable_log_requests": True,
"disable_async_output_proc": True, # This parameter forces vLLM to use v0, which is currently what we want to do.
"disable_log_stats": True,
"gpu_memory_utilization": 0.5,
Contributor


why is this set to 0.5?

with self._lock:
self._captured_batch = None

class AsyncVirtualLM(AsyncLM): # pragma: no cover
Contributor


no coverage here?

logprobs = torch.log_softmax(logits, dim=-1, dtype=logits.dtype)
with self._lock:
# Single clone of entire batch - O(1) instead of O(batch_size)
self._captured_batch = logprobs.clone()
Member


Could it ever happen that this step overwrites logprobs if apply() is called multiple times per step? Or is it guaranteed that we call it only once?
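One way to make the single-capture assumption explicit is to guard the write, so a second apply() in the same step fails loudly instead of silently overwriting the batch. A stdlib-only sketch of that idea (class and method names are illustrative, and a plain list copy stands in for the tensor clone):

```python
import threading


class LogprobsCapture:
    """Guard against double capture within one step (illustrative names)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._captured_batch = None

    def apply(self, logprobs):
        with self._lock:
            if self._captured_batch is not None:
                # A second apply() in the same step would otherwise silently
                # overwrite the previous batch; fail loudly instead.
                raise RuntimeError("logprobs already captured for this step")
            self._captured_batch = list(logprobs)  # stands in for .clone()
        return logprobs

    def pop(self):
        # Hand the captured batch to the consumer and reset for the next step.
        with self._lock:
            batch, self._captured_batch = self._captured_batch, None
        return batch
```

If apply() legitimately runs more than once per step, the same guard makes that visible immediately rather than producing a quietly wrong batch.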

@@ -0,0 +1,370 @@
#!/usr/bin/env python3
Member


@yahya010 do you have some numbers from the benchmark?

# Clean up distributed state
destroy_model_parallel()
destroy_distributed_environment()
except Exception:
Member


What exceptions could we get here? It would be better to catch the specific exception.
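A sketch of the narrower handler the comment asks for, assuming the teardown calls fail mainly when the process group was never initialized. The caught RuntimeError/AssertionError pair is an assumption about the failure modes, not vLLM's documented behavior, and `destroy_fns` is a stand-in for destroy_model_parallel and destroy_distributed_environment:

```python
def shutdown_distributed(destroy_fns):
    """Run each teardown step, tolerating only 'already torn down' errors.

    destroy_fns stands in for vLLM's destroy_model_parallel and
    destroy_distributed_environment; the exception types below are an
    assumed failure mode, not documented behavior.
    """
    for fn in destroy_fns:
        try:
            fn()
        except (RuntimeError, AssertionError):
            # e.g. the process group was never initialized; safe to ignore.
            pass
```

Anything outside the named types still propagates, so a genuinely unexpected failure during cleanup is not swallowed.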

return results


def print_comparison(
Member


This function is never called

self.log_probs = torch.log_softmax(logits, dim=-1, dtype=logits.dtype)
logging.getLogger("vllm").setLevel(logging.WARNING)

class GlobalLogprobsCapture(LogitsProcessor): # pragma: no cover
Member


Could we add a test instead of skipping it?
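A test along the lines this comment suggests could exercise the capture logic directly with a small fake batch, with no vLLM engine involved. The sketch below re-implements the quoted log-softmax-and-capture step in pure Python so it runs anywhere; the class mirrors the snippet but is illustrative, not the PR's implementation:

```python
import math
import threading


class GlobalLogprobsCapture:
    """Pure-Python stand-in for the processor, so the test needs no GPU."""

    def __init__(self):
        self._lock = threading.Lock()
        self._captured_batch = None

    def __call__(self, logits):
        # log_softmax via the log-sum-exp trick (replaces torch.log_softmax).
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        with self._lock:
            self._captured_batch = [x - lse for x in logits]
        return logits


def test_captured_logprobs_are_normalized():
    cap = GlobalLogprobsCapture()
    cap([1.0, 2.0, 3.0])
    # Valid log-probabilities must exponentiate and sum to 1.
    assert abs(sum(math.exp(lp) for lp in cap._captured_batch) - 1.0) < 1e-9
```

Covering even this much would let the `# pragma: no cover` come off the class body, leaving only the engine-dependent wiring excluded.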


4 participants