Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.
shepardxia
left a comment
Formatting is a bit weird; are you using pre-commit hooks? Might be worth checking codecov too. But otherwise the backend itself looks good to me!
genlm/backend/llm/vllm.py
Outdated
os.environ.setdefault("VLLM_USE_V1", "1")
os.environ.setdefault("VLLM_ENABLE_V1_MULTIPROCESSING", "0")
cleaner to move this inside the try block
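A minimal sketch of the reviewer's suggestion, assuming the env vars guard an optional `vllm` import (the surrounding try/except shape is inferred from context, not shown in the diff):

```python
import os

# Reviewer's suggestion, sketched: set the vLLM env vars inside the try
# block so they are only applied when the optional import is attempted.
# The try/except structure here is an assumption, not the PR's actual code.
try:
    os.environ.setdefault("VLLM_USE_V1", "1")
    os.environ.setdefault("VLLM_ENABLE_V1_MULTIPROCESSING", "0")
    import vllm  # noqa: F401
except ImportError:
    vllm = None  # the backend presumably falls back or raises elsewhere
```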
"disable_log_requests": True,
"disable_async_output_proc": True,  # This parameter forces vLLM to use v0, which is currently what we want to do.
"disable_log_stats": True,
"gpu_memory_utilization": 0.5,
with self._lock:
    self._captured_batch = None

class AsyncVirtualLM(AsyncLM):  # pragma: no cover
logprobs = torch.log_softmax(logits, dim=-1, dtype=logits.dtype)
with self._lock:
    # Single clone of entire batch - O(1) instead of O(batch_size)
    self._captured_batch = logprobs.clone()
Could it ever happen that this step overwrites logprobs if apply() is called multiple times per step? Or is it guaranteed that we call it only once?
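One way to make the once-per-step assumption explicit is to fail loudly when `apply()` runs again before the captured batch is consumed. A plain-Python sketch (no torch; the class and method names are illustrative, not the PR's actual code):

```python
import math
import threading

class LogprobsCapture:
    """Hypothetical sketch: guard against a second apply() in one step
    silently overwriting the captured batch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._captured_batch = None

    def apply(self, logits):
        # Log-softmax over a single row of logits (stand-in for torch.log_softmax).
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        logprobs = [x - lse for x in logits]
        with self._lock:
            if self._captured_batch is not None:
                raise RuntimeError("apply() called again before take() consumed the batch")
            self._captured_batch = list(logprobs)
        return logits

    def take(self):
        # Hand the batch to the consumer and clear the slot for the next step.
        with self._lock:
            batch, self._captured_batch = self._captured_batch, None
        return batch
```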
@@ -0,0 +1,370 @@
#!/usr/bin/env python3
@yahya010 do you have some numbers from the benchmark?
# Clean up distributed state
destroy_model_parallel()
destroy_distributed_environment()
except Exception:
What exceptions could we get here? It would be better to catch the specific exception.
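A sketch of narrowing the bare `except Exception`. Which exceptions the teardown actually raises is an assumption; `RuntimeError` is plausible for `torch.distributed` teardown and `AttributeError` for never-initialized state, and `safe_teardown` is a hypothetical helper, not the PR's code:

```python
def safe_teardown(destroy_fns):
    """Run each teardown step, catching only the expected failure modes
    (assumed here to be RuntimeError/AttributeError) so an unrelated bug
    is not silently swallowed."""
    for destroy in destroy_fns:
        try:
            destroy()
        except (RuntimeError, AttributeError) as exc:
            # Log and continue: a failed teardown step shouldn't mask results.
            print(f"teardown step {destroy.__name__} failed: {exc!r}")
```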
return results

def print_comparison(
self.log_probs = torch.log_softmax(logits, dim=-1, dtype=logits.dtype)
logging.getLogger("vllm").setLevel(logging.WARNING)

class GlobalLogprobsCapture(LogitsProcessor):  # pragma: no cover
Could we add a test instead of skipping it?
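One shape such a test could take, per the reviewer's request: a guarded smoke test instead of `# pragma: no cover`. The import path matches the diff; the skip guard is an assumed convention so the test stays green where `vllm` is not installed:

```python
import importlib.util
import unittest

class TestAsyncVirtualLM(unittest.TestCase):
    """Sketch only: covers the vLLM-backed class instead of excluding it
    from coverage. The exact behavior worth asserting depends on the PR."""

    def test_import(self):
        # Skip cleanly when the optional dependencies are absent.
        if (importlib.util.find_spec("vllm") is None
                or importlib.util.find_spec("genlm") is None):
            self.skipTest("vllm or genlm not installed")
        from genlm.backend.llm.vllm import AsyncVirtualLM  # noqa: F401
```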
Update to vLLM v1