
vLLM v1 #62

Open

yahya010 wants to merge 12 commits into main from yahya/v1

Conversation

@yahya010 yahya010 commented Feb 3, 2026

Update to vLLM v1

@yahya010 yahya010 marked this pull request as draft February 3, 2026 16:53
codecov bot commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@yahya010 yahya010 requested a review from shepardxia February 3, 2026 21:00
Contributor

@shepardxia shepardxia left a comment


The formatting is a bit odd; are you using pre-commit hooks? It might be worth checking the Codecov report too. Otherwise the backend itself looks good to me!

Comment on lines +11 to +12
os.environ.setdefault("VLLM_USE_V1", "1")
os.environ.setdefault("VLLM_ENABLE_V1_MULTIPROCESSING", "0")
Contributor


cleaner to move this inside the try block
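The suggestion above might look like the following minimal sketch, which groups the v1 environment variables with the vLLM import so they read as one setup step. The variable names come from the diff; the import-guard structure and the `vllm = None` fallback are assumptions for illustration, not the PR's code:

```python
import os

try:
    # Opt in to the vLLM v1 engine before importing vLLM; setdefault keeps
    # any values the user has already exported in their shell.
    os.environ.setdefault("VLLM_USE_V1", "1")
    os.environ.setdefault("VLLM_ENABLE_V1_MULTIPROCESSING", "0")
    import vllm  # noqa: F401
except ImportError:
    # Hypothetical fallback: the module stays importable without vLLM.
    vllm = None
```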

@yahya010 yahya010 requested a review from benlebrun February 6, 2026 20:25
@postylem postylem removed the request for review from benlebrun April 2, 2026 16:20
@yahya010 yahya010 marked this pull request as ready for review April 3, 2026 15:36
"disable_log_requests": True,
"disable_async_output_proc": True, # This parameter forces vLLM to use v0, which is currently what we want to do.
"disable_log_stats": True,
"gpu_memory_utilization": 0.5,
Contributor


why is this set to 0.5?

with self._lock:
self._captured_batch = None

class AsyncVirtualLM(AsyncLM): # pragma: no cover
Contributor


no coverage here?

logprobs = torch.log_softmax(logits, dim=-1, dtype=logits.dtype)
with self._lock:
# Single clone of entire batch - O(1) instead of O(batch_size)
self._captured_batch = logprobs.clone()
Member


Could it ever happen that this step overwrites logprobs if apply() is called multiple times per step? Or is it guaranteed that we call it only once?
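One way to make the single-capture assumption explicit is to guard the write, so a second apply() in the same step fails loudly instead of silently overwriting the batch. A stdlib-only sketch of that idea (class and method names are illustrative, and a plain list copy stands in for the tensor clone):

```python
import threading


class LogprobsCapture:
    """Guard against double capture within one step (illustrative names)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._captured_batch = None

    def apply(self, logprobs):
        with self._lock:
            if self._captured_batch is not None:
                # A second apply() in the same step would otherwise silently
                # overwrite the previous batch; fail loudly instead.
                raise RuntimeError("logprobs already captured for this step")
            self._captured_batch = list(logprobs)  # stands in for .clone()
        return logprobs

    def pop(self):
        # Hand the captured batch to the consumer and reset for the next step.
        with self._lock:
            batch, self._captured_batch = self._captured_batch, None
        return batch
```

If apply() legitimately runs more than once per step, the same guard makes that visible immediately rather than producing a quietly wrong batch.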

@@ -0,0 +1,370 @@
#!/usr/bin/env python3
Member


@yahya010 do you have some numbers from the benchmark?

# Clean up distributed state
destroy_model_parallel()
destroy_distributed_environment()
except Exception:
Member


What exceptions could we get here? It would be better to catch the specific exception.
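A sketch of the narrower handler the comment asks for, assuming the teardown calls fail mainly when the process group was never initialized. The caught RuntimeError/AssertionError pair is an assumption about the failure modes, not vLLM's documented behavior, and `destroy_fns` is a stand-in for destroy_model_parallel and destroy_distributed_environment:

```python
def shutdown_distributed(destroy_fns):
    """Run each teardown step, tolerating only 'already torn down' errors.

    destroy_fns stands in for vLLM's destroy_model_parallel and
    destroy_distributed_environment; the exception types below are an
    assumed failure mode, not documented behavior.
    """
    for fn in destroy_fns:
        try:
            fn()
        except (RuntimeError, AssertionError):
            # e.g. the process group was never initialized; safe to ignore.
            pass
```

Anything outside the named types still propagates, so a genuinely unexpected failure during cleanup is not swallowed.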

return results


def print_comparison(
Member


This function is never called

self.log_probs = torch.log_softmax(logits, dim=-1, dtype=logits.dtype)
logging.getLogger("vllm").setLevel(logging.WARNING)

class GlobalLogprobsCapture(LogitsProcessor): # pragma: no cover
Member


Could we add a test instead of skipping it?
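A test along the lines this comment suggests could exercise the capture logic directly with a small fake batch, with no vLLM engine involved. The sketch below re-implements the quoted log-softmax-and-capture step in pure Python so it runs anywhere; the class mirrors the snippet but is illustrative, not the PR's implementation:

```python
import math
import threading


class GlobalLogprobsCapture:
    """Pure-Python stand-in for the processor, so the test needs no GPU."""

    def __init__(self):
        self._lock = threading.Lock()
        self._captured_batch = None

    def __call__(self, logits):
        # log_softmax via the log-sum-exp trick (replaces torch.log_softmax).
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        with self._lock:
            self._captured_batch = [x - lse for x in logits]
        return logits


def test_captured_logprobs_are_normalized():
    cap = GlobalLogprobsCapture()
    cap([1.0, 2.0, 3.0])
    # Valid log-probabilities must exponentiate and sum to 1.
    assert abs(sum(math.exp(lp) for lp in cap._captured_batch) - 1.0) < 1e-9
```

Covering even this much would let the `# pragma: no cover` come off the class body, leaving only the engine-dependent wiring excluded.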


4 participants