Pull requests: vllm-project/vllm
[ROCm] Effort to reduce the number of environment variables in command line
ci/build
#17229 opened Apr 26, 2025 by hongxiayang
Use CUDA 12.6 as default for release and nightly wheels
ci/build
documentation
Improvements or additions to documentation
[Bugfix] Get a specific type of layer from forward context
tpu
Related to Google TPUs
v1
#17222 opened Apr 26, 2025 by heheda12345
[Doc] Clarify note for H2O-VL
documentation
Improvements or additions to documentation
#17219 opened Apr 26, 2025 by DarkLight1337
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE
documentation
Improvements or additions to documentation
v1
#17211 opened Apr 26, 2025 by luyuzhe111
[Misc][Tools][Benchmark] Publish script to auto tune server parameters
#17207 opened Apr 25, 2025 by Chenyaaang
[Benchmark] Add single turn MTBench to Serving Bench
#17202 opened Apr 25, 2025 by ekagra-ranjan
[Hardware][Apple] Allows VLLM_TARGET_DEVICE=empty on MacOs
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
#17200 opened Apr 25, 2025 by wallashss
[WIP] Support vLLM in transformers hybrid attention implementation
#17198 opened Apr 25, 2025 by wuisawesome
[Security] Don't bind tcp zmq socket to all interfaces
documentation
Improvements or additions to documentation
security
Security related issues and PRs
[WIP][Bugfix] Fix 'MistralTokenizer' object has no attribute 'init_kwargs'
bug
Something isn't working
ready
ONLY add when PR is ready to merge/full CI is needed
[V1] Remove num_input_tokens from attn_metadata
tpu
Related to Google TPUs
v1
#17193 opened Apr 25, 2025 by heheda12345
[Bugfix] support local dataset path in benchmark_serving
#17179 opened Apr 25, 2025 by wubai
Add option "--expand-tools-even-if-tool-choice-none"
frontend
tool-calling
#17177 opened Apr 25, 2025 by okdshin
[CI] Add mteb testing to test the accuracy of the embedding model
ci/build
#17175 opened Apr 25, 2025 by noooop
[Bugfix] Modifications to error handling of multiple vllm api endpoints
frontend
#17165 opened Apr 25, 2025 by tunglinwood
[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER
ci/build
#17153 opened Apr 25, 2025 by Akashcodes732
[Misc] Add gemma3 chat template with pythonic-style function calling
documentation
Improvements or additions to documentation
tool-calling
#17149 opened Apr 25, 2025 by philipchung
Add xLAM tool parser support
documentation
Improvements or additions to documentation
frontend
tool-calling
#17148 opened Apr 25, 2025 by zuxin666
[Frontend][TPU] Enforce user input key args to reduce chance of large performance degradation
documentation
Improvements or additions to documentation
frontend
#17145 opened Apr 24, 2025 by Chenyaaang
[Kernel] FP8 quantization fused into V1 Triton Attention
#17143 opened Apr 24, 2025 by gshtras
[ROCm][FP8][Kernel] FP8 quantization fused into Custom Paged Attention
#17139 opened Apr 24, 2025 by gshtras