Pull requests: vllm-project/vllm
[Hardware][Intel CPU] Use IPEX varlen attention to compute prompts for better performance (#5943, opened Jun 28, 2024 by jikunshang)
[Core][Optimization] Use a pool of NumPy ndarrays to hold sequence data (#5942, opened Jun 27, 2024 by youkaichao)
[Misc] Refactor w8a8 to use process_weights_after_load (simplify weight loading) (#5940, opened Jun 27, 2024 by robertgshaw2-neuralmagic)
[Kernel] Raise an exception in MoE kernel if the batch size is larger than 65k (#5939, opened Jun 27, 2024 by comaniac)
[Bugfix] Only add Attention.kv_scale if KV cache quantization is enabled (#5936, opened Jun 27, 2024 by mgoin)
[Frontend] OpenAI base64 embedding: remove the message blocker for base64 embedding (#5935, opened Jun 27, 2024 by llmpros)
[Bugfix] Fix compute datatype for CUTLASS 3.x epilogues (#5931, opened Jun 27, 2024 by tlrmchlsmth)
[Misc] Remove fp8_shard_indexer from Col/Row Parallel Linear (simplify weight loading) (#5928, opened Jun 27, 2024 by robertgshaw2-neuralmagic)
[Distributed] Make it clear that % should not be in tensor dict keys (#5927, opened Jun 27, 2024 by xwjiang2010)
[Bugfix] Fix missing last ITL in OpenAI completions benchmark (#5926, opened Jun 27, 2024 by mcalman)
[Frontend] Support for chat completions input in the tokenize endpoint (#5923, opened Jun 27, 2024 by sasha0552)
[Bugfix] Enable loading of models with fused QKV/MLP on disk with FP8 (#5921, opened Jun 27, 2024 by robertgshaw2-neuralmagic)
[Kernel] Prototype integration of bytedance/flux kernels (#5917, opened Jun 27, 2024 by tlrmchlsmth, draft)
Warn if user max_model_len is greater than derived max_model_len (#5911, opened Jun 27, 2024 by fialhocoelho, draft)
[LoRA] Use safetensors keys instead of adapter_config.json to find unexpected modules (#5909, opened Jun 27, 2024 by rkooo567)
[VLM][Bugfix] Make sure that multi_modal_kwargs can broadcast properly with ring buffer (#5905, opened Jun 27, 2024 by xwjiang2010)