vllm-project / vllm-gaudi Public

Fork 70
Star 16

Pull requests: vllm-project/vllm-gaudi

Labels 12 Milestones 0

56 Open 523 Closed

Pull requests list

Port: Fix prefix caching automatic off with conti pa (#583)

#586 opened Nov 19, 2025 by adobrzyn

Sleep mode support

#584 opened Nov 18, 2025 by Kacper-Pietkun • Draft

[DOCKER update] update docker to 1.23, transformers to 4.56.0

#580 opened Nov 17, 2025 by xuechendi • Draft

Add support of FP32 softmax to unified attention

#577 opened Nov 17, 2025 by afierka-intel • Draft

Cherry-pick release docker cmdline fixes, WA and long context support

#576 opened Nov 17, 2025 by nngokhale

Implementing softmax_fa2 in partial_attn shared and causal

#566 opened Nov 13, 2025 by ksmusz • Draft

Docs: Missing content from Habana docs documentation

Improvements or additions to documentation

skip-gaudi-tests

#562 opened Nov 13, 2025 by mhelf-intel

Add a plugin for variable support in Markdown documentation

Improvements or additions to documentation

skip-gaudi-tests

#554 opened Nov 12, 2025 by mhelf-intel

fix loading fp8 static quantized model for compressored_tensors format.

#552 opened Nov 11, 2025 by lkk12014402

Prepare Unified Attention biases on HPU + add NumPy memory pooling

#550 opened Nov 7, 2025 by kzawora-intel

Refactor part of spec decode structure identical to vLLM

#544 opened Nov 7, 2025 by jerrychenhf

Michalkuligowski patch 7

#542 opened Nov 6, 2025 by michalkuligowski • Draft

[SW-228042] Add support for dynamic vLLM kv-cache quantization

#538 opened Nov 6, 2025 by dudilester

[Attention Metadata Overhaul 2/N] Move metadata processing outside HPUModelAdapter, prepare biases on CPU

#530 opened Nov 5, 2025 by kzawora-intel

[Attention Metadata Overhaul 1/N] Extract metadata update to HPUAttentionMetadataProcessor

#526 opened Nov 5, 2025 by kzawora-intel

enable lmcache

#521 opened Nov 5, 2025 by hsubramony

reduce graph recompilations in input embeddings for Gemma3

#519 opened Nov 4, 2025 by skaulintel • Draft

Call shutdown_inc to mitigate driver worker teardown order

#511 opened Nov 3, 2025 by michalkuligowski • Draft

Update TESTOWNERS

#495 opened Oct 28, 2025 by jbyczkow

Initial Commit GPT-OSS

#485 opened Oct 28, 2025 by hlahkar

[SW-242794] Fix not warmed up decode buckets

#484 opened Oct 28, 2025 by jbyczkow • Draft

[Attention Metadata Overhaul 3/N] Add per-layer attention metadata

#475 opened Oct 24, 2025 by kzawora-intel • Draft

Enable triangular mask with valid_seq_lengths

#454 opened Oct 23, 2025 by kamil-kaczor

enable gdr on 10.2 baseline

#431 opened Oct 20, 2025 by hsubramony • Draft

Fix for Llama4 static quantization

#430 opened Oct 20, 2025 by vidyasiv
