Pull requests: vllm-project/vllm
[Hardware][Intel CPU] Use IPEX varlen attention to compute prompts for better performance (#5943, opened Jun 28, 2024 by jikunshang)
[Core][Optimization] Use a pool of NumPy ndarrays to hold sequence data (#5942, opened Jun 27, 2024 by youkaichao)
[Misc] Refactor w8a8 to use process_weights_after_load (simplify weight loading) (#5940, opened Jun 27, 2024 by robertgshaw2-neuralmagic)
[Kernel] Raise an exception in MoE kernel if the batch size is larger than 65k (#5939, opened Jun 27, 2024 by comaniac)
[Bugfix] Only add Attention.kv_scale if KV cache quantization is enabled (#5936, opened Jun 27, 2024 by mgoin)
[Frontend] OpenAI base64 embedding: remove the message blocker for base64 embedding (#5935, opened Jun 27, 2024 by llmpros)
[Bugfix] Fix compute datatype for CUTLASS 3.x epilogues (#5931, opened Jun 27, 2024 by tlrmchlsmth)
[Misc] Remove fp8_shard_indexer from Col/Row Parallel Linear (simplify weight loading) (#5928, opened Jun 27, 2024 by robertgshaw2-neuralmagic)
[Distributed] Make it clear that % should not be in tensor dict keys (#5927, opened Jun 27, 2024 by xwjiang2010)
[Bugfix] Fix missing last ITL in OpenAI completions benchmark (#5926, opened Jun 27, 2024 by mcalman)
[Frontend] Support for chat completions input in the tokenize endpoint (#5923, opened Jun 27, 2024 by sasha0552)
[Bugfix] Enable loading of models with fused QKV/MLP on disk with FP8 (#5921, opened Jun 27, 2024 by robertgshaw2-neuralmagic)
[Kernel] Prototype integration of bytedance/flux kernels (#5917, opened Jun 27, 2024 by tlrmchlsmth, draft)
Warn if user max_model_len is greater than derived max_model_len (#5911, opened Jun 27, 2024 by fialhocoelho, draft)
[LoRA] Use safetensors keys instead of adapter_config.json to find unexpected modules (#5909, opened Jun 27, 2024 by rkooo567)
[VLM][Bugfix] Make sure that multi_modal_kwargs can broadcast properly with ring buffer (#5905, opened Jun 27, 2024 by xwjiang2010)