Issues: vllm-project/vllm
[Bug]: Illegal memory access for MoE kernel with large workloads (bug) #5938, opened Jun 27, 2024 by comaniac
[Bug]: FP8 checkpoints with fused linear modules fail to load scales correctly (bug) #5915, opened Jun 27, 2024 by mgoin
[Bug]: TRACKING ISSUE: CUDA OOM with Logprobs (bug) #5907, opened Jun 27, 2024 by robertgshaw2-neuralmagic
[Bug]: Internal Server Error when hosting Salesforce/SFR-Embedding-Mistral (bug) #5906, opened Jun 27, 2024 by markkofler
[Bug]: TRACKING ISSUE: AsyncEngineDeadError (bug) #5901, opened Jun 27, 2024 by robertgshaw2-neuralmagic (1 of 10 tasks)
[Misc]: How can I serve multiple models on a single port using the OpenAI API? (misc) #5899, opened Jun 27, 2024 by SuiJiGuoChengSuiJiGuo
[Bug]: Inconsistent responses with vLLM when batch size > 1, even with temperature = 0 (bug) #5898, opened Jun 27, 2024 by gjgjos
[Bug]: Incoherent error message when using MLPSpeculator and num_speculative_tokens is set too high (bug) #5893, opened Jun 27, 2024 by tdoublep
[Feature]: Add distributed inference support for LoRA adapters (feature request) #5891, opened Jun 27, 2024 by xuanyaoming
[Bug]: Query with logprobs and echo crashes vllm (llama-3-8b-instruct) (bug) #5890, opened Jun 27, 2024 by yaronr
[Installation]: Wrong torch header file referenced when compiling from source (installation) #5889, opened Jun 27, 2024 by crazy-JiangDongHua
[Bug]: Exception in ASGI application (bug) #5881, opened Jun 27, 2024 by houshuai-cs
[Bug]: vllm stuck when using prompt_token_ids and setting prompt_logprobs (bug) #5872, opened Jun 26, 2024 by xinyangz
[Bug]: LLaVA-NeXT ValueError - "Incorrect type of image sizes" when running in Docker (bug) #5868, opened Jun 26, 2024 by FennFlyer
[Bug]: The end_sync operation inside the cross_device_reduce_2stage kernel sometimes deadlocks because it can't wait for the end signal (bug) #5866, opened Jun 26, 2024 by JiantaoXu
[Usage]: Can I get streaming output when using offline inference? (usage) #5862, opened Jun 26, 2024 by jiangjiadi
[Misc]: CUDAGraph-captured generation stuck with custom_all_reduce and tensor_parallel=2 (misc) #5854, opened Jun 26, 2024 by nuzant
[Bug]: Server error when hosting TheBloke/Llama-2-7B-Chat-GPTQ with chunked-prefill (bug) #5853, opened Jun 26, 2024 by George-ao
[Bug]: OutOfMemoryError when loading a small model with a huge context length (bug) #5847, opened Jun 25, 2024 by alugowski
[Feature]: Support for distributed speculative inference (feature request) #5835, opened Jun 25, 2024 by keyboardAnt