Issues: vllm-project/vllm
[Bug]: llama-3.2-11B-vision OOM error when run in vllm==0.6.3 (L20)
bug · #10569 · opened Nov 22, 2024 by Jamrainbow
[Doc]: Docker + vllm + fastchat deployment of the multimodal large model Qwen2-vl-7b-instruct
documentation · #10566 · opened Nov 22, 2024 by Aanlifire
[Feature]: How to run speculative models with tensor parallelism?
feature request · #10562 · opened Nov 22, 2024 by cxxuser
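For the tensor-parallel speculative decoding question above, here is a minimal sketch of how the two settings can be combined, assuming vLLM 0.6.x's speculative-decoding engine arguments (speculative_model, num_speculative_tokens, speculative_draft_tensor_parallel_size); the model names and values are placeholders, not a confirmed recipe from this issue.

```python
from vllm import LLM, SamplingParams

# Sketch only: model names and token counts are placeholders.
# The target model is sharded across 4 GPUs, while the draft model is kept
# on a single GPU via speculative_draft_tensor_parallel_size=1.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4,
    speculative_model="meta-llama/Llama-3.2-1B-Instruct",
    num_speculative_tokens=5,
    speculative_draft_tensor_parallel_size=1,
)

outputs = llm.generate(
    ["The quick brown fox"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```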
[Bug]: Speculative Decoding without enabling eager mode returns gibberish output after some tokens.
bug · #10559 · opened Nov 22, 2024 by andoorve
[Usage]: Docker w/ CPU fails when defining VLLM_CPU_OMP_THREADS_BIND
usage · #10556 · opened Nov 22, 2024 by ccruttjr
[Usage]: Can we extend the context length of the gemma2 model or other models?
usage · #10548 · opened Nov 21, 2024 by hahmad2008
[Installation]: can't get the cu118 version of vllm 0.6.3 from https://github.com/vllm-project/vllm/releases/download/v0.6.3/vllm-0.6.3+cu118-cp310-cp310-manylinux1_x86_64.whl
installation · #10540 · opened Nov 21, 2024 by mayfool
[Feature]: Support for Registering Model-Specific Default Sampling Parameters
feature request · #10539 · opened Nov 21, 2024 by yansh97
[Usage]: How to use RoPE scaling for llama3.1 and gemma2?
usage · #10537 · opened Nov 21, 2024 by hahmad2008
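For the RoPE scaling question above, a minimal sketch assuming vLLM's rope_scaling engine argument, which is forwarded to the model's Hugging Face config; the accepted keys ("rope_type" vs. "type") and supported scaling methods depend on the model and transformers version, and the values below are placeholders.

```python
from vllm import LLM

# Sketch only: the scaling method, factor, and max_model_len are placeholders.
# rope_scaling overrides the rope_scaling entry of the model's HF config so the
# model can attend beyond its originally trained context length.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    rope_scaling={"rope_type": "dynamic", "factor": 2.0},
    max_model_len=16384,
)
```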
[Usage]: Fail to load params.json
usage · #10534 · opened Nov 21, 2024 by dequeueing
[Bug]: Authorization ignored when root_path is set
bug · #10531 · opened Nov 21, 2024 by OskarLiew
[Usage]: Optimizing TTFT for Qwen2.5-72B Model Deployment on A800 GPUs for RAG Application
usage · #10527 · opened Nov 21, 2024 by zhanghx0905
[Feature]: Additional possible value for tool_choice: required
feature request · #10526 · opened Nov 21, 2024 by fahadh4ilyas
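Since #10526 asks for a new tool_choice value, here is a sketch of what the requested behavior looks like from the client side, using the OpenAI Python client against a vLLM OpenAI-compatible server; the tool definition and model name are made up for illustration, and vLLM did not accept this value at the time the issue was opened.

```python
from openai import OpenAI

# Sketch only: tool and model are illustrative. tool_choice="required" (the
# value requested in this issue) forces the model to call at least one tool
# instead of answering in plain text.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="required",
)
print(resp.choices[0].message.tool_calls)
```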
[Bug]: torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, unhandled system error (run with NCCL_DEBUG=INFO for details), NCCL version 2.18.1
bug · #10523 · opened Nov 21, 2024 by QualityGN
[Usage]: When I set --tensor-parallel-size 4, the OpenAI server does not work and raises a new exception
usage · #10521 · opened Nov 21, 2024 by Geek-Peng
[Usage]: What's the relationship between the KV cache and MAX_SEQUENCE_LENGTH?
usage · #10517 · opened Nov 21, 2024 by GRuuuuu
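For the KV cache vs. maximum sequence length question above, a short sketch of the two engine arguments involved (values are placeholders): vLLM sizes the KV cache from the GPU memory left after loading the weights, while max_model_len only caps how long a single sequence may be.

```python
from vllm import LLM

# Sketch only: gpu_memory_utilization controls how much of each GPU vLLM may
# use; whatever remains after the weights are loaded becomes KV-cache blocks.
# max_model_len caps a single sequence's length and must fit in that cache,
# but raising it does not by itself allocate more KV-cache memory.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    max_model_len=8192,
    gpu_memory_utilization=0.90,
)
```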
[Bug]: Model does not split across multiple GPUs; instead it occupies the same amount of memory on each GPU
bug · #10516 · opened Nov 21, 2024 by anilkumar0502
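For #10516 above, a minimal sketch of how weights are sharded with tensor parallelism (model name and sizes are placeholders); note that every GPU also pre-allocates KV-cache blocks up to gpu_memory_utilization, so similar memory usage on each GPU does not by itself mean the model was not split.

```python
from vllm import LLM

# Sketch only: tensor_parallel_size=4 shards the weight matrices across 4 GPUs.
# Each GPU still reserves KV-cache blocks up to gpu_memory_utilization, which is
# why per-GPU memory usage can look similar even when the weights are split.
llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    tensor_parallel_size=4,
    gpu_memory_utilization=0.90,
)
```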
[Feature]: Manually inject Prefix KV Cache
feature request · #10515 · opened Nov 21, 2024 by toilaluan
[Bug]: I'm trying to run Pixtral-Large-Instruct-2411 using vllm, following the documentation at https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411, but I encountered an error.
bug · #10512 · opened Nov 21, 2024 by eii-lyl
[Usage]: KV cache usage for different tasks in a batch
usage · #10509 · opened Nov 21, 2024 by Lukas-123
Metrics model name when using multiple LoRAs
bug · #10504 · opened Nov 20, 2024 by mces89
[Feature]: Support outlines versions > v0.1.0
feature request · #10489 · opened Nov 20, 2024 by Treparme
[Bug]: LLaMA3.2-1B fine-tuned with Sentence Transformers -> ValueError: Model architectures ['LlamaModel'] are not supported for now.
bug · #10481 · opened Nov 20, 2024 by thusinh1969