Why vLLM was not used for srt #1770
Closed
Venkat2811 started this conversation in General
Replies: 1 comment · 3 replies
- The vLLM dependency will be removed.
Hello @merrymercy @zhyncs,

Thanks for this amazing project; it was very easy to get running locally. I've been scouting the vLLM and sglang inference-engine implementation details and a few recent PRs, and catching up on your meetings via the YouTube channel and the learning-materials repo. I also see that you use vLLM as a dependency and reuse parts of its implementation. I am no expert by any means, just trying to understand why RadixAttention was not implemented as one of the attention backends in vLLM instead.

I would like to understand your vision and roadmap for this project. I understand that the current vLLM architecture makes it difficult to land large changes like the ones in sglang. The vLLM team is working on architecture 2.0 to address several pain points, and is also making various improvements: decreasing CPU overhead, supporting different types of KV caches, a multi-process API server and engine, TP, PP, EP, LMCache, structured output generation, etc.
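For context on why RadixAttention keeps coming up here: its core idea is sharing KV-cache entries across requests that have a common token prefix, organized in a radix/prefix tree. Below is a toy sketch of that idea in plain Python — stand-in values instead of real KV tensors, a plain trie instead of a compressed radix tree, and none of the eviction logic; it is illustrative only, not sglang's actual implementation.

```python
# Toy sketch of RadixAttention-style prefix caching (illustrative only,
# NOT sglang's implementation): requests that share a token prefix reuse
# the cached "KV" entries for that prefix instead of re-running prefill.

class TrieNode:
    def __init__(self):
        self.children = {}   # token id -> TrieNode
        self.kv_slot = None  # stand-in for a cached KV-cache block

class PrefixCache:
    def __init__(self):
        self.root = TrieNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV."""
        node, matched = self.root, 0
        for t in tokens:
            child = node.children.get(t)
            if child is None or child.kv_slot is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record (fake) KV slots for every token of a finished request."""
        node = self.root
        for i, t in enumerate(tokens):
            node = node.children.setdefault(t, TrieNode())
            if node.kv_slot is None:
                node.kv_slot = ("kv", tuple(tokens[: i + 1]))

cache = PrefixCache()
cache.insert([1, 2, 3, 4])               # first request fills the cache
hit = cache.match_prefix([1, 2, 3, 9])   # second request shares a 3-token prefix
print(hit)  # -> 3: only token 9 needs fresh prefill
```

The question above is essentially whether this tree-structured cache could have lived behind vLLM's attention/KV-cache interfaces rather than in a separate runtime.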
Since you've also worked on the vLLM project, sharing the details of your motivation would be very helpful. Thanks in advance!
Thanks,
Venkat