How does RadixAttention implement multi-head/multi-query/grouped-query attention? #652
How does the RadixAttention function call need to be modified in SGLang for a model implemented in vLLM, where PagedAttention takes care of the multi-query and grouped-query attention architectures?
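For context, here is a minimal sketch of the layer construction the question is about. The constructor arguments (num_heads, head_dim, scaling, num_kv_heads, layer_id) mirror the pattern used in SGLang model implementations, but the exact names and the `AttentionBlock` wrapper are assumptions, so check them against the SGLang version you run.

```python
# Sketch only: RadixAttention constructor arguments below follow the pattern
# seen in SGLang model files, but treat the exact signature as an assumption.
from sglang.srt.layers.radix_attention import RadixAttention


class AttentionBlock:
    """Hypothetical wrapper showing how per-layer head counts are declared."""

    def __init__(self, hidden_size: int, num_heads: int, num_kv_heads: int, layer_id: int):
        head_dim = hidden_size // num_heads
        # MHA: num_kv_heads == num_heads
        # GQA: 1 < num_kv_heads < num_heads (e.g. 8 KV heads for 32 query heads)
        # MQA: num_kv_heads == 1
        self.attn = RadixAttention(
            num_heads,
            head_dim,
            head_dim**-0.5,  # attention scaling factor
            num_kv_heads=num_kv_heads,
            layer_id=layer_id,
        )
```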
Answered by merrymercy, Aug 13, 2024
All of them are supported without any specific modification.
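To illustrate why no modification is needed: the radix tree keys cached prefixes by token IDs, and the KV pool only stores num_kv_heads key/value vectors per token, so the query-head grouping of MQA/GQA is resolved entirely inside the attention kernel (the same holds for vLLM's paged blocks). The PyTorch-only sketch below is not SGLang internals; all names are illustrative, and causal masking is omitted for brevity.

```python
# Illustrative sketch: the KV cache (radix node or paged block) only ever holds
# num_kv_heads per token; grouping query heads onto shared KV heads happens
# inside the attention computation itself.
import torch

num_q_heads, num_kv_heads, head_dim, seq_len = 32, 8, 128, 16
group = num_q_heads // num_kv_heads  # 4 query heads share each KV head

q = torch.randn(seq_len, num_q_heads, head_dim)
# This is all the cache has to store per token:
k = torch.randn(seq_len, num_kv_heads, head_dim)
v = torch.randn(seq_len, num_kv_heads, head_dim)

# Expand KV heads to match the query heads only at attention time.
k_exp = k.repeat_interleave(group, dim=1)
v_exp = v.repeat_interleave(group, dim=1)

scores = torch.einsum("qhd,khd->hqk", q, k_exp) * head_dim**-0.5
probs = scores.softmax(dim=-1)
out = torch.einsum("hqk,khd->qhd", probs, v_exp)
print(out.shape)  # torch.Size([16, 32, 128])
```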