How does RadixAttention implement multi-head/multi-query/grouped-query attention? #652
How does the RadixAttention function call need to be modified in SGLang for a model implemented in vLLM, where PagedAttention takes care of the multi-query and grouped-query attention architectures?
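For context, here is a minimal sketch of the layer construction the question is about. The constructor arguments (num_heads, head_dim, scaling, num_kv_heads, layer_id) mirror the pattern used in SGLang model implementations, but the exact names and the `AttentionBlock` wrapper are assumptions, so check them against the SGLang version you run.

```python
# Sketch only: RadixAttention constructor arguments below follow the pattern
# seen in SGLang model files, but treat the exact signature as an assumption.
from sglang.srt.layers.radix_attention import RadixAttention


class AttentionBlock:
    """Hypothetical wrapper showing how per-layer head counts are declared."""

    def __init__(self, hidden_size: int, num_heads: int, num_kv_heads: int, layer_id: int):
        head_dim = hidden_size // num_heads
        # MHA: num_kv_heads == num_heads
        # GQA: 1 < num_kv_heads < num_heads (e.g. 8 KV heads for 32 query heads)
        # MQA: num_kv_heads == 1
        self.attn = RadixAttention(
            num_heads,
            head_dim,
            head_dim**-0.5,  # attention scaling factor
            num_kv_heads=num_kv_heads,
            layer_id=layer_id,
        )
```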
Answered by merrymercy, Aug 13, 2024
All of them are supported without any specific modification.
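To illustrate why no modification is needed: the radix tree keys cached prefixes by token IDs, and the KV pool only stores num_kv_heads key/value vectors per token, so the query-head grouping of MQA/GQA is resolved entirely inside the attention kernel (the same holds for vLLM's paged blocks). The PyTorch-only sketch below is not SGLang internals; all names are illustrative, and causal masking is omitted for brevity.

```python
# Illustrative sketch: the KV cache (radix node or paged block) only ever holds
# num_kv_heads per token; grouping query heads onto shared KV heads happens
# inside the attention computation itself.
import torch

num_q_heads, num_kv_heads, head_dim, seq_len = 32, 8, 128, 16
group = num_q_heads // num_kv_heads  # 4 query heads share each KV head

q = torch.randn(seq_len, num_q_heads, head_dim)
# This is all the cache has to store per token:
k = torch.randn(seq_len, num_kv_heads, head_dim)
v = torch.randn(seq_len, num_kv_heads, head_dim)

# Expand KV heads to match the query heads only at attention time.
k_exp = k.repeat_interleave(group, dim=1)
v_exp = v.repeat_interleave(group, dim=1)

scores = torch.einsum("qhd,khd->hqk", q, k_exp) * head_dim**-0.5
probs = scores.softmax(dim=-1)
out = torch.einsum("hqk,khd->qhd", probs, v_exp)
print(out.shape)  # torch.Size([16, 32, 128])
```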