Paged Attention (a paged KV cache) is a technique widely used in LLM inference to improve performance by reducing KV-cache memory usage and fragmentation. The FlashAttention and FlexAttention benchmarks should be extended to implement paged Attention and evaluate its performance; a rough sketch of the idea follows.
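Below is a minimal sketch of what a paged KV cache looks like: physical K/V storage is split into fixed-size blocks, and a per-sequence block table maps logical token positions to physical blocks. All names here (`PagedKVCache`, `block_table`, block/pool sizes) are illustrative assumptions, not the benchmarks' actual API.

```python
# Minimal paged KV cache sketch (assumed shapes and names, not the benchmark API).
import torch

BLOCK_SIZE = 16      # tokens per physical cache block (assumed)
NUM_BLOCKS = 256     # size of the shared physical block pool (assumed)
NUM_HEADS = 8
HEAD_DIM = 64


class PagedKVCache:
    def __init__(self):
        # Physical K/V storage: allocated once and shared by all sequences.
        self.k_cache = torch.zeros(NUM_BLOCKS, BLOCK_SIZE, NUM_HEADS, HEAD_DIM)
        self.v_cache = torch.zeros(NUM_BLOCKS, BLOCK_SIZE, NUM_HEADS, HEAD_DIM)
        self.free_blocks = list(range(NUM_BLOCKS))
        self.block_tables = {}   # seq_id -> list of physical block ids
        self.seq_lens = {}       # seq_id -> number of cached tokens

    def append(self, seq_id, k, v):
        """Append one token's K/V (shape [NUM_HEADS, HEAD_DIM]) for a sequence."""
        table = self.block_tables.setdefault(seq_id, [])
        pos = self.seq_lens.get(seq_id, 0)
        if pos % BLOCK_SIZE == 0:            # current block is full: grab a new one
            table.append(self.free_blocks.pop())
        block, offset = table[pos // BLOCK_SIZE], pos % BLOCK_SIZE
        self.k_cache[block, offset] = k
        self.v_cache[block, offset] = v
        self.seq_lens[seq_id] = pos + 1

    def gather(self, seq_id):
        """Gather a sequence's K/V into contiguous tensors for the attention kernel."""
        table = torch.tensor(self.block_tables[seq_id])
        n = self.seq_lens[seq_id]
        k = self.k_cache[table].reshape(-1, NUM_HEADS, HEAD_DIM)[:n]
        v = self.v_cache[table].reshape(-1, NUM_HEADS, HEAD_DIM)[:n]
        return k, v


cache = PagedKVCache()
for _ in range(20):                          # cache 20 tokens for sequence 0
    cache.append(0, torch.randn(NUM_HEADS, HEAD_DIM), torch.randn(NUM_HEADS, HEAD_DIM))
k, v = cache.gather(0)
print(k.shape, v.shape)                      # torch.Size([20, 8, 64]) each
```

In a real paged-attention benchmark the `gather` step would not materialize contiguous tensors; instead the attention kernel (FlashAttention or FlexAttention) would consume the block table directly and index the physical blocks inside the kernel, which is where the performance measurement becomes interesting.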
mfrancepillois changed the title from "Enhance FlashAttention and FlexAttention benchmark with paged Attention" to "Enhance FlashAttention and FlexAttention benchmarks with paged Attention" on Mar 5, 2025.