
Enhance FlashAttention and FlexAttention benchmarks with paged Attention #3616

Open
mfrancepillois opened this issue Mar 5, 2025 · 0 comments

mfrancepillois commented Mar 5, 2025

Paged Attention (i.e., a paged KV cache) is a technique widely used in LLM inference to improve performance by reducing memory usage.
The FlashAttention and FlexAttention benchmarks should be extended to implement paged Attention and evaluate its performance.
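For context, the sketch below illustrates the core idea behind a paged KV cache: keys/values live in fixed-size physical pages drawn from a shared pool, and a per-sequence block table maps logical token positions to pages, so memory is allocated on demand rather than reserved for the maximum sequence length. The names here (`PagedKVCache`, `page_size`, `block_table`) are hypothetical and chosen only for illustration; the actual benchmark kernels would index pages directly inside the FlashAttention/FlexAttention implementation instead of gathering into contiguous tensors.

```python
import torch

class PagedKVCache:
    """Illustrative paged KV cache: fixed-size pages plus a per-sequence block table."""

    def __init__(self, num_pages, page_size, num_heads, head_dim, dtype=torch.float16):
        self.page_size = page_size
        # Physical storage: a pool of pages shared across all sequences.
        self.k_pages = torch.zeros(num_pages, page_size, num_heads, head_dim, dtype=dtype)
        self.v_pages = torch.zeros_like(self.k_pages)
        self.free_pages = list(range(num_pages))
        # block_table[seq_id] lists the physical page ids backing that sequence, in logical order.
        self.block_table = {}

    def append(self, seq_id, k, v, pos):
        """Write the K/V vectors of one new token at logical position `pos`."""
        pages = self.block_table.setdefault(seq_id, [])
        page_idx, slot = divmod(pos, self.page_size)
        if page_idx == len(pages):          # sequence grew past its last page: allocate a new one
            pages.append(self.free_pages.pop())
        page = pages[page_idx]
        self.k_pages[page, slot] = k
        self.v_pages[page, slot] = v

    def gather(self, seq_id, seq_len):
        """Materialize contiguous K/V for attention (a real kernel would read pages in place)."""
        pages = self.block_table[seq_id]
        k = self.k_pages[pages].reshape(-1, *self.k_pages.shape[2:])[:seq_len]
        v = self.v_pages[pages].reshape(-1, *self.v_pages.shape[2:])[:seq_len]
        return k, v

# Usage: 20 decoded tokens span two 16-slot pages for sequence 0.
cache = PagedKVCache(num_pages=8, page_size=16, num_heads=4, head_dim=64)
for pos in range(20):
    k = torch.randn(4, 64, dtype=torch.float16)
    v = torch.randn(4, 64, dtype=torch.float16)
    cache.append(seq_id=0, k=k, v=v, pos=pos)
k, v = cache.gather(seq_id=0, seq_len=20)    # each of shape (20, 4, 64)
```

A benchmark built on this layout would pass the block table (rather than contiguous K/V tensors) to the attention kernel, which is where the interesting performance differences between the FlashAttention and FlexAttention paths would show up.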

@mfrancepillois mfrancepillois changed the title Enhance FlashAttention and FlexAttention benchmark with paged Attention Enhance FlashAttention and FlexAttention benchmarks with paged Attention Mar 5, 2025
@vlad-penkin vlad-penkin added this to the 4. [Performance] Core milestone Mar 5, 2025