
Enhance FlashAttention and FlexAttention benchmarks with paged Attention #3616

Open
mfrancepillois opened this issue Mar 5, 2025 · 0 comments

mfrancepillois commented Mar 5, 2025

Paged Attention (i.e., a paged KV cache) is a technique widely used in LLM inference to improve performance by reducing memory usage.
The FlashAttention and FlexAttention benchmarks should be extended to implement paged Attention and evaluate its performance.
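For context, the sketch below illustrates the core idea behind a paged KV cache: keys/values live in fixed-size physical pages drawn from a shared pool, and a per-sequence block table maps logical token positions to pages, so memory is allocated on demand rather than reserved for the maximum sequence length. The names here (`PagedKVCache`, `page_size`, `block_table`) are hypothetical and chosen only for illustration; the actual benchmark kernels would index pages directly inside the FlashAttention/FlexAttention implementation instead of gathering into contiguous tensors.

```python
import torch

class PagedKVCache:
    """Illustrative paged KV cache: fixed-size pages plus a per-sequence block table."""

    def __init__(self, num_pages, page_size, num_heads, head_dim, dtype=torch.float16):
        self.page_size = page_size
        # Physical storage: a pool of pages shared across all sequences.
        self.k_pages = torch.zeros(num_pages, page_size, num_heads, head_dim, dtype=dtype)
        self.v_pages = torch.zeros_like(self.k_pages)
        self.free_pages = list(range(num_pages))
        # block_table[seq_id] lists the physical page ids backing that sequence, in logical order.
        self.block_table = {}

    def append(self, seq_id, k, v, pos):
        """Write the K/V vectors of one new token at logical position `pos`."""
        pages = self.block_table.setdefault(seq_id, [])
        page_idx, slot = divmod(pos, self.page_size)
        if page_idx == len(pages):          # sequence grew past its last page: allocate a new one
            pages.append(self.free_pages.pop())
        page = pages[page_idx]
        self.k_pages[page, slot] = k
        self.v_pages[page, slot] = v

    def gather(self, seq_id, seq_len):
        """Materialize contiguous K/V for attention (a real kernel would read pages in place)."""
        pages = self.block_table[seq_id]
        k = self.k_pages[pages].reshape(-1, *self.k_pages.shape[2:])[:seq_len]
        v = self.v_pages[pages].reshape(-1, *self.v_pages.shape[2:])[:seq_len]
        return k, v

# Usage: 20 decoded tokens span two 16-slot pages for sequence 0.
cache = PagedKVCache(num_pages=8, page_size=16, num_heads=4, head_dim=64)
for pos in range(20):
    k = torch.randn(4, 64, dtype=torch.float16)
    v = torch.randn(4, 64, dtype=torch.float16)
    cache.append(seq_id=0, k=k, v=v, pos=pos)
k, v = cache.gather(seq_id=0, seq_len=20)    # each of shape (20, 4, 64)
```

A benchmark built on this layout would pass the block table (rather than contiguous K/V tensors) to the attention kernel, which is where the interesting performance differences between the FlashAttention and FlexAttention paths would show up.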

@mfrancepillois mfrancepillois changed the title Enhance FlashAttention and FlexAttention benchmark with paged Attention Enhance FlashAttention and FlexAttention benchmarks with paged Attention Mar 5, 2025
@vlad-penkin vlad-penkin added this to the 4. [Performance] Core milestone Mar 5, 2025