
Commit da11d1b

Bump v2.6.0
1 parent d0787ac · commit da11d1b


3 files changed: 8 additions, 3 deletions


README.md

Lines changed: 5 additions & 0 deletions
@@ -314,6 +314,11 @@ Implement deterministic backward pass. Thanks to engineers from [Meituan](www.me
 Support paged KV cache (i.e., [PagedAttention](https://arxiv.org/abs/2309.06180)).
 Thanks to @beginlner for this contribution.
 
+### 2.6: Softcapping.
+
+Support attention with softcapping, as used in Gemma-2 and Grok models.
+Thanks to @Narsil for this contribution.
+
 ## Performance
 
 We present expected speedup (combined forward + backward pass) and memory savings from using FlashAttention against PyTorch standard attention, depending on sequence length, on different GPUs (speedup depends on memory bandwidth - we see more speedup on slower GPU memory).
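For context, softcapping bounds the pre-softmax attention scores with a scaled tanh, as used in Gemma-2 and Grok. A naive PyTorch sketch of the computation (illustrative only, not the fused CUDA kernel; the cap value and helper names are hypothetical):

```python
import torch

def softcap(scores: torch.Tensor, cap: float) -> torch.Tensor:
    # Logit softcapping: smoothly bound scores to (-cap, cap).
    return cap * torch.tanh(scores / cap)

def attention_with_softcap(q, k, v, cap: float = 30.0):
    # q, k, v: (batch, nheads, seqlen, headdim); naive reference attention.
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale
    scores = softcap(scores, cap)  # the behavior added in this release
    probs = torch.softmax(scores, dim=-1)
    return torch.einsum("bhqk,bhkd->bhqd", probs, v)
```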

flash_attn/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-__version__ = "2.5.9.post1"
+__version__ = "2.6.0"
 
 from flash_attn.flash_attn_interface import (
     flash_attn_func,
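The new feature is reached through the existing Python entry points re-exported here. A minimal usage sketch, assuming a `softcap` keyword argument is exposed on `flash_attn_func` in this release (with `0.0` keeping the previous behavior):

```python
import torch
from flash_attn import flash_attn_func

# (batch, seqlen, nheads, headdim) half-precision tensors on the GPU.
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# softcap > 0.0 applies tanh softcapping to the attention scores (assumed keyword).
out = flash_attn_func(q, k, v, causal=True, softcap=30.0)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```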

training/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -85,7 +85,7 @@ RUN pip install transformers==4.25.1 datasets==2.8.0 pytorch-lightning==1.8.6 tr
 RUN pip install git+https://github.com/mlcommons/[email protected]
 
 # Install FlashAttention
-RUN pip install flash-attn==2.5.9.post1
+RUN pip install flash-attn==2.6.0
 
 # Install CUDA extensions for fused dense
-RUN pip install git+https://github.com/HazyResearch/flash-attention@v2.5.9.post1#subdirectory=csrc/fused_dense_lib
+RUN pip install git+https://github.com/HazyResearch/flash-attention@v2.6.0#subdirectory=csrc/fused_dense_lib
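A quick sanity check after rebuilding the image, to confirm the pinned version is what actually got installed (a hypothetical verification step, not part of the Dockerfile):

```python
import flash_attn
print(flash_attn.__version__)  # expected: 2.6.0

# The core kernel entry point should import cleanly as well.
from flash_attn import flash_attn_func
```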
