Conversation

@Anri-Lombard
Checklist

  • Confirmed that the `cargo run-checks` command has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

#4096 (windowed attention portion)

Changes

Adds windowed self-attention module:

  • generate_sliding_window_mask() utility in mask.rs
  • WindowedAttention module with causal and bidirectional modes
  • WindowedAttentionCache - a rolling KV cache storing only the last window_size key/value pairs
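To make the mask semantics concrete, here is a minimal sketch of the sliding-window masking logic in plain Rust, independent of Burn's tensor API. The function name and boolean-matrix representation are hypothetical stand-ins for `generate_sliding_window_mask()`; the actual utility in `mask.rs` operates on tensors, and bidirectional window conventions vary (some implementations use a half-window on each side).

```rust
/// Hypothetical stand-in for generate_sliding_window_mask():
/// returns a seq_len x seq_len boolean matrix where `true` marks
/// positions that must be masked out (outside the attention window).
/// Causal mode: query i may attend to key j only if j <= i and i - j < window_size.
fn sliding_window_mask(seq_len: usize, window_size: usize, causal: bool) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                .map(|j| {
                    let in_window = if causal {
                        // Last window_size positions, including self; no future keys.
                        j <= i && i - j < window_size
                    } else {
                        // Bidirectional: window_size positions on either side (one convention).
                        i.abs_diff(j) < window_size
                    };
                    !in_window // true = masked
                })
                .collect()
        })
        .collect()
}

fn main() {
    let mask = sliding_window_mask(4, 2, true);
    // Query 2 attends to keys 1 and 2 only; key 0 is out of window, key 3 is in the future.
    assert_eq!(mask[2], vec![true, false, false, true]);
    println!("{:?}", mask);
}
```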

Still needed:

  • GQA/MQA support
  • Chunked attention for true O(n × w) compute

cc @huy209vn - I got started, feel free to add as you wish. I'll add as well when I have some time.

Testing

  • Shape tests for causal and bidirectional modes
  • Masking correctness (modifying out-of-window positions doesn't affect output)
  • Padding mask correctness
  • Cache truncation to window_size
  • Cache equivalence with non-cached forward pass
  • Invalid config panics (`d_model` not divisible by `n_heads`)
  • Display formatting

- Tried to stick to cross-attention conventions
- Still need features like GQA and MQA
@codecov

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 98.39744% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.48%. Comparing base (196ec1a) to head (fe16d36).

Files with missing lines Patch % Lines
...urn-nn/src/modules/attention/windowed_attention.rs 98.11% 5 Missing ⚠️

❌ Your project check has failed because the head coverage (68.48%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4151      +/-   ##
==========================================
+ Coverage   68.42%   68.48%   +0.05%     
==========================================
  Files        1281     1282       +1     
  Lines      157264   157576     +312     
==========================================
+ Hits       107602   107909     +307     
- Misses      49662    49667       +5     

@huy209vn
Contributor

Hey, I'm working on sliding-window attention for an audio perception project right now. I'm planning to implement optimized CubeCL kernels for this, and wanted to coordinate so we're not duplicating effort. What's the current approach?

@Anri-Lombard
Author

I'm happy to go with your approach instead if you want; you can reference this PR, and I'll check how it deviates from mine and whether mine would still add value. I'm not continuing work on it for now.

@laggui changed the title from "Initial implementation" to "WindowedAttention Initial implementation" on Dec 19, 2025