
MagellaX (Owner) commented on Oct 7, 2025

Summary by cubic

Adds a single-sweep Triton backward pass for attention (dQ/dK/dV computed from the saved row-wise log-sum-exp, LSE, when dropout_p == 0) and makes the Triton forward feature-complete with attention masks, dropout, deterministic seeding, and ALiBi. This reduces fallbacks to SDPA and makes training and inference behavior reproducible.
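
The LSE-based backward rests on a standard identity: if the forward saves each query row's log-sum-exp, the backward can rebuild the softmax probabilities as P = exp(S - LSE) instead of storing the M x N probability matrix. The sketch below is a minimal single-head NumPy reference of that math, assuming dropout_p == 0; it is not the PR's Triton kernel, and the names attn_forward and attn_backward are hypothetical. A fused kernel would typically tile the same computation over blocks of keys and values.

```python
import numpy as np


def attn_forward(q, k, v, scale):
    """Hypothetical reference forward: returns output O and per-row LSE."""
    s = (q @ k.T) * scale                           # [M, N] scaled scores
    m = s.max(axis=1, keepdims=True)                # row max for numerical stability
    lse = m + np.log(np.exp(s - m).sum(axis=1, keepdims=True))  # [M, 1]
    p = np.exp(s - lse)                             # softmax probabilities
    return p @ v, lse


def attn_backward(q, k, v, o, lse, do, scale):
    """Hypothetical reference backward (dropout_p == 0 only).

    P is recomputed from the saved LSE rather than stored by the forward.
    """
    s = (q @ k.T) * scale
    p = np.exp(s - lse)                             # recompute softmax from LSE
    dv = p.T @ do                                   # dV = P^T dO
    dp = do @ v.T                                   # dP = dO V^T
    # D_i = rowsum(dO_i * O_i), which equals rowsum(P_i * dP_i) because O = P V
    delta = (do * o).sum(axis=1, keepdims=True)
    ds = p * (dp - delta)                           # row-wise softmax Jacobian
    dq = (ds @ k) * scale                           # S = scale * Q K^T
    dk = (ds.T @ q) * scale
    return dq, dk, dv
```

The delta term is why only O and LSE need to survive the forward: rowsum(dO * O) replaces rowsum(P * dP), so no M x N tensor has to be saved. The gradients can be checked against central finite differences of sum(O * dO).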

  • New Features
    • Streaming Triton backward when dropout_p == 0; falls back to SDPA if training with dropout.
    • Deterministic mode: set_deterministic(enabled, seed) seeds Philox for reproducible dropout/mask sampling.
    • Forward supports boolean/additive attn_mask, dropout, and ALiBi bias directly in Triton (no SDPA fallback needed).
    • Expanded forward signature: forward(..., dropout_p, alibi_slopes, deterministic); added mask preparation for [B, H, M, N] masks.
    • Added tests for mask parity, dropout determinism, and backward parity (with mask + ALiBi); updated docs/README usage.
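
The forward features listed above compose as additive biases on the score matrix followed by dropout on the probabilities. The NumPy sketch below is a reference under several assumptions, not the PR's code: boolean masks follow the PyTorch SDPA convention (True means the position may be attended), ALiBi uses the symmetric -slope * |i - j| form, and NumPy's Philox bit generator stands in for the kernel's Philox RNG. The helper names (prepare_additive_mask, alibi_bias, dropout_keep_mask, attention_forward) and the seed parameter are hypothetical.

```python
import numpy as np


def prepare_additive_mask(attn_mask, B, H, M, N, dtype=np.float32):
    """Hypothetical helper: boolean or additive mask -> additive bias [B, H, M, N].

    Assumes the PyTorch SDPA boolean convention: True = may attend.
    """
    if attn_mask is None:
        return np.zeros((B, H, M, N), dtype=dtype)
    attn_mask = np.asarray(attn_mask)
    if attn_mask.dtype == np.bool_:
        bias = np.where(attn_mask, 0.0, -np.inf).astype(dtype)
    else:
        bias = attn_mask.astype(dtype)
    # Broadcast e.g. [M, N] or [B, 1, M, N] up to the full [B, H, M, N] shape.
    return np.broadcast_to(bias, (B, H, M, N))


def alibi_bias(slopes, M, N, dtype=np.float32):
    """Hypothetical helper: ALiBi bias [H, M, N] = -slope_h * |i - j| (assumed form)."""
    i = np.arange(M)[:, None]
    j = np.arange(N)[None, :]
    dist = np.abs(i - j)
    return (-np.asarray(slopes, dtype=dtype)[:, None, None] * dist).astype(dtype)


def dropout_keep_mask(shape, p, seed):
    """Hypothetical helper: reproducible keep-mask from a Philox stream seeded by `seed`."""
    rng = np.random.Generator(np.random.Philox(seed))
    return rng.random(shape) >= p                   # keep with probability 1 - p


def attention_forward(q, k, v, attn_mask=None, alibi_slopes=None,
                      dropout_p=0.0, seed=0):
    """Hypothetical reference forward for q [B, H, M, D], k/v [B, H, N, D]."""
    B, H, M, D = q.shape
    N = k.shape[2]
    s = (q @ np.swapaxes(k, -1, -2)) / np.sqrt(D)   # [B, H, M, N] scores
    s = s + prepare_additive_mask(attn_mask, B, H, M, N, s.dtype)
    if alibi_slopes is not None:
        s = s + alibi_bias(alibi_slopes, M, N, s.dtype)  # broadcasts over B
    # Numerically stable softmax; a row masked everywhere would become NaN here.
    s = s - s.max(axis=-1, keepdims=True)
    p = np.exp(s)
    p = p / p.sum(axis=-1, keepdims=True)
    if dropout_p > 0.0:
        keep = dropout_keep_mask(p.shape, dropout_p, seed)
        p = np.where(keep, p / (1.0 - dropout_p), 0.0)  # inverted dropout scaling
    return p @ v
```

Calling attention_forward twice with the same seed and dropout_p returns bit-identical outputs, which is the property a set_deterministic(enabled, seed) switch is meant to provide. A boolean mask and its 0/-inf additive equivalent also produce matching outputs, mirroring the kind of mask-parity check described above.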

cubic-dev-ai (bot, Contributor) left a comment


No issues found across 4 files

MagellaX merged commit ced9a28 into main on Oct 7, 2025
4 checks passed