[torch] Flip --enable-pytorch-flash-attention-windows for releases. #1437
Conversation
Eyy "jammm self-requested a review" that's what I'm talkin' 'bout 🔥 Now we just wait for Jeff in pytorch/pytorch#162330 👀
pytorch/pytorch@62843c1 is merged. Triggering some test runs on this.
Test runs built successfully. Sanity check tests did not run for unrelated reasons. I'd say this is ready for review/merge now.
Waiting for @jammm's approval - it looks like tomorrow's wheels can include aotriton 💯
Already approved :)
Oh, then for Scott to merge :D
Just wanted to share that I got my hands on a PyTorch wheel built with AOTriton. On gfx1200, we also need to set the But that’s not really an issue, because once we set it, performance basically doubles. It's really awesome.
Motivation
Fixes #1040, enabling aotriton for flash attention in pytorch (if it works). This is expected to improve performance in workloads like ComfyUI image generation by roughly 60% (e.g. 12.6 it/s to 20.0 it/s).
Technical Details
Follow-up to #1432 and depends on pytorch/pytorch#162330.
Note that support is experimental for some GPUs like gfx1100, so the TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 environment variable may be needed to try aotriton on those systems.
Test Plan
Trigger either https://github.com/ROCm/TheRock/actions/workflows/build_windows_pytorch_wheels.yml or https://github.com/ROCm/TheRock/actions/workflows/release_windows_pytorch_wheels.yml across the matrix of GPU families once that PyTorch PR is merged.
We're still going to need automated tests and documentation for this. I'd like numerics tests running somewhere and documentation that shows how to check which pytorch features are enabled in the wheels that a user installs.
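Until that documentation exists, a rough sketch of such a check might look like the following. It is not part of this PR: the helper name `report_attention_features` is hypothetical, and setting TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL before importing torch is an assumption made to be safe. Note that `flash_sdp_enabled()` only reports whether the flash backend is toggled on in the current process, so treat the output as a hint rather than proof that aotriton kernels were built into the wheel.

```python
import importlib.util
import os

# Assumption: opt in to experimental aotriton support (e.g. on gfx1100)
# before torch is imported, so the flag is visible for the whole process.
os.environ.setdefault("TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL", "1")


def report_attention_features():
    """Return a dict describing attention-related features of the installed wheel."""
    if importlib.util.find_spec("torch") is None:
        # No torch wheel installed in this environment.
        return {"torch_installed": False}

    import torch

    # Newer PyTorch releases expose a build-level check; fall back to None
    # when the installed version does not provide it.
    is_flash_built = getattr(torch.backends.cuda, "is_flash_attention_available", None)

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # A version string on ROCm builds, None on other builds.
        "hip_version": torch.version.hip,
        "gpu_available": torch.cuda.is_available(),
        # Whether the flash / memory-efficient SDPA backends are toggled on.
        "flash_sdp_enabled": torch.backends.cuda.flash_sdp_enabled(),
        "mem_efficient_sdp_enabled": torch.backends.cuda.mem_efficient_sdp_enabled(),
        "flash_attention_built": is_flash_built() if callable(is_flash_built) else None,
    }


if __name__ == "__main__":
    for key, value in report_attention_features().items():
        print(f"{key}: {value}")
```

Running this inside the virtual environment where the wheel was installed prints one line per feature, which could be pasted into a bug report alongside the GPU family.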
Test Result
Test runs:
7.0.0rc20250908 for gfx110X-dgpu
https://github.com/ROCm/TheRock/actions/runs/17660456285 using the branch and 7.0.0rc20250908 for gfx1151
7.0.0rc20250908 for gfx1151 (may need to retrigger to pick up fixes for flaky checkouts)
Submission Checklist