[torch] Flip `--enable-pytorch-flash-attention-windows` for releases. #1437

ScottTodd · 2025-09-09T20:19:08Z

Motivation

Fixes #1040, enabling aotriton for flash attention in pytorch (if it works). This is expected to improve performance in workloads like ComfyUI image generation by roughly 60% (e.g. 12.6 it/s to 20.0 it/s).

Technical Details

Follow-up to #1432 and depends on pytorch/pytorch#162330.

Note that support is experimental for some GPUs like gfx1100, so the TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 environment variable may be needed to try aotriton on those systems.
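As a rough illustration of the note above, the variable can be set from Python before PyTorch is imported, followed by a quick look at what the installed wheel reports. This is a minimal sketch, not part of this PR: the helper name `describe_flash_attention` is hypothetical, and setting the variable before `import torch` is a precaution (the PR does not say when the variable is read).

```python
import os

# Assumption: set the variable before importing torch to be safe; this PR
# does not state at which point PyTorch reads it.
os.environ.setdefault("TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL", "1")


def describe_flash_attention():
    """Report what the installed PyTorch says about flash attention (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None on builds that are not ROCm builds.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
        # Runtime toggle only: True does not prove the kernels were built in.
        "flash_sdp_enabled": torch.backends.cuda.flash_sdp_enabled(),
    }


if __name__ == "__main__":
    for key, value in describe_flash_attention().items():
        print(f"{key}: {value}")
```

Setting the variable in the shell before launching Python works equally well and avoids any question about import order.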

Test Plan

Trigger either https://github.com/ROCm/TheRock/actions/workflows/build_windows_pytorch_wheels.yml or https://github.com/ROCm/TheRock/actions/workflows/release_windows_pytorch_wheels.yml across the matrix of GPU families once that PyTorch PR is merged.

We're still going to need automated tests and documentation for this. I'd like numerics tests running somewhere and documentation that shows how to check which pytorch features are enabled in the wheels that a user installs.
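One possible shape for such a numerics test is sketched below: compare `torch.nn.functional.scaled_dot_product_attention` against a plain float32 reference and report the largest absolute difference. The function name, tensor shapes, and any pass/fail tolerance are assumptions for illustration, not something this PR adds.

```python
import math


def sdpa_max_abs_error(seq_len=128, head_dim=64):
    """Return max |SDPA - reference|, or None if PyTorch is not installed."""
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return None

    use_gpu = torch.cuda.is_available()
    device = "cuda" if use_gpu else "cpu"
    # Half precision on GPU so a flash kernel is eligible; float32 on CPU.
    dtype = torch.float16 if use_gpu else torch.float32

    torch.manual_seed(0)
    # Shape: (batch, heads, sequence, head_dim)
    q, k, v = (
        torch.randn(1, 4, seq_len, head_dim, device=device, dtype=dtype)
        for _ in range(3)
    )

    fused = F.scaled_dot_product_attention(q, k, v)

    # Reference: softmax(Q K^T / sqrt(d)) V, computed in float32.
    qf, kf, vf = q.float(), k.float(), v.float()
    weights = torch.softmax(qf @ kf.transpose(-2, -1) / math.sqrt(head_dim), dim=-1)
    reference = weights @ vf

    return (fused.float() - reference).abs().max().item()


if __name__ == "__main__":
    print("max abs error:", sdpa_max_abs_error())
```

This sketch does not confirm which backend PyTorch selected, so a real test would likely also pin the flash backend and choose a tolerance per dtype.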

Test Result

Test runs:

https://github.com/ROCm/TheRock/actions/runs/17660396787 using this branch and 7.0.0rc20250908 for gfx110X-dgpu
~~https://github.com/ROCm/TheRock/actions/runs/17660456285 using the branch and 7.0.0rc20250908 for gfx1151~~
https://github.com/ROCm/TheRock/actions/runs/17662170140 using the branch and 7.0.0rc20250908 for gfx1151
- Tests not running should be fixed by #1469 ("Use cloudfront_staging_url for Windows pytorch testing workflow").

(may need to retrigger to pick up fixes for flaky checkouts)

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Nem404 · 2025-09-10T12:08:59Z

Eyy "jammm self-requested a review" that's what I'm talkin' 'bout 🔥

Now we just wait for Jeff in pytorch/pytorch#162330 👀

Nem404 · 2025-09-11T15:40:25Z

pytorch/pytorch#162330 (comment) 🎉

ScottTodd · 2025-09-12T00:04:28Z

pytorch/pytorch@62843c1 is merged. Triggering some test runs on this.

ScottTodd · 2025-09-12T04:02:54Z

Test runs built successfully. Sanity check tests did not run for unrelated reasons. I'd say this is ready for review/merge now.

Nem404 · 2025-09-12T07:29:59Z

Waiting for @jammm's approval - it looks like tomorrow's wheels can include aotriton 💯

jammm · 2025-09-12T07:33:25Z

Already approved :)

Nem404 · 2025-09-12T07:41:20Z

> Already approved :)

Oh, then for Scott to merge :D

0xDELUXA · 2025-09-12T07:48:22Z

Just wanted to share that I got my hands on a PyTorch wheel built with AOTriton.

On gfx1200, we also need to set the TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 env var - otherwise it only shows a warning and doesn’t use SDPA.

But that’s not really an issue, because once we set it, performance basically doubles. It's really awesome.

[torch] Flip --enable-pytorch-flash-attention-windows for releases.

9e1f84e

github-project-automation bot added this to TheRock Triage Sep 9, 2025

github-project-automation bot moved this to TODO in TheRock Triage Sep 9, 2025

Mark flash attention as supported in docs.

1f267ff

jammm self-requested a review September 10, 2025 11:53

jammm approved these changes Sep 10, 2025

Merge branch 'main' into users/scotttodd/torch-windows-aotriton-flip

603bdc3

ScottTodd marked this pull request as ready for review September 12, 2025 04:02

ScottTodd merged commit 5996147 into main Sep 12, 2025
11 of 13 checks passed

github-project-automation bot moved this from TODO to Done in TheRock Triage Sep 12, 2025

ScottTodd deleted the users/scotttodd/torch-windows-aotriton-flip branch September 12, 2025 14:36

Nem404 mentioned this pull request Sep 12, 2025

[Windows] Enable [ao]triton in PyTorch wheels #1040

Closed

jammm mentioned this pull request Sep 18, 2025

[Issue]: rocm7rc crashes (OOM) on Windows due to lack of vram management and recycling mechanism #1494

Open
