ScottTodd
Member

@ScottTodd ScottTodd commented Sep 9, 2025

Motivation

Fixes #1040, enabling aotriton for flash attention in pytorch (if it works). This is expected to improve performance in workloads like ComfyUI image generation by roughly 60% (e.g. 12.6 it/s to 20.0 it/s, about a 1.59x speedup).

Technical Details

Follow-up to #1432 and depends on pytorch/pytorch#162330.

Note that support is experimental for some GPUs like gfx1100, so the TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 environment variable may be needed to try aotriton on those systems.
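
A minimal sketch of opting in from Python, assuming a ROCm build of PyTorch is installed. Only the variable name comes from this PR; the tensor shapes and dtype are arbitrary illustration values, and setting the variable before `import torch` is a conservative choice rather than a documented requirement.

```python
import os

# Opt in to experimental AOTriton kernels (e.g. on gfx1100). This is set
# before importing torch because this sketch makes no assumption about when
# PyTorch reads the variable.
os.environ["TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL"] = "1"

try:
    import torch
    import torch.nn.functional as F
except ImportError:
    torch = None  # PyTorch is not installed; nothing further to try.

if torch is not None and torch.cuda.is_available():
    # Arbitrary small shapes: (batch, heads, sequence length, head dim).
    q, k, v = (
        torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
        for _ in range(3)
    )
    out = F.scaled_dot_product_attention(q, k, v)
    print(out.shape)
```

Setting the variable in the shell before launching Python should work just as well; the in-process form above is only convenient for quick experiments.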

Test Plan

Trigger either https://github.com/ROCm/TheRock/actions/workflows/build_windows_pytorch_wheels.yml or https://github.com/ROCm/TheRock/actions/workflows/release_windows_pytorch_wheels.yml across the matrix of GPU families once that PyTorch PR is merged.

We're still going to need automated tests and documentation for this. I'd like numerics tests running somewhere and documentation that shows how to check which pytorch features are enabled in the wheels that a user installs.
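
As a starting point for that documentation, here is a hedged sketch of a feature check. The helper name `describe_torch_build` is hypothetical. Note that `flash_sdp_enabled()` and `mem_efficient_sdp_enabled()` report whether those SDPA backends are currently allowed, not whether the kernels were compiled in; `is_flash_attention_available()` is the build-time check, and it is guarded here because older wheels may not have it.

```python
import importlib.util


def describe_torch_build():
    """Return a dict describing the installed PyTorch build, or None if PyTorch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    info = {
        "torch": torch.__version__,
        "hip": torch.version.hip,  # None on non-ROCm builds
        "gpu_available": torch.cuda.is_available(),
        # Runtime toggles: whether these SDPA backends are allowed to be used.
        "flash_sdp_enabled": torch.backends.cuda.flash_sdp_enabled(),
        "mem_efficient_sdp_enabled": torch.backends.cuda.mem_efficient_sdp_enabled(),
    }
    # Build-time check, guarded because older wheels may not expose it.
    check = getattr(torch.backends.cuda, "is_flash_attention_available", None)
    if check is not None:
        info["flash_attention_built"] = check()
    return info


print(describe_torch_build())
```

This does not replace numerics tests; it only reports what the installed wheel claims to support.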

Test Result

Test runs:

(may need to retrigger to pick up fixes for flaky checkouts)

Submission Checklist

@jammm jammm self-requested a review September 10, 2025 11:53
@Nem404

Nem404 commented Sep 10, 2025

Eyy "jammm self-requested a review" that's what I'm talkin' 'bout 🔥

Now we just wait for Jeff in pytorch/pytorch#162330 👀

@Nem404

Nem404 commented Sep 11, 2025

@ScottTodd
Member Author

pytorch/pytorch@62843c1 is merged. Triggering some test runs on this.

@ScottTodd
Member Author

Test runs built successfully. Sanity check tests did not run for unrelated reasons. I'd say this is ready for review/merge now.

@ScottTodd ScottTodd marked this pull request as ready for review September 12, 2025 04:02
@Nem404

Nem404 commented Sep 12, 2025

Waiting for @jammm's approval - it looks like tomorrow's wheels can include aotriton 💯

@jammm
Contributor

jammm commented Sep 12, 2025

Already approved :)

@Nem404

Nem404 commented Sep 12, 2025

> Already approved :)

Oh, then for Scott to merge :D

@0xDELUXA

0xDELUXA commented Sep 12, 2025

Just wanted to share that I got my hands on a PyTorch wheel built with AOTriton.

On gfx1200, we also need to set the TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 env var - otherwise it only shows a warning and doesn’t use SDPA.

But that’s not really an issue, because once we set it, performance basically doubles. It's really awesome.
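
For anyone who wants to reproduce a comparison like this, a rough sketch of a throughput measurement follows. The helper `iterations_per_second` and the tensor shapes are hypothetical, not the benchmark used for the numbers above. One way to use it is to run the script twice in separate processes, once with `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` set and once without, and compare the printed rates.

```python
import time


def iterations_per_second(fn, warmup=3, iters=20, sync=None):
    """Time `fn` over `iters` calls and return calls per second.

    `sync` is an optional callable (e.g. torch.cuda.synchronize) that waits
    for queued GPU work to finish so the timing covers real execution.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return iters / max(elapsed, 1e-9)  # guard against a zero-length interval


try:
    import torch
    import torch.nn.functional as F
except ImportError:
    torch = None  # PyTorch is not installed; only the helper is usable.

if torch is not None and torch.cuda.is_available():
    # Arbitrary shapes: (batch, heads, sequence length, head dim).
    q, k, v = (
        torch.randn(2, 16, 1024, 64, device="cuda", dtype=torch.float16)
        for _ in range(3)
    )
    rate = iterations_per_second(
        lambda: F.scaled_dot_product_attention(q, k, v),
        sync=torch.cuda.synchronize,
    )
    print(f"{rate:.1f} it/s")
```

An isolated attention call will not match end-to-end it/s in an application like ComfyUI, so treat the result as a relative indicator only.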

@ScottTodd ScottTodd merged commit 5996147 into main Sep 12, 2025
11 of 13 checks passed
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Sep 12, 2025
@ScottTodd ScottTodd deleted the users/scotttodd/torch-windows-aotriton-flip branch September 12, 2025 14:36
Successfully merging this pull request may close these issues.

[Windows] Enable [ao]triton in PyTorch wheels
4 participants