ROCm - Performance gain with PYTORCH_TUNABLEOP #4056
NTFSynergy started this conversation in Ideas
1 reply:

Good find. Since you're the most up to date on this, care to do the wiki updates?
As reported in ROCm/issues/5040, some consumer-grade AMD GPUs show degraded performance with the current stable ROCm release (6.4.2) combined with nightly PyTorch 2.9. This will persist until AMD ships a patch, which might take a while.
As far as I can tell, the "Tunable ops" toggle in Settings/Models & Loading controls the PYTORCH_TUNABLEOP feature (correct me if I'm wrong). Could this issue and the possible workaround be mentioned in the ROCm section of the wiki? Maybe the toggle could even be enabled by default on affected hardware in the dev branch?
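For anyone who wants to try the workaround outside the UI toggle, here is a minimal sketch that builds an environment with PyTorch TunableOp switched on. TunableOp benchmarks the available GEMM implementations for each matrix shape and keeps the fastest, so the first run with new shapes is slow. It uses the documented `PYTORCH_TUNABLEOP_ENABLED`, `PYTORCH_TUNABLEOP_TUNING` and `PYTORCH_TUNABLEOP_FILENAME` variables. The helper name `tunableop_env` is hypothetical, and it assumes the app does not override these variables itself.

```python
import os


def tunableop_env(base=None, tuning=True, results_file=None):
    """Return a copy of an environment mapping with PyTorch TunableOp switched on.

    Hypothetical helper. The variables should be visible to the process before
    torch is imported, so pass the result to a launcher or subprocess.
    """
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base is None else base)
    # Master switch for TunableOp.
    env["PYTORCH_TUNABLEOP_ENABLED"] = "1"
    # 1 = benchmark GEMM shapes that have no recorded result yet;
    # 0 = only reuse results already stored in the results file.
    env["PYTORCH_TUNABLEOP_TUNING"] = "1" if tuning else "0"
    # Optional CSV file where tuned results are written and read back.
    if results_file is not None:
        env["PYTORCH_TUNABLEOP_FILENAME"] = results_file
    return env


if __name__ == "__main__":
    env = tunableop_env(results_file="tunableop_results.csv")
    for key in sorted(k for k in env if k.startswith("PYTORCH_TUNABLEOP_")):
        print(f"{key}={env[key]}")
```

The returned mapping can be passed as `env=` to `subprocess.run` when starting the app. Inside a script, setting the same keys in `os.environ` before `import torch` should work as well, and recent PyTorch versions also expose `torch.cuda.tunable.enable(True)`. Once the results file exists, setting `tuning=False` reuses the recorded choices without re-benchmarking.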
Only some AMD GPUs may be affected, but the 9070/XT is affected for sure. With this feature enabled, my average speed with SDXL at 1024x1024 goes from 1.4-1.6 it/s (stock) to 3.0-3.2 it/s (3.7-3.9 it/s with HiDiffusion).
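To put the SDXL numbers above in perspective, this small sketch computes the worst- and best-case speedup between the stock and TunableOp ranges (roughly 1.9x to 2.3x). The HiDiffusion figure is left out because no stock HiDiffusion baseline was measured. The function name `speedup_range` is just for illustration.

```python
def speedup_range(before, after):
    """Worst- and best-case speedup between two (low, high) it/s ranges."""
    before_lo, before_hi = before
    after_lo, after_hi = after
    # Worst case: slowest "after" vs fastest "before"; best case: the reverse.
    return after_lo / before_hi, after_hi / before_lo


if __name__ == "__main__":
    # Figures from this post: stock vs TunableOp, SDXL at 1024x1024.
    worst, best = speedup_range((1.4, 1.6), (3.0, 3.2))
    print(f"TunableOp speedup: {worst:.2f}x to {best:.2f}x")
```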
There might be some performance gains on the 7000 series as well. Unfortunately, I can't test this on the current stable PyTorch (2.7), because the 9070 XT requires PyTorch nightly.
As stated in the ROCm issue above, this might cause abnormal behavior (distortions) when using LoRAs; however, I can't reproduce that.