Replies: 13 comments
-
There are many attention methods available. If this is another one, we can look at it, but please post a link to a page that talks about it, not a command-line argument from some other app, as I don't know what that is.
-
How can I see them?
-
ComfyUI has this option if you pass `--use-quad-cross-attention` on the command line.
-
We have dynamic attention instead. Set the attention method to Scaled Dot Product and enable Dynamic Attention in the SDP options. I implemented quad attention for the new backend in the past but never committed it, because it used more VRAM than dynamic attention and ran 2-3 times slower.
-
Oh, I see. Which options are known to run fastest? I have 16 GB of VRAM and it only uses 8 GB during generation.
-
Quad and dynamic attention trade performance (they run slower) for lower VRAM usage. Scaled dot product is always faster unless you are using an edge-case device. We also enable balanced offload by default; if you have enough VRAM, you can disable all offloading by setting the offload mode to none. You can also use Flash Attention, but it is trickier to set up: https://vladmandic.github.io/sdnext-docs/ZLUDA/#how-to-enable-triton
-
Thanks. It seems the default settings are best, but it uses 8-9 GB. When it uses 15.9 GB it is slower, and when it uses 8 GB it is faster. Why is that? Can I limit SD.Next to 15 GB of VRAM and leave 1 GB for the PC?
-
Dynamic attention has slice rate sliders.
-
Dynamic Attention slicing rate in GB
-
If the estimated VRAM usage of the scaled dot product operation is higher than the trigger rate, it will start slicing the operation until the estimated VRAM usage per slice goes below the slice rate.
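Slicing does not change the result, because each query row's attention output depends only on that row. A minimal pure-Python sketch of this idea follows; it is not SD.Next's actual code, and the names `attention` and `sliced_attention` are hypothetical. In a vectorized GPU implementation, each chunk would only materialize a (chunk rows x key tokens) score matrix instead of the full one, which is where the VRAM saving comes from.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over one row of attention scores
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    # Plain scaled dot product attention: softmax(Q K^T / sqrt(d)) V
    # q: list of query vectors; k, v: lists of key and value vectors
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def sliced_attention(q, k, v, slices):
    # Hypothetical helper: split the queries into `slices` chunks, run
    # attention on each chunk against all keys, and concatenate the outputs
    step = math.ceil(len(q) / slices)
    out = []
    for start in range(0, len(q), step):
        out.extend(attention(q[start:start + step], k, v))
    return out


if __name__ == "__main__":
    random.seed(0)
    d = 4
    q = [[random.random() for _ in range(d)] for _ in range(6)]
    k = [[random.random() for _ in range(d)] for _ in range(5)]
    v = [[random.random() for _ in range(d)] for _ in range(5)]
    full = attention(q, k, v)
    sliced = sliced_attention(q, k, v, slices=3)
    # Slicing changes peak memory in a real implementation, not the output
    print(all(math.isclose(a, b)
              for ra, rb in zip(full, sliced) for a, b in zip(ra, rb)))
```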
-
So if I have 16 GB of VRAM, how does it work? Do I need to put 16 or 15?
-
This only counts the VRAM usage of the scaled dot product operation; it doesn't care about total VRAM usage. Try a 4 GB trigger rate and a 2 GB slice rate.
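The trigger/slice decision can be sketched as below, assuming the estimate is just the size of the attention score matrix (batch x heads x query tokens x key tokens x bytes per element) and that slices are doubled until each fits under the slice rate. Both are simplifying assumptions, not SD.Next's actual estimator; `estimate_sdp_bytes`, `plan_slices`, and the example shapes are hypothetical. The sketch shows why the card's total VRAM never enters the comparison: only the size of this one operation is checked against the thresholds.

```python
GIB = 1024 ** 3


def estimate_sdp_bytes(batch, heads, q_tokens, k_tokens, bytes_per_elem=2):
    # Hypothetical, simplified estimate: size of the attention score matrix
    # for one scaled dot product call (fp16 -> 2 bytes per element)
    return batch * heads * q_tokens * k_tokens * bytes_per_elem


def plan_slices(estimate_bytes, trigger_gb=4.0, slice_gb=2.0):
    if slice_gb <= 0:
        raise ValueError("slice_gb must be positive")
    # At or below the trigger rate: run the operation in one piece
    if estimate_bytes <= trigger_gb * GIB:
        return 1
    # Above it: keep doubling the slice count until each slice's estimate
    # fits under the slice rate (doubling is an assumed strategy)
    slices = 1
    while estimate_bytes / slices > slice_gb * GIB:
        slices *= 2
    return slices


if __name__ == "__main__":
    small = estimate_sdp_bytes(batch=2, heads=8, q_tokens=4096, k_tokens=4096)
    large = estimate_sdp_bytes(batch=2, heads=10, q_tokens=16384, k_tokens=16384)
    print(small / GIB, plan_slices(small))  # 0.5 GiB -> 1 slice
    print(large / GIB, plan_slices(large))  # 10.0 GiB -> 8 slices
```

With a 4 GB trigger and 2 GB slice rate, a 0.5 GiB operation runs unsliced, while a 10 GiB one is split into 8 slices of about 1.25 GiB each, regardless of whether the card has 8 GB or 16 GB.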
-
Converting this into a discussion thread instead of an issue.
-
Feature description
Is it possible to use `--use-quad-cross-attention` in SD.Next?
In Stable Diffusion it speeds up generation.
Version Platform Description
No response