[Question]: is it possible to use quad cross attention in sd next ? #3909

MIMIIZ2 · 2025-05-03T17:24:17Z

MIMIIZ2
May 3, 2025

Feature description

Is it possible to use --use-quad-cross-attention in SD.Next?
In Stable Diffusion it speeds up generation.

Version Platform Description

No response

vladmandic · 2025-05-03T19:07:40Z

vladmandic
May 3, 2025
Maintainer

There are many attention methods available. If this is another one, we can look at it, but please post a link to a page that talks about it, not a command-line argument from some other app, as I don't know what that is.

MIMIIZ2 · 2025-05-03T19:14:55Z

MIMIIZ2
May 3, 2025
Author

How can I see them?

MIMIIZ2 · 2025-05-03T19:29:01Z

MIMIIZ2
May 3, 2025
Author

ComfyUI has this option; you enable it by adding --use-quad-cross-attention to the command-line arguments:
set COMMANDLINE_ARGS=--auto-launch --reserve-vram 0.2 --cuda-device 0 --gpu-only --use-quad-cross-attention
Used like this, it boosts performance by a lot.
https://github.com/patientx/ComfyUI-Zluda

Disty0 · 2025-05-03T19:50:25Z

Disty0
May 3, 2025
Collaborator

We have dynamic attention instead. Set the attention method to scaled dot product and enable dynamic attention in the SDP options.

I implemented quad attention for the new backend in the past but never committed it, as it used more VRAM than dynamic attention and ran 2-3 times slower.

MIMIIZ2 · 2025-05-03T19:58:45Z

MIMIIZ2
May 3, 2025
Author

Oh, I see. What are the fastest options known? I have 16 GB of VRAM and it is using only 8 GB during generation.

Disty0 · 2025-05-03T20:02:27Z

Disty0
May 3, 2025
Collaborator

Quad or dynamic attention trades performance (runs slower) for lower VRAM usage.
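To illustrate that trade-off, here is a minimal pure-Python sketch, not SD.Next code. It assumes the simplest form of slicing, splitting the query rows into chunks. The function names `attention` and `sliced_attention` are made up for this example. The sliced version holds a smaller score matrix at any one time but needs several passes, and it returns the same result.

```python
import math

def attention(q, k, v):
    """Scaled dot product attention on lists of row vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    # The full score matrix (len(q) x len(k)) is held in memory at once.
    scores = [[scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
              for q_row in q]
    out = []
    for row in scores:
        peak = max(row)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in row]
        total = sum(weights)
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) / total
                    for j in range(len(v[0]))])
    return out

def sliced_attention(q, k, v, slice_size):
    """Same result, but only slice_size x len(k) scores exist at a time."""
    out = []
    for start in range(0, len(q), slice_size):
        # Each chunk of query rows is an independent, smaller attention call.
        out.extend(attention(q[start:start + slice_size], k, v))
    return out
```

Each softmax row depends only on its own query row, which is why the query dimension can be split without changing the output. The cost is extra sequential passes, which is where the slowdown comes from.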

Scaled dot product is always faster unless you are using an edge case device.
Default for zluda should be scaled dot product with dynamic atten enabled.
Disable dynamic atten if you want speed instead.

Also, we enable balanced offload by default. If you have enough VRAM, you can disable all offloading by setting the offload mode to none.

You can also use Flash Atten but it is more tricky to set up: https://vladmandic.github.io/sdnext-docs/ZLUDA/#how-to-enable-triton
Flash atten should be much faster than scaled dot product while using the same amount of vram as dynamic atten, so it is the best of both worlds.

MIMIIZ2 · 2025-05-03T20:17:20Z

MIMIIZ2
May 3, 2025
Author

Thanks. It seems the default settings are best, but it is only using 8-9 GB. When it uses 15.9 GB it is slower, and when it uses 8 GB it is faster. Why is that? Can I limit SD.Next to 15 GB of VRAM and leave 1 GB for the PC?

Disty0 · 2025-05-03T21:23:50Z

Disty0
May 3, 2025
Collaborator

Dynamic atten has slice rate sliders.

MIMIIZ2 · 2025-05-03T21:44:39Z

MIMIIZ2
May 3, 2025
Author

Dynamic Attention slicing rate in GB
Dynamic Attention trigger rate in GB
What is the difference between them, and in what case do you use each?

Disty0 · 2025-05-03T21:58:33Z

Disty0
May 3, 2025
Collaborator

Quoting MIMIIZ2: "Dynamic Attention slicing rate in GB, Dynamic Attention trigger rate in GB: what is the difference between them, and in what case do you use each?"

If the estimated VRAM usage for the scaled dot product operation is higher than the trigger rate, it will start slicing the operation until the estimated VRAM usage goes below the slice rate.
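The rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual SD.Next implementation. The names `estimate_attention_bytes` and `plan_slices` are hypothetical. The estimate counts only the attention score matrix, "GB" is treated as 1024^3 bytes, and the work is assumed to split into equal slices.

```python
import math

GIB = 1024 ** 3  # assumption: the "GB" sliders are interpreted as GiB

def estimate_attention_bytes(batch, heads, q_tokens, kv_tokens, bytes_per_element=2):
    """Rough size of the attention score matrix (default assumes 16-bit floats)."""
    return batch * heads * q_tokens * kv_tokens * bytes_per_element

def plan_slices(estimated_bytes, trigger_gb, slice_gb):
    """Return how many equal slices to use (1 means no slicing).

    Slicing only starts above the trigger rate; once it starts, slices are
    sized so that each one fits within the slice rate.
    """
    if estimated_bytes <= trigger_gb * GIB:
        return 1
    return max(1, math.ceil(estimated_bytes / (slice_gb * GIB)))
```

Note that in this sketch an operation estimated at 3 GiB is left unsliced with a 4 GB trigger rate and a 2 GB slice rate, because the slice rate only matters once the trigger rate has been exceeded.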

MIMIIZ2 · 2025-05-03T22:22:05Z

MIMIIZ2
May 3, 2025
Author

So if I have 16 GB of VRAM, how does it work? Do I need to put 16 or 15?

Disty0 · 2025-05-03T22:57:44Z

Disty0
May 3, 2025
Collaborator

This only counts the VRAM usage of the scaled dot product operation; it doesn't care about the total VRAM usage.
Don't give it the full VRAM of your GPU, as you also need VRAM to store the model weights and for other things.

Try 4 GB trigger rate and 2 GB slice rate.
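As a rough way to reason about why the full 16 GB is not a sensible value, here is a small budget sketch. The helper names and the weight and overhead figures (7 GB and 2 GB) are hypothetical placeholders, not measurements from this thread; substitute your own numbers.

```python
def attention_headroom_gb(total_vram_gb, weights_gb, other_gb):
    """VRAM left for the attention operation after weights and other allocations."""
    return total_vram_gb - weights_gb - other_gb

def trigger_fits(trigger_gb, headroom_gb):
    """True if an attention operation as large as the trigger rate still fits."""
    return trigger_gb <= headroom_gb

# Hypothetical 16 GB card: 7 GB of model weights, 2 GB for everything else.
headroom = attention_headroom_gb(16.0, 7.0, 2.0)
print(headroom, trigger_fits(4.0, headroom), trigger_fits(16.0, headroom))
```

With these placeholder figures, a 4 GB trigger rate fits inside the remaining headroom, while a 16 GB trigger rate would let attention grow past what is actually free before any slicing starts.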

vladmandic · 2025-05-04T00:36:54Z

vladmandic
May 4, 2025
Maintainer

Converting this into a discussion thread instead of an issue.
