Hi all! I'm curious about this passage from the WebGPU compute shaders lesson:
Coming from a CUDA background, this is confusing! Judging by the rest of the lesson (and the next one), workgroups are roughly analogous to blocks in CUDA: you can synchronize within them, and threads within one workgroup can share memory. But CUDA blocks should, in most cases, contain well over 64 threads, so that the SMs/warp schedulers have other warps to run while waiting out memory-read stalls. So why should we use 64 by default?

A priori, I feel like I should use 256 by default and lower it only if I hit performance issues (e.g. register pressure). Am I missing something? I suspect my mental model is broken somehow; perhaps workgroups aren't analogous to blocks at all, or WebGPU does some magic under the hood to merge multiple workgroups into one block, improving occupancy?
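For anyone less familiar with WGSL, a minimal sketch of the analogy being asked about: `var<workgroup>` plays roughly the role of CUDA `__shared__`, and `workgroupBarrier()` the role of `__syncthreads()`. The names (`input`, `partialSums`, `tile`) are illustrative, not from the lesson:

```wgsl
@group(0) @binding(0) var<storage, read> input: array<f32>;
@group(0) @binding(1) var<storage, read_write> partialSums: array<f32>;

// Workgroup-shared memory, visible to all 64 invocations in the workgroup.
var<workgroup> tile: array<f32, 64>;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_id) lid: vec3u,
        @builtin(workgroup_id) wid: vec3u,
        @builtin(global_invocation_id) gid: vec3u) {
  tile[lid.x] = input[gid.x];
  workgroupBarrier();  // like __syncthreads(): wait for all 64 threads

  // Tree reduction within the workgroup, halving the stride each step.
  var stride = 32u;
  while (stride > 0u) {
    if (lid.x < stride) {
      tile[lid.x] += tile[lid.x + stride];
    }
    workgroupBarrier();
    stride = stride / 2u;
  }
  if (lid.x == 0u) {
    partialSums[wid.x] = tile[0];
  }
}
```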
Replies: 1 comment
This advice comes from here: https://codelabs.developers.google.com/your-first-webgpu-app#7

But yes, different GPUs will have different best sizes, and I don't know of any one size that fits all. It's possible the subgroups feature, still being worked on, will expose more info about the best size ... or not. It's unfortunate there is no easy way to know what's best.
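One practical option, since the best size is GPU-dependent (a sketch, not something the codelab shows): WGSL pipeline-overridable constants can appear in `@workgroup_size`, so the application can pick the size at pipeline-creation time and benchmark a few candidates on the user's actual hardware. The name `wgSize` is made up for this example:

```wgsl
// Workgroup size supplied at pipeline creation; 64 is the fallback default.
override wgSize: u32 = 64;

@compute @workgroup_size(wgSize)
fn main(@builtin(global_invocation_id) gid: vec3u) {
  // ... kernel body ...
}
```

On the JavaScript side, the value is set through the `constants` map of the compute stage, e.g. `device.createComputePipeline({ layout: 'auto', compute: { module, entryPoint: 'main', constants: { wgSize: 256 } } })`, and you can time dispatches with 64, 128, and 256 to pick a winner (staying within `device.limits.maxComputeInvocationsPerWorkgroup`).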