Hi all! I'm curious about this passage from the WebGPU compute shaders lesson:
Coming from a CUDA background, this is confusing! Judging by the rest of the lesson (and the next one), workgroups are roughly analogous to blocks in CUDA: you can synchronize within them, and threads within one workgroup can share memory. But CUDA blocks should, in most cases, contain well over 64 threads, so that the SMs/warp schedulers have other warps to run while waiting out memory-read stalls. So why should we use 64 by default?

A priori, I feel like I should use 256 by default and lower it only if I hit performance issues (e.g. register pressure). Am I missing something? I suspect my mental model is broken somehow; perhaps workgroups aren't analogous to blocks at all, or WebGPU does some magic under the hood to merge multiple workgroups into one block, improving occupancy?
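For anyone less familiar with WGSL, a minimal sketch of the analogy being asked about: `var<workgroup>` plays roughly the role of CUDA `__shared__`, and `workgroupBarrier()` the role of `__syncthreads()`. The names (`input`, `partialSums`, `tile`) are illustrative, not from the lesson:

```wgsl
@group(0) @binding(0) var<storage, read> input: array<f32>;
@group(0) @binding(1) var<storage, read_write> partialSums: array<f32>;

// Workgroup-shared memory, visible to all 64 invocations in the workgroup.
var<workgroup> tile: array<f32, 64>;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_id) lid: vec3u,
        @builtin(workgroup_id) wid: vec3u,
        @builtin(global_invocation_id) gid: vec3u) {
  tile[lid.x] = input[gid.x];
  workgroupBarrier();  // like __syncthreads(): wait for all 64 threads

  // Tree reduction within the workgroup, halving the stride each step.
  var stride = 32u;
  while (stride > 0u) {
    if (lid.x < stride) {
      tile[lid.x] += tile[lid.x + stride];
    }
    workgroupBarrier();
    stride = stride / 2u;
  }
  if (lid.x == 0u) {
    partialSums[wid.x] = tile[0];
  }
}
```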
Replies: 1 comment
This advice comes from here: https://codelabs.developers.google.com/your-first-webgpu-app#7

But yes, different GPUs will have different best sizes, and I don't know of any one size that fits all. It's possible the subgroups feature, still being worked on, will expose more info about the best size ... or not. It's unfortunate there is no easy way to know what's best.
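One practical option, since the best size is GPU-dependent (a sketch, not something the codelab shows): WGSL pipeline-overridable constants can appear in `@workgroup_size`, so the application can pick the size at pipeline-creation time and benchmark a few candidates on the user's actual hardware. The name `wgSize` is made up for this example:

```wgsl
// Workgroup size supplied at pipeline creation; 64 is the fallback default.
override wgSize: u32 = 64;

@compute @workgroup_size(wgSize)
fn main(@builtin(global_invocation_id) gid: vec3u) {
  // ... kernel body ...
}
```

On the JavaScript side, the value is set through the `constants` map of the compute stage, e.g. `device.createComputePipeline({ layout: 'auto', compute: { module, entryPoint: 'main', constants: { wgSize: 256 } } })`, and you can time dispatches with 64, 128, and 256 to pick a winner (staying within `device.limits.maxComputeInvocationsPerWorkgroup`).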