Attempt add single pass scan. #685
Although tests are passing, out of an abundance of caution, I want to add separate tests specifically for the correctness of the scan. These should act as a litmus test for general compatibility with single-pass monoid techniques, which would be useful as more shaders are converted to single pass.
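As a rough illustration of what such a correctness test could look like on the CPU side, the sketch below compares a partitioned scan against a sequential reference. Everything in it is an assumption made for illustration: the `Monoid` trait, the `Affine` test monoid, and the function names are hypothetical and not taken from this PR. A non-commutative monoid is used so that operand-order mistakes show up as failures.

```rust
/// Hypothetical monoid trait; illustrative only, not the PR's API.
trait Monoid: Copy + PartialEq + std::fmt::Debug {
    fn identity() -> Self;
    fn combine(self, other: Self) -> Self;
}

/// Affine map x -> a*x + b under composition, with wrapping arithmetic.
/// Associative but not commutative, so it catches swapped operands.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Affine {
    a: u32,
    b: u32,
}

impl Monoid for Affine {
    fn identity() -> Self {
        Affine { a: 1, b: 0 }
    }

    // Apply `self` first, then `other`.
    fn combine(self, other: Self) -> Self {
        Affine {
            a: other.a.wrapping_mul(self.a),
            b: other.a.wrapping_mul(self.b).wrapping_add(other.b),
        }
    }
}

/// Sequential inclusive scan used as the ground truth.
fn reference_scan<M: Monoid>(input: &[M]) -> Vec<M> {
    let mut acc = M::identity();
    input
        .iter()
        .map(|&x| {
            acc = acc.combine(x);
            acc
        })
        .collect()
}

/// Inclusive scan computed partition by partition, the way a
/// workgroup-partitioned shader would: each partition is reduced to an
/// aggregate, and each partition's local scan is seeded with the combined
/// aggregates of all partitions before it.
fn partitioned_scan<M: Monoid>(input: &[M], part_size: usize) -> Vec<M> {
    assert!(part_size > 0);
    let aggregates: Vec<M> = input
        .chunks(part_size)
        .map(|c| c.iter().fold(M::identity(), |acc, &x| acc.combine(x)))
        .collect();

    let mut out = Vec::with_capacity(input.len());
    let mut prefix = M::identity();
    for (chunk, &agg) in input.chunks(part_size).zip(&aggregates) {
        let mut acc = prefix;
        for &x in chunk {
            acc = acc.combine(x);
            out.push(acc);
        }
        prefix = prefix.combine(agg);
    }
    out
}
```

A test would then run `partitioned_scan` with several partition sizes, including ones that do not divide the input length, and require the output to equal `reference_scan` exactly.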
Change fallback reduce pattern to match scan pattern so last thread in workgroup has the correct aggregate in registers without need for additional barrier.
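To illustrate why a scan-shaped reduction leaves the aggregate with the last thread, here is a minimal CPU simulation of a Hillis-Steele inclusive scan over one workgroup. It is a sketch under assumptions: the function names are hypothetical, plain `u32` addition stands in for the real monoid, and the actual shader's access pattern may differ.

```rust
/// Simulates a Hillis-Steele inclusive scan over one workgroup.
/// Each loop iteration stands in for one barrier-separated round.
fn workgroup_inclusive_scan(values: &[u32]) -> Vec<u32> {
    let mut cur = values.to_vec();
    let mut stride = 1;
    while stride < cur.len() {
        // Snapshot of the previous round, as a barrier would guarantee.
        let prev = cur.clone();
        for i in stride..cur.len() {
            cur[i] = prev[i - stride].wrapping_add(prev[i]);
        }
        stride *= 2;
    }
    cur
}

/// The last lane's inclusive result is the aggregate of the whole
/// workgroup, so no separate reduction round is needed to obtain it.
fn workgroup_aggregate(values: &[u32]) -> Option<u32> {
    workgroup_inclusive_scan(values).last().copied()
}
```

In this simulation the last element of the scanned array always equals the sum of the inputs, which is the property the commit message relies on.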
Ok, I think I have my head around this now. It feels like this is going to be a workable pattern. I think shared memory usage can be reduced, and there are some stylistic things that can be improved.
My overriding concern is the array structure for the monoid - we somehow have the worst of both worlds where the fields aren't named and there's also repetition. I think it can be improved by leaning in more heavily to arrays.
I didn't try this, but am assuming it works. Also, the main focus of my review is on the scan wgsl shader. I did skim the rest but will rely on other reviewers for any fine details I may have missed.
I'm not approving at this point, but think it can be landed soon.
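The array-based monoid layout suggested in the review could look roughly like the following on the CPU side. This is only a sketch: the member count of 5 is an assumption based on the `PATH_MEMBERS` constant and the literal 5 mentioned elsewhere in this thread, and combining every member by addition is likewise assumed rather than confirmed by the PR.

```rust
/// Assumed member count; the real value lives in the PR's shader.
const PATH_MEMBERS: usize = 5;

/// Monoid stored as a single array instead of separately named fields.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct PathMonoid {
    s: [u32; PATH_MEMBERS],
}

impl PathMonoid {
    /// Member-wise combine written once as a loop, so there is no
    /// per-field repetition. Addition is an assumption for this sketch.
    fn combine(&self, other: &PathMonoid) -> PathMonoid {
        let mut out = PathMonoid::default();
        for i in 0..PATH_MEMBERS {
            out.s[i] = self.s[i].wrapping_add(other.s[i]);
        }
        out
    }
}
```

The trade-off is the one raised in the review: the loop removes the repetition, but the members are identified by index rather than by name, so index constants or comments have to carry the meaning.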
Some points from browsing
Thanks for addressing my comments. I also verified that it works (on Apple M1) with the test scenes. I think it's ready to go in, though I didn't go over it with a fine-toothed comb.
// so try unlocking, else, keep looking back
var all_complete = inc_complete.s[0];
for (var i = 1u; i < PATH_MEMBERS; i += 1u) {
    all_complete = all_complete && inc_complete.s[i];
I think you can write this with &=, but it's a very minor stylistic point.
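For comparison, the same fold written with a compound assignment looks like this in Rust. The sketch mirrors the shape of the excerpt above but is not the PR's code: `PATH_MEMBERS` is assumed to be 5 and the function name is hypothetical.

```rust
/// Assumed member count, mirroring the constant named in the excerpt.
const PATH_MEMBERS: usize = 5;

/// Returns true only if every per-member completion flag is set.
fn all_complete(flags: &[bool; PATH_MEMBERS]) -> bool {
    let mut all = flags[0];
    for i in 1..PATH_MEMBERS {
        // `&=` on bool is a non-short-circuiting AND-assign in Rust.
        all &= flags[i];
    }
    all
}
```

Because the right-hand side here is a plain array read with no side effects, dropping the short-circuit does not change the result.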
To reduce the risk from this, I'd propose that we land this after we release version 0.3.0. I'm hopeful that we can do this after office hours this week, but we'll see.
Following up on my conversation with Raph, FXC is still blocking on this: In the looped version FXC is giving the following errors:
This is because the loop inside the lookback/fallback block cannot be unrolled. Going back to the pre-looping implementation, FXC fails to compile due to uniformity analysis:
I think it considers the barrier on line 107 illegal because the control flow is dependent on shared memory.
Ah, very interesting. Looking at the code, I'm surprised this runs at all, the WGSL uniformity analysis should reject it (if it doesn't, that's a bug in naga). I haven't tried it, but I'd expect Chrome/tint to also reject it. Fortunately, it should be fairly easily fixable. Instead of using I don't have my head around the loop unrolling yet. This is just One other thing I'd try is putting the literal 5 in the loop bounds. I also have a theory why FXC might be rejecting this. I believe that naga compiles a
After changing As for the loop unrolling, I think you're dead on with regard to the naga transpilation. Even after changing to With regard to metal, I had the same exact performance issues. I think I was getting like 60% of the speed from wgsl->naga->msl as opposed to glsl->glslangvalidator->SPIR-V->spirv_cross->msl, and that's after I unrolled the device level loads/stores by hand.
What's the status of this PR? It looks to me that we're at least waiting for the change described in #685 (comment) to land?
Naga emitting I reimplemented my prefix sum library in Dawn/Tint to gather data on how significant the slowdown is, and as the data comes in I'll use it to open up an issue requesting the change in naga. That might take a while though. A more immediate option is to revert the
Is this being tracked on the wgpu side?
No, I have the dawn/tint results in hand, but I'm waiting for fresh benchmarks on wgpu/naga 23.0. Once I have those, I'll open up an issue.
See wgpu#6521
Adds a single-pass scan with new GPU and CPU shaders. Removes the previous tree-based scans.