ReSTIR merged GBuffer pass #176

Draft · wants to merge 12 commits into main


@kvark (Owner) commented Sep 17, 2024

Experiment branched off #161

The idea is to fetch the GBuffers only once, when the workgroup starts. We can place that data in workgroup memory and reuse it spatially. This would reduce VRAM traffic (and latency).
In practice, it turned out to be significantly slower:
[Image: Local4-gbuffer-merge]
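
For reference, the saving the merged pass was aiming for can be sketched with a rough fetch count. This is only a back-of-the-envelope model, not code from this branch: the function names, the 8x8 group size, the neighbor count, and the reuse radius are all hypothetical, and the model ignores the texture cache, which already captures part of this reuse in the separate-pass version.

```rust
/// GBuffer texel fetches for one workgroup when every invocation reads
/// its own texel plus `neighbors` spatial candidates straight from VRAM.
/// (Hypothetical model: no cache hits are accounted for.)
fn fetches_direct(group_w: u32, group_h: u32, neighbors: u32) -> u32 {
    group_w * group_h * (1 + neighbors)
}

/// GBuffer texel fetches when the workgroup loads its tile once into
/// workgroup memory, including an apron of `radius` texels on each side
/// so that spatial neighbors of the edge pixels are also covered.
fn fetches_shared_tile(group_w: u32, group_h: u32, radius: u32) -> u32 {
    (group_w + 2 * radius) * (group_h + 2 * radius)
}
```

With these assumed numbers, `fetches_direct(8, 8, 4)` gives 320 fetches per group, while `fetches_shared_tile(8, 8, 2)` gives 144. The measurement above shows that this kind of count does not translate directly into frame time.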

I suspect this could be due to:

  • the driver achieving better occupancy when shaders are smaller. Nsight confirms this to some extent. It's still not entirely straightforward, since NVIDIA hardware can have variable register occupancy during shader execution.
  • the separate GBuffer pass being more local, since it doesn't re-shuffle the groups into clusters
  • the separate GBuffer pass being able to overlap the latency of VRAM access with RT core utilization, while the merged pass becomes more blocked on the RT cores

Update

I can confirm that locality is the biggest factor. Here is a run with group shuffling disabled. It's much faster.
[Image: Local4a-gbuffer-tight]

@kvark added the "type: experiment" (Experimental code) label Sep 17, 2024
@kvark mentioned this pull request Sep 17, 2024
@kvark changed the title from "Experiment/merge gbuffer pass" to "ReSTIR merged GBuffer pass" Sep 17, 2024