Summary
Benchmark different workgroup sizes (64, 128, 256) for the branched flow trace compute shader using the GPU timestamp profiler.
Current Behavior
- Trace pass uses @workgroup_size(64) (hardcoded in WGSL)
- Clear pass uses @workgroup_size(16, 16) (256 total)
- No benchmarking data on optimal workgroup size for this workload
Proposed Change
- Test workgroup sizes 64, 128, and 256 for the trace pass
- Measure per-pass GPU time using the GpuProfiler (already implemented)
- Set the default to whichever performs best on representative hardware
- Consider making workgroup size configurable via a compile-time constant
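One possible way to make the size configurable for the benchmark is to template the WGSL source on the host and recompute the dispatch count for each candidate. The sketch below is only an illustration under stated assumptions: the placeholder token `{{TRACE_WG_SIZE}}`, the helper names, and the idea of a flat 1D item count are hypothetical and not taken from the existing code. WGSL pipeline-overridable constants (`override`) may be a cleaner alternative where the host API supports them.

```rust
/// Hypothetical placeholder token assumed to appear in the shader source,
/// e.g. `@compute @workgroup_size({{TRACE_WG_SIZE}})`. Not present in the
/// current shader; it would have to be introduced.
const WG_PLACEHOLDER: &str = "{{TRACE_WG_SIZE}}";

/// Candidate trace-pass workgroup sizes listed in this issue.
const CANDIDATE_SIZES: [u32; 3] = [64, 128, 256];

/// Returns a copy of the WGSL template with the placeholder replaced
/// by the chosen workgroup size.
fn specialize_shader(template: &str, wg_size: u32) -> String {
    template.replace(WG_PLACEHOLDER, &wg_size.to_string())
}

/// Number of workgroups needed to cover `num_items` invocations
/// (ceiling division), assuming a 1D dispatch. `wg_size` must be non-zero.
fn dispatch_count(num_items: u32, wg_size: u32) -> u32 {
    (num_items + wg_size - 1) / wg_size
}

/// Builds one (workgroup size, specialized source, dispatch count) entry
/// per candidate, ready to be compiled and timed by the caller.
fn benchmark_variants(template: &str, num_items: u32) -> Vec<(u32, String, u32)> {
    CANDIDATE_SIZES
        .iter()
        .map(|&wg| (wg, specialize_shader(template, wg), dispatch_count(num_items, wg)))
        .collect()
}
```

Each variant would then be compiled into its own pipeline and timed with the existing profiler. Because the dispatch count changes with the workgroup size, the shader still needs a bounds check on the invocation index for the last, partially filled workgroup.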
Acceptance Criteria
Context
From deep review plan item 4.2. Requires interactive benchmarking with GPU profiler. Optimal size depends on GPU architecture (occupancy, register pressure). Note: WSLg currently uses OpenGL ES software rendering (no Vulkan), so benchmark on native hardware or with GPU passthrough for meaningful results.
Files: src/render/shaders/branched_flow_compute.wgsl