
[LOW] Performance: Workgroup size tuning for branched flow #37

@pjt222

Description


Summary

Benchmark different workgroup sizes (64, 128, 256) for the branched flow trace compute shader using the GPU timestamp profiler.

Current Behavior

  • Trace pass uses @workgroup_size(64) (hardcoded in WGSL)
  • Clear pass uses @workgroup_size(16, 16) (256 invocations total)
  • No benchmark data yet on which workgroup size is optimal for this workload
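Changing the trace pass's workgroup size also changes how many workgroups the host must dispatch to cover the same work. Below is a minimal Rust sketch of that arithmetic. It assumes one invocation per ray for the trace pass and one per texel for the clear pass. The ray count and the 1920x1080 resolution are illustrative values, not taken from the codebase.

```rust
/// Workgroups needed so every item gets one invocation (1D, like the trace pass).
/// Assumes one invocation per item, which may not match the real shader.
fn workgroups_1d(items: u32, workgroup_size: u32) -> u32 {
    // Round up so a partial final workgroup still covers the tail items.
    items.div_ceil(workgroup_size)
}

/// 2D variant for the clear pass, whose workgroup is 16x16.
fn workgroups_2d(width: u32, height: u32, wg_x: u32, wg_y: u32) -> (u32, u32) {
    (width.div_ceil(wg_x), height.div_ceil(wg_y))
}

fn main() {
    let rays = 100_000; // hypothetical ray count, for illustration only
    for wg in [64, 128, 256] {
        let groups = workgroups_1d(rays, wg);
        // Invocations in the last workgroup that have no item to process.
        let idle = groups * wg - rays;
        println!("wg={wg:>3}: {groups} workgroups, {idle} idle invocations");
    }
    let (gx, gy) = workgroups_2d(1920, 1080, 16, 16);
    println!("clear pass: {gx}x{gy} workgroups");
}
```

Rounding up leaves idle invocations in the last workgroup. The shader must therefore bounds-check `global_invocation_id` against the item count for each of the three candidate sizes.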

Proposed Change

  1. Test workgroup sizes 64, 128, and 256 for the trace pass
  2. Measure per-pass GPU time using the GpuProfiler (already implemented)
  3. Set the default to whichever performs best on representative hardware
  4. Consider making workgroup size configurable via a compile-time constant
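Steps 1 to 3 come down to collecting several per-pass timings for each candidate size and comparing them with a robust statistic. Below is a minimal sketch. It assumes the GpuProfiler's trace-pass readings can be gathered as millisecond values, although its real interface may differ. The sketch uses the median so that occasional outliers, such as those from clock ramping or background GPU work, do not decide the winner.

```rust
use std::collections::BTreeMap;

/// Median of a non-empty slice of per-pass GPU times in milliseconds.
fn median_ms(samples: &[f64]) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Picks the workgroup size with the lowest median trace-pass time.
/// Sizes with no samples are skipped; returns None if nothing was measured.
fn best_workgroup_size(results: &BTreeMap<u32, Vec<f64>>) -> Option<(u32, f64)> {
    results
        .iter()
        .filter(|(_, s)| !s.is_empty())
        .map(|(&wg, s)| (wg, median_ms(s)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
}

fn main() {
    // Dummy timings (ms) purely to exercise the code; NOT measured data.
    let mut results: BTreeMap<u32, Vec<f64>> = BTreeMap::new();
    results.insert(64, vec![1.0, 1.2, 1.1]);
    results.insert(128, vec![0.9, 1.5, 0.95]);
    results.insert(256, vec![1.3, 1.25, 1.4]);
    if let Some((wg, ms)) = best_workgroup_size(&results) {
        println!("best: @workgroup_size({wg}) at {ms:.3} ms median");
    }
}
```

In practice, you would discard warm-up frames before recording samples and keep the full per-size sample lists as the benchmark data required by the acceptance criteria.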

Acceptance Criteria

  • Benchmark data for 3 workgroup sizes on at least one GPU
  • Default updated to empirically best size
  • All existing tests pass (including WGSL validation)

Context

From item 4.2 of the deep review plan. This requires interactive benchmarking with the GPU profiler. The optimal size depends on the GPU architecture (occupancy, register pressure), so results from one GPU may not carry over to another. Note: WSLg currently uses OpenGL ES software rendering (no Vulkan), so benchmark on native hardware or with GPU passthrough to get meaningful results.
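Part of the reason the answer depends on architecture is that the GPU runs each workgroup as one or more subgroups: warps on NVIDIA (32 wide) and wavefronts on AMD (32 or 64 wide). The sketch below computes how many subgroups each candidate size occupies and what fraction of lanes does useful work. The subgroup widths are common values and should be treated as assumptions about the test hardware.

```rust
/// Subgroups (warps/wavefronts) needed to hold one workgroup.
fn subgroups_per_workgroup(workgroup_size: u32, subgroup_width: u32) -> u32 {
    workgroup_size.div_ceil(subgroup_width)
}

/// Fraction of subgroup lanes that carry a real invocation.
fn lane_utilization(workgroup_size: u32, subgroup_width: u32) -> f64 {
    let lanes = subgroups_per_workgroup(workgroup_size, subgroup_width) * subgroup_width;
    workgroup_size as f64 / lanes as f64
}

fn main() {
    // Assumed subgroup widths: 32 (NVIDIA, AMD wave32) and 64 (AMD wave64).
    for width in [32, 64] {
        for wg in [64, 128, 256] {
            println!(
                "subgroup={width:>2} wg={wg:>3}: {} subgroups, {:.0}% lanes used",
                subgroups_per_workgroup(wg, width),
                lane_utilization(wg, width) * 100.0
            );
        }
    }
}
```

All three candidates are multiples of both 32 and 64, so none of them leaves lanes idle, whereas a size such as 48 would. Any difference between 64, 128, and 256 therefore comes from occupancy, register pressure, and scheduling. That is why the sizes have to be measured rather than derived.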

Files: src/render/shaders/branched_flow_compute.wgsl
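For step 4 of the proposed change, one option is to replace the literal in `@workgroup_size(64)` with a module-scope WGSL `const` that the host prepends before creating the shader module. Below is a minimal Rust sketch. `TRACE_WORKGROUP_SIZE` and `with_workgroup_size` are hypothetical names, and the shader body stands in for the real entry point. The `max_invocations` argument should come from the device's `maxComputeInvocationsPerWorkgroup` limit, which is 256 by default in WebGPU.

```rust
/// Hypothetical helper: builds trace-shader source with a chosen workgroup size
/// by prepending a WGSL module-scope `const`.
fn with_workgroup_size(shader_src: &str, size: u32, max_invocations: u32) -> Result<String, String> {
    // Reject sizes the device cannot run.
    if size == 0 || size > max_invocations {
        return Err(format!("workgroup size {size} outside 1..={max_invocations}"));
    }
    Ok(format!("const TRACE_WORKGROUP_SIZE: u32 = {size}u;\n{shader_src}"))
}

fn main() {
    // Stand-in for the real entry point after replacing the literal 64 (hypothetical).
    let body = "@compute @workgroup_size(TRACE_WORKGROUP_SIZE)\nfn trace_main() {}\n";
    for size in [64, 128, 256] {
        let src = with_workgroup_size(body, size, 256).unwrap();
        // Print only the injected constant line.
        println!("{}", src.lines().next().unwrap());
    }
}
```

WGSL `override` declarations are an alternative. They can also appear in `@workgroup_size` and are set at pipeline creation, so the benchmark could switch sizes without editing the source. Either way, the host dispatch count must be computed from the same value.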

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request), performance (Performance optimizations)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
