@DarioSamo
Contributor

@DarioSamo DarioSamo commented Sep 27, 2023

This is an experimental PR with an implementation that won't resemble the final approach we'd take if shader optimizations are to be enabled, but at least it gets a SPIRV-Tools port to SCons out of the way as the biggest undertaking that had to be done to test it out.

I was made aware that one of the reasons the optimizer couldn't be used was that it wouldn't preserve the resource bindings properly, which makes validation fail. However, after digging into the glslang source, it's pretty evident that glslang just doesn't expose the options the optimizer has to preserve this information. With that change made, it's perfectly possible to use Godot with the shader optimizer, although some issues might still be left to find.

    // -- GODOT start --
    spvOptOptions.set_preserve_bindings(true);
    spvOptOptions.set_preserve_spec_constants(true);
    // -- GODOT end --

Important

Enabling "rendering/shader_compiler/shader_compilation/optimize" and restarting the editor is required for shader optimizations to take effect.

image

The effect should be fairly noticeable: shaders will take longer to build during editor startup and when opening scenes.

What we want to verify

There are ongoing discussions about whether the optimizations done by the shader compiler are worth it. Drivers can do a lot of the heavy lifting when it comes to optimizations, so it's hard to find consistent data on how much this helps across the board, as it can vary wildly depending on the hardware vendor and the platform being targeted.

The idea behind the PR is to have an easily accessible option to test how much of this holds true depending on where Godot is deployed. With the project setting, it should be fairly easy to verify whether this brings any noticeable improvement on a particular platform.

What I've been able to verify so far

For context, I'm doing my testing on Windows 11 with an RTX 3090 Ti. Out of all the platforms where I think this could be beneficial, this is the least likely one to show any difference. NVIDIA is fairly competitive when it comes to its Windows drivers, and this is high-end hardware.

As far as Godot's caching is concerned, there are significant differences that can be verified in the size of the SPIR-V shaders (expected) and the PSO cache (less expected) stored in the user data directories. At least on NVIDIA, this seems to hint that the initial PSO that is generated is not as well optimized as what the SPIR-V Optimizer produces. However, there's no telling whether this PSO is actually used at later points or replaced by a more optimized one in the background.

image

So this is clearly not nothing, but that doesn't necessarily translate to performance. This is where I've had a dodgy experience so far in getting results that can be replicated consistently. Whenever I've noticed a performance uplift, it's been usually around 1-2%, only for it to go away the more I jumped between both versions. I suspect the driver is doing some heavy lifting to delegate the optimization and swap out the actual pipeline for a better one as soon as it can.

What we should verify

I suspect we might find more significant differences if we focus testing on platforms whose drivers are not as polished as NVIDIA's and AMD's on desktop.

  • Android: There's a huge amount of hardware variety here where we could find this is worth it given the nature of how Vulkan drivers work on this platform.

  • Intel: Very popular across low-end laptops and not necessarily the most up-to-date when it comes to drivers. ARC discrete GPUs might fare better than iGPUs.

If you'd still like to test on NVIDIA and AMD, the info can still be useful to find out whether this PR is worth it or, in the odd case, whether it causes any actual regressions.

Reasons we might not want this

  • Compilation takes longer; that is undeniable. However, considering the optimizer works in multiple steps, and one of them has the original SPIR-V in its unoptimized form, I think we can realistically mitigate this by using the unoptimized version as soon as possible and delegating the optimization to the background. Done properly, this should add no noticeable delay on top of the shader stutters that are already possible today.

  • The code size of the SPIRV-Tools optimizer is massive. While I can try to whittle it down as much as I can, I think it might still end up at nearly 200K lines at minimum. While we can easily opt out of building it into the engine with an option, it's still a massive addition to the codebase. On the positive side, there was pretty much no patching required to get it to work: only extracting the required files as necessary. That said, I do think this could be slightly more crucial to the engine if we happen to find significant performance differences that make it worth it, unlike the dependency on OIDN, which ended up performing much worse than it should and was also around 115K lines.
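
The mitigation described in the first bullet (use the unoptimized SPIR-V right away, optimize in the background, swap when ready) could look roughly like the sketch below. This is not code from the PR: `SpirvBlob`, `optimize_stub` and `ShaderVariant` are hypothetical names, and the "optimizer" is a toy stand-in that only strips zero words so the example stays self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <future>
#include <utility>
#include <vector>

// Toy stand-in for a SPIR-V binary (real modules are streams of 32-bit words).
using SpirvBlob = std::vector<std::uint32_t>;

// Hypothetical optimizer stand-in: treats zero words as removable "dead code".
// A real build would call into SPIRV-Tools here instead.
SpirvBlob optimize_stub(SpirvBlob input) {
    SpirvBlob out;
    for (std::uint32_t word : input) {
        if (word != 0) {
            out.push_back(word);
        }
    }
    assert(out.size() <= input.size());
    return out;
}

// Holds the blob that is usable right now plus a pending optimized version.
class ShaderVariant {
public:
    explicit ShaderVariant(SpirvBlob unoptimized)
        : active_(std::move(unoptimized)),
          // The worker thread receives its own copy of the unoptimized blob.
          pending_(std::async(std::launch::async, optimize_stub, active_)) {}

    // Returns the best blob available without blocking the caller.
    const SpirvBlob &current() {
        if (pending_.valid() &&
            pending_.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            active_ = pending_.get(); // Swap in the optimized blob.
        }
        return active_;
    }

    // Blocks until the background optimization has finished.
    void wait_for_optimized() {
        if (pending_.valid()) {
            active_ = pending_.get();
        }
    }

private:
    SpirvBlob active_;               // Declared first: pending_ copies from it.
    std::future<SpirvBlob> pending_; // Result of the background optimization.
};
```

With this shape, a caller that needs a pipeline immediately calls `current()` and gets the unoptimized blob until the background task completes. A real integration would also have to rebuild any pipeline created from the earlier blob, which this sketch leaves out.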

@DarioSamo DarioSamo changed the title from "Enable shader compiler optimizations (looking for testers)." to "Enable shader compiler optimizations (looking for testing)." on Sep 27, 2023
Member

@reduz reduz left a comment

seems fine, just nitpick.

@DarioSamo DarioSamo force-pushed the spirv-opt branch 2 times, most recently from 5dec64e to b0dab7f on September 27, 2023 14:51
@DarioSamo
Contributor Author

DarioSamo commented Sep 27, 2023

As per @reduz's suggestion, now you can toggle this with a project setting instead. The launch option is gone.

@DarioSamo DarioSamo changed the title from "Enable shader compiler optimizations (looking for testing)." to "Enable SPIR-V optimizations for shader compiler (looking for testing)." on Sep 27, 2023
@Chaosus Chaosus added this to the 4.x milestone Sep 28, 2023
@darksylinc
Contributor

Validation warnings

First off, there are validation warnings in the Mobile renderer, and Godot will crash if run with validation layers (the crash happens inside the layer).

The warnings are:

WARNING: PERFORMANCE - Message Id Number: 101294395 | Message Id Name: UNASSIGNED-CoreValidation-Shader-OutputNotConsumed
	Validation Performance Warning: [ UNASSIGNED-CoreValidation-Shader-OutputNotConsumed ] Object 0: handle = 0x10d2720000000180, type = VK_OBJECT_TYPE_SHADER_MODULE; | MessageID = 0x609a13b | vertex shader writes to output location 2.0 which is not consumed by fragment shader
	Objects - 1
		Object[0] - VK_OBJECT_TYPE_SHADER_MODULE, Handle 1212156594041651584
     at: _debug_messenger_callback (drivers/vulkan/vulkan_context.cpp:264)
WARNING: PERFORMANCE - Message Id Number: 101294395 | Message Id Name: UNASSIGNED-CoreValidation-Shader-OutputNotConsumed
	Validation Performance Warning: [ UNASSIGNED-CoreValidation-Shader-OutputNotConsumed ] Object 0: handle = 0x3b89370000000164, type = VK_OBJECT_TYPE_SHADER_MODULE; | MessageID = 0x609a13b | vertex shader writes to output location 2.0 which is not consumed by fragment shader
	Objects - 1
		Object[0] - VK_OBJECT_TYPE_SHADER_MODULE, Handle 4290020593186636132
     at: _debug_messenger_callback (drivers/vulkan/vulkan_context.cpp:264)
WARNING: PERFORMANCE - Message Id Number: 101294395 | Message Id Name: UNASSIGNED-CoreValidation-Shader-OutputNotConsumed
	Validation Performance Warning: [ UNASSIGNED-CoreValidation-Shader-OutputNotConsumed ] Object 0: handle = 0xad93400000000172, type = VK_OBJECT_TYPE_SHADER_MODULE; | MessageID = 0x609a13b | vertex shader writes to output location 2.0 which is not consumed by fragment shader
	Objects - 1
		Object[0] - VK_OBJECT_TYPE_SHADER_MODULE, Handle -5939333114827374222
     at: _debug_messenger_callback (drivers/vulkan/vulkan_context.cpp:264)
ERROR: VALIDATION - Message Id Number: -840888189 | Message Id Name: UNASSIGNED-CoreValidation-DrawState-DescriptorSetNotBound
	Validation Error: [ UNASSIGNED-CoreValidation-DrawState-DescriptorSetNotBound ] Object 0: handle = 0xc3fc2e0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0xcde11083 | vkCmdDrawIndexed(): VkPipeline 0x42bb7d000000430b[RID:554188220137489] uses set #0 but that set is not bound.
	Objects - 1
		Object[0] - VK_OBJECT_TYPE_COMMAND_BUFFER, Handle 205505248
   at: _debug_messenger_callback (drivers/vulkan/vulkan_context.cpp:267)
ERROR: VALIDATION - Message Id Number: -840888189 | Message Id Name: UNASSIGNED-CoreValidation-DrawState-DescriptorSetNotBound
	Validation Error: [ UNASSIGNED-CoreValidation-DrawState-DescriptorSetNotBound ] Object 0: handle = 0xc3fc2e0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0xcde11083 | vkCmdDrawIndexed(): VkPipeline 0x42bb7d000000430b[RID:554188220137489] uses set #2 but that set is not bound.
	Objects - 1
		Object[0] - VK_OBJECT_TYPE_COMMAND_BUFFER, Handle 205505248
   at: _debug_messenger_callback (drivers/vulkan/vulkan_context.cpp:267)

As discussed with @DarioSamo, the crashes and the important validation errors can be fixed if we let Godot reflect the SPIR-V before optimization instead of afterwards. Not a big deal.
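
As a rough illustration of why the ordering matters, the sketch below uses a toy model and assumes, purely for illustration, that the optimizer can drop resources the shader never references. The thread does not confirm that this is the exact cause of the errors above. None of these types or functions exist in Godot or SPIRV-Tools; `Resource`, `ToyModule`, `reflect_sets`, `strip_unused_stub` and `compile_reflect_first` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model of a shader resource; not a real SPIR-V structure.
struct Resource {
    int set;
    int binding;
    bool statically_used;
};

using ToyModule = std::vector<Resource>;

// Hypothetical reflection step: collects every descriptor set the module declares.
std::set<int> reflect_sets(const ToyModule &module) {
    std::set<int> sets;
    for (const Resource &resource : module) {
        sets.insert(resource.set);
    }
    return sets;
}

// Hypothetical optimizer stand-in: removes resources that are never referenced.
ToyModule strip_unused_stub(const ToyModule &module) {
    ToyModule out;
    for (const Resource &resource : module) {
        if (resource.statically_used) {
            out.push_back(resource);
        }
    }
    return out;
}

struct CompiledShader {
    std::set<int> reflected_sets; // What the engine will create layouts for.
    ToyModule code;               // What is handed to the driver.
};

// Reflect the unoptimized module first, then optimize the code.
CompiledShader compile_reflect_first(const ToyModule &source) {
    CompiledShader out;
    out.reflected_sets = reflect_sets(source);
    out.code = strip_unused_stub(source);

    // The optimized code may only use sets that reflection already saw.
    const std::set<int> after = reflect_sets(out.code);
    assert(std::includes(out.reflected_sets.begin(), out.reflected_sets.end(),
                         after.begin(), after.end()));
    return out;
}
```

In this toy model, reflecting after optimization would report fewer sets than the engine-side layout expects, while reflecting first keeps the layout stable regardless of what the optimizer removes.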

The performance warnings can't be fixed without a major refactor of how shaders work. Unfortunately, Godot just crosses its fingers that the driver will optimize the unconsumed vertex outputs away.

Performance Comparison on low end Android

OK, I admit I was hoping for a miracle (e.g. a 50% improvement or more), but it does make a difference.

I tested on Android, specifically a Redmi 4X with an Adreno 504. I chose this device because it is one of the weakest devices that we should be able to reasonably support (e.g. everything on minimum, resolution downscaled, etc.), since its drivers are decent enough. Note: Godot currently can't hope to target this device; it runs too slowly.

All tests were made with the Mobile renderer, not the Forward+ one.

I also used two scenes that I knew in advance are pixel shader bound. They're extremely simple:

  1. Sky: It's just an empty scene with the sky shader on. It is worrisome this phone struggles to render the Sky so I have been focusing on that to see if we're missing something basic.
  2. Plane: It's just an empty scene where the whole screen is covered by a plane with a Basic material and a single directional light, shadows disabled.

Note that these are very synthetic and not indicative of real world performance.

| Scene | Min FPS | Max FPS | Most Frequent FPS | Most Frequent MSPF |
|---|---|---|---|---|
| Sky (no opt) | 24 | 25 | 24.00 | 41.67 |
| Sky (w/ opt) | 25 | 27 | 26.00 | 38.46 |
| Plane A (no opt) | 24 | 25 | 24.00 | 41.67 |
| Plane A (w/ opt) | 25 | 25 | 25.00 | 40.00 |
| Plane B (no opt) | 33 | 34 | 33.00 | 30.30 |
| Plane B (w/ opt) | 34 | 35 | 35.00 | 28.57 |
| Plane C (no opt) | 37 | 37 | 37.00 | 27.03 |
| Plane C (w/ opt) | 39 | 40 | 40.00 | 25.00 |

To summarize (in MSPF, i.e. milliseconds per frame):

| Scene | No Optimization | With Optimization | Time % | Improvement |
|---|---|---|---|---|
| Sky | 41.67 | 38.46 | 92.31 % | 7.69 % |
| Plane A | 41.67 | 40.00 | 96.00 % | 4.00 % |
| Plane B | 30.30 | 28.57 | 94.29 % | 5.71 % |
| Plane C | 27.03 | 25.00 | 92.50 % | 7.50 % |
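
The summary figures follow directly from the most frequent FPS values: MSPF is 1000 / FPS, "Time %" is the optimized frame time relative to the unoptimized one, and the improvement is the remainder up to 100 %. A minimal sketch of that arithmetic, with function names chosen here for illustration:

```cpp
#include <cassert>
#include <cmath>

// Milliseconds per frame for a given frame rate.
double ms_per_frame(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Optimized frame time as a percentage of the unoptimized frame time.
double relative_time_percent(double fps_before, double fps_after) {
    return 100.0 * ms_per_frame(fps_after) / ms_per_frame(fps_before);
}

// Frame-time reduction in percent.
double improvement_percent(double fps_before, double fps_after) {
    return 100.0 - relative_time_percent(fps_before, fps_after);
}
```

For the Sky scene, `improvement_percent(24.0, 26.0)` gives roughly 7.69, matching the table. Because the measurements are whole FPS values, a single frame of difference at around 24 FPS already accounts for about 4 percentage points.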

Takeaways:

  1. The Plane scene started at 24 fps and ended up at 40 fps. Not bad (this includes my own optimizations, not just this PR).
  2. Sky & Plane C show an improvement of 7.50-7.69%. This seems to be the upper limit of what we can expect from the SPIR-V Optimizer.
    • The Sky scene is of particular interest because it has no specialization constants. Specialization constants limit how much the SPIR-V Optimizer can do.
  3. Plane A (vanilla Godot) shows an improvement of just 4%. My optimizations just get rid of static branches in the pixel shader. Static branches are usually cheap, but they're not free, even less so on mobile, and they seem to be hindering the SPIR-V optimizations.

Although I was hoping for a lot more, it is something. And it is through the accumulation of performance improvements that we eventually reach decent performance, particularly on the lower end.

So if Dario finds the cost of integrating this PR acceptable (i.e. fixing the validation errors and the runtime cost of compiling with the optimizer), I'm in favour of including the SPIR-V Optimizer.

Port SPIRV-Tools to SCons. Enable optimizations on glslang when it's built in. Add project setting to enable optimizations by the shader compiler (disabled by default).