I'd like to see improved dispatch functionality.

In Vulkan 1.1, `vkCmdDispatchBase` was added, but no indirect equivalent of it was added.

Adding one would make the API more complete. The actual real improvement, though, would be something like:
I have tried to emulate this with the new `VK_EXT_device_generated_commands` extension. However, at least on Nvidia, this resulted in roughly 25.5x slower execution of almost exactly the same shader code (at most a couple of instructions differ). It also required a preprocess buffer of more than 27 MB. A more detailed post about this can be found here. Even if `VK_EXT_device_generated_commands` did not have such a massive slowdown, I think using regular commands is not only a lot easier, it also allows the same optimizations that regular commands have always been able to get, which `VK_EXT_device_generated_commands` would not.

I do not know if non-base versions of these, i.e.
`vkCmdDispatchIndirect2` and `vkCmdDispatchIndirectCount2`, would work, as I do not see a way to distinguish between the separate dispatches. The base versions can each use a different base to distinguish them. I guess the non-base versions could work if something like a `gl_DispatchID` built-in were added.

After some time (by searching for the term
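`gl_DispatchID`), I found an earlier issue all the way from 2021: #1467. That one was initially about something else (apparently just adding a stride?), and no base versions were suggested there either. One of the objections raised there was:

> We should also note the reason why indirect dispatches don't have an array-count. Even if dispatches had a count, each dispatch operation would be executing the same compute shader, with the same uniforms, SSBOs, and push-constants. And there's no way to synchronize CS execution between work-groups, so it's impossible for work groups to inter-communicate.

As shown in the more detailed post, this is not an issue at all, since all of those "issues" are exactly what is wanted. Anything more complex, with different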
push constants, different shaders, and so on, can be done with `VK_EXT_device_generated_commands`, albeit (potentially) a lot slower.

Although I do not know for sure, as I do not use DirectX 12, DirectX 12 should be able to do something like this too using `ExecuteIndirect`. That seems somewhat more similar to `VK_EXT_device_generated_commands`, but it seems no preprocess buffer or anything like that is needed for it. However, I have no experience with DirectX 12, so maybe this is not the case.