Improved Dispatch Functionality #2496

Open
timeester3648 opened this issue Feb 21, 2025 · 0 comments

timeester3648 commented Feb 21, 2025

I'd like to see improved dispatch functionality.

Vulkan 1.1 added vkCmdDispatchBase, but there is no indirect equivalent. Something like:

typedef struct VkDispatchBaseIndirectCommand {
    uint32_t    baseX;
    uint32_t    baseY;
    uint32_t    baseZ;
    uint32_t    x;
    uint32_t    y;
    uint32_t    z;
} VkDispatchBaseIndirectCommand;

and

void vkCmdDispatchBaseIndirect(
    VkCommandBuffer                             commandBuffer,
    VkBuffer                                    buffer,
    VkDeviceSize                                offset);
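To pin down what this single-dispatch variant would do, here is a minimal CPU-side reference model. It assumes vkCmdDispatchBaseIndirect behaves exactly like vkCmdDispatchBase with baseX/Y/Z and x/y/z read from the buffer at offset (so each workgroup sees gl_WorkGroupID = base + its local group index), and that the 4-byte offset alignment rule of vkCmdDispatchIndirect carries over. The struct copy and model_dispatch_base_indirect are hypothetical illustration names, not Vulkan API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed VkDispatchBaseIndirectCommand (hypothetical). */
typedef struct {
    uint32_t baseX, baseY, baseZ;
    uint32_t x, y, z;
} DispatchBaseIndirectCommand;

/* Hypothetical CPU model of the proposed vkCmdDispatchBaseIndirect: reads one
 * command at `offset` and writes the gl_WorkGroupID each workgroup would see,
 * as vkCmdDispatchBase would for the same base and group counts.
 * Writes at most `max_ids` IDs and returns the total workgroup count. */
static size_t model_dispatch_base_indirect(const uint8_t *buffer, size_t offset,
                                           uint32_t (*ids)[3], size_t max_ids) {
    DispatchBaseIndirectCommand cmd;
    size_t n = 0;
    assert(offset % 4 == 0); /* assumed to match vkCmdDispatchIndirect's rule */
    memcpy(&cmd, buffer + offset, sizeof cmd); /* memcpy avoids unaligned reads */
    /* x-fastest order is only a property of this model, not of the GPU. */
    for (uint32_t z = 0; z < cmd.z; ++z)
        for (uint32_t y = 0; y < cmd.y; ++y)
            for (uint32_t x = 0; x < cmd.x; ++x, ++n)
                if (n < max_ids) {
                    ids[n][0] = cmd.baseX + x;
                    ids[n][1] = cmd.baseY + y;
                    ids[n][2] = cmd.baseZ + z;
                }
    return n;
}
```

For example, a command {baseX = 4, baseY = 1, baseZ = 0, x = 2, y = 3, z = 1} launches 6 workgroups whose IDs range from (4, 1, 0) to (5, 3, 0), exactly as vkCmdDispatchBase(cmd, 4, 1, 0, 2, 3, 1) would.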

That alone would just make the API more complete. The real improvement would be multi-dispatch variants, something like:

void vkCmdDispatchBaseIndirect2(
    VkCommandBuffer                 commandBuffer,
    VkBuffer                        buffer,
    VkDeviceSize                    offset,
    uint32_t                        dispatchCount,
    uint32_t                        stride);

and

void vkCmdDispatchBaseIndirectCount2(
    VkCommandBuffer                   commandBuffer,
    VkBuffer                          buffer,
    VkDeviceSize                      offset,
    VkBuffer                          countBuffer,
    VkDeviceSize                      countBufferOffset,
    uint32_t                          maxDispatchCount,
    uint32_t                          stride);
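The intended semantics of these two multi-dispatch variants can be sketched the same way, assuming they follow vkCmdDrawIndirect and vkCmdDrawIndirectCount: command i is read at offset + i * stride, stride must be a multiple of 4 and at least the command size when more than one command is read, and the Count variant issues min(value in countBuffer, maxDispatchCount) dispatches. The model only totals the workgroups launched; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed VkDispatchBaseIndirectCommand (hypothetical). */
typedef struct {
    uint32_t baseX, baseY, baseZ;
    uint32_t x, y, z;
} DispatchBaseIndirectCommand;

/* Hypothetical model of vkCmdDispatchBaseIndirect2: reads dispatchCount
 * commands spaced `stride` bytes apart and returns the total workgroups. */
static uint64_t model_dispatch_base_indirect2(const uint8_t *buffer, size_t offset,
                                              uint32_t dispatchCount, uint32_t stride) {
    uint64_t total = 0;
    assert(offset % 4 == 0);
    /* Stride rule assumed to mirror vkCmdDrawIndirect. */
    assert(dispatchCount <= 1 ||
           (stride % 4 == 0 && stride >= sizeof(DispatchBaseIndirectCommand)));
    for (uint32_t i = 0; i < dispatchCount; ++i) {
        DispatchBaseIndirectCommand cmd;
        memcpy(&cmd, buffer + offset + (size_t)i * stride, sizeof cmd);
        total += (uint64_t)cmd.x * cmd.y * cmd.z;
    }
    return total;
}

/* Hypothetical model of vkCmdDispatchBaseIndirectCount2: the dispatch count is
 * read from countBuffer and clamped to maxDispatchCount, as for DrawIndirectCount. */
static uint64_t model_dispatch_base_indirect_count2(const uint8_t *buffer, size_t offset,
                                                    const uint8_t *countBuffer,
                                                    size_t countBufferOffset,
                                                    uint32_t maxDispatchCount,
                                                    uint32_t stride) {
    uint32_t count;
    assert(countBufferOffset % 4 == 0);
    memcpy(&count, countBuffer + countBufferOffset, sizeof count);
    if (count > maxDispatchCount)
        count = maxDispatchCount;
    return model_dispatch_base_indirect2(buffer, offset, count, stride);
}
```

A stride larger than the 24-byte command (say 32) leaves room for per-dispatch data next to each record, which the shader could locate from its own workgroup ID.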

I have tried to emulate this with the new VK_EXT_device_generated_commands extension; however, at least on Nvidia, this resulted in ~25.5x slower execution of almost exactly the same shader code (at most a couple of instructions different). It also required a preprocess buffer of more than 27 MB. A more detailed post about this can be found here. Even if VK_EXT_device_generated_commands did not have such a massive slowdown, I think regular commands are not only a lot easier to use, but also allow the same optimizations that regular commands have always enabled, which VK_EXT_device_generated_commands would not.

I do not know whether non-base versions of these, i.e. vkCmdDispatchIndirect2 and vkCmdDispatchIndirectCount2, would work, as I do not see a way to distinguish between the separate dispatches. The base versions can each have a different base to distinguish them. I guess the non-base versions could work if something like a gl_DispatchID were added.
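One way the base could identify the dispatch, assuming the application writes the commands itself: lay the dispatches out back-to-back along X so their workgroup ID ranges are disjoint, and have the shader map gl_WorkGroupID.x back to a dispatch index with a binary search over the bases. The helpers below model that on the CPU; their names are hypothetical. If the dispatches only need two dimensions, setting baseZ = i and z = 1 is simpler still, since gl_WorkGroupID.z is then the index directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the proposed VkDispatchBaseIndirectCommand (hypothetical). */
typedef struct {
    uint32_t baseX, baseY, baseZ;
    uint32_t x, y, z;
} DispatchBaseIndirectCommand;

/* Hypothetical writer: dispatch i covers workgroup X IDs [baseX_i, baseX_i + x_i),
 * with ranges placed back-to-back. Returns the end of the last range. */
static uint32_t assign_disjoint_bases(DispatchBaseIndirectCommand *cmds, size_t n) {
    uint32_t next = 0;
    for (size_t i = 0; i < n; ++i) {
        cmds[i].baseX = next;
        cmds[i].baseY = 0;
        cmds[i].baseZ = 0;
        next += cmds[i].x;
    }
    return next;
}

/* What a shader could do with gl_WorkGroupID.x: find the last dispatch whose
 * baseX <= groupX. Picking the last one skips empty (x == 0) dispatches that
 * share a base. Assumes ascending baseX and n > 0. */
static size_t dispatch_index_for_group(const DispatchBaseIndirectCommand *cmds,
                                       size_t n, uint32_t groupX) {
    size_t lo = 0, hi = n; /* invariant: cmds[lo].baseX <= groupX, answer < hi */
    assert(n > 0 && groupX >= cmds[0].baseX);
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (cmds[mid].baseX <= groupX)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

Because the lookup uses only X, Y and Z stay fully available to each dispatch.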

After some time (by searching for the term gl_DispatchID), I found an earlier issue from as far back as 2021, #1467, although that one was initially about something else (just adding a stride?), and no base versions were suggested there either. One of the objections raised there was:

We should also note the reason why indirect dispatches don't have an array-count. Even if dispatches had a count, each dispatch operation would be executing the same compute shader, with the same uniforms, SSBOs, and push-constants. And there's no way to synchronize CS execution between work-groups, so it's impossible for work groups to inter-communicate.

As shown in the more detailed post, this is not an issue at all, as all those "issues" are exactly what is wanted. Anything more complex, with different push constants, different shaders, and so on, can be done with VK_EXT_device_generated_commands, albeit (potentially) a lot more slowly.

I do not know for sure, as I do not use DirectX 12, but DirectX 12 should be able to do something like this using ExecuteIndirect. It seems somewhat more similar to VK_EXT_device_generated_commands, but no preprocess buffer or anything like that appears to be needed. However, I have no experience with DirectX 12, so this may not be the case.
