Update VK_AMDX_shader_enqueue to v2
 - Adds mesh shader node support
 - Adds two new limits to workgroups
 - Variable scratch size support
Tobski committed Oct 4, 2024
1 parent bfd6841 commit 9daeaa2
Showing 9 changed files with 587 additions and 192 deletions.
8 changes: 5 additions & 3 deletions appendices/VK_AMDX_shader_enqueue.adoc
@@ -7,7 +7,7 @@ include::{generated}/meta/{refprefix}VK_AMDX_shader_enqueue.adoc[]
=== Other Extension Metadata

*Last Modified Date*::
2021-07-22
2024-07-17

*Provisional*::

@@ -30,12 +30,14 @@ between revisions, and before final release.*

=== Description

This extension adds the ability for developers to enqueue compute shader
workgroups from other compute shaders.
This extension adds the ability for developers to enqueue mesh
and compute shader workgroups from other compute shaders.

include::{generated}/interfaces/VK_AMDX_shader_enqueue.adoc[]

=== Version History

* Revision 2, 2024-07-17 (Tobias Hector)
** Add mesh nodes
* Revision 1, 2021-07-22 (Tobias Hector)
** Initial revision
12 changes: 10 additions & 2 deletions appendices/spirvenv.adoc
@@ -2168,10 +2168,18 @@ endif::VK_KHR_maintenance5[]
ifdef::VK_AMDX_shader_enqueue[]
* [[VUID-{refpage}-ShaderEnqueueAMDX-09191]]
The code:ShaderEnqueueAMDX capability must: only be used in shaders with
the code:GLCompute execution model
the code:GLCompute
ifdef::VK_EXT_mesh_shader[]
or code:MeshEXT
endif::VK_EXT_mesh_shader[]
execution model
* [[VUID-{refpage}-NodePayloadAMDX-09192]]
Variables in the code:NodePayloadAMDX storage class must: only be
declared in the code:GLCompute execution model
declared in the code:GLCompute
ifdef::VK_EXT_mesh_shader[]
or code:MeshEXT
endif::VK_EXT_mesh_shader[]
execution model
* [[VUID-{refpage}-maxExecutionGraphShaderPayloadSize-09193]]
Variables declared in the code:NodePayloadAMDX storage class must: not
be larger than the <<limits-maxExecutionGraphShaderPayloadSize,
21 changes: 11 additions & 10 deletions chapters/commonvalidity/dispatch_graph_common.adoc
@@ -9,11 +9,11 @@ include::{chapters}/commonvalidity/draw_dispatch_common.adoc[]
pname:commandBuffer must: not be a protected command buffer
* [[VUID-{refpage}-commandBuffer-09182]]
pname:commandBuffer must: be a primary command buffer
* [[VUID-{refpage}-scratch-09183]]
pname:scratch must: be the device address of an allocated memory range
at least as large as the value of
slink:VkExecutionGraphPipelineScratchSizeAMDX::pname:size returned by
slink:VkExecutionGraphPipelineScratchSizeAMDX for the currently bound
* pname:scratch must: be the device address of an allocated memory range
at least as large as pname:scratchSize
* pname:scratchSize must: be greater than or equal to
slink:VkExecutionGraphPipelineScratchSizeAMDX::pname:minSize returned by
flink:vkGetExecutionGraphPipelineScratchSizeAMDX for the currently bound
execution graph pipeline
* [[VUID-{refpage}-scratch-09184]]
pname:scratch must: be a device address within a slink:VkBuffer created
@@ -22,11 +22,9 @@ ifdef::VK_KHR_maintenance5[]
or ename:VK_BUFFER_USAGE_2_EXECUTION_GRAPH_SCRATCH_BIT_AMDX
endif::VK_KHR_maintenance5[]
flag
* [[VUID-{refpage}-scratch-09185]]
Device memory in the range [pname:scratch,pname:scratch +
slink:VkExecutionGraphPipelineScratchSizeAMDX::pname:size) must: have
been initialized with flink:vkCmdInitializeGraphScratchMemoryAMDX using
the currently bound execution graph pipeline, and not modified after
* The device memory range [pname:scratch,pname:scratch + pname:scratchSize)
must: have been initialized with flink:vkCmdInitializeGraphScratchMemoryAMDX
using the currently bound execution graph pipeline, and not modified after
that by anything other than another execution graph dispatch command
* [[VUID-{refpage}-maxComputeWorkGroupCount-09186]]
Execution of this command must: not cause a node to be dispatched with a
@@ -43,4 +41,7 @@ endif::VK_KHR_maintenance5[]
specified by the max number of payloads for that decoration.
This requirement applies to each code:NodeMaxPayloadsAMDX decoration
separately
* If the currently bound execution graph pipeline includes draw nodes,
this command must: be called within a render pass instance that is
compatible with the graphics pipeline used to create each of those nodes
// Common Valid Usage
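As an illustration of the size and range rules above, the following C sketch checks a proposed pname:scratch and pname:scratchSize pair on the host before a dispatch is recorded. It is a hypothetical application-side helper, not part of the API: `ScratchBufferRange` stands in for the device address and size of the slink:VkBuffer holding the scratch memory, and `min_size` is assumed to be the pname:minSize value returned by flink:vkGetExecutionGraphPipelineScratchSizeAMDX. Buffer usage flags and initialization state are not checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the buffer backing the scratch memory:
 * its device address and size in bytes. Not a Vulkan type. */
typedef struct {
    uint64_t base_address;
    uint64_t size;
} ScratchBufferRange;

/* Returns true if scratch_size is at least min_size and the whole range
 * [scratch, scratch + scratch_size) lies inside the buffer. */
static bool scratch_args_valid(uint64_t scratch, uint64_t scratch_size,
                               uint64_t min_size,
                               const ScratchBufferRange *buffer)
{
    if (scratch_size < min_size)
        return false; /* smaller than the queried minimum */
    if (scratch < buffer->base_address)
        return false; /* starts before the buffer */
    uint64_t offset = scratch - buffer->base_address;
    if (offset > buffer->size)
        return false; /* starts past the end of the buffer */
    /* Compare against the remaining space instead of computing
     * scratch + scratch_size, which could overflow. */
    return scratch_size <= buffer->size - offset;
}
```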
147 changes: 115 additions & 32 deletions chapters/executiongraphs.adoc
@@ -109,6 +109,11 @@ Each shader stage provided when creating an execution graph pipeline
determined by the inclusion or omission of a
slink:VkPipelineShaderStageNodeCreateInfoAMDX structure in its pname:pNext
chain.
For graphics pipeline libraries, only the name and index of the vertex or
mesh shader stage are linked directly to the graph as a node; the other
shader stages in the pipeline execute after that stage as they normally
would.
Task shaders cannot be included in a graphics pipeline used for a draw node.

In addition to the shader name and index, an internal "node index" is also
generated for each node, which can be queried with
@@ -142,8 +147,20 @@ include::{chapters}/commonvalidity/compute_graph_pipeline_create_info_common.ado
sname:VkPhysicalDeviceLimits::pname:maxPerStageResources
* [[VUID-VkExecutionGraphPipelineCreateInfoAMDX-pLibraryInfo-09133]]
If pname:pLibraryInfo is not `NULL`, each element of
pname:pLibraryInfo->libraries must: be either a compute pipeline or an
execution graph pipeline
pname:pLibraryInfo->pLibraries must: be either a compute pipeline,
an execution graph pipeline, or a graphics pipeline
* If pname:pLibraryInfo is not `NULL`, each element of
pname:pLibraryInfo->pLibraries that is a compute pipeline
or a graphics pipeline must: have been created with
ename:VK_PIPELINE_CREATE_2_EXECUTION_GRAPH_BIT_AMDX set
* If the <<features-shaderMeshEnqueue,pname:shaderMeshEnqueue>> feature
is not enabled and pname:pLibraryInfo is not `NULL`,
pname:pLibraryInfo->pLibraries must: not contain any graphics pipelines
ifdef::VK_EXT_graphics_pipeline_library[]
* Any element of pname:pLibraryInfo->pLibraries identifying a
graphics pipeline must: have been created with
<<pipelines-graphics-subsets-complete, all possible state subsets>>
endif::VK_EXT_graphics_pipeline_library[]
* [[VUID-VkExecutionGraphPipelineCreateInfoAMDX-None-09134]]
There must: be no two nodes in the pipeline that share both the same
shader name and index, as specified by
@@ -166,6 +183,11 @@ include::{chapters}/commonvalidity/compute_graph_pipeline_create_info_common.ado
matches the shader name of any other node in the graph, the size of the
output payload must: match the size of the input payload in the matching
node
* If pname:flags does not include ename:VK_PIPELINE_CREATE_LIBRARY_BIT_KHR,
and an output payload declared in any shader in the pipeline does not
have a code:PayloadNodeSparseArrayAMDX decoration, there must: be a node
in the graph corresponding to every index from 0 up to, but not
including, the value of its code:PayloadNodeArraySizeAMDX decoration
****
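For the last rule in the list above, a host-side scan is enough. This C sketch assumes a hypothetical flattened list of node identifiers (the pname:pName and pname:index values from each slink:VkPipelineShaderStageNodeCreateInfoAMDX) and treats the required indices as 0 up to, but not including, the array size, as is usual for an array of that size. `NodeId` and `output_array_fully_covered` are illustrative names only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one pipeline node: its shader name and index. */
typedef struct {
    const char *name;
    uint32_t index;
} NodeId;

/* Returns true if, for a non-sparse output payload array of size
 * array_size that targets the node name `name`, every index in
 * [0, array_size) has a matching node in the graph. */
static bool output_array_fully_covered(const NodeId *nodes, size_t node_count,
                                       const char *name, uint32_t array_size)
{
    for (uint32_t i = 0; i < array_size; ++i) {
        bool found = false;
        for (size_t n = 0; n < node_count && !found; ++n)
            found = nodes[n].index == i && strcmp(nodes[n].name, name) == 0;
        if (!found)
            return false; /* index i has no node */
    }
    return true;
}
```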

include::{generated}/validity/structs/VkExecutionGraphPipelineCreateInfoAMDX.adoc[]
@@ -215,6 +237,12 @@ By associating multiple shaders with the same name but different indexes,
applications can dynamically select different nodes to execute.
Applications must: ensure each node has a unique name and index.

[NOTE]
====
Shaders with the same name have to be of the same type.
For example, a compute shader and a graphics shader cannot share a name,
and neither can two compute shaders where one is coalescing and the other
is not.
====
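A minimal C sketch of these naming rules (each name and index pair is unique, and all nodes sharing a name have the same type). The `GraphNode` record and `NodeKind` enumeration are hypothetical and exist only for this illustration; the API itself has no such types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node types, for illustration only. */
typedef enum { NODE_COMPUTE, NODE_COMPUTE_COALESCING, NODE_GRAPHICS } NodeKind;

typedef struct {
    const char *name; /* shader name given for the node */
    uint32_t index;   /* shader index given for the node */
    NodeKind kind;
} GraphNode;

/* Returns true if no two nodes share both name and index, and every set of
 * nodes sharing a name is of a single type. */
static bool graph_node_names_valid(const GraphNode *nodes, size_t count)
{
    for (size_t a = 0; a < count; ++a) {
        for (size_t b = a + 1; b < count; ++b) {
            if (strcmp(nodes[a].name, nodes[b].name) != 0)
                continue; /* different names never conflict */
            if (nodes[a].index == nodes[b].index)
                return false; /* duplicate name and index */
            if (nodes[a].kind != nodes[b].kind)
                return false; /* same name, different node type */
        }
    }
    return true;
}
```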
include::{generated}/validity/structs/VkPipelineShaderStageNodeCreateInfoAMDX.adoc[]
--

@@ -227,7 +255,7 @@ graph, call:

include::{generated}/api/protos/vkGetExecutionGraphPipelineNodeIndexAMDX.adoc[]

* pname:device is the that pname:executionGraph was created on.
* pname:device is the logical device that pname:executionGraph was created on.
* pname:executionGraph is the execution graph pipeline to query the
internal node index for.
* pname:pNodeInfo is a pointer to a
@@ -269,7 +297,7 @@ To query the scratch space required to dispatch an execution graph, call:

include::{generated}/api/protos/vkGetExecutionGraphPipelineScratchSizeAMDX.adoc[]

* pname:device is the that pname:executionGraph was created on.
* pname:device is the logical device that pname:executionGraph was created on.
* pname:executionGraph is the execution graph pipeline to query the
scratch space for.
* pname:pSizeInfo is a pointer to a
@@ -293,8 +321,18 @@ include::{generated}/api/structs/VkExecutionGraphPipelineScratchSizeAMDX.adoc[]
* pname:sType is a elink:VkStructureType value identifying this structure.
* pname:pNext is `NULL` or a pointer to a structure extending this
structure.
* pname:size indicates the scratch space required for dispatch the queried
execution graph.
* pname:minSize indicates the minimum scratch space required for
dispatching the queried execution graph.
* pname:maxSize indicates the maximum scratch space that can be used for
dispatching the queried execution graph.
* pname:sizeGranularity indicates the granularity at which the scratch space can be
increased from pname:minSize.

Applications can: use any amount of scratch memory greater than or equal to
pname:minSize for dispatching a graph; however, only amounts equal to
pname:minSize plus an integer multiple of pname:sizeGranularity are used.
Larger amounts may: result in higher performance, up to pname:maxSize, which
indicates the most memory that an implementation can use effectively.
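The following C sketch picks a scratch allocation size from the three queried values. It assumes, based on the paragraph above, that an implementation uses pname:minSize plus the largest whole number of pname:sizeGranularity steps that fits in the provided size, and never more than pname:maxSize. `ScratchSizeInfo` is a hypothetical stand-in for the relevant slink:VkExecutionGraphPipelineScratchSizeAMDX fields, and returning pname:minSize when the granularity is zero is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for minSize, maxSize and sizeGranularity. */
typedef struct {
    uint64_t min_size;
    uint64_t max_size;
    uint64_t size_granularity;
} ScratchSizeInfo;

/* Returns the largest amount of scratch memory no greater than `budget`
 * that would actually be used, or 0 if budget is below min_size. */
static uint64_t useful_scratch_size(const ScratchSizeInfo *info, uint64_t budget)
{
    if (budget < info->min_size)
        return 0; /* too small to dispatch the graph at all */
    /* Never go beyond the most the implementation can use effectively. */
    uint64_t limit = budget < info->max_size ? budget : info->max_size;
    if (limit < info->min_size || info->size_granularity == 0)
        return info->min_size;
    /* Round down to min_size plus a whole number of granularity steps. */
    uint64_t steps = (limit - info->min_size) / info->size_granularity;
    return info->min_size + steps * info->size_granularity;
}
```

For example, with a pname:minSize of 1024, a pname:maxSize of 8192 and a pname:sizeGranularity of 256, a 1300-byte budget yields 1280 bytes.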

include::{generated}/validity/structs/VkExecutionGraphPipelineScratchSizeAMDX.adoc[]
--
@@ -309,16 +347,16 @@ include::{generated}/api/protos/vkCmdInitializeGraphScratchMemoryAMDX.adoc[]

* pname:commandBuffer is the command buffer into which the command will be
recorded.
* pname:scratch is a pointer to the scratch memory to be initialized.
* pname:executionGraph is the execution graph pipeline to initialize the
scratch memory for.
* pname:scratch is the address of scratch memory to be initialized.
* pname:scratchSize is the size in bytes of the scratch memory range to be
initialized.

This command must: be called before using pname:scratch to dispatch
pname:executionGraph.

Execution of this command may: modify any memory locations in the range
[pname:scratch,pname:scratch + pname:size), where pname:size is the value
returned in slink:VkExecutionGraphPipelineScratchSizeAMDX::pname:size by
slink:VkExecutionGraphPipelineScratchSizeAMDX for the currently bound
execution graph pipeline.
[pname:scratch,pname:scratch + pname:scratchSize).
Accesses to this memory range are performed in the
ename:VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
ename:VK_ACCESS_2_SHADER_STORAGE_READ_BIT and
@@ -327,17 +365,17 @@ ename:VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
If any portion of pname:scratch is modified by any command other than
flink:vkCmdDispatchGraphAMDX, flink:vkCmdDispatchGraphIndirectAMDX,
flink:vkCmdDispatchGraphIndirectCountAMDX, or
fname:vkCmdInitializeGraphScratchMemoryAMDX with the same execution graph,
flink:vkCmdInitializeGraphScratchMemoryAMDX with the same execution graph,
it must: be reinitialized for the execution graph again before dispatching
against it.
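Because scratch memory modified by anything other than graph commands for the same execution graph has to be reinitialized, an application might track which pipeline a scratch range was last initialized for. This is a hypothetical application-side sketch in C, not part of the API: pipelines are identified by an opaque 64-bit value (for example a pipeline handle cast to `uint64_t`), and 0 means "not initialized".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of which execution graph a scratch range is
 * currently initialized for (0 if none). */
typedef struct {
    uint64_t initialized_for;
} ScratchState;

/* Call after recording vkCmdInitializeGraphScratchMemoryAMDX for `pipeline`. */
static void scratch_on_initialize(ScratchState *s, uint64_t pipeline)
{
    s->initialized_for = pipeline;
}

/* Call when the range is written by anything other than a graph dispatch or
 * initialization for the graph it was initialized for (copies, fills, other
 * shaders, or graph commands using a different execution graph). */
static void scratch_on_foreign_write(ScratchState *s)
{
    s->initialized_for = 0;
}

/* True if the range can be used to dispatch `pipeline` without
 * reinitializing it first. */
static bool scratch_ready_for(const ScratchState *s, uint64_t pipeline)
{
    return pipeline != 0 && s->initialized_for == pipeline;
}
```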

.Valid Usage
****
* [[VUID-vkCmdInitializeGraphScratchMemoryAMDX-scratch-09143]]
pname:scratch must: be the device address of an allocated memory range
at least as large as the value of
slink:VkExecutionGraphPipelineScratchSizeAMDX::pname:size returned by
slink:VkExecutionGraphPipelineScratchSizeAMDX for the currently bound
* pname:scratch must: be the device address of an allocated memory range
at least as large as pname:scratchSize
* pname:scratchSize must: be greater than or equal to
slink:VkExecutionGraphPipelineScratchSizeAMDX::pname:minSize returned by
flink:vkGetExecutionGraphPipelineScratchSizeAMDX for pname:executionGraph
* [[VUID-vkCmdInitializeGraphScratchMemoryAMDX-scratch-09144]]
pname:scratch must: be a multiple of 64
@@ -363,7 +401,8 @@ include::{generated}/api/protos/vkCmdDispatchGraphAMDX.adoc[]

* pname:commandBuffer is the command buffer into which the command will be
recorded.
* pname:scratch is a pointer to the scratch memory to be used.
* pname:scratch is the address of scratch memory to be used.
* pname:scratchSize is the size in bytes of the scratch memory range to be
used.
* pname:pCountInfo is a host pointer to a
slink:VkDispatchGraphCountInfoAMDX structure defining the nodes which
will be initially executed.
@@ -372,21 +411,27 @@ When this command is executed, the nodes specified in pname:pCountInfo are
executed.
Nodes executed as part of this command are not implicitly synchronized in
any way against each other once they are dispatched.
Rasterization order is not guaranteed between separately dispatched graphics
nodes, although individual primitives within a single dispatch do follow
rasterization order.
Draw commands recorded before or after this command also follow
rasterization order relative to each graphics node.

For this command, all device/host pointers in substructures are treated as
host pointers and read only during host execution of this command.
Once this command returns, no reference to the original pointers is
retained.

Execution of this command may: modify any memory locations in the range
[pname:scratch,pname:scratch + pname:size), where pname:size is the value
returned in slink:VkExecutionGraphPipelineScratchSizeAMDX::pname:size by
slink:VkExecutionGraphPipelineScratchSizeAMDX for the currently bound
execution graph pipeline Accesses to this memory range are performed in the
[pname:scratch,pname:scratch + pname:scratchSize).
Accesses to this memory range are performed in the
ename:VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
ename:VK_ACCESS_2_SHADER_STORAGE_READ_BIT and
ename:VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.

This command <<executiongraphs-meshnodes-statecapture,captures command
buffer state>> for mesh nodes similarly to draw commands.

.Valid Usage
****
include::{chapters}/commonvalidity/dispatch_graph_common.adoc[]
@@ -431,7 +476,8 @@ include::{generated}/api/protos/vkCmdDispatchGraphIndirectAMDX.adoc[]

* pname:commandBuffer is the command buffer into which the command will be
recorded.
* pname:scratch is a pointer to the scratch memory to be used.
* pname:scratch is the address of scratch memory to be used.
* pname:scratchSize is the size in bytes of the scratch memory range to be
used.
* pname:pCountInfo is a host pointer to a
slink:VkDispatchGraphCountInfoAMDX structure defining the nodes which
will be initially executed.
@@ -440,6 +486,11 @@ When this command is executed, the nodes specified in pname:pCountInfo are
executed.
Nodes executed as part of this command are not implicitly synchronized in
any way against each other once they are dispatched.
Rasterization order is not guaranteed between separately dispatched graphics
nodes, although individual primitives within a single dispatch do follow
rasterization order.
Draw commands recorded before or after this command also follow
rasterization order relative to each graphics node.

For this command, all device/host pointers in substructures are treated as
device pointers and read during device execution of this command.
@@ -450,15 +501,15 @@ ename:VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
ename:VK_ACCESS_2_SHADER_STORAGE_READ_BIT access flag.

Execution of this command may: modify any memory locations in the range
[pname:scratch,pname:scratch + pname:size), where pname:size is the value
returned in slink:VkExecutionGraphPipelineScratchSizeAMDX::pname:size by
slink:VkExecutionGraphPipelineScratchSizeAMDX for the currently bound
execution graph pipeline.
[pname:scratch,pname:scratch + pname:scratchSize).
Accesses to this memory range are performed in the
ename:VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
ename:VK_ACCESS_2_SHADER_STORAGE_READ_BIT and
ename:VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.

This command <<executiongraphs-meshnodes-statecapture,captures command
buffer state>> for mesh nodes similarly to draw commands.

.Valid Usage
****
include::{chapters}/commonvalidity/dispatch_graph_common.adoc[]
@@ -525,7 +576,8 @@ include::{generated}/api/protos/vkCmdDispatchGraphIndirectCountAMDX.adoc[]

* pname:commandBuffer is the command buffer into which the command will be
recorded.
* pname:scratch is a pointer to the scratch memory to be used.
* pname:scratch is the address of scratch memory to be used.
* pname:scratchSize is the size in bytes of the scratch memory range to be
used.
* pname:countInfo is a device address of a
slink:VkDispatchGraphCountInfoAMDX structure defining the nodes which
will be initially executed.
@@ -544,10 +596,7 @@ ename:VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
ename:VK_ACCESS_2_SHADER_STORAGE_READ_BIT access flag.

Execution of this command may: modify any memory locations in the range
[pname:scratch,pname:scratch + pname:size), where pname:size is the value
returned in slink:VkExecutionGraphPipelineScratchSizeAMDX::pname:size by
slink:VkExecutionGraphPipelineScratchSizeAMDX for the currently bound
execution graph pipeline.
[pname:scratch,pname:scratch + pname:scratchSize).
Accesses to this memory range are performed in the
ename:VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
ename:VK_ACCESS_2_SHADER_STORAGE_READ_BIT and
@@ -732,3 +781,37 @@ The number of invocations coalesced into a given workgroup in this way can:
be queried via the <<interfaces-builtin-variables-coalescedinputcountamd,
code:CoalescedInputCountAMDX>> built-in.
Any values in the payload have no effect on execution.

ifdef::VK_EXT_mesh_shader[]
[[executiongraphs-meshnodes]]
=== Mesh Nodes

Graphics pipelines added as nodes to an execution graph are executed in a
manner similar to a flink:vkCmdDrawMeshTasksIndirectEXT command, consuming
the same payloads as compute shader nodes, but additionally using some state
captured from the command buffer.

[[executiongraphs-meshnodes-statecapture]]
When an execution graph dispatch is recorded into a command buffer, it
captures the following dynamic state for use with draw nodes:

* `VK_DYNAMIC_STATE_VIEWPORT`
* `VK_DYNAMIC_STATE_SCISSOR`
* `VK_DYNAMIC_STATE_LINE_WIDTH`
* `VK_DYNAMIC_STATE_DEPTH_BIAS`
* `VK_DYNAMIC_STATE_BLEND_CONSTANTS`
* `VK_DYNAMIC_STATE_DEPTH_BOUNDS`
ifdef::VK_VERSION_1_3,VK_EXT_extended_dynamic_state[]
* `VK_DYNAMIC_STATE_VIEWPORT_WITH_COUNT`
* `VK_DYNAMIC_STATE_SCISSOR_WITH_COUNT`
endif::VK_VERSION_1_3,VK_EXT_extended_dynamic_state[]
ifdef::VK_EXT_sample_locations[]
* `VK_DYNAMIC_STATE_SAMPLE_LOCATIONS_EXT`
endif::VK_EXT_sample_locations[]
ifdef::VK_KHR_fragment_shading_rate[]
* `VK_DYNAMIC_STATE_FRAGMENT_SHADING_RATE_KHR`
endif::VK_KHR_fragment_shading_rate[]

Other state is not captured, and graphics pipelines must: not be created
with other dynamic states when used as a library in an execution graph
pipeline.
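A minimal C sketch of the restriction above, checking the dynamic states a graphics pipeline library was created with against the captured list. To stay buildable without Vulkan headers, it compares state names as strings rather than elink:VkDynamicState values, and it assumes every optional extension guarding the list is enabled; `mesh_node_dynamic_states_ok` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Dynamic states captured by execution graph dispatch commands, as listed
 * above, assuming all guarding versions and extensions are present. */
static const char *const kCapturedStates[] = {
    "VK_DYNAMIC_STATE_VIEWPORT",
    "VK_DYNAMIC_STATE_SCISSOR",
    "VK_DYNAMIC_STATE_LINE_WIDTH",
    "VK_DYNAMIC_STATE_DEPTH_BIAS",
    "VK_DYNAMIC_STATE_BLEND_CONSTANTS",
    "VK_DYNAMIC_STATE_DEPTH_BOUNDS",
    "VK_DYNAMIC_STATE_VIEWPORT_WITH_COUNT",
    "VK_DYNAMIC_STATE_SCISSOR_WITH_COUNT",
    "VK_DYNAMIC_STATE_SAMPLE_LOCATIONS_EXT",
    "VK_DYNAMIC_STATE_FRAGMENT_SHADING_RATE_KHR",
};

/* Returns true if every dynamic state in `states` is one that is captured. */
static bool mesh_node_dynamic_states_ok(const char *const *states, size_t count)
{
    const size_t captured_count = sizeof kCapturedStates / sizeof kCapturedStates[0];
    for (size_t i = 0; i < count; ++i) {
        bool captured = false;
        for (size_t j = 0; j < captured_count && !captured; ++j)
            captured = strcmp(states[i], kCapturedStates[j]) == 0;
        if (!captured)
            return false; /* this state would not be captured */
    }
    return true;
}
```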
endif::VK_EXT_mesh_shader[]
3 changes: 3 additions & 0 deletions chapters/features.adoc
@@ -7386,6 +7386,9 @@ This structure describes the following feature:

* [[features-shaderEnqueue]] pname:shaderEnqueue indicates whether the
implementation supports <<executiongraphs,execution graphs>>.
* [[features-shaderMeshEnqueue]] pname:shaderMeshEnqueue indicates whether the
implementation supports
<<executiongraphs-meshnodes,mesh nodes in execution graphs>>.

:refpage: VkPhysicalDeviceShaderEnqueueFeaturesAMDX
include::{chapters}/features.adoc[tag=features]