diff --git a/.wordlist.txt b/.wordlist.txt
index 62f06d93e2..ada48cea86 100644
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -72,6 +72,7 @@ ltrace
 makefile
 Malloc
 malloc
+memset
 multicore
 multigrid
 multithreading
diff --git a/docs/how-to/programming_manual.md b/docs/how-to/programming_manual.md
index 33ab58de93..22847adaf9 100644
--- a/docs/how-to/programming_manual.md
+++ b/docs/how-to/programming_manual.md
@@ -146,7 +146,7 @@ For Linux developers, the link [here](https://github.com/ROCm/hip-tests/blob/dev
 
 ## HIP Graph
 
-HIP graph is supported. For more details, refer to the HIP API Guide.
+HIP graphs are supported. For more details, refer to the [HIP API Guide](../doxygen/html/group___graph) or the [understand section for HIP graphs](../understand/hipgraph).
 
 ## Device-Side Malloc
 
diff --git a/docs/index.md b/docs/index.md
index 636b8ba812..d4bb398eac 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -31,6 +31,7 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support
 
 * {doc}`./understand/programming_model`
 * {doc}`./understand/hardware_implementation`
+* {doc}`./understand/hipgraph`
 * {doc}`./understand/amd_clr`
 
 :::
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index 7b188db9e3..6c07ba17cc 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -17,6 +17,7 @@ subtrees:
   entries:
   - file: understand/programming_model
   - file: understand/hardware_implementation
+  - file: understand/hipgraph
   - file: understand/amd_clr
 
 - caption: How to
diff --git a/docs/understand/hipgraph.rst b/docs/understand/hipgraph.rst
new file mode 100644
index 0000000000..ed795999d8
--- /dev/null
+++ b/docs/understand/hipgraph.rst
@@ -0,0 +1,73 @@
+.. meta::
+  :description: This chapter provides an overview of the usage of HIP graphs.
+  :keywords: ROCm, HIP, graph, stream
+
+.. _understand_HIP_graph:
+
+********************************************************************************
+HIP graph
+********************************************************************************
+
+HIP graphs are an alternative way of executing work on a GPU. They can provide
+performance benefits over repeatedly launching the same kernels in the standard
+way via streams.
+
+.. note::
+    The HIP graph API is currently in Beta. Some features might change and might
+    have outstanding issues. Not all features supported by CUDA graphs are
+    supported yet. For a list of all currently supported functions, see the
+    :doc:`HIP graph API documentation<../doxygen/html/group___graph>`.
+
+Graph format
+================================================================================
+
+A HIP graph is, like any other graph, made up of nodes and edges. The nodes of a
+HIP graph represent the operations performed, while the edges mark dependencies
+between those operations.
+
+A node can be one of the following:
+
+- empty nodes
+- nested graphs
+- kernel launches
+- host-side function calls
+- HIP memory functions (copy, memset, ...)
+- HIP events
+- signalling or waiting on external semaphores
+
+HIP graph advantages
+================================================================================
+
+The standard way of launching work on GPUs via streams incurs a small overhead each time an operation
+is launched. For kernels that take a considerable amount of time to finish, this overhead
+is usually negligible. However, many workloads, including scientific simulations
+and AI, involve launching many relatively small kernels repeatedly for many iterations.
+
+HIP graphs have been specifically designed to tackle this problem by requiring
+only one launch from the host per iteration, and by minimizing that overhead
+by performing most of the initialization beforehand. Graphs may provide
+additional performance benefits by enabling optimizations that are only
+possible when the dependencies between the operations are known.
+
+HIP graph usage
+================================================================================
+
+Using HIP graphs to execute your work requires three steps, where the
+first two are the initial setup and only need to be executed once. The first step is the
+definition of the operations (nodes) and the dependencies (edges) between them.
+The second step is the instantiation of the graph. This takes care of validating
+and initializing the graph, to reduce the overhead when executing the graph.
+
+The third step is the actual execution of the graph, which then takes care of
+launching all the kernels and executing the operations while respecting their
+dependencies and necessary synchronizations as specified.
+
+As HIP graphs require some setup and initialization overhead before their first
+execution, they only provide a benefit for workloads that require many iterations to complete.
+
+Setting up HIP graphs
+================================================================================
+
+HIP graphs can be created by explicitly defining them, or by using stream capture.
+For the available functions, see the
+:doc:`HIP graph API documentation<../doxygen/html/group___graph>`.
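
Review note on the "HIP graph usage" section: the three steps it describes (definition, instantiation, execution) might be easier to follow with a small illustration. The sketch below is a host-only toy model in standard C++ and is deliberately not the HIP API. `ToyGraph`, `ToyGraphExec`, `instantiate`, `launch`, and `demo` are hypothetical names invented for this note. It assumes that "validating and initializing" can be pictured as checking the dependencies for cycles and fixing an execution order once, so that each later launch only replays that order. What the real runtime does at instantiation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Host-only toy model, NOT the HIP API. All names are hypothetical.

// Step 1: definition. Nodes are operations, edges are dependencies.
struct ToyGraph {
    std::vector<std::function<void()>> nodes;
    std::vector<std::vector<std::size_t>> successors;

    std::size_t add_node(std::function<void()> op) {
        nodes.push_back(std::move(op));
        successors.emplace_back();
        return nodes.size() - 1;
    }

    // Node `to` must not run before node `from` has finished.
    void add_edge(std::size_t from, std::size_t to) {
        assert(from < nodes.size() && to < nodes.size());
        successors[from].push_back(to);
    }
};

// Step 2: instantiation. Validate once and precompute an execution order.
struct ToyGraphExec {
    std::vector<std::function<void()>> ordered_ops;
};

ToyGraphExec instantiate(const ToyGraph& g) {
    const std::size_t n = g.nodes.size();

    // Count unmet dependencies per node.
    std::vector<std::size_t> pending(n, 0);
    for (const auto& succ : g.successors)
        for (std::size_t to : succ) ++pending[to];

    // Nodes without dependencies are ready immediately.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (pending[i] == 0) ready.push(i);

    // Kahn's algorithm: emit a node, then release its successors.
    ToyGraphExec exec;
    while (!ready.empty()) {
        const std::size_t cur = ready.front();
        ready.pop();
        exec.ordered_ops.push_back(g.nodes[cur]);
        for (std::size_t to : g.successors[cur])
            if (--pending[to] == 0) ready.push(to);
    }

    // Nodes left over can only be part of a dependency cycle.
    if (exec.ordered_ops.size() != n)
        throw std::invalid_argument("dependency cycle in graph");
    return exec;
}

// Step 3: execution. Replays the precomputed order with no further checks.
void launch(const ToyGraphExec& exec) {
    for (const auto& op : exec.ordered_ops) op();
}

// Diamond a -> {b, c} -> d, with nodes added out of order on purpose.
std::string demo() {
    std::string log;
    ToyGraph g;
    const std::size_t d = g.add_node([&log] { log += 'd'; });
    const std::size_t b = g.add_node([&log] { log += 'b'; });
    const std::size_t c = g.add_node([&log] { log += 'c'; });
    const std::size_t a = g.add_node([&log] { log += 'a'; });
    g.add_edge(a, b);
    g.add_edge(a, c);
    g.add_edge(b, d);
    g.add_edge(c, d);

    const ToyGraphExec exec = instantiate(g);  // setup, done once
    launch(exec);                              // iteration 1
    launch(exec);                              // iteration 2
    return log;
}
```

Calling `demo()` returns `"abcdabcd"`: the definition and instantiation run once, and each of the two launches replays the same dependency-respecting order. The model runs everything sequentially on the host, so it shows only the ordering and the split between one-time setup and repeated launch, not any concurrency or performance behaviour of real HIP graphs.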