From 5f27a8acd10670f6756c0e6f003e4a7230ba17f8 Mon Sep 17 00:00:00 2001 From: Matthias Knorr Date: Tue, 13 Aug 2024 12:33:26 +0200 Subject: [PATCH] Add understand chapter for HIP Graphs --- .wordlist.txt | 2 + .../data/understand/hipgraph/hip_graph.drawio | 76 ++++++++ docs/data/understand/hipgraph/hip_graph.svg | 4 + .../hipgraph/hip_graph_speedup.drawio | 162 ++++++++++++++++++ .../understand/hipgraph/hip_graph_speedup.svg | 4 + docs/how-to/programming_manual.md | 2 +- docs/index.md | 1 + docs/sphinx/_toc.yml.in | 1 + docs/understand/hipgraph.rst | 84 +++++++++ 9 files changed, 335 insertions(+), 1 deletion(-) create mode 100644 docs/data/understand/hipgraph/hip_graph.drawio create mode 100644 docs/data/understand/hipgraph/hip_graph.svg create mode 100644 docs/data/understand/hipgraph/hip_graph_speedup.drawio create mode 100644 docs/data/understand/hipgraph/hip_graph_speedup.svg create mode 100644 docs/understand/hipgraph.rst diff --git a/.wordlist.txt b/.wordlist.txt index 87d1579669..d2c8d7f28c 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -44,6 +44,7 @@ hardcoded HC HIP's hipcc +hipDeviceSynchronize hipexamine hipified hipother @@ -72,6 +73,7 @@ ltrace makefile Malloc malloc +memset multicore multigrid multithreading diff --git a/docs/data/understand/hipgraph/hip_graph.drawio b/docs/data/understand/hipgraph/hip_graph.drawio new file mode 100644 index 0000000000..03569ac734 --- /dev/null +++ b/docs/data/understand/hipgraph/hip_graph.drawio @@ -0,0 +1,76 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/data/understand/hipgraph/hip_graph.svg b/docs/data/understand/hipgraph/hip_graph.svg new file mode 100644 index 0000000000..6eed6b92e5 --- /dev/null +++ b/docs/data/understand/hipgraph/hip_graph.svg @@ -0,0 +1,4 @@ + + + +Stream 1
[hip_graph.svg text labels, duplicates removed: Stream 2; Kernel A, Kernel B, Kernel C, Kernel D; hipDeviceSynchronize (twice)]
\ No newline at end of file diff --git a/docs/data/understand/hipgraph/hip_graph_speedup.drawio b/docs/data/understand/hipgraph/hip_graph_speedup.drawio new file mode 100644 index 0000000000..95d02e1290 --- /dev/null +++ b/docs/data/understand/hipgraph/hip_graph_speedup.drawio @@ -0,0 +1,162 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/data/understand/hipgraph/hip_graph_speedup.svg b/docs/data/understand/hipgraph/hip_graph_speedup.svg new file mode 100644 index 0000000000..13b6a3323b --- /dev/null +++ b/docs/data/understand/hipgraph/hip_graph_speedup.svg @@ -0,0 +1,4 @@ + + + +Streams
[hip_graph_speedup.svg text labels, duplicates removed: kernel launch A, B, C, D; kernel A, B, C, D; graph launch; host activity; device activity; time; device idling due to kernel launch congestion; Graph; speedup]
\ No newline at end of file
diff --git a/docs/how-to/programming_manual.md b/docs/how-to/programming_manual.md
index 33ab58de93..22847adaf9 100644
--- a/docs/how-to/programming_manual.md
+++ b/docs/how-to/programming_manual.md
@@ -146,7 +146,7 @@ For Linux developers, the link [here](https://github.com/ROCm/hip-tests/blob/dev
 
 ## HIP Graph
 
-HIP graph is supported. For more details, refer to the HIP API Guide.
+HIP graphs are supported. For more details, refer to the [HIP API Guide](../doxygen/html/group___graph) or the [understand section for HIP graphs](../understand/hipgraph).
 
 ## Device-Side Malloc
 
diff --git a/docs/index.md b/docs/index.md
index 2558b73e68..411dac710f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -31,6 +31,7 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support
 
 * {doc}`./understand/programming_model`
 * {doc}`./understand/hardware_implementation`
+* {doc}`./understand/hipgraph`
 * {doc}`./understand/amd_clr`
 
 :::
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index 850fde34e1..6b99b3767f 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -17,6 +17,7 @@ subtrees:
   entries:
   - file: understand/programming_model
   - file: understand/hardware_implementation
+  - file: understand/hipgraph
   - file: understand/amd_clr
 
 - caption: How to
diff --git a/docs/understand/hipgraph.rst b/docs/understand/hipgraph.rst
new file mode 100644
index 0000000000..12b83f2749
--- /dev/null
+++ b/docs/understand/hipgraph.rst
@@ -0,0 +1,84 @@
+.. meta::
+   :description: This chapter provides an overview of the usage of HIP graphs.
+   :keywords: ROCm, HIP, graph, stream
+
+.. _understand_HIP_graph:
+
+********************************************************************************
+HIP graph
+********************************************************************************
+
+.. note::
+   The HIP graph API is currently in Beta. Some features may change and might
+   have outstanding issues. Not all features supported by CUDA graphs are
+   supported yet. For a list of all currently supported functions, see the
+   :doc:`HIP graph API documentation<../doxygen/html/group___graph>`.
+
+A HIP graph is made up of nodes and edges. The nodes of a HIP graph represent
+the operations performed, while the edges mark dependencies between those
+operations.
+
+A node can be any of the following:
+
+- empty nodes
+- nested graphs
+- kernel launches
+- host-side function calls
+- HIP memory functions (copy, memset, ...)
+- HIP events
+- signalling or waiting on external semaphores
+
+The following figure compares executing work with graphs and with streams.
+
+.. figure:: ../data/understand/hipgraph/hip_graph.svg
+   :alt: Diagram depicting the difference between using streams to execute
+         kernels with dependencies, resolved by explicitly calling
+         hipDeviceSynchronize, and using graphs, where the edges denote the
+         dependencies.
+
+HIP graph advantages
+================================================================================
+
+The standard way of launching work on GPUs via streams incurs a small overhead
+for each iteration of the operation involved. For kernels that perform large
+operations during an iteration, this overhead is usually negligible. However,
+in many workloads, such as scientific simulations and AI, a kernel performs a
+small operation for many iterations, so the overhead of launching kernels
+can be a significant performance cost.
+
+HIP graphs are designed to tackle this problem by requiring only one launch
+from the host per iteration, and by reducing the overhead of that launch,
+because most of the initialization is performed beforehand. Graphs can provide
+additional performance benefits by enabling optimizations that are only
+possible when the dependencies between the operations are known.
+
+.. figure:: ../data/understand/hipgraph/hip_graph_speedup.svg
+   :alt: Diagram depicting the speedup achievable with HIP graphs compared to
+         HIP streams when launching many short-running kernels.
+
+   Qualitative presentation of the execution time of many short-running kernels
+   when launched using HIP streams versus HIP graphs. This does not include the
+   time needed to set up the graph.
+
+HIP graph usage
+================================================================================
+
+Using HIP graphs to execute your work requires three steps, where the first
+two are the initial setup and only need to be executed once. The first step is
+the definition of the operations (nodes) and the dependencies (edges) between
+them. The second step is the instantiation of the graph, which validates and
+initializes the graph to reduce the overhead when executing it.
+
+The third step is the actual execution of the graph, which takes care of
+launching all the kernels and executing the operations while respecting their
+dependencies and necessary synchronizations as specified.
+
+As HIP graphs incur some setup and initialization overhead before their first
+execution, they only provide a benefit for workloads that run many iterations.
+
+Setting up HIP graphs
+================================================================================
+
+HIP graphs can be created by explicitly defining them, or using stream capture.
+For the available functions, see the
+:doc:`HIP graph API documentation<../doxygen/html/group___graph>`.
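
The three steps described under "HIP graph usage" (define, instantiate, launch) can be illustrated with a small host-only C++ model. This is a sketch under stated assumptions, not the HIP API: `ToyGraph`, `ToyGraphExec`, `instantiate` and `run_demo` are hypothetical names, the nodes are plain host callables rather than kernels, and the diamond-shaped dependency pattern (A before B and C, both before D) is chosen only for illustration. In HIP itself, the corresponding roles are played by functions such as `hipGraphCreate`, `hipGraphAddKernelNode`, `hipGraphInstantiate` and `hipGraphLaunch`; see the HIP graph API documentation for their signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-only model; none of these names belong to the HIP API.
struct ToyGraph {
    std::vector<std::function<void()>> nodes;                // operations
    std::vector<std::pair<std::size_t, std::size_t>> edges;  // (before, after)

    // Step 1a: add an operation and return its node index.
    std::size_t add_node(std::function<void()> op) {
        nodes.push_back(std::move(op));
        return nodes.size() - 1;
    }

    // Step 1b: declare that `before` must finish before `after` starts.
    void add_edge(std::size_t before, std::size_t after) {
        assert(before < nodes.size() && after < nodes.size());
        edges.emplace_back(before, after);
    }
};

// Result of step 2: a validated graph with a precomputed execution order.
struct ToyGraphExec {
    std::vector<std::function<void()>> ordered_ops;

    // Step 3: one call runs every operation in dependency order.
    void launch() const {
        for (const auto& op : ordered_ops) op();
    }
};

// Step 2: validate the graph (reject cycles) and resolve the dependencies
// once, using Kahn's topological sort, so that launch() does no analysis.
ToyGraphExec instantiate(const ToyGraph& g) {
    const std::size_t n = g.nodes.size();
    std::vector<std::vector<std::size_t>> successors(n);
    std::vector<std::size_t> pending(n, 0);  // unmet dependencies per node
    for (const auto& e : g.edges) {
        successors[e.first].push_back(e.second);
        ++pending[e.second];
    }
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i) {
        if (pending[i] == 0) ready.push(i);
    }
    ToyGraphExec exec;
    while (!ready.empty()) {
        const std::size_t i = ready.front();
        ready.pop();
        exec.ordered_ops.push_back(g.nodes[i]);
        for (std::size_t s : successors[i]) {
            if (--pending[s] == 0) ready.push(s);
        }
    }
    if (exec.ordered_ops.size() != n) {
        throw std::invalid_argument("graph contains a cycle");
    }
    return exec;
}

// Builds the diamond A -> {B, C} -> D, instantiates it once, then launches
// it `iterations` times. Returns the order in which the nodes ran.
std::string run_demo(int iterations) {
    std::string trace;
    ToyGraph g;
    const std::size_t a = g.add_node([&trace] { trace += 'A'; });
    const std::size_t b = g.add_node([&trace] { trace += 'B'; });
    const std::size_t c = g.add_node([&trace] { trace += 'C'; });
    const std::size_t d = g.add_node([&trace] { trace += 'D'; });
    g.add_edge(a, b);
    g.add_edge(a, c);
    g.add_edge(b, d);
    g.add_edge(c, d);
    const ToyGraphExec exec = instantiate(g);  // setup cost paid once
    for (int i = 0; i < iterations; ++i) exec.launch();
    return trace;
}
```

Calling `run_demo(3)` builds and instantiates the graph once and then launches it three times. The point of the split is that validation and dependency resolution happen in `instantiate`, outside the loop, which mirrors why graphs only pay off when the same work is repeated for many iterations.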