Merge understand and how-to hip graphs
MKKnorr committed Sep 23, 2024
1 parent 8f539d7 commit a30a0b8
Showing 6 changed files with 74 additions and 100 deletions.
File renamed without changes.
File renamed without changes
80 changes: 74 additions & 6 deletions docs/how-to/hipgraph.rst
@@ -1,16 +1,72 @@
.. meta::
  :description: This chapter describes how to use HIP graphs and highlights their use cases.
  :keywords: ROCm, HIP, graph, stream

.. _how_to_HIP_graph:

********************************************************************************
HIP graphs
********************************************************************************

.. note::
  The HIP graph API is currently in Beta. Some features can change and might
  have outstanding issues. Not all features supported by CUDA graphs are
  supported yet. For a list of all currently supported functions, see the
  :doc:`HIP graph API documentation<../doxygen/html/group___graph>`.

HIP graphs are an alternative way of executing tasks on a GPU that can provide
performance benefits over the standard method of launching kernels on streams.
A HIP graph is made up of nodes and edges: the nodes represent the operations
performed, while the edges mark the dependencies between those operations.
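
As a CPU-only illustration of this structure, the following C++ sketch stores
operations as named nodes and dependencies as directed edges, and checks whether
a proposed execution order respects every edge. It does not use the HIP API;
``GraphModel``, ``addNode``, ``addEdge`` and ``respectsDependencies`` are
hypothetical names chosen for this example.

.. code-block:: cpp

  #include <cstddef>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical CPU-side model of a graph's structure; this is NOT the HIP API.
  struct GraphModel {
      std::vector<std::string> nodes;                          // operations
      std::vector<std::pair<std::size_t, std::size_t>> edges;  // (from, to): `to` depends on `from`

      std::size_t addNode(const std::string& name) {
          nodes.push_back(name);
          return nodes.size() - 1;
      }

      void addEdge(std::size_t from, std::size_t to) { edges.emplace_back(from, to); }

      // True if `order` lists every node exactly once and, for every edge,
      // the source node comes before the node that depends on it.
      bool respectsDependencies(const std::vector<std::size_t>& order) const {
          const std::size_t unset = nodes.size();
          if (order.size() != nodes.size()) return false;
          std::vector<std::size_t> position(nodes.size(), unset);
          for (std::size_t i = 0; i < order.size(); ++i) {
              const std::size_t node = order[i];
              // Reject unknown node indices and nodes listed twice.
              if (node >= nodes.size() || position[node] != unset) return false;
              position[node] = i;
          }
          for (const auto& edge : edges) {
              // .at() throws std::out_of_range for an edge to a nonexistent node.
              if (position.at(edge.first) >= position.at(edge.second)) return false;
          }
          return true;
      }
  };

For example, chaining a host-to-device copy, a kernel and a device-to-host copy
with two edges leaves exactly one valid order: copy, kernel, copy back.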

The nodes can be one of the following:

- empty nodes
- nested graphs
- kernel launches
- host-side function calls
- HIP memory functions (copy, memset, ...)
- HIP events
- signaling or waiting on external semaphores

.. note::
  The available node types are specified by ``hipGraphNodeType``.

The following figure contrasts executing dependent kernels with streams and with graphs.

.. figure:: ../data/understand/hipgraph/hip_graph.svg
  :alt: Diagram depicting the difference between using streams to execute
    kernels with dependencies, resolved by explicitly calling
    hipDeviceSynchronize, or using graphs, where the edges denote the
    dependencies.

The standard method of launching kernels incurs a small overhead for every
kernel launch. For kernels that perform a large amount of work per launch, this
overhead is usually negligible. However, in many workloads, such as scientific
simulations and AI, a kernel performs a small operation and is launched a great
number of times, so the accumulated launch overhead can have a significant
impact on performance.
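
A simplified cost model illustrates this effect. It is an assumption for
illustration, not a measurement: each iteration launches :math:`k` kernels, each
kernel runs for time :math:`t`, every stream launch adds an overhead :math:`o`,
and launches are not hidden behind the execution of earlier kernels. For
:math:`N` iterations, the total time and the fraction of it spent on launch
overhead are then approximately

.. math::

  T_{\text{stream}} \approx N k \, (o + t),
  \qquad
  \frac{N k \, o}{T_{\text{stream}}} \approx \frac{o}{o + t}.

The overhead fraction approaches one as the kernels become shorter
(:math:`t \to 0`), while the absolute overhead :math:`N k \, o` grows linearly
with the number of iterations.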

HIP graphs are designed to address this issue by predefining the HIP API calls
and their dependencies in a graph and performing most of the initialization
beforehand. Launching a graph requires only a single call, after which the
driver takes care of executing the operations within the graph.
Graphs can provide additional performance benefits by enabling optimizations
that are only possible when the dependencies between the operations are known
in advance.

.. figure:: ../data/understand/hipgraph/hip_graph_speedup.svg
  :alt: Diagram depicting the speedup achievable with HIP graphs compared to
    HIP streams when launching many short-running kernels.

  Qualitative presentation of the execution time of many short-running kernels
  when launched using HIP streams versus HIP graphs. This does not include the
  time needed to set up the graph.


********************************************************************************
Using HIP graphs
********************************************************************************

There are two different ways of creating graphs: capturing kernel launches from
a stream, or explicitly creating graphs. The difference between the two
@@ -23,6 +79,18 @@ The general flow for using HIP graphs includes the following steps.
#. Use ``hipGraphLaunch`` to launch the executable graph to a stream
#. After execution completes, free and destroy the graph resources

The first two steps are the initial setup and only need to be executed once.
The first step defines the operations (nodes) and the dependencies (edges)
between them. The second step instantiates the graph: it validates and
initializes the graph to reduce the overhead of executing it. The third step
executes the graph: it launches all the kernels and executes the operations
while respecting their specified dependencies and the necessary
synchronizations.

Because a HIP graph requires some setup and initialization before its first
execution, graphs only provide a benefit for workloads that run many
iterations, over which this one-time cost is amortized.
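
The break-even point can be estimated with a simple cost model. The following
C++ sketch assumes a one-time setup cost for defining and instantiating the
graph, a fixed launch overhead per kernel when using streams, and a fixed launch
overhead per iteration when using the graph. Kernel execution times are assumed
to be identical in both cases and therefore cancel out. The function
``breakEvenIterations`` and its parameters are hypothetical and not part of HIP.

.. code-block:: cpp

  #include <cmath>
  #include <cstdint>
  #include <optional>

  // Hypothetical cost model; all times use the same unit (e.g. microseconds).
  //   setup          one-time cost of defining and instantiating the graph
  //   kernels        number of kernels launched per iteration
  //   streamOverhead launch overhead per kernel when using streams
  //   graphOverhead  total launch overhead per iteration when using the graph
  // Returns the smallest iteration count for which the graph is faster, or
  // std::nullopt if the graph never pays off under this model.
  std::optional<std::uint64_t> breakEvenIterations(double setup, std::uint64_t kernels,
                                                   double streamOverhead,
                                                   double graphOverhead) {
      const double savedPerIteration =
          static_cast<double>(kernels) * streamOverhead - graphOverhead;
      if (savedPerIteration <= 0.0) return std::nullopt;
      // The graph wins once iterations * savedPerIteration > setup.
      return static_cast<std::uint64_t>(std::floor(setup / savedPerIteration)) + 1;
  }

With illustrative (not measured) values of 1000 microseconds of setup, 10
kernels per iteration, 5 microseconds per stream launch and 10 microseconds per
graph launch, each iteration saves 40 microseconds, so the function returns 26:
the setup cost is exactly recovered after 25 iterations, and the graph is faster
from 26 iterations on.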

In both methods, a graph is first defined as a ``hipGraph_t`` template. Before
the graph can be launched, this template must be instantiated with
``hipGraphInstantiate``, which produces an executable graph of type ``hipGraphExec_t``.
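
The split between a graph template and its executable instantiation can be
illustrated with a CPU-only C++ model. In this sketch, which does not use the
HIP API, ``GraphTemplate`` plays the role of ``hipGraph_t`` and ``GraphExec``
that of ``hipGraphExec_t``: ``instantiate`` validates the graph (rejecting
cycles) and computes a dependency-respecting order once, using Kahn's
topological sort, so that each ``launch`` only replays that order. All class and
member names are hypothetical.

.. code-block:: cpp

  #include <cstddef>
  #include <functional>
  #include <queue>
  #include <stdexcept>
  #include <utility>
  #include <vector>

  // Hypothetical CPU analogue of hipGraphExec_t: holds a precomputed order.
  class GraphExec {
  public:
      GraphExec(std::vector<std::function<void()>> ops, std::vector<std::size_t> order)
          : ops_(std::move(ops)), order_(std::move(order)) {}

      // Analogue of a graph launch: replays the order computed at instantiation.
      void launch() const {
          for (std::size_t i : order_) ops_[i]();
      }

  private:
      std::vector<std::function<void()>> ops_;
      std::vector<std::size_t> order_;
  };

  // Hypothetical CPU analogue of hipGraph_t: operations plus dependencies.
  class GraphTemplate {
  public:
      std::size_t addNode(std::function<void()> op) {
          ops_.push_back(std::move(op));
          successors_.emplace_back();
          return ops_.size() - 1;
      }

      // `to` may only run after `from` has completed.
      void addDependency(std::size_t from, std::size_t to) {
          successors_.at(from).push_back(to);
      }

      // Analogue of instantiation: validates the graph and computes the order once.
      GraphExec instantiate() const {
          std::vector<std::size_t> inDegree(ops_.size(), 0);
          for (const auto& succ : successors_)
              for (std::size_t to : succ) ++inDegree.at(to);

          std::queue<std::size_t> ready;  // nodes whose dependencies are all satisfied
          for (std::size_t i = 0; i < ops_.size(); ++i)
              if (inDegree[i] == 0) ready.push(i);

          std::vector<std::size_t> order;
          while (!ready.empty()) {
              const std::size_t node = ready.front();
              ready.pop();
              order.push_back(node);
              for (std::size_t to : successors_[node])
                  if (--inDegree[to] == 0) ready.push(to);
          }
          if (order.size() != ops_.size())
              throw std::invalid_argument("graph contains a cycle");
          return GraphExec(ops_, std::move(order));
      }

  private:
      std::vector<std::function<void()>> ops_;
      std::vector<std::vector<std::size_t>> successors_;
  };

Each ``launch`` call reuses the order computed in ``instantiate``, so the
validation and ordering cost is paid once rather than on every launch. Unlike a
HIP graph, whose operations execute on the GPU, this model always runs its nodes
one at a time on the host.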

@@ -41,7 +109,7 @@ memory on the device or copying memory between the host and the device.
Whether you want to preallocate the memory or manage it within the graph
depends on the use case. If the graph is executed in a tight loop, performance
is usually better when the memory is preallocated, so that it does not need to
be reallocated in every iteration.

The same rules as for normal memory allocations apply to memory allocated and
freed by nodes, meaning that the nodes that access memory allocated in a graph
94 changes: 0 additions & 94 deletions docs/understand/hipgraph.rst

This file was deleted.
