diff --git a/.wordlist.txt b/.wordlist.txt
index 87d1579669..d2c8d7f28c 100644
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -44,6 +44,7 @@ hardcoded
HC
HIP's
hipcc
+hipDeviceSynchronize
hipexamine
hipified
hipother
@@ -72,6 +73,7 @@ ltrace
makefile
Malloc
malloc
+memset
multicore
multigrid
multithreading
diff --git a/docs/data/understand/hipgraph/hip_graph.drawio b/docs/data/understand/hipgraph/hip_graph.drawio
new file mode 100644
index 0000000000..03569ac734
--- /dev/null
+++ b/docs/data/understand/hipgraph/hip_graph.drawio
@@ -0,0 +1,76 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/docs/data/understand/hipgraph/hip_graph.svg b/docs/data/understand/hipgraph/hip_graph.svg
new file mode 100644
index 0000000000..6eed6b92e5
--- /dev/null
+++ b/docs/data/understand/hipgraph/hip_graph.svg
@@ -0,0 +1,4 @@
+
+
+
+
\ No newline at end of file
diff --git a/docs/data/understand/hipgraph/hip_graph_speedup.drawio b/docs/data/understand/hipgraph/hip_graph_speedup.drawio
new file mode 100644
index 0000000000..95d02e1290
--- /dev/null
+++ b/docs/data/understand/hipgraph/hip_graph_speedup.drawio
@@ -0,0 +1,162 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/docs/data/understand/hipgraph/hip_graph_speedup.svg b/docs/data/understand/hipgraph/hip_graph_speedup.svg
new file mode 100644
index 0000000000..13b6a3323b
--- /dev/null
+++ b/docs/data/understand/hipgraph/hip_graph_speedup.svg
@@ -0,0 +1,4 @@
+
+
+
+
\ No newline at end of file
diff --git a/docs/how-to/programming_manual.md b/docs/how-to/programming_manual.md
index 33ab58de93..22847adaf9 100644
--- a/docs/how-to/programming_manual.md
+++ b/docs/how-to/programming_manual.md
@@ -146,7 +146,7 @@ For Linux developers, the link [here](https://github.com/ROCm/hip-tests/blob/dev
## HIP Graph
-HIP graph is supported. For more details, refer to the HIP API Guide.
+HIP graphs are supported. For more details, refer to the [HIP API Guide](../doxygen/html/group___graph) or the [understand section for HIP graphs](../understand/hipgraph).
## Device-Side Malloc
diff --git a/docs/index.md b/docs/index.md
index 2558b73e68..411dac710f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -31,6 +31,7 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support
* {doc}`./understand/programming_model`
* {doc}`./understand/hardware_implementation`
+* {doc}`./understand/hipgraph`
* {doc}`./understand/amd_clr`
:::
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index 850fde34e1..6b99b3767f 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -17,6 +17,7 @@ subtrees:
entries:
- file: understand/programming_model
- file: understand/hardware_implementation
+ - file: understand/hipgraph
- file: understand/amd_clr
- caption: How to
diff --git a/docs/understand/hipgraph.rst b/docs/understand/hipgraph.rst
new file mode 100644
index 0000000000..12b83f2749
--- /dev/null
+++ b/docs/understand/hipgraph.rst
@@ -0,0 +1,84 @@
+.. meta::
+ :description: This chapter provides an overview of how to use HIP graphs.
+ :keywords: ROCm, HIP, graph, stream
+
+.. _understand_HIP_graph:
+
+********************************************************************************
+HIP graph
+********************************************************************************
+
+.. note::
+ The HIP graph API is currently in Beta. Some features might change or have
+ outstanding issues, and not all features of CUDA graphs are supported yet.
+ For a list of all currently supported functions, see the
+ :doc:`HIP graph API documentation<../doxygen/html/group___graph>`.
+
+A HIP graph is made up of nodes and edges. The nodes of a HIP graph represent
+the operations performed, while the edges mark dependencies between those
+operations.
+
+The nodes can consist of:
+
+- empty nodes
+- nested graphs
+- kernel launches
+- host-side function calls
+- HIP memory functions (copy, memset, ...)
+- HIP events
+- signalling or waiting on external semaphores
+
+The following figure visualizes the concept of graphs, compared to using streams.
+
+.. figure:: ../data/understand/hipgraph/hip_graph.svg
+ :alt: Diagram depicting the difference between using streams to execute
+ kernels with dependencies, resolved by explicitly calling
+ hipDeviceSynchronize, or using graphs, where the edges denote the
+ dependencies.
+
+HIP graph advantages
+================================================================================
+
+The standard way of launching work on GPUs via streams incurs a small overhead
+for each iteration of the operation involved. For kernels that perform large
+operations during an iteration, this overhead is usually negligible. However,
+in many workloads, such as scientific simulations and AI, a kernel performs a
+small operation for many iterations, so the overhead of launching kernels
+can become a significant performance cost.
+
+HIP graphs are designed to tackle this problem by requiring only one launch
+from the host per iteration, and by minimizing the overhead of that launch by
+performing most of the initialization beforehand. Graphs can provide
+additional performance benefits by enabling optimizations that are only
+possible when the dependencies between the operations are known.
+
+.. figure:: ../data/understand/hipgraph/hip_graph_speedup.svg
+ :alt: Diagram depicting the speed up achievable with HIP graphs compared to
+ HIP streams when launching many short-running kernels.
+
+ Qualitative presentation of the execution time of many short-running kernels
+ when launched using HIP streams versus HIP graphs. This does not include the
+ time needed to set up the graph.
+
+HIP graph usage
+================================================================================
+
+Using HIP graphs to execute your work requires three steps, of which the first
+two form the initial setup and only need to be executed once. The first step is
+the definition of the operations (nodes) and the dependencies (edges) between
+them. The second step is the instantiation of the graph, which validates and
+initializes the graph to reduce the overhead when executing it.
+
+The third step is the actual execution of the graph, which then takes care of
+launching all the kernels and executing the operations while respecting their
+dependencies and necessary synchronizations as specified.
+
+Because HIP graphs incur setup and initialization overhead before their first
+execution, they only provide a benefit for workloads that run many iterations.
+
+Setting up HIP graphs
+================================================================================
+
+HIP graphs can be created by explicitly defining them or by using stream capture.
+For the available functions, see the
+:doc:`HIP graph API documentation<../doxygen/html/group___graph>`.