
Commit fe2db17

Merge branch 'cuda-python-dli' into main
2 parents 7ef3513 + 5fabd2e commit fe2db17

49 files changed: +1128778 −0 lines changed

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "11150e25-024f-46cb-8ec5-992864291646",
   "metadata": {},
   "source": [
    "<img src=\"images/cuda-python.jpg\" style=\"float: right;\" />\n",
    "\n",
    "# CUDA Python Kernel Authoring\n",
    "\n",
    "_Written by [Katrina Riehl](https://www.linkedin.com/in/katrinariehl)_\n",
    "\n",
    "**Welcome to the CUDA Python Kernel Authoring tutorial.**\n",
    "\n",
    "In this tutorial we will cover:\n",
    "- What is a GPU and how does it differ from a CPU?\n",
    "- An overview of the CUDA development model.\n",
    "- The CUDA Python ecosystem.\n",
    "- Working with NumPy / CuPy style arrays on the GPU.\n",
    "- Writing CUDA kernels using cuda.core.\n",
    "- Using the cuda.cccl.parallel library for algorithmic support.\n",
    "\n",
    "Attendees are expected to have a general knowledge of Python and programming concepts, as well as a basic understanding of GPU computing."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cee88d4-a78d-4013-be48-d30834a9becd",
   "metadata": {},
   "source": [
    "### Outline\n",
    "\n",
    "- **0.0** - Overview of the CUDA Python ecosystem\n",
    "- **1.0** - cuda.core\n",
    "- **1.1** - cuda.core exercise solutions\n",
    "- **2.0** - cuda.cccl.parallel\n",
    "- **2.1** - cuda.cccl.parallel exercise solutions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a1bc4fc-755b-4d79-af13-54787d27a19d",
   "metadata": {},
   "source": [
    "# CPU and GPU Comparison"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc72f899-45b3-4951-a63f-f729b21a3d67",
   "metadata": {},
   "source": [
    "The CPU is the most common type of processor for executing your code. CPUs have one or more serial processors, each of which takes single instructions from a stack and **executes them sequentially**.\n",
    "\n",
    "GPUs are a form of coprocessor commonly used for video and image rendering, but they are also extremely popular in the machine learning and data science fields. GPUs have one or more streaming multiprocessors which take in arrays of instructions and **execute them in parallel**."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "345434c3-0798-4a75-906a-b21670de786c",
   "metadata": {},
   "source": [
    "<figure>\n",
    "\n",
    "![CPU GPU Comparison](images/cpu-gpu.png)\n",
    "\n",
    "<figcaption style=\"text-align: center;\">\n",
    "\n",
    "Image source <a href=\"https://docs.nvidia.com/cuda/cuda-c-programming-guide/\">https://docs.nvidia.com/cuda/cuda-c-programming-guide/</a>\n",
    "\n",
    "</figcaption>\n",
    "</figure>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1963293-c6ce-4d1c-becc-5861743cd3ec",
   "metadata": {},
   "source": [
    "## What is a kernel?\n",
    "\n",
    "A kernel is similar to a function: it is a block of code which takes some inputs and is executed by a processor.\n",
    "\n",
    "The differences between a function and a kernel are:\n",
    "- A kernel cannot return anything; it must instead modify memory.\n",
    "- A kernel must specify its thread hierarchy (threads and blocks)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "922c26aa-73b0-4436-86db-004e2d8e16bc",
   "metadata": {},
   "source": [
    "## What are grids, threads and blocks (and warps)?\n",
    "\n",
    "[Threads and blocks](https://en.wikipedia.org/wiki/Thread_block_(CUDA_programming)) are how you instruct your GPU to process some code in parallel. Our GPU is a parallel processor, so we need to specify how many times we want our kernel to be executed.\n",
    "\n",
    "Threads have the benefit of sharing some cache memory between them, but there are a limited number of cores on each GPU, so we need to break our work down into blocks which will be scheduled and run in parallel on the GPU.\n",
    "\n",
    "<figure>\n",
    "\n",
    "![Threads, blocks and warps](images/threads-blocks-warps.png)\n",
    "\n",
    "<figcaption style=\"text-align: center;\">\n",
    "\n",
    "Image source <a href=\"https://docs.nvidia.com/cuda/cuda-c-programming-guide/\">https://docs.nvidia.com/cuda/cuda-c-programming-guide/</a>\n",
    "\n",
    "</figcaption>\n",
    "</figure>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b688c518-3545-4c0c-ae90-a7aa4bf40690",
   "metadata": {},
   "source": [
    "## So how do you control the GPU?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c89b6124-a7cc-4d92-ba6d-43d3bd7cdd55",
   "metadata": {},
   "source": [
    "Executing code on your GPU feels a lot like executing code on a second computer over a network.\n",
    "\n",
    "If I wanted to send a Python program to another machine to be executed, I would need a few things:\n",
    "- A way to copy data and code to the remote machine (SCP, SFTP, SMB, NFS, etc.)\n",
    "- A way to log in and execute programs on that remote machine (SSH, VNC, Remote Desktop, etc.)\n",
    "\n",
    "![Two computers connected over a network](images/two-computers-network.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c89e534a-5f6e-4562-a90b-10624dd6afea",
   "metadata": {},
   "source": [
    "To achieve the same things with the GPU, we use CUDA over PCIe. But the idea is still the same &mdash; we need to move data and code to the device and execute that code.\n",
    "\n",
    "![A computer controlling a GPU via CUDA](images/computer-gpu-cuda.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8385d5e-5a11-4d30-b574-418f69fbfbf9",
   "metadata": {},
   "source": [
    "## What is CUDA?\n",
    "\n",
    "[CUDA](https://developer.nvidia.com/cuda-zone) (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA that allows developers to use NVIDIA GPUs for general-purpose processing, not just graphics. It enables programmers to harness the massive parallelism of GPUs to significantly accelerate compute-intensive applications in fields like artificial intelligence, scientific simulations, and data analysis.\n",
    "\n",
    "## What is CUDA Python?\n",
    "\n",
    "[CUDA Python](https://nvidia.github.io/cuda-python/latest/) is the home for accessing NVIDIA’s CUDA platform from Python. It consists of multiple components:\n",
    "\n",
    "- **cuda.core**: Pythonic access to the CUDA runtime and other core functionalities\n",
    "- **cuda.bindings**: Low-level Python bindings to the CUDA C APIs\n",
    "- **cuda.pathfinder**: Utilities for locating CUDA components installed in the user’s Python environment\n",
    "- **cuda.cccl.cooperative**: A Python module providing CCCL’s reusable block-wide and warp-wide device primitives for use within Numba CUDA kernels\n",
    "- **cuda.cccl.parallel**: A Python module for easy access to CCCL’s highly efficient and customizable parallel algorithms, like sort, scan, reduce, and transform, that are callable on the host\n",
    "- **numba.cuda**: Numba’s CUDA target, which directly compiles a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model\n",
    "- **nvmath-python**: Pythonic access to NVIDIA CPU & GPU math libraries, with both host and device (through nvmath.device) APIs. It also provides low-level Python bindings to the host C APIs (through nvmath.bindings).\n",
    "\n",
    "In this tutorial, we will focus on cuda.core and cuda.cccl.parallel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07174e57-4015-4844-97be-c6d6909ccd85",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
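The notebook above explains that a kernel writes to memory instead of returning a value, and that the work must be broken into blocks of threads. The execution model can be sketched in plain Python (no GPU required). The names `vector_add_kernel` and `launch` are illustrative, not part of any CUDA API, and the sequential loops stand in for what a GPU would run in parallel:

```python
import math

def vector_add_kernel(thread_idx, block_idx, block_dim, a, b, out):
    # Global index: which element this "thread" owns.
    i = block_idx * block_dim + thread_idx
    # Guard: the grid may cover more threads than there are elements.
    if i < len(out):
        out[i] = a[i] + b[i]  # kernels modify memory rather than return

def launch(kernel, n, threads_per_block, *args):
    # Ceil-divide so every element is covered by at least one thread.
    blocks = math.ceil(n / threads_per_block)
    for block_idx in range(blocks):               # on a GPU these run in parallel
        for thread_idx in range(threads_per_block):
            kernel(thread_idx, block_idx, threads_per_block, *args)
    return blocks

a = list(range(10))
b = [10] * 10
out = [0] * 10
blocks = launch(vector_add_kernel, len(out), 4, a, b, out)
print(blocks, out)
```

With 10 elements and 4 threads per block, 3 blocks are launched (12 threads total), and the guard keeps the two surplus threads from writing out of bounds.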
Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ff189fc2-802c-42d7-a299-8b8224f0d025",
   "metadata": {},
   "source": [
    "# CPU and GPU Comparison"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "216bd640-6104-45e8-a7c6-09ce9af20f1c",
   "metadata": {},
   "source": [
    "The CPU is the most common type of processor for executing your code. CPUs have one or more serial processors, each of which takes single instructions from a stack and **executes them sequentially**.\n",
    "\n",
    "GPUs are a form of coprocessor commonly used for video and image rendering, but they are also extremely popular in the machine learning and data science fields. GPUs have one or more streaming multiprocessors which take in arrays of instructions and **execute them in parallel**."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afdb98ac-89dc-44a6-a3fc-bd5a8374daf7",
   "metadata": {},
   "source": [
    "<figure>\n",
    "\n",
    "![CPU GPU Comparison](images/cpu-gpu.png)\n",
    "\n",
    "<figcaption style=\"text-align: center;\">\n",
    "\n",
    "Image source <a href=\"https://docs.nvidia.com/cuda/cuda-c-programming-guide/\">https://docs.nvidia.com/cuda/cuda-c-programming-guide/</a>\n",
    "\n",
    "</figcaption>\n",
    "</figure>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85278019-6617-43dd-9928-9571a683ea6a",
   "metadata": {},
   "source": [
    "## Mythbusters explanation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb6bad2c-63ee-4c28-a42f-9323b1e689d2",
   "metadata": {},
   "source": [
    "This video may help explain the concept visually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85961642-b01c-4ce2-9872-31c3f2186072",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from IPython.display import YouTubeVideo\n",
    "YouTubeVideo(id='-P28LKWTzrI', width=1000, height=600)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbf0ab63-065c-40e9-9d16-2f51ea192bfc",
   "metadata": {},
   "source": [
    "## So how do you control the GPU?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a63ebb5e-3a44-43d6-bab4-22bd98ea523e",
   "metadata": {},
   "source": [
    "Executing code on your GPU feels a lot like executing code on a second computer over a network.\n",
    "\n",
    "If I wanted to send a Python program to another machine to be executed, I would need a few things:\n",
    "- A way to copy data and code to the remote machine (SCP, SFTP, SMB, NFS, etc.)\n",
    "- A way to log in and execute programs on that remote machine (SSH, VNC, Remote Desktop, etc.)\n",
    "\n",
    "![Two computers connected over a network](images/two-computers-network.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a302560a-a0bc-4dc9-914c-340817c4b85d",
   "metadata": {},
   "source": [
    "To achieve the same things with the GPU, we use CUDA over PCIe. But the idea is still the same &mdash; we need to move data and code to the device and execute that code.\n",
    "\n",
    "![A computer controlling a GPU via CUDA](images/computer-gpu-cuda.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "888d4144-13bc-487a-a4e1-45995e4b54bc",
   "metadata": {},
   "source": [
    "## What is CUDA?\n",
    "\n",
    "[CUDA](https://developer.nvidia.com/cuda-zone) is an extension to C++ which allows us to compile GPU code and interact with the GPU.\n",
    "\n",
    "### But I write Python, not C++!\n",
    "\n",
    "Over the last few years NVIDIA has invested in bringing CUDA functionality to Python.\n",
    "\n",
    "Today there are packages like [Numba](https://numba.pydata.org/), which lets us Just-In-Time (JIT) compile Python code into something that is compatible with CUDA and provides bindings to transfer data and execute that code.\n",
    "\n",
    "There are also many high-level packages such as [CuPy](https://cupy.dev/), [cuDF](https://github.com/rapidsai/cudf), [cuML](https://github.com/rapidsai/cuml), [cuGraph](https://github.com/rapidsai/cugraph), and more, which implement functionality in CUDA C++ and then package it with Python bindings so that it can be used directly from Python. These packages are collectively known as [RAPIDS](https://rapids.ai/).\n",
    "\n",
    "Lastly, there is also the [CUDA Python](https://developer.nvidia.com/cuda-python) library, which provides Cython/Python wrappers for the CUDA driver and runtime APIs.\n",
    "\n",
    "This tutorial will focus on Numba and RAPIDS."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
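The second notebook above points to CuPy as a high-level package whose arrays follow the NumPy API. A minimal sketch of what that drop-in compatibility buys you, written against NumPy on the CPU since no GPU is assumed here (on a GPU machine one would typically bind `xp` to `cupy` instead; `normalize` is an illustrative name, not a library function):

```python
import numpy as np

# On a GPU-equipped machine this line would typically become:
#   import cupy as xp
# and the rest of the code would stay the same, with arrays living on the device.
xp = np

def normalize(v):
    # Array-API-style code: works for NumPy arrays, and CuPy aims to
    # accept the same calls on GPU arrays.
    return v / xp.linalg.norm(v)

v = xp.array([3.0, 4.0])
result = normalize(v)
print(result)
```

Writing against a module alias like `xp` is a common pattern for keeping one code path that runs on either backend.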
