GPU Programming page (#325)

* New version of the GPU programming section.
* Added abstractions.
* Added OneProf.
* Added GPU programming with Julia.
NLeSC · Aug 8, 2024 · ac43572
1 parent 6edf262
commit ac43572

Showing 2 changed files with 125 additions and 51 deletions.
diff --git a/_sidebar.md b/_sidebar.md
@@ -11,7 +11,7 @@
       * [Bash](/best_practices/language_guides/bash.md)
       * [JavaScript and TypeScript](/best_practices/language_guides/javascript.md)
       * [Python](/best_practices/language_guides/python.md)
-      * [OpenCL and CUDA](/best_practices/language_guides/opencl_cuda.md)
+      * [GPU](/best_practices/language_guides/opencl_cuda.md)
       * [R](/best_practices/language_guides/r.md)
       * [C and C++](/best_practices/language_guides/ccpp.md)
       * [Fortran](/best_practices/language_guides/fortran.md)

diff --git a/best_practices/language_guides/opencl_cuda.md b/best_practices/language_guides/opencl_cuda.md
@@ -1,69 +1,143 @@
-# OpenCL & CUDA
+# GPU Programming Languages
 
 *Page maintainer: Alessio Sclocco* [@isazi](https://github.com/isazi)
 
 
-## Sources for learning
- *please add university courses and informative videos*
- * Parallel Reduction [[Slides](http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/doc/reduction.pdf)]
- * GPU Memory bootcamp - Tony Scudiero [[git repo](https://github.com/tscudiero/MemBootcamp)]
-   * Best Practices [[Slides](https://github.com/tscudiero/MemBootcamp/blob/master/Slides/S5353_Scudiero_Bootcamp1.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2015/video/S5353.html)]
-   * Beyond the Best Practices [[Slides](https://github.com/tscudiero/MemBootcamp/blob/master/Slides/S5376-Scudiero_Bootcamp2.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2015/video/S5376.html)]
-   * Collaborative Access Patterns [[Slides](https://github.com/tscudiero/MemBootcamp/blob/master/Slides/S6181-Scudiero_Bootcamp3.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2016/video/s6181-tony-scudiero-bootcamp-3.mp4)]
- * CUB: CUDA Collective primitives library [[Git](https://github.com/NVlabs/cub)] [[Slides](http://on-demand.gputechconf.com/gtc/2015/presentation/S5617-Duane-Merrill.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2015/video/S5617.html)]
- * Best Practices Guide by PRACE [[HTML](https://prace-ri.eu/training-support/best-practice-guides/best-practice-guide-gpgpu/)] [[PDF](https://prace-ri.eu/wp-content/uploads/Best-Practice-Guide_GPGPU.pdf)]
-
+## Learning Resources
+
+* Carpentries GPU Programming course
+  * [Lesson material](https://carpentries-incubator.github.io/lesson-gpu-programming/)
+* Introduction to CUDA C
+  * [Slides](http://developer.download.nvidia.com/compute/developertrainingmaterials/presentations/cuda_language/Introduction_to_CUDA_C.pptx)
+  * [Video](http://on-demand.gputechconf.com/gtc/2012/video/S0624-Monday-Introduction-to-CUDA-C.mp4)
+* Introduction to OpenACC
+  * [Slides](http://developer.download.nvidia.com/compute/developertrainingmaterials/presentations/openacc/Introduction_To_OpenACC.pptx)
+* Introduction to HIP Programming
+  * [Video](https://www.youtube.com/watch?v=3ejUwypP0bI)
+* SYCL Introduction and Best Practices
+  * [Video](https://www.youtube.com/watch?v=TbkrODiVDQY)
+* CSCS GPU Programming with Julia
+  * [Course recordings](https://github.com/omlins/julia-gpu-course)
+
 ## Documentation
-* OpenCL specification [[1.2](https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/)] [[2.0](https://www.khronos.org/registry/cl/sdk/2.0/docs/man/xhtml/)]
-* CUDA Toolkit [[latest](http://docs.nvidia.com/cuda/index.html)]
-   * [CUDA Programming Guide](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
-   * [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
-
-
-## Source-to-source translation between CUDA and OpenCL
-* vtsynergy (https://github.com/vtsynergy)
-   * This was shown to work on DAS5 after copying /usr/include/limits.h to $PWD and commenting out the lines around # include_next (122-125) :
-   "cu2cl-tool host_code.cc device_code.cu  -- -DGPU_ON -I$PWD:/usr/include -I/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include".
-* cutocl (https://github.com/benvanwerkhoven/cutocl)
-
 
-## Overview of libraries
-* OpenCL-based libraries
-  * [CLBlast](https://github.com/CNugteren/CLBlast)
-  * [clFFT](https://github.com/clMathLibraries/clFFT)
-* CUDA-based libraries
+* CUDA
+  * [C programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
+  * [Runtime API](https://docs.nvidia.com/cuda/cuda-runtime-api/)
+  * [Driver API](https://docs.nvidia.com/cuda/cuda-driver-api/index.html)
+  * [Fortran programming guide](https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html)
+* HIP
+  * [Kernel language syntax](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/kernel_language.html)
+  * [Runtime API](https://rocm.docs.amd.com/projects/HIP/en/latest/.doxygen/docBin/html/modules.html)
+* SYCL
+  * [Specification](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html)
+  * [Reference guide](https://www.khronos.org/files/sycl/sycl-2020-reference-guide.pdf)
+* OpenCL
+  * [Guide](https://github.com/KhronosGroup/OpenCL-Guide)
+  * [API](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html)
+  * [OpenCL C specification](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html)
+  * [Reference guide](https://www.khronos.org/files/opencl30-reference-guide.pdf)
+* OpenACC
+  * [Programming guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC_Programming_Guide_0_0.pdf)
+  * [Reference guide](https://www.openacc.org/sites/default/files/inline-files/API%20Guide%202.7.pdf)
+* OpenMP
+  * [Reference guide](https://www.openmp.org/wp-content/uploads/OpenMPRef-5.0-111802-web.pdf)
+
+## Overview of Libraries
+
+* CUDA
   * [cuBLAS](http://docs.nvidia.com/cuda/cublas/index.html)
   * [NVBLAS](http://docs.nvidia.com/cuda/nvblas/index.html)
   * [cuFFT](http://docs.nvidia.com/cuda/cufft/index.html)
   * [cuGRAPH](https://docs.rapids.ai/api/cugraph/stable/)
   * [cuRAND](http://docs.nvidia.com/cuda/curand/index.html)
   * [cuSPARSE](http://docs.nvidia.com/cuda/cusparse/index.html)
+* HIP
+  * [hipBLAS](https://rocm.docs.amd.com/projects/hipBLAS/en/latest/index.html)
+  * [hipFFT](https://rocm.docs.amd.com/projects/hipFFT/en/latest/index.html)
+  * [hipRAND](https://rocm.docs.amd.com/projects/hipRAND/en/latest/index.html)
+  * [hipSPARSE](https://rocm.docs.amd.com/projects/hipSPARSE/en/latest/index.html)
+* SYCL
+  * [oneAPI BLAS](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-0/blas-routines.html#GUID-F277361F-49BA-437B-8100-3D2B6BBC3CC1)
+  * [oneAPI FFT](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-0/fourier-transform-functions.html#GUID-BB8891E9-D3D3-40B9-BBB1-2390C4340CDA)
+  * [oneAPI sparse](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-0/sparse-blas-routines.html#GUID-7123D31B-9C7F-4BA7-9792-02A417293E60)
+  * [oneAPI random number generators](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-0/random-number-generators.html#GUID-FFC80D12-C323-4A9F-83E3-D0ACDB686876)
+* OpenCL
+  * [CLBlast](https://github.com/CNugteren/CLBlast)
+  * [clFFT](https://github.com/clMathLibraries/clFFT)
 
+## Source-to-source Translation
+
+* CUDA to HIP
+  * [hipify](https://github.com/ROCm/HIPIFY)
+* CUDA to SYCL
+  * [SYCLomatic](https://github.com/oneapi-src/SYCLomatic)
+* CUDA to OpenCL
+  * [cutocl](https://github.com/benvanwerkhoven/cutocl)
+
+## Foreign Function Interfaces
+
+* C++
+  * CUDA
+    * [cudawrappers](https://github.com/nlesc-recruit/cudawrappers)
+  * OpenCL
+    * [CLHPP](https://github.com/KhronosGroup/OpenCL-CLHPP)
+* Python
+  * CUDA
+    * [PyCuda](https://mathema.tician.de/software/pycuda/)
+    * [CuPy](https://cupy.dev/)
+    * [cuda-python](https://nvidia.github.io/cuda-python/)
+  * HIP
+    * [PyHIP](https://github.com/jatinx/PyHIP)
+  * SYCL
+    * [dpctl](https://github.com/IntelPython/dpctl)
+  * OpenCL
+    * [PyOpenCL](https://mathema.tician.de/software/pyopencl/)
+* Julia
+  * CUDA
+    * [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)
+  * HIP
+    * [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl)
+  * SYCL
+    * [oneAPI.jl](https://github.com/JuliaGPU/oneAPI.jl)
+* Java
+  * CUDA
+    * [JCuda](http://www.jcuda.org/)
+  * OpenCL
+    * [JOCL](http://www.jocl.org/)
+
+## High-Level Abstractions
+
+* C++
+  * [Kokkos](https://github.com/kokkos/kokkos)
+  * [RAJA](https://github.com/LLNL/RAJA)
+* Python
+  * [Numba](https://numba.pydata.org/)
+  * [pykokkos](https://github.com/kokkos/pykokkos)
 
-## Foreign Function Interfaces for CUDA and OpenCL
- * C++: [[Cuda](https://github.com/eyalroz/cuda-api-wrappers/)], [[OpenCL](https://github.com/KhronosGroup/OpenCL-CLHPP)]
- * Python: [[PyCuda](https://mathema.tician.de/software/pycuda/)], [[PyOpenCL](https://mathema.tician.de/software/pycuda/)]
- * Java: [[JCuda](http://www.jcuda.org/)], [[JOCL](http://www.jocl.org/)]
-
+## Debugging and Profiling Tools
 
-## Testing
- * Unit Testing
-    * Example of a unit test for CUDA kernel using the [Kernel Tuner](https://github.com/benvanwerkhoven/kernel_tuner/blob/master/examples/cuda/test_vector_add.py)
- * [comparing floating-point results](http://docs.nvidia.com/cuda/floating-point/index.html)
+* CUDA
+  * [Nsight Systems](https://developer.nvidia.com/nsight-systems)
+  * [Nsight Compute](https://developer.nvidia.com/nsight-compute)
+  * [CUDA-GDB](http://docs.nvidia.com/cuda/cuda-gdb/index.html)
+  * [compute-sanitizer](https://docs.nvidia.com/compute-sanitizer/index.html)
+* HIP
+  * [omniperf](https://github.com/AMDResearch/omniperf)
+  * [rocprof](https://github.com/ROCm/rocprofiler)
+* SYCL
+  * [oneprof](https://github.com/intel/pti-gpu/tree/master/tools/oneprof)
+  * [onetrace](https://github.com/intel/pti-gpu/tree/master/tools/onetrace)
 
+## Performance Optimization
 
-## Debugging and Profiling Tools
- * [Nvidia Visual Profiler](https://developer.nvidia.com/nvidia-visual-profiler) [[User Guide](http://docs.nvidia.com/cuda/profiler-users-guide)]
- * [CUDA-GDB](http://docs.nvidia.com/cuda/cuda-gdb/index.html)
- * [compute-sanitizer](https://docs.nvidia.com/cuda/compute-sanitizer/index.html)
+* [PRACE best practice guide on modern accelerators](https://zenodo.org/records/5839488)
+* [CUDA best practices](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
+* [oneAPI SYCL best practices](https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/2023-2/optimize-your-sycl-applications.html)
 
+## Auto-tuning
 
-## Performance Optimization
- * Resources:
-     * Better Performance at Lower Occupancy [[Slides](http://www.nvidia.com/content/gtc-2010/pdfs/2238_gtc2010.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2010/video/S12238-Better-Performance-at-Lower-Occupancy.mp4)]
-     * [Maxwell Tuning Guide](http://docs.nvidia.com/cuda/maxwell-tuning-guide)
-     * [Pascal Tuning Guide](http://docs.nvidia.com/cuda/pascal-tuning-guide)
-
- * Generic Auto Tuners:
-     * [Kernel Tuner](https://github.com/benvanwerkhoven/kernel_tuner) (Python)
-     * [CLTune](https://github.com/CNugteren/CLTune) (C++)
+* Kernel Tuner
+  * [GitHub repository](https://github.com/KernelTuner/kernel_tuner)
+  * [Documentation](https://kerneltuner.github.io/kernel_tuner/stable/)
+  * [Tutorial](https://github.com/KernelTuner/kernel_tuner_tutorial)