Triton is a Python-based language and compiler for writing high-performance GPU kernels. It simplifies writing parallel code for GPUs: with Triton, you spend more time on algorithm design and less on low-level details such as per-thread indexing and manual memory management.
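To make "less low-level programming" concrete, here is a CPU-only sketch of the block-level model Triton kernels are written in. It is a pure-Python emulation, not the Triton API: the names `add_kernel_program`, `launch_add`, and the block size of 4 are hypothetical. Each comment points to the Triton construct it imitates (`tl.program_id`, `tl.arange`, masking, `triton.cdiv`). The key idea is that one program instance handles a whole block of elements, and a mask guards the last, partially filled block, instead of you computing an index for every individual thread as in CUDA.

```python
# Hypothetical pure-Python emulation of Triton's block-level programming model.
# This is NOT Triton code; it only mirrors the structure of a Triton vector-add kernel.

BLOCK_SIZE = 4  # elements handled by one program instance (assumed value)


def add_kernel_program(pid, x, y, out, n, block_size=BLOCK_SIZE):
    # Like tl.program_id(0): each program instance owns one contiguous block.
    block_start = pid * block_size
    # Like tl.arange(0, BLOCK_SIZE): a vector of offsets rather than one thread index.
    offsets = [block_start + i for i in range(block_size)]
    # Like a Triton mask: keep the tail block from reading or writing out of bounds.
    mask = [off < n for off in offsets]
    for off, valid in zip(offsets, mask):
        if valid:
            out[off] = x[off] + y[off]


def launch_add(x, y, block_size=BLOCK_SIZE):
    n = len(x)
    out = [0] * n
    # Grid size is ceil(n / block_size), which Triton code usually computes with triton.cdiv.
    num_programs = (n + block_size - 1) // block_size
    # On a GPU these program instances would run in parallel; here they run sequentially.
    for pid in range(num_programs):
        add_kernel_program(pid, x, y, out, n, block_size)
    return out


print(launch_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # [11, 22, 33, 44, 55]
```

With 5 elements and a block size of 4, two program instances are launched. The second one covers offsets 4 to 7, and its mask disables the three out-of-range lanes. In a real Triton kernel, the same mask would be passed to `tl.load` and `tl.store`.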
In this notebook, we will walk you through the basic concepts of Triton: writing kernels, compiling them, and calling them from PyTorch. Finally, we will summarize some tips for optimizing kernels to improve training and inference performance.
Why is CUDA hard?
What does Triton do differently from CUDA?
Main benefits of Triton: