Triton Best Practices

Triton is a language and compiler for writing high-performance GPU kernels. It simplifies parallel GPU programming, so you can focus more on algorithm design and less on low-level details.

In this notebook, we guide you through the basic concepts of Triton: writing kernels, compiling them, and using them in PyTorch. At the end, we summarize some tips for optimizing kernels to improve training and inference performance.

Table of Contents

Why Triton?

Why is CUDA hard?

Compared with CUDA, what does Triton do?

Main benefits of Triton

What is Triton?

How does Triton work?

How to use Triton?

How to write a performance-optimized kernel?

How to integrate Triton with PyTorch?

Key Takeaways