I started working on this yesterday; here are some notes for posterity:
Adding Triton as a JIT compiler is not a good fit for CE, since we want to be able to target different Nvidia microarchitectures. (Interestingly, the CE machines do have GPUs, T40s IIRC.)
Triton recently added an AOT compilation feature, implemented in this script.
It currently does not allow specifying the microarchitecture version (it doesn't pass cc to compile()).
It takes a Python file that defines the kernel and emits a .h and a .c file. The compiled kernel itself is encoded as hex in a C char array in the .c file.
The .h and .c files can be compiled using a standard C compiler.
Unfortunately, the resulting binary is not a cubin, so nvdisasm / cuobjdump cannot be used directly to get at the SASS and PTX. We'd first have to extract the cubin from the char array.
The JIT does emit proper cubin files, plus IR files for some of the intermediate dialects.
It seems hacky for CE to extract the cubin from the char array, dump it to disk, and then run nvdisasm on it, but I haven't come up with a better idea yet.
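For reference, the hacky extraction step could be sketched roughly like this. Note the array name (kernel_bin) and the exact layout of the generated .c file are assumptions for illustration, not Triton's documented output format:

```python
import re
import shutil
import subprocess

def extract_cubin(c_source: str, array_name: str = "kernel_bin") -> bytes:
    """Pull the hex-encoded kernel bytes out of a generated .c file.

    Assumes the AOT script emits something shaped like:
        unsigned char kernel_bin[] = { 0x7f, 0x45, 0x4c, 0x46, ... };
    (array name and layout are assumptions, not Triton's documented format).
    """
    match = re.search(array_name + r"\s*\[\]\s*=\s*\{([^}]*)\}", c_source)
    if match is None:
        raise ValueError(f"array {array_name!r} not found")
    # Collect every 0x.. token inside the braces and reassemble the bytes.
    return bytes(int(tok, 16) for tok in re.findall(r"0x[0-9a-fA-F]+", match.group(1)))

def disassemble(cubin: bytes, path: str = "kernel.cubin") -> str:
    """Dump the cubin to disk and run nvdisasm on it (needs the CUDA toolkit)."""
    with open(path, "wb") as f:
        f.write(cubin)
    if shutil.which("nvdisasm") is None:
        raise RuntimeError("nvdisasm not on PATH")
    return subprocess.run(["nvdisasm", path], capture_output=True,
                          text=True, check=True).stdout
```

CE would then display the nvdisasm output as the device-code pane, much like it already does for CUDA.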
Language name
Triton
Language version
No response
Language homepage
https://triton-lang.org/main/index.html
Compiler homepage
https://github.com/openai/triton
Compiler version
v2.0
Motivation
Triton is an MLIR-based JIT compiler that compiles Python programs for accelerators. It's already quite popular for writing CUDA kernels, and more hardware vendors are implementing backends for it. It's interesting as a language because it raises the abstraction level from operating on scalars (as in CUDA) to operating on blocks of values.
(I hope classifying this as a new language is OK. It's not really just another Python library, since CE will need very different disassembly steps and device-code display.)