CUDA Flux is a profiler for GPU applications which reports the basic block executions frequencies of compute kernels
-
LLVM:
CUDA Flux is tested and developed with llvm 11.0git clone --branch release/11.x https://github.com/llvm/llvm-project.git
CMake config:
cmake -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_INSTALL_PREFIX=/opt/llvm-11.0 \ -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON \ -DLLVM_ENABLE_DOXYGEN=OFF -DLLVM_BUILD_DOCS=OFF -GNinja \ -DLLVM_INSTALL_BINUTILS_SYMLINKS=ON -DBUILD_SHARED_LIBS=ON \ ../llvm
-
re2c lexer generator - http://re2c.org/ (make sure to check your package manager first)
-
CUDA SDK >= 9.0
-
Python (python3 preferred) with the yaml package installed
-
environment-modules (optional, but recommended)
# Make sure LLVM and CUDA are in your paths
git clone https://github.com/UniHD-CEG/cuda-flux.git
cd cuda-flux && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/opt/cuda-flux .. # change install dir if you wish
make install
Make sure the bin folder of your cuda-flux installation is in your path. Better: use environment-module to load cuda-flux.
module load /opt/cuda-flux/module/cuda_flux # LLVM and CUDA need to be loaded first!
Compile your CUDA application:
clang_cf++ --cuda-gpu-arch=sm_35 -std=c++11 -lcudart test/saxpy.cu -o saxpy`
Execute the binary like usual: ./saxpy
Output:
If any kernel was executed there is a bbc.txt file. Each kernel execution
results in a line with the following information:
- kernel name
- gridDim{X,Y,Z}
- blockDim{X,Y,Z}
- shared Memory size
- Profiling Mode (full, CTA, warp)
- List of executions counters for each block
The PTX instructions for all kernels are stored in the PTX_Analysis.yml
file. The order of the Basic Blocks corresponds with the order of the
counters in the bbc.txt file.
Profiling mode is controlled by the environment variable MEKONG_PROFILINGMODE
. If is is not set or set to 0 all threads are profiled, which is the recommended setting. export MEKONG_PROFILINGMODE=1
will only profile one CTA and export MEKONG_PROFILINGMODE=2
will profile only one warp.