A collection of GPU benchmarks for evaluating the performance of the GPU software stack. The tests are written in CUDA and include a simple HIP compatibility layer, so they run on AMD GPUs without modification and do not require HIP as a dependency on NVIDIA systems.
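
To illustrate the idea of a compatibility layer, here is a minimal sketch of what such a header could look like. It is not the header shipped with these benchmarks: the file name `gpu_compat.h`, the choice of `__HIPCC__` as the switch, and the selection of aliased functions are all assumptions made for the example.

```cuda
// gpu_compat.h (hypothetical file name, for illustration only)
#pragma once

#if defined(__HIPCC__)
// Built with hipcc: include HIP and alias the CUDA runtime names
// that the test sources use to their HIP equivalents.
#include <hip/hip_runtime.h>
#define cudaError_t            hipError_t
#define cudaSuccess            hipSuccess
#define cudaGetErrorString     hipGetErrorString
#define cudaMalloc             hipMalloc
#define cudaFree               hipFree
#define cudaMemcpy             hipMemcpy
#define cudaMemcpyHostToDevice hipMemcpyHostToDevice
#define cudaMemcpyDeviceToHost hipMemcpyDeviceToHost
#define cudaDeviceSynchronize  hipDeviceSynchronize
#else
// Built with nvcc: plain CUDA runtime, no HIP headers are needed.
#include <cuda_runtime.h>
#endif
```

With a header along these lines, test sources keep using the CUDA names, and only the AMD build ever sees a HIP include.
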
- Memory allocations
- Page faults
- Launch latencies
- Memory access latencies
- Memory bandwidth
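
As an illustration of one item from the list above, the following sketch shows one common way to estimate launch latency: launch an empty kernel repeatedly and time each launch plus synchronization on the host. It is not taken from the benchmark sources. The names `empty_kernel` and `mean_launch_latency_us` are hypothetical, and the code is written against the CUDA runtime API only.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Kernel with no work, so the measured time is dominated by launch overhead.
__global__ void empty_kernel() {}

// Returns the mean host-side time, in microseconds, for one launch
// followed by a device synchronization.
double mean_launch_latency_us(int iterations) {
    // Warm-up launch so one-time initialization is not included in the timing.
    empty_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        empty_kernel<<<1, 1>>>();
        cudaDeviceSynchronize();
    }
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / iterations;
}

int main() {
    const int iterations = 10000;  // arbitrary choice for the example
    std::printf("mean launch + sync latency: %.2f us\n",
                mean_launch_latency_us(iterations));
    return 0;
}
```

Synchronizing after every launch measures the full round trip. Timing a batch of launches with a single synchronization at the end would instead approximate launch throughput.
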
Both ROCm's rocPRIM and NVIDIA's CUB/Thrust are supported for the following primitives:
- Radix sort
- Prefix sums
- Reductions
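
As a sketch of what a library primitive call from the list above looks like, the example below runs a sum reduction with CUB's `cub::DeviceReduce::Sum`. It is an illustration under assumptions, not the benchmark's own code: the helper name `device_sum` is hypothetical, error checking is omitted for brevity, and only the NVIDIA path is shown. rocPRIM's `rocprim::reduce` follows a similar two-call pattern, with a size query followed by the actual run.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

// Sums the input on the device and returns the result to the host.
int device_sum(const std::vector<int>& host_in) {
    const int n = static_cast<int>(host_in.size());

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // First call with a null buffer only reports the scratch space required.
    void* d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);
    cudaMalloc(&d_temp, temp_bytes);

    // Second call performs the reduction using the scratch buffer.
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the reduction to finish.
    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_temp);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main() {
    std::vector<int> ones(1000, 1);
    std::printf("sum = %d\n", device_sum(ones));
    return 0;
}
```

A benchmark would typically time only the second `Sum` call, since the size query and the scratch allocation are setup costs.
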