Enable CUPTI to measure kernel execution time instead of CUDA Events #184

fbusato · 2024-08-29T22:02:18Z

CUDA events have low timing accuracy and include the kernel launch overhead in the measurement. CUPTI, by contrast, provides a more reliable way to get consistent timing measurements.
This request asks for an option to replace CUDA Events with CUPTI.

Details

CUDA events issues:

Accuracy and Stability:
- cudaEvent timings can fluctuate in the range of 10-30us, making measurements of small computations unreliable
- cudaEvent timings include the kernel launch overhead, which depends on host/CPU execution and/or driver version

CUPTI:

- ~0.5us granularity vs. 10-30us
- Not affected by kernel launch overhead
- Consistency: measurements close to the profiler (nsys)
- Efficiency: avoid using waiting/delay kernels to hide CPU overhead


alliepiper · 2024-09-04T15:39:20Z

We do mitigate a lot of the issues with events by using blocking_kernels, so it's not quite as bad as it seems. I think this would be a great addition, and I'm curious how much it would improve the stability of our results, especially when sync tags are used.
