Enable CUPTI to measure kernel execution time instead of CUDA Events #184

Open
fbusato opened this issue Aug 29, 2024 · 1 comment

fbusato commented Aug 29, 2024

CUDA event timings have low accuracy and include the kernel launch overhead. CUPTI, on the other hand, provides more reliable and consistent timing measurements.
This issue requests an option to measure kernel execution time with CUPTI instead of CUDA events.

Details

CUDA events issues:

  • Accuracy and Stability:
    • cudaEvent timings can fluctuate by 10-30 us, which makes measurements of small computations unreliable
    • cudaEvent timings include the kernel launch overhead, which depends on host/CPU execution and/or the driver version

CUPTI:

  • Accuracy: ~0.5 us granularity vs. 10-30 us fluctuation with CUDA events
  • Not affected by kernel launch overhead
  • Consistency: measurements are close to those reported by the profiler (nsys)
  • Efficiency: no need for waiting/delay kernels to hide CPU overhead
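
To put the two figures above in perspective, here is a rough host-only sketch of how much each one matters relative to the kernel being timed. It assumes a simple model that is not part of NVBench: the timer's fluctuation or granularity is treated as the uncertainty of a single sample and divided by the kernel duration. The function names and the kernel durations are hypothetical and chosen only for illustration.

```cpp
#include <cstdio>

// Relative uncertainty (percent) of one timing sample under the simple model
// assumed here: timer_error_us is how far a reading may be off, and kernel_us
// is the true kernel duration.
double relative_error_percent(double timer_error_us, double kernel_us)
{
  return 100.0 * timer_error_us / kernel_us;
}

// Prints the comparison for a few illustrative (hypothetical) kernel durations.
void print_comparison()
{
  const double event_low_us  = 10.0; // CUDA events: lower bound quoted in this issue
  const double event_high_us = 30.0; // CUDA events: upper bound quoted in this issue
  const double cupti_us      = 0.5;  // CUPTI: approximate granularity quoted in this issue
  const double kernel_durations_us[] = {5.0, 50.0, 500.0, 5000.0};

  std::printf("%12s  %20s  %10s\n", "kernel (us)", "CUDA events (%)", "CUPTI (%)");
  for (double kernel_us : kernel_durations_us)
  {
    std::printf("%12.1f  %9.1f to %-7.1f  %10.2f\n",
                kernel_us,
                relative_error_percent(event_low_us, kernel_us),
                relative_error_percent(event_high_us, kernel_us),
                relative_error_percent(cupti_us, kernel_us));
  }
}
```

Under this model, a 50 us kernel carries 20% to 60% uncertainty per sample with CUDA events and about 1% at ~0.5 us granularity. The gap narrows as kernels get longer, which is why the problem mostly affects small computations.
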
@alliepiper
Collaborator

We do mitigate a lot of the issues with events by using blocking_kernels, so it's not quite as bad as it seems. That said, I think this would be a great addition, and I'm curious how much it would improve the stability of our results, especially when sync tags are used.
