

@ConvolutedDog (Contributor) commented Dec 14, 2025

Intro

This commit introduces a global profiling cache system to nsight-python. It fundamentally improves how profiling results are collected and accessed, especially when a single script uses multiple `@nsight.analyze.kernel`-decorated functions.

Previously, each decorated function was profiled in a separate execution of the script, so only the function being profiled in that execution returned results and all others returned `None`. This made it impossible to aggregate results from multiple kernels or to access all profiling results within the same script run.

With this update, the profiling system saves the results of every profiling execution to disk through a singleton-based cache manager. On subsequent calls, each decorated function transparently loads its result from the cache if it has already been profiled, so all profiling results are available within a single script run.
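The cache-or-profile flow described above can be illustrated with a minimal, self-contained sketch. It is not the actual `GlobalNCUProfileCache` implementation in `nsight/cache.py`, whose API is not shown here; `ProfileCache`, `cached_profile`, `profile_a`, and `profile_b` are hypothetical names, pickle files stand in for whatever on-disk format the real cache uses, and the "profiling" functions are trivial stand-ins. In a real multi-run setting the cache directory would need to be a fixed path shared across script executions; a fresh temporary directory is used here only to keep the example self-contained.

```python
import functools
import hashlib
import os
import pickle
import tempfile
from typing import Any, Callable, Optional


class ProfileCache:
    """Hypothetical singleton that persists profiling results to disk.

    Every call to ProfileCache() returns the same instance, so all decorated
    functions in one process share a single cache directory.
    """

    _instance: Optional["ProfileCache"] = None

    def __new__(cls, cache_dir: Optional[str] = None) -> "ProfileCache":
        if cls._instance is None:
            inst = super().__new__(cls)
            # The directory is fixed by the first construction only.
            inst.cache_dir = cache_dir or tempfile.mkdtemp(prefix="profile_cache_")
            os.makedirs(inst.cache_dir, exist_ok=True)
            cls._instance = inst
        return cls._instance

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary strings map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest + ".pkl")

    def load(self, key: str) -> Any:
        # Raises FileNotFoundError on a cache miss.
        with open(self._path(key), "rb") as f:
            return pickle.load(f)

    def save(self, key: str, result: Any) -> None:
        with open(self._path(key), "wb") as f:
            pickle.dump(result, f)


def cached_profile(profile_fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: return the cached result if one exists,
    otherwise run the (expensive) profiling function and cache its result."""

    @functools.wraps(profile_fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        cache = ProfileCache()
        key = f"{profile_fn.__module__}.{profile_fn.__qualname__}{args!r}{sorted(kwargs.items())!r}"
        try:
            return cache.load(key)
        except FileNotFoundError:
            result = profile_fn(*args, **kwargs)
            cache.save(key, result)
            return result

    return wrapper


# Usage: two decorated "kernels"; the counters show each one runs only once.
run_count = {"a": 0, "b": 0}


@cached_profile
def profile_a(n: int) -> dict:
    run_count["a"] += 1
    return {"kernel": "a", "n": n}


@cached_profile
def profile_b(n: int) -> dict:
    run_count["b"] += 1
    return {"kernel": "b", "n": n}


first = [profile_a(8), profile_b(8)]   # both run and are written to disk
second = [profile_a(8), profile_b(8)]  # both are loaded from the cache
print(second, run_count)
```

Because the singleton hands every decorated function the same cache object, results written by one function are visible to the others, which is what allows all results to be aggregated in one run.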

Refer to #12 for more details.

Key Changes

  • Introduced the `GlobalNCUProfileCache` singleton.

Files Updated

  • Added: nsight/cache.py (new cache system)
  • Modified: nsight/collection/ncu.py (uses cache system)
  • Modified: examples/06_plot_customization.py (validates the new cache behavior)

Signed-off-by: ConvolutedDog <[email protected]>

@copy-pr-bot (bot) commented Dec 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

@ConvolutedDog ConvolutedDog marked this pull request as draft December 15, 2025 03:06
@ConvolutedDog ConvolutedDog marked this pull request as ready for review December 15, 2025 05:12
@acollins3 (Collaborator) commented:

/ok to test 1c36385
