Description
Problem
Nsight's profiling system currently profiles each @nsight.analyze.kernel decorated function in a separate script execution. While this allows individual functions to be profiled, it creates two major issues:
- Results Isolation: Each profiling execution only returns results for the currently profiled function. Every other decorated function in the same script returns None during that execution.
- Data Inaccessibility: When a script profiles multiple functions, results from earlier profiling executions are not accessible in later ones, so no single script run can see all profiling results.
The root cause is that @nsight.analyze.kernel triggers a separate script execution for each decorated function it profiles.
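The effect can be reproduced with a toy model that does not depend on nsight or CUDA. This is only a sketch of the behavior described above, not nsight's actual implementation: `ACTIVE_TARGET`, `analyze_kernel`, and `run_script_once` are hypothetical names, and plain dictionaries stand in for ProfilerResult objects.

```python
import functools
from typing import Callable, List, Optional

# Hypothetical stand-in for profiler state: the name of the single function
# that is profiled during the current script execution.
ACTIVE_TARGET: Optional[str] = None


def analyze_kernel(func: Callable[[], dict]) -> Callable[[], Optional[dict]]:
    """Toy decorator: only the active target produces a result."""

    @functools.wraps(func)
    def wrapper() -> Optional[dict]:
        if func.__name__ != ACTIVE_TARGET:
            return None  # not the function profiled in this pass
        return func()

    return wrapper


@analyze_kernel
def kernel1() -> dict:
    return {"kernel": "kernel1"}


@analyze_kernel
def kernel2() -> dict:
    return {"kernel": "kernel2"}


def run_script_once(target: str) -> List[Optional[dict]]:
    """Simulate one full script execution that profiles `target`."""
    global ACTIVE_TARGET
    ACTIVE_TARGET = target
    return [kernel1(), kernel2()]


# One script execution per decorated function, as described above.
pass1 = run_script_once("kernel1")
pass2 = run_script_once("kernel2")
print(pass1)
print(pass2)
```

In this model, neither pass ever holds both results at once, which mirrors the two issues listed above.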
Current Behavior
import torch
import nsight

sizes = [(2**i,) for i in range(11, 14)]


@nsight.analyze.kernel(configs=sizes, runs=10)
def kernel1(n: int) -> None:
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    with nsight.annotate("matmul"):
        _ = a @ b


@nsight.analyze.kernel(configs=sizes, runs=10)
def kernel2(n: int) -> None:
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    with nsight.annotate("matmul"):
        _ = a @ b


def main() -> None:
    # When profiling kernel1, this will be executed and returns a ProfilerResult object.
    # When profiling kernel2, this will also be executed (the second execution pass), but kernel1 returns None.
    result1 = kernel1()
    print("Kernel1 results:", result1.to_dataframe())  # When profiling kernel2, this cannot be accessed

    # This works - kernel2 returns a ProfilerResult object.
    result2 = kernel2()
    print("Kernel2 results:", result2.to_dataframe())  # When profiling kernel2, this can be accessed


if __name__ == "__main__":
    main()

Expected Behavior
Both decorated functions should return valid ProfilerResult objects within the same script run, each accessible independently:
# User can access multiple results.
results = []
results.append(kernel1())
results.append(kernel2())
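
One possible direction, offered only as a sketch and not as nsight's design, is to persist each pass's result so that later passes can load it instead of returning None. Everything below is hypothetical: `CACHE_DIR`, `save_result`, `load_result`, and `call_decorated` are illustrative names, and a dictionary stands in for a ProfilerResult object.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical cache location. A real multi-process design would need a
# stable path shared by every script execution, not a fresh temp directory.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="profile_cache_"))


def save_result(name: str, result: dict) -> None:
    """Persist one function's result under its name."""
    (CACHE_DIR / f"{name}.json").write_text(json.dumps(result))


def load_result(name: str) -> Optional[dict]:
    """Load a previously saved result, or None if there is none yet."""
    path = CACHE_DIR / f"{name}.json"
    return json.loads(path.read_text()) if path.exists() else None


def call_decorated(name: str, active_target: str) -> Optional[dict]:
    """Return a fresh result for the active target, a cached one otherwise."""
    if name == active_target:
        result = {"kernel": name, "runs": 10}  # placeholder for real profiling
        save_result(name, result)
        return result
    return load_result(name)


names = ("kernel1", "kernel2")
# Pass 1 profiles kernel1; kernel2 has no saved result yet.
first_pass = [call_decorated(n, "kernel1") for n in names]
# Pass 2 profiles kernel2 and can still read kernel1's earlier result.
second_pass = [call_decorated(n, "kernel2") for n in names]
print(second_pass)
```

Under these assumptions, the final pass sees a result for every decorated function, which is the behavior the snippet above expects.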