

Yui Chun Leung (Leo) edited this page Apr 29, 2023 · 6 revisions


  • Laptop: Dell Inspiron 7590 with GTX 1650
  • nvidia-smi reports driver 470.182.03 and CUDA version 11.4.
  • /usr/local contains cuda-10.1, cuda-10.2 and cuda-11.2.
  • nvprof --metrics branch_efficiency binary.out returns
    ======== Warning: This version of nvprof doesn't support the underlying device, GPU profiling skipped
    • It does not show any profiling or metric results.
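
The facts listed above can be gathered with a short script. This is only a sketch: `list_toolkits` is a hypothetical helper name, and it assumes the toolkits live in `cuda-*` directories under `/usr/local` as they do on this laptop.

```shell
#!/bin/sh
# Hypothetical helper: list CUDA toolkit directories under a prefix
# (defaults to /usr/local, where the toolkits above are installed).
list_toolkits() {
    prefix="${1:-/usr/local}"
    for dir in "$prefix"/cuda-*; do
        if [ -d "$dir" ]; then
            echo "$dir"
        fi
    done
}

# GPU name and driver version, if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
        || echo "nvidia-smi failed to query the GPU"
else
    echo "nvidia-smi not found on PATH"
fi

# Toolkits installed side by side.
list_toolkits /usr/local

# Which nvprof is picked up first, if any.
command -v nvprof || echo "nvprof not found on PATH"
```

Knowing which nvprof is first on PATH helps tell whether the warning comes from one of the older toolkits.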


  1. Install Nsight Systems from [1].
    • Select the version that matches your CUDA driver.
      • My driver reports CUDA 11.4 and, going by [2], I guessed I needed a release close to 2021.3.2, so I installed Nsight Systems 2021.3.1.54.
    • Make the downloaded .run file executable, then execute it.
      chmod +x <installer>.run
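  2. Lower the perf_event_paranoid level.
    • Because of the issues described in [3] and [4], Nsight Systems cannot track CPU and GPU metrics until this is changed.
    • Writing the value directly with sudo sh -c 'echo 2 >/proc/sys/kernel/perf_event_paranoid' did not work for me (no permission even with sudo). Instead, persist the setting as suggested in [5] and reboot: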
      sudo sh -c 'echo kernel.perf_event_paranoid=1 > /etc/sysctl.d/local.conf'
      sudo reboot
  3. Now profile the binary with Nsight Systems:
    __PREFETCH=off /nsight-systems-2021.3.1/bin/nsys profile -o noprefetch --stats=true binary.out
    • It should print something like:
CUDA API Statistics:

Time(%)  Total Time (ns)  Num Calls  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)           Name         
-------  ---------------  ---------  ------------  ------------  ------------  ------------  ---------------------
   73.3      111,791,880          2  55,895,940.0         5,083   111,786,797  79,041,608.0  cudaLaunch           
   26.6       40,624,811          1  40,624,811.0    40,624,811    40,624,811           0.0  cudaDeviceReset      
    0.1          140,996          2      70,498.0        69,687        71,309       1,146.9  cudaDeviceSynchronize
    0.0              662          1         662.0           662           662           0.0  cuCtxSynchronize 

CUDA Kernel Statistics:

Time(%)  Total Time (ns)  Instances  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)            Name           
-------  ---------------  ---------  ------------  ------------  ------------  -----------  -------------------------
   50.0           68,255          1      68,255.0        68,255        68,255          0.0  code_without_divergence()
   50.0           68,192          1      68,192.0        68,192        68,192          0.0  divergence_code()        
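
To compare kernels without reading the table by eye, the kernel rows can be pulled out with awk. This is a minimal sketch, assuming the report text shown above is piped in or was saved to a file; `kernel_totals` is a hypothetical helper name, and it assumes kernel names contain no spaces (true for the two kernels here, not for signatures with several arguments).

```shell
#!/bin/sh
# Hypothetical helper: read an nsys --stats=true text report on stdin and
# print "<kernel name> <total time in ns>" for each row of the
# "CUDA Kernel Statistics" table.
kernel_totals() {
    awk '
        /CUDA Kernel Statistics/ { in_table = 1; next }
        /Statistics:/            { in_table = 0 }   # any other table heading
        in_table && $1 ~ /^[0-9.]+$/ {
            gsub(",", "", $2)    # drop thousands separators
            print $NF, $2
        }
    '
}
```

For the run shown above this would give 68255 ns and 68192 ns, a difference of only 63 ns, so the two kernels are practically indistinguishable in this measurement.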

  1. Nsight Systems visual profiler
    • File tab > Open > select the generated .qdrep file.
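
If the opened timeline is missing CPU or GPU data, it may be worth confirming that the perf_event_paranoid setting from step 2 survived the reboot. A minimal sketch follows; `paranoid_ok` is a hypothetical helper name, and the default threshold of 1 is simply the value written in step 2.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the given perf_event_paranoid level is
# an integer no higher than the wanted level (default 1, as in step 2).
paranoid_ok() {
    level="$1"
    wanted="${2:-1}"
    case "$level" in
        ''|*[!0-9-]*) return 2 ;;   # empty or not an integer
    esac
    [ "$level" -le "$wanted" ]
}

# Read the live value; leave it empty if the file cannot be read.
current=$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo "")
if paranoid_ok "$current"; then
    echo "perf_event_paranoid=$current: OK"
else
    echo "perf_event_paranoid=${current:-unknown}: too high or unreadable"
fi
```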


  1. Gameworks Download Center
  2. cuda-nsight-systems-11-4_11.4.3-1_amd64.deb
  3. Nsight Systems does not collect CUDA events
  4. Nsight Systems Issue: Unable to configure the collection of CPU IP samples
  5. Unable to change kernel.perf_event_paranoid
  6. Transitioning to Nsight Systems from NVIDIA Visual Profiler / nvprof
  7. CUDA – Basic Profiling With Nsight Systems