nvprof_nsys
Yui Chun Leung (Leo) edited this page Apr 29, 2023
- Laptop: Dell Inspiron 7590 with GTX 1650
- `nvidia-smi` reports driver 470.182.03 and CUDA version 11.4.
- `/usr/local` has `cuda-10.1`, `cuda-10.2` and `cuda-11.2`.
- `nvprof --metrics branch_efficiency binary.out` returns `======== Warning: This version of nvprof doesn't support the underlying device, GPU profiling skipped`
- It does not show any profiling or metric results.
- Should any GPU newer than the GTX 1050 avoid `nvprof`?
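The environment checks above can be collected into one small script. This is a minimal sketch, not a tested recipe for every setup: `list_cuda_toolkits` is a hypothetical helper name, and it assumes the toolkits live in `cuda-*` directories under `/usr/local`, as on this laptop.

```shell
#!/bin/sh
# Hypothetical helper: print the cuda-* toolkit directories found under a
# prefix (default /usr/local), one per line, sorted.
list_cuda_toolkits() {
    prefix="${1:-/usr/local}"
    for d in "$prefix"/cuda-*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -d "$d" ] && basename "$d"
    done | sort
}

list_cuda_toolkits /usr/local

# Show the GPU name and driver version when nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
        || echo "nvidia-smi could not query the GPU" >&2
fi
```

Comparing the toolkit list with the CUDA version that `nvidia-smi` reports shows quickly whether the `nvprof` on `PATH` comes from a toolkit older than the driver.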
- Install Nsight Systems from [1].
- Select the version that matches your CUDA driver.
- My driver reports CUDA version 11.4, so according to [2] I guess I need the release around 2021.3.2; I installed Nsight Systems 2021.3.1.54.
- Make the `.run` file executable and run it: `chmod +x NsightSystems-linux-public-2021.3.1.54-ee9c30a.run`, then `./NsightSystems-linux-public-2021.3.1.54-ee9c30a.run`
- Edit log permission.
- Due to the issues described in [3] and [4], Nsight Systems is not able to track CPU and GPU metrics.
- Instead of `sudo sh -c 'echo 2 >/proc/sys/kernel/perf_event_paranoid'` (no permission even with sudo), follow [5] and run `sudo sh -c 'echo kernel.perf_event_paranoid=1 > /etc/sysctl.d/local.conf'`, then `sudo reboot`.
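After the reboot it is worth confirming that the setting took effect. The sketch below reads the current level and compares it with a limit. `paranoid_ok` is a hypothetical helper, and the default limit of 2 is an assumption about what Nsight Systems needs for CPU sampling, so adjust it if the documentation for your release states a different value.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a paranoid level is at or below a limit.
# Usage: paranoid_ok LEVEL [LIMIT]   (LIMIT defaults to 2, an assumption)
paranoid_ok() {
    [ "$1" -le "${2:-2}" ]
}

f=/proc/sys/kernel/perf_event_paranoid
if [ -r "$f" ]; then
    level=$(cat "$f")
    if paranoid_ok "$level"; then
        echo "perf_event_paranoid=$level: OK for CPU sampling"
    else
        echo "perf_event_paranoid=$level: too restrictive"
    fi
fi
```

With the value 1 written to `/etc/sysctl.d/local.conf` as described, the check should pass once the machine has rebooted.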
- Now run the Nsight Systems binary: `__PREFETCH=off /nsight-systems-2021.3.1/bin/nsys profile -o noprefetch --stats=true binary.out`
- It should show:
CUDA API Statistics:
Time(%)  Total Time (ns)  Num Calls  Average (ns)  Minimum (ns)  Maximum (ns)   StdDev (ns)  Name
-------  ---------------  ---------  ------------  ------------  ------------  ------------  ---------------------
   73.3      111,791,880          2  55,895,940.0         5,083   111,786,797  79,041,608.0  cudaLaunch
   26.6       40,624,811          1  40,624,811.0    40,624,811    40,624,811           0.0  cudaDeviceReset
    0.1          140,996          2      70,498.0        69,687        71,309       1,146.9  cudaDeviceSynchronize
    0.0              662          1         662.0           662           662           0.0  cuCtxSynchronize
CUDA Kernel Statistics:
Time(%)  Total Time (ns)  Instances  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)  Name
-------  ---------------  ---------  ------------  ------------  ------------  -----------  -------------------------
   50.0           68,255          1      68,255.0        68,255        68,255          0.0  code_without_divergence()
   50.0           68,192          1      68,192.0        68,192        68,192          0.0  divergence_code()
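The two kernels take almost the same time in this run, which is easier to see when only the relevant columns are kept. As a sketch, the snippet below pulls the average time per kernel out of text in the format shown above. `kernel_averages` is a hypothetical helper; it assumes that `Average (ns)` is the fourth whitespace-separated field, that the kernel name is the last field and contains no spaces, and that no other statistics section follows the kernel table.

```shell
#!/bin/sh
# Hypothetical helper: read nsys "CUDA Kernel Statistics" rows on stdin and
# print "<kernel name> <average ns>" for each data row.
kernel_averages() {
    awk '
        /CUDA Kernel Statistics/ { in_kernels = 1; next }
        # Data rows start with a numeric Time(%) value such as 50.0.
        in_kernels && $1 ~ /^[0-9]+\.[0-9]+$/ {
            avg = $4
            gsub(/,/, "", avg)      # drop thousands separators
            print $NF, avg
        }
    '
}

kernel_averages <<'EOF'
CUDA Kernel Statistics:
Time(%) Total Time (ns) Instances Average (ns) Minimum (ns) Maximum (ns) StdDev (ns) Name
50.0 68,255 1 68,255.0 68,255 68,255 0.0 code_without_divergence()
50.0 68,192 1 68,192.0 68,192 68,192 0.0 divergence_code()
EOF
```

For the rows above this prints `code_without_divergence() 68255.0` and `divergence_code() 68192.0`. In practice the output of the `nsys profile --stats=true` run could be piped into the function instead of the here-document.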
- Nsight Systems visual profiler: `/nsight-systems-2021.3.1/bin/nsys-ui`
- File tab > Open > `.qdrep` file.
- [1] Gameworks Download Center
- [2] cuda-nsight-systems-11-4_11.4.3-1_amd64.deb
- [3] Nsight Systems does not collect CUDA events
- [4] Nsight Systems Issue: Unable to configure the collection of CPU IP samples
- [5] Unable to change kernel.perf_event_paranoid
- [6] Transitioning to Nsight Systems from NVIDIA Visual Profiler / nvprof
- [7] CUDA – Basic Profiling With Nsight Systems