ssheikholeslami

The command sudo lspci | grep -i nvidia, which is used to initialize the number of available GPUs, can overcount because it matches every NVIDIA PCI function, not just the GPUs themselves. For example, on my machine with 4 NVIDIA GPUs it outputs:

09:00.1 Audio device: NVIDIA Corporation Device 10f8 (rev a1)
09:00.2 USB controller: NVIDIA Corporation Device 1ad8 (rev a1)
09:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad9 (rev a1)
0a:00.0 VGA compatible controller: NVIDIA Corporation Device 1e84 (rev a1)
0a:00.1 Audio device: NVIDIA Corporation Device 10f8 (rev a1)
0a:00.2 USB controller: NVIDIA Corporation Device 1ad8 (rev a1)
0a:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad9 (rev a1)
42:00.0 VGA compatible controller: NVIDIA Corporation Device 1e84 (rev a1)
42:00.1 Audio device: NVIDIA Corporation Device 10f8 (rev a1)
42:00.2 USB controller: NVIDIA Corporation Device 1ad8 (rev a1)
42:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad9 (rev a1)
43:00.0 VGA compatible controller: NVIDIA Corporation Device 1e84 (rev a1)
43:00.1 Audio device: NVIDIA Corporation Device 10f8 (rev a1)
43:00.2 USB controller: NVIDIA Corporation Device 1ad8 (rev a1)
43:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad9 (rev a1)

Hence, the installer will assume that the machine has 16 GPUs.
A workaround is to modify the existing command to only consider the lines with "VGA".
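
A minimal sketch of that workaround, assuming the installer only needs a line count. The function name count_nvidia_vga is hypothetical and not taken from the installer; it reads lspci-style text on stdin so it can be tried against saved output:

```shell
#!/bin/sh
# Hypothetical helper: count NVIDIA GPUs from lspci-style output on stdin.
# Only lines describing a VGA controller are counted, so the audio, USB and
# serial-bus functions that share the card's PCI slot are ignored.
count_nvidia_vga() {
    # grep -c prints the number of matching lines; it exits with status 1
    # when that number is 0, which is not an error here, hence "|| true".
    grep -i nvidia | grep -c 'VGA compatible controller' || true
}

# Usage on a real machine:
#   sudo lspci | count_nvidia_vga
```

One caveat: compute-only NVIDIA cards are often listed by lspci as "3D controller" rather than "VGA compatible controller", so the pattern may need extending on such machines.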

A better fix is to use nvidia-smi -L | wc -l instead, but the NVIDIA driver and nvidia-smi might not be installed on the machine yet when we initialize those variables.
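
One way to combine the two approaches is to prefer nvidia-smi when it is already on the PATH and fall back to lspci otherwise. This is only a sketch: the function name count_gpus is hypothetical, and it counts lines beginning with "GPU " instead of using wc -l so that any extra or error lines are not counted:

```shell
#!/bin/sh
# Hypothetical helper: print the number of NVIDIA GPUs on this machine.
count_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # nvidia-smi -L prints one line per GPU, each starting with "GPU <index>:".
        nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
    else
        # Driver tools not installed yet: fall back to counting VGA lines from lspci.
        lspci 2>/dev/null | grep -i nvidia | grep -c 'VGA' || true
    fi
}

# Usage:
#   NUM_GPUS=$(count_gpus)
```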

@jimdowling
Contributor

Thanks Sina!
However, we assume nvidia-smi isn't installed at that stage.
I could add a check for 'VGA', which I think is normally present in the output.