
GPU support under Ubuntu #223

Open
kushnirm opened this issue May 14, 2018 · 8 comments


kushnirm commented May 14, 2018

OS is Ubuntu 16.04. Nvidia drivers installed and working fine. Nvidia drivers and CUDA work fine in nvidia-docker. Using driver 384.111 and CUDA 9.0 for testing. Slurm+shifter working fine.

But, under shifter, I can't get GPU integration to work quite right. When running an image with nvidia-docker, drivers and utilities like nvidia-smi are available and work. When running the same container via shifter they are not.

If I make a copy of /usr/lib/nvidia-384 to my siteFs and set PATH and LD_LIBRARY_PATH, nvidia-smi returns the expected output. However, CUDA demo apps (e.g. deviceQuery) fail with:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
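Worth noting: error 35 (cudaErrorInsufficientDriver) is often misleading inside a container. To my understanding, NVIDIA's release notes list driver 384.81 as the minimum for the CUDA 9.0 runtime, so driver 384.111 should be sufficient, and the error more likely means the runtime cannot find a usable libcuda.so inside the container at all. A quick sanity-check sketch (assumes GNU `sort -V`; the version numbers are the ones from this issue):

```shell
# Sketch: CUDA 9.0's runtime is believed to need driver >= 384.81
# (assumption from NVIDIA release notes). 384.111 satisfies that,
# so error 35 here likely means libcuda.so is not visible in the
# container, not that the driver is genuinely too old.
required=384.81     # assumed minimum driver for CUDA 9.0
installed=384.111   # driver version reported in this issue
lowest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
    echo "driver $installed is new enough for CUDA 9.0"
else
    echo "driver $installed is too old for CUDA 9.0"
fi
```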

Thanks,
Michael

@kushnirm (Author)

On further review, it looks like some of the GPU-related bind mounts are not being created automatically. I found the contrib/gpu/activate_gpu_support.sh script, but I am not sure when, where, or how it is invoked.

Please advise.

Thanks,
Michael

kushnirm changed the title from "GPU support under Linux" to "GPU support under Ubuntu" on May 23, 2018
scanon (Member) commented Jun 28, 2018

Michael,

We don't have a GPU system to test with at NERSC. Let me ping some of the CSCS folks and see if they can comment.

@uvNikita (Contributor)

I have the same question: how is contrib/gpu/activate_gpu_support.sh intended to be used?

@uvNikita (Contributor)

Found the commit that removed the code which was using this script (c5e66cc), but I don't see any replacement for this functionality.

kushnirm (Author) commented Aug 20, 2018 via email

scanon (Member) commented Aug 20, 2018

NERSC will hopefully be able to help more directly on this in the near future.

sk2991 commented Aug 23, 2018

We are also facing the same issue. With /usr/lib/nvidia-384 loaded into the container, nvidia-smi shows the GPUs present on the node. But when we try to execute the deviceQuery and nbody benchmarks, they throw the same error:
CUDA driver version is insufficient for CUDA runtime version, Result = FAIL

Is there another way to test GPUs with shifter and Slurm integration?

@uvNikita (Contributor)

After digging through the sources and git history, it seems that the plan is to replace the old GPU support with the new module system; see doc/modules.rst and doc/config/udiRoot.conf.rst.

So, we added these lines to our config:

module_nvidia_siteEnvAppend=LD_LIBRARY_PATH=/opt/udiImage/modules/nvidia PATH=/nvidia-bin PATH=/cuda/bin
module_nvidia_siteFs=/usr/bin:/nvidia-bin;/usr/local/cuda:/cuda
module_nvidia_copyPath=/usr/lib64/nvidia

After this, users can start jobs which require nvidia libraries by specifying shifter --module nvidia.
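For the Slurm side, a hypothetical batch script might look like the sketch below. The image name, gres line, and binary path are placeholders, and this assumes the module_nvidia_* entries above are in udiRoot.conf and that the site's Slurm/shifter SPANK integration is in place:

```shell
#!/bin/bash
#SBATCH --image=docker:nvidia/cuda:9.0-devel
#SBATCH --gres=gpu:1
# Hypothetical usage sketch: the nvidia module here refers to the
# module_nvidia_* config entries above; deviceQuery stands in for
# any CUDA binary staged in the job's working directory.
srun shifter --module nvidia ./deviceQuery
```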
