
Can't train the model with GPU on a server with RTX3090 #57

Open

Quasimoodo opened this issue Aug 8, 2021 · 1 comment

@Quasimoodo

I first ran the code with its default config on my server, but later I noticed that training was actually running on my CPU, and nvidia-smi returned an error.
After that, I found on Docker Hub that I can use the GPU inside the container by passing --gpus all when running Docker, that is to say, replacing
`docker run --rm -m4g -v /path/to/data:/mnt/data -it ratsql`
with
`docker run --rm --gpus all -m4g -v /path/to/data:/mnt/data -it ratsql`
nvidia-smi then worked inside the container, but when I trained the model, it failed with an error like:
"the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331"
I searched for this on the internet, and it is said that CUDA 11+ is necessary for RTX 30xx GPUs. I then modified the Dockerfile to use
pytorch/pytorch:1.5-cuda10.1-cudnn7-devel
and rebuilt the image, but the same error occurred again.
I wonder whether I can train the model with the GPU in Docker at all. Kindly help me resolve this issue; any help would be really appreciated.
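Note that pytorch/pytorch:1.5-cuda10.1-cudnn7-devel is still a CUDA 10.1 build, which would explain why the error persisted. A minimal sketch of the base-image swap the CUDA 11 requirement implies (the exact tag below is an assumption; any CUDA 11.x devel tag of the pytorch/pytorch image should behave the same way):

```dockerfile
# Assumption: the rest of the project's Dockerfile stays unchanged; only the
# base image moves from a CUDA 10.1 build to a CUDA 11 build, since RTX 30xx
# cards (compute capability sm_86) need kernels compiled against CUDA 11+.
# The tag below is an example; check Docker Hub for currently available tags.
FROM pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel

# ... remainder of the original Dockerfile, unchanged ...
```

After editing, rebuild the image as before and rerun the container with --gpus all as shown above.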

@m1nhtu99-hoan9

The way I understand your issue, you should first check whether PyTorch recognised your CUDA device. Try this in the terminal or any console: python3 -c "import torch; assert(torch.cuda.is_available())". What is the output?
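A fuller version of that check may help here, since the THCBlas failure usually means the installed PyTorch build ships no kernels for the GPU's compute capability. The sketch below uses only standard PyTorch calls:

```python
import torch

# Report the PyTorch build and the CUDA toolkit it was compiled against.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # should be 11.x for RTX 30xx

# Check whether a CUDA device is visible at all.
print("cuda available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # An RTX 3090 reports compute capability (8, 6); a CUDA 10.x build of
    # PyTorch includes no sm_86 kernels, which matches the THCBlas failure.
    print("device:", torch.cuda.get_device_name(0))
    print("capability:", torch.cuda.get_device_capability(0))
```

If "built with CUDA" still prints 10.1 inside the container, the image is still running a CUDA 10.x PyTorch build and the base image swap has not taken effect.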
