
The problem when to run experiments using the Docker container #91

Open
Aaricis opened this issue Dec 18, 2020 · 8 comments

Comments

@Aaricis

Aaricis commented Dec 18, 2020

I hit this problem when running `bash run.sh $GPU python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z`:

```
Launching container named 'zkg_pymarl_GPU_python3_XIdE' on GPU 'python3'
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: python3: unknown device: unknown.
ERRO[0001] error waiting for container: context canceled
```

I don't know how to solve this error.

@mw9385

mw9385 commented Jan 14, 2021

same problem here

@reubenwong97

Hello, you can refer to this issue: #89. I posted my solution in the comments.

@mw9385

mw9385 commented Jan 21, 2021

@reubenwong97 many thanks :)

@4ever-Rain

4ever-Rain commented Feb 23, 2021

This issue is caused by '$GPU'; passing a number (like '0' or '1') instead makes the shell script work.
However, CUDA still does not work inside Docker (nvidia-docker is installed).
Any ideas about '$GPU'?
Thanks!

btw, CUDA works fine when I run `python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z` directly (without Docker)
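For reference, a minimal invocation with an explicit GPU index might look like the sketch below. The index `0` is an assumption; pick whichever index `nvidia-smi` reports as free on your machine:

```shell
# run.sh expects a GPU index as its first argument; when $GPU is unset,
# the next word ("python3") is passed to nvidia-container-cli as the
# device name, which produces the "unknown device" error above.
GPU=0   # a free GPU index from `nvidia-smi` (assumption: device 0 is free)
bash run.sh $GPU python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z
```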

@reubenwong97

@4ever-Rain you may wanna run nvidia-smi first to check the IDs of your GPUs, which you can then use in place of $GPU. I had experienced problems when the GPU was low on memory due to other tasks running. You can check whether it has available memory with nvidia-smi.
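The check above can be sketched as a one-liner using nvidia-smi's documented query interface:

```shell
# Show each GPU's index and memory usage so you can pick a free one
# and pass its index in place of $GPU.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```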

@4ever-Rain

@reubenwong97 Thanks for your advice. I'm sure my GPU is available and free, and I have used a GPU ID ('0') instead of '$GPU', but CUDA still does not work inside Docker.
Meanwhile, I'm sure `torch.cuda.is_available()` returns True in the container.
Maybe there is something wrong with 'run.sh'?

@FanScy

FanScy commented Mar 15, 2021

@4ever-Rain I encountered the same problem. When I call cuda() in the container, it gets stuck and cannot work. Have you solved the problem?

@4ever-Rain

> @4ever-Rain I encounter the same problem. When I use the cuda() in the container, it will get stuck and can not work. Do you have solved the problem?

Yep, it works for me now.
I finally ran the code outside the container by installing all the necessary packages into a conda virtual environment.
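For anyone following the same route, a sketch of this no-Docker workaround might look like the following. The Python version and the presence of a requirements.txt at the repo root are assumptions; match them to the pymarl repo you are using:

```shell
# Create an isolated environment instead of using the Docker container.
conda create -n pymarl python=3.6 -y
conda activate pymarl
# Install the project's dependencies (assumption: a requirements.txt
# exists at the repo root; run this from there).
pip install -r requirements.txt
# Then run the experiment directly on the host, where CUDA already works.
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z
```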
