Docker build for stable diffusion fails #7

Open
shrutiramesh1988 opened this issue Jan 12, 2024 · 2 comments

Comments

@shrutiramesh1988

The Docker build for Stable Diffusion fails as shown below (I tried both NVIDIA's and Dell's implementations).
Could you please help?

docker build . -t nvidia_stablediffusion_pytorch_mlperf3.1
[+] Building 1.7s (3/3) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 719B 0.0s
=> ERROR [internal] load metadata for nvcr.io/ea-bignlp/ea-mm-sd-alpha/bignlp-mm-sd:23.09-py3 1.7s

[internal] load metadata for nvcr.io/ea-bignlp/ea-mm-sd-alpha/bignlp-mm-sd:23.09-py3:


Dockerfile:2

1 | ARG FROM_IMAGE_NAME=nvcr.io/ea-bignlp/ea-mm-sd-alpha/bignlp-mm-sd:23.09-py3
2 | >>> FROM ${FROM_IMAGE_NAME}
3 |
4 | RUN pip install --upgrade webdataset

ERROR: failed to solve: nvcr.io/ea-bignlp/ea-mm-sd-alpha/bignlp-mm-sd:23.09-py3: pulling from host nvcr.io failed with status code [manifests 23.09-py3]: 401 Unauthorized

@ruijietey

ruijietey commented Mar 12, 2024

I think the original error happens because you need to run docker login nvcr.io first; the 401 Unauthorized means the registry did not accept the pull without valid credentials. That said, I suspect nvcr.io/ea-bignlp/ea-mm-sd-alpha/bignlp-mm-sd:23.09-py3 is also deprecated, so we should use the NeMo container directly instead (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo).
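The two suggestions above (log in to nvcr.io, then build on the NeMo container) could be sketched roughly as follows. This is a minimal sketch under assumptions, not a confirmed fix: NGC_API_KEY is assumed to hold an API key generated in your NGC account, the helper names ngc_login and build_with_base are hypothetical, and the override relies on the Dockerfile declaring ARG FROM_IMAGE_NAME before FROM, as the excerpt in the error output suggests. Logging in may still not be enough if your account has no access to the ea-bignlp organization.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, NGC_API_KEY is an assumed env var.

# Authenticate against nvcr.io with an NGC API key.
ngc_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # NGC expects the literal string $oauthtoken as the username;
  # the single quotes keep the shell from expanding it.
  printf '%s' "$NGC_API_KEY" |
    docker login nvcr.io --username '$oauthtoken' --password-stdin
}

# Build with a different base image by overriding the Dockerfile's
# FROM_IMAGE_NAME build argument instead of editing the file.
build_with_base() {
  base_image="$1"
  docker build . \
    --build-arg "FROM_IMAGE_NAME=${base_image}" \
    -t nvidia_stablediffusion_pytorch_mlperf3.1
}
```

For example, running ngc_login followed by build_with_base nvcr.io/nvidia/nemo:23.10 would attempt the build on the NeMo 23.10 image while leaving the Dockerfile untouched.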

After switching to the NeMo container (I tried nvcr.io/nvidia/nemo:23.08 and nvcr.io/nvidia/nemo:23.10), though, I hit dependency issues when running docker build with the updated Dockerfile:

Hunk #1 succeeded at 614 (offset 4 lines).
patching file strategies/ddp.py
Hunk #1 FAILED at 191.
1 out of 1 hunk FAILED -- saving rejects to file strategies/ddp.py.rej
patching file trainer/connectors/logger_connector/result.py
Hunk #1 FAILED at 502.
Hunk #2 FAILED at 512.
2 out of 2 hunks FAILED -- saving rejects to file trainer/connectors/logger_connector/result.py.rej
The command '/bin/sh -c PL_ROOT=$(python -c "import pytorch_lightning; print(pytorch_lightning.__file__.replace('/__init__.py',''))"); patch -p3 -d${PL_ROOT} -i /source/lightning.v1.9.4.patch' returned a non-zero code: 1
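One way to narrow down the failing hunks is to check which pytorch_lightning version the container ships and to dry-run the patch before applying it. The patch file name suggests it targets Lightning 1.9.4, so a different installed version is a plausible cause, but that is an assumption and not something the log confirms. The helper name check_pl_patch is hypothetical; the patch path and the -p3 strip level are taken from the failing command above.

```shell
#!/usr/bin/env bash
# Sketch only: check_pl_patch is a hypothetical helper, meant to be run
# inside the container (for example from a temporary RUN step or an
# interactive shell).

check_pl_patch() {
  patch_file="${1:-/source/lightning.v1.9.4.patch}"

  # Report the installed Lightning version and its install directory.
  pl_version=$(python -c "import pytorch_lightning; print(pytorch_lightning.__version__)") || return 1
  pl_root=$(python -c "import os, pytorch_lightning; print(os.path.dirname(pytorch_lightning.__file__))") || return 1
  echo "pytorch_lightning ${pl_version} at ${pl_root}"

  # --dry-run reports which hunks would apply without modifying any file.
  patch --dry-run -p3 -d "${pl_root}" -i "${patch_file}"
}
```

If the reported version is not 1.9.4, the rejected hunks in strategies/ddp.py and trainer/connectors/logger_connector/result.py would be consistent with the patch having been written against different sources.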

@shrutiramesh1988
Author

I'm now facing the same issue with the NeMo container. Could anyone please let me know whether this issue has been resolved?
