Unbalanced GPU memory usage #94

Michaelsqj opened this issue Sep 23, 2022 · 1 comment


@Michaelsqj

Hi! I found that GPU memory consumption is highly unbalanced between GPU0 and the rest of the GPUs. Here's the command I used to train on ImageNet at resolution 128.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py \
    --outdir=/storage/guangrun/qijia_3d_model/stylegan-xl/finetune128/ \
    --cfg=stylegan3-t \
    --data=/datasets/guangrun/qijia_3d_model/imagenet/stylegan_xl/imagenet_sub_seg128.zip \
    --gpus=8 \
    --batch=32 \
    --mirror=1 \
    --snap 10 \
    --batch-gpu 4 \
    --kimg 10000 \
    --cond True \
    --superres \
    --up_factor 2 \
    --head_layers 7 \
    --path_stem /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet64.pkl \
    --resume /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet128.pkl

As you can see, GPU0 consumes much less memory than the rest of the GPUs. May I ask what causes this imbalance, and what the normal memory consumption is when training at 128 resolution with the settings above?

[image: screenshot of per-GPU memory usage]
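
For reference, here is a minimal sketch for logging device-wide memory per GPU (roughly what nvidia-smi reports), so the imbalance can be tracked over time. It assumes the nvidia-ml-py package (import name pynvml) is installed and is not part of the StyleGAN-XL code:

import pynvml

def log_gpu_memory():
    # Query the driver directly so the numbers cover all processes on each GPU,
    # unlike torch.cuda.memory_allocated, which only sees the current process.
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            print(f"GPU{i}: {mem.used / 2**20:.0f} MiB used / {mem.total / 2**20:.0f} MiB total")
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    log_gpu_memory()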

@Michaelsqj
Author

However, when I set batch-gpu=8, gpus=8, batch=64, the GPU memory consumption went down. That seems odd; does anyone have a clue what might cause this?
[image: screenshot of per-GPU memory usage]
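
One sanity check on the arithmetic: in StyleGAN3-style training loops the global batch is usually split as batch = gpus × batch-gpu × gradient-accumulation rounds, so both settings above run a single accumulation round and differ only in the per-GPU microbatch. A rough sketch of that split (a hypothetical helper under that assumption, not code from this repo):

def accumulation_rounds(batch, gpus, batch_gpu):
    # Assumes the global batch is divided evenly across GPUs and accumulation
    # rounds, as in the StyleGAN3-style training loop this repo builds on.
    assert batch % (gpus * batch_gpu) == 0, "batch must be divisible by gpus * batch-gpu"
    return batch // (gpus * batch_gpu)

print(accumulation_rounds(32, 8, 4))  # first run:  1 round, 4 samples resident per GPU
print(accumulation_rounds(64, 8, 8))  # second run: 1 round, 8 samples resident per GPU

With no accumulation in either case, one would normally expect the batch-gpu=8 run to use more activation memory per GPU than batch-gpu=4, not less, which is why the drop looks surprising.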
