Description
This error occurs only with DDP (multi-GPU) training. With 1 GPU, training works fine.
I am training on CelebA 256x256 as specified in the README.
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/MAT/stylegan/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 75, in _wrap
    fn(i, args)
  File "/MAT/train.py", line 472, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "/MAT/training/training_loop.py", line 403, in training_loop
    misc.check_ddp_consistency(module, ignore_regex=[r'..w_avg', r'..relative_position_index', r'..avg_weight', r'..attn_mask', r'..resample_filter'])
  File "/MAT/torch_utils/misc.py", line 197, in check_ddp_consistency
    assert (nan_to_num(tensor) == nan_to_num(other)).all(), fullname
AssertionError: Generator.synthesis.dec.Dec_16x16.conv1.noise_const
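
For anyone reading the traceback, here is a rough, torch-free sketch of what the failing assertion appears to check: every parameter or buffer whose full name is not matched by one of the `ignore_regex` patterns has to be identical on all ranks. This is only my reading of the assertion on line 197, not the actual code in `torch_utils/misc.py`. The helper names `is_ignored` and `find_inconsistent`, the use of `re.fullmatch`, and the toy values are assumptions for illustration, and the patterns are copied exactly as they appear in the traceback.

```python
import re

# Patterns copied verbatim from the traceback above.
IGNORE_REGEX = [
    r'..w_avg',
    r'..relative_position_index',
    r'..avg_weight',
    r'..attn_mask',
    r'..resample_filter',
]


def is_ignored(fullname, patterns=IGNORE_REGEX):
    """Hypothetical helper: True if any pattern matches the whole name.
    Using re.fullmatch here is an assumption about the real check."""
    return any(re.fullmatch(p, fullname) for p in patterns)


def find_inconsistent(rank0_state, other_state, patterns=IGNORE_REGEX):
    """Hypothetical stand-in for the consistency check.

    Both arguments map a tensor's full name to a plain list of floats.
    The real check presumably compares each rank's tensor with a copy
    obtained from rank 0; here we simply compare two dicts.
    """
    mismatched = []
    for name, value in other_state.items():
        if is_ignored(name, patterns):
            continue  # skipped, so differences are tolerated
        if value != rank0_state[name]:
            mismatched.append(name)
    return mismatched


name = 'Generator.synthesis.dec.Dec_16x16.conv1.noise_const'

# Toy values only: the buffer differs between the two simulated ranks.
rank0 = {name: [0.1, 0.2]}
rank1 = {name: [0.1, 0.3]}

print(is_ignored(name))                 # not covered by any ignore pattern
print(find_inconsistent(rank0, rank1))  # names that differ across ranks
```

In this sketch `noise_const` is not covered by any of the ignore patterns, so a difference in that buffer between ranks is reported, which is consistent with the `AssertionError` naming that tensor.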