Fix for grid_sample_gradfix and conv2d_gradfix on pytorch 1.11 #117

Open: 2 commits to merge into base: main

@vyabor commented Apr 10, 2024

Training failed with the error below, which appears to be the result of a backwards-incompatible change in PyTorch 1.11.0, as pointed out in PyTorch issue #75018 regarding StyleGAN3.

Traceback (most recent call last):
  File "/home/ubuntu/stylegan-xl/train.py", line 336, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/stylegan-xl/train.py", line 321, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "/home/ubuntu/stylegan-xl/train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "/home/ubuntu/stylegan-xl/train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/ubuntu/stylegan-xl/training/training_loop.py", line 339, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "/home/ubuntu/stylegan-xl/training/loss.py", line 121, in accumulate_gradients
    loss_Gmain.backward()
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/function.py", line 289, in apply
    return user_fn(self, *args)
  File "/home/ubuntu/stylegan-xl/torch_utils/ops/conv2d_gradfix.py", line 144, in backward
    grad_weight = Conv2dGradWeight.apply(grad_output, input)
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/stylegan-xl/torch_utils/ops/conv2d_gradfix.py", line 173, in forward
    return torch._C._jit_get_operation(name)(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
TypeError: 'tuple' object is not callable

According to their comment on the aforementioned PyTorch issue, @jannehellsten pushed a change to StyleGAN3 that fixes this.

After applying the same changes locally to conv2d_gradfix.py and grid_sample_gradfix.py in stylegan-xl, I can confirm that the model trains without this error on my custom dataset.
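
For anyone reading the traceback: the last frame calls the return value of torch._C._jit_get_operation(name) directly, and the TypeError shows that this value is now a tuple. Below is a minimal sketch of the failure mode and one defensive way to handle it. It is plain Python with no torch import. The names resolve_op, fake_backward_op, old_style_lookup and new_style_lookup are hypothetical stand-ins, the assumption that the callable is the first element of the tuple is mine, and this is not necessarily what the StyleGAN3 change does.

```python
def resolve_op(looked_up):
    """Return a callable from an operator lookup result.

    Accepts either a bare callable or a tuple whose first element is
    the callable (ASSUMPTION: that is where the operator lives).
    """
    if isinstance(looked_up, tuple):
        looked_up = looked_up[0]
    if not callable(looked_up):
        raise TypeError(
            f"expected a callable operator, got {type(looked_up).__name__}"
        )
    return looked_up


# Hypothetical stand-ins for the two lookup behaviours; none of these
# functions are PyTorch APIs.
def fake_backward_op(*args):
    return ("grad_weight", args)


def old_style_lookup(name):
    # Returns the operator itself.
    return fake_backward_op


def new_style_lookup(name):
    # Returns a tuple; the second element is a placeholder.
    return (fake_backward_op, ["placeholder"])


name = "aten::example_backward_op"  # placeholder, not a real operator name

# Calling the tuple directly reproduces the error from the traceback.
error_message = None
try:
    new_style_lookup(name)(1, 2)
except TypeError as exc:
    error_message = str(exc)

# Going through resolve_op works for both lookup styles.
old_result = resolve_op(old_style_lookup(name))(1, 2)
new_result = resolve_op(new_style_lookup(name))(1, 2)
```

This only addresses the call failure. It does not check that the operator still exists or takes the same arguments in newer PyTorch releases, which the StyleGAN3 change may handle differently.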
