Labels
bug, optimization, strategy: ddp (DistributedDataParallel), ver: 2.0.x
Description
Bug description
Gradients do not appear to be synchronized across ranks when using manual optimization together with DDPStrategy(static_graph=True).
What version are you seeing the problem on?
v2.0.5
How to reproduce the bug
Create main.py, and run python main.py fit with two GPUs:
# main.py
from lightning.pytorch.cli import LightningCLI
from lightning.pytorch.demos.boring_classes import BoringModel
from lightning.pytorch.strategies import DDPStrategy


class MyModel(BoringModel):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx: int):
        optimizer = self.optimizers()
        optimizer.zero_grad()
        out = super().training_step(batch, batch_idx)
        loss = out['loss']
        self.manual_backward(loss)
        optimizer.step()

    def on_train_batch_end(self, *args, **kwargs):
        print(self.layer.bias.grad, f'[rank {self.global_rank}] grad')
        print(self.layer.bias, f'[rank {self.global_rank}]')


def main():
    LightningCLI(
        MyModel,
        save_config_kwargs={'overwrite': True},
        trainer_defaults={
            'strategy': DDPStrategy(static_graph=True),
            'max_steps': 1,
            'enable_progress_bar': False,
        },
        seed_everything_default=42,
    )


if __name__ == '__main__':
    main()

Error messages and logs
Running the script above produces the following output; the gradients differ between ranks, i.e. they are not synchronized:
tensor([-1.8170, -1.3621], device='cuda:0') [rank 0] grad
tensor([-1.1449, -2.1265], device='cuda:1') [rank 1] grad
Parameter containing:
tensor([0.2359, 0.0994], device='cuda:0', requires_grad=True) [rank 0]
Parameter containing:
tensor([0.1687, 0.1758], device='cuda:1', requires_grad=True) [rank 1]
When setting static_graph=False, or when using automatic optimization, the output is as follows and the gradients are synchronized:
tensor([-1.4809, -1.7443], device='cuda:0') [rank 0] grad
tensor([-1.4809, -1.7443], device='cuda:1') [rank 1] grad
Parameter containing:
tensor([0.2023, 0.1376], device='cuda:0', requires_grad=True) [rank 0]
Parameter containing:
tensor([0.2023, 0.1376], device='cuda:1', requires_grad=True) [rank 1]
Environment
PyTorch Lightning Version: 2.0.5
PyTorch Version: 2.0.1
Python version: 3.11.4
OS: Linux
How you installed Lightning (conda, pip, source): pip
More info
No response