Abstracts the optimizer so it can be used with whatever module and method a user wants. This should be backwards compatible, since the default remains `torch.optim.AdamW`. Adds `{actor_rollout_ref.actor,critic}.optim.{optimizer,optimizer_impl,override_optimizer_config}`:
```yaml
# Default
optimizer_impl: torch.optim
optimizer: AdamW
```
```yaml
# Example
optimizer_impl: torchao.optim
optimizer: _AdamW
override_optimizer_config:
  bf16_stochastic_round: true
```
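For reference, this kind of config can be resolved with a standard dynamic import. A minimal sketch, assuming a hypothetical `build_optimizer` helper and an `lr` field alongside the three keys above (not verl's exact implementation):

```python
import importlib

import torch
import torch.nn as nn


def build_optimizer(params, optim_cfg: dict) -> torch.optim.Optimizer:
    # Hypothetical helper: resolve `optimizer_impl`/`optimizer` dynamically and
    # pass `override_optimizer_config` through as extra keyword arguments.
    module = importlib.import_module(optim_cfg.get("optimizer_impl", "torch.optim"))
    optim_cls = getattr(module, optim_cfg.get("optimizer", "AdamW"))
    kwargs = {"lr": optim_cfg.get("lr", 1e-5)}  # `lr` is assumed for illustration
    kwargs.update(optim_cfg.get("override_optimizer_config") or {})
    return optim_cls(params, **kwargs)


if __name__ == "__main__":
    model = nn.Linear(8, 8)
    # Defaults reproduce the previous behaviour: torch.optim.AdamW
    opt = build_optimizer(model.parameters(), {"lr": 1e-4})
    print(type(opt))
```

With `optimizer_impl: torchao.optim` and `optimizer: _AdamW`, the same lookup returns `torchao.optim._AdamW`, and `bf16_stochastic_round: true` is forwarded to its constructor.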
**Important**: the `fsdp_sft_trainer` optimizer config is now aligned with the FSDP workers' optimizer config: `optim.warmup_steps_ratio` is renamed to `optim.lr_warmup_steps_ratio`.
Documentation for the related `optim` options:

- ``cosine``: Cosine learning rate scheduler with warmup (default).
- ``wsd``: Warmup-Stable-Decay scheduler that provides a stable learning rate phase between warmup and decay phases (a minimal sketch of this shape follows the list).
- ``override_optimizer_config``: Dictionary of additional optimizer-specific keyword arguments. For example, to use ``torchao.optim``'s ``_AdamW`` with BF16 stochastic rounding: ``{"bf16_stochastic_round": true}``
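To make the ``wsd`` shape concrete, here is a minimal sketch built on `torch.optim.lr_scheduler.LambdaLR`; the linear warmup/decay form and the step counts are assumptions for illustration, not necessarily what the trainer implements:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR


def wsd_lambda(warmup_steps: int, stable_steps: int, decay_steps: int):
    # Warmup-Stable-Decay multiplier: ramp up, hold flat, then decay to zero.
    # Linear warmup and linear decay are assumptions for this sketch.
    def fn(step: int) -> float:
        if step < warmup_steps:
            return step / max(1, warmup_steps)  # warmup phase
        if step < warmup_steps + stable_steps:
            return 1.0  # stable phase
        decay_step = step - warmup_steps - stable_steps
        return max(0.0, 1.0 - decay_step / max(1, decay_steps))  # decay phase

    return fn


if __name__ == "__main__":
    model = torch.nn.Linear(4, 4)
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    sched = LambdaLR(opt, lr_lambda=wsd_lambda(warmup_steps=100, stable_steps=800, decay_steps=100))
    for _ in range(1000):
        opt.step()
        sched.step()
```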