This repository ports most of the original KFAC repository to PyTorch.
KFAC approximates the natural gradient, i.e., it is an approximate second-order method that allows larger step sizes at the cost of additional compute per step. This should enable faster convergence.
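In a nutshell, the natural gradient preconditions the loss gradient with the inverse Fisher information matrix, and KFAC makes this tractable by approximating each layer's Fisher block with a Kronecker product of two small factors (this is the standard KFAC formulation, not code from this repository):

$$
\theta \leftarrow \theta - \eta\, F^{-1} \nabla_\theta \mathcal{L},
\qquad
F_\ell \approx A_{\ell-1} \otimes G_\ell,
$$

where $A_{\ell-1}$ is the covariance of the inputs to layer $\ell$ and $G_\ell$ is the covariance of the gradients with respect to the layer's outputs. The Kronecker structure keeps the inversion cheap, since $(A \otimes G)^{-1} = A^{-1} \otimes G^{-1}$.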
To install, run:

```bash
git clone https://github.com/n-gao/pytorch-kfac.git
cd pytorch-kfac
python setup.py install
```
An example notebook is provided in the `examples` folder.
You have to enable tracking of the forward and backward passes. This prevents the optimizer from tracking other passes, e.g., during validation.
```python
kfac = torch_kfac.KFAC(model, learning_rate=1e-3, damping=1e-3)
...
model.zero_grad()
with kfac.track_forward():
    # forward pass
    loss = ...
with kfac.track_backward():
    # backward pass
    loss.backward()
kfac.step()
# Do some validation
```
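For reference, here is a minimal, self-contained sketch of a full training loop. The model, loss, and data loader are placeholder choices; only the `torch_kfac.KFAC` calls from the snippet above are taken from this repository.

```python
import torch
import torch.nn as nn
import torch_kfac

# Toy model and loss; replace with your own. KFAC assumes mean reduction.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss(reduction='mean')
kfac = torch_kfac.KFAC(model, learning_rate=1e-3, damping=1e-3)

for inputs, targets in loader:  # `loader` is any iterable of (inputs, targets) batches
    model.zero_grad()
    with kfac.track_forward():
        # The forward pass has to run inside track_forward()
        loss = criterion(model(inputs), targets)
    with kfac.track_backward():
        # The backward pass has to run inside track_backward()
        loss.backward()
    kfac.step()
```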
This implementation supports the following features:
- Regular and Adam momentum
- Adaptive damping
- Weight decay
- Norm constraint
The following layers are currently supported:
- `nn.Linear`
- `nn.Conv1d`
- `nn.Conv2d`
- `nn.Conv3d`
Further, the code is designed to be easily extensible: custom layers can be supported by implementing the `torch_kfac.layers.Layer` interface. Unsupported layers fall back to regular gradient descent.
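If you want to add such a custom layer, the entry point is subclassing `torch_kfac.layers.Layer`. The required methods depend on that interface and are not reproduced here, so this is only a hypothetical skeleton:

```python
import torch_kfac

class MyCustomLayer(torch_kfac.layers.Layer):
    # Implement the methods required by torch_kfac.layers.Layer here,
    # e.g., tracking the layer's covariance factors and applying the
    # preconditioned update. See the built-in layers for reference.
    ...
```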
Notes, current limitations, and TODOs:

- Due to the different structure of the KFAC optimizer, implementing the `torch.optim.optimizer.Optimizer` interface could only be done through hacks, but you are welcome to add this. This implies that `torch.optim.lr_scheduler` schedulers are not available for KFAC. However, the learning rate can still be edited manually by changing the `.learning_rate` property.
- Currently the optimizer assumes that the output is reduced by `mean`; other reductions are not supported yet. So if you use, e.g., `torch.nn.CrossEntropyLoss`, make sure to set `reduction='mean'`.
- Add more examples
- Add support for shared layers (e.g., RNNs)
- Documentation
- Add support for distributed training. (This is partially possible by synchronizing the covariance matrices and gradients manually; see the sketch below this list.)
- `optimizer.zero_grad` is not available yet; you can use `model.zero_grad` instead.
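As a rough illustration of the manual synchronization mentioned in the distributed-training item above, one could average the accumulated gradients across workers before calling `kfac.step()` using the standard `torch.distributed` API. The covariance (Kronecker factor) matrices would need analogous treatment, but accessing them depends on this implementation's internals and is not shown. This is only a sketch, not a tested recipe:

```python
import torch.distributed as dist

def sync_gradients(model):
    """Average the parameter gradients of `model` across all workers."""
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

# In the training loop, after loss.backward() and before kfac.step():
#     sync_gradients(model)
# The tracked covariance matrices should be synchronized analogously.
```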
This work is based on: