Training with Rational Activations on very deep ResNets. #3

Open
23Uday opened this issue May 28, 2023 · 1 comment
23Uday commented May 28, 2023

Hi,
I am using your PyTorch implementation to train a rational ResNet-164 on CIFAR-10. While the model behaves well for ResNets with 18-38 layers, I cannot get very deep ResNets to train without dramatically lowering the learning rate.
Here is one example with --lr 1e-6 --wd 1e-5:
Train Epoch: 0 [0/47500 (0%)] Loss: 2.517
Train Epoch: 0 [1920/47500 (4%)] Loss: nan
While I understand that the model with rational activations is supposed to represent a rational function of degree 3^(number of layers), the training process for deeper models isn't clear to me.
Could you provide some help?

NBoulle (Owner) commented May 30, 2023

Thanks for your interest in our work. We haven't tried training very deep rational networks, so my intuition is limited here. It is possible that the weight initialization has a bad effect on the rational layers as the depth increases. One potential remedy would be to fine-tune a pretrained ReLU ResNet by replacing its activation functions with rationals and training only the rational functions.
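A minimal sketch of that fine-tuning setup, assuming a hypothetical `Rational` module (the class name, the (3,2) coefficient layout, and the initial values below are assumptions for illustration, not the repo's actual API):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18  # stand-in; in practice load your pretrained ReLU ResNet-164

class Rational(nn.Module):
    """Trainable type-(3,2) rational activation P(x)/Q(x).

    The coefficient initialization here is a placeholder, not the paper's
    ReLU-approximating initialization.
    """
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(torch.tensor([0.0, 1.0, 0.5, 0.02]))  # numerator, degree 3
        self.q = nn.Parameter(torch.tensor([0.0, 1.0]))             # denominator, degree 2 (constant term fixed to 1)

    def forward(self, x):
        num = self.p[0] + self.p[1] * x + self.p[2] * x ** 2 + self.p[3] * x ** 3
        den = 1.0 + self.q[0] * x + self.q[1] * x ** 2
        return num / den

def replace_relu_with_rational(module):
    """Recursively swap every nn.ReLU submodule for a fresh Rational."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, Rational())
        else:
            replace_relu_with_rational(child)

model = resnet18()  # load your pretrained ReLU weights here instead
replace_relu_with_rational(model)

# Freeze everything except the rational coefficients, so only the activations are trained.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith((".p", ".q"))

optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)
```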
I'm curious to see why the loss becomes NaN in your example. Perhaps you could plot the different rational functions (there should be approximately one function per layer) to see whether one of them becomes singular (develops a simple pole) and which layer is affected.
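One way to run that check, continuing with the hypothetical `Rational` module from the snippet above (the input range and tolerance are arbitrary choices):

```python
import matplotlib.pyplot as plt
import torch

@torch.no_grad()
def inspect_rationals(model, x_range=(-5.0, 5.0), n=1001, tol=1e-3):
    """Plot every rational activation and flag those whose denominator nearly vanishes."""
    xs = torch.linspace(x_range[0], x_range[1], n)
    for name, module in model.named_modules():
        if isinstance(module, Rational):
            den = 1.0 + module.q[0] * xs + module.q[1] * xs ** 2
            ys = module(xs)
            if den.abs().min().item() < tol or not torch.isfinite(ys).all():
                print(f"{name}: min |denominator| = {den.abs().min().item():.2e} -- possible pole")
            plt.plot(xs.numpy(), ys.numpy(), alpha=0.3)
    plt.xlabel("x")
    plt.ylabel("rational(x)")
    plt.show()

inspect_rationals(model)
```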
Finally, depending on the result of the above suggestion, there could be numerical instabilities due to the overall network representing a rational function of very large degree (3^164). One option would be to use rational functions for the first few layers (like the 18-38 layers in your experiments, to benefit from the extra approximation power) and ReLU for the rest of the network.
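A sketch of that hybrid layout, reusing `replace_relu_with_rational` from the first snippet on a torchvision-style ResNet with stages `layer1`..`layer4` (the CIFAR-10 ResNet-164 is structured differently, so treat this purely as an illustration):

```python
from torchvision.models import resnet18

hybrid = resnet18()                        # stand-in for a deep ResNet
replace_relu_with_rational(hybrid.layer1)  # rational activations in the early stages only
replace_relu_with_rational(hybrid.layer2)
# hybrid.layer3, hybrid.layer4 and the stem keep their original ReLUs,
# so the composed rational degree stays bounded while the early layers
# keep the extra approximation power.
```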
