This repository contains the code used to optimize the weights of an under-parameterized neural network. Training is done via gradient flow using MLPGradientFlow.jl (https://arxiv.org/abs/2301.10638). In this repo, we release the code to train and to visualize the results of training for the erf activation function and standard Gaussian input data (https://arxiv.org/abs/2311.01644).
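The actual training is done with MLPGradientFlow.jl; as a rough illustration of the setup only, here is a minimal NumPy sketch of gradient flow for an under-parameterized student-teacher network with erf activations and standard Gaussian inputs. All choices in it (widths, step size, forward-Euler integration of the flow, a fixed second layer, the extra `scipy` dependency) are illustrative assumptions, not the repo's code.

```python
# Minimal sketch (not the repo's method): student-teacher gradient flow with
# erf activations and standard Gaussian input data, integrated by forward Euler.
import numpy as np
from scipy.special import erf  # extra dependency of this sketch only

rng = np.random.default_rng(0)
d, k_teacher, k_student, n = 5, 4, 2, 10_000  # illustrative dims and sample size

X = rng.standard_normal((n, d))            # standard Gaussian inputs
W_t = rng.standard_normal((k_teacher, d))  # fixed teacher first layer
y = erf(X @ W_t.T).sum(axis=1)             # teacher output (second layer fixed to 1)

W = 0.1 * rng.standard_normal((k_student, d))  # student first layer (trained)

def loss_and_grad(W):
    Z = X @ W.T                            # pre-activations, shape (n, k_student)
    r = erf(Z).sum(axis=1) - y             # residuals
    loss = 0.5 * np.mean(r ** 2)
    # d erf(z)/dz = 2/sqrt(pi) * exp(-z^2)
    G = (2 / np.sqrt(np.pi)) * np.exp(-Z ** 2) * r[:, None]
    grad = G.T @ X / n                     # dL/dW, shape (k_student, d)
    return loss, grad

dt = 0.05                                  # Euler step for dW/dt = -grad L(W)
for step in range(20_000):
    loss, grad = loss_and_grad(W)
    W -= dt * grad
print(f"final loss: {loss:.6f}")
```

MLPGradientFlow.jl integrates the same flow with high-accuracy ODE solvers rather than a fixed Euler step; the sketch above only conveys the objective and the dynamics.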
The repository contains the following files:
- `README.md`: this file
- `erf50/erf_sims.jl`: simulation file
- `plot-training.py`: script to see the loss curves and gradient norms
- `plot-training-summary.py`: script to see the summary of training for all widths
- `plot-results.py`: script to visualize the weights at convergence
- `helper.py`: helper functions
To visualize the results using Python as done in this repo, you need to install:
- juliacall
- numpy
- matplotlib
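A sketch of how juliacall can pull Julia-side training results into Python for plotting is below. The file name `results.jld2`, the `JLD2` package, and the `"loss"` key are hypothetical stand-ins; adapt them to whatever `erf_sims.jl` actually writes.

```python
# Hedged sketch: load Julia-saved results into Python and plot with matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from juliacall import Main as jl

jl.seval("using JLD2")                   # assumes JLD2 is installed in the Julia env
res = jl.seval('load("results.jld2")')   # hypothetical results file
losses = np.asarray(res["loss"])         # hypothetical key holding the loss trace
plt.semilogy(losses)
plt.xlabel("time")
plt.ylabel("loss")
plt.show()
```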
We find that gradient flow converges to one of two minima, depending on the direction of initialization, when the student width is about one half of the teacher width.
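To make this concrete, the sketch below reruns the toy gradient flow from the first sketch starting from two different initialization directions and compares the final losses; distinct values would indicate convergence to distinct minima. It mirrors the finding in spirit only and is not the paper's experiment; all constants are illustrative.

```python
# Sketch: same toy flow from two initial directions; compare final losses.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(1)
d, k_t, k_s, n = 5, 4, 2, 10_000
X = rng.standard_normal((n, d))
W_t = rng.standard_normal((k_t, d))
y = erf(X @ W_t.T).sum(axis=1)

def run_flow(W, dt=0.05, steps=20_000):
    # Forward-Euler integration of dW/dt = -grad L(W), as in the sketch above.
    for _ in range(steps):
        Z = X @ W.T
        r = erf(Z).sum(axis=1) - y
        G = (2 / np.sqrt(np.pi)) * np.exp(-Z ** 2) * r[:, None]
        W = W - dt * (G.T @ X / n)
    return 0.5 * np.mean((erf(X @ W.T).sum(axis=1) - y) ** 2)

# Same initialization scale, two different directions (hypothetical seeds).
for seed in (2, 3):
    W0 = 0.1 * np.random.default_rng(seed).standard_normal((k_s, d))
    print(f"init direction {seed}: final loss {run_flow(W0):.6f}")
```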
We plot the results for Jan 15, 2024.