Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods
This is the official repository for "Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods", accepted at Transactions on Machine Learning Research (TMLR) (paper, blog).
Comparison between models trained using SGD, Adam, and RMSProp across seven benchmark datasets.
Each colored triplet denotes models trained on the same dataset. Models trained by different algorithms have similar standard generalization performance, but there is a distinct difference in robustness, as measured by test accuracy under Gaussian noise and adversarial perturbations.
To run the code, the following packages are needed:
- Python 3.9.15
- PyTorch 2.0.1
- torchvision 0.15.2
- numpy 1.22.4
Access our model checkpoints here, including PreActResNet18 models trained on CIFAR10/100 using SGD, Adam, and RMSProp. For detailed settings, refer to our paper.
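As a quick-start sketch for using these checkpoints (assuming the repo's model class is importable as `models.PreActResNet18` and that each `.pt` file stores a plain `state_dict`; both are assumptions, not verified details):

```python
import torch
from models import PreActResNet18  # assumption: model class defined in this repo

# Load the CIFAR-10/Adam checkpoint (path matches the evaluation command below);
# map to CPU so no GPU is required. If the checkpoint wraps the weights in a
# dict, extract the state_dict entry first.
model = PreActResNet18()
state = torch.load('./ckpt/cifar10-adam.pt', map_location='cpu')
model.load_state_dict(state)
model.eval()  # disable dropout / freeze batch-norm statistics for evaluation
```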
- To evaluate model robustness under Gaussian noise and under $\ell_2$ and $\ell_\infty$ AutoAttack with various $\epsilon$ values, run:
python3 main.py --eval_only --dataset cifar10 --pretrain './ckpt/cifar10-adam.pt' --j_dir './exp'
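Under the hood, the Gaussian-noise evaluation amounts to perturbing each test input with i.i.d. noise and measuring accuracy on the perturbed batch. A minimal sketch of that idea (the function name and $\sigma$ value are illustrative, not the repo's exact evaluation code):

```python
import torch

@torch.no_grad()
def accuracy_under_gaussian_noise(model, loader, sigma=0.1, device='cpu'):
    """Accuracy on inputs perturbed by i.i.d. N(0, sigma^2) noise (illustrative)."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_noisy = x + sigma * torch.randn_like(x)  # Gaussian perturbation
        correct += (model(x_noisy).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```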
- To perform standard training on PreActResNet18 with SGD for 200 epochs:
python3 main.py --method standard --dataset cifar10 --j_dir './exp' --optim sgd --epoch 200 --lr 0.2 --lr_scheduler_type multistep --weight_decay 0
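The flags above translate into a standard SGD setup with multistep learning-rate decay. A minimal sketch of that configuration (the milestone epochs are an assumption, and `model` / `train_loader` are placeholders):

```python
import torch
import torch.nn.functional as F

# SGD at lr=0.2 with no weight decay, as in the command above; the multistep
# milestones (epochs 100 and 150, 10x decay) are assumed, not taken from the repo.
optimizer = torch.optim.SGD(model.parameters(), lr=0.2, weight_decay=0)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    for x, y in train_loader:  # assumed CIFAR-10 DataLoader
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate at the assumed milestones
```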
- To train with augmented data by removing parts of the signal with low spectral energy:
python3 main.py --method remove_low_amp --threshold 90 --dataset cifar10 --j_dir './exp' --optim sgd --epoch 200 --lr 0.2 --lr_scheduler_type multistep --weight_decay 0
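One plausible reading of this augmentation, treating `--threshold 90` as a percentile cutoff on Fourier amplitudes (this interpretation is an assumption, not the repo's verified semantics):

```python
import torch

def remove_low_amp(x, threshold=90.0):
    """Zero out Fourier coefficients whose amplitude falls below a percentile (illustrative)."""
    X = torch.fft.fft2(x)                                   # per-channel 2D FFT
    amp = X.abs()
    cutoff = torch.quantile(amp.flatten(), threshold / 100.0)
    X = torch.where(amp >= cutoff, X, torch.zeros_like(X))  # drop low-energy coefficients
    return torch.fft.ifft2(X).real                          # back to pixel space
```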
- To train with augmented data by removing parts of the signal with high frequencies:
python3 main.py --method remove_high_freq --threshold 90 --dataset cifar10 --j_dir './exp' --optim sgd --epoch 200 --lr 0.2 --lr_scheduler_type multistep --weight_decay 0
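Analogously, a low-pass sketch for the high-frequency removal (again, the masking rule implied by `--threshold 90` is assumed here: keep a centered square covering roughly 90% of the shifted spectrum):

```python
import torch

def remove_high_freq(x, threshold=90.0):
    """Low-pass filter an image batch with an FFT mask (illustrative)."""
    H, W = x.shape[-2:]
    X = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))    # move DC to the center
    h, w = int(H * threshold / 100) // 2, int(W * threshold / 100) // 2
    mask = torch.zeros(H, W, dtype=torch.bool, device=x.device)
    mask[H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w] = True  # keep low frequencies
    X = torch.where(mask, X, torch.zeros_like(X))              # zero the high frequencies
    return torch.fft.ifft2(torch.fft.ifftshift(X, dim=(-2, -1))).real
```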
- Spectral Energy Visualization: Explore the spectral energy distribution of natural datasets. notebook
- Augmented Image Visualization: See how removing high-frequency or low spectral energy components affects images. notebook
- Model Robustness Evaluation: Assess model robustness under band-limited Gaussian noise. notebook
- Training Linear Models: Train linear models on a three-dimensional synthetic dataset using GD, signGD, Adam, and RMSProp (see the signGD sketch after this list). notebook
- Error Dynamics and Risks Plotting: Visualize the dynamics of the error term, standard population risk, and adversarial population risk. notebook
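As noted above, the linear-model notebook compares GD against signGD, which updates each weight using only the sign of its gradient coordinate. A minimal signGD sketch on a linear model (the dataset, ground-truth weights, and learning rate below are placeholders, not the notebook's exact values):

```python
import torch

torch.manual_seed(0)
X = torch.randn(64, 3)                   # three-dimensional synthetic inputs
y = X @ torch.tensor([1.0, -2.0, 0.5])   # hypothetical ground-truth weights
w = torch.zeros(3, requires_grad=True)

lr = 0.01
for _ in range(500):
    loss = ((X @ w - y) ** 2).mean()     # squared-error risk on the synthetic data
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad.sign()          # signGD: step uses only the gradient's sign
        w.grad.zero_()
```

Swapping the update line for `w -= lr * w.grad` recovers plain GD for comparison.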
@article{ma2023understanding,
title={Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods},
author={Avery Ma and Yangchen Pan and Amir-massoud Farahmand},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=ed8SkMdYFT},
note={Featured Certification}
}
MIT License