
DistributedTrainingExperiments

Run instructions

This section contains instructions for recreating the results reported in my final report for the EECS 598 (Systems for AI) course at the University of Michigan.

Everything is run through the train.py Python script, which takes command-line options to select the model, the dataset, and the distributed SGD algorithm.

For example, to train the MLP model on the SVHN dataset with Staggered EASGD across three nodes, run the following (one command per node):

# Node 0
python train.py -n 3 -a <HEAD NODE ADDRESS> --trainer EASGD --model deep --dataset SVHN -nr 0

# Node 1
python train.py -n 3 -a <HEAD NODE ADDRESS> --trainer EASGD --model deep --dataset SVHN -nr 1

# Node 2
python train.py -n 3 -a <HEAD NODE ADDRESS> --trainer EASGD --model deep --dataset SVHN -nr 2
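For reference, here is a minimal sketch of how these flags might be parsed inside train.py. The flag names come from the commands above; the long-option aliases, defaults, and help strings are guesses, not the script's actual code:

import argparse

parser = argparse.ArgumentParser(description="Distributed training experiments")
parser.add_argument("-n", "--nodes", type=int, required=True, help="total number of nodes")
parser.add_argument("-a", "--address", required=True, help="head node address")
parser.add_argument("-nr", "--node-rank", type=int, default=0, help="rank of this node")
parser.add_argument("--trainer", choices=["DDP", "EASGD", "EASGD_0", "COMPRESS"], default="DDP")
parser.add_argument("--model", choices=["deep", "conv", "lstm", "bert"], default="deep")
parser.add_argument("--dataset", choices=["SVHN", "WSJ"], default="SVHN")
parser.add_argument("--log", action="store_true", help="log results to TensorBoard")
args = parser.parse_args()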

The --trainer options are the following:

  • DDP : Synchronous SGD
  • EASGD : Staggered EASGD
  • EASGD_0 : Standard EASGD
  • COMPRESS : Significance Compression
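For background, EASGD (Elastic Averaging SGD, Zhang et al., 2015) couples each worker's parameters to a shared center variable through an elastic force. The sketch below shows the core per-worker update only; it deliberately omits the communication period and the staggering of communication rounds, which are the variants this project experiments with:

import torch

def easgd_update(worker_params, center_params, lr=0.01, rho=0.1):
    # One elastic step for a single worker, applied after backward()
    # has populated .grad. The worker takes an SGD step and is pulled
    # toward the center; the center is pulled toward the worker.
    with torch.no_grad():
        for x, xc in zip(worker_params, center_params):
            diff = x - xc
            x -= lr * (x.grad + rho * diff)  # worker update
            xc += lr * rho * diff            # center update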

The --dataset options are the following:

  • SVHN
  • WSJ

The --model options are the following:

  • deep : 7-layer MLP (for use with SVHN)
  • conv : 2-layer CNN (for use with SVHN)
  • lstm : 1-layer LSTM (for use with WSJ)
  • bert : Small transformer (for use with WSJ)
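For concreteness, SVHN inputs are 32x32 RGB images with 10 digit classes, so the "deep" model plausibly has a shape like the sketch below. The hidden width and activation are illustrative guesses, not the repo's actual architecture:

import torch.nn as nn

def make_deep_mlp(hidden=512, num_classes=10):
    # Seven Linear layers over flattened 32x32x3 SVHN images.
    layers = [nn.Flatten()]
    in_dim = 32 * 32 * 3
    for _ in range(6):
        layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
        in_dim = hidden
    layers.append(nn.Linear(in_dim, num_classes))
    return nn.Sequential(*layers)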

Setting the --log flag will log results to TensorBoard.
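The logging presumably follows the standard torch.utils.tensorboard pattern; a minimal sketch (the actual tags and log directory used by train.py are unknown):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # writes event files under ./runs/ by default
for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder for the real training loss
    writer.add_scalar("train/loss", loss, step)
writer.close()

Runs logged this way can be inspected with tensorboard --logdir runs.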
