Tsinghua-Future-Internet-Project

This project looks for an efficient way to train a deep neural network, the VGG19 model, on the CIFAR-10 image set, which consists of 50,000 images for training and 10,000 images for validation. The images fall into 10 classes that the model should learn to tell apart. The model is not pretrained; it is initialized with random weights. The problem splits into two parts: the first is training the model successfully, and the second is using the available hardware and network as efficiently as possible. This paper proposes a parameter-server-style solution using PyTorch and the Adam optimizer. The team was unable to implement the proposed optimal solution, but still experimented with batch size and observed speedups in training from using the hardware in the distributed cluster more efficiently.
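
To make the parameter-server idea concrete, here is a minimal pure-Python sketch. It is not the code in this repository: workers compute gradients on their own data shards, and a server averages them and applies one Adam step. The names `AdamServer`, `worker_gradient` and `train`, the toy squared-error objective, and the hyperparameter values are all assumptions chosen for illustration; the real project trains VGG19 with PyTorch.

```python
import math


class AdamServer:
    """Toy parameter server (hypothetical): holds the weights and applies Adam updates."""

    def __init__(self, weights, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.weights = list(weights)
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * len(self.weights)  # first-moment (mean) estimate
        self.v = [0.0] * len(self.weights)  # second-moment (uncentered variance) estimate
        self.step_count = 0

    def apply(self, worker_grads):
        """Average the workers' gradients, then take one bias-corrected Adam step."""
        n = len(worker_grads)
        avg = [sum(g[i] for g in worker_grads) / n for i in range(len(self.weights))]
        self.step_count += 1
        t = self.step_count
        for i, g in enumerate(avg):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** t)
            v_hat = self.v[i] / (1 - self.beta2 ** t)
            self.weights[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return self.weights


def worker_gradient(weights, shard):
    """Gradient of the mean of 0.5 * (w - x)^2 over one worker's data shard."""
    return [sum(w - x[i] for x in shard) / len(shard) for i, w in enumerate(weights)]


def train(shards, steps=2000, lr=0.01):
    """Synchronous loop: every worker reports a gradient, then the server updates."""
    server = AdamServer([0.0, 0.0], lr=lr)
    for _ in range(steps):
        grads = [worker_gradient(server.weights, shard) for shard in shards]
        server.apply(grads)
    return server.weights
```

With equally sized shards, averaging the per-worker gradients matches the gradient over the full data set, so the weights move toward the overall mean of the toy data. For example, `train([[(1.0, 2.0), (3.0, 4.0)], [(5.0, 6.0), (7.0, 8.0)]], steps=3000)` should end up close to `[4.0, 5.0]`.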

Requirements

PyTorch and Torchvision for CPU, installed from source
OpenMPI
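
The source build is presumably needed because the MPI backend of `torch.distributed` is only included when PyTorch is compiled on a machine that already has MPI installed. Whether this project relies on that backend is an assumption based on the `mpirun` launch command. A small check such as the following sketch (the function name `mpi_backend_status` is hypothetical) can confirm what a given environment supports:

```python
def mpi_backend_status():
    """Report whether this Python environment can use torch.distributed over MPI."""
    try:
        import torch.distributed as dist
    except ImportError:
        return "PyTorch is not installed"
    if not dist.is_available():
        return "PyTorch was built without torch.distributed"
    if not dist.is_mpi_available():
        return "torch.distributed is present, but the MPI backend is missing"
    return "MPI backend available"


if __name__ == "__main__":
    print(mpi_backend_status())
```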

How to run

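The command below launches five processes across the machines listed in the OpenMPI hostfile hosts. The interpreter path points at the original author's conda environment; substitute the path to your own Python.

mpirun --hostfile hosts -np 5 /home/a2019403475/.conda/envs/havtob/bin/python3 Tsinghua-Future-Internet-Project/src/main/p2p_adam.py --epochs 100 --lr 0.001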
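
The two trailing flags set the number of training epochs and the learning rate. As a rough sketch of how a script could accept them, assuming `argparse` is used (the actual parsing in `p2p_adam.py` may differ, and the defaults here simply mirror the command above):

```python
import argparse


def parse_args(argv=None):
    """Parse the training flags seen in the run command (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Distributed VGG19 training on CIFAR-10")
    parser.add_argument("--epochs", type=int, default=100,
                        help="number of passes over the training set")
    parser.add_argument("--lr", type=float, default=0.001,
                        help="learning rate passed to the Adam optimizer")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"epochs={args.epochs} lr={args.lr}")
```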
