My proposal for Implementing Essential Deep Learning Modules was selected as an official project under Google Summer of Code 2018. I worked with the organisation mlpack, under the mentorship of Marcus Edel (@zoq).
mlpack is an intuitive, fast and flexible C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "swiss army knife" for machine learning researchers. In addition to its powerful C++ interface, mlpack also provides command-line programs and Python bindings.
Over the years, Deep Learning has become a promising field of work, attracting attention from the most prominent Machine Learning researchers in the world. One of the most influential ideas in Deep Learning is the Generative Adversarial Network (GAN), introduced by Ian Goodfellow. We aimed to implement GAN, Deep Convolutional GAN (DCGAN) and Wasserstein GAN (WGAN). Some additional goals were also planned, namely the implementation of Restricted Boltzmann Machines (RBM), Spike and Slab RBM (ssRBM), Stacked GAN (StackGAN) and Deep Belief Networks (DBN).
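For readers unfamiliar with the framework, the standard GAN formulation from Goodfellow et al. (2014) pits a generator G against a discriminator D in a minimax game; this is just a recap of the published objective for context, not mlpack-specific notation:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$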
Considering that we met all of our planned goals and also started work on the additional ones, I'd say we surpassed the expectations of the project.
My first task during the official GSoC period was to implement the Transposed Convolution layer, along with a number of tests for it. Next, we added support for Layer Normalization and Atrous Convolution layers. While completing these tasks, I also uncovered a number of bugs in the convolution toolbox, which were fixed alongside. The biggest milestone of Phase I was undoubtedly the completion of Kris' pending work on the standard GAN module, which was merged within a week of the end of Phase I.
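As a rough illustration of how the Phase I layers fit into mlpack's ANN API, here is a minimal sketch. It is not code from the project itself: the layer sizes are purely illustrative, and the constructor argument order for the convolution layers should be checked against the headers of your mlpack version.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>

using namespace mlpack::ann;

int main()
{
  // A toy network mixing the layers added during Phase I.  The convolution
  // layers follow an (inSize, outSize, kernel, stride, padding, inputWidth,
  // inputHeight, [dilation]) pattern; verify against the mlpack docs.
  FFN<MeanSquaredError<>, RandomInitialization> model;

  // Atrous (dilated) convolution over a 28x28 single-channel input.
  model.Add<AtrousConvolution<>>(1, 4, 3, 3, 1, 1, 0, 0, 28, 28, 2, 2);
  model.Add<LeakyReLU<>>();

  // Layer normalization over the flattened activations (illustrative size).
  model.Add<LayerNorm<>>(4 * 24 * 24);

  // Transposed convolution with stride 2 upsamples the feature maps again.
  model.Add<TransposedConvolution<>>(4, 1, 4, 4, 2, 2, 0, 0, 24, 24);
  model.Add<SigmoidLayer<>>();

  return 0;
}
```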
A lot of heavy lifting was done during Phase II as well. We decided to use the existing GAN module as a base for DCGAN and went ahead with directly testing a DCGAN network. While we were getting really good results with the stochastic (single-sample) implementations of the above architectures, it took roughly 72 hours for GAN and roughly 90 hours for DCGAN to converge on the full MNIST dataset. We therefore decided to spend a couple of weeks optimizing our code, and as a result we added support for mini-batches in the GAN module. This brought the training time down to under 7 hours, roughly 1.5 times the speed of a TensorFlow network on the same input! Another major objective we achieved was the implementation of Wasserstein GAN, with both the weight-clipping and gradient-penalty variants (the objectives are recapped after the images below). The optimizer-separation work was also taken up during Phase II, and we were able to wrap up all our planned goals by the end of the second phase itself! Here are some images we generated on the full 70,000-image MNIST dataset:
[Generated MNIST samples: Standard GAN, DCGAN, WGAN, WGAN-GP]
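For reference, the two Wasserstein GAN variants mentioned above differ only in how the critic D is kept approximately 1-Lipschitz; this is a recap of the published formulations, not mlpack-specific notation. The critic minimizes

$$L_{\text{critic}} = \mathbb{E}_{\tilde{x} \sim p_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x \sim p_{\text{data}}}\big[D(x)\big],$$

with the weight-clipping variant constraining the critic's weights to $[-c, c]$ after every update, while the gradient-penalty variant (WGAN-GP) instead adds

$$\lambda \, \mathbb{E}_{\hat{x}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big],$$

where $\hat{x}$ is sampled uniformly along lines between real and generated samples.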
We spent most of Phase III running experiments on different datasets and implementing the long-pending RBM and Spike and Slab RBM modules. I also spent some time optimizing the ANN infrastructure, which led to a roughly 30% speedup for FFN networks and a roughly 22% speedup for RNN networks. I had a few other cool features in mind as well, but we'll probably work on them after GSoC is over.
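The FFN and RNN speedup mentioned above came largely from adding an EvaluateWithGradient() overload (see the contribution list below). The following is a minimal, self-contained sketch of the idea rather than mlpack's actual FFN code; the toy objective and class name are purely illustrative.

```cpp
// Sketch: optimizers that call Evaluate() and then Gradient() separately end
// up doing the forward pass twice per mini-batch; a combined
// EvaluateWithGradient() callback lets them share that work.
#include <armadillo>

class ToyObjective
{
 public:
  // Separate calls: each runs its own "forward pass".
  double Evaluate(const arma::mat& parameters,
                  const size_t /* begin */,
                  const size_t /* batchSize */)
  {
    return arma::accu(arma::square(parameters));
  }

  void Gradient(const arma::mat& parameters,
                const size_t /* begin */,
                arma::mat& gradient,
                const size_t /* batchSize */)
  {
    gradient = 2 * parameters; // recomputes the same quantities again
  }

  // Combined call: one pass produces both the objective value and the
  // gradient, which is what an optimizer can use when it is available.
  double EvaluateWithGradient(const arma::mat& parameters,
                              const size_t /* begin */,
                              arma::mat& gradient,
                              const size_t /* batchSize */)
  {
    gradient = 2 * parameters;
    return arma::accu(arma::square(parameters));
  }
};
```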
Add EvaluateWithGradient to FFN and RNN
Wasserstein GAN Implementation
Atrous Convolution Implementation
Layer Normalization Implementation
Transposed Convolution Implementation
Improvements and Speedups to Convolution Layer and Associated Rules
Generative Adversarial Network
Since most of our GAN and convolution toolbox is already quite optimized, and we have really competitive runtimes on a single CPU core, the next step would probably be to look into alternative methods for faster convolutional training. Some papers worth looking into (suggested by Marcus):
Deep Tensor Convolution on Multicores
MEC: Memory-efficient Convolution for Deep Neural Network
We should also explore parallelizing FFN networks, in order to compare against multi-core benchmarks. At this point, since most of the GAN variants are not significantly different from the standard implementation, I think more time should be spent on hyper-parameter search tools (such as Kris' (@kris-singh) idea of building something similar to hyperopt), while further GAN variants are given secondary priority for now.
I am extremely grateful to my mentor, Marcus, for all his support and for being extremely responsive and helpful every time I was stuck. Without his help, this project would've been a lot harder and a lot less fun to do.
mlpack was the first Machine Learning library I contributed to, and the journey has been simply amazing. I have finally come across an open-source project that aligns deeply with my personal interests and what I wish to do long-term: a library that offers the chance to implement the latest ideas from the research world, a great build architecture, and the possibility of working natively in C++.
I am also thankful to Google for the opportunity to work on this project, which helped me learn a lot in such a short period of time.