CUDA Deep Neural Networks

This is an implementation of some Deep Neural Networks (DNN). We closely followed the UFLDL Tutorial, but using C++/CUDA instead of MATLAB or Octave.

Each neural network architecture is implemented in a separate class, some of them being composed of others. We already have working versions of the following architectures:

  • Sparse autoencoder (AE)
  • Softmax regression (SM)
  • Stacked autoencoders (SAE)
  • Linear decoder (LD) (in testing)

The Math

We give here, for reference, summarized information for each architecture. We mainly give the equations that we use in our code, so refer to the UFLDL Tutorial for complete explanations. Note that our equations may not look exactly like the ones there, as we give vectorized versions that work on batches of data simultaneously. But first, some general notation:

| Symbol | Description |
| --- | --- |
| $n$ | Data input size: the dimension of the feature vectors. |
| $m$ | Data train size: how many feature vectors are used for training. |
| $X$ | Data matrix of dimensions $n \times m$. Each column is a feature vector. |
| $y$ | Label vector of dimension $m$. Element $y_i$ contains the label of feature vector $i$. |
| $\mathbf{1}_r$ | Vector of ones of dimension $r$. This is not the identity matrix $I$. |
| $\mathbf{1}_{r \times c}$ | Matrix of ones of dimensions $r \times c$. This is not the identity matrix $I$. |
| $\lambda$ | Weight decay parameter in the cost function. |
| $\alpha$ | Learning rate for gradient descent. |
| $\sigma(\cdot)$ | The sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$, applied element-wise whatever $z$ may be (real or matrix). |
| $\max(\cdot)$ | When applied to a matrix $A$, returns a vector with the maximum element of each column of $A$. |
| $\odot$ | Element-wise multiplication: the Hadamard product binary operator. |
| $\oslash$ | Element-wise division. |

All vectors are considered as column matrices.

You should notice that we try to give vectorized versions of each calculation. Sometimes we just need to sum all the elements of a matrix, but even this operation can be written in matrix form. In fact, given a matrix $A$ with dimensions $r \times c$, we have:

$\sum_{i=1}^{r} \sum_{j=1}^{c} A_{ij} = \mathbf{1}_r^T A\, \mathbf{1}_c$.

In the code this may be implemented differently, but the notation is useful.
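As an illustration (this is a hypothetical sketch, not the repository's actual helper code), the same sum-of-elements identity maps directly onto two CUBLAS calls; the function below assumes a column-major $r \times c$ matrix already on the device and omits error checking:

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Sum all elements of the r x c column-major device matrix d_A as ones_r^T * A * ones_c.
float sum_all_elements(cublasHandle_t handle, const float* d_A, int r, int c) {
    int n = std::max(r, c);
    std::vector<float> h_ones(n, 1.0f);
    float *d_ones = nullptr, *d_rowsum = nullptr;
    cudaMalloc(&d_ones, n * sizeof(float));
    cudaMalloc(&d_rowsum, r * sizeof(float));
    cudaMemcpy(d_ones, h_ones.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const float one = 1.0f, zero = 0.0f;
    // rowsum = A * ones_c  -> vector of row sums (length r)
    cublasSgemv(handle, CUBLAS_OP_N, r, c, &one, d_A, r, d_ones, 1, &zero, d_rowsum, 1);
    // total = ones_r . rowsum -> sum of every element of A
    float total = 0.0f;
    cublasSdot(handle, r, d_ones, 1, d_rowsum, 1, &total);

    cudaFree(d_ones);
    cudaFree(d_rowsum);
    return total;
}
```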

Sparse autoencoder

A sparse autoencoder is a neural network with a visible layer $L_1$, a hidden layer $L_2$ and an output layer $L_3$. Its purpose is to reproduce its input at the output as faithfully as possible. This is not trivial given that, in general, there are fewer neurons in the hidden layer than in the input layer.

We define the following:

| Symbol | Description |
| --- | --- |
| $n$ | The dimension of the input vectors (and of the output too). The size of layer $L_1$. |
| $h$ | The dimension of the hidden layer. The size of layer $L_2$. |
| $W^{(1)}$ | Weight matrix of dimensions $h \times n$. The weights between $L_1$ and $L_2$. |
| $b^{(1)}$ | Bias vector of dimension $h$. The bias of $L_1$ into $L_2$. |
| $W^{(2)}$ | Weight matrix of dimensions $n \times h$. The weights between $L_2$ and $L_3$. |
| $b^{(2)}$ | Bias vector of dimension $n$. The bias of $L_2$ into $L_3$. |
| $\rho$ | Sparsity parameter. Controls the level of sparsity. |
| $\beta$ | Weight of the sparsity penalty term in the cost function. |

  • Initialize the weights $W^{(1)}$ and $W^{(2)}$ using a random uniform distribution and the biases $b^{(1)}$ and $b^{(2)}$ to zeros.

To train the network, in each iteration we do:

  • Compute the gradients:

$Z^{(2)} = W^{(1)} X + b^{(1)} \mathbf{1}_m^T$

$A^{(2)} = \sigma(Z^{(2)})$

$Z^{(3)} = W^{(2)} A^{(2)} + b^{(2)} \mathbf{1}_m^T$

$A^{(3)} = \sigma(Z^{(3)})$

$\hat{\rho} = \frac{1}{m} A^{(2)} \mathbf{1}_m$

$\delta^{(3)} = (A^{(3)} - X) \odot A^{(3)} \odot (\mathbf{1}_{n \times m} - A^{(3)})$

$\delta^{(2)} = \left( (W^{(2)})^T \delta^{(3)} + \beta \left( (1-\rho) \oslash (\mathbf{1}_h - \hat{\rho}) - \rho \oslash \hat{\rho} \right) \mathbf{1}_m^T \right) \odot A^{(2)} \odot (\mathbf{1}_{h \times m} - A^{(2)})$

$\nabla_{W^{(1)}} J = \frac{1}{m} \delta^{(2)} X^T + \lambda W^{(1)}$

$\nabla_{b^{(1)}} J = \frac{1}{m} \delta^{(2)} \mathbf{1}_m$

$\nabla_{W^{(2)}} J = \frac{1}{m} \delta^{(3)} (A^{(2)})^T + \lambda W^{(2)}$

$\nabla_{b^{(2)}} J = \frac{1}{m} \delta^{(3)} \mathbf{1}_m$

  • Compute the cost:

$J = \frac{1}{2m}\, \mathbf{1}_n^T \big( (A^{(3)} - X) \odot (A^{(3)} - X) \big) \mathbf{1}_m + \frac{\lambda}{2} \big( \mathbf{1}_h^T (W^{(1)} \odot W^{(1)}) \mathbf{1}_n + \mathbf{1}_n^T (W^{(2)} \odot W^{(2)}) \mathbf{1}_h \big) + \beta\, \mathbf{1}_h^T \big( \rho \log ( \rho \oslash \hat{\rho} ) + (1-\rho) \log ( (1-\rho) \oslash (\mathbf{1}_h - \hat{\rho}) ) \big)$

  • Update the parameters:

$W^{(1)} := W^{(1)} - \alpha \nabla_{W^{(1)}} J$

$b^{(1)} := b^{(1)} - \alpha \nabla_{b^{(1)}} J$

$W^{(2)} := W^{(2)} - \alpha \nabla_{W^{(2)}} J$

$b^{(2)} := b^{(2)} - \alpha \nabla_{b^{(2)}} J$
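The update step maps naturally onto a single `cublasSaxpy` call per parameter. The helper below is an illustrative sketch (the repository's own wrappers in helper.cuh may look different); it applies $W := W - \alpha \nabla_W J$ to any parameter stored as a flat device array:

```cuda
#include <cublas_v2.h>

// Gradient-descent update: params <- params - alpha * grad, for n elements on the device.
void sgd_update(cublasHandle_t handle, float* d_params, const float* d_grad,
                int n, float alpha) {
    const float neg_alpha = -alpha;
    cublasSaxpy(handle, n, &neg_alpha, d_grad, 1, d_params, 1);
}
```

Calling it once for each of $W^{(1)}$, $b^{(1)}$, $W^{(2)}$ and $b^{(2)}$ completes one training iteration.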

Softmax regression

Softmax regression is a generalization of logistic regression. It is used as the final layer in many neural networks. It receives as input a dataset $X$ with labels $y$, each label $y_i$ belonging to one of a total of $k$ classes. Its purpose is, given only the data, to predict the class of each of its points.

We define the following:

| Symbol | Description |
| --- | --- |
| $n$ | The dimension of the input vectors. |
| $k$ | The number of classes. |
| $\Theta$ | Parameters matrix of dimensions $k \times n$. |
| $G$ | Ground-truth matrix of dimensions $k \times m$. Column $i$ contains a binary vector of dimension $k$: the one-hot representation of the class given by label $y_i$. |

  • Initialize $\Theta$ using a normal distribution.

To train the network, in each iteration we do:

  • Compute the gradient:

$M = \Theta X$

$M := M - \mathbf{1}_k \max(M)^T$

$P = e^{M} \oslash \left( \mathbf{1}_{k \times k}\, e^{M} \right)$

$\nabla_{\Theta} J = -\frac{1}{m} (G - P) X^T + \lambda \Theta$

  • Compute the cost:

$J = -\frac{1}{m}\, \mathbf{1}_k^T \left( G \odot \log P \right) \mathbf{1}_m + \frac{\lambda}{2}\, \mathbf{1}_k^T \left( \Theta \odot \Theta \right) \mathbf{1}_n$

  • Update the parameters:

$\Theta := \Theta - \alpha \nabla_{\Theta} J$

To make predictions, we note that the matrix $P$ holds the conditional class probabilities, so we just need to compute $P$ for the data $X$ we are predicting and take, for each column, the class with maximum probability:

$\hat{y}_i = \arg\max_{j} P_{ji}$
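A minimal sketch of that prediction step, assuming $P$ is stored column-major on the device (this kernel is illustrative, not necessarily the one in helper.cuh):

```cuda
// One thread per data point: find the row with the largest probability in column j of P (k x m).
__global__ void argmax_columns(const float* P, int k, int m, int* pred) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= m) return;
    const float* col = P + (size_t)j * k;
    int best = 0;
    float best_val = col[0];
    for (int i = 1; i < k; ++i) {
        if (col[i] > best_val) { best_val = col[i]; best = i; }
    }
    pred[j] = best;  // predicted class of example j
}
```

A launch such as `argmax_columns<<<(m + 255) / 256, 256>>>(d_P, k, m, d_pred);` fills `d_pred` with one class index per example.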

Stacked autoencoders

In this architecture, we stack autoencoders, passing the hidden-layer activation of one as the input to the next autoencoder, and so on, until a softmax layer that outputs the prediction for the data passed as input to the first autoencoder. Each autoencoder is trained with the procedure above, the next one being trained only after the previous one has finished. After this pre-training is done, we apply backpropagation to fine-tune the network as a whole.

Here we use the notation from both sparse autoencoders and softmax regression. We just have to be careful about the input of each layer and about which layer we are talking about. We will use a superscript to label each matrix/vector with the corresponding sparse autoencoder layer. For example, $W^{(l)}$ means the matrix $W$ from sparse autoencoder layer $l$, with $l = 1, \dots, n_a$, where $n_a$ is the number of autoencoder layers (and $h^{(l)}$ denotes the hidden-layer size of autoencoder $l$).

To pre-train the network:

  • Train the first autoencoder layer with $X$ as input data.
  • Train the $l$-th autoencoder layer, $l > 1$, with the hidden-layer activation $A^{(l-1)}$ of the previous autoencoder as input data.
  • Train the softmax layer with the hidden-layer activation $A^{(n_a)}$ of the last autoencoder as input data.

To fine-tune the network, in each iteration we do:

  • Compute the gradients:

$A^{(0)} = X$

$Z^{(l)} = W^{(l)} A^{(l-1)} + b^{(l)} \mathbf{1}_m^T$

$A^{(l)} = \sigma(Z^{(l)})$

$M = \Theta A^{(n_a)}$

$M := M - \mathbf{1}_k \max(M)^T$

$P = e^{M} \oslash \left( \mathbf{1}_{k \times k}\, e^{M} \right)$

$\nabla_{\Theta} J = -\frac{1}{m} (G - P) (A^{(n_a)})^T + \lambda \Theta$

$\delta^{(n_a)} = \left( \Theta^T (P - G) \right) \odot A^{(n_a)} \odot \left( \mathbf{1}_{h^{(n_a)} \times m} - A^{(n_a)} \right)$

$\delta^{(l)} = \left( (W^{(l+1)})^T \delta^{(l+1)} \right) \odot A^{(l)} \odot \left( \mathbf{1}_{h^{(l)} \times m} - A^{(l)} \right)$

$\nabla_{W^{(l)}} J = \frac{1}{m} \delta^{(l)} (A^{(l-1)})^T$

$\nabla_{b^{(l)}} J = \frac{1}{m} \delta^{(l)} \mathbf{1}_m$

  • Compute the cost:

$J = -\frac{1}{m}\, \mathbf{1}_k^T \left( G \odot \log P \right) \mathbf{1}_m + \frac{\lambda}{2}\, \mathbf{1}_k^T \left( \Theta \odot \Theta \right) \mathbf{1}_{h^{(n_a)}}$

  • Update the parameters:

$\Theta := \Theta - \alpha \nabla_{\Theta} J$

$W^{(l)} := W^{(l)} - \alpha \nabla_{W^{(l)}} J$

$b^{(l)} := b^{(l)} - \alpha \nabla_{b^{(l)}} J$

To make predictions, we compute the matrix $P$ as above, but using the data we are predicting as input to the first autoencoder, in a manner similar to plain softmax prediction.
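As a sketch of how that feedforward chain can be expressed with CUBLAS (the `Layer` struct, `sigmoid_kernel` and the double-buffering scheme here are illustrative, not the classes actually defined in the repository's headers):

```cuda
#include <cublas_v2.h>

__global__ void sigmoid_kernel(float* Z, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) Z[i] = 1.0f / (1.0f + expf(-Z[i]));
}

struct Layer { const float *W, *b; int in, out; };  // encoder weights, bias, sizes

// Feed X (in0 x m, column-major) through every autoencoder layer; returns the
// top activation, which is then handed to the softmax layer. d_buf0/d_buf1 are
// preallocated to hold the largest (out x m) activation; d_ones_m holds m ones.
const float* feedforward(cublasHandle_t handle, const Layer* layers, int n_layers,
                         const float* d_X, float* d_buf0, float* d_buf1,
                         const float* d_ones_m, int m) {
    const float one = 1.0f, zero = 0.0f;
    const float* in = d_X;
    float* out = d_buf0;
    for (int l = 0; l < n_layers; ++l) {
        // Z = W * in                                   (out x m)
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, layers[l].out, m, layers[l].in,
                    &one, layers[l].W, layers[l].out, in, layers[l].in, &zero,
                    out, layers[l].out);
        // Z += b * ones_m^T  (rank-1 update adds the bias to every column)
        cublasSger(handle, layers[l].out, m, &one, layers[l].b, 1, d_ones_m, 1,
                   out, layers[l].out);
        // A = sigmoid(Z), element-wise
        int n = layers[l].out * m;
        sigmoid_kernel<<<(n + 255) / 256, 256>>>(out, n);
        // This layer's activation is the next layer's input; alternate buffers
        // so the next GEMM never aliases its input and output.
        in = out;
        out = (out == d_buf0) ? d_buf1 : d_buf0;
    }
    return in;
}
```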

Linear Decoder

A linear decoder is just like a sparse autoencoder, but with an identity function as the activation of the output layer, instead of the sigmoid used by sparse autoencoders. In this way, linear decoders can work with input data outside the range $[0, 1]$ imposed by a sparse autoencoder.
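Concretely, following the UFLDL linear decoder notes, the only change to the gradient computation above is that the output-layer delta loses the sigmoid-derivative factor:

$\delta^{(3)} = A^{(3)} - X$

while the hidden-layer delta, the gradients and the updates stay the same.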

The Code

To code the equations of the previous section using CUDA, we used the CUBLAS library extensively. For some more specific tasks we implemented custom CUDA kernels, which can surely be optimized further. All the CUDA kernels, CUBLAS wrappers and some constants are in the header file [helper.cuh](./Visual Studio/DNN/include/helper.cuh).
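To give an idea of the kind of element-wise kernel this involves (the actual names and signatures in helper.cuh may differ), the Hadamard product $C = A \odot B$ used in the backpropagation deltas can be written as:

```cuda
// C = A .* B, element-wise, for n elements stored contiguously on the device.
__global__ void hadamard(const float* A, const float* B, float* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] * B[i];
}
```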

Besides the helper header, we have for now three other headers, each one implementing one of the above architectures. The following class diagram shows the classes we have currently implemented and their relationships:

![class](./Visual Studio/DNN/ClassDiagram.png)

We also provide a file [mnist.cu](./Visual Studio/MNIST/mnist.cu) with an example application for digit recognition using the MNIST dataset. The data is read from text files stored in column-major order; it is compressed in the file [Visual Studio/MNIST/data/data.7z](./Visual Studio/MNIST/data/data.7z) and needs to be extracted before running the program.
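A minimal sketch of how such a column-major text file can be loaded into host memory before being copied to the GPU (the actual file names and loader used in mnist.cu may differ):

```cuda
#include <fstream>
#include <vector>

// Read rows*cols whitespace-separated values; since the file is stored in
// column-major order, reading sequentially fills column 0 first, then column 1, ...
std::vector<float> load_matrix(const char* path, int rows, int cols) {
    std::vector<float> data(static_cast<size_t>(rows) * cols);
    std::ifstream in(path);
    for (size_t i = 0; i < data.size(); ++i) in >> data[i];
    return data;
}
```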

About this documentation

This markdown file, README.md, displays equations as images rendered by CodeCogs. The URLs of the images are generated from the file README by a Python script that can be found in allanino/markdown-latex. So, when updating this document, always change only the README file and regenerate README.md with that script.
