Commit 912d5bd
Add absolute paths for images and update pointer to Windows DSVM
paulshealy1 committed Sep 28, 2017
1 parent b80b567 commit 912d5bd
Showing 1 changed file with 4 additions and 4 deletions: README.md
@@ -2,7 +2,7 @@

# Goal

-In this repo we demonstrate how to build and train two different neural network architectures for *classification of text documents*. We show a simple implementation on an [NC series Data Science Virtual Machine](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu) with a [Tesla K80 GPU](http://www.nvidia.com/object/tesla-k80.html), using the [Keras API](https://keras.io) for deep learning. Keras is a front end to three of the most popular deep learning frameworks: CNTK, TensorFlow, and Theano.
+In this repo we demonstrate how to build and train two different neural network architectures for *classification of text documents*. We show a simple implementation on an [NC series Data Science Virtual Machine](https://aka.ms/dsvm/windows) with a [Tesla K80 GPU](http://www.nvidia.com/object/tesla-k80.html), using the [Keras API](https://keras.io) for deep learning. Keras is a front end to three of the most popular deep learning frameworks: CNTK, TensorFlow, and Theano.
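
Because Keras delegates computation to whichever of these backends is configured, switching between them requires no changes to model code. A minimal sketch of one way to select the backend (it can equally be set via the `backend` field of `~/.keras/keras.json`):

```python
# Minimal sketch: choose the Keras backend via the KERAS_BACKEND environment
# variable. Valid values are "tensorflow", "theano" and "cntk".
import os
os.environ["KERAS_BACKEND"] = "theano"  # must be set before the first keras import

import keras  # prints e.g. "Using Theano backend." on import
```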

# Data

@@ -21,7 +21,7 @@ The first layer in this architecture is an *embedding* layer, which maps each (o…

Using a document length of 300 words and an embedding dimensionality equal to 100, we obtain a model architecture with 761,202 trainable weights, of which the large majority resides in the embedding layer.
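
For illustration, a minimal Keras sketch of such a model follows. The vocabulary size, LSTM width and class count are placeholders not stated in this README, so the parameter total will not come out to exactly 761,202; adjust them to your data.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# DOC_LEN and EMBED_DIM come from the README; the other values are assumptions.
VOCAB_SIZE, EMBED_DIM, DOC_LEN, NUM_CLASSES = 7500, 100, 300, 2

model = Sequential()
# The bulk of the trainable weights live here: one EMBED_DIM-dimensional
# vector per vocabulary word (7,500 * 100 = 750,000 with these placeholders).
model.add(Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM, input_length=DOC_LEN))
model.add(LSTM(32))                                   # word sequence -> fixed-size document vector
model.add(Dense(NUM_CLASSES, activation="softmax"))   # document vector -> class probabilities
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```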

-![model](/images/lstm_model.png)
+![model](https://raw.githubusercontent.com/anargyri/lstm_han/master/images/lstm_model.png)


# Hierarchical Attention Network
@@ -33,11 +33,11 @@ We have implemented the Hierarchical Attention Network in Keras and Theano by adapting
[Richard Liao's implementation](https://github.com/richliao/textClassifier/blob/master/textClassifierHATT.py).
We use a sentence length of 50 words and a document length of 15 sentences. We set the embedding, context and GRU dimensionalities according to the Hierarchical Attention Network paper. We also follow other choices from that paper: we initialize the embedding with word2vec, optimize with SGD and momentum, and reorder the documents in the training batches by number of sentences. We also opt to use *l2* regularization in all layers. In this way we obtain an architecture with 942,102 trainable weights.
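
A condensed sketch of this hierarchical structure is shown below. The vocabulary size and class count are hypothetical, the attention layer is a simplified version of the one in the paper (no masking, and the *l2* regularization and word2vec initialization mentioned above are omitted for brevity), so the parameter total will not match 942,102.

```python
from keras import backend as K
from keras.engine.topology import Layer
from keras.models import Model
from keras.layers import (Input, Embedding, Bidirectional, GRU,
                          TimeDistributed, Dense)
from keras.optimizers import SGD

class Attention(Layer):
    """Soft attention over timesteps, scored against a learned context vector."""
    def build(self, input_shape):
        dim = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(dim, dim), initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(dim,), initializer="zeros")
        self.u = self.add_weight(name="u", shape=(dim,), initializer="glorot_uniform")
        super(Attention, self).build(input_shape)

    def call(self, x):
        uit = K.tanh(K.dot(x, self.W) + self.b)                        # (batch, steps, dim)
        ait = K.softmax(K.squeeze(K.dot(uit, K.expand_dims(self.u)), -1))  # attention weights
        return K.sum(x * K.expand_dims(ait), axis=1)                   # weighted average over steps

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])

# MAX_WORDS, MAX_SENTS, EMBED_DIM and GRU_DIM follow the README; VOCAB and
# NUM_CLASSES are placeholders.
MAX_WORDS, MAX_SENTS, VOCAB, EMBED_DIM, GRU_DIM, NUM_CLASSES = 50, 15, 7500, 100, 50, 2

# Word-level encoder: a sequence of words -> one sentence vector.
sent_input = Input(shape=(MAX_WORDS,), dtype="int32")
h = Embedding(VOCAB, EMBED_DIM, input_length=MAX_WORDS)(sent_input)
h = Bidirectional(GRU(GRU_DIM, return_sequences=True))(h)
sentence_encoder = Model(sent_input, Attention()(h))

# Sentence-level encoder: the sentence model is reused on every sentence
# of the document, then a second attention layer summarizes the document.
doc_input = Input(shape=(MAX_SENTS, MAX_WORDS), dtype="int32")
d = TimeDistributed(sentence_encoder)(doc_input)
d = Bidirectional(GRU(GRU_DIM, return_sequences=True))(d)
output = Dense(NUM_CLASSES, activation="softmax")(Attention()(d))

han = Model(doc_input, output)
han.compile(loss="categorical_crossentropy",
            optimizer=SGD(lr=0.01, momentum=0.9),  # SGD with momentum, as noted above
            metrics=["accuracy"])
```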

-![model](/images/hatt_model.png)
+![model](https://raw.githubusercontent.com/anargyri/lstm_han/master/images/hatt_model.png)

The second layer expands to the following model, which is applied to all the sentences:

-![sent_model](/images/hatt_model_sent.png)
+![sent_model](https://raw.githubusercontent.com/anargyri/lstm_han/master/images/hatt_model_sent.png)


# Performance
