The main source code will be made available in the zero project (this may take some time; as of 31/05/2019).
The NMT architecture used here is implemented in `deepnmt.py`.
Main experimental results are summarized below.
Model | #Params | BLEU | Train (s/batch) | Decode (ms/sent) |
---|---|---|---|---|
GNMT | - | 24.61 | - | - |
GRU | 206M | 26.28 | 2.67 | 45.35 |
ATR | 122M | 25.70 | 1.33 | 34.40 |
SRU | 170M | 25.91 | 1.34 | 42.84 |
LRN | 143M | 26.26 | 0.99 | 36.50 |
oLRN | 164M | 26.73 | 1.15 | 40.19 |
Train: time in seconds per training batch, measured over 0.2k training steps. Decode: time in milliseconds to decode one sentence, measured on the newstest2014 dataset. BLEU: case-insensitive tokenized BLEU on the WMT14 English-German translation task.
Unlike LRN, oLRN employs an additional output gate, inspired by the LSTM, to control the flow of output information. This gate also helps avoid hidden state explosion when a linear activation is applied.
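For illustration only, the sketch below (plain NumPy, not the actual implementation in `deepnmt.py`) shows the role such an LSTM-style output gate plays: the sigmoid gate, bounded in (0, 1), rescales the hidden state before it is exposed, which dampens the state growth that a purely linear (identity) recurrence can suffer. All parameter names here are hypothetical; for the exact LRN/oLRN equations, see the paper and `deepnmt.py`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def output_gated_step(x_t, h_prev, params, activation=lambda z: z):
    """One recurrent step wrapped by an LSTM-style output gate.

    Illustrative sketch only; `activation` defaults to identity (linear)
    to show why the output gate matters in that setting.
    """
    W_h, U_h, W_o, U_o = params  # hypothetical parameter names

    # Candidate hidden state; with a linear activation its magnitude
    # can keep growing across time steps.
    h_tilde = activation(x_t @ W_h + h_prev @ U_h)

    # Output gate in (0, 1): rescales what the cell exposes, dampening
    # the explosion a purely linear recurrence would otherwise allow.
    o_t = sigmoid(x_t @ W_o + h_tilde @ U_o)
    return o_t * h_tilde

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
d = 4
params = tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(4))
h = np.zeros(d)
for _ in range(3):
    h = output_gated_step(rng.normal(size=d), h, params)
print(h.shape)  # (4,)
```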
For training and evaluation, please refer to the zero project.