The main source code will be made available in the zero project (this may take some time; as of 31/05/2019).
The NMT architecture used here is implemented in `deepnmt.py`.
Main experimental results are summarized below.
Model | #Params | BLEU | Train (s/batch) | Decode (ms/sent) |
---|---|---|---|---|
GNMT | - | 24.61 | - | - |
GRU | 206M | 26.28 | 2.67 | 45.35 |
ATR | 122M | 25.70 | 1.33 | 34.40 |
SRU | 170M | 25.91 | 1.34 | 42.84 |
LRN | 143M | 26.26 | 0.99 | 36.50 |
oLRN | 164M | 26.73 | 1.15 | 40.19 |
Train: time in seconds per training batch, measured over 0.2k training steps. Decode: time in milliseconds to decode one sentence, measured on the newstest2014 dataset. BLEU: case-insensitive tokenized BLEU on the WMT14 English-German translation task.
Unlike LRN, oLRN employs an additional output gate, inspired by the LSTM, to control the flow of output information. This gate also helps avoid hidden state explosion when a linear activation is applied.
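For illustration only, the sketch below (plain NumPy, not the actual implementation in `deepnmt.py`) shows the role such an LSTM-style output gate plays: the sigmoid gate, bounded in (0, 1), rescales the hidden state before it is exposed, which dampens the state growth that a purely linear (identity) recurrence can suffer. All parameter names here are hypothetical; for the exact LRN/oLRN equations, see the paper and `deepnmt.py`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def output_gated_step(x_t, h_prev, params, activation=lambda z: z):
    """One recurrent step wrapped by an LSTM-style output gate.

    Illustrative sketch only; `activation` defaults to identity (linear)
    to show why the output gate matters in that setting.
    """
    W_h, U_h, W_o, U_o = params  # hypothetical parameter names

    # Candidate hidden state; with a linear activation its magnitude
    # can keep growing across time steps.
    h_tilde = activation(x_t @ W_h + h_prev @ U_h)

    # Output gate in (0, 1): rescales what the cell exposes, dampening
    # the explosion a purely linear recurrence would otherwise allow.
    o_t = sigmoid(x_t @ W_o + h_tilde @ U_o)
    return o_t * h_tilde

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
d = 4
params = tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(4))
h = np.zeros(d)
for _ in range(3):
    h = output_gated_step(rng.normal(size=d), h, params)
print(h.shape)  # (4,)
```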
For training and evaluation, please refer to the zero project.