We conduct experiments on the PTB and WT2 datasets, using the mixture-of-softmaxes model (MoS). The main experimental results are summarized below.
| Model | #Params | PTB Base | PTB +Finetune | PTB +Dynamic | WT2 Base | WT2 +Finetune | WT2 +Dynamic |
|---|---|---|---|---|---|---|---|
| Yang et al. (2018) | 22M | 55.97 | 54.44 | 47.69 | 63.33 | 61.45 | 40.68 |
| **This Work** | | | | | | | |
| LSTM | 22M | 63.78 | 62.12 | 53.11 | 69.78 | 68.68 | 44.60 |
| GRU | 17M | 69.09 | 67.61 | 60.21 | 73.37 | 73.05 | 49.77 |
| ATR | 9M | 66.24 | 65.86 | 58.29 | 75.36 | 73.35 | 48.65 |
| SRU | 13M | 69.64 | 65.29 | 60.97 | 85.15 | 84.97 | 57.97 |
| LRN | 11M | 61.26 | 61.00 | 54.45 | 69.91 | 68.86 | 46.97 |

Test perplexity on PTB and WT2.
Requirements: PyTorch >= 0.4.1
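To confirm the environment matches before running anything, a quick version check can be used. This is a minimal sketch; it only assumes PyTorch is importable from the same `python3` used by the commands below.

```bash
# print the installed PyTorch version; it should be >= 0.4.1
python3 -c "import torch; print(torch.__version__)"
```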
- download and preprocess the datasets
  - see MoS for the preprocessing of the PTB and WT2 datasets; a quick layout check is sketched below
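  After preprocessing, the commands below assume each data directory contains plain-text `train.txt`, `valid.txt`, and `test.txt` splits. This layout is an assumption based on the MoS data loader; adjust the paths if your copy differs.

  ```bash
  # sanity check: each directory should hold train.txt, valid.txt and test.txt (assumed layout)
  ls path-of/penn
  ls path-of/wikitext-2
  ```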
- training and evaluation
  - training
    ```bash
    #! /bin/bash
    export CUDA_VISIBLE_DEVICES=0

    # for PTB
    python3 main.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 10.0 --epoch 1000 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB --single_gpu --model lrn

    # for WT2
    python3 main.py --epochs 1000 --data path-of/wikitext-2 --save WT2 --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --nhidlast 650 --emsize 300 --batch_size 15 --lr 15.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu --model lrn
    ```
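    Training runs for many epochs, so it can help to launch it in the background and keep a log. This is only a suggested workflow, not something the scripts require; the flags are copied from the PTB command above and `train_ptb.log` is an arbitrary file name.

    ```bash
    # optional: run PTB training in the background and capture its output
    nohup python3 main.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 \
        --seed 28 --batch_size 12 --lr 10.0 --epoch 1000 --nhid 960 --nhidlast 620 --emsize 280 \
        --n_experts 15 --save PTB --single_gpu --model lrn > train_ptb.log 2>&1 &
    tail -f train_ptb.log
    ```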
  - finetuning (`PTB-XXX`/`WT2-XXX` below refer to the output directory produced by the corresponding training run)
    ```bash
    # for PTB
    python3 finetune.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 15.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PTB-XXX --single_gpu --model lrn

    # for WT2
    python3 finetune.py --epochs 1000 --data path-of/wikitext-2 --save WT2-XXX --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --emsize 300 --batch_size 15 --lr 20.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu --model lrn
    ```
  - dynamic evaluation
    ```bash
    # for PTB
    python3 dynamiceval.py --model PTB-XXX/finetune_model.pt --data path-of/penn --lamb 0.075 --gpu 0

    # for WT2
    python3 dynamiceval.py --data path-of/wikitext-2 --model WT2-XXX/finetune_model.pt --epsilon 0.002 --gpu 0
    ```
  - general evaluation
    ```bash
    # for PTB
    python3 evaluate.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 10.0 --epoch 1000 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB-XXX --single_gpu --model lrn

    # for WT2
    python3 evaluate.py --epochs 1000 --data path-of/wikitext-2 --save WT2-XXX --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --nhidlast 650 --emsize 300 --batch_size 15 --lr 15.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu --model lrn
    ```
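For reference, here is how the PTB steps chain together end to end. This is only a sketch that re-uses the commands above; replace `PTB-XXX` with the actual output directory created by the training run.

```bash
#! /bin/bash
# end-to-end PTB pipeline (sketch): train, then finetune, then dynamic evaluation
export CUDA_VISIBLE_DEVICES=0

python3 main.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 \
    --batch_size 12 --lr 10.0 --epoch 1000 --nhid 960 --nhidlast 620 --emsize 280 \
    --n_experts 15 --save PTB --single_gpu --model lrn

# replace PTB-XXX with the directory written by the training step
python3 finetune.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 \
    --batch_size 12 --lr 15.0 --epoch 1000 --nhid 960 --emsize 280 \
    --n_experts 15 --save PTB-XXX --single_gpu --model lrn

python3 dynamiceval.py --model PTB-XXX/finetune_model.pt --data path-of/penn --lamb 0.075 --gpu 0
```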
The source code structure is adapted from MoS.