Language Modeling

We run experiments on the PTB and WT2 datasets, using the mixture-of-softmaxes (MoS) model. The main experimental results are summarized below.

| Model | #Params | PTB Base | PTB +Finetune | PTB +Dynamic | WT2 Base | WT2 +Finetune | WT2 +Dynamic |
|-------|---------|----------|---------------|--------------|----------|---------------|--------------|
| Yang et al. (2018) | 22M | 55.97 | 54.44 | 47.69 | 63.33 | 61.45 | 40.68 |
| This work: LSTM | 22M | 63.78 | 62.12 | 53.11 | 69.78 | 68.68 | 44.60 |
| This work: GRU | 17M | 69.09 | 67.61 | 60.21 | 73.37 | 73.05 | 49.77 |
| This work: ATR | 9M | 66.24 | 65.86 | 58.29 | 75.36 | 73.35 | 48.65 |
| This work: SRU | 13M | 69.64 | 65.29 | 60.97 | 85.15 | 84.97 | 57.97 |
| This work: LRN | 11M | 61.26 | 61.00 | 54.45 | 69.91 | 68.86 | 46.97 |

Test perplexity (lower is better). Base: the trained model; +Finetune: after finetuning; +Dynamic: after dynamic evaluation.
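
For context, MoS (Yang et al., 2018) replaces the single output softmax with a weighted mixture of K softmaxes; the --n_experts flag in the commands below sets K (15 here). A sketch of the formulation in LaTeX, where h_{c,k} is the k-th context vector for context c, pi_{c,k} its mixture weight, and e_w the output embedding of word w:

    P(w \mid c) = \sum_{k=1}^{K} \pi_{c,k}\,
      \frac{\exp\!\big(\mathbf{h}_{c,k}^{\top}\mathbf{e}_{w}\big)}
           {\sum_{w'} \exp\!\big(\mathbf{h}_{c,k}^{\top}\mathbf{e}_{w'}\big)},
    \qquad \sum_{k=1}^{K} \pi_{c,k} = 1.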

Requirements

PyTorch >= 0.4.1
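
A quick way to verify the installed version (a convenience check, not part of the original instructions):

    python3 -c "import torch; print(torch.__version__)"   # expect 0.4.1 or newer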

How to Run?

  • download and preprocess the datasets

    • see the MoS repository for dataset preprocessing (a minimal download sketch is below)
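
    A minimal sketch, mirroring the getdata.sh script of the AWD-LSTM codebase that MoS builds on (URLs and file layout are assumptions here; verify against the MoS repository before use):

    # Penn Treebank: fetch, unpack, and rename to the expected train/valid/test layout
    wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
    tar -xzf simple-examples.tgz
    mkdir -p penn
    cp simple-examples/data/ptb.train.txt penn/train.txt
    cp simple-examples/data/ptb.valid.txt penn/valid.txt
    cp simple-examples/data/ptb.test.txt penn/test.txt

    # WikiText-2: fetch, unpack, and rename the token files
    wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
    unzip -q wikitext-2-v1.zip
    mv wikitext-2/wiki.train.tokens wikitext-2/train.txt
    mv wikitext-2/wiki.valid.tokens wikitext-2/valid.txt
    mv wikitext-2/wiki.test.tokens wikitext-2/test.txt
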
  • training and evaluation

    • training
    #! /bin/bash
    
    export CUDA_VISIBLE_DEVICES=0
    
    # for PTB
    python3 main.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 10.0 --epochs 1000 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB --single_gpu --model lrn
    # for WT2
    python3 main.py --epochs 1000 --data path-of/wikitext-2 --save WT2 --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --nhidlast 650 --emsize 300 --batch_size 15 --lr 15.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu --model lrn  
    
    • finetuning
    # for PTB
    python3 finetune.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 15.0 --epochs 1000 --nhid 960 --emsize 280 --n_experts 15 --save PTB-XXX --single_gpu --model lrn
    # for WT2
    python3 finetune.py --epochs 1000 --data path-of/wikitext-2 --save WT2-XXX --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --emsize 300 --batch_size 15 --lr 20.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu --model lrn
    
    • dynamic evaluation
    # for PTB
    python3 dynamiceval.py --model PTB-XXX/finetune_model.pt --data path-of/penn --lamb 0.075 --gpu 0
    # for WT2
    python3 dynamiceval.py --data path-of/wikitext-2 --model WT2-XXX/finetune_model.pt --epsilon 0.002 --gpu 0
    
    • general evaluation
    # for PTB
    python3 evaluate.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 10.0 --epochs 1000 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB-XXX --single_gpu --model lrn
    # for WT2
    python3 evaluate.py --epochs 1000 --data path-of/wikitext-2 --save WT2-XXX --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --nhidlast 650 --emsize 300 --batch_size 15 --lr 15.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu --model lrn
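
Putting the PTB steps together, a minimal end-to-end sketch. It assumes, as in the MoS codebase this is adapted from, that main.py appends a timestamp to the --save name (which is what the PTB-XXX placeholders above stand for); the SAVE lookup is a convenience of this sketch, not part of the original scripts:

    #! /bin/bash
    set -e
    export CUDA_VISIBLE_DEVICES=0

    DATA=path-of/penn   # adjust to your PTB location

    # 1) train from scratch; the run directory becomes PTB-<timestamp>/
    python3 main.py --data $DATA --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 \
      --seed 28 --batch_size 12 --lr 10.0 --epochs 1000 --nhid 960 --nhidlast 620 \
      --emsize 280 --n_experts 15 --save PTB --single_gpu --model lrn

    # pick the most recent run directory (assumed naming, see above)
    SAVE=$(ls -dt PTB-*/ | head -n 1)
    SAVE=${SAVE%/}   # strip trailing slash

    # 2) finetune the trained model inside the same run directory
    python3 finetune.py --data $DATA --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 \
      --seed 28 --batch_size 12 --lr 15.0 --epochs 1000 --nhid 960 \
      --emsize 280 --n_experts 15 --save $SAVE --single_gpu --model lrn

    # 3) dynamic evaluation of the finetuned model
    python3 dynamiceval.py --model $SAVE/finetune_model.pt --data $DATA --lamb 0.075 --gpu 0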
    

Credits

The source code structure is adapted from MoS.