Language Modeling

We run experiments on the PTB and WT2 datasets, using the mixture-of-softmaxes (MoS) model. The main experimental results are summarized below.

| Model | #Params | PTB Base | PTB +Finetune | PTB +Dynamic | WT2 Base | WT2 +Finetune | WT2 +Dynamic |
|-------|---------|----------|---------------|--------------|----------|---------------|--------------|
| Yang et al. (2018) | 22M | 55.97 | 54.44 | 47.69 | 63.33 | 61.45 | 40.68 |
| This work: LSTM | 22M | 63.78 | 62.12 | 53.11 | 69.78 | 68.68 | 44.60 |
| This work: GRU | 17M | 69.09 | 67.61 | 60.21 | 73.37 | 73.05 | 49.77 |
| This work: ATR | 9M | 66.24 | 65.86 | 58.29 | 75.36 | 73.35 | 48.65 |
| This work: SRU | 13M | 69.64 | 65.29 | 60.97 | 85.15 | 84.97 | 57.97 |
| This work: LRN | 11M | 61.26 | 61.00 | 54.45 | 69.91 | 68.86 | 46.97 |

Test perplexity (lower is better). Base: the trained model; +Finetune: after finetuning; +Dynamic: after dynamic evaluation.
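
For context, MoS (Yang et al., 2018) replaces the single output softmax with a weighted mixture of K softmaxes; the --n_experts flag in the commands below sets K (15 here). A sketch of the formulation in LaTeX, where h_{c,k} is the k-th context vector for context c, pi_{c,k} its mixture weight, and e_w the output embedding of word w:

    P(w \mid c) = \sum_{k=1}^{K} \pi_{c,k}\,
      \frac{\exp\!\big(\mathbf{h}_{c,k}^{\top}\mathbf{e}_{w}\big)}
           {\sum_{w'} \exp\!\big(\mathbf{h}_{c,k}^{\top}\mathbf{e}_{w'}\big)},
    \qquad \sum_{k=1}^{K} \pi_{c,k} = 1.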

Requirements

PyTorch >= 0.4.1
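
A quick way to verify the installed version (a convenience check, not part of the original instructions):

    python3 -c "import torch; print(torch.__version__)"   # expect 0.4.1 or newer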

How to Run?

  • download and preprocess the datasets

    • see the MoS repository for dataset preprocessing (a minimal download sketch is below)
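
    A minimal sketch, mirroring the getdata.sh script of the AWD-LSTM codebase that MoS builds on (URLs and file layout are assumptions here; verify against the MoS repository before use):

    # Penn Treebank: fetch, unpack, and rename to the expected train/valid/test layout
    wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
    tar -xzf simple-examples.tgz
    mkdir -p penn
    cp simple-examples/data/ptb.train.txt penn/train.txt
    cp simple-examples/data/ptb.valid.txt penn/valid.txt
    cp simple-examples/data/ptb.test.txt penn/test.txt

    # WikiText-2: fetch, unpack, and rename the token files
    wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
    unzip -q wikitext-2-v1.zip
    mv wikitext-2/wiki.train.tokens wikitext-2/train.txt
    mv wikitext-2/wiki.valid.tokens wikitext-2/valid.txt
    mv wikitext-2/wiki.test.tokens wikitext-2/test.txt
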
  • training and evaluation

    • training
    #! /bin/bash
    
    export CUDA_VISIBLE_DEVICES=0
    
    # for PTB
    python3 main.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 10.0 --epochs 1000 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB --single_gpu --model lrn
    # for WT2
    python3 main.py --epochs 1000 --data path-of/wikitext-2 --save WT2 --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --nhidlast 650 --emsize 300 --batch_size 15 --lr 15.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu --model lrn  
    
    • finetuning
    # for PTB
    python3 finetune.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 15.0 --epochs 1000 --nhid 960 --emsize 280 --n_experts 15 --save PTB-XXX --single_gpu --model lrn
    # for WT2
    python3 finetune.py --epochs 1000 --data path-of/wikitext-2 --save WT2-XXX --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --emsize 300 --batch_size 15 --lr 20.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu --model lrn
    
    • dynamic evaluation
    # for PTB
    python3 dynamiceval.py --model PTB-XXX/finetune_model.pt --data path-of/penn --lamb 0.075 --gpu 0
    # for WT2
    python3 dynamiceval.py --data path-of/wikitext-2 --model WT2-XXX/finetune_model.pt --epsilon 0.002 --gpu 0
    
    • general evaluation
    # for PTB
    python3 evaluate.py --data path-of/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 10.0 --epochs 1000 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB-XXX --single_gpu --model lrn
    # for WT2
    python3 evaluate.py --epochs 1000 --data path-of/wikitext-2 --save WT2-XXX --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --nhidlast 650 --emsize 300 --batch_size 15 --lr 15.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu --model lrn
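
Putting the PTB steps together, a minimal end-to-end sketch. It assumes, as in the MoS codebase this is adapted from, that main.py appends a timestamp to the --save name (which is what the PTB-XXX placeholders above stand for); the SAVE lookup is a convenience of this sketch, not part of the original scripts:

    #! /bin/bash
    set -e
    export CUDA_VISIBLE_DEVICES=0

    DATA=path-of/penn   # adjust to your PTB location

    # 1) train from scratch; the run directory becomes PTB-<timestamp>/
    python3 main.py --data $DATA --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 \
      --seed 28 --batch_size 12 --lr 10.0 --epochs 1000 --nhid 960 --nhidlast 620 \
      --emsize 280 --n_experts 15 --save PTB --single_gpu --model lrn

    # pick the most recent run directory (assumed naming, see above)
    SAVE=$(ls -dt PTB-*/ | head -n 1)
    SAVE=${SAVE%/}   # strip trailing slash

    # 2) finetune the trained model inside the same run directory
    python3 finetune.py --data $DATA --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 \
      --seed 28 --batch_size 12 --lr 15.0 --epochs 1000 --nhid 960 \
      --emsize 280 --n_experts 15 --save $SAVE --single_gpu --model lrn

    # 3) dynamic evaluation of the finetuned model
    python3 dynamiceval.py --model $SAVE/finetune_model.pt --data $DATA --lamb 0.075 --gpu 0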
    

Credits

The source code structure is adapted from MoS.