We use SQuAD v1.1 for our experiments and adopt the R-Net model. The main experimental results are summarized below.
Model | #Params | Base | +ELMo |
---|---|---|---|
R-Net | - | 71.1/79.5 | -/- |
LSTM | 2.67M | 70.46/78.98 | 75.17/82.79 |
GRU | 2.31M | 70.41/79.15 | 75.81/83.12 |
ATR | 1.59M | 69.73/78.70 | 75.06/82.76 |
SRU | 2.44M | 69.27/78.41 | 74.56/82.50 |
LRN | 2.14M | 70.11/78.83 | 76.14/83.83 |
Scores are reported as Exact Match (EM)/F1.
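To make the two numbers in each cell concrete, here is a minimal Python sketch of the standard SQuAD v1.1 metrics. Answers are normalized (lowercased, punctuation and the articles a/an/the removed, whitespace collapsed), EM checks exact string equality, F1 measures token overlap, and each prediction is scored against its best-matching gold answer, with totals reported as percentages. This is our understanding of the logic in `evaluate-v1.1.py`, not the official script itself; the function names and the dict-based inputs are illustrative assumptions, and reported numbers should still come from the official script.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1(prediction: str, ground_truth: str) -> float:
    # Token-level F1 over the bag of normalized tokens.
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions: dict, gold: dict) -> tuple:
    """predictions: question id -> answer string (hypothetical format).
    gold: question id -> list of acceptable answer strings.
    Returns (EM, F1) as percentages; missing predictions score 0."""
    em_total = f1_total = 0.0
    for qid, answers in gold.items():
        pred = predictions.get(qid, "")
        em_total += max(exact_match(pred, a) for a in answers)
        f1_total += max(f1(pred, a) for a in answers)
    n = len(gold)
    return 100.0 * em_total / n, 100.0 * f1_total / n
```

Taking the maximum over gold answers matters because SQuAD v1.1 dev questions have several human-provided answers, and a prediction only needs to match one of them well.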
Requirements: tensorflow >= 1.8.1
- Download and preprocess the datasets.
  - See R-Net for how the datasets are preprocessed.
  - You need the following resources: SQuAD v1.1, GloVe, and ELMo. Convert the raw datasets into the required data format.
- No hyperparameters are tuned; we keep all of them at their default values.
- Training and evaluation
  - Please see the `train_lrn.sh` and `test_lrn.sh` scripts in `rnet` (Base) and `elmo_rnet` (Base+ELMo).
  - To report the final EM/F1 scores, we used the `evaluate-v1.1.py` script.
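Preprocessing starts from the raw SQuAD v1.1 JSON files. The exact format the R-Net pipeline expects is not described here, so purely as a hedged illustration, the Python sketch below reads the standard SQuAD v1.1 layout (`data`, then `paragraphs`, then `qas`, each with `answers` carrying a character-level `answer_start`) and flattens it into one record per question. `SquadExample`, `iter_squad_v11`, and `load_squad_v11` are hypothetical names; tokenization, GloVe/ELMo lookup, and conversion into R-Net's own format are not covered.

```python
import json
from typing import Iterator, List, NamedTuple


class SquadExample(NamedTuple):
    qid: str
    context: str
    question: str
    answer_texts: List[str]
    answer_starts: List[int]  # character offsets into `context`


def iter_squad_v11(data: dict) -> Iterator[SquadExample]:
    """Flatten the nested SQuAD v1.1 JSON (data -> paragraphs -> qas)."""
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa["answers"]
                yield SquadExample(
                    qid=qa["id"],
                    context=context,
                    question=qa["question"],
                    answer_texts=[a["text"] for a in answers],
                    answer_starts=[a["answer_start"] for a in answers],
                )


def spans_are_consistent(ex: SquadExample) -> bool:
    # Sanity check: each answer text should appear at its stated offset.
    return all(
        ex.context[start:start + len(text)] == text
        for text, start in zip(ex.answer_texts, ex.answer_starts)
    )


def load_squad_v11(path: str) -> List[SquadExample]:
    with open(path, encoding="utf-8") as f:
        return list(iter_squad_v11(json.load(f)))
```

For example, `load_squad_v11("train-v1.1.json")` would return one `SquadExample` per training question; checking `spans_are_consistent` before tokenizing helps catch offset errors early.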
The source code structure is adapted from R-Net.