This repository contains our implementations of several machine learning models on the SQuAD dataset, along with the state-of-the-art BERT model. A pre-print describes the comparative analysis on SQuAD in detail.
Link to pre-print: https://arxiv.org/pdf/2005.11313.pdf
Note: The ML_final file contains all the models we used. The PCA and Regression files contain the individual implementations of PCA and regression.
To run the final file (ML_final.ipynb):
- Run the first cell for importing all the libraries.
- Run the cells after the "start here" section to avoid repeating the preprocessing, which takes 3-4 hours because the embedding files are quite large.
- (Optional) To run the sentiment analysis code, obtain the VADER lexicon by some means other than the nltk.download('vader_lexicon') command. Try wget if possible.
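
One possible way to fetch the lexicon without nltk.download is sketched below. It is a minimal example, not part of the notebook: the helper name fetch_vader_lexicon is hypothetical, and the URL assumes that the NLTK data mirror on GitHub still serves vader_lexicon.zip at that path. The file is placed under a sentiment folder inside an NLTK data directory, which is where NLTK looks for it by default.

```python
import os
import urllib.request

# Assumption: the nltk_data GitHub mirror hosts the lexicon at this path.
VADER_URL = (
    "https://raw.githubusercontent.com/nltk/nltk_data/"
    "gh-pages/packages/sentiment/vader_lexicon.zip"
)


def fetch_vader_lexicon(nltk_data_dir, url=VADER_URL):
    """Hypothetical helper: put vader_lexicon.zip in <nltk_data_dir>/sentiment.

    The download is skipped when the file is already present.
    """
    target_dir = os.path.join(nltk_data_dir, "sentiment")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "vader_lexicon.zip")
    if not os.path.exists(target):
        # Plain HTTP download, equivalent to running wget on the same URL.
        urllib.request.urlretrieve(url, target)
    return target
```

Calling fetch_vader_lexicon(os.path.expanduser("~/nltk_data")) before the sentiment analysis cells should make the lexicon visible to NLTK, since ~/nltk_data is on its default search path.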