Solution to a named-entity recognition (NER) problem, using GloVe word embeddings with a BiLSTM model implemented in PyTorch.
The main file is solution.ipynb in the root folder.
An HTML version of this notebook can be found in the "html" folder.
All of the supporting code is extracted into logically separated .py files inside the "scripts" folder.
- Implement functionality to read and process NER 2003 English Shared Task data in CoNLL file format; the data will be provided (10% of score).
The needed functionality can be found in scripts/util.py.
- Implement 3 strategies for loading the embeddings.
The needed functionality is located in scripts/embedding_fabric.py.
- Implement training on batches.
The function for batching is in scripts/utils.py. The logic for training in batches is implemented in scripts/training_model.py.
- Implement the calculation of token-level Precision / Recall / F1 / F0.5 scores, averaged over all classes.
The implementation is in scripts/metrics.py.
- Report the performance (F1 and F0.5 scores) on the dev / test subsets w.r.t. epoch number during training for the first 5 epochs, for each strategy of loading the embeddings.
The experiment and its results can be reproduced by running solution.ipynb. I did not follow the instructions strictly: for each model and epoch I validated the results on the dev set, but performance on the test subset was measured only once, after training finished.
Sorry about that, I noticed this requirement too late.
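
For readers who want a quick picture of the token-level metrics named in the task list, here is a minimal sketch in plain Python. It is not the contents of scripts/metrics.py: the function names `fbeta` and `macro_token_scores` are hypothetical. It assumes "all classes in average" means a macro average over every tag that appears in the gold or predicted sequence, including "O".

```python
from collections import Counter


def fbeta(precision, recall, beta):
    """F-beta score; returns 0.0 when precision and recall are both zero."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


def macro_token_scores(gold, pred):
    """Macro-averaged token-level precision, recall, F1 and F0.5.

    gold, pred: equal-length flat lists of tag strings, one per token.
    Assumption: every tag seen in gold or pred counts as a class.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot score empty sequences")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p where it was wrong
            fn[g] += 1  # missed the gold tag g

    classes = sorted(set(gold) | set(pred))
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0, "f0.5": 0.0}
    for c in classes:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        totals["precision"] += precision
        totals["recall"] += recall
        totals["f1"] += fbeta(precision, recall, 1.0)
        totals["f0.5"] += fbeta(precision, recall, 0.5)

    # Macro average: each class contributes equally, regardless of frequency.
    return {name: value / len(classes) for name, value in totals.items()}


if __name__ == "__main__":
    gold = ["O", "B-PER", "O", "B-LOC"]
    pred = ["O", "B-PER", "B-LOC", "B-LOC"]
    print(macro_token_scores(gold, pred))
```

F0.5 weights precision more heavily than recall, so a model that over-predicts entity tags is penalized more under F0.5 than under F1.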