A General Purpose Tagger for POS Tagging, NER Tagging, and Chunking.

- Prepare WSJ data for part-of-speech tagging:
  a. Convert to CoNLL format: https://gist.github.com/khlmnn/3cc07407a002bb1773cd
  b. Map XPOSTAG to UPOSTAG before training, using convert.py.
- Install SyntaxNet: https://github.com/tensorflow/models/tree/a9133ae914b44602c5f26afbbd7dd794ff9c6637/syntaxnet
- Train and test the model using taggerTrain.sh, taggerTest.sh, and tagger.pbtxt.

cd PATH_TO_TAGGER/src/feedforward_model
Training: python tagger_trainer.py
Evaluating: python tagger_eval.py
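
The XPOSTAG-to-UPOSTAG mapping in the data preparation step can be sketched as follows. This is a minimal illustration, not the contents of convert.py: the tag table is a partial, assumed Penn Treebank to universal tag mapping, the column positions assume the 10-column CoNLL-U layout (UPOSTAG in column 4, XPOSTAG in column 5), and the names `PTB_TO_UPOS`, `map_conll_line`, and `map_conll_text` are hypothetical.

```python
# Partial, illustrative Penn Treebank -> universal POS table.
# This is an assumption for the sketch, not the table used by convert.py.
PTB_TO_UPOS = {
    "NN": "NOUN", "NNS": "NOUN",
    "NNP": "PROPN", "NNPS": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "DT": "DET", "IN": "ADP", "CD": "NUM", "UH": "INTJ",
    "PRP": "PRON", "PRP$": "PRON",
    ".": "PUNCT", ",": "PUNCT", ":": "PUNCT",
}

# Zero-based column indices, assuming the CoNLL-U column order.
UPOSTAG_COL = 3
XPOSTAG_COL = 4


def map_conll_line(line, fallback="X"):
    """Fill the UPOSTAG column of one CoNLL line from its XPOSTAG column.

    Blank lines (sentence breaks) and comment lines are returned unchanged.
    Tags missing from the table are mapped to `fallback`.
    """
    stripped = line.rstrip("\n")
    if not stripped or stripped.startswith("#"):
        return stripped
    cols = stripped.split("\t")
    cols[UPOSTAG_COL] = PTB_TO_UPOS.get(cols[XPOSTAG_COL], fallback)
    return "\t".join(cols)


def map_conll_text(text):
    """Apply map_conll_line to every line of a CoNLL-formatted string."""
    return "\n".join(map_conll_line(line) for line in text.split("\n"))


sample = (
    "1\tStocks\t_\t_\tNNS\t_\t_\t_\t_\t_\n"
    "2\trose\t_\t_\tVBD\t_\t_\t_\t_\t_\n"
    "3\t.\t_\t_\t.\t_\t_\t_\t_\t_\n"
)
mapped = map_conll_text(sample)
```

A real conversion would need the complete tag table; the fallback tag `X` only keeps the sketch from failing on tags it does not list.
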
Reference: https://github.com/guillaumegenthial/sequence_tagging
Build data: python build_data.py
Run model: python main_ff.py (or python main.py)

Results:

Model | POS | NER | Chunk |
---|---|---|---|
Feedforward (word) | 95.89 | - | - |
Feedforward (history and spelling features) | 97.31 | - | - |
Bi-LSTM (word) | 95.88 | 78.66 | - |
Bi-LSTM (Character Embedding) | 97.08 | 78.87 | - |
Bi-LSTM-CRF (Character Embedding) | 97.34 | - | - |

Speed:

Model | POS | NER | Chunk |
---|---|---|---|
Feedforward Model (word feature only) | 11000/s | - | - |
Feedforward Model (history and spelling features) | ~8000/s | - | - |
Bi-LSTM-CRF (word feature only) | ~2000/s | - | - |
Bi-LSTM-CRF (Character Embedding) | ~1500/s | - | - |
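
Throughput figures like those in the table above can be estimated with a small timing harness. The sketch below is only an illustration: the table does not say whether the unit is tokens or sentences per second, so tokens per second is assumed here, and `tokens_per_second` and `dummy_tagger` are hypothetical names standing in for a real model's tagging call.

```python
import time


def tokens_per_second(tag_fn, sentences):
    """Time tag_fn over all sentences and return tokens processed per second.

    `sentences` is a list of token lists; `tag_fn` takes one token list
    and returns one tag per token.
    """
    n_tokens = sum(len(sentence) for sentence in sentences)
    start = time.perf_counter()
    for sentence in sentences:
        tag_fn(sentence)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return n_tokens / elapsed if elapsed > 0 else float("inf")


def dummy_tagger(tokens):
    """Placeholder tagger (an assumption): labels every token as NOUN."""
    return ["NOUN"] * len(tokens)


sentences = [["Stocks", "rose", "."]] * 1000
rate = tokens_per_second(dummy_tagger, sentences)
print(f"{rate:.0f} tokens/s")
```

Measured rates depend heavily on hardware, batch size, and whether data loading is included in the timed region, so numbers from a harness like this are comparable only under the same setup.
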