AttnConvnet at SemEval-2018 Task 1 : Attention-based Convolutional Neural Networks for Multi-label Emotion Classification
TensorFlow implementation of AttnConvnet
Warning: this is an unpolished version of the source code.
The source code will be cleaned up and restructured as soon as possible.
Model
- Embedding
  - Pre-trained GloVe embeddings
  - Randomly initialized embeddings
- Multi-head Dot-product Attention
  - Using partial code from Google's Transformer v1.0.8
- 1-layer Convolutional Neural Network
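The three components listed under Model can be sketched as a single forward pass. This is a minimal NumPy illustration, not the repository's TensorFlow code: the function names, the parameter dictionary keys, the max-over-time pooling, and the per-label sigmoid output are all assumptions made for the sketch, and residual connections, normalization, and dropout are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention over x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(np.matmul(x, wq))
    k = split_heads(np.matmul(x, wk))
    v = split_heads(np.matmul(x, wv))
    # Attention weights per head: (num_heads, seq_len, seq_len)
    scores = np.matmul(q, k.transpose(0, 2, 1)) / np.sqrt(d_head)
    context = np.matmul(softmax(scores), v)  # (num_heads, seq_len, d_head)
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return np.matmul(merged, wo)


def conv1d_max_pool(x, filters, bias):
    """One 1-D convolution (valid padding, ReLU) plus max-over-time pooling.

    x: (seq_len, d_model); filters: (width, d_model, num_filters).
    The pooling choice is an assumption of this sketch.
    """
    width = filters.shape[0]
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    feature_maps = np.tensordot(windows, filters, axes=([1, 2], [0, 1])) + bias
    return np.maximum(feature_maps, 0.0).max(axis=0)  # (num_filters,)


def attn_convnet_forward(x, params, num_heads):
    """Embedded tweet (seq_len, d_model) -> one probability per emotion label."""
    attended = multi_head_self_attention(
        x, params["wq"], params["wk"], params["wv"], params["wo"], num_heads)
    pooled = conv1d_max_pool(attended, params["filters"], params["conv_bias"])
    logits = np.matmul(pooled, params["w_out"]) + params["b_out"]
    # Multi-label output: an independent sigmoid per label (assumed here).
    return 1.0 / (1.0 + np.exp(-logits))
```

Here `x` stands for the embedded tweet, so each row is the GloVe or randomly initialized vector of one token. Using a sigmoid per label rather than a softmax over labels lets several emotions be active for the same tweet.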
Dataset
- SemEval-2018 Task 1 : Affect in Tweets
- Lexicon
Requirements
- Python 2.7
- numpy
- pandas
- TensorFlow 1.4
- nltk
- Download data to data/ (in the root directory)
mkdir data
cd data
mkdir processed
wget http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/E-c/English/2018-E-c-En-train.zip
wget http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/E-c/English/2018-E-c-En-dev.zip
wget http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/AIT2018-TEST-DATA/semeval2018englishtestfiles/2018-E-c-En-test.zip
# unzip
unzip 2018-E-c-En-train.zip
unzip 2018-E-c-En-dev.zip
unzip 2018-E-c-En-test.zip
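For orientation, the unzipped E-c files can be read with a few lines of Python. The sketch below assumes a tab-separated layout with a header row containing `ID`, `Tweet`, and one 0/1 column per emotion; that layout, the `EMOTIONS` ordering, and the function name `read_ec_file` are assumptions for illustration and are not taken from process_data.py. It is written for Python 3 and uses only the standard library.

```python
import csv
import io

# Assumed label columns of the E-c (multi-label emotion classification) files.
EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy", "love",
            "optimism", "pessimism", "sadness", "surprise", "trust"]


def read_ec_file(handle):
    """Read a labeled E-c file into (tweets, labels).

    handle is an open text file; labels is a list of 0/1 lists,
    one entry per emotion in EMOTIONS.
    """
    # QUOTE_NONE keeps quote characters inside tweets as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    tweets, labels = [], []
    for row in reader:
        tweets.append(row["Tweet"])
        labels.append([int(row[emotion]) for emotion in EMOTIONS])
    return tweets, labels


# Made-up sample row, used here instead of a real file.
sample = io.StringIO(
    "ID\tTweet\t" + "\t".join(EMOTIONS) + "\n"
    "sample-1\tso happy today \"yay\"\t0\t0\t0\t0\t1\t1\t1\t0\t0\t0\t0\n"
)
tweets, labels = read_ec_file(sample)
```

With a real file, the `io.StringIO` sample would be replaced by an open file handle on one of the unzipped text files in data/.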
- Emoji-to-meaning preprocessing (to be updated)
- Get pre-trained embedding (optional)
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip
- Process data
cd ..
python process_data.py
python process_embedding.py # (optional) get pre-trained embedding, it will take a couple of minutes
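As a rough picture of what the optional embedding step involves, the sketch below reads GloVe text lines (one token followed by its vector components, separated by spaces) and builds an embedding table for a vocabulary. It is not the code in process_embedding.py: the function names, the uniform fallback range for words missing from GloVe, and the use of plain Python lists instead of a saved `.npy` array are assumptions of this illustration.

```python
import random


def load_glove_vectors(lines, vocab, dim=300):
    """Return {word: [float, ...]} for the vocab words found in GloVe lines."""
    wanted = set(vocab)
    vectors = {}
    for line in lines:
        # Split from the right: a few tokens in glove.840B contain spaces.
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed or truncated lines
        word = parts[0]
        if word in wanted:
            vectors[word] = [float(value) for value in parts[1:]]
    return vectors


def build_embedding_matrix(vocab, vectors, dim=300, seed=0):
    """One row per vocab word; words without a GloVe vector get random values.

    The uniform(-0.25, 0.25) fallback is an assumption of this sketch.
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word in vectors:
            matrix.append(vectors[word])
        else:
            matrix.append([rng.uniform(-0.25, 0.25) for _ in range(dim)])
    return matrix
```

Passing the open GloVe text file as `lines` streams it one line at a time, which matters because the unzipped 840B file is several gigabytes.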
- Get & process NRC Lexicon file (optional)
  - Put the file NRC-Emotion-Lexicon-Wordlevel-v0.92.txt inside data/
  - Process the file
python process_nrc.py # it will take a couple of minutes
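To make the lexicon step more concrete, here is a small sketch of parsing a word-level NRC file and turning a tokenized tweet into per-category counts. It assumes each data line has three tab-separated fields (word, category, 0/1 flag) and skips anything else. The function names and the count-based feature are hypothetical; how process_nrc.py and the `lexicon_effect` option actually use the lexicon is not described here.

```python
from collections import defaultdict

# Categories of the NRC word-level lexicon: eight emotions and two sentiments.
NRC_CATEGORIES = ["anger", "anticipation", "disgust", "fear", "joy",
                  "negative", "positive", "sadness", "surprise", "trust"]


def parse_nrc_lexicon(lines):
    """Return {word: set of categories} for entries whose flag is 1."""
    word_to_categories = defaultdict(set)
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) != 3:
            continue  # skip blank lines and any header text
        word, category, flag = fields
        if flag == "1":
            word_to_categories[word].add(category)
    return dict(word_to_categories)


def lexicon_counts(tokens, word_to_categories, categories=NRC_CATEGORIES):
    """Count how many tokens of a tweet are associated with each category."""
    counts = [0] * len(categories)
    for token in tokens:
        for category in word_to_categories.get(token, ()):
            if category in categories:
                counts[categories.index(category)] += 1
    return counts
```

A count vector like this could be concatenated with the network's features, but that is only one possible design and is not confirmed by the repository.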
- Train & Test model
# Train
bash run.sh [parameter set] train
# Example : bash run.sh basic_params train
# Test
bash run.sh [parameter set] pred
# Example : bash run.sh basic_params pred
Open params.py and change embedding = None to the path of the pre-trained embedding file.
Example : embedding = 'data/processed/glove_embedding.npy'
Open params.py and change lexicon_effect = None to another value.