- PKU course team member's GitHub repository: WyAzx/ml_final_project
Getting started (this is also covered in
simple_lstm_baseline.py
):

```shell
# Download the dataset
kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification
# Unzip the data
mkdir data
unzip test.csv.zip -d data
unzip train.csv.zip -d data
# Not sure why the files don't have read permission
chmod +r data/*
# Clean up
rm *.zip
```
For evaluation, test set examples with target >= 0.5 will be considered to be in the positive class (toxic).
Models do not need to predict the additional attributes for the competition.
The submission file has the format:

```
id,prediction
7000000,0.0
7000001,0.0
etc.
```
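A minimal sketch of writing predictions in this format, assuming `test_ids` and `preds` come from the model's inference step (placeholder values below):

```python
import csv

test_ids = [7000000, 7000001]  # placeholder ids from the test set
preds = [0.0, 0.0]             # placeholder model predictions

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "prediction"])  # required header
    for example_id, prediction in zip(test_ids, preds):
        writer.writerow([example_id, prediction])
```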
Submetrics
- Overall AUC: the ROC-AUC for the full evaluation set
- Bias AUCs (each computed per identity subgroup):
- Subgroup AUC: ROC-AUC restricted to examples that mention the subgroup
- BPSN (Background Positive, Subgroup Negative) AUC: ROC-AUC on non-toxic examples that mention the subgroup together with toxic examples that do not
- BNSP (Background Negative, Subgroup Positive) AUC: ROC-AUC on toxic examples that mention the subgroup together with non-toxic examples that do not
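The bias submetrics can be sketched as follows. This is a dependency-free illustration, not the competition's official benchmark code: the `auc` helper uses the Mann-Whitney rank formulation instead of `sklearn.metrics.roc_auc_score`, and `in_subgroup` is an assumed boolean mask marking comments that mention the identity.

```python
def auc(labels, scores):
    """ROC-AUC via the Mann-Whitney U statistic (tie-aware)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Average ranks over tied scores
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_ranks = [r for r, y in zip(ranks, labels) if y]
    n_pos, n_neg = len(pos_ranks), len(labels) - len(pos_ranks)
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def _restricted_auc(labels, scores, keep):
    sel = [i for i in range(len(labels)) if keep(i)]
    return auc([labels[i] for i in sel], [scores[i] for i in sel])

def subgroup_auc(labels, scores, in_subgroup):
    # Only examples that mention the subgroup
    return _restricted_auc(labels, scores, lambda i: in_subgroup[i])

def bpsn_auc(labels, scores, in_subgroup):
    # Subgroup negatives + background positives
    return _restricted_auc(
        labels, scores,
        lambda i: (in_subgroup[i] and not labels[i]) or (not in_subgroup[i] and labels[i]))

def bnsp_auc(labels, scores, in_subgroup):
    # Subgroup positives + background negatives
    return _restricted_auc(
        labels, scores,
        lambda i: (in_subgroup[i] and labels[i]) or (not in_subgroup[i] and not labels[i]))
```

A low BPSN AUC means the model confuses non-toxic subgroup comments with toxic background comments, i.e. it over-flags the identity; a low BNSP AUC indicates the opposite.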
Generalized Mean of Bias AUCs
$$ M_p(m_s) = \left(\frac{1}{N} \sum_{s=1}^{N} m_s^p\right)^\frac{1}{p} $$
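A direct transcription of the formula above. Per the competition's evaluation description, the final score combines the overall AUC with the generalized mean of each bias AUC using p = -5, which pulls the mean toward the worst-performing identity subgroups.

```python
def generalized_mean(values, p):
    """Power mean M_p of the submetric values m_s (p != 0)."""
    n = len(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)
```

For example, with subgroup AUCs of 0.95, 0.95, and 0.60, the p = -5 mean lands well below the arithmetic mean of about 0.83, so one weak subgroup drags the whole score down.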
# simple LSTM baseline
```shell
python3 simple_lstm_baseline.py
```
Preprocessing
Model
- Simple LSTM - 93%