Bots are a prevalent problem on Twitter. At best, bots create inauthentic interactions and artificially inflate one’s social influence; at worst, they spread dangerous content such as scams or fake news. At Twitter’s scale, the platform can no longer rely on human annotators to distinguish bots from humans and has to adopt some form of automatic detection. This project aims to distinguish humans from bots using their user descriptions and tweets, modeled with different deep learning approaches: a multilayer perceptron (MLP) and several types of graph neural network (GNN), namely the graph convolutional network (GCN), graph isomorphism network (GIN), and graph attention network (GAT). We also experimented with different model architectures for extracting the embedding that summarizes a user's tweets. We found that the best model is an architecture that combines an MLP and a GAT, giving an accuracy score of X on the Y dataset.
The Cresci-15 dataset contains `node.json`, `label.csv`, `split.csv`, and `edge.csv` (for datasets with graph structure).
Cresci-15 is available at Google Drive.
- Download `Other-Dataset-TwiBot22-Format.zip` and unzip it.
- Copy `cresci-2015` to `src/BotRGCN/datasets/`.
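After copying, it can be worth confirming that the dataset folder holds the expected files before preprocessing. The following is a minimal sketch, not part of the repo: the helper name `check_dataset_dir` is hypothetical, and treating `edge.csv` as optional is an assumption based on the note that it only exists for datasets with graph structure.

```python
from pathlib import Path

# File names taken from the dataset description above. edge.csv is only
# present for datasets with graph structure, so it is treated as optional
# here (an assumption of this sketch).
REQUIRED_FILES = ("node.json", "label.csv", "split.csv")
OPTIONAL_FILES = ("edge.csv",)


def check_dataset_dir(dataset_dir):
    """Return (missing_required, missing_optional) file names for dataset_dir."""
    root = Path(dataset_dir)
    missing_required = [n for n in REQUIRED_FILES if not (root / n).is_file()]
    missing_optional = [n for n in OPTIONAL_FILES if not (root / n).is_file()]
    return missing_required, missing_optional
```

For example, `check_dataset_dir("src/BotRGCN/datasets/cresci-2015")` returns two empty lists when all four files are in place.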
To set up the environment and install the requirements, run `bash commands_local.sh`. You might need to adjust the CUDA version to match the one that you use.
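To see which CUDA version the installed framework was built against, a quick check like the one below may help. It is a sketch that assumes the environment installs PyTorch, which this README does not state explicitly; the helper name `describe_cuda` is hypothetical.

```python
def describe_cuda():
    """Return a one-line summary of the CUDA build of the installed PyTorch."""
    try:
        import torch  # assumption: PyTorch is the framework being installed
    except ImportError:
        return "PyTorch is not installed"
    built_for = torch.version.cuda  # None for CPU-only builds
    available = torch.cuda.is_available()
    return (
        f"PyTorch {torch.__version__}, built for CUDA {built_for}, "
        f"GPU available: {available}"
    )


print(describe_cuda())
```

If the reported CUDA version differs from the one on your machine, that is a hint that the install commands need adjusting.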
- Clone this repo by running `git clone https://github.com/travistangvh/TwitterBotBusters`.
- Change directory to `src/BotRGCN/datasets` and download the datasets into a new folder, `./cresci-2015`.
- Create the preprocessed data by changing directory to `src/BotRGCN/cresci_15` and running `python3 ./preprocess_combined.py`. This will create the preprocessed data in `src/BotRGCN/cresci_15/processed`.
- Change directory to `src/GCN_GAT`.
- Run experiments by executing `python train.py --config gat-mlp-1.yaml`. You can explore other models by changing the config file.
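To try several config files in one go, the training command can be wrapped in a small driver. This is a hedged sketch rather than a script shipped with the repo: `build_train_command` and `run_experiments` are hypothetical helpers, and `gat-mlp-1.yaml` is the only config name taken from this README.

```python
import subprocess


def build_train_command(config_name):
    """Build the argv list for one training run with the given config file."""
    return ["python", "train.py", "--config", config_name]


def run_experiments(config_names, workdir="src/GCN_GAT"):
    """Run train.py once per config, stopping at the first failing run."""
    for name in config_names:
        # check=True raises CalledProcessError if train.py exits non-zero.
        subprocess.run(build_train_command(name), cwd=workdir, check=True)
```

Calling `run_experiments(["gat-mlp-1.yaml"])` from the repo root is equivalent to the last two steps above; add further config file names to the list to run them in sequence.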