Capstone: Rumor Detection On Deep Learning Concatenating Hand-crafted Features and Context Embedding (One-page Summary)

Abstract

With the rapid integration of social media into people's daily lives, the spread of false information and unverified rumors has become a growing problem. Many researchers have applied machine learning and deep learning to rumor classification in order to catch the propagation of rumors in real time or after the fact, to understand how rumors spread, and to label unverified rumorous tweets. This project performs rumor detection with SVM and deep learning classifiers, using different sets of hand-crafted features and tweet context embeddings inspired by previous research. The hand-crafted features include fields of the source tweet's Twitter object and statistics of the tweets derived from the source tweet. To test how well the proposed methods generalize to unseen data and rumors, cross-validation is applied to the PHEME and PHEME-R datasets, and an analysis of the results is presented. The final results reach an accuracy of 70.16 and a recall of 83.56, but a relatively lower precision of 56.74.

I. INTRODUCTION

Rumor detection is a binary classification task: deciding whether a social media post reports a rumor, that is, a claim that is not yet verified at the time it spreads. This final year project tackles the problem by training classification models on a tweet dataset that contains rumorous tweets, and compares the results obtained from various features that represent the rumorousness of the tweets in different vector spaces. The rumors to be classified are assumed to be unseen by the model.

II. DESIGN/METHODOLOGY/IMPLEMENTATION

This project approaches the rumor detection problem from two aspects of the chosen dataset. The first exploits the tweet object data, stored in JSON files, to create hand-crafted features. The second applies word embedding algorithms to obtain dense vectors of the tweet texts, which are then fed into a deep neural network. The finalized list of features extracted for this experiment is presented in Figure 1.
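As a rough illustration of how the two aspects can be combined, the sketch below builds a few hand-crafted features from a tweet object and concatenates them with an embedding vector. The selected fields (`retweet_count`, `favorite_count`, `user.followers_count`, `user.verified`), the question-mark and length features, the helper names, and the three-dimensional stand-in embedding are all assumptions made for the example; they are not the project's finalized feature list.

```python
import json

def handcrafted_features(tweet: dict) -> list[float]:
    """Turn a few fields of a tweet object into numeric features (illustrative choice)."""
    user = tweet.get("user", {})
    text = tweet.get("text", "")
    return [
        float(tweet.get("retweet_count", 0)),
        float(tweet.get("favorite_count", 0)),
        float(user.get("followers_count", 0)),
        float(bool(user.get("verified", False))),  # 1.0 if the author is verified
        float("?" in text),                         # 1.0 if the text contains a question mark
        float(len(text)),                           # text length in characters
    ]

def concatenate(features: list[float], embedding: list[float]) -> list[float]:
    """Join hand-crafted features and a dense text embedding into one input vector."""
    return features + embedding

# Hypothetical tweet object, shaped like the JSON stored in the dataset files.
raw = (
    '{"text": "Is this true?", "retweet_count": 12, "favorite_count": 3, '
    '"user": {"followers_count": 250, "verified": false}}'
)
tweet = json.loads(raw)
embedding = [0.1, -0.2, 0.3]  # stand-in for a real tweet text embedding
model_input = concatenate(handcrafted_features(tweet), embedding)
print(model_input)
```

In practice the hand-crafted part would usually be scaled before concatenation, since raw counts such as follower numbers are on a very different scale from embedding values.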

A. Data

The experiment used the PHEME and PHEME-R datasets. The PHEME dataset contains a total of 5,802 annotated tweets, of which 1,972 are deemed rumors and 3,830 are deemed non-rumors.

B. Cross Validation

To limit overfitting, which could worsen the model's generalization ability on the small PHEME dataset, k-fold cross-validation is applied to the PHEME and PHEME-R datasets separately, with the folds formed per event.
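One way to read "per event" is that each fold holds out all tweets of a single event, so the model is always tested on an event it has not seen. The sketch below shows that splitting scheme under this assumption; the function name `event_folds` and the toy event names are hypothetical, and the project's notebooks may split the data differently.

```python
from collections import defaultdict

def event_folds(samples):
    """Yield (held_out_event, train_indices, test_indices), one fold per event.

    `samples` is a list of (event_name, label) pairs.
    """
    by_event = defaultdict(list)
    for idx, (event, _label) in enumerate(samples):
        by_event[event].append(idx)
    for event in sorted(by_event):
        test_idx = by_event[event]
        # Train on every tweet that belongs to a different event.
        train_idx = sorted(
            i for other, idxs in by_event.items() if other != event for i in idxs
        )
        yield event, train_idx, test_idx

# Toy data: five tweets spread over three hypothetical events.
samples = [
    ("event_a", 1), ("event_a", 0),
    ("event_b", 0),
    ("event_c", 1), ("event_c", 0),
]
folds = list(event_folds(samples))
for event, train_idx, test_idx in folds:
    print(event, train_idx, test_idx)
```

Because no event appears on both sides of a split, the score on each fold estimates performance on an unseen event rather than on unseen tweets from a known event.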

III. EVALUATION AND RESULTS

TABLE I. RESULTS OF BASELINE CLASSIFIERS, INCLUDING CV-9 SVM AND CRF, ON THE PHEME DATASET

TABLE II. RESULTS OF CV-9 AND NEURAL NETWORK WITH WEIGHTED SAMPLING
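The caption above mentions weighted sampling, but the summary does not describe how it was implemented. A common form, sketched here as an assumption, draws each training example with probability inversely proportional to the size of its class, so that the minority class is seen about as often as the majority class. The function names and the toy label list are hypothetical.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Give each sample a weight of 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

def weighted_sample(labels, k, seed=0):
    """Draw k sample indices with replacement, favoring the rarer class."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)

# Toy imbalanced data: eight samples of class 0 and two of class 1.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)
batch = weighted_sample(labels, k=1000)
share = sum(labels[i] for i in batch) / len(batch)
print(f"share of class 1 in the sampled batch: {share:.2f}")
```

With these weights the two classes carry equal total weight, so the sampled batch is roughly balanced even though the underlying data is not.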

IV. CONCLUSION

The performance of each model and feature set is presented in Tables I and II. The final model can classify tweets from unseen events based on feature sets extracted from the rumorous tweets of other events discussed on Twitter, with an accuracy of 76.08 and an F1 score of 44.32 on the PHEME dataset, and an accuracy of 70.16 and an F1 score of 42.43 on the dataset combining PHEME and PHEME-R.
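For reference, the reported metrics can be computed from predictions as in the sketch below. It assumes the rumor class is the positive class (label 1); the function name and the toy labels are made up for illustration and are not taken from the project's results.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy example: three rumors and five non-rumors.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can look good on an imbalanced dataset, which is why precision, recall and F1 on the rumor class are reported alongside it.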
