onionometer

Detecting satire using machine learning.

This project uses the doc2vec algorithm and scikit learn's logisitic regression classifier to try to identify satirical headlines from real ones.

The onion headline data was collected using the puppeteer library (via web scraping). The real headlines were obtained from CBC since they had an easy to use API. I just directly accessed their API using the request library instead of scraping since using the API is faster.

Using these headlines, I first trained a Doc2Vec model (currently saved as onionometer.d2v). The main purpose of using Doc2Vec was to avoid using one-hot vectors, as that method does not consider context of words in a sentence (something that is very important in satire).

I then exported the headlines as vectors from Doc2Vec and split them in to train/test groups. My initial split was 30000 train (15000 onion and 15000 real headlines), 8000 test (4000/4000). These were then tossed in to scikit learn's logistic regression classifier.

Version 1 yields ~71% accuracy. After cleaning the data a little bit more the model predicts satire with around ~80% accuracy.

I will update this soon with a more in depth data analysis & will try to increase the accuracy.

Update:

Now using tensorflow/keras neural network instead of scikit logistic regression. Achieved 91% accuracy on test set using a model consisting of a word embedding layer and LSTMs, with Adam/RMSprop optimizer and binary crossentropy loss function.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
sentences		sentences
README.md		README.md
headlines.js		headlines.js
onion_headlines.js		onion_headlines.js
onionometer.d2v		onionometer.d2v
onionometer.ipynb		onionometer.ipynb
onionometer.py		onionometer.py
real_headlines.js		real_headlines.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

onionometer

About

Releases

Packages

Languages

dmreiter/onionometer

Folders and files

Latest commit

History

Repository files navigation

onionometer

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages