Spam Classification

Analysis and detection of short URL spam on Twitter. The classifier achieved an accuracy of 89.23% on 100,000 tweets.

Steps performed

Collecting 100,000 tweets containing bit.ly short URLs using the Twitter API.
Gathering metadata about each short URL using the Bitly API.
Storing all the information in MongoDB.
Analyzing the information to discover significant patterns.
Classifying the short URLs using [Weka](http://www.cs.waikato.ac.nz/ml/weka/).
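
As a rough illustration of the collection and storage steps, here is a minimal Python sketch that pulls bit.ly links out of a tweet's text and builds a document that could be stored in MongoDB. It is not the project's actual code: the regular expression, the function names `extract_bitly_urls` and `build_record`, and the record fields are all assumptions made for this example.

```python
import re
from typing import Dict, List

# Assumed pattern for bit.ly links. Tweets may carry links wrapped by
# another shortener, in which case they would need expanding first.
BITLY_RE = re.compile(r"https?://bit\.ly/[A-Za-z0-9_-]+")


def extract_bitly_urls(text: str) -> List[str]:
    """Return the bit.ly URLs in a tweet's text, in order, without duplicates."""
    urls: List[str] = []
    for url in BITLY_RE.findall(text):
        if url not in urls:
            urls.append(url)
    return urls


def build_record(tweet_id: str, text: str) -> Dict[str, object]:
    """Build a hypothetical document shape for storage (field names are assumed)."""
    return {
        "tweet_id": tweet_id,
        "text": text,
        "short_urls": extract_bitly_urls(text),
    }


if __name__ == "__main__":
    sample = "Win a prize http://bit.ly/abc123 now! See https://example.com/x"
    print(build_record("1", sample))
```

A dictionary of this shape could then be enriched with the metadata returned by the Bitly API before being written to a MongoDB collection.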