IFTTT was used to monitor Twitter for the following keywords: #BITSPilani, #BITSGoa, #BITSHyd, #BITSDubai, #BITSAA, #BITS, #Pilani. This picks up a lot of noise because of the generic 'BITS' keyword. Since IFTTT has stopped supporting Twitter live searches, use https://zapier.com/ to create your own dataset for analysis.
Both services send you emails in a fixed format that is easy to parse. If you only have a list of tweets, use the Twitter API to crawl the texts. A sample IFTTT email looks like this:
ifttt
via task 618721:
http://ifttt.com/tasks/618721
Agam, the band, with BITSian roots http://t.co/EvIumdmJ http://twitter.com/BITSAA/status/166177110573064192
by http://twitter.com/BITSAA
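The email layout above is regular enough to parse with a few regular expressions. The following is a minimal sketch, assuming every email follows the sample's layout. The function name `parse_ifttt_email` and the record keys are hypothetical and not part of this repository.

```python
import re

# Patterns derived from the sample email above (assumed to be representative).
TASK_LINE = re.compile(r"via task (\d+):")
AUTHOR_LINE = re.compile(r"by https?://twitter\.com/(\w+)\s*$")
STATUS_URL = re.compile(r"https?://twitter\.com/\w+/status/(\d+)")


def parse_ifttt_email(body):
    """Extract task id, tweet text, status URL, tweet id and author from one email body."""
    record = {"task_id": None, "text": None, "status_url": None,
              "tweet_id": None, "author": None}
    for raw_line in body.splitlines():
        line = raw_line.strip()
        task = TASK_LINE.match(line)
        if task:
            record["task_id"] = task.group(1)
            continue
        author = AUTHOR_LINE.match(line)
        if author:
            record["author"] = author.group(1)
            continue
        status = STATUS_URL.search(line)
        if status:
            # The tweet text is everything before the trailing status URL.
            record["status_url"] = status.group(0)
            record["tweet_id"] = status.group(1)
            record["text"] = line[:status.start()].strip()
    return record


sample = """ifttt
via task 618721:
http://ifttt.com/tasks/618721
Agam, the band, with BITSian roots http://t.co/EvIumdmJ http://twitter.com/BITSAA/status/166177110573064192
by http://twitter.com/BITSAA"""

print(parse_ifttt_email(sample))
```

Emails from Zapier may use a different layout, so the patterns would need adjusting for them.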
- Twitter training dataset taken from http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/ .
- Parsed and formatted training datasets of 1.5M and 0.1M tweets are included.
- The BITS Pilani dataset contains tweets from January 20, 2012 to September 27, 2012.
- Use RapidMiner 5.3 with -Xms2048m -Xmx3072m for faster calculations. Other models are faster, but SVM is really slow, so avoid training it on more than 0.1 million tweets.
| | true 0 | true 1 | class precision |
|---|---|---|---|
| pred. 0 | 24042 | 9922 | 70.79% |
| pred. 1 | 19482 | 46537 | 70.49% |
| class recall | 55.24% | 82.43% | |
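The class precision and class recall figures follow directly from the four counts: precision divides each diagonal count by its row total (everything predicted as that class), and recall divides it by its column total (everything truly in that class). A minimal sketch that recomputes them is shown here. The helper name `confusion_metrics` is hypothetical and is not taken from RapidMiner.

```python
def confusion_metrics(matrix):
    """Compute accuracy, per-class precision and per-class recall.

    matrix[p][t] is the number of examples predicted as class p
    whose true class is t (rows = predictions, columns = truth).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    # Precision: correct predictions of class i / all predictions of class i.
    precision = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # Recall: correct predictions of class i / all true members of class i.
    recall = [matrix[i][i] / sum(matrix[p][i] for p in range(n)) for i in range(n)]
    return {"accuracy": correct / total, "precision": precision, "recall": recall}


# Counts copied from the confusion matrix above.
metrics = confusion_metrics([[24042, 9922],
                             [19482, 46537]])

for cls in (0, 1):
    print(f"class {cls}: precision {metrics['precision'][cls]:.2%}, "
          f"recall {metrics['recall'][cls]:.2%}")
print(f"accuracy: {metrics['accuracy']:.2%}")
```

The same function can be applied to any other confusion matrix laid out with predictions as rows and true classes as columns.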
Top 10 positive and negative words (stemmed)
positive word | weight | negative word | weight |
---|---|---|---|
thank | 0.06800427050495744 | sad | 0.06904954519705979 |
love | 0.04238921785592977 | miss | 0.06799716497097386 |
good | 0.03864780316342833 | sorri | 0.06447410364223946 |
great | 0.03332699835307452 | wish | 0.04964308132602499 |
quot | 0.028049576202737663 | suck | 0.04549754050714666 |
welcom | 0.028045093611976712 | bad | 0.03882145370669514 |
awesom | 0.027883840586310205 | hate | 0.038814744730334146 |
haha | 0.027711586964757735 | work | 0.038456277249749565 |
nice | 0.026502431781819224 | poor | 0.03537374379337165 |
happi | 0.024842171425360552 | want | 0.03312521661076012 |
Positive Tweets | 4759 |
Negative Tweets | 1552 |
| | true 0 | true 1 | class precision |
|---|---|---|---|
| pred. 0 | 34413 | 36884 | 48.27% |
| pred. 1 | 9111 | 19575 | 68.24% |
| class recall | 79.07% | 34.67% | |
Positive Tweets | 3436 |
Negative Tweets | 2875 |