Realtime Twitter Sentiment Analysis Dashboard

Description

Our project, Real Time Twitter Sentiment Analysis, revolves around using unsupervised Machine Learning approaches to classify Twitter data (tweets) into the sentiment categories POSITIVE, NEGATIVE, or NEUTRAL.

Characteristic functionalities

Analysis of Tweets from Twitter Usernames and Keywords.
Classification of Tweets based on their sentiments in real-time.
Interactive Charts and Graphs visualizing the corresponding twitter engagement.
Options to choose custom input attributes like range of dates, maximum number of tweets to be fetched, etc.
A dashboard presenting a complete Twitter performance chart for the chosen username or keyword.
Analysis of user engagement on Twitter based on the languages used, the number of retweets, and the distribution of tweets over weekdays.
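The weekday and language breakdowns listed above come down to simple counting. The following is a minimal sketch, not the dashboard's code. It assumes each tweet is a dict with hypothetical `date` (ISO `YYYY-MM-DD`) and `language` keys. The project's actual tweet records, fetched through Twint, may use a different structure.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_distribution(tweets):
    """Count tweets per weekday, returning all seven days in calendar order."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(date.fromisoformat(t["date"]).weekday() for t in tweets)
    return {WEEKDAYS[i]: counts.get(i, 0) for i in range(7)}


def language_distribution(tweets):
    """Count tweets per language code, most common first."""
    return Counter(t["language"] for t in tweets).most_common()


# Hypothetical sample records (the "date" and "language" keys are assumptions).
sample = [
    {"date": "2021-03-01", "language": "en"},  # a Monday
    {"date": "2021-03-01", "language": "hi"},  # a Monday
    {"date": "2021-03-03", "language": "en"},  # a Wednesday
]
print(weekday_distribution(sample))
print(language_distribution(sample))
```

Returning every weekday, including days with zero tweets, keeps the bar order stable when the counts are plotted.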

Tech Stack

The Twint package is used to fetch tweets from Twitter in real time.
Training the Sentiment Model:
- NLTK provides several modules for data preprocessing and Natural Language Processing in Python.
  - NLTK preprocessing utilities, such as the stopwords corpus and the Porter stemmer, were used during text preprocessing to prepare the training dataset for the model.
- The Twitter Sentiment Dataset from Kaggle was used as training data for the sentiment model.
- scikit-learn provides the machine learning models used in training.
  - scikit-learn's TfidfVectorizer was used to build the TF-IDF matrix.
  - A K-Means clustering model was then used to cluster semantically similar words from this matrix and derive the cluster centers for the three sentiments.
- Gensim provides fast utilities for training NLP models and vector embeddings.
  - Gensim's Word2Vec model was used for the vector embeddings.
- Pickle was used to serialize the trained models so they can be reloaded for prediction in production; the pickled models were saved in the project directory for further use.
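The training flow above consists of stopword removal, TF-IDF weighting, K-Means with three clusters, and pickling. It can be illustrated without third-party packages. The sketch below is a dependency-free approximation, not the project's code, and it makes several assumptions:

- The tiny stopword set stands in for NLTK's list, and stemming is omitted.
- The IDF formula mirrors scikit-learn's default smoothed variant.
- K-Means is applied to tweet vectors.
- The cluster-to-sentiment mapping is hypothetical. K-Means does not name its clusters, so the mapping must be assigned by inspecting them.

```python
import math
import pickle
import re
from collections import Counter

# Hypothetical stand-in for NLTK's English stopword list.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "to", "of", "i"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (no stemming here)."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOPWORDS]


def fit_idf(docs):
    """Smoothed IDF as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}


def tfidf(doc, idf):
    """Raw term counts times IDF, L2-normalised; unseen terms are ignored."""
    vec = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sq_dist(u, v):
    """Squared Euclidean distance between two sparse (dict) vectors."""
    return sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in set(u) | set(v))


def nearest(vec, centroids):
    """Index of the closest centroid."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def kmeans(vectors, k, iters=10):
    """Plain Lloyd's algorithm, seeded with the first k vectors for determinism."""
    centroids = [dict(v) for v in vectors[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties out
                keys = set().union(*members)
                centroids[i] = {t: sum(m.get(t, 0.0) for m in members) / len(members)
                                for t in keys}
    return centroids


# Hypothetical toy training tweets.
tweets = [
    "I love this phone, great camera",
    "Terrible service and awful support",
    "The update is out today",
    "Great phone, love it",
    "Awful battery, terrible",
    "Update out today for the app",
]
docs = [preprocess(t) for t in tweets]
idf = fit_idf(docs)
vectors = [tfidf(d, idf) for d in docs]
centroids = kmeans(vectors, k=3)

# Serialise the fitted pieces, analogous to the project's .pkl files, then reload.
model = pickle.loads(pickle.dumps({"idf": idf, "centroids": centroids}))

# Hypothetical mapping, assigned by inspecting the clusters of this toy sample.
labels = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}
new = tfidf(preprocess("love the camera"), model["idf"])
print(labels[nearest(new, model["centroids"])])  # → POSITIVE
```

In the project itself, scikit-learn's TfidfVectorizer and KMeans replace the hand-written functions. Their fitted objects are what get pickled.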
Dashboard for Twitter Analysis:
- Flask serves as the backend for the dashboard.
- Dash, a Python framework that wraps HTML and CSS components and runs on Flask, is used to lay out the dashboard UI and builds most of its frontend.
- Plotly is used for all charts, plots and graphical visualizations on the dashboard.
Determining the accuracy of the Sentiment Analysis Model:
- A dataset was chosen and labelled for polarity using the pretrained, lexicon- and rule-based VADER sentiment analyser. The F1 score was then calculated between these reference labels and the model's predictions.
- The reported accuracy of the model, measured as F1 score, is 75.2%.
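The evaluation step above compares the model's predictions against VADER-derived reference labels. The sketch below shows the F1 computation and assumes macro averaging over the three classes, since the averaging actually used is not stated. The label lists are made up for illustration.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("POSITIVE", "NEGATIVE", "NEUTRAL")):
    """Unweighted mean of the per-class F1 scores (macro averaging is assumed)."""
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical labels: VADER output as the reference, model output as the prediction.
vader_labels = ["POSITIVE", "NEGATIVE", "NEUTRAL", "POSITIVE", "NEGATIVE", "NEUTRAL"]
model_labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE", "NEUTRAL", "NEUTRAL"]
print(round(macro_f1(vader_labels, model_labels), 3))  # → 0.656
```

With scikit-learn installed, sklearn.metrics.f1_score(vader_labels, model_labels, average="macro") computes the same value.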

Screenshots of the Dashboard

Using a Twitter-Username for Analysing data

Using a Keyword for Analysing data

Thought behind the Project

The project has several industry use cases. These range from analysing users' sentiment on Twitter about a particular product or service to managing and monitoring Twitter engagement for tweets related to a particular topic. The dashboard can serve as a tool for analysing market performance and informing decisions about the future of the product or service offered.

Setup Process

To set up the project on a local machine:

Fork this repository.

Clone the repository, either by downloading it as a ZIP archive or with the command:

    git clone https://github.com/gautamanirudh/twitterdash.git

Switch to the master branch:
```
    git checkout master
```

Create a virtual environment for the project

    pip install virtualenv
    virtualenv -p /usr/bin/python3 env_name

Activate the virtual environment:
```
   source env_name/bin/activate
```
Once the virtual environment is activated, its name appears on the left side of the terminal prompt, indicating that it is currently active.
Install all the dependencies
```
   pip install -r requirements.txt
```
To start the Dashboard app, run the command
```
    python app.py
```

The steps above are sufficient for running the dashboard and analysing real-time Twitter sentiment. To run the preprocessing and model-training files, however, the NLTK data they rely on must be downloaded first. Because nltk.download() is a Python function rather than a shell command, run it through the interpreter in the activated environment:

```
    python -c "import nltk; nltk.download()"
```

