Scala app that performs sentiment analysis on tweets and produces them to Kafka.

guidok91/twitter-sentiment-analysis

Tweet sentiment analysis

Scala app that retrieves tweets using the Twitter API and performs sentiment analysis with the Stanford CoreNLP library.

Tweets are retrieved based on search keywords we specify, and the tweet text is fed to the NLP library for sentiment analysis.

Finally, tweets are produced to a Kafka topic.

Twitter API

The app needs a Bearer Token to authenticate against the API (OAuth 2.0 App-Only auth).

The token has to be generated on the Twitter Developer portal (see the Twitter developer documentation for details).

Once you have generated one, place it in the config file (tweeter.api_auth_bearer_token).

Keywords for tweet search must also be specified in the config file (tweeter.search_keywords).

Caveat: only tweets from the last week are retrieved (we use the Recent search option as opposed to the Full-archive search).
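
As a rough illustration of how the Bearer Token and search keywords come together, the sketch below builds (but does not send) a Recent search request. It is not the app's actual code: the object and method names are hypothetical, the endpoint is assumed to be the Twitter API v2 Recent search URL, and joining keywords with `OR` is only one possible way to combine them.

```scala
import java.net.URI
import java.net.URLEncoder
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets

// Hypothetical helper, not taken from the app's source.
object RecentSearchRequest {
  // Assumption: Twitter API v2 Recent search endpoint.
  val RecentSearchUrl = "https://api.twitter.com/2/tweets/search/recent"

  def build(bearerToken: String, keywords: Seq[String]): HttpRequest = {
    // Assumption: a tweet matches if it contains any of the keywords.
    val query = keywords.mkString(" OR ")
    val encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8)

    HttpRequest.newBuilder()
      .uri(URI.create(s"$RecentSearchUrl?query=$encodedQuery"))
      // OAuth 2.0 App-Only auth: the token travels in the Authorization header.
      .header("Authorization", s"Bearer $bearerToken")
      .GET()
      .build()
  }
}
```

In the app itself, the two arguments would come from the config file values tweeter.api_auth_bearer_token and tweeter.search_keywords.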

Sentiment analysis

The Stanford CoreNLP library works by splitting a text into sentences and assigning a sentiment value to each one:

  • Values 0 or 1 => negative sentiment.
  • Value 2 => neutral sentiment.
  • Values 3 or 4 => positive sentiment.

Given this, if a tweet contains multiple sentences, we pick the most frequently assigned sentiment.
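
The aggregation step can be sketched as a small pure function over the per-sentence values. This is a minimal sketch, not the app's implementation: the type and function names are hypothetical, it assumes frequency is counted per sentiment category rather than per raw value, and it assumes ties are broken in the order negative, neutral, positive.

```scala
sealed trait Sentiment
case object Negative extends Sentiment
case object Neutral extends Sentiment
case object Positive extends Sentiment

// Hypothetical helper, not taken from the app's source.
object SentimentAggregation {
  // Map a per-sentence CoreNLP value (0 to 4) to a sentiment category.
  def fromScore(score: Int): Sentiment = score match {
    case 0 | 1 => Negative
    case 2     => Neutral
    case 3 | 4 => Positive
    case other =>
      throw new IllegalArgumentException(s"Unexpected sentiment value: $other")
  }

  // Pick the most frequent category across all sentences of a tweet.
  // Returns None when there are no sentences to score.
  def mostFrequent(sentenceScores: Seq[Int]): Option[Sentiment] =
    if (sentenceScores.isEmpty) None
    else {
      val counts: Map[Sentiment, Int] =
        sentenceScores.map(fromScore).groupBy(identity).map {
          case (sentiment, occurrences) => sentiment -> occurrences.size
        }
      // Assumption: on a tie, the earliest category in this list wins.
      val preference = Seq[Sentiment](Negative, Neutral, Positive)
      Some(preference.maxBy(s => counts.getOrElse(s, 0)))
    }
}
```

For example, a tweet whose three sentences score 3, 4 and 1 has two positive sentences and one negative, so it is labelled positive.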

Kafka topic

Tweets are produced to a Kafka topic using Avro serialization.

A local Kafka instance with schema registry is available (see docker-compose.yml).
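
For orientation, an Avro-serializing producer is typically configured along the following lines. This is a sketch under assumptions rather than the app's configuration: it assumes the Confluent Avro serializer and Schema Registry are used, that keys are plain strings, and the helper name is hypothetical. The addresses to pass in depend on docker-compose.yml.

```scala
import java.util.Properties

// Hypothetical helper, not taken from the app's source.
object ProducerSettings {
  def avroProducerProps(bootstrapServers: String, schemaRegistryUrl: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", bootstrapServers)
    // Assumption: record keys are plain strings.
    props.setProperty("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    // Assumption: values use the Confluent Avro serializer, which
    // registers and looks up schemas in the schema registry.
    props.setProperty("value.serializer",
      "io.confluent.kafka.serializers.KafkaAvroSerializer")
    props.setProperty("schema.registry.url", schemaRegistryUrl)
    props
  }
}
```

The resulting Properties object can be passed to a KafkaProducer constructor once the Kafka client and serializer libraries are on the classpath.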

Running instructions

Check the Makefile for how to compile, test and run the application.

CI/CD

A GitHub Actions workflow for CI/CD is defined in the repository; its runs can be seen in the repository's Actions tab.
