aDJi2003/spark_sentiment_comparison

Spark Sentiment Comparison

A sentiment analysis project that compares tweets about two political figures, Trump and Kamala, using PySpark and NLTK. The project fetches tweets, cleans the text, scores the sentiment of each tweet, and visualizes the sentiment distribution to show which figure has the higher positive sentiment score.

Key Objectives

  • Clean and preprocess tweet data
  • Perform sentiment analysis on tweet content
  • Visualize sentiment distribution for comparison
  • Determine which candidate has more positive sentiment
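
The last objective can be read as comparing the share of positive tweets per candidate. The sketch below is one way to express that comparison, not the repository's actual code: the function names `positive_share` and `compare`, the tie handling, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def positive_share(labels):
    """Fraction of labels equal to "positive"; 0.0 for an empty list."""
    counts = Counter(labels)
    total = sum(counts.values())
    return counts["positive"] / total if total else 0.0


def compare(labels_by_candidate):
    """Return (winner, shares); winner is None when the top share is tied."""
    shares = {
        name: positive_share(labels)
        for name, labels in labels_by_candidate.items()
    }
    best = max(shares.values())
    leaders = [name for name, share in shares.items() if share == best]
    winner = leaders[0] if len(leaders) == 1 else None
    return winner, shares


# Toy labels for illustration only; these are not real results.
sample = {
    "candidate_a": ["positive", "negative", "positive", "positive"],
    "candidate_b": ["positive", "negative", "negative", "negative"],
}
winner, shares = compare(sample)
print(winner, shares)
```

Comparing shares rather than raw counts keeps the result fair when the two candidates have different numbers of tweets.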

Features

  • Data Preprocessing: Cleans tweet text by removing URLs, mentions, and special characters.
  • Sentiment Analysis: Uses NLTK’s VADER to classify each tweet as positive or negative.
  • Data Visualization: Generates a line plot showing sentiment distribution between the two candidates.
  • Result Summary: Prints a message indicating the candidate with the higher positive sentiment.
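
The preprocessing and sentiment steps listed above could be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: the regular expressions, the helper names (`clean_tweet`, `label_from_compound`, `vader_label`), and the 0.0 cut-off on VADER's compound score are hypothetical choices. `vader_label` also assumes that `nltk` is installed and that the `vader_lexicon` resource has been downloaded.

```python
import re

# Patterns for the three kinds of noise named in the Features list.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Remove URLs, @mentions, and special characters, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return " ".join(text.split())


def label_from_compound(compound: float) -> str:
    """Binary label from a VADER compound score (assumed cut-off: 0.0)."""
    return "positive" if compound >= 0.0 else "negative"


def vader_label(text: str) -> str:
    """Clean the text, score it with NLTK's VADER, and return a binary label."""
    # Imported lazily so the helpers above work without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    compound = analyzer.polarity_scores(clean_tweet(text))["compound"]
    return label_from_compound(compound)
```

For example, `clean_tweet("Check this out https://t.co/abc @user #Great day!!!")` returns `"Check this out Great day"`. VADER takes punctuation such as exclamation marks into account, so stripping special characters before scoring may change the scores it produces.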

Technologies Used

  • PySpark: For data processing and transformation
  • NLTK: Sentiment analysis with VADER
  • Pandas: Data manipulation and analysis
  • Matplotlib and Seaborn: Data visualization
  • Docker: Containerization for easy setup

Prerequisites

  • Python 3.8+: Make sure you have Python installed.
  • Docker: Required to run the project in a container.
  • Twitter API: If you wish to fetch live tweets, you'll need Twitter API credentials.
  • Dependencies: See `requirements.txt` for the list of required Python packages.

Installation

1. Clone the Repository

Clone the repository and navigate into the project directory.

git clone https://github.com/aDJi2003/spark_sentiment_comparison.git
cd spark_sentiment_comparison

2. Build the Docker Image

Build the Docker image using the following command:

docker build -t sentiment_analysis_image .

3. Run the Docker Container

Run the container, mounting a local directory at /sentiment_analysis inside it:

docker run -it -v <your_local_path>:/sentiment_analysis sentiment_analysis_image
