- Clean and preprocess tweet data
- Perform sentiment analysis on tweet content
- Visualize sentiment distribution for comparison
- Determine which candidate has more positive sentiment
- Data Preprocessing: Cleans tweet text by removing URLs, mentions, and special characters.
- Sentiment Analysis: Uses NLTK’s VADER analyzer to classify each tweet's sentiment (positive/negative).
- Data Visualization: Generates a line plot comparing the sentiment distribution of the two candidates.
- Result Summary: Prints a message indicating the candidate with the higher positive sentiment.
- PySpark: For data processing and transformation
- NLTK: Sentiment analysis with VADER
- Pandas: Data manipulation and analysis
- Matplotlib and Seaborn: Data visualization
- Docker: Containerization for easy setup
- Python 3.8+: Make sure you have Python installed.
- Docker: Required to run the project in a container.
- Twitter API: If you wish to fetch live tweets, you'll need Twitter API credentials.
- Dependencies: See `requirements.txt` for the list of required Python packages.
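
As a rough illustration of the preprocessing and result-summary features listed above, the sketch below strips URLs, mentions, and special characters from a tweet and computes the share of positive labels for a candidate. It uses only the Python standard library. The function names (`clean_tweet`, `positive_share`) and the exact regular expressions are assumptions made for this example, not the project's actual code, and the sentiment labels are assumed to come from a separate scoring step such as VADER.

```python
import re

# Hypothetical patterns for illustration; the project's real rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs, @mentions, and special characters, then normalize spacing and case."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip().lower()


def positive_share(labels) -> float:
    """Return the fraction of labels equal to 'positive' (0.0 for an empty list)."""
    labels = list(labels)
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == "positive") / len(labels)


if __name__ == "__main__":
    print(clean_tweet("Great rally today! @candidate https://t.co/abc #vote"))
    # prints "great rally today vote"
```

Comparing `positive_share` for the two candidates' label lists would then give the kind of summary message described in the feature list.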
Clone the repository and navigate into the project directory.
git clone https://github.com/aDJi2003/spark_sentiment_comparasion.git
cd spark_sentiment_comparasion
Build the Docker image using the following command:
docker build -t sentiment_analysis_image .
Run the Docker Container, mounting your local path to the container using the following command:
docker run -it -v <your_local_path>:/sentiment_analysis sentiment_analysis_image