This project performs sentiment analysis on tweets, determining whether a tweet expresses a positive, neutral, or negative sentiment. It leverages natural language processing (NLP) techniques and machine learning to analyze textual data, making it a valuable tool for understanding public opinion, monitoring brand sentiment, or analyzing customer feedback.
- Text Preprocessing: Cleans the raw text data by removing special characters, converting text to lowercase, and normalizing whitespace.
- TF-IDF Vectorization: Converts text into numerical feature vectors, weighting each term by how often it appears in a tweet and discounting terms that are common across the whole dataset.
- Machine Learning Model: Utilizes a Logistic Regression classifier for sentiment prediction.
- Evaluation Metrics: Provides detailed performance evaluation, including accuracy, precision, recall, and F1-score.
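
The text preprocessing step listed above can be illustrated with a minimal sketch. The function name `clean_text` and the exact regular expressions are assumptions made for illustration; the project's script may apply different rules.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase, strip special characters, and normalize whitespace."""
    text = text.lower()
    # Assumption: "special characters" means anything other than
    # ASCII letters, digits, and whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("  I LOVE this game!!!   #awesome :) "))
# i love this game awesome
```

Replacing special characters with a space rather than deleting them keeps words such as `game!!!#awesome` from being fused into a single token.
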
- Data Loading: Reads a labeled dataset of tweets with their corresponding sentiment.
- Data Cleaning: Prepares the text for analysis by removing noise and standardizing the format.
- Label Encoding: Maps sentiment labels (Positive, Neutral, Negative) to numerical values.
- Training: Splits the data 80/20 into training and test sets and fits the model on the training portion.
- Prediction: Predicts sentiment for test data using the trained model.
- Evaluation: Reports accuracy and provides a detailed classification report.
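
The workflow above might look roughly like the following sketch. It is not the project's actual script: the inline DataFrame stands in for `twitter_training.csv`, and the column names (`text`, `sentiment`), the integer label mapping, the stratified split, and the `random_state` value are all assumptions chosen so the example runs on its own.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Data loading: hypothetical stand-in for the labeled tweet dataset.
df = pd.DataFrame({
    "text": [
        "I love this game, it is amazing!", "Great update, works perfectly",
        "Best purchase I have made this year", "Fantastic support team, very helpful",
        "So happy with the new features", "The update was released today",
        "I am downloading the patch now", "The store opens at nine",
        "This is the second version of the app", "The match starts in an hour",
        "I hate the new interface", "Terrible service, very disappointed",
        "The app keeps crashing, awful", "Worst experience ever",
        "This bug is so annoying",
    ],
    "sentiment": ["Positive"] * 5 + ["Neutral"] * 5 + ["Negative"] * 5,
})

# Data cleaning: lowercase, drop special characters, normalize whitespace.
df["clean"] = (
    df["text"].str.lower()
    .str.replace(r"[^a-z0-9\s]", " ", regex=True)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

# Label encoding: the specific integers are an assumption.
label_map = {"Negative": 0, "Neutral": 1, "Positive": 2}
df["label"] = df["sentiment"].map(label_map)

# 80/20 split; stratifying keeps every class in the tiny test set.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    df["clean"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Training and prediction.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation: accuracy plus per-class precision, recall, and F1-score.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(
    y_test, y_pred, labels=[0, 1, 2],
    target_names=["Negative", "Neutral", "Positive"], zero_division=0,
))
```

With only 15 toy rows the reported scores carry no meaning; the point is the order of the steps, in particular fitting the vectorizer on the training split alone so that no information from the test tweets leaks into the features.
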
- Python 3.x
- Libraries:
  - pandas
  - numpy
  - scikit-learn
  - re (part of the Python standard library, no installation needed)
- Clone the repository: `git clone https://github.com/your-username/sentiment-analysis.git`, then `cd sentiment-analysis`
- Install dependencies: `pip install -r requirements.txt`
- Run the script: `python sentiment_analysis.py`
- Add the dataset (`twitter_training.csv`) to the project directory.
- Incorporate additional preprocessing like removing stop words or stemming.
- Try other machine learning models (e.g., SVM, Random Forest) or deep learning models (e.g., LSTMs, Transformers).
- Expand the dataset to improve model accuracy and generalizability.
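
As a rough illustration of two of the ideas above, the sketch below adds stop-word removal through scikit-learn's built-in English list and swaps in a linear SVM. The sample texts and labels are made up for the example, and stemming is left out because it would need an additional library.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data, not taken from the project's dataset.
texts = [
    "I love the new update, it is great",
    "The support team was great and helpful",
    "The app keeps crashing, it is awful",
    "I hate the awful new interface",
    "The patch was released on Monday",
    "The store opens at nine tomorrow",
]
labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]

# stop_words="english" drops common words such as "the" and "is"
# before the TF-IDF weights are computed.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LinearSVC(),
)
model.fit(texts, labels)

prediction = model.predict(["what a great update"])
print(prediction)
```

Wrapping the vectorizer and classifier in one pipeline means a different model can be tried by changing a single line, which makes it easier to compare alternatives on the same features.
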