This project explores Natural Language Processing (NLP) techniques and Machine Learning models to classify sentiment in text data. It combines feature extraction methods, oversampling, and multiple classifiers to achieve robust and accurate sentiment classification. 🧠📊
- Type: Sentiment Classification
- Goal: Classify positive and negative sentiments with high accuracy.
- Techniques Used: NLP, Feature Engineering, Machine Learning, Oversampling.
The following methods were implemented to represent textual data effectively:
📚 Bag of Words (BoW):
- Focuses on word frequencies, representing text as vectors of word occurrences without considering grammar or structure.
🔍 TF-IDF (Term Frequency-Inverse Document Frequency):
- Combines Term Frequency (TF) and Inverse Document Frequency (IDF) using logarithmic scaling to emphasize the importance of words relative to both the document and the entire dataset.
🤖 Word2Vec:
- Creates dense word vectors by learning semantic relationships and associations from a large corpus of text.
🧠 GloVe (Global Vectors):
- Captures global word co-occurrences using pre-trained 100-dimensional word vectors to enhance the understanding of word meanings and contexts.
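A GloVe file stores one token per line followed by its vector components. The sketch below parses a tiny inline 3-dimensional example into a lookup table; the project uses the pre-trained 100-dimensional vectors in the same format:

```python
import io
import numpy as np

# Inline stand-in for a GloVe .txt file (values are made up for illustration)
glove_txt = io.StringIO(
    "good 0.1 0.2 0.3\n"
    "bad -0.1 -0.2 -0.3\n"
)

embeddings = {}
for line in glove_txt:
    token, *values = line.split()
    embeddings[token] = np.asarray(values, dtype="float32")

vec = embeddings["good"]  # 3-dimensional vector in this toy example
```

With the real file, each text can be embedded by averaging the vectors of its in-vocabulary tokens.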
- To address class imbalance, SMOTE was used to generate synthetic samples for the underrepresented class.
- This ensures a more balanced dataset, improving the model's ability to classify minority sentiments accurately.
Three models were evaluated for sentiment classification:
- 📚 Naive Bayes
- 🌲 Random Forest
- ⚡ XGBoost (Extreme Gradient Boosting)
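As a minimal sketch of the training loop, here are two of the three classifiers wired into scikit-learn pipelines on a toy labelled corpus (the texts and labels are illustrative; XGBoost would slot in the same way via the separate `xgboost` package's `XGBClassifier`):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled corpus purely for illustration (1 = positive, 0 = negative)
texts = ["great movie", "awful film", "loved it", "hated it"]
labels = [1, 0, 1, 0]

for clf in (MultinomialNB(), RandomForestClassifier(random_state=42)):
    pipe = make_pipeline(TfidfVectorizer(), clf)  # vectorise, then classify
    pipe.fit(texts, labels)
    print(type(clf).__name__, pipe.predict(["great film"]))
```

Keeping vectoriser and classifier in one pipeline ensures the same vocabulary is used at train and predict time.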
Baseline Performance:
- Accuracy: 91.16%
- Precision: 0.88
- Recall: 0.78
- F1-Score: 0.82
TF-IDF + SMOTE with XGBoost:
- Achieved the highest performance:
- Accuracy: 92.71%
- Precision, Recall, and F1-Score: 0.93
- ROC-AUC: 0.98 🎉
Other configurations (lower performance):
- Accuracy: 87.51%
- Highlighted limitations in capturing nuanced sentiment for the underrepresented class.
Standard classification metrics were used:
✅ Accuracy
✅ Precision
✅ Recall
✅ F1-Score
✅ ROC-AUC
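All of these metrics are available in `sklearn.metrics`; the labels and scores below are hypothetical, just to show the calls:

```python
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
)

# Hypothetical ground truth and predicted probabilities
y_true = [0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]

# Threshold the probabilities at 0.5 to get hard class predictions
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_score))  # uses scores, not labels
```

Note that ROC-AUC is computed from the continuous scores, while the other four metrics use the thresholded predictions.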
- Explore deep learning models like BERT for enhanced text representation and sentiment analysis.
- Investigate techniques such as SHAP and LIME for better model interpretability.
- Experiment with additional NLP techniques and embeddings to refine performance further.
- Clone the repository:
git clone https://github.com/RanaPrince/sentiment-classification.git
- Install the dependencies:
pip install -r requirements.txt
Run the model scripts and experiment with feature extraction techniques.
Feel free to reach out for questions, collaborations, or feedback!
- GitHub: RanaPrince
- LinkedIn: Prince Rana
- Email: [email protected]