GitHub - vfeng6704/Yelp-NLP-Deep-Learning: LSTM / SHAP / NLP / Sentiment Analysis

Executive Summary

Understanding customer sentiment and its causes at scale is crucial for building a competitive advantage in business. This project applies deep learning models to predict McDonald’s Yelp review ratings and uses explainable AI techniques to identify key words that influence these ratings.

Problem Definition

Accurate sentiment analysis of customer reviews is key for gaining insights into customer satisfaction and improving business strategies. The challenge is to predict review ratings based on text data. This project leverages natural language processing (NLP) techniques and machine learning models to predict Yelp review ratings and understand the factors influencing these ratings.

Data Sources

Yelp Dataset: Contains ~17K reviews of McDonald's spanning from 2007 to 2022.
Additional Data: Text data processed using various NLP techniques to extract meaningful features for modeling.

Methodology

Data Cleaning:
- Removed special characters, punctuations, and white space.
- Tokenization: Split text into individual tokens.
- Stop Words: Removed common words that may not carry significant meaning.
- Lemmatization: Reduced words to their base or root form.
Modeling:
- Feature Engineering: Created features using TF-IDF and focused on binary classification for 1 and 5 star reviews.
- Model Selection: Evaluated logistic regression, Naïve Bayes, and deep learning models (including LSTM).
- Performance Metrics: Accuracy, F1 Score, Recall, and Precision were used to evaluate model accuracy.

Lessons Learned

NLP: Text data requires extensive preprocessing and feature extraction before modeling. Deep contextual understanding (e.g., BERT) can improve model performance but may have computational constraints.
Deep Learning: Explainable AI (XAI) techniques provide interpretability, but results from Lime and SHAP were less insightful than expected.

Future Work

Improve Overfitting: Despite regularization and reducing model complexity, overfitting persisted.
Aspect-Based Sentiment Analysis (ABSA): Prioritize sentiment analysis related to specific aspects such as food quality and service.
Sentiment Trend Analysis: Analyze sentiment trends over time and correlate them with external factors like new menu items or promotional campaigns.
Dataset Expansion: Expand the dataset to include reviews from other platforms like Google Reviews and Reddit.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
ML Final Project_VF.pdf		ML Final Project_VF.pdf
Machine Learning_Final_Project_NLP.ipynb		Machine Learning_Final_Project_NLP.ipynb
README.md		README.md
requirements		requirements

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Executive Summary

Problem Definition

Data Sources

Methodology

Lessons Learned

Future Work

About

Releases

Packages

Languages

vfeng6704/Yelp-NLP-Deep-Learning

Folders and files

Latest commit

History

Repository files navigation

Executive Summary

Problem Definition

Data Sources

Methodology

Lessons Learned

Future Work

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages