
To understand the sentiment of Japanese Whisky brand through reviews and summarize reviews with TF-IDF.


jacquessham/JapaneseWhiskyReviews


Japanese Whisky Reviews

A Japanese Whisky Review data set is available on Kaggle; the data originates from Master of Malt. I am interested in doing some NLP work on this data set.

I will analyze the sentiment of the reviews and try to summarize each individual review.

The current version is 2.1.0, released on 17 July 2023. You can find previous versions in the Archive folder.

Tools

In this project, I use packages such as scikit-learn, vaderSentiment, and nltk for sentiment scores and TF-IDF. The results are displayed on a dashboard built with Plotly Dash. Starting from Version 2.0.0, the dashboard is dockerized and hosted with Docker.

Data set

The data set can be found on Kaggle. The data is downloaded to japanese_whisky_review for the dashboard to read. It consists of 4 columns: bottle label, brand name, review title, and review content. The data set covers only 4 Japanese whisky brands: Yamazaki, Hibiki, Hakushu, and Nikka.

Dashboard

The dashboard consists of two parts: Sentiment Analysis and TF-IDF Analysis (the core meaning of a posted comment). The Sentiment Analysis part is a static box plot of the sentiment score distribution by whisky brand. The bottom part has 4 tabs, one for each whisky brand. Click on a brand and the dashboard randomly picks a comment and displays its core meaning.

The dashboard looks like this:



Since the dashboard is dockerized, you host it with Docker and access it through the running container.

How to Run the Dashboard?

First, build the Docker image from the Dockerfile, which installs all required dependencies. Then, create and run a container.

# Build Docker Image
docker build -t japanese_whiskies .

# The Image name is now japanese_whiskies
# Run and create a container "jpn_whiskies_dashboard"
docker run -h localhost -p 9002:9000 -d --name jpn_whiskies_dashboard japanese_whiskies 

Once the dashboard is ready, you may access it at 127.0.0.1:9002

Technical Explanation

Sentiment Analysis

We use vaderSentiment to calculate a sentiment score for each review. Plotly then visualizes the range of sentiment scores for each brand as a box plot, which is rendered on the dashboard. It looks like this.


From the box plot, we can see that reviewers generally have a positive view of Japanese whiskies, with a better impression of Nikka and Hibiki. Interestingly, the median sentiment score for Yamazaki is 0, which means neutral.
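
To make the reading of the box plot concrete, here is a minimal sketch of how per-brand medians and quartiles could be derived from sentiment scores, and how a median of 0 maps to "neutral". It assumes each review already has a VADER compound score in [-1, 1]. The function names `label` and `summarize_by_brand` are hypothetical, the scores are placeholders rather than values from the data set, and the ±0.05 cutoffs are the conventional thresholds suggested in the vaderSentiment documentation, not something this project is known to use.

```python
from collections import defaultdict
from statistics import quantiles


def label(compound):
    """Map a compound score in [-1, 1] to a coarse label (conventional cutoffs)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def summarize_by_brand(rows):
    """Group (brand, score) pairs and compute the quartiles a box plot shows."""
    by_brand = defaultdict(list)
    for brand, score in rows:
        by_brand[brand].append(score)

    summary = {}
    for brand, scores in by_brand.items():
        # n=4 yields the three cut points: Q1, median, Q3.
        q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
        summary[brand] = {"q1": q1, "median": q2, "q3": q3, "label": label(q2)}
    return summary


# Placeholder scores for illustration only; NOT taken from the data set.
rows = [
    ("Yamazaki", -0.4), ("Yamazaki", 0.0), ("Yamazaki", 0.0), ("Yamazaki", 0.6),
    ("Nikka", 0.3), ("Nikka", 0.5), ("Nikka", 0.7), ("Nikka", 0.9),
]
summary = summarize_by_brand(rows)
for brand, stats in summary.items():
    print(brand, stats)
```

Quartile conventions differ between libraries, so the box edges drawn by Plotly may not match these numbers exactly; the median is the same either way.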

TF-IDF Analysis

The second task is to build a model that summarizes a review by displaying its top 5 keywords. The script uses TfidfVectorizer from sklearn.feature_extraction.text to build the model. To preprocess the text, I used the same package to remove English stop words and nltk to stem the words.
jpwhisky_review_tfidf.py is the backend, and the dashboard script viz.py (which is run by Docker) invokes the implementation and displays the result.
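
The idea behind the keyword summary can be sketched from scratch as follows. This is not the project's code: it replaces TfidfVectorizer with a hand-written TF-IDF using the smoothed formula idf = ln((1 + n) / (1 + df)) + 1, uses a tiny made-up stop-word list, and skips stemming. The names `tokenize` and `top_keywords` and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; an assumption, not scikit-learn's English list.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "this",
              "with", "very", "but", "for", "in", "i"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_keywords(docs, doc_index, k=5):
    """Return the k terms with the highest TF-IDF score in docs[doc_index]."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many reviews each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency in the chosen review, weighted by smoothed IDF.
    tf = Counter(tokenized[doc_index])
    scores = {
        term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }

    # Sort by score descending, then alphabetically to make ties deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Made-up reviews for illustration only; NOT taken from the data set.
docs = [
    "Smooth and fruity with a long smoky finish",
    "Smooth but expensive",
    "A smooth dram, great value for the price",
]
print(top_keywords(docs, 0))
```

Because "smooth" appears in every sample review, its IDF is the lowest and it ranks behind the words that are specific to one review, which is the behaviour that makes TF-IDF useful for picking out the core meaning of a comment.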

Files

Here are the files to run the dashboards:

viz.py

The driver file to construct the dashboard and backend. When you run the Docker container, it will automatically run this driver script.

jpwhisky_review_sentiment.py

The helper script to calculate sentiment scores in the backend.

jpwhisky_review_tfidf.py

The helper script to calculate TF-IDF scores in the backend.

Note: stop_words is deprecated as of scikit-learn v0.22. The current version of the dashboard has been upgraded to the latest scikit-learn, and all stop-word references now use _stop_words.

viz_helper Folder

The framework to render Plotly visualizations. The blueprint comes from the DashExamples repository, with some modifications.
