Classifying Japanese Whisky Reviews Using TF-IDF and Naive Bayes
Given a CSV file with 200 Japanese whisky reviews labeled "positive" or "negative" and 900 unlabeled reviews, let's train a Naive Bayes model on TF-IDF features so we can predict labels for the remaining 900.
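To make the approach concrete, here is a minimal pure-Python sketch of the idea: turn each review into an L2-normalized TF-IDF vector, fit a multinomial Naive Bayes model on the labeled vectors, then score the unlabeled ones. It is not the notebook's actual code. The tokenizer, the smoothed IDF formula (the one scikit-learn's `TfidfVectorizer` uses by default), the smoothing parameter `alpha=1.0`, and the toy reviews are all assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer).
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    # Smoothed IDF, the same formula scikit-learn's TfidfVectorizer uses by default:
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    # Raw term counts times IDF, then L2-normalized; words never seen in training are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train_nb(vectors, labels, alpha=1.0):
    # Multinomial Naive Bayes fitted on (fractional) TF-IDF weights with additive smoothing.
    vocab = {t for vec in vectors for t in vec}
    class_counts = Counter(labels)
    weight = defaultdict(Counter)
    for vec, label in zip(vectors, labels):
        weight[label].update(vec)  # sums TF-IDF weights per class and term
    log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(weight[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((weight[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_lik


def predict(vec, log_prior, log_lik):
    # Choose the class with the highest log posterior (up to a shared constant).
    scores = {
        c: log_prior[c] + sum(w * log_lik[c].get(t, 0.0) for t, w in vec.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy reviews standing in for the Kaggle CSV.
labeled = [
    ("Smooth, rich and fruity. Excellent whisky.", "positive"),
    ("Lovely honey notes and a long finish.", "positive"),
    ("Harsh and bitter, a waste of money.", "negative"),
    ("Thin, watery and disappointing.", "negative"),
]
unlabeled = ["Rich honey and fruity finish", "Bitter, watery, disappointing"]

texts, labels = zip(*labeled)
idf = fit_idf(texts)
log_prior, log_lik = train_nb([tfidf(t, idf) for t in texts], labels)
predictions = [predict(tfidf(t, idf), log_prior, log_lik) for t in unlabeled]
print(predictions)  # → ['positive', 'negative']
```

In practice, scikit-learn's `TfidfVectorizer` and `MultinomialNB` cover the same steps with far less code. One design choice worth noting: IDF needs no labels, so it could also be fitted on all 1,100 reviews rather than only the 200 labeled ones.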
Note: The dataset was not great. It is small, has lots of typos, and contains many negative reviews that use positive sentiment words (e.g. "not that good"). Still, it served as good practice for cleaning data, Naive Bayes, vectorizers, working with DataFrames, and even a little SQL at the end.
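Reviews like "not that good" trip up a bag-of-words model because the token "good" counts toward the positive class. One common heuristic, sketched below and not necessarily anything this notebook does, is negation marking: every word after a negation cue gets a `NOT_` prefix until the next punctuation mark, so "good" and "NOT_good" become separate features. The cue list and punctuation set are hypothetical and incomplete.

```python
import re

# Hypothetical, deliberately incomplete list of negation cues.
NEGATIONS = {"not", "no", "never", "isn't", "wasn't", "don't", "didn't", "doesn't"}
PUNCTUATION = set(".,!?;:")


def mark_negation(text):
    # Split into lowercase words and single punctuation marks.
    tokens = re.findall(r"[a-z']+|[.,!?;:]", text.lower())
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False  # negation scope ends at punctuation; the mark itself is dropped
        elif tok in NEGATIONS:
            out.append(tok)
            negating = True
        else:
            out.append("NOT_" + tok if negating else tok)
    return out


print(mark_negation("Not that good, but the finish is nice."))
# → ['not', 'NOT_that', 'NOT_good', 'but', 'the', 'finish', 'is', 'nice']
```

The marked tokens can be joined back into a string before vectorizing. Adding word bigrams (e.g. `ngram_range=(1, 2)` in scikit-learn's vectorizers) is a simpler alternative, though it only captures negations adjacent to the sentiment word.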
This dataset was downloaded from Kaggle.