Product Review Analysis

Overview:

This application scrapes a collection of product reviews from a set of web pages, preprocesses the data, and evaluates the performance of different classifiers on two related text classification tasks:

predicting review sentiment
predicting review helpfulness

The implementation and an explanation of the procedure are split across two Jupyter notebooks.

Task 1 is completed in the first notebook, Task1.ipynb, while Tasks 2 and 3 are completed in the second notebook, Task2_Task3.ipynb.

Task 1. Data Collection

Scrape the complete set of web pages from your personal website address:

http://mlg.ucd.ie/modules/python/assign2/<STUDENT_NUMBER>/

From the web pages above, parse every review across all years from 2016 to 2021. For each product review, extract the following information:

i) The star rating of the review

ii) The title text of the review

iii) The main body text of the review

iv) Review helpfulness information

Store the parsed review data in an appropriate format.
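
The parsing and storage steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the code from Task1.ipynb: the markup in SAMPLE_HTML, the class names (review, rating, title, body, helpful), and the choice of CSV as the storage format are all assumptions, and the real pages may be structured differently. It also assumes each field element contains plain text with no nested tags.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical class names; the real review pages may use different markup.
FIELDS = ("rating", "title", "body", "helpful")


class ReviewParser(HTMLParser):
    """Collect one dict per <div class="review"> block."""

    def __init__(self):
        super().__init__()
        self.reviews = []     # completed reviews
        self._current = None  # review currently being filled in
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if css_class == "review":
            self._current = dict.fromkeys(FIELDS, "")
        elif self._current is not None and css_class in FIELDS:
            self._field = css_class

    def handle_data(self, data):
        if self._current is not None and self._field is not None:
            self._current[self._field] += data.strip()

    def handle_endtag(self, tag):
        if self._field is not None:
            self._field = None  # closing tag of a field element
        elif tag == "div" and self._current is not None:
            self.reviews.append(self._current)
            self._current = None


def reviews_to_csv(reviews):
    """Serialise the parsed reviews as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(reviews)
    return buffer.getvalue()


# Made-up sample page used only to exercise the parser.
SAMPLE_HTML = """
<div class="review">
  <span class="rating">4</span>
  <h3 class="title">Great value</h3>
  <p class="body">Works as described.</p>
  <span class="helpful">3 out of 5 users found this review helpful</span>
</div>
<div class="review">
  <span class="rating">1</span>
  <h3 class="title">Broke quickly</h3>
  <p class="body">Stopped working after a week.</p>
  <span class="helpful">1 out of 4 users found this review helpful</span>
</div>
"""

parser = ReviewParser()
parser.feed(SAMPLE_HTML)
csv_text = reviews_to_csv(parser.reviews)
```

In a real run, the HTML for each page would be downloaded first and passed to parser.feed, and csv_text would be written to a file instead of being kept in memory.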

Task 2. Review Sentiment Classification

Load the data from Task 1 and create a set of documents, one per review. Each document should consist of the concatenation of the review's title and body text.
Assign a class label ("positive" or "negative") to each review. We will assume that 1-star to 3-star reviews are "negative", and 4-star to 5-star reviews are "positive".
Apply preprocessing steps to create a numeric representation of the documents, suitable for classification.
Build two different binary classification models using two classifiers, to distinguish between "positive" and "negative" reviews.
Compare the performance of the classification models using an appropriate evaluation strategy. Report and discuss the evaluation results.
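
The first three steps of Task 2 (document creation, labelling, and numeric representation) can be illustrated with a small standard-library sketch. The function names, the simple regular-expression tokenizer, and the raw term-count representation are assumptions made for illustration, not the approach taken in Task2_Task3.ipynb. In practice a library vectorizer such as scikit-learn's CountVectorizer or TfidfVectorizer would normally replace bag_of_words.

```python
import re
from collections import Counter


def make_document(title, body):
    """Concatenate the review title and body into one document."""
    return f"{title} {body}".strip()


def sentiment_label(stars):
    """4-star and 5-star reviews are positive; 1-star to 3-star are negative."""
    return "positive" if stars >= 4 else "negative"


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    index = {term: i for i, term in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        matrix.append(row)
    return vocabulary, matrix


# Made-up reviews used only to exercise the functions.
reviews = [
    {"rating": 5, "title": "Great value", "body": "Works great"},
    {"rating": 2, "title": "Poor", "body": "Stopped working"},
]

documents = [make_document(r["title"], r["body"]) for r in reviews]
labels = [sentiment_label(r["rating"]) for r in reviews]
vocabulary, matrix = bag_of_words(documents)
```

The resulting matrix and labels are in the shape most classifiers expect: one row of features and one class label per review.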

Task 3. Review Helpfulness Classification

Assign a class label (“helpful” or “unhelpful”) to each review from Task 2, based on its associated helpfulness information.
Build two different binary classification models using two classifiers, to distinguish between “helpful” and “unhelpful” reviews.
Compare the performance of the classification models using an appropriate evaluation strategy.
Based on the evaluation results from both Tasks 2 and 3, compare and discuss the differences in performance for the two classification tasks (i.e. sentiment and helpfulness classification).
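
A sketch of the helpfulness labelling and of a shared evaluation function follows. The task description does not specify the format of the helpfulness information or the labelling rule, so both are assumptions here: the text is assumed to look like "3 out of 5 users found this review helpful", and a review is treated as "helpful" when at least half of the voters found it so (the hypothetical threshold parameter). Computing the same metrics for both tasks is one way to make the sentiment and helpfulness results directly comparable.

```python
import re


def helpfulness_label(info, threshold=0.5):
    """Map text like '3 out of 5 ...' to 'helpful' or 'unhelpful'.

    Returns None when no vote counts can be read or nobody voted.
    The text format and the threshold are assumptions.
    """
    match = re.search(r"(\d+)\s+out of\s+(\d+)", info)
    if match is None:
        return None
    helpful, total = int(match.group(1)), int(match.group(2))
    if total == 0:
        return None
    return "helpful" if helpful / total >= threshold else "unhelpful"


def evaluate(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels and predictions used only to exercise the functions.
true_labels = ["helpful", "unhelpful", "helpful", "unhelpful"]
predicted = ["helpful", "helpful", "unhelpful", "unhelpful"]
scores = evaluate(true_labels, predicted, positive="helpful")
```

Because helpfulness classes are often imbalanced, reporting precision, recall, and F1 alongside accuracy gives a fairer comparison between the two tasks than accuracy alone.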
