
Shaked-g/goodreads


Goodreads first-year ML practice project

In this project I used KNN and Random Forest to check whether the average_rating of a book on Goodreads is related to:

  1. the language of the book
  2. num_pages
  3. ratings_count
  4. text_reviews_count
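As a rough illustration of the setup described above, the sketch below fits both models on the four features. It is a minimal sketch, not the project's actual code: the data is synthetic, the column name `language_code` is an assumed name for the book's language, and binning `average_rating` into a hypothetical `rating_class` column is an assumption made so that the problem can be treated as classification.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the Goodreads table (NOT real data).
books = pd.DataFrame({
    "language_code": rng.choice(["eng", "spa", "fre"], size=n),  # assumed column name
    "num_pages": rng.integers(50, 1200, size=n),
    "ratings_count": rng.integers(0, 100_000, size=n),
    "text_reviews_count": rng.integers(0, 5_000, size=n),
    "average_rating": rng.uniform(1.0, 5.0, size=n).round(2),
})

# Assumption: turn the continuous rating into classes by rounding to the nearest integer.
books["rating_class"] = books["average_rating"].round().astype(int)

# Encode the language as integer category codes so both models can consume it.
X = books[["num_pages", "ratings_count", "text_reviews_count"]].copy()
X["language_code"] = books["language_code"].astype("category").cat.codes
y = books["rating_class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    # KNN is distance based, so the features are scaled first.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

print(scores)
```

Because the synthetic features are random, the accuracies printed here say nothing about the real dataset; the block only shows the shape of the workflow.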

During the project I ran into an imbalanced dataset and used an up-sampling method to try to solve the problem.
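One common way to up-sample minority classes is to draw rows from them with replacement until every class matches the largest one. The sketch below does this with `sklearn.utils.resample`; the toy data and the `rating_class` column name are assumptions for illustration, and the project may have used a different up-sampling method.

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced table (NOT real data): 8 rows of class 4, 3 of class 3, 1 of class 2.
df = pd.DataFrame({
    "num_pages": [120, 340, 560, 210, 95, 400, 310, 275, 180, 620, 450, 88],
    "rating_class": [4] * 8 + [3] * 3 + [2] * 1,  # hypothetical label column
})

# Size of the largest class; every smaller class is grown to this size.
majority_size = df["rating_class"].value_counts().max()

parts = []
for label, group in df.groupby("rating_class"):
    if len(group) < majority_size:
        # Sample with replacement so a small class can reach the target size.
        group = resample(
            group, replace=True, n_samples=majority_size, random_state=0
        )
    parts.append(group)

balanced = pd.concat(parts).reset_index(drop=True)
print(balanced["rating_class"].value_counts())
```

When up-sampling like this, it is usually safer to resample only the training split, so that duplicated rows do not leak into the test set.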

The project is divided into the following sections:

1. Data Overview

2. Data Cleaning

3. Data Adjusting

4. Applying Machine Learning Model

4.1. K-nearest neighbors

5. The Problem

6. Random Forest

7. Up-Sampling the minority classes as a solution

8. KNN after re-sampling and comparison