Netflix Movie Recommender

Netflix Prize data was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films

tl&dr

In this project, we calculate one person's rating for one movie he has not seem and make recommendation for top N movies for him.
We use collaborative filtering based on the similarity between items calculated using people’s rating of those items.

example output

based on the user rating files for 100 movie, we generate a recommendation list for the user about top 5 movies for them to watch next.

we predict user id 2351607 is recommend with the following movie in ascending order:
- Lilo and Stitch
- 7 Seconds
- Never Die Alone
- Mostly Martha
- What the #$*! Do We Know!?

how the code works

Pre-processing of Netflix raw data
Data processing
Build co-occurrence matrix
Build rating matrix
Generate top N recommend list for user based on his view history and rating

Pre-Processing

Netflix Prize data raw data has the following format:

MovieID:
UserID, Rating, Date

Movie ID range from 1-17770
Customer ID range from 1 to 2649429
Rating from 1-5
in this project, I don't take account of the dates.

For prove of concept, I first process the data so I only have user rating data from 100 movie, so my input is 100 files that has the following format

UserID, MovieID, Rating

Job 1: Data Processing

DataDividerByUser.java

input args:

arg0: input path from pre-processed data (user rating data from 100 movies)
arg1: output path job1

job 1 output format

UserID movieID: rating, movieID: rating, movieID: rating

Job 2: Build co-occurrence matrix

CoOccurrenceMatrixGenerator.java.
calculate number of time two movies are seem together.

input args:

arg0: output path from job1
arg1: output path for job2

job2 output format

movieID: movieID  num_time_seem_together

ex: movieID_100 and movie_ID_1 has been seem together 8 times.

Job 3: Build rating matrix

Multiplication.java

From user’s rating for movie he has seen, predict his score on movies he hasn’t seem based on the relationship between the 2 movies.
normalize the original input by multiply use rating matrix (original input) with co-occurence matrix (job2_output)
- For example:
  - user rate movie_1 with rating “3”, movie_1 and movie_2 has co-occurance of 10 times.
  - movie_2 has total occurance of 121.
  - so for user1, the likelihood of watching movie_2 is 3*10 / 121

input args

arg0: output from job2
arg1: input for job1 (user rating file for 100 movies)
arg2: output path for job3

how job3 works

Build movieRelation map from ouput of job2

input: 
movie_A:movie_B \t num_time_seem_together

output:
{ movie_1 : {movie_1, movie_2, num_time_seem_together}, {movie_1, movie_2, num_time_seem_together}}

use movieRelationMap to build denominatorMap

{ movie_1, totalOccurance}

mapper job to transform above output to

user_id:movie_id dividedScore

formula for divided score:

divided_score for user on movie_2= (user rating on movie_1) *(num of time movie_1 movie_2 co-ocuurence)/ (number of time movie 2 occure)

reducer job to get sum

user_id movie_id:sum(dividedScore)

job3 output format

Job4: Generate top n recommend list based on user view history and rating

RecommenderListGenerator.java
get recommendation user about movie them have not seem

input argeument

args0: userRating.txt

userId, movieId, rating

args1: movie_titles.txt

movieId, movieName

args2: number of recommendation for this user
args3:user movie relation (input from job3)

user_id movie_id:sum(dividedScore)

args4:output path for this job
args5:name of finalRecommendationResult

how job4 works

Build a map for user id and list of movie he/she has seem (alreadyWatchedList)
Filter out the alreadyWatchedList from output of job3
Replace movieid with movieName

user_name, movieName:Rating

Cumston comparator to return top N movie recommendation for user and their predicted rating for those recommended movie

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
input		input
src/main/java		src/main/java
README.md		README.md
movie_titles.txt		movie_titles.txt
pom.xml		pom.xml
sortedRecommendationList.txt		sortedRecommendationList.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Netflix Movie Recommender

tl&dr

example output

how the code works

Pre-Processing

Job 1: Data Processing

input args:

job 1 output format

Job 2: Build co-occurrence matrix

input args:

job2 output format

Job 3: Build rating matrix

input args

how job3 works

job3 output format

Job4: Generate top n recommend list based on user view history and rating

input argeument

how job4 works

job4 output format

About

Releases

Packages

Languages

ritakuo/MovieRecommender

Folders and files

Latest commit

History

Repository files navigation

Netflix Movie Recommender

tl&dr

example output

how the code works

Pre-Processing

Job 1: Data Processing

input args:

job 1 output format

Job 2: Build co-occurrence matrix

input args:

job2 output format

Job 3: Build rating matrix

input args

how job3 works

job3 output format

Job4: Generate top n recommend list based on user view history and rating

input argeument

how job4 works

job4 output format

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages