GitHub - sanketvega/Spark-Notebooks: Spark Notebook for imdb Movie Data Analysis

Spark Notebook for imdb movie data analysis

- You can fork the .snb notebook and open it in your local spark-notebook, OR
- Fork the generated Scala code and execute it in Eclipse/IntelliJ

Problem Description:

| File Name | Description / Schema |
| --- | --- |
| movies.dat | MovieID – Title – Genres |
| ratings.dat | UserID – MovieID – Rating – Timestamp |
| users.dat | UserID – Gender – Age – Occupation – ZipCode |
| README | Additional information / explanation about the above three files |

The dataset can be downloaded from: http://grouplens.org/datasets/movielens/1m/
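A minimal parsing sketch for the three data files, in Python. It assumes the fields are separated by `::` and that multiple genres are joined with `|`, as in the MovieLens 1M `.dat` files; the schema above only shows the field order. The names `Movie`, `Rating`, `User`, and the `parse_*` helpers are hypothetical and are not taken from the notebook in this repository.

```python
from typing import List, NamedTuple

SEP = "::"  # assumption: field separator used by the MovieLens 1M .dat files


class Movie(NamedTuple):
    movie_id: int
    title: str
    genres: List[str]


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: float
    timestamp: int


class User(NamedTuple):
    user_id: int
    gender: str
    age: int
    occupation: str
    zip_code: str


def parse_movie(line: str) -> Movie:
    """Parse one movies.dat line: MovieID, Title, Genres."""
    movie_id, title, genres = line.strip().split(SEP)
    # assumption: multiple genres are joined with "|"
    return Movie(int(movie_id), title, genres.split("|"))


def parse_rating(line: str) -> Rating:
    """Parse one ratings.dat line: UserID, MovieID, Rating, Timestamp."""
    user_id, movie_id, rating, timestamp = line.strip().split(SEP)
    return Rating(int(user_id), int(movie_id), float(rating), int(timestamp))


def parse_user(line: str) -> User:
    """Parse one users.dat line: UserID, Gender, Age, Occupation, ZipCode."""
    user_id, gender, age, occupation, zip_code = line.strip().split(SEP)
    # Occupation and ZipCode are kept as strings; no arithmetic is done on them.
    return User(int(user_id), gender, int(age), occupation, zip_code)
```

With a SparkContext `sc` already available (as in a notebook session), each parser can be passed straight to `map`, for example `sc.textFile("ratings.dat").map(parse_rating)`.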

You are required to write Spark code in Scala/Python to answer the following questions:

- Top ten most viewed movies, with their movie names (ascending or descending order)
- Top twenty rated movies (condition: the movie must have been rated/viewed by at least 40 users)
- Number of views of the top twenty rated movies (from the previous step) in each of the following age groups: Young (< 20 years), Young Adult (20-40 years), Adult (> 40 years)
- Top ten critics (users who have given very low ratings; condition: the user must have rated at least 40 movies)
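The four questions above could be approached along the following lines with the PySpark RDD API. This is a sketch under stated assumptions, not the notebook's actual code: it reads "top rated" as highest mean rating and "critics" as lowest mean rating, and it treats the Age field as an age in years (the dataset's README should be checked for how Age is actually encoded). The input shapes and the names `age_group`, `mean_by_key`, and `answer_questions` are hypothetical.

```python
MIN_RATINGS = 40  # threshold stated in the questions above


def age_group(age):
    """Map an age in years to one of the three groups named above."""
    if age < 20:
        return "Young"
    if age <= 40:
        return "Young Adult"
    return "Adult"


def add_rating(acc, rating):
    """Fold one rating into a running (sum, count) pair."""
    return (acc[0] + rating, acc[1] + 1)


def merge_stats(a, b):
    """Combine two (sum, count) pairs from different partitions."""
    return (a[0] + b[0], a[1] + b[1])


def mean_by_key(pairs, min_count=MIN_RATINGS):
    """RDD of (key, rating) -> RDD of (key, (mean, count)) for keys with >= min_count ratings."""
    return (pairs.aggregateByKey((0.0, 0), add_rating, merge_stats)
                 .filter(lambda kv: kv[1][1] >= min_count)
                 .mapValues(lambda s: (s[0] / s[1], s[1])))


def answer_questions(ratings, movies, users):
    """Assumed inputs: ratings is an RDD of (user_id, movie_id, rating),
    movies an RDD of (movie_id, title), users an RDD of (user_id, age)."""
    # Ten most viewed movies: count ratings per movie, then attach titles.
    views = ratings.map(lambda r: (r[1], 1)).reduceByKey(lambda a, b: a + b)
    most_viewed = (views.join(movies)                       # (movie_id, (views, title))
                        .map(lambda kv: (kv[1][1], kv[1][0]))  # (title, views)
                        .takeOrdered(10, key=lambda tv: -tv[1]))

    # Twenty highest mean ratings among movies with at least MIN_RATINGS ratings.
    top_rated = (mean_by_key(ratings.map(lambda r: (r[1], r[2])))
                 .takeOrdered(20, key=lambda kv: -kv[1][0]))
    top_ids = {movie_id for movie_id, _ in top_rated}

    # Views of those movies per age group: join ratings to users on user_id.
    views_by_age = (ratings.filter(lambda r: r[1] in top_ids)
                           .map(lambda r: (r[0], r[1]))     # (user_id, movie_id)
                           .join(users)                     # (user_id, (movie_id, age))
                           .map(lambda kv: ((kv[1][0], age_group(kv[1][1])), 1))
                           .reduceByKey(lambda a, b: a + b)
                           .collect())

    # Ten lowest mean ratings among users with at least MIN_RATINGS ratings.
    critics = (mean_by_key(ratings.map(lambda r: (r[0], r[2])))
               .takeOrdered(10, key=lambda kv: kv[1][0]))

    return most_viewed, top_rated, views_by_age, critics
```

Using `aggregateByKey` with a `(sum, count)` accumulator computes each mean in a single pass and makes the 40-rating filter a simple check on the count. The top-twenty movie IDs are collected to the driver as a small set before the per-age-group join, which keeps that join limited to the relevant ratings.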
