
Use of PySpark for Movie Similarities with the Jaccard Index

Dataset

The dataset is the MovieLens 100K Dataset from GroupLens. It contains 100,000 ratings (on a 1 to 5 scale) from 943 users on 1,682 movies and was released in April 1998. The files the app needs are included in this repository under changed names.
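A Jaccard index between movies needs, for each movie, the set of users who rated it. The app's own loading code is not shown here, so the following is a plain-Python sketch of that preprocessing step, assuming the standard MovieLens 100K `u.data` layout (user ID, movie ID, rating, timestamp, separated by tabs). The function name `users_per_movie`, its `min_rating` filter, and the sample lines are illustrative assumptions, not part of the app.

```python
# Hypothetical preprocessing sketch (plain Python, not the app's PySpark code).
# Assumes the MovieLens 100K u.data layout: user_id \t movie_id \t rating \t timestamp.
from collections import defaultdict


def users_per_movie(lines, min_rating=0):
    """Map each movie ID to the set of user IDs who rated it at least min_rating."""
    movies = defaultdict(set)
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        user_id, movie_id, rating = int(fields[0]), int(fields[1]), int(fields[2])
        if rating >= min_rating:
            movies[movie_id].add(user_id)
    return dict(movies)


# Illustrative input lines in the u.data format.
sample = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "196\t302\t4\t881250950",
]
print(users_per_movie(sample))
```

In PySpark the same grouping would typically be done on an RDD or DataFrame so it scales to the full file; the plain-Python version only shows the shape of the data.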

Requirements

  • PySpark

Example Usage

To find movies similar to 'Star Wars (1977)', pass its movie ID (50) as the argument:

spark-submit movie-similarites.py 50
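The Jaccard index of two movies is |A ∩ B| / |A ∪ B|, where A and B are the sets of users who rated each movie. The sketch below ranks movies by that score against a target ID, which is roughly what the command above asks for with ID 50. It is a plain-Python illustration, not the script itself: the helper names `jaccard` and `most_similar`, the `top_n` parameter, the tie-breaking rule, and the toy data are all assumptions, and the real app may apply extra filtering (for example a minimum number of shared raters).

```python
# Hypothetical sketch of Jaccard-based movie similarity (plain Python, not the app's PySpark code).


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target_id, movie_users, top_n=10):
    """Rank other movies by the Jaccard index of their rater sets against target_id."""
    target = movie_users[target_id]
    scores = [
        (movie_id, jaccard(target, users))
        for movie_id, users in movie_users.items()
        if movie_id != target_id
    ]
    # Highest similarity first; break ties by movie ID for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Toy data: movie ID -> set of user IDs who rated it (made up for illustration).
movie_users = {
    50: {1, 2, 3, 4},
    172: {1, 2, 3},
    181: {2, 3, 4, 5},
    1: {6, 7},
}
print(most_similar(50, movie_users, top_n=2))  # [(172, 0.75), (181, 0.6)]
```

Because the index only looks at who rated a movie, not how they rated it, movies watched by the same audience score high even if that audience liked one and disliked the other.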