A recommendation system by Mau Hernandes, Ph.D. - September 2020
Item Recommendation System for users of a (web) app.
Given the artificial files `item_history.tsv`, `user_master.tsv` and `target_users.tsv` (see the 'how to download' section), the goal is to build a recommendation system that maximizes the accuracy of the recommendations while keeping a certain level of variety (entropy) in the selection.
This repo contains 3 main components:
- a PDF report with the key concepts and ideas for this project,
- Jupyter notebooks going through the steps of building the system, and
- a few Python files with functions and classes used by some of the notebooks.
The final recommendation model is in the notebook called *5. Model*.
First check the PDF for an overview of the system. Start with the abstract and introduction, but then feel free to jump to the last section, 'Our Results', which presents numerical results on the performance of the system. Then open the last Jupyter notebook (The Model) to see the recommender system in action.
- Spark 3.0.0
- Python 3.8
- tested on Ubuntu 20
A quick tutorial on how to install Spark on Ubuntu: https://medium.com/solving-the-human-problem/installing-spark-on-ubuntu-20-on-digital-ocean-in-2020-a7e4b5b65ffb
- Cleaner: for cleaning the data for training and testing used by the other notebooks.
- Feature Exploration: a few histograms of the distribution of some of the features in the `user_master.tsv` file.
- ALS Training: an Alternating Least Squares training notebook for collaborative filtering, including grid search and evaluation with nDCG.
- kMeans: a k-means training notebook for content-based filtering, including grid search and evaluation with nDCG.
- Evaluation: a notebook for quickly trying and evaluating different models.
- Model: the notebook with the recommendation system proposed by this project, a mix of ALS + k-means.
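As a rough illustration of the ALS + k-means mix, the final ranking can be thought of as a weighted blend of the two models' scores per item. This is a minimal sketch, not the project's actual implementation: the function name `blend_scores` and the weight `alpha` are hypothetical, and the real mixing strategy is described in the PDF report.

```python
def blend_scores(als_scores, kmeans_scores, alpha=0.7, k=10):
    """Rank items by a weighted mix of collaborative (ALS) and
    content-based (k-means) scores; `alpha` weights the ALS side."""
    items = set(als_scores) | set(kmeans_scores)
    mixed = {
        item: alpha * als_scores.get(item, 0.0)
              + (1 - alpha) * kmeans_scores.get(item, 0.0)
        for item in items
    }
    # Highest blended score first; keep only the top-k items
    return sorted(mixed, key=mixed.get, reverse=True)[:k]
```

Blending a content-based signal into the collaborative ranking is one simple way to trade a little accuracy for more variety in the selection.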
The PDF file is an overview of the methods and technologies applied in the development of the recommendation system. It includes some mathematical discussion of key concepts, some plots and tables from our benchmark tests, and an extensive description and diagrams of how our system works.
- model:
  - als_trainer.py: Convenience functions for cleaning the data and training an ALS model
  - kmeans_trainer.py: Convenience functions for cleaning the data and fitting a k-means cluster
- utils:
  - evaluate.py: Contains our `evaluate` function to benchmark different models
  - make_Y.py: File for cleaning the data to make training and testing data
  - metrics.py: Functions to calculate the nDCG metric
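For reference, the nDCG metric used in the evaluations can be sketched in plain Python with binary relevance (1 if a recommended item appears in the user's held-out history). This is a minimal sketch, not the exact implementation in `metrics.py`.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevance at rank i is discounted by log2(i + 2)
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(recommended, held_out, k=10):
    # Binary relevance against the held-out items, truncated at rank k
    rels = [1.0 if item in held_out else 0.0 for item in recommended[:k]]
    # Ideal ranking: all relevant items at the top of the list
    ideal = [1.0] * min(len(held_out), k)
    idcg = dcg(ideal)
    return dcg(rels) / idcg if idcg else 0.0
```

A perfect ranking scores 1.0; relevant items pushed further down the list are penalized logarithmically.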
A dataset for the code can be downloaded here
- data: Contains all the data (`.tsv` files) that the different functions and methods read and write to.
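The notebooks read these `.tsv` files with Spark, but for a quick look at the data outside of Spark a minimal stdlib reader is enough. This sketch assumes the files have a header row; the column names used in practice come from the files themselves.

```python
import csv

def read_tsv(path):
    # Minimal reader for the repo's tab-separated files; assumes a header row,
    # and returns one dict per data row keyed by the header's column names.
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))
```

This is handy for spot-checking a few rows before spinning up a Spark session.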