# Evaluate

In this directory, notebooks are provided to illustrate evaluating models using various performance measures that can be found in reco_utils.

| Notebook | Description |
| --- | --- |
| evaluation | Examples of different rating and ranking metrics in Python+CPU and PySpark environments. |
| comparison | Example of comparing different algorithms on both rating and ranking metrics. |

Two approaches for evaluating model performance are demonstrated along with their respective metrics; a short usage sketch follows the list below.

1. Rating metrics: these evaluate how accurately a recommender predicts the ratings that users gave to items.
   - Root Mean Square Error (RMSE) - a measure of the average error in the predicted ratings.
   - R Squared (R2) - the proportion of the total variation in the ratings that is explained by the model.
   - Mean Absolute Error (MAE) - similar to RMSE, but averages the absolute value of the errors rather than taking the root of the mean squared error.
   - Explained Variance - how much of the variance in the data is explained by the model.
2. Ranking metrics: these evaluate how relevant the recommendations are for users.
   - Precision - the proportion of recommended items that are relevant.
   - Recall - the proportion of relevant items that are recommended.
   - Normalized Discounted Cumulative Gain (NDCG) - evaluates how well the predicted items for a user are ranked based on relevance.
   - Mean Average Precision (MAP) - the average precision per user, averaged over all users.
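
The sketch below shows how both groups of metrics might be computed on a small toy dataset. It is a minimal illustration, not taken from the notebooks: the module path `reco_utils.evaluation.python_evaluation`, the function names (`rmse`, `mae`, `rsquared`, `exp_var`, `precision_at_k`, `recall_at_k`, `ndcg_at_k`, `map_at_k`), their default column names, and the toy data are assumptions that may differ across versions of the library.

```python
import pandas as pd

# Assumed import path and function names; check the reco_utils version you have installed.
from reco_utils.evaluation.python_evaluation import (
    rmse, mae, rsquared, exp_var,                        # rating metrics
    precision_at_k, recall_at_k, ndcg_at_k, map_at_k,    # ranking metrics
)

# Toy ground-truth ratings and model predictions, keyed by user and item.
# Column names follow the library's assumed defaults (userID, itemID, rating, prediction).
rating_true = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3],
    "itemID": [10, 11, 10, 12, 11],
    "rating": [5.0, 3.0, 4.0, 2.0, 4.0],
})
rating_pred = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3],
    "itemID": [10, 11, 10, 12, 11],
    "prediction": [4.5, 3.5, 4.0, 2.5, 3.5],
})

# Rating metrics: compare predicted ratings against the actual ratings.
print("RMSE:", rmse(rating_true, rating_pred))
print("MAE:", mae(rating_true, rating_pred))
print("R2:", rsquared(rating_true, rating_pred))
print("Explained variance:", exp_var(rating_true, rating_pred))

# Ranking metrics: judge the relevance of each user's top-k recommended items.
print("Precision@2:", precision_at_k(rating_true, rating_pred, k=2))
print("Recall@2:", recall_at_k(rating_true, rating_pred, k=2))
print("NDCG@2:", ndcg_at_k(rating_true, rating_pred, k=2))
print("MAP@2:", map_at_k(rating_true, rating_pred, k=2))
```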

More details on recommender metrics can be found in this paper by Asela Gunawardana and Guy Shani: A Survey of Accuracy Evaluation Metrics of Recommendation Tasks.