diff --git a/GLOSSARY.md b/GLOSSARY.md index 1829c1057..7f325b4a2 100644 --- a/GLOSSARY.md +++ b/GLOSSARY.md @@ -9,7 +9,7 @@ Licensed under the MIT License. * **Click-through rate (CTR)**: Ratio of the number of users who click on a link over the total number of users that visited the page. CTR is a measure of the user engagement. -* **Cold-start problem**: The cold start problem concerns the recommendations for users with no or few past history (new users). Providing recommendations to users with small past history becomes a difficult problem for collaborative filtering models because their learning and predictive ability is limited. Multiple research have been conducted in this direction using content-based filtering models or hybrid models. These models use auxiliary information like user or item metadata to overcome the cold start problem. +* **Cold-start problem**: The cold-start problem concerns recommendations for users with little or no interaction history (new users). Providing recommendations to such users is difficult for collaborative filtering models because their learning and predictive ability is limited. Several studies have addressed this problem using content-based filtering models. These models use auxiliary information like user or item metadata to overcome the cold-start problem. * **Collaborative filtering algorithms (CF)**: CF algorithms make prediction of what is the likelihood of a user selecting an item based on the behavior of other users [1]. It assumes that if user A likes item X and Y, and user B likes item X, user B would probably like item Y. See the [list of CF examples in Recommenders repository](examples/02_model_collaborative_filtering). @@ -21,8 +21,6 @@ Licensed under the MIT License. * **Explicit interaction data**: When a user explicitly rate an item, typically between 1-5, the user is giving a value on the likeliness of the item. 
-* **Hybrid filtering algorithms**: This type of recommendation system can implement a combination of collaborative and content-based filtering models. See the [list of examples in Recommenders repository](examples/02_model_hybrid). - * **Implicit interaction data**: Implicit interactions are views or clicks that show a certain interest of the user about a specific items. These kind of data is more common but it doesn't define the intention of the user as clearly as the explicit data. * **Item information**: These include information about the item, some examples can be name, description, price, etc. diff --git a/README.md b/README.md index 6c6a341bd..87b9ef986 100644 --- a/README.md +++ b/README.md @@ -83,12 +83,12 @@ The table below lists the recommender algorithms currently available in the repo | Cornac/Bilateral Variational Autoencoder (BiVAE) | Collaborative Filtering | Generative model for dyadic data (e.g., user-item interactions). It works in the CPU/GPU environment. | [Deep dive](examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb) | | Convolutional Sequence Embedding Recommendation (Caser) | Collaborative Filtering | Algorithm based on convolutions that aim to capture both user’s general preferences and sequential patterns. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | | Deep Knowledge-Aware Network (DKN)* | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings for providing news or article recommendations. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/dkn_MIND.ipynb) / [Deep dive](examples/02_model_content_based_filtering/dkn_deep_dive.ipynb) | -| Extreme Deep Factorization Machine (xDeepFM)* | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features. It works in the CPU/GPU environment. 
| [Quick start](examples/00_quick_start/xdeepfm_criteo.ipynb) | +| Extreme Deep Factorization Machine (xDeepFM)* | Collaborative Filtering | Deep learning based algorithm for implicit and explicit feedback with user/item features. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/xdeepfm_criteo.ipynb) | | FastAI Embedding Dot Bias (FAST) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/fastai_movielens.ipynb) | -| LightFM/Hybrid Matrix Factorization | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedbacks. It works in the CPU environment. | [Quick start](examples/02_model_hybrid/lightfm_deep_dive.ipynb) | +| LightFM/Factorization Machine | Collaborative Filtering | Factorization Machine algorithm for both implicit and explicit feedbacks. It works in the CPU environment. | [Quick start](examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb) | | LightGBM/Gradient Boosting Tree* | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems. It works in the CPU/GPU/PySpark environments. | [Quick start in CPU](examples/00_quick_start/lightgbm_tinycriteo.ipynb) / [Deep dive in PySpark](examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb) | | LightGCN | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback. It works in the CPU/GPU environment. | [Deep dive](examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb) | -| GeoIMC* | Hybrid | Matrix completion algorithm that has into account user and item features using Riemannian conjugate gradients optimization and following a geometric approach. It works in the CPU environment. 
| [Quick start](examples/00_quick_start/geoimc_movielens.ipynb) | +| GeoIMC* | Collaborative Filtering | Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradients optimization and following a geometric approach. It works in the CPU environment. | [Quick start](examples/00_quick_start/geoimc_movielens.ipynb) | | GRU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | | Multinomial VAE | Collaborative Filtering | Generative model for predicting user/item interactions. It works in the CPU/GPU environment. | [Deep dive](examples/02_model_collaborative_filtering/multi_vae_deep_dive.ipynb) | | Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with long- and short-term user interest modeling. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/lstur_MIND.ipynb) | @@ -108,8 +108,8 @@ The table below lists the recommender algorithms currently available in the repo | Surprise/Singular Value Decomposition (SVD) | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in small datasets. It works in the CPU/GPU environment. | [Deep dive](examples/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb) | | Term Frequency - Inverse Document Frequency (TF-IDF) | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets. It works in the CPU environment. | [Quick start](examples/00_quick_start/tfidf_covid.ipynb) | | Vowpal Wabbit (VW)* | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing. 
It uses the CPU for online learning. | [Deep dive](examples/02_model_content_based_filtering/vowpal_wabbit_deep_dive.ipynb) | -| Wide and Deep | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/wide_deep_movielens.ipynb) | -| xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | Hybrid | Quick and memory efficient algorithm to predict labels with user/item features. It works in the CPU/GPU environment. | [Deep dive](examples/02_model_hybrid/fm_deep_dive.ipynb) | +| Wide and Deep | Collaborative Filtering | Deep learning algorithm that can memorize feature interactions and generalize user features. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/wide_deep_movielens.ipynb) | +| xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | Collaborative Filtering | Quick and memory efficient algorithm to predict labels with user/item features. It works in the CPU/GPU environment. | [Deep dive](examples/02_model_collaborative_filtering/fm_deep_dive.ipynb) | **NOTE**: * indicates algorithms invented/contributed by Microsoft. 
@@ -130,7 +130,7 @@ We provide a [benchmark notebook](examples/06_benchmarks/movielens.ipynb) to ill | [BPR](examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb) | 0.132478 | 0.441997 | 0.388229 | 0.212522 | N/A | N/A | N/A | N/A | | [FastAI](examples/00_quick_start/fastai_movielens.ipynb) | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 | | [LightGCN](examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb) | 0.088526 | 0.419846 | 0.379626 | 0.144336 | N/A | N/A | N/A | N/A | -| [NCF](examples/02_model_hybrid/ncf_deep_dive.ipynb) | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A | +| [NCF](examples/02_model_collaborative_filtering/ncf_deep_dive.ipynb) | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A | | [SAR](examples/00_quick_start/sar_movielens.ipynb) | 0.110591 | 0.382461 | 0.330753 | 0.176385 | 1.253805 | 1.048484 | -0.569363 | 0.030474 | | [SVD](examples/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb) | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 | diff --git a/examples/02_model_collaborative_filtering/README.md b/examples/02_model_collaborative_filtering/README.md index 658af1f6f..c4cb97cf8 100644 --- a/examples/02_model_collaborative_filtering/README.md +++ b/examples/02_model_collaborative_filtering/README.md @@ -8,6 +8,8 @@ In this directory, notebooks are provided to give a deep dive of collaborative f | [baseline_deep_dive](baseline_deep_dive.ipynb) | --- | Deep dive on baseline performance estimation. | [cornac_bivae_deep_dive](cornac_bivae_deep_dive.ipynb) | Python CPU, GPU | Deep dive on the BiVAE algorithm and implementation. | [cornac_bpr_deep_dive](cornac_bpr_deep_dive.ipynb) | Python CPU | Deep dive on the BPR algorithm and implementation. +| [fm_deep_dive](fm_deep_dive.ipynb) | Python CPU | Deep dive into factorization machine (FM) and field-aware FM (FFM) algorithm. 
+| [lightfm_deep_dive](lightfm_deep_dive.ipynb) | Python CPU | Deep dive into matrix factorization model with LightFM. | [lightgcn_deep_dive](lightgcn_deep_dive.ipynb) | Python CPU, GPU | Deep dive on a LightGCN algorithm and implementation. | [multi_vae_deep_dive](multi_vae_deep_dive.ipynb) | Python CPU, GPU | Deep dive on the Multinomial VAE algorithm and implementation. | [ncf_deep_dive](ncf_deep_dive.ipynb) | Python CPU, GPU | Deep dive on a NCF algorithm and implementation. diff --git a/examples/02_model_hybrid/fm_deep_dive.ipynb b/examples/02_model_collaborative_filtering/fm_deep_dive.ipynb similarity index 99% rename from examples/02_model_hybrid/fm_deep_dive.ipynb rename to examples/02_model_collaborative_filtering/fm_deep_dive.ipynb index 04d782e5a..190f3bc5d 100644 --- a/examples/02_model_hybrid/fm_deep_dive.ipynb +++ b/examples/02_model_collaborative_filtering/fm_deep_dive.ipynb @@ -15,7 +15,7 @@ "source": [ "# Factorization Machine Deep Dive\n", "\n", - "Factorization machine (FM) is one of the representative algorithms that are used for building hybrid recommenders model. The algorithm is powerful in terms of capturing the effects of not just the input features but also their interactions. The algorithm provides better generalization capability and expressiveness compared to other classic algorithms such as SVMs. The most recent research extends the basic FM algorithms by using deep learning techniques, which achieve remarkable improvement in a few practical use cases.\n", + "Factorization machine (FM) is one of the representative algorithms that are used for building recommendation models. The algorithm is powerful in terms of capturing the effects of not just the input features but also their interactions. The algorithm provides better generalization capability and expressiveness compared to other classic algorithms such as SVMs. 
The most recent research extends the basic FM algorithms by using deep learning techniques, which achieve remarkable improvement in a few practical use cases.\n", "\n", "This notebook presents a deep dive into the Factorization Machine algorithm, and demonstrates some best practices of using the contemporary FM implementations like [`xlearn`](https://github.com/aksnzhy/xlearn) for dealing with tasks like click-through rate prediction." ] diff --git a/examples/02_model_hybrid/lightfm_deep_dive.ipynb b/examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb similarity index 98% rename from examples/02_model_hybrid/lightfm_deep_dive.ipynb rename to examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb index 5ce4b7915..8e588760f 100755 --- a/examples/02_model_hybrid/lightfm_deep_dive.ipynb +++ b/examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb @@ -13,16 +13,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# LightFM - hybrid matrix factorisation on MovieLens (Python, CPU)" + "# LightFM - Factorization Machine on MovieLens (Python, CPU)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "This notebook explains the concept of a hybrid matrix factorisation based model for recommendation, it also outlines the steps to construct a pure matrix factorisation and a hybrid models using the [LightFM](https://github.com/lyst/lightfm) package. It also demonstrates how to extract both user and item affinity from a fitted hybrid model.\n", + "This notebook explains the concept of a Factorization Machine based model for recommendation, it also outlines the steps to construct a pure matrix factorization and a Factorization Machine using the [LightFM](https://github.com/lyst/lightfm) package. It also demonstrates how to extract both user and item affinity from a fitted model.\n", "\n", - "## 1. Hybrid matrix factorisation model\n", + "## 1. 
Factorization Machine model\n", "\n", "### 1.1 Background\n", "\n", @@ -30,27 +30,24 @@ "- Content based model,\n", "- Collaborative filtering model.\n", "\n", - "The content-based model recommends based on similarity of the items and/or users using their description/metadata/profile. On the other hand, collaborative filtering model (discussion is limited to matrix factorisation approach in this notebook) computes the latent factors of the users and items. It works based on the assumption that if a group of people expressed similar opinions on an item, these peole would tend to have similar opinions on other items. For further background and detailed explanation between these two approaches, the reader can refer to machine learning literatures [3, 4].\n", + "The content-based model recommends based on similarity of the items and/or users using their description/metadata/profile. On the other hand, the collaborative filtering model (discussion is limited to the matrix factorization approach in this notebook) computes the latent factors of the users and items. It works based on the assumption that if a group of people expressed similar opinions on an item, these people would tend to have similar opinions on other items. For further background and a detailed comparison of these two approaches, the reader can refer to the machine learning literature [3, 4].\n", "\n", "The choice between the two models is largely based on the data availability. For example, the collaborative filtering model is usually adopted and effective when sufficient ratings/feedbacks have been recorded for a group of users and items.\n", "\n", "However, if there is a lack of ratings, content based model can be used provided that the metadata of the users and items are available. 
This is also the common approach to address the cold-start issues, where there are insufficient historical collaborative interactions available to model new users and/or items.\n", "\n", - "\n", + "### 1.2 Factorization Machine algorithm\n", "\n", - "### 1.2 Hybrid matrix factorisation algorithm\n", + "In view of the above problems, there have been a number of proposals to address the cold-start issues by combining both content-based and collaborative filtering approaches. The Factorization Machine model is one of the solutions proposed [1]. \n", "\n", - "In view of the above problems, there have been a number of proposals to address the cold-start issues by combining both content-based and collaborative filtering approaches. The hybrid matrix factorisation model is among one of the solutions proposed [1]. \n", - "\n", - "In general, most hybrid approaches proposed different ways of assessing and/or combining the feature data in conjunction with the collaborative information.\n", + "In general, most approaches proposed different ways of assessing and/or combining the feature data in conjunction with the collaborative information.\n", "\n", "### 1.3 LightFM package \n", "\n", - "LightFM is a Python implementation of a hybrid recommendation algorithms for both implicit and explicit feedbacks [1].\n", + "LightFM is a Python implementation of a Factorization Machine recommendation algorithm for both implicit and explicit feedback [1].\n", "\n", - "It is a hybrid content-collaborative model which represents users and items as linear combinations of their content features’ latent factors. The model learns **embeddings or latent representations of the users and items in such a way that it encodes user preferences over items**. 
These representations produce scores for every item for a given user; items scored highly are more likely to be interesting to the user.\n", + "It is a Factorization Machine model which represents users and items as linear combinations of their content features’ latent factors. The model learns **embeddings or latent representations of the users and items in such a way that it encodes user preferences over items**. These representations produce scores for every item for a given user; items scored highly are more likely to be interesting to the user.\n", "\n", "The user and item embeddings are estimated for every feature, and these features are then added together to be the final representations for users and items. \n", "\n", @@ -1907,7 +1904,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In this notebook, the background of hybrid matrix factorisation model has been explained together with a detailed example of LightFM's implementation. \n", + "In this notebook, the background of Factorization Machine model has been explained together with a detailed example of LightFM's implementation. \n", "\n", "The process of incorporating additional user and item metadata has also been demonstrated with performance comparison. Furthermore, the calculation of both user and item affinity scores have also been demonstrated and extracted from the fitted model.\n", "\n", diff --git a/examples/02_model_hybrid/README.md b/examples/02_model_hybrid/README.md deleted file mode 100644 index c268cc0bf..000000000 --- a/examples/02_model_hybrid/README.md +++ /dev/null @@ -1,10 +0,0 @@ -# Deep dive in hybrid algorithms - -In this directory, notebooks are provided to give a deep dive of hybrid recommendation algorithms. The notebooks make use of the utility functions ([recommenders](../../recommenders)) available in the repo. 
- -| Notebook | Environment | Description | -| --- | --- | --- | -| [fm_deep_dive](fm_deep_dive.ipynb) | Python CPU | Deep dive into factorization machine (FM) and field-aware FM (FFM) algorithm. -| [lightfm_deep_dive](lightfm_deep_dive.ipynb) | Python CPU | Deep dive into hybrid matrix factorisation model with LightFM. - -Details on model training are best found inside each notebook. diff --git a/examples/README.md b/examples/README.md index 10967bdca..365fcf08b 100644 --- a/examples/README.md +++ b/examples/README.md @@ -17,11 +17,11 @@ The following summarizes each directory of the best practice notebooks. | [01_prepare_data](01_prepare_data) | Yes | Data preparation notebooks for each recommender algorithm| | [02_model_collaborative_filtering](02_model_collaborative_filtering) | Yes | Deep dive notebooks about model training and evaluation using collaborative filtering algorithms | | [02_model_content_based_filtering](02_model_content_based_filtering) | Yes |Deep dive notebooks about model training and evaluation using content-based filtering algorithms | -| [02_model_hybrid](02_model_hybrid) | Yes | Deep dive notebooks about model training and evaluation using hybrid algorithms | | [03_evaluate](03_evaluate) | Yes | Notebooks that introduce different evaluation methods for recommenders | | [04_model_select_and_optimize](04_model_select_and_optimize) | Some local, some on Azure | Best practice notebooks for model tuning and selecting by using Azure Machine Learning Service and/or open source technologies | | [05_operationalize](05_operationalize) | No, Run on Azure | Operationalization notebooks that illustrate an end-to-end pipeline by using a recommender algorithm for a certain real-world use case scenario | | [06_benchmarks](06_benchmarks) | Yes | Benchmark comparison of several recommender algorithms | +| [07_tutorials](07_tutorials) | Yes | Tutorials for using the Recommenders library | ## On-premise notebooks diff --git a/tests/conftest.py 
b/tests/conftest.py index 7063c47fc..12c636d8f 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -319,10 +319,10 @@ def notebooks(): "cornac_bivae_deep_dive.ipynb", ), "xlearn_fm_deep_dive": os.path.join( - folder_notebooks, "02_model_hybrid", "fm_deep_dive.ipynb" + folder_notebooks, "02_model_collaborative_filtering", "fm_deep_dive.ipynb" ), "lightfm_deep_dive": os.path.join( - folder_notebooks, "02_model_hybrid", "lightfm_deep_dive.ipynb" + folder_notebooks, "02_model_collaborative_filtering", "lightfm_deep_dive.ipynb" ), "evaluation": os.path.join(folder_notebooks, "03_evaluate", "evaluation.ipynb"), "evaluation_diversity": os.path.join(