From af9a035362dd7424bbcab2e29113875c1bf69d37 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 00:18:29 +0200 Subject: [PATCH 1/8] Ideas for contributions Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 217c1c900..298f8560c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -10,7 +10,6 @@ Contributions are welcomed! Here's a few things to know: - [Contribution Guidelines](#contribution-guidelines) - [Steps to Contributing](#steps-to-contributing) - [Coding Guidelines](#coding-guidelines) - - [Microsoft Contributor License Agreement](#microsoft-contributor-license-agreement) - [Code of Conduct](#code-of-conduct) - [Do not point fingers](#do-not-point-fingers) - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) @@ -33,6 +32,42 @@ Here are the basic steps to get started with your first contribution. Please rea See the wiki for more details about our [merging strategy](https://github.com/microsoft/recommenders/wiki/Strategy-to-merge-the-code-to-main-branch). +## Ideas for Contributions + +### A first contribution + +For people who are new to open source or to Recommenders, a good way to start is by contribution with documentation. You can help with any of the README files or in the notebooks. + +### Datasets + +To contribute new datasets, please consider this: + +* Minimize dependencies, it's better to use `requests` library than a custom library. +* Make sure that the dataset is publicly available and that the license allows for redistribution. + +### Models + +To contribute new models, please consider this: + +* Please don't add models that are already implemented in the repo. An exception to this rule is if you are adding a more optimal implementation or you want to migrate a model from TensorFlow to PyTorch. 
+* Prioritize the minimal code necessary instead of adding a full library. If you add code from another repository, please make sure to follow the license and give proper credit. +* All models should be accompanied by a notebook that shows how to use the model and how to train it. The notebook should be in the [examples](examples) folder. +* The model should be tested with unit tests, and the notebooks should be tested with functional tests. + +### Metrics + +To contribute new metrics, please consider this: + +* A good way to contribute with metrics is by optimizing the code of the existing ones. +* If you are adding a new metric, please consider adding not only a CPU version, but also a PySpark version. + +### General tips + +* Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. +* Prioritize PyTorch over TensorFlow. +* Avoid GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. + + ## Coding Guidelines We strive to maintain high quality code to make the utilities in the repository easy to understand, use, and extend. We also work hard to maintain a friendly and constructive environment. We've found that having clear expectations on the development process and consistent style helps to ensure everyone can contribute and collaborate effectively. From acbaf0fe8d079205fca6de5398c1e81d39f804a6 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 00:22:56 +0200 Subject: [PATCH 2/8] Ideas for contributions :memo: Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 298f8560c..91a9be162 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -9,11 +9,17 @@ Contributions are welcomed! 
Here's a few things to know: - [Contribution Guidelines](#contribution-guidelines) - [Steps to Contributing](#steps-to-contributing) + - [Ideas for Contributions](#ideas-for-contributions) + - [A first contribution](#a-first-contribution) + - [Datasets](#datasets) + - [Models](#models) + - [Metrics](#metrics) + - [General tips](#general-tips) - [Coding Guidelines](#coding-guidelines) - [Code of Conduct](#code-of-conduct) - - [Do not point fingers](#do-not-point-fingers) - - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) - - [Ask questions do not give answers](#ask-questions-do-not-give-answers) + - [Do not point fingers](#do-not-point-fingers) + - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) + - [Ask questions do not give answers](#ask-questions-do-not-give-answers) ## Steps to Contributing @@ -38,6 +44,8 @@ See the wiki for more details about our [merging strategy](https://github.com/mi For people who are new to open source or to Recommenders, a good way to start is by contribution with documentation. You can help with any of the README files or in the notebooks. +For more advanced users, consider fixing one of the bugs listed in the issues. + ### Datasets To contribute new datasets, please consider this: @@ -65,8 +73,7 @@ To contribute new metrics, please consider this: * Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. * Prioritize PyTorch over TensorFlow. -* Avoid GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. - +* Avoid adding code with GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. 
## Coding Guidelines @@ -74,9 +81,11 @@ We strive to maintain high quality code to make the utilities in the repository Please review the [coding guidelines](https://github.com/recommenders-team/recommenders/wiki/Coding-Guidelines) wiki page to see more details about the expectations for development approach and style. +## Code of Conduct + Apart from the official [Code of Conduct](CODE_OF_CONDUCT.md), in Recommenders team we adopt the following behaviors, to ensure a great working environment: -#### Do not point fingers +### Do not point fingers Let’s be constructive.
@@ -86,7 +95,7 @@ Let’s be constructive.
-#### Provide code feedback based on evidence +### Provide code feedback based on evidence When making code reviews, try to support your ideas based on evidence (papers, library documentation, stackoverflow, etc) rather than your personal preferences. @@ -97,7 +106,7 @@ When making code reviews, try to support your ideas based on evidence (papers, l -#### Ask questions do not give answers +### Ask questions do not give answers Try to be empathic.
From 385746c148a30cb40010c2e07d4e52ca6c1e7c5a Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 00:24:43 +0200 Subject: [PATCH 3/8] :bug: Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 91a9be162..110108bdd 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -7,19 +7,18 @@ Licensed under the MIT License. Contributions are welcomed! Here's a few things to know: -- [Contribution Guidelines](#contribution-guidelines) - - [Steps to Contributing](#steps-to-contributing) - - [Ideas for Contributions](#ideas-for-contributions) - - [A first contribution](#a-first-contribution) - - [Datasets](#datasets) - - [Models](#models) - - [Metrics](#metrics) - - [General tips](#general-tips) - - [Coding Guidelines](#coding-guidelines) - - [Code of Conduct](#code-of-conduct) - - [Do not point fingers](#do-not-point-fingers) - - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) - - [Ask questions do not give answers](#ask-questions-do-not-give-answers) +- [Steps to Contributing](#steps-to-contributing) +- [Ideas for Contributions](#ideas-for-contributions) + - [A first contribution](#a-first-contribution) + - [Datasets](#datasets) + - [Models](#models) + - [Metrics](#metrics) + - [General tips](#general-tips) +- [Coding Guidelines](#coding-guidelines) +- [Code of Conduct](#code-of-conduct) + - [Do not point fingers](#do-not-point-fingers) + - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) + - [Ask questions do not give answers](#ask-questions-do-not-give-answers) ## Steps to Contributing From b43ea7bb131a7f27c2856bf618a71bf9d56aae6a Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 00:34:54 +0200 Subject: [PATCH 4/8] :memo: Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git 
a/CONTRIBUTING.md b/CONTRIBUTING.md index 110108bdd..e2bb4c576 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -67,6 +67,7 @@ To contribute new metrics, please consider this: * A good way to contribute with metrics is by optimizing the code of the existing ones. * If you are adding a new metric, please consider adding not only a CPU version, but also a PySpark version. +* When adding the tests, make sure you check for the limits. For example, if you add an error metric, check that the error between two identical datasets is zero. ### General tips @@ -78,7 +79,7 @@ To contribute new metrics, please consider this: We strive to maintain high quality code to make the utilities in the repository easy to understand, use, and extend. We also work hard to maintain a friendly and constructive environment. We've found that having clear expectations on the development process and consistent style helps to ensure everyone can contribute and collaborate effectively. -Please review the [coding guidelines](https://github.com/recommenders-team/recommenders/wiki/Coding-Guidelines) wiki page to see more details about the expectations for development approach and style. +Please review the [Coding Guidelines](https://github.com/recommenders-team/recommenders/wiki/Coding-Guidelines) wiki page to see more details about the expectations for development approach and style. ## Code of Conduct @@ -101,7 +102,7 @@ When making code reviews, try to support your ideas based on evidence (papers, l
Click here to see some examples -"When reviewing this code, I saw that the Python implementation the metrics are based on classes, however, [scikit-learn](https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics) and [tensorflow](https://www.tensorflow.org/api_docs/python/tf/metrics) use functions. We should follow the standard in the industry." +"When reviewing this code, I saw that the Python implementation of the metrics are based on classes, however, [scikit-learn](https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics) use functions. We should follow the standard in the industry."
From 87546e6bbb4cdf805ecd130edb27e805284b0f64 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 11:51:55 +0200 Subject: [PATCH 5/8] Feedback @anargyri Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e2bb4c576..9d3d8f4a3 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -71,9 +71,10 @@ To contribute new metrics, please consider this: ### General tips -* Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. * Prioritize PyTorch over TensorFlow. +* Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. * Avoid adding code with GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. +* Add the copyright statement at the beginning of the file: `Copyright (c) Recommenders contributors. Licensed under the MIT License.` ## Coding Guidelines From 0976df459db26f817306910f85eb3e69109ce207 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Wed, 22 May 2024 10:37:34 +0200 Subject: [PATCH 6/8] :memo: Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9d3d8f4a3..4d25db4e3 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -73,7 +73,7 @@ To contribute new metrics, please consider this: * Prioritize PyTorch over TensorFlow. * Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. -* Avoid adding code with GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. +* Avoid adding code with GPL and other copyleft licenses. Prioritize MIT, Apache, and other permissive licenses. * Add the copyright statement at the beginning of the file: `Copyright (c) Recommenders contributors. 
Licensed under the MIT License.` ## Coding Guidelines From 1da9670ed778ad756755e77006fd30123f4abd4f Mon Sep 17 00:00:00 2001 From: Martin Date: Sun, 26 May 2024 13:53:23 +0800 Subject: [PATCH 7/8] Update cornac_bivae_deep_dive.ipynb: fix typos --- .../cornac_bivae_deep_dive.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb b/examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb index 731ab0c12..fb432ccca 100644 --- a/examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb +++ b/examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb @@ -610,7 +610,7 @@ "source": [ "## 4 Discussion\n", "\n", - "BiVAE is a new variational autoencoder tailored for dyadic data, where observations consist of measurements associated with two sets of objects, e.g., users, items and corresponding ratings. The model is symmetric, which makes it easier to extend axiliary data from both sides of users and items. In addition to preference data, the model can be applied to other types of dyadic data such as documentword matrices, and other tasks such as co-clustering. \n", + "BiVAE is a new variational autoencoder tailored for dyadic data, where observations consist of measurements associated with two sets of objects, e.g., users, items and corresponding ratings. The model is symmetric, which makes it easier to extend auxiliary data from both sides of users and items. In addition to preference data, the model can be applied to other types of dyadic data such as document-word matrices, and other tasks such as co-clustering. \n", "\n", "In the paper, there is also a discussion on Constrained Adaptive Priors (CAP), a proposed method to build informative priors to mitigate the well-known posterior collapse problem. We have left out that part purposely, not to distract the audiences. Nevertheless, it is very interesting and worth taking a look. 
\n", "\n", From a92b31a644c91550e696a0a800073e8689de3ea7 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 3 Jun 2024 16:58:06 +0200 Subject: [PATCH 8/8] breaking change in sklearn in log_loss :boom::boom: Signed-off-by: miguelgfierro --- examples/00_quick_start/lightgbm_tinycriteo.ipynb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/examples/00_quick_start/lightgbm_tinycriteo.ipynb b/examples/00_quick_start/lightgbm_tinycriteo.ipynb index f7a786415..ffd827eac 100644 --- a/examples/00_quick_start/lightgbm_tinycriteo.ipynb +++ b/examples/00_quick_start/lightgbm_tinycriteo.ipynb @@ -717,7 +717,7 @@ "source": [ "test_preds = lgb_model.predict(test_x)\n", "auc = roc_auc_score(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", - "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds), eps=1e-12)\n", + "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", "res_basic = {\"auc\": auc, \"logloss\": logloss}\n", "print(res_basic)\n" ] @@ -904,7 +904,7 @@ ], "source": [ "auc = roc_auc_score(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", - "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds), eps=1e-12)\n", + "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", "res_optim = {\"auc\": auc, \"logloss\": logloss}\n", "\n", "print(res_optim)" @@ -959,7 +959,7 @@ ], "source": [ "auc = roc_auc_score(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", - "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds), eps=1e-12)\n", + "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", "\n", "print({\"auc\": auc, \"logloss\": logloss})" ]
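
Note on PATCH 8/8: the three hunks drop `eps=1e-12` from the `log_loss` calls. To the best of my knowledge, scikit-learn deprecated that argument in 1.3 and removed it in 1.5, so passing it raises a `TypeError` on recent releases. Newer versions clip predictions using the machine epsilon of the prediction dtype instead. The sketch below is only an illustration of what the old argument did, written in plain Python so it does not depend on any scikit-learn version. The helper name `clipped_log_loss` is hypothetical and is not part of this patch or of scikit-learn.

```python
import math


def clipped_log_loss(y_true, y_pred, eps=1e-12):
    """Binary log loss with explicit clipping of the predicted probabilities.

    Hypothetical helper for illustration only: it mimics the clipping that the
    removed `eps` argument used to control, so that log(0) is never evaluated.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    total = 0.0
    for label, prob in zip(y_true, y_pred):
        # Keep the probability inside [eps, 1 - eps] before taking logarithms.
        p = min(max(prob, eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # A confident wrong prediction gives a large but finite loss.
    print(clipped_log_loss([1], [0.0]))
    # Perfect predictions give a loss very close to zero.
    print(clipped_log_loss([1, 0], [1.0, 0.0]))
```

If a notebook needs the old clipping bound rather than the dtype-based default, one option is to clip the predictions first (for example with `np.clip(test_preds, 1e-12, 1 - 1e-12)`) and then call `log_loss` without `eps`, which should work both before and after the removal.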