From 20f7bec62e63df56c0d432585b868b1616c72e91 Mon Sep 17 00:00:00 2001
From: Oliver Holworthy
Date: Fri, 26 May 2023 15:31:09 +0100
Subject: [PATCH] Build docs for stable branch and make default (#1826)

* Update links in README to point to stable version of docs
* Limit tags to only those corresponding to a release
* Build docs for stable branch
* Update intersphinx references to stable branch of merlin repos
* Setup local branches for docs build
* Update redirect to stable branch of docs
* Update canonical url of docs to stable version
* Replace links to main branch with stable branch
* revert change to publications link
---
 .github/workflows/docs-sched-rebuild.yaml     | 45 ++++++++++---------
 CONTRIBUTING.md                               |  2 +-
 README.md                                     | 20 ++++-----
 bench/examples/MultiGPUBench.md               |  4 +-
 .../environments/nvtabular_aws_sagemaker.yml  |  2 +-
 docs/README.md                                | 10 ++---
 docs/source/conf.py                           | 10 +++--
 docs/source/core_features.md                  |  4 +-
 docs/source/index.rst                         |  2 +-
 docs/source/resources/architecture.md         |  2 +-
 docs/source/resources/cloud_integration.md    |  4 +-
 docs/source/resources/troubleshooting.md      |  4 +-
 docs/source/training/hugectr.rst              |  2 +-
 examples/01-Getting-started.ipynb             |  4 +-
 examples/02-Advanced-NVTabular-workflow.ipynb |  4 +-
 examples/README.md                            |  8 ++--
 nvtabular/ops/hash_bucket.py                  |  2 +-
 17 files changed, 68 insertions(+), 61 deletions(-)

diff --git a/.github/workflows/docs-sched-rebuild.yaml b/.github/workflows/docs-sched-rebuild.yaml
index 38f4cd05b5b..8a4404776e3 100644
--- a/.github/workflows/docs-sched-rebuild.yaml
+++ b/.github/workflows/docs-sched-rebuild.yaml
@@ -26,6 +26,10 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip setuptools==59.4.0 wheel tox
+      - name: Setup local branches for docs build
+        run: |
+          git branch --track main origin/main || true
+          git branch --track stable origin/stable || true
       - name: Building docs (multiversion)
         run: |
           tox -e docs-multi
@@ -83,27 +87,28 @@
            exit 0
          fi
          # If any of these commands fail, fail the build.
-          def_branch=$(gh api "repos/${GITHUB_REPOSITORY}" --jq ".default_branch")
+          def_branch="stable"
          html_url=$(gh api "repos/${GITHUB_REPOSITORY}/pages" --jq ".html_url")
-          # Beware ugly quotation mark avoidance in the foll lines.
-          echo '<!DOCTYPE html>' > index.html
-          echo '<html>' >> index.html
-          echo '  <head>' >> index.html
-          echo '    <title>Redirect to documentation</title>' >> index.html
-          echo '    <meta charset="utf-8">' >> index.html
-          echo '    <meta http-equiv="refresh" content="0; URL='${html_url}${def_branch}'/">' >> index.html
-          echo '    <link rel="canonical" href="'${html_url}${def_branch}'/">' >> index.html
-          echo '  </head>' >> index.html
-          echo '  <body>' >> index.html
-          echo '    <p>' >> index.html
-          echo '      Please follow the link to the <a href="'${html_url}${def_branch}'/">' >> index.html
-          echo ${def_branch}'</a> branch documentation.</p>' >> index.html
-          echo '  </body>' >> index.html
-          echo '</html>' >> index.html
+          cat > index.html << EOF
+          <!DOCTYPE html>
+          <html>
+            <head>
+              <title>Redirect to documentation</title>
+              <meta charset="utf-8">
+              <meta http-equiv="refresh" content="0; URL=${html_url}${def_branch}/">
+              <link rel="canonical" href="${html_url}${def_branch}/">
+            </head>
+            <body>
+              <p>
+                Please follow the link to the
+                <a href="${html_url}${def_branch}/">${def_branch}</a> branch documentation.
+              </p>
+            </body>
+          </html>
+          EOF
          git add index.html
      - name: Commit changes to the GitHub Pages branch
        run: |
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 0ae1a4a31b2..f5751b54290 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -23,7 +23,7 @@ into three categories:
 
 ### Your first issue
 
-1. Read the project's [README.md](https://github.com/nvidia/NVTabular/blob/main/README.md)
+1. Read the project's [README.md](https://github.com/nvidia/NVTabular/blob/stable/README.md)
    to learn how to setup the development environment.
 2. Find an issue to work on. The best way is to look for the [good first issue](https://github.com/nvidia/NVTabular/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
diff --git a/README.md b/README.md
index bdc6210517b..ee66613c9fc 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 ## [NVTabular](https://github.com/NVIDIA/NVTabular)
 
 [![PyPI](https://img.shields.io/pypi/v/NVTabular?color=orange&label=version)](https://pypi.python.org/pypi/NVTabular/)
-[![LICENSE](https://img.shields.io/github/license/NVIDIA-Merlin/NVTabular)](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/LICENSE)
-[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html)
+[![LICENSE](https://img.shields.io/github/license/NVIDIA-Merlin/NVTabular)](https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/LICENSE)
+[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html)
 
 NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte scale datasets and train deep learning (DL) based recommender systems. It provides high-level abstraction to simplify code and accelerates computation on the GPU using the [RAPIDS Dask-cuDF](https://github.com/rapidsai/cudf/tree/main/python/dask_cudf) library.
@@ -26,7 +26,7 @@ NVTabular alleviates these challenges and helps data scientists and ML engineers
 
 - prepare datasets quickly and easily for experimentation so that more models can be trained.
 - deploy models into production by providing faster dataset transformation
 
-Learn more in the NVTabular [core features documentation](https://nvidia-merlin.github.io/NVTabular/main/core_features.html).
+Learn more in the NVTabular [core features documentation](https://nvidia-merlin.github.io/NVTabular/stable/core_features.html).
 
 ### Performance
 
@@ -74,11 +74,11 @@ The following table summarizes the key information about the containers:
 | merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, Tensorflow and Triton Inference |
 | merlin-pytorch    | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch    | NVTabular, PyTorch, and Triton Inference   |
 
-To use these Docker containers, you'll first need to install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see [Support Matrix](https://github.com/NVIDIA/NVTabular/blob/main/docs/source/resources/support_matrix.rst).
+To use these Docker containers, you'll first need to install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see [Support Matrix](https://github.com/NVIDIA/NVTabular/blob/stable/docs/source/resources/support_matrix.rst).
 
 ### Notebook Examples and Tutorials
 
-We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) to demonstrate feature engineering with NVTabular as Jupyter notebooks:
+We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular/tree/stable/examples) to demonstrate feature engineering with NVTabular as Jupyter notebooks:
 
 - Introduction to NVTabular's High-Level API
 - Advanced workflows with NVTabular
@@ -87,13 +87,13 @@ We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular
 
 In addition, NVTabular is used in many of our examples in other Merlin libraries:
 
-- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples)
-- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/main/examples)
-- [Training Examples with Transformer4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples)
+- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/stable/examples)
+- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/stable/examples)
+- [Training Examples with Transformer4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/stable/examples)
 
 ### Feedback and Support
 
-If you'd like to contribute to the library directly, see the [Contributing.md](https://github.com/NVIDIA/NVTabular/blob/main/CONTRIBUTING.md). We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).
+If you'd like to contribute to the library directly, see the [Contributing.md](https://github.com/NVIDIA/NVTabular/blob/stable/CONTRIBUTING.md). We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).
 
 If you're interested in learning more about how NVTabular works, see
-[our NVTabular documentation](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html). We also have [API documentation](https://nvidia-merlin.github.io/NVTabular/main/api/index.html) that outlines the specifics of the available calls within the library.
+[our NVTabular documentation](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html). We also have [API documentation](https://nvidia-merlin.github.io/NVTabular/stable/api/index.html) that outlines the specifics of the available calls within the library.
diff --git a/bench/examples/MultiGPUBench.md b/bench/examples/MultiGPUBench.md
index f0461b53d82..8b31756c360 100644
--- a/bench/examples/MultiGPUBench.md
+++ b/bench/examples/MultiGPUBench.md
@@ -2,7 +2,7 @@
 
 The benchmark script described in this document is located at `NVTabular/examples/dask-nvtabular-criteo-benchmark.py`.
 
-The [multi-GPU Criteo/DLRM benchmark](https://github.com/NVIDIA/NVTabular/blob/main/examples/dask-nvtabular-criteo-benchmark.py) is designed to measure the time required to preprocess the [Criteo (1TB) dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/data) for Facebook’s [DLRM model](https://github.com/facebookresearch/dlrm). The user must specify the path of the raw dataset (using the `--data-path` flag), as well as the output directory for all temporary/final data (using the `--out-path` flag).
+The [multi-GPU Criteo/DLRM benchmark](https://github.com/NVIDIA/NVTabular/blob/stable/examples/dask-nvtabular-criteo-benchmark.py) is designed to measure the time required to preprocess the [Criteo (1TB) dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/data) for Facebook’s [DLRM model](https://github.com/facebookresearch/dlrm). The user must specify the path of the raw dataset (using the `--data-path` flag), as well as the output directory for all temporary/final data (using the `--out-path` flag).
 
 ### Example Usage
 
 ```
 python dask-nvtabular-criteo-benchmark.py --data-path /path/to/criteo_parquet --out-path /out/dir/
 ```
 
 ### Dataset Requirements (Parquet)
 
-The script is designed with a parquet-formatted dataset in mind. Although csv files can also be handled by NVTabular, converting to parquet yields significantly better performance. To convert your dataset, try using the [conversion notebook](https://github.com/NVIDIA/NVTabular/blob/main/examples/optimize_criteo.ipynb) (located at `NVTabular/examples/optimize_criteo.ipynb`).
+The script is designed with a parquet-formatted dataset in mind. Although csv files can also be handled by NVTabular, converting to parquet yields significantly better performance. To convert your dataset, try using the [conversion notebook](https://github.com/NVIDIA/NVTabular/blob/stable/examples/optimize_criteo.ipynb) (located at `NVTabular/examples/optimize_criteo.ipynb`).
 
 ### General Notes on Parameter Tuning
diff --git a/conda/environments/nvtabular_aws_sagemaker.yml b/conda/environments/nvtabular_aws_sagemaker.yml
index f1d0380eeb3..098fbbf4782 100644
--- a/conda/environments/nvtabular_aws_sagemaker.yml
+++ b/conda/environments/nvtabular_aws_sagemaker.yml
@@ -1,4 +1,4 @@
-# Based on https://github.com/NVIDIA-Merlin/NVTabular/blob/main/conda/environments/nvtabular_dev_cuda11.0.yml
+# Based on https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/conda/environments/nvtabular_dev_cuda11.0.yml
 name: nvtabular
 channels:
   - rapidsai
diff --git a/docs/README.md b/docs/README.md
index 3da002d6b2a..487813f2c5d 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,7 +1,7 @@
 # Documentation
 
 This folder contains the scripts necessary to build NVTabular's documentation.
-You can view the generated [NVTabular documentation here](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html).
+You can view the generated [NVTabular documentation here](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html).
 
 ## Contributing to Docs
 
@@ -66,8 +66,8 @@
 that the link is broken.
   "lineno": 88,
   "status": "broken",
   "code": 0,
-  "uri": "https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/build-hadoop.sh",
-  "info": "404 Client Error: Not Found for url: https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/build-hadoop.sh"
+  "uri": "https://github.com/NVIDIA-Merlin/Merlin/blob/stable/docker/build-hadoop.sh",
+  "info": "404 Client Error: Not Found for url: https://github.com/NVIDIA-Merlin/Merlin/blob/stable/docker/build-hadoop.sh"
 }
 ```
@@ -127,7 +127,7 @@ the link is to the repository:
 
 ```markdown
 Refer to the sample Python programs in the
-[examples/blah](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples/blah)
+[examples/blah](https://github.com/NVIDIA-Merlin/NVTabular/tree/stable/examples/blah)
 directory of the repository.
 ```
@@ -164,7 +164,7 @@ a relative path works both in the HTML docs page and in the repository browsing
 Use a link to the HTML page like the following:
 
 ```markdown
 
 ```
 
 > I'd like to change this in the future. My preference would be to use a relative
diff --git a/docs/source/conf.py b/docs/source/conf.py
index a55b3ecfeea..4e2f54d9d6e 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -11,6 +11,7 @@
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #
 import os
+import re
 import subprocess
 import sys
@@ -107,6 +108,7 @@
 # at a commit (not a Git repo).
 if os.path.exists(gitdir):
     tag_refs = subprocess.check_output(["git", "tag", "-l", "v*"]).decode("utf-8").split()
+    tag_refs = [tag for tag in tag_refs if re.match(r"^v[0-9]+\.[0-9]+\.[0-9]+$", tag)]
     tag_refs = natsorted(tag_refs)[-6:]
     smv_tag_whitelist = r"^(" + r"|".join(tag_refs) + r")$"
 else:
@@ -114,12 +116,12 @@
     smv_tag_whitelist = r"^v.*$"
 
-# Only include main branch for now
-smv_branch_whitelist = "^main$"
+# Include the main and stable branches
+smv_branch_whitelist = "^(main|stable)$"
 smv_refs_override_suffix = "-docs"
 
 html_sidebars = {"**": ["versions.html"]}
 
-html_baseurl = "https://nvidia-merlin.github.io/NVTabular/main"
+html_baseurl = "https://nvidia-merlin.github.io/NVTabular/stable/"
 
 autodoc_inherit_docstrings = False
 autodoc_default_options = {
@@ -136,8 +138,8 @@
     "cudf": ("https://docs.rapids.ai/api/cudf/stable/", None),
     "distributed": ("https://distributed.dask.org/en/latest/", None),
     "torch": ("https://pytorch.org/docs/stable/", None),
-    "merlin-core": ("https://nvidia-merlin.github.io/core/main", None),
-    "merlin-systems": ("https://nvidia-merlin.github.io/systems/main", None),
+    "merlin-core": ("https://nvidia-merlin.github.io/core/stable/", None),
+    "merlin-systems": ("https://nvidia-merlin.github.io/systems/stable/", None),
 }
 
 copydirs_additional_dirs = [
diff --git a/docs/source/core_features.md b/docs/source/core_features.md
index 6ac0132ed95..aff10c1a803 100644
--- a/docs/source/core_features.md
+++ b/docs/source/core_features.md
@@ -17,7 +17,7 @@ In addition to providing mechanisms for transforming the data to prepare it for
 
 ## HugeCTR Interoperability
 
-NVTabular is also capable of preprocessing datasets that can be passed to HugeCTR for training. For additional information, see the [HugeCTR Example Notebook](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) for details about how this works.
+NVTabular is also capable of preprocessing datasets that can be passed to HugeCTR for training. For additional information, see the [HugeCTR Example Notebook](https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) for details about how this works.
 
 ## Multi-GPU Support
@@ -38,7 +38,7 @@ workflow = nvt.Workflow(..., client=client)
 
 Currently, there are many ways to deploy a "cluster" for Dask. This [article](https://blog.dask.org/2020/07/23/current-state-of-distributed-dask-clusters) gives a summary of all the practical options. For a single machine with multiple GPUs, the `dask_cuda.LocalCUDACluster` API is typically the most convenient option.
 
-Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/main/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.
+Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/stable/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.
 
 ## Multi-Node Support
 
diff --git a/docs/source/index.rst b/docs/source/index.rst
index ad4ba428a5d..3fd2ba95bb4 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -14,7 +14,7 @@ Merlin NVTabular GitHub Repository
 
    About Merlin
      Merlin is the overarching project that brings together the Merlin projects.
-     See the `documentation <https://nvidia-merlin.github.io/Merlin/main/>`_
+     See the `documentation <https://nvidia-merlin.github.io/Merlin/stable/>`_
      or the `repository <https://github.com/NVIDIA-Merlin/Merlin>`_ on GitHub.
 
    Developer website for Merlin
diff --git a/docs/source/resources/architecture.md b/docs/source/resources/architecture.md
index b044255f6f1..e1df02d4301 100644
--- a/docs/source/resources/architecture.md
+++ b/docs/source/resources/architecture.md
@@ -2,7 +2,7 @@
 
 The NVTabular engine uses the [RAPIDS](http://www.rapids.ai) [Dask-cuDF library](https://github.com/rapidsai/dask-cuda), which provides the bulk of the functionality for accelerating dataframe operations on the GPU and scaling across multiple GPUs. NVTabular provides functionality commonly found in deep learning recommendation workflows, allowing you to focus on what you want to do with your data, and not how you need to do it. NVTabular also provides a template for our core compute mechanism, which is referred to as Operations (ops), allowing you to build your own custom ops from cuDF and other libraries.
 
-Once NVTabular is installed, the next step is to define the preprocessing and feature engineering pipeline by applying the ops that you need. For additional information about installing NVTabular, see [Installation](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html#installation).
+Once NVTabular is installed, the next step is to define the preprocessing and feature engineering pipeline by applying the ops that you need. For additional information about installing NVTabular, see [Installation](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html#installation).
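To make that pipeline-definition step concrete, here is a minimal sketch of defining and applying such a pipeline; the column names, file paths, and choice of ops are hypothetical:

```python
import nvtabular as nvt
from nvtabular import ops

# Categorify the categorical columns; fill missing values and
# normalize the continuous ones. All names here are hypothetical.
cats = ["userId", "itemId"] >> ops.Categorify()
conts = ["price", "age"] >> ops.FillMissing() >> ops.Normalize()

workflow = nvt.Workflow(cats + conts)

dataset = nvt.Dataset("data/*.parquet")         # hypothetical input files
workflow.fit(dataset)                           # compute the required statistics
workflow.transform(dataset).to_parquet("out/")  # apply the ops and write the result
```

The same fitted `workflow` can be persisted with `workflow.save(...)` and reused later, so the statistics computed during `fit` are applied consistently at serving time.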
## Operations diff --git a/docs/source/resources/cloud_integration.md b/docs/source/resources/cloud_integration.md index 0e7137a63eb..65e9d62f2fd 100644 --- a/docs/source/resources/cloud_integration.md +++ b/docs/source/resources/cloud_integration.md @@ -165,7 +165,7 @@ To run NVTabular on Databricks, do the following: [AWS SageMaker](https://aws.amazon.com/sagemaker/) is a service from AWS to "build, train and deploy machine learning" models. It automates and manages the MLOps workflow. It supports jupyter notebook instances enabling users to work directly in jupyter notebook/jupyter lab without any additional configurations. In this section, we will explain how to run NVIDIA Merlin (NVTabular) on AWS SageMaker notebook instances. We adopted the work from [Eugene](https://twitter.com/eugeneyan/) from his [twitter post](https://twitter.com/eugeneyan/status/1470916049604268035). We tested the workflow on February, 1st, 2022, but it is not integrated into our CI workflows. Future release of Merlin or Merlin's dependencies can cause issues. -To run the [movielens example](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples/getting-started-movielens) on AWS SageMaker, do the following: +To run the [movielens example](https://github.com/NVIDIA-Merlin/NVTabular/tree/stable/examples/getting-started-movielens) on AWS SageMaker, do the following: 1. Login into your AWS console and select AWS SageMaker. @@ -213,6 +213,6 @@ conda install -y torchmetrics ipykernel python -m ipykernel install --user --name=nvtabular ``` -11. You can switch in jupyter lab and run the [movielens example](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples/getting-started-movielens). +11. You can switch in jupyter lab and run the [movielens example](https://github.com/NVIDIA-Merlin/NVTabular/tree/stable/examples/getting-started-movielens). This workflow enables NVTabular ETL and training with TensorFlow or Pytorch. Deployment with Triton Inference Server will follow soon. diff --git a/docs/source/resources/troubleshooting.md b/docs/source/resources/troubleshooting.md index b0ec83b997f..6b4416bc529 100644 --- a/docs/source/resources/troubleshooting.md +++ b/docs/source/resources/troubleshooting.md @@ -30,7 +30,7 @@ issue](https://github.com/NVIDIA/NVTabular/issues/429). ## Reducing Memory Consumption for NVTabular Workflows -NVTabular is designed to scale to larger than GPU or host memory datasets. In our experiments, we are able to [scale to 1.3TB of uncompressed click logs](https://github.com/NVIDIA/NVTabular/tree/main/examples/scaling-criteo). However, some workflows can result in OOM errors `cudaErrorMemoryAllocation out of memory`, which can be addressed by small configuration changes. +NVTabular is designed to scale to larger than GPU or host memory datasets. In our experiments, we are able to [scale to 1.3TB of uncompressed click logs](https://github.com/NVIDIA/NVTabular/tree/stable/examples/scaling-criteo). However, some workflows can result in OOM errors `cudaErrorMemoryAllocation out of memory`, which can be addressed by small configuration changes. ### Setting the Row Group Size for the Parquet Files @@ -102,7 +102,7 @@ This is usually caused by string columns in parquet files. If you encounter this The example notebooks in the repository are developed and tested with the latest [Merlin containers](https://catalog.ngc.nvidia.com/containers?filters=&orderBy=dateModifiedDESC&query=merlin) that are available from the NGC Catalog. 
 If you run the example notebooks in an environment that has a different version of TensorFlow, you might experience an out-of-memory condition that requires you to perform additional configuration.
-The version of TensorFlow in each Merlin container is available from the [support matrix](https://nvidia-merlin.github.io/Merlin/main/support_matrix/index.html) in the Merlin documentation.
+The version of TensorFlow in each Merlin container is available from the [support matrix](https://nvidia-merlin.github.io/Merlin/stable/support_matrix/index.html) in the Merlin documentation.
 
 TensorFlow 2.8 uses `cuda_malloc_async` as the default GPU memory allocation function.
 TensorFlow specifies the function in the `TF_GPU_ALLOCATOR` environment variable.
diff --git a/docs/source/training/hugectr.rst b/docs/source/training/hugectr.rst
index 5674b5b1bec..9c83cbbbc1d 100644
--- a/docs/source/training/hugectr.rst
+++ b/docs/source/training/hugectr.rst
@@ -126,5 +126,5 @@ When training is accelerated with HugeCTR, the following happens:
         metrics = sess.evaluation()
         print("[HUGECTR][INFO] iter: {}, {}".format(i, metrics))
 
-For more information, refer to the `HugeCTR documentation <https://nvidia-merlin.github.io/HugeCTR/main/>`_
+For more information, refer to the `HugeCTR documentation <https://nvidia-merlin.github.io/HugeCTR/stable/>`_
 or the `HugeCTR repository <https://github.com/NVIDIA-Merlin/HugeCTR>`_ on GitHub.
diff --git a/examples/01-Getting-started.ipynb b/examples/01-Getting-started.ipynb
index ee68e0318d7..29e8de747c4 100644
--- a/examples/01-Getting-started.ipynb
+++ b/examples/01-Getting-started.ipynb
@@ -357,7 +357,7 @@
     "\n",
     "Additionally, we tag the `rating` column with appropriate tags. This will allow other components of the Merlin Framework to use this information and minimize the code we will have to write to perform complex operations such as training or serving a Deep Learning model.\n",
     "\n",
-    "If you would like to learn more about using `Tags`, take a look at the [NVTabular and Merlin Models integrated example](https://nvidia-merlin.github.io/models/main/examples/02-Merlin-Models-and-NVTabular-integration.html) notebook in the Merlin Models [repository](https://github.com/NVIDIA-Merlin/models)."
+    "If you would like to learn more about using `Tags`, take a look at the [NVTabular and Merlin Models integrated example](https://nvidia-merlin.github.io/models/stable/examples/02-Merlin-Models-and-NVTabular-integration.html) notebook in the Merlin Models [repository](https://github.com/NVIDIA-Merlin/models)."
    ]
   },
  {
@@ -567,7 +567,7 @@
    "source": [
     "Let's finish off this notebook with training a DLRM (a Deep Learning Recommendation Model introduced in [Deep Learning Recommendation Model for Personalization and Recommendation Systems](https://arxiv.org/abs/1906.00091)) on our preprocessed data.\n",
     "\n",
-    "To learn more about the integration between NVTabular and Merlin Models, please see the [NVTabular and Merlin Models integrated example](https://nvidia-merlin.github.io/models/main/examples/02-Merlin-Models-and-NVTabular-integration.html) in the Merlin Models [repository](https://github.com/NVIDIA-Merlin/models)."
+    "To learn more about the integration between NVTabular and Merlin Models, please see the [NVTabular and Merlin Models integrated example](https://nvidia-merlin.github.io/models/stable/examples/02-Merlin-Models-and-NVTabular-integration.html) in the Merlin Models [repository](https://github.com/NVIDIA-Merlin/models)."
] }, { diff --git a/examples/02-Advanced-NVTabular-workflow.ipynb b/examples/02-Advanced-NVTabular-workflow.ipynb index 6ba16486fde..bb97df931f6 100644 --- a/examples/02-Advanced-NVTabular-workflow.ipynb +++ b/examples/02-Advanced-NVTabular-workflow.ipynb @@ -1058,9 +1058,9 @@ "source": [ "We are using NVTabular in our Merlin repositories to preprocess and engineer features before training. We can recommend multiple examples, which show complex pipelines with NVTabular:\n", "\n", - "* [This notebook](https://github.com/NVIDIA-Merlin/models/blob/main/examples/usecases/ecommerce-session-based-next-item-prediction-for-fashion.ipynb) demonstrates how to aggregate data -- we are going from multiple rows of session information to a single row describing a session.\n", + "* [This notebook](https://github.com/NVIDIA-Merlin/models/blob/stable/examples/usecases/ecommerce-session-based-next-item-prediction-for-fashion.ipynb) demonstrates how to aggregate data -- we are going from multiple rows of session information to a single row describing a session.\n", "\n", - "* The [following notebook](https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/main/examples/end-to-end-session-based/01-ETL-with-NVTabular.ipynb) demonstrates how to derive features from timestamps and how to define a custom operator (`ItemRecency`).\n", + "* The [following notebook](https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/end-to-end-session-based/01-ETL-with-NVTabular.ipynb) demonstrates how to derive features from timestamps and how to define a custom operator (`ItemRecency`).\n", "\n", "- In [this notebook](https://github.com/NVIDIA-Merlin/publications/blob/main/tutorials/RecSys22tutorial/02-Implementing-RecSys-architectures.ipynb), among other functionality, you can familiarize yourself with using `TargetEncoding`, filling missing values, `Normalization` as well as adding various `MetaData`. Do note, this notebook is not maintained and thus there might be discrepancies between it and the most up to date version of NVTabular." ] diff --git a/examples/README.md b/examples/README.md index 2701d86db48..497046d8989 100644 --- a/examples/README.md +++ b/examples/README.md @@ -12,9 +12,9 @@ In this library, we provide a collection of Jupyter notebooks, which demonstrate In addition, NVTabular is used in many of our examples in other Merlin libraries. You can explore more complex processing pipelines in following examples: -- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples) -- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/main/examples) -- [Training Examples with Transformer4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples) +- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/stable/examples) +- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/stable/examples) +- [Training Examples with Transformer4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/stable/examples) ## Running the Example Notebooks @@ -65,4 +65,4 @@ To run the example notebooks using Docker containers, perform the following step ## Troubleshooting -If you experience any trouble running the example notebooks, check the latest [troubleshooting](https://nvidia-merlin.github.io/NVTabular/main/resources/troubleshooting.html) documentation. 
+If you experience any trouble running the example notebooks, check the latest [troubleshooting](https://nvidia-merlin.github.io/NVTabular/stable/resources/troubleshooting.html) documentation.
diff --git a/nvtabular/ops/hash_bucket.py b/nvtabular/ops/hash_bucket.py
index dc41c52e781..ceaba72c9a4 100644
--- a/nvtabular/ops/hash_bucket.py
+++ b/nvtabular/ops/hash_bucket.py
@@ -52,7 +52,7 @@ class HashBucket(Operator):
 
     If you would like to do frequency capping or frequency hashing,
     you should use Categorify op instead. See
-    `Categorify op <https://nvidia-merlin.github.io/NVTabular/main/api/ops/categorify.html>`_
+    `Categorify op <https://nvidia-merlin.github.io/NVTabular/stable/api/ops/categorify.html>`_
     for example usage.
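For context on the `HashBucket` docstring above, a minimal usage sketch follows; the column name and bucket count are hypothetical:

```python
import nvtabular as nvt
from nvtabular import ops

# Hash raw category values into a fixed number of buckets. Unlike
# Categorify, no per-category statistics are gathered, so values unseen
# during preprocessing still land in the same fixed range at inference.
hashed = ["user_id"] >> ops.HashBucket(num_buckets=1000)
workflow = nvt.Workflow(hashed)
```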