Build docs for stable branch and make default (#1826)
* Update links in README to point to stable version of docs

* Limit tags to only those corresponding to a release

* Build docs for stable branch

* Update intersphinx references to stable branch of merlin repos

* Setup local branches for docs build

* Update redirect to stable branch of docs

* Update canonical url of docs to stable version

* Replace links to main branch with stable branch

* revert change to publications link
oliverholworthy committed May 26, 2023
1 parent 9233565 commit 20f7bec
Showing 17 changed files with 68 additions and 61 deletions.
45 changes: 25 additions & 20 deletions .github/workflows/docs-sched-rebuild.yaml
Original file line number Diff line number Diff line change
@@ -26,6 +26,10 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip setuptools==59.4.0 wheel tox
- name: Setup local branches for docs build
run: |
git branch --track main origin/main || true
git branch --track stable origin/stable || true
- name: Building docs (multiversion)
run: |
tox -e docs-multi
@@ -83,27 +87,28 @@ jobs:
exit 0
fi
# If any of these commands fail, fail the build.
def_branch=$(gh api "repos/${GITHUB_REPOSITORY}" --jq ".default_branch")
def_branch="stable"
html_url=$(gh api "repos/${GITHUB_REPOSITORY}/pages" --jq ".html_url")
# Beware ugly quotation mark avoidance in the following lines.
echo '<!DOCTYPE html>' > index.html
echo '<html>' >> index.html
echo ' <head>' >> index.html
echo ' <title>Redirect to documentation</title>' >> index.html
echo ' <meta charset="utf-8">' >> index.html
echo '    <meta http-equiv="refresh" content="3; URL='${html_url}${def_branch}'/index.html">' >> index.html
echo ' <link rel="canonical" href="'${html_url}${def_branch}'/index.html">' >> index.html
echo ' <script language="javascript">' >> index.html
echo ' function redirect() {' >> index.html
echo ' window.location.assign("'${html_url}${def_branch}'/index.html")' >> index.html
echo ' }' >> index.html
echo ' </script>' >> index.html
echo ' </head>' >> index.html
echo ' <body onload="redirect()">' >> index.html
echo ' <p>Please follow the link to the <a href="'${html_url}${def_branch}'/index.html">' >> index.html
echo ${def_branch}'</a> branch documentation.</p>' >> index.html
echo ' </body>' >> index.html
echo '</html>' >> index.html
cat > index.html << EOF
<!DOCTYPE html>
<html>
  <head>
    <title>Redirect to documentation</title>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="3; URL=${html_url}${def_branch}/index.html">
    <link rel="canonical" href="${html_url}${def_branch}/index.html">
    <script language="javascript">
      function redirect() {
        window.location.assign("${html_url}${def_branch}/index.html")
      }
    </script>
  </head>
  <body onload="redirect()">
    <p>Please follow the link to the <a href="${html_url}${def_branch}/index.html">
    ${def_branch}</a> branch documentation.</p>
  </body>
</html>
EOF
git add index.html
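Because the heredoc delimiter `EOF` is unquoted, the shell expands `${html_url}` and `${def_branch}` when the file is written. A minimal Python sketch of the same templating, useful for checking the generated page offline (the sample values below are illustrative, not read from the workflow):

```python
from string import Template

# Illustrative values; in the workflow these come from `gh api` and the
# hard-coded default branch.
html_url = "https://nvidia-merlin.github.io/NVTabular/"
def_branch = "stable"

REDIRECT_TEMPLATE = Template("""<!DOCTYPE html>
<html>
  <head>
    <title>Redirect to documentation</title>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="3; URL=${target}">
    <link rel="canonical" href="${target}">
  </head>
  <body onload="window.location.assign('${target}')">
    <p>Please follow the link to the <a href="${target}">${branch}</a>
    branch documentation.</p>
  </body>
</html>
""")

def render_redirect(html_url: str, def_branch: str) -> str:
    # Mirror the shell expansion: target URL is the Pages root plus the
    # default docs branch.
    target = f"{html_url}{def_branch}/index.html"
    return REDIRECT_TEMPLATE.substitute(target=target, branch=def_branch)

page = render_redirect(html_url, def_branch)
```

Rendering the template and grepping for the expanded URL is a quick sanity check before committing the page to the `gh-pages` branch.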
- name: Commit changes to the GitHub Pages branch
run: |
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -23,7 +23,7 @@ into three categories:

### Your first issue

1. Read the project's [README.md](https://github.com/nvidia/NVTabular/blob/main/README.md)
1. Read the project's [README.md](https://github.com/nvidia/NVTabular/blob/stable/README.md)
to learn how to setup the development environment.
2. Find an issue to work on. The best way is to look for the
[good first issue](https://github.com/nvidia/NVTabular/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
20 changes: 10 additions & 10 deletions README.md
@@ -1,8 +1,8 @@
## [NVTabular](https://github.com/NVIDIA/NVTabular)

[![PyPI](https://img.shields.io/pypi/v/NVTabular?color=orange&label=version)](https://pypi.python.org/pypi/NVTabular/)
[![LICENSE](https://img.shields.io/github/license/NVIDIA-Merlin/NVTabular)](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/LICENSE)
[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html)
[![LICENSE](https://img.shields.io/github/license/NVIDIA-Merlin/NVTabular)](https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/LICENSE)
[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html)

NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte-scale datasets and train deep learning (DL) based recommender systems. It provides a high-level abstraction to simplify code and accelerates computation on the GPU using the [RAPIDS Dask-cuDF](https://github.com/rapidsai/cudf/tree/main/python/dask_cudf) library.

@@ -26,7 +26,7 @@ NVTabular alleviates these challenges and helps data scientists and ML engineers
- prepare datasets quickly and easily for experimentation so that more models can be trained.
- deploy models into production by providing faster dataset transformation

Learn more in the NVTabular [core features documentation](https://nvidia-merlin.github.io/NVTabular/main/core_features.html).
Learn more in the NVTabular [core features documentation](https://nvidia-merlin.github.io/NVTabular/stable/core_features.html).

### Performance

@@ -74,11 +74,11 @@ The following table summarizes the key information about the containers:
| merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, Tensorflow and Triton Inference |
| merlin-pytorch | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch | NVTabular, PyTorch, and Triton Inference |

To use these Docker containers, you'll first need to install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see [Support Matrix](https://github.com/NVIDIA/NVTabular/blob/main/docs/source/resources/support_matrix.rst).
To use these Docker containers, you'll first need to install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see [Support Matrix](https://github.com/NVIDIA/NVTabular/blob/stable/docs/source/resources/support_matrix.rst).

### Notebook Examples and Tutorials

We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) to demonstrate feature engineering with NVTabular as Jupyter notebooks:
We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular/tree/stable/examples) to demonstrate feature engineering with NVTabular as Jupyter notebooks:

- Introduction to NVTabular's High-Level API
- Advanced workflows with NVTabular
@@ -87,13 +87,13 @@ We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular

In addition, NVTabular is used in many of our examples in other Merlin libraries:

- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples)
- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/main/examples)
- [Training Examples with Transformers4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples)
- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/stable/examples)
- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/stable/examples)
- [Training Examples with Transformers4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/stable/examples)

### Feedback and Support

If you'd like to contribute to the library directly, see the [Contributing.md](https://github.com/NVIDIA/NVTabular/blob/main/CONTRIBUTING.md). We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).
If you'd like to contribute to the library directly, see the [Contributing.md](https://github.com/NVIDIA/NVTabular/blob/stable/CONTRIBUTING.md). We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).

If you're interested in learning more about how NVTabular works, see
[our NVTabular documentation](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html). We also have [API documentation](https://nvidia-merlin.github.io/NVTabular/main/api/index.html) that outlines the specifics of the available calls within the library.
[our NVTabular documentation](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html). We also have [API documentation](https://nvidia-merlin.github.io/NVTabular/stable/api/index.html) that outlines the specifics of the available calls within the library.
4 changes: 2 additions & 2 deletions bench/examples/MultiGPUBench.md
@@ -2,7 +2,7 @@

The benchmark script described in this document is located at `NVTabular/examples/dask-nvtabular-criteo-benchmark.py`.

The [multi-GPU Criteo/DLRM benchmark](https://github.com/NVIDIA/NVTabular/blob/main/examples/dask-nvtabular-criteo-benchmark.py) is designed to measure the time required to preprocess the [Criteo (1TB) dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/data) for Facebook’s [DLRM model](https://github.com/facebookresearch/dlrm). The user must specify the path of the raw dataset (using the `--data-path` flag), as well as the output directory for all temporary/final data (using the `--out-path` flag).
The [multi-GPU Criteo/DLRM benchmark](https://github.com/NVIDIA/NVTabular/blob/stable/examples/dask-nvtabular-criteo-benchmark.py) is designed to measure the time required to preprocess the [Criteo (1TB) dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/data) for Facebook’s [DLRM model](https://github.com/facebookresearch/dlrm). The user must specify the path of the raw dataset (using the `--data-path` flag), as well as the output directory for all temporary/final data (using the `--out-path` flag).

### Example Usage

@@ -12,7 +12,7 @@ python dask-nvtabular-criteo-benchmark.py --data-path /path/to/criteo_parquet --

### Dataset Requirements (Parquet)

The script is designed with a parquet-formatted dataset in mind. Although csv files can also be handled by NVTabular, converting to parquet yields significantly better performance. To convert your dataset, try using the [conversion notebook](https://github.com/NVIDIA/NVTabular/blob/main/examples/optimize_criteo.ipynb) (located at `NVTabular/examples/optimize_criteo.ipynb`).
The script is designed with a parquet-formatted dataset in mind. Although csv files can also be handled by NVTabular, converting to parquet yields significantly better performance. To convert your dataset, try using the [conversion notebook](https://github.com/NVIDIA/NVTabular/blob/stable/examples/optimize_criteo.ipynb) (located at `NVTabular/examples/optimize_criteo.ipynb`).

### General Notes on Parameter Tuning

2 changes: 1 addition & 1 deletion conda/environments/nvtabular_aws_sagemaker.yml
@@ -1,4 +1,4 @@
# Based on https://github.com/NVIDIA-Merlin/NVTabular/blob/main/conda/environments/nvtabular_dev_cuda11.0.yml
# Based on https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/conda/environments/nvtabular_dev_cuda11.0.yml
name: nvtabular
channels:
- rapidsai
10 changes: 5 additions & 5 deletions docs/README.md
@@ -1,7 +1,7 @@
# Documentation

This folder contains the scripts necessary to build NVTabular's documentation.
You can view the generated [NVTabular documentation here](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html).
You can view the generated [NVTabular documentation here](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html).

## Contributing to Docs

@@ -66,8 +66,8 @@ that the link is broken.
"lineno": 88,
"status": "broken",
"code": 0,
"uri": "https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/build-hadoop.sh",
"info": "404 Client Error: Not Found for url: https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/build-hadoop.sh"
"uri": "https://github.com/NVIDIA-Merlin/Merlin/blob/stable/docker/build-hadoop.sh",
"info": "404 Client Error: Not Found for url: https://github.com/NVIDIA-Merlin/Merlin/blob/stable/docker/build-hadoop.sh"
}
```
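Sphinx's linkcheck builder writes one JSON object per line, with the `uri`, `status`, and `info` fields shown above. A hedged sketch of a small gate that collects broken links from that output, so a docs CI job can fail fast (the helper name is illustrative):

```python
import json

def broken_links(linkcheck_output: str) -> list:
    """Return the URIs of entries whose status is 'broken'.

    Each non-empty line of linkcheck's output is a standalone JSON object
    with at least 'uri' and 'status' fields, as in the sample entry above.
    """
    broken = []
    for line in linkcheck_output.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("status") == "broken":
            broken.append(entry["uri"])
    return broken

# Two illustrative records: one working link, one broken one.
sample = "\n".join([
    '{"uri": "https://example.com/ok", "status": "working", "code": 200}',
    '{"uri": "https://github.com/NVIDIA-Merlin/Merlin/blob/stable/docker/build-hadoop.sh", "status": "broken", "code": 0}',
])
bad = broken_links(sample)
```

A CI step could then exit non-zero whenever the returned list is non-empty.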

@@ -127,7 +127,7 @@ the link is to the repository:

```markdown
Refer to the sample Python programs in the
[examples/blah](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples/blah)
[examples/blah](https://github.com/NVIDIA-Merlin/NVTabular/tree/stable/examples/blah)
directory of the repository.
```

@@ -164,7 +164,7 @@ a relative path works both in the HTML docs page and in the repository browsing
Use a link to the HTML page like the following:

```markdown
<https://nvidia-merlin.github.io/NVTabular/main/Introduction.html>
<https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html>
```

> I'd like to change this in the future. My preference would be to use a relative
10 changes: 6 additions & 4 deletions docs/source/conf.py
@@ -11,6 +11,7 @@
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import re
import subprocess
import sys

@@ -107,19 +108,20 @@
# at a commit (not a Git repo).
if os.path.exists(gitdir):
tag_refs = subprocess.check_output(["git", "tag", "-l", "v*"]).decode("utf-8").split()
tag_refs = [tag for tag in tag_refs if re.match(r"^v[0-9]+\.[0-9]+\.[0-9]+$", tag)]
tag_refs = natsorted(tag_refs)[-6:]
smv_tag_whitelist = r"^(" + r"|".join(tag_refs) + r")$"
else:
# SMV is reading conf.py from a Git archive of the repo at a specific commit.
smv_tag_whitelist = r"^v.*$"
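The tag filtering above can be checked in isolation. Below is a sketch of the whitelist construction, substituting a stdlib version-tuple sort for `natsorted` (conf.py itself uses the `natsort` package); the helper name and sample tags are illustrative:

```python
import re

def build_tag_whitelist(tag_refs, keep=6):
    """Keep release tags of the form vX.Y.Z, take the newest `keep`,
    and join them into the regex sphinx-multiversion expects."""
    releases = [t for t in tag_refs if re.match(r"^v[0-9]+\.[0-9]+\.[0-9]+$", t)]
    # stdlib stand-in for natsorted: sort by numeric (major, minor, patch)
    releases.sort(key=lambda t: tuple(int(p) for p in t[1:].split(".")))
    releases = releases[-keep:]
    return r"^(" + r"|".join(releases) + r")$"

# Pre-release and malformed tags are filtered out before sorting.
tags = ["v23.02.00", "v23.04.00", "v1.0", "v23.04.00-rc1", "v0.7.1"]
pattern = build_tag_whitelist(tags)
```

Filtering before sorting is what keeps release candidates and legacy tag formats out of the published version switcher.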

# Only include main branch for now
smv_branch_whitelist = "^main$"
smv_branch_whitelist = "^(main|stable)$"

smv_refs_override_suffix = "-docs"

html_sidebars = {"**": ["versions.html"]}
html_baseurl = "https://nvidia-merlin.github.io/NVTabular/main"
html_baseurl = "https://nvidia-merlin.github.io/NVTabular/stable/"

autodoc_inherit_docstrings = False
autodoc_default_options = {
@@ -136,8 +138,8 @@
"cudf": ("https://docs.rapids.ai/api/cudf/stable/", None),
"distributed": ("https://distributed.dask.org/en/latest/", None),
"torch": ("https://pytorch.org/docs/stable/", None),
"merlin-core": ("https://nvidia-merlin.github.io/core/main", None),
"merlin-systems": ("https://nvidia-merlin.github.io/systems/main", None),
"merlin-core": ("https://nvidia-merlin.github.io/core/stable/", None),
"merlin-systems": ("https://nvidia-merlin.github.io/systems/stable/", None),
}

copydirs_additional_dirs = [
4 changes: 2 additions & 2 deletions docs/source/core_features.md
@@ -17,7 +17,7 @@ In addition to providing mechanisms for transforming the data to prepare it for

## HugeCTR Interoperability

NVTabular is also capable of preprocessing datasets that can be passed to HugeCTR for training. For additional information, see the [HugeCTR Example Notebook](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) for details about how this works.
NVTabular is also capable of preprocessing datasets that can be passed to HugeCTR for training. For additional information, see the [HugeCTR Example Notebook](https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) for details about how this works.

## Multi-GPU Support

@@ -38,7 +38,7 @@ workflow = nvt.Workflow(..., client=client)

Currently, there are many ways to deploy a "cluster" for Dask. This [article](https://blog.dask.org/2020/07/23/current-state-of-distributed-dask-clusters) gives a summary of all the practical options. For a single machine with multiple GPUs, the `dask_cuda.LocalCUDACluster` API is typically the most convenient option.

Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/main/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.
Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/stable/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.

## Multi-Node Support

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -14,7 +14,7 @@ Merlin NVTabular GitHub Repository

About Merlin
Merlin is the overarching project that brings together the Merlin projects.
See the `documentation <https://nvidia-merlin.github.io/Merlin/main/README.html>`_
See the `documentation <https://nvidia-merlin.github.io/Merlin/stable/README.html>`_
or the `repository <https://github.com/NVIDIA-Merlin/Merlin>`_ on GitHub.

Developer website for Merlin
2 changes: 1 addition & 1 deletion docs/source/resources/architecture.md
@@ -2,7 +2,7 @@

The NVTabular engine uses the [RAPIDS](http://www.rapids.ai) [Dask-cuDF library](https://github.com/rapidsai/dask-cuda), which provides the bulk of the functionality for accelerating dataframe operations on the GPU and scaling across multiple GPUs. NVTabular provides functionality commonly found in deep learning recommendation workflows, allowing you to focus on what you want to do with your data, and not how you need to do it. NVTabular also provides a template for our core compute mechanism, which is referred to as Operations (ops), allowing you to build your own custom ops from cuDF and other libraries.

Once NVTabular is installed, the next step is to define the preprocessing and feature engineering pipeline by applying the ops that you need. For additional information about installing NVTabular, see [Installation](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html#installation).
Once NVTabular is installed, the next step is to define the preprocessing and feature engineering pipeline by applying the ops that you need. For additional information about installing NVTabular, see [Installation](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html#installation).

## Operations
