Build docs for stable branch and make default (#1826)
* Update links in README to point to stable version of docs

* Limit tags to only those corresponding to a release

* Build docs for stable branch

* Update intersphinx references to stable branch of merlin repos

* Setup local branches for docs build

* Update redirect to stable branch of docs

* Update canonical url of docs to stable version

* Replace links to main branch with stable branch

* revert change to publications link
oliverholworthy committed May 26, 2023
1 parent 9233565 commit 20f7bec
Showing 17 changed files with 68 additions and 61 deletions.
45 changes: 25 additions & 20 deletions .github/workflows/docs-sched-rebuild.yaml
Original file line number Diff line number Diff line change
@@ -26,6 +26,10 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip setuptools==59.4.0 wheel tox
- name: Setup local branches for docs build
run: |
git branch --track main origin/main || true
git branch --track stable origin/stable || true
- name: Building docs (multiversion)
run: |
tox -e docs-multi
@@ -83,27 +87,28 @@ jobs:
exit 0
fi
# If any of these commands fail, fail the build.
def_branch=$(gh api "repos/${GITHUB_REPOSITORY}" --jq ".default_branch")
def_branch="stable"
html_url=$(gh api "repos/${GITHUB_REPOSITORY}/pages" --jq ".html_url")
# Beware ugly quotation mark avoidance in the following lines.
echo '<!DOCTYPE html>' > index.html
echo '<html>' >> index.html
echo ' <head>' >> index.html
echo ' <title>Redirect to documentation</title>' >> index.html
echo ' <meta charset="utf-8">' >> index.html
echo '    <meta http-equiv="refresh" content="3; URL='${html_url}${def_branch}'/index.html">' >> index.html
echo ' <link rel="canonical" href="'${html_url}${def_branch}'/index.html">' >> index.html
echo ' <script language="javascript">' >> index.html
echo ' function redirect() {' >> index.html
echo ' window.location.assign("'${html_url}${def_branch}'/index.html")' >> index.html
echo ' }' >> index.html
echo ' </script>' >> index.html
echo ' </head>' >> index.html
echo ' <body onload="redirect()">' >> index.html
echo ' <p>Please follow the link to the <a href="'${html_url}${def_branch}'/index.html">' >> index.html
echo ${def_branch}'</a> branch documentation.</p>' >> index.html
echo ' </body>' >> index.html
echo '</html>' >> index.html
cat > index.html << EOF
<!DOCTYPE html>
<html>
  <head>
    <title>Redirect to documentation</title>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="3; URL=${html_url}${def_branch}/index.html">
    <link rel="canonical" href="${html_url}${def_branch}/index.html">
    <script language="javascript">
      function redirect() {
        window.location.assign("${html_url}${def_branch}/index.html")
      }
    </script>
  </head>
  <body onload="redirect()">
    <p>Please follow the link to the <a href="${html_url}${def_branch}/index.html">
    ${def_branch}</a> branch documentation.</p>
  </body>
</html>
EOF
git add index.html
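Because the heredoc delimiter `EOF` is unquoted, the shell expands `${html_url}` and `${def_branch}` when the file is written. A minimal Python sketch of the same templating, useful for checking the generated page offline (the sample values below are illustrative, not read from the workflow):

```python
from string import Template

# Illustrative values; in the workflow these come from `gh api` and the
# hard-coded default branch.
html_url = "https://nvidia-merlin.github.io/NVTabular/"
def_branch = "stable"

REDIRECT_TEMPLATE = Template("""<!DOCTYPE html>
<html>
  <head>
    <title>Redirect to documentation</title>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="3; URL=${target}">
    <link rel="canonical" href="${target}">
  </head>
  <body onload="window.location.assign('${target}')">
    <p>Please follow the link to the <a href="${target}">${branch}</a>
    branch documentation.</p>
  </body>
</html>
""")

def render_redirect(html_url: str, def_branch: str) -> str:
    # Mirror the shell expansion: target URL is the Pages root plus the
    # default docs branch.
    target = f"{html_url}{def_branch}/index.html"
    return REDIRECT_TEMPLATE.substitute(target=target, branch=def_branch)

page = render_redirect(html_url, def_branch)
```

Rendering the template and grepping for the expanded URL is a quick sanity check before committing the page to the `gh-pages` branch.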
- name: Commit changes to the GitHub Pages branch
run: |
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -23,7 +23,7 @@ into three categories:

### Your first issue

1. Read the project's [README.md](https://github.com/nvidia/NVTabular/blob/main/README.md)
1. Read the project's [README.md](https://github.com/nvidia/NVTabular/blob/stable/README.md)
to learn how to setup the development environment.
2. Find an issue to work on. The best way is to look for the
[good first issue](https://github.com/nvidia/NVTabular/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
20 changes: 10 additions & 10 deletions README.md
@@ -1,8 +1,8 @@
## [NVTabular](https://github.com/NVIDIA/NVTabular)

[![PyPI](https://img.shields.io/pypi/v/NVTabular?color=orange&label=version)](https://pypi.python.org/pypi/NVTabular/)
[![LICENSE](https://img.shields.io/github/license/NVIDIA-Merlin/NVTabular)](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/LICENSE)
[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html)
[![LICENSE](https://img.shields.io/github/license/NVIDIA-Merlin/NVTabular)](https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/LICENSE)
[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html)

NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte-scale datasets and train deep learning (DL) based recommender systems. It provides a high-level abstraction to simplify code and accelerates computation on the GPU using the [RAPIDS Dask-cuDF](https://github.com/rapidsai/cudf/tree/main/python/dask_cudf) library.

@@ -26,7 +26,7 @@ NVTabular alleviates these challenges and helps data scientists and ML engineers
- prepare datasets quickly and easily for experimentation so that more models can be trained.
- deploy models into production by providing faster dataset transformation

Learn more in the NVTabular [core features documentation](https://nvidia-merlin.github.io/NVTabular/main/core_features.html).
Learn more in the NVTabular [core features documentation](https://nvidia-merlin.github.io/NVTabular/stable/core_features.html).

### Performance

@@ -74,11 +74,11 @@ The following table summarizes the key information about the containers:
| merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, Tensorflow and Triton Inference |
| merlin-pytorch | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch | NVTabular, PyTorch, and Triton Inference |

To use these Docker containers, you'll first need to install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see [Support Matrix](https://github.com/NVIDIA/NVTabular/blob/main/docs/source/resources/support_matrix.rst).
To use these Docker containers, you'll first need to install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see [Support Matrix](https://github.com/NVIDIA/NVTabular/blob/stable/docs/source/resources/support_matrix.rst).

### Notebook Examples and Tutorials

We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) to demonstrate feature engineering with NVTabular as Jupyter notebooks:
We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular/tree/stable/examples) to demonstrate feature engineering with NVTabular as Jupyter notebooks:

- Introduction to NVTabular's High-Level API
- Advanced workflows with NVTabular
@@ -87,13 +87,13 @@ We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular

In addition, NVTabular is used in many of our examples in other Merlin libraries:

- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples)
- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/main/examples)
- [Training Examples with Transformers4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples)
- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/stable/examples)
- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/stable/examples)
- [Training Examples with Transformers4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/stable/examples)

### Feedback and Support

If you'd like to contribute to the library directly, see the [Contributing.md](https://github.com/NVIDIA/NVTabular/blob/main/CONTRIBUTING.md). We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).
If you'd like to contribute to the library directly, see the [Contributing.md](https://github.com/NVIDIA/NVTabular/blob/stable/CONTRIBUTING.md). We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).

If you're interested in learning more about how NVTabular works, see
[our NVTabular documentation](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html). We also have [API documentation](https://nvidia-merlin.github.io/NVTabular/main/api/index.html) that outlines the specifics of the available calls within the library.
[our NVTabular documentation](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html). We also have [API documentation](https://nvidia-merlin.github.io/NVTabular/stable/api/index.html) that outlines the specifics of the available calls within the library.
4 changes: 2 additions & 2 deletions bench/examples/MultiGPUBench.md
@@ -2,7 +2,7 @@

The benchmark script described in this document is located at `NVTabular/examples/dask-nvtabular-criteo-benchmark.py`.

The [multi-GPU Criteo/DLRM benchmark](https://github.com/NVIDIA/NVTabular/blob/main/examples/dask-nvtabular-criteo-benchmark.py) is designed to measure the time required to preprocess the [Criteo (1TB) dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/data) for Facebook’s [DLRM model](https://github.com/facebookresearch/dlrm). The user must specify the path of the raw dataset (using the `--data-path` flag), as well as the output directory for all temporary/final data (using the `--out-path` flag).
The [multi-GPU Criteo/DLRM benchmark](https://github.com/NVIDIA/NVTabular/blob/stable/examples/dask-nvtabular-criteo-benchmark.py) is designed to measure the time required to preprocess the [Criteo (1TB) dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/data) for Facebook’s [DLRM model](https://github.com/facebookresearch/dlrm). The user must specify the path of the raw dataset (using the `--data-path` flag), as well as the output directory for all temporary/final data (using the `--out-path` flag).

### Example Usage

@@ -12,7 +12,7 @@ python dask-nvtabular-criteo-benchmark.py --data-path /path/to/criteo_parquet --

### Dataset Requirements (Parquet)

The script is designed with a parquet-formatted dataset in mind. Although csv files can also be handled by NVTabular, converting to parquet yields significantly better performance. To convert your dataset, try using the [conversion notebook](https://github.com/NVIDIA/NVTabular/blob/main/examples/optimize_criteo.ipynb) (located at `NVTabular/examples/optimize_criteo.ipynb`).
The script is designed with a parquet-formatted dataset in mind. Although csv files can also be handled by NVTabular, converting to parquet yields significantly better performance. To convert your dataset, try using the [conversion notebook](https://github.com/NVIDIA/NVTabular/blob/stable/examples/optimize_criteo.ipynb) (located at `NVTabular/examples/optimize_criteo.ipynb`).

### General Notes on Parameter Tuning

2 changes: 1 addition & 1 deletion conda/environments/nvtabular_aws_sagemaker.yml
@@ -1,4 +1,4 @@
# Based on https://github.com/NVIDIA-Merlin/NVTabular/blob/main/conda/environments/nvtabular_dev_cuda11.0.yml
# Based on https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/conda/environments/nvtabular_dev_cuda11.0.yml
name: nvtabular
channels:
- rapidsai
10 changes: 5 additions & 5 deletions docs/README.md
@@ -1,7 +1,7 @@
# Documentation

This folder contains the scripts necessary to build NVTabular's documentation.
You can view the generated [NVTabular documentation here](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html).
You can view the generated [NVTabular documentation here](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html).

## Contributing to Docs

@@ -66,8 +66,8 @@ that the link is broken.
"lineno": 88,
"status": "broken",
"code": 0,
"uri": "https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/build-hadoop.sh",
"info": "404 Client Error: Not Found for url: https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/build-hadoop.sh"
"uri": "https://github.com/NVIDIA-Merlin/Merlin/blob/stable/docker/build-hadoop.sh",
"info": "404 Client Error: Not Found for url: https://github.com/NVIDIA-Merlin/Merlin/blob/stable/docker/build-hadoop.sh"
}
```
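Sphinx's linkcheck builder writes one JSON object per line, with the `uri`, `status`, and `info` fields shown above. A hedged sketch of a small gate that collects broken links from that output, so a docs CI job can fail fast (the helper name is illustrative):

```python
import json

def broken_links(linkcheck_output: str) -> list:
    """Return the URIs of entries whose status is 'broken'.

    Each non-empty line of linkcheck's output is a standalone JSON object
    with at least 'uri' and 'status' fields, as in the sample entry above.
    """
    broken = []
    for line in linkcheck_output.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("status") == "broken":
            broken.append(entry["uri"])
    return broken

# Two illustrative records: one working link, one broken one.
sample = "\n".join([
    '{"uri": "https://example.com/ok", "status": "working", "code": 200}',
    '{"uri": "https://github.com/NVIDIA-Merlin/Merlin/blob/stable/docker/build-hadoop.sh", "status": "broken", "code": 0}',
])
bad = broken_links(sample)
```

A CI step could then exit non-zero whenever the returned list is non-empty.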

@@ -127,7 +127,7 @@ the link is to the repository:

```markdown
Refer to the sample Python programs in the
[examples/blah](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples/blah)
[examples/blah](https://github.com/NVIDIA-Merlin/NVTabular/tree/stable/examples/blah)
directory of the repository.
```

@@ -164,7 +164,7 @@ a relative path works both in the HTML docs page and in the repository browsing
Use a link to the HTML page like the following:

```markdown
<https://nvidia-merlin.github.io/NVTabular/main/Introduction.html>
<https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html>
```

> I'd like to change this in the future. My preference would be to use a relative
10 changes: 6 additions & 4 deletions docs/source/conf.py
@@ -11,6 +11,7 @@
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import re
import subprocess
import sys

@@ -107,19 +108,20 @@
# at a commit (not a Git repo).
if os.path.exists(gitdir):
tag_refs = subprocess.check_output(["git", "tag", "-l", "v*"]).decode("utf-8").split()
tag_refs = [tag for tag in tag_refs if re.match(r"^v[0-9]+\.[0-9]+\.[0-9]+$", tag)]
tag_refs = natsorted(tag_refs)[-6:]
smv_tag_whitelist = r"^(" + r"|".join(tag_refs) + r")$"
else:
# SMV is reading conf.py from a Git archive of the repo at a specific commit.
smv_tag_whitelist = r"^v.*$"
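The tag filtering above can be checked in isolation. Below is a sketch of the whitelist construction, substituting a stdlib version-tuple sort for `natsorted` (conf.py itself uses the `natsort` package); the helper name and sample tags are illustrative:

```python
import re

def build_tag_whitelist(tag_refs, keep=6):
    """Keep release tags of the form vX.Y.Z, take the newest `keep`,
    and join them into the regex sphinx-multiversion expects."""
    releases = [t for t in tag_refs if re.match(r"^v[0-9]+\.[0-9]+\.[0-9]+$", t)]
    # stdlib stand-in for natsorted: sort by numeric (major, minor, patch)
    releases.sort(key=lambda t: tuple(int(p) for p in t[1:].split(".")))
    releases = releases[-keep:]
    return r"^(" + r"|".join(releases) + r")$"

# Pre-release and malformed tags are filtered out before sorting.
tags = ["v23.02.00", "v23.04.00", "v1.0", "v23.04.00-rc1", "v0.7.1"]
pattern = build_tag_whitelist(tags)
```

Filtering before sorting is what keeps release candidates and legacy tag formats out of the published version switcher.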

# Only include main branch for now
smv_branch_whitelist = "^main$"
smv_branch_whitelist = "^(main|stable)$"

smv_refs_override_suffix = "-docs"

html_sidebars = {"**": ["versions.html"]}
html_baseurl = "https://nvidia-merlin.github.io/NVTabular/main"
html_baseurl = "https://nvidia-merlin.github.io/NVTabular/stable/"

autodoc_inherit_docstrings = False
autodoc_default_options = {
@@ -136,8 +138,8 @@
"cudf": ("https://docs.rapids.ai/api/cudf/stable/", None),
"distributed": ("https://distributed.dask.org/en/latest/", None),
"torch": ("https://pytorch.org/docs/stable/", None),
"merlin-core": ("https://nvidia-merlin.github.io/core/main", None),
"merlin-systems": ("https://nvidia-merlin.github.io/systems/main", None),
"merlin-core": ("https://nvidia-merlin.github.io/core/stable/", None),
"merlin-systems": ("https://nvidia-merlin.github.io/systems/stable/", None),
}

copydirs_additional_dirs = [
4 changes: 2 additions & 2 deletions docs/source/core_features.md
@@ -17,7 +17,7 @@ In addition to providing mechanisms for transforming the data to prepare it for

## HugeCTR Interoperability

NVTabular is also capable of preprocessing datasets that can be passed to HugeCTR for training. For additional information, see the [HugeCTR Example Notebook](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) for details about how this works.
NVTabular is also capable of preprocessing datasets that can be passed to HugeCTR for training. For additional information, see the [HugeCTR Example Notebook](https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) for details about how this works.

## Multi-GPU Support

@@ -38,7 +38,7 @@ workflow = nvt.Workflow(..., client=client)

Currently, there are many ways to deploy a "cluster" for Dask. This [article](https://blog.dask.org/2020/07/23/current-state-of-distributed-dask-clusters) gives a summary of all the practical options. For a single machine with multiple GPUs, the `dask_cuda.LocalCUDACluster` API is typically the most convenient option.

Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/main/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.
Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/stable/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.

## Multi-Node Support

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -14,7 +14,7 @@ Merlin NVTabular GitHub Repository

About Merlin
Merlin is the overarching project that brings together the Merlin projects.
See the `documentation <https://nvidia-merlin.github.io/Merlin/main/README.html>`_
See the `documentation <https://nvidia-merlin.github.io/Merlin/stable/README.html>`_
or the `repository <https://github.com/NVIDIA-Merlin/Merlin>`_ on GitHub.

Developer website for Merlin
2 changes: 1 addition & 1 deletion docs/source/resources/architecture.md
@@ -2,7 +2,7 @@

The NVTabular engine uses the [RAPIDS](http://www.rapids.ai) [Dask-cuDF library](https://github.com/rapidsai/dask-cuda), which provides the bulk of the functionality for accelerating dataframe operations on the GPU and scaling across multiple GPUs. NVTabular provides functionality commonly found in deep learning recommendation workflows, allowing you to focus on what you want to do with your data, and not how you need to do it. NVTabular also provides a template for our core compute mechanism, which is referred to as Operations (ops), allowing you to build your own custom ops from cuDF and other libraries.

Once NVTabular is installed, the next step is to define the preprocessing and feature engineering pipeline by applying the ops that you need. For additional information about installing NVTabular, see [Installation](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html#installation).
Once NVTabular is installed, the next step is to define the preprocessing and feature engineering pipeline by applying the ops that you need. For additional information about installing NVTabular, see [Installation](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html#installation).

## Operations
