
Commit e78dfa9

Merge branch 'xgboost-azure' of github.com:rapidsai/deployment into xgboost-azure
melodywang060 committed Oct 14, 2024
2 parents 622f326 + 48c4096 commit e78dfa9
Showing 7 changed files with 15 additions and 15 deletions.
2 changes: 1 addition & 1 deletion source/cloud/azure/azure-vm.md
@@ -128,7 +128,7 @@ Next, we can SSH into our VM to install RAPIDS. SSH instructions can be found by

### Useful Links

- [Using NGC with Azure](https://docs.nvidia.com/ngc/ngc-azure-setup-guide/index.html)
- [Using NGC with Azure](https://docs.nvidia.com/ngc/ngc-deploy-public-cloud/ngc-azure/index.html)

```{relatedexamples}
4 changes: 2 additions & 2 deletions source/examples/rapids-azureml-hpo/notebook.ipynb
@@ -72,7 +72,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize`MLClient`[class](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.mlclient?view=azure-python) to handle the workspace you created in the prerequisites step. \n",
"Initialize `MLClient` [class](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.mlclient?view=azure-python) to handle the workspace you created in the prerequisites step. \n",
"\n",
"You can manually provide the workspace details or call `MLClient.from_config(credential, path)`\n",
"to create a workspace object from the details stored in `config.json`"
@@ -307,7 +307,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll be using a custom RAPIDS docker image to [setup the environment]((https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?tabs=python#create-an-environment-from-a-docker-image). This is available in `rapidsai/rapidsai` repo on [DockerHub](https://hub.docker.com/r/rapidsai/rapidsai/).\n",
"We'll be using a custom RAPIDS docker image to [setup the environment](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?tabs=python#create-an-environment-from-a-docker-image). This is available in `rapidsai/rapidsai` repo on [DockerHub](https://hub.docker.com/r/rapidsai/rapidsai/).\n",
"\n",
"Make sure you have the correct path to the docker build context as `os.getcwd()`,"
]
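As a rough sketch of the environment-creation step this cell leads into, reusing the `ml_client` from the earlier sketch (the environment name and description are assumptions, not the notebook's exact code), building an Azure ML `Environment` from a local Docker build context might look like:

```python
import os

from azure.ai.ml.entities import BuildContext, Environment

# The build context is the current working directory, which should contain the Dockerfile
env = Environment(
    name="rapids-hpo-env",  # hypothetical environment name
    build=BuildContext(path=os.getcwd()),
    description="RAPIDS environment built from a custom Docker image",
)
env = ml_client.environments.create_or_update(env)
```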
4 changes: 2 additions & 2 deletions source/examples/rapids-optuna-hpo/notebook.ipynb
@@ -277,7 +277,7 @@
" \n",
"Optuna uses [studies](https://optuna.readthedocs.io/en/stable/reference/study.html) and [trials](https://optuna.readthedocs.io/en/stable/reference/trial.html) to keep track of the HPO experiments. Put simply, a trial is a single call of the objective function while a set of trials make up a study. We will pick the best observed trial from a study to get the best parameters that were used in that run.\n",
"\n",
"Here, `DaskStorage` class is used to set up a storage shared by all workers in the cluster. Learn more about what storages can be used [here](https://optuna.readthedocs.io/en/stable/tutorial/distributed.html)\n",
"Here, `DaskStorage` class is used to set up a storage shared by all workers in the cluster. Learn more about what storages can be used [here](https://optuna.readthedocs.io/en/stable/reference/storages.html)\n",
"\n",
"`optuna.create_study` is used to set up the study. As you can see, it specifies the study name, sampler to be used, the direction of the study, and the storage.\n",
"With just a few lines of code, we have set up a distributed HPO experiment."
@@ -347,7 +347,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conluding Remarks\n",
"## Concluding Remarks\n",
" \n",
"This notebook shows how RAPIDS and Optuna can be used along with dask to run multi-GPU HPO jobs, and can be used as a starting point for anyone wanting to get started with the framework. We have seen how by just adding a few lines of code we were able to integrate the libraries for a muli-GPU HPO runs. This can also be scaled to multiple nodes.\n",
" \n",
10 changes: 5 additions & 5 deletions source/examples/rapids-sagemaker-hpo/notebook.ipynb
@@ -43,7 +43,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='../../_static/images/examples/rapids-sagemaker-hpo/hpo.png'>"
"![](../../_static/images/examples/rapids-sagemaker-hpo/hpo.png)"
]
},
{
@@ -595,7 +595,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='../../_static/images/examples/rapids-sagemaker-hpo/ml_workflow.png' width='800'> "
"![](../../_static/images/examples/rapids-sagemaker-hpo/ml_workflow.png) "
]
},
{
@@ -707,7 +707,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='../../_static/images/examples/rapids-sagemaker-hpo/estimator.png' width='800'>"
"![](../../_static/images/examples/rapids-sagemaker-hpo/estimator.png)"
]
},
{
@@ -1477,7 +1477,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='../../_static/images/examples/rapids-sagemaker-hpo/run_hpo.png'>"
"![](../../_static/images/examples/rapids-sagemaker-hpo/run_hpo.png)"
]
},
{
@@ -2186,7 +2186,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='../../_static/images/examples/rapids-sagemaker-hpo/results.png' width='70%'>"
"![](../../_static/images/examples/rapids-sagemaker-hpo/results.png)"
]
},
{
2 changes: 1 addition & 1 deletion source/platforms/coiled.md
@@ -82,7 +82,7 @@ We can also connect a Dask client to see that information for the workers too.
```python
from dask.distributed import Client

client = Client(cluster)
client = Client()
client
```
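As a small follow-up to the snippet above, one way to see the worker information the client reports (standard `dask.distributed` API; the `gpu` key only appears when GPU workers are detected):

```python
info = client.scheduler_info()
for address, worker in info["workers"].items():
    # nthreads is always present; "gpu" shows up when NVML detects GPUs on the worker
    print(address, worker["nthreads"], worker.get("gpu"))
```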

2 changes: 1 addition & 1 deletion source/platforms/kubeflow.md
@@ -83,7 +83,7 @@ To use Dask, we need to create a scheduler and some workers that will perform our

### Installing the Dask Kubernetes operator

To install the operator we need to create any custom resources and the operator itself, please [refer to the documentation](https://kubernetes.dask.org/en/latest/operator_installation.html) to find up-to-date installation instructions. From the terminal run the following command.
To install the operator we need to create any custom resources and the operator itself, please [refer to the documentation](https://kubernetes.dask.org/en/latest/installing.html) to find up-to-date installation instructions. From the terminal run the following command.

```console
$ helm install --repo https://helm.dask.org --create-namespace -n dask-operator --generate-name dask-kubernetes-operator
6 changes: 3 additions & 3 deletions source/tools/kubernetes/dask-operator.md
@@ -1,7 +1,7 @@
# Dask Operator

Many libraries in RAPIDS can leverage Dask to scale out computation onto multiple GPUs and multiple nodes.
[Dask has an operator for Kubernetes](https://kubernetes.dask.org/en/latest/operator.html) which allows you to launch Dask clusters as native Kubernetes resources.
[Dask has an operator for Kubernetes](https://kubernetes.dask.org/en/latest/) which allows you to launch Dask clusters as native Kubernetes resources.

With the operator and associated Custom Resource Definitions (CRDs)
you can create `DaskCluster`, `DaskWorkerGroup` and `DaskJob` resources that describe your Dask components and the operator will
@@ -45,7 +45,7 @@ graph TD

Your Kubernetes cluster must have GPU nodes and have [up to date NVIDIA drivers installed](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html).

To install the Dask operator follow the [instructions in the Dask documentation](https://kubernetes.dask.org/en/latest/operator_installation.html).
To install the Dask operator follow the [instructions in the Dask documentation](https://kubernetes.dask.org/en/latest/installing.html).
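Once the operator is installed, a hedged sketch of launching a GPU cluster from Python with `dask-kubernetes` (the cluster name, image tag, and worker count are assumptions, not values from this page):

```python
from dask_kubernetes.operator import KubeCluster

cluster = KubeCluster(
    name="rapids-dask",  # hypothetical cluster name
    image="rapidsai/base:24.08-cuda12.5-py3.11",  # assumed RAPIDS image tag
    n_workers=2,
    resources={"limits": {"nvidia.com/gpu": "1"}},  # one GPU per worker
)
```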

## Configuring a RAPIDS `DaskCluster`

@@ -226,7 +226,7 @@ spec:
```

For the scheduler pod we are also setting the `rapidsai/base` container image, mainly to ensure our Dask versions match between
the scheduler and workers. We also disable Jupyter and ensure that the `dask-scheduler` command is configured.
the scheduler and workers. We ensure that the `dask-scheduler` command is configured.

Then we configure both the Dask communication port on `8786` and the Dask dashboard on `8787` and add some probes so that Kubernetes can monitor
the health of the scheduler.
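A minimal sketch of connecting through those ports once the cluster is running (the scheduler service name and namespace below are assumptions):

```python
from dask.distributed import Client

# Dask communication goes over 8786; the dashboard is served on 8787
client = Client("tcp://rapids-dask-cluster-scheduler.rapids.svc.cluster.local:8786")
print(client.dashboard_link)
```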
