Azure Infiniband Updated Ubuntu Versions #462

Merged
merged 19 commits into from
Oct 14, 2024
Changes from 16 commits
4 changes: 2 additions & 2 deletions source/_includes/check-gpu-pod-works.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Let's create a sample pod that uses some GPU compute to make sure that everything is working as expected.

-```console
-$ cat << EOF | kubectl create -f -
+```bash
+cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
8 changes: 4 additions & 4 deletions source/cloud/azure/aks.md
@@ -22,8 +22,8 @@ $ az login

Now we can launch a GPU enabled AKS cluster. First launch an AKS cluster.

-```console
-$ az aks create -g <resource group> -n rapids \
+```bash
+az aks create -g <resource group> -n rapids \
--enable-managed-identity \
--node-count 1 \
--enable-addons monitoring \
@@ -91,8 +91,8 @@ $ az extension add --name aks-preview

`````

-```console
-$ az aks nodepool add \
+```bash
+az aks nodepool add \
--resource-group <resource group> \
--cluster-name rapids \
--name gpunp \
2 changes: 1 addition & 1 deletion source/cloud/azure/azureml.md
@@ -32,7 +32,7 @@ The compute instance provides an integrated Jupyter notebook service, JupyterLab

Sign in to [Azure Machine Learning Studio](https://ml.azure.com/) and navigate to your workspace on the left-side menu.

-Select **Compute** > **+ New** > choose a [RAPIDS compatible GPU](https://medium.com/dropout-analytics/which-gpus-work-with-rapids-ai-f562ef29c75f) VM size (e.g., `Standard_NC12s_v3`)
+Select **Compute** > **+ New** (Create compute instance) > choose a [RAPIDS compatible GPU](https://medium.com/dropout-analytics/which-gpus-work-with-rapids-ai-f562ef29c75f) VM size (e.g., `Standard_NC12s_v3`)

![Screenshot of create new notebook with a gpu-instance](../../images/azureml-create-notebook-instance.png)

12 changes: 8 additions & 4 deletions source/examples/rapids-azureml-hpo/notebook.ipynb
@@ -12,7 +12,7 @@
]
},
"source": [
-"# Train and Hyperparameter-Tune with RAPIDS"
+"# Train and Hyperparameter-Tune with RAPIDS on AzureML"
]
},
{
@@ -97,12 +97,16 @@
"from azure.ai.ml import MLClient\n",
"from azure.identity import DefaultAzureCredential\n",
"\n",
+"subscription_id = \"FILL IN WITH YOUR AZURE ML CREDENTIALS\"\n",
+"resource_group_name = \"FILL IN WITH YOUR AZURE ML CREDENTIALS\"\n",
+"workspace_name = \"FILL IN WITH YOUR AZURE ML CREDENTIALS\"\n",
+"\n",
"# Get a handle to the workspace\n",
"ml_client = MLClient(\n",
"    credential=DefaultAzureCredential(),\n",
-"    subscription_id=\"fc4f4a6b-4041-4b1c-8249-854d68edcf62\",\n",
-"    resource_group_name=\"rapidsai-deployment\",\n",
-"    workspace_name=\"rapids-aml-cluster\",\n",
+"    subscription_id=subscription_id,\n",
+"    resource_group_name=resource_group_name,\n",
+"    workspace_name=workspace_name,\n",
")\n",
"\n",
"print(\n",
21 changes: 14 additions & 7 deletions source/guides/azure/infiniband.md
@@ -13,8 +13,8 @@ for demonstration.
- Select `East US` region.
- Change `Availability options` to `Availability set` and create a set.
- If building multiple instances put additional instances in the same set.
-- Use the 2nd Gen Ubuntu 20.04 image.
-- Search all images for `Ubuntu Server 20.04` and choose the second one down on the list.
+- Use the 2nd Gen Ubuntu 24.04 image.
+- Search all images for `Ubuntu Server 24.04` and choose the second one down on the list.
- Change size to `ND40rs_v2`.
- Set password login with credentials.
- User `someuser`
@@ -39,8 +39,8 @@ The commands below should work for Ubuntu. See the [CUDA Toolkit documentation](
```shell
sudo apt-get install -y linux-headers-$(uname -r)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
-wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
-sudo dpkg -i cuda-keyring_1.0-1_all.deb
+wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.1-1_all.deb
+sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-drivers
```
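
For reference, a minimal sketch of how the `distribution` variable in the hunk above resolves on the new target release. The `ID` and `VERSION_ID` values are assumed to match what `/etc/os-release` reports on Ubuntu 24.04; they are set by hand here so the snippet runs on any machine. `keyring_url` is an illustrative variable name that does not appear in the guide.

```shell
# Assumed values: stand-ins for sourcing /etc/os-release on Ubuntu 24.04.
ID=ubuntu
VERSION_ID=24.04

# Same transformation as the guide: strip the dot from the version string.
distribution=$(echo "$ID$VERSION_ID" | sed -e 's/\.//g')
echo "$distribution"   # prints ubuntu2404

# Illustrative only: the keyring URL that the wget line would request.
keyring_url="https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.1-1_all.deb"
echo "$keyring_url"
```

Because the repository path is derived from the running OS, only the keyring file name needs to change in the guide when the target Ubuntu release changes.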
@@ -118,11 +118,11 @@ Mon Nov 14 20:32:39 2022

### InfiniBand Driver

-On Ubuntu 20.04
+On Ubuntu 24.04

```shell
sudo apt-get install -y automake dh-make git libcap2 libnuma-dev libtool make pkg-config udev curl librdmacm-dev rdma-core \
-libgfortran5 bison chrpath flex graphviz gfortran tk dpatch quilt swig tcl ibverbs-utils
+libgfortran5 bison chrpath flex graphviz gfortran tk quilt swig tcl ibverbs-utils
```

Check install
@@ -247,7 +247,14 @@ wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforg
bash Mambaforge-Linux-x86_64.sh
```

-Accept the default and allow conda init to run. Then start a new shell.
+Accept the default and allow conda init to run.
+
+```shell
+~/mambaforge/bin/conda init
+
+```
+
+Then start a new shell.
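
One way to confirm that `conda init` took effect before opening the new shell is to look for the marker comment it writes into the shell startup file. The sketch below assumes bash and `~/.bashrc`; `has_conda_init` and `rc_file` are hypothetical names used only for illustration and are not part of conda or the guide.

```shell
# Hypothetical helper: succeeds if the given startup file contains the
# managed block that `conda init` writes.
has_conda_init() {
    grep -q '>>> conda initialize >>>' "$1" 2>/dev/null
}

# Assumed default; users of another shell would pass its startup file instead.
rc_file="${1:-$HOME/.bashrc}"

if has_conda_init "$rc_file"; then
    echo "conda init block found in $rc_file"
else
    echo "no conda init block in $rc_file; run ~/mambaforge/bin/conda init"
fi
```

If the block is missing, running the `conda init` command from the hunk above and then starting a new shell should add it.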

Create a conda environment (see [UCX-Py](https://ucx-py.readthedocs.io/en/latest/install.html) docs)
