From 578544f910634daddccfb5c9bc237bafbc5c7899 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 06:56:53 -0500 Subject: [PATCH 01/21] Update README.md NIMOperator instructions added --- cloud-service-providers/aws/eks/README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/cloud-service-providers/aws/eks/README.md b/cloud-service-providers/aws/eks/README.md index 8cc05a4..c8ad3c0 100644 --- a/cloud-service-providers/aws/eks/README.md +++ b/cloud-service-providers/aws/eks/README.md @@ -29,6 +29,10 @@ Note: Ensure that you are in the cloud-service-providers/aws/eks directory for t ## Cluster setup for inference: +You can either use the NIM Helm Chart or NIM Operator. If you would like to use the NIM Operator, please see the instructions here. + +To install the NIM Helm Chart, please follow the steps below: + 1: Install NVIDIA Device Plugin: Install NVIDIA device plugins to run GPU workloads. Check CUDA base image version for compatibility. kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml From 8e1e53dd2663d3d8b3df37a3045cd50e759a97d6 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 07:14:32 -0500 Subject: [PATCH 02/21] Create nim-operator-setup.md NIM operator set up instructions added --- .../aws/eks/nim-operator-setup.md | 166 ++++++++++++++++++ 1 file changed, 166 insertions(+) create mode 100644 cloud-service-providers/aws/eks/nim-operator-setup.md diff --git a/cloud-service-providers/aws/eks/nim-operator-setup.md b/cloud-service-providers/aws/eks/nim-operator-setup.md new file mode 100644 index 0000000..b46d3c8 --- /dev/null +++ b/cloud-service-providers/aws/eks/nim-operator-setup.md @@ -0,0 +1,166 @@ +# NVIDIA NIM Operator on AWS EKS: + +Please see the NIM Operator documentation before you proceed: https://docs.nvidia.com/nim-operator/latest/index.html +This repository is dedicated to testing NVIDIA NIM Operator on AWS EKS (Elastic Kubernetes Service). 
+ +## AWS Infrastructure setup: + +1: Refer to this high-level architecture diagram for an overview of the setup +![High level architecture diagram](aws-eks-architecture.png) + +2: Refer to this guide to prepare your local environment to run the CDK +https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html + +There are four stacks in the nim-eks-cdk/lib directory. You can either deploy all of them using `cdk deploy --all` or deploy each stack by specifying the stack name, +like `cdk deploy vpc-stack` + +Stack Deployment Order: + + cdk deploy vpc-stack + cdk deploy eks-cluster-stack + cdk deploy efs-stack + +Note: Ensure that you are in the nim-eks-cdk directory to run the above-mentioned commands + +A ClusterAdmin user is created by the CDK. Create access keys for this user to run kubectl and helm commands. Once secret credentials are created, run the following command to manage the cluster + + aws eks update-kubeconfig --name --region --profile + +Note: Ensure that you are in the cloud-service-providers/aws/eks directory for the rest of this guide. + +## Cluster setup for inference: + +You can either use the NIM Helm Chart or the NIM Operator. If you would like to use the NIM Operator, please see the instructions here. + +To install the NIM Helm Chart, please follow the steps below: + +1: Install NVIDIA Device Plugin: Install the NVIDIA device plugin to run GPU workloads. Check the CUDA base image version for compatibility. + + kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml + +Note: The CDK in this repo spins up AWS G5 instances with OS - Amazon Linux 2. +`Version v0.14.1 of the NVIDIA device plugin is compatible with Amazon Linux 2` + +We can also validate that the plugin installation was successful: + + kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" + +2: Install the GPU Operator.
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#procedure + + helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v23.6.0 + +3: Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator + + +# Caching Models + +1. bash setup/setup.sh + + Note: This setup script (directory: nim-deploy/setup)creates two storage classes- EFS and EBS. The necessary csi drivers are installed as add-ons by the CDK. + +2. Use Helm to deploy the custom-values.yaml. + a) EBS volume: + + helm install nim-llm ../../../helm/nim-llm/ -f storage/custom-values-ebs-sc.yaml + + b) EFS storage: + + helm install nim-llm ../../../helm/nim-llm/ -f storage/custom-values-efs-sc.yaml + + c) Host path storage: + + helm install nim-llm ../../../helm/nim-llm/ -f storage/custom-values-host-path.yaml + + Note: Since we are running pods as non-root user, cache path specified in the custom-values-host-path.yaml should be created on the EC2 instance prior to installing helm. Also the directory ownership should be assigned to 1000:1000 (or any no root uid:gid as specified in the custom-values.yaml) + +3. Use ingress.yaml to add an alb ingress controller. 
+ + kubectl apply -f ingress.yaml + +# Sample request and response: +Get the DNS of the Load Balancer created in the previous step: +``` +ELB_DNS=$(aws elbv2 describe-load-balancers --query "LoadBalancers[*].{DNSName:DNSName}") +``` +Send as sample request: + +``` +curl -X 'POST' \ + "http://${ELB_DNS}/v1/chat/completions" \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json' \ + -d '{ + "messages": [ + { + "content": "You are a polite and respectful chatbot helping people plan a vacation.", + "role": "system" + }, + { + "content": "What should I do for a 4 day vacation in Spain?", + "role": "user" + } + ], + "model": "meta/llama3-8b-instruct", + "max_tokens": 16, + "top_p": 1, + "n": 1, + "stream": false, + "stop": "\n", + "frequency_penalty": 0.0 +}' + +``` +Response: + +``` + { + "id": "cmpl-ba02077a544e411f8ba2ff9f38a6917a", + "object": "chat.completion", + "created": 1717642306, + "model": "meta/llama3-8b-instruct", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "Spain is a wonderful destination! 
With four days, you can easily explore one or" + }, + "logprobs": null, + "finish_reason": "length", + "stop_reason": null + } + ], + "usage": { + "prompt_tokens": 42, + "total_tokens": 58, + "completion_tokens": 16 + } +} +``` + +# Gen-ai perf tool + + kubectl apply -f perf/gen-ai-perf.yaml + +ssh into the triton pod + + kubectl exec -it triton -- bash + +Run the following command + + NIM_MODEL_NAME="meta/llama3-8b-instruct" + server_url=http://nim-llm-service:8000 + concurrency=20 + input_tokens=128 + output_tokens=10 + + genai-perf -m $NIM_MODEL_NAME --endpoint v1/chat/completions --endpoint-type chat \ + --service-kind openai --streaming \ + -u $server_url \ + --num-prompts 100 --prompt-source synthetic \ + --synthetic-input-tokens-mean $input_tokens \ + --synthetic-input-tokens-stddev 50 \ + --concurrency $concurrency \ + --extra-inputs max_tokens:$output_tokens \ + --extra-input ignore_eos:true \ + --profile-export-file test_chat_${concurrency} From 6600b213678e30395dfbc252d9fd7d2ac907aa91 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 07:15:54 -0500 Subject: [PATCH 03/21] Update README.md Added NIM operator link --- cloud-service-providers/aws/eks/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cloud-service-providers/aws/eks/README.md b/cloud-service-providers/aws/eks/README.md index c8ad3c0..af4df5f 100644 --- a/cloud-service-providers/aws/eks/README.md +++ b/cloud-service-providers/aws/eks/README.md @@ -29,7 +29,7 @@ Note: Ensure that you are in the cloud-service-providers/aws/eks directory for t ## Cluster setup for inference: -You can either use the NIM Helm Chart or NIM Operator. If you would like to use the NIM Operator, please see the instructions here. +You can either use the NIM Helm Chart or NIM Operator. If you would like to use the NIM Operator, please see the instructions ![here](nim-operator-setup.md). 
To install the NIM Helm Chart, please follow the steps below: From 4609a4aa14e442262d777d5f66db00adbb637f13 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 08:14:05 -0500 Subject: [PATCH 04/21] Update README.md Added NIMOperator doc link --- cloud-service-providers/aws/eks/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cloud-service-providers/aws/eks/README.md b/cloud-service-providers/aws/eks/README.md index af4df5f..e7fedfc 100644 --- a/cloud-service-providers/aws/eks/README.md +++ b/cloud-service-providers/aws/eks/README.md @@ -29,7 +29,7 @@ Note: Ensure that you are in the cloud-service-providers/aws/eks directory for t ## Cluster setup for inference: -You can either use the NIM Helm Chart or NIM Operator. If you would like to use the NIM Operator, please see the instructions ![here](nim-operator-setup.md). +You can either use the NIM Helm Chart or NIM Operator. If you would like to use the NIM Operator, please see the instructions [here](nim-operator-setup.md). 
To install the NIM Helm Chart, please follow the steps below: From 1342daa0bbe03ccc7f08926567f1dbb818ac2b04 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 08:30:41 -0500 Subject: [PATCH 05/21] Create nim-operator-nim-cache-efs.yaml NimCache example for llama3 8b added --- .../storage/nim-operator-nim-cache-efs.yaml | 20 +++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 cloud-service-providers/aws/eks/storage/nim-operator-nim-cache-efs.yaml diff --git a/cloud-service-providers/aws/eks/storage/nim-operator-nim-cache-efs.yaml b/cloud-service-providers/aws/eks/storage/nim-operator-nim-cache-efs.yaml new file mode 100644 index 0000000..de337ba --- /dev/null +++ b/cloud-service-providers/aws/eks/storage/nim-operator-nim-cache-efs.yaml @@ -0,0 +1,20 @@ +apiVersion: apps.nvidia.com/v1alpha1 +kind: NIMCache +metadata: + name: meta-llama3-8b-instruct +spec: + source: + ngc: + modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3 + pullSecret: ngc-secret + authSecret: ngc-api-secret + model: + engine: tensorrt_llm + tensorParallelism: "1" + storage: + pvc: + create: true + storageClass: "efs-sc" + size: "50Gi" + volumeAccessMode: ReadWriteMany + resources: {} From 4b6abc9ae54042448848e41fc6d7e7e392a075ce Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 08:31:22 -0500 Subject: [PATCH 06/21] Create nim-operator-nim-cache-ebs.yaml --- .../storage/nim-operator-nim-cache-ebs.yaml | 20 +++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 cloud-service-providers/aws/eks/storage/nim-operator-nim-cache-ebs.yaml diff --git a/cloud-service-providers/aws/eks/storage/nim-operator-nim-cache-ebs.yaml b/cloud-service-providers/aws/eks/storage/nim-operator-nim-cache-ebs.yaml new file mode 100644 index 0000000..8fc040a --- /dev/null +++ b/cloud-service-providers/aws/eks/storage/nim-operator-nim-cache-ebs.yaml @@ -0,0 +1,20 @@ +apiVersion: apps.nvidia.com/v1alpha1 +kind: NIMCache +metadata: + name: 
meta-llama3-8b-instruct +spec: + source: + ngc: + modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3 + pullSecret: ngc-secret + authSecret: ngc-api-secret + model: + engine: tensorrt_llm + tensorParallelism: "1" + storage: + pvc: + create: true + storageClass: "ebs-sc" + size: "50Gi" + volumeAccessMode: ReadWriteOnce + resources: {} From ae02e60d311805894241a6fa8d6af28c16625dfc Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 08:39:01 -0500 Subject: [PATCH 07/21] Create nim-operator-nim-service.yaml Added sample NIMService yaml --- .../eks/storage/nim-operator-nim-service.yaml | 24 +++++++++++++++++++ 1 file changed, 24 insertions(+) create mode 100644 cloud-service-providers/aws/eks/storage/nim-operator-nim-service.yaml diff --git a/cloud-service-providers/aws/eks/storage/nim-operator-nim-service.yaml b/cloud-service-providers/aws/eks/storage/nim-operator-nim-service.yaml new file mode 100644 index 0000000..9d1938b --- /dev/null +++ b/cloud-service-providers/aws/eks/storage/nim-operator-nim-service.yaml @@ -0,0 +1,24 @@ +apiVersion: apps.nvidia.com/v1alpha1 +kind: NIMService +metadata: + name: meta-llama3-8b-instruct +spec: + image: + repository: nvcr.io/nim/meta/llama3-8b-instruct + tag: 1.0.3 + pullPolicy: IfNotPresent + pullSecrets: + - ngc-secret + authSecret: ngc-api-secret + storage: + nimCache: + name: meta-llama3-8b-instruct + profile: '' + replicas: 1 + resources: + limits: + nvidia.com/gpu: 1 + expose: + service: + type: ClusterIP + port: 8000 From 620f14751e2dd152b9ca90af60906aa51e9b04f8 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 08:41:06 -0500 Subject: [PATCH 08/21] Update nim-operator-setup.md Added NIMOperator instructions with links to sample yaml files.
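The NIMService sample above exposes a ClusterIP service on port 8000. As a rough sketch, assuming the operator names the Service after the NIMService and that it lands in the nim-service namespace used elsewhere in this guide (both are assumptions, not confirmed by the manifests), the in-cluster endpoint and a request payload matching the curl sample in this guide could be assembled like this:

```python
# Sketch only: the Service name and the "nim-service" namespace are
# assumptions based on the sample manifests in this guide.
def nim_endpoint(service: str, namespace: str, port: int = 8000) -> str:
    # Standard in-cluster DNS name: <service>.<namespace>.svc.cluster.local
    return f"http://{service}.{namespace}.svc.cluster.local:{port}/v1/chat/completions"

def chat_payload(model: str, system_msg: str, user_msg: str, max_tokens: int = 16) -> dict:
    # Mirrors the fields used in the curl sample in this guide.
    return {
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "model": model,
        "max_tokens": max_tokens,
        "top_p": 1,
        "n": 1,
        "stream": False,
    }

url = nim_endpoint("meta-llama3-8b-instruct", "nim-service")
payload = chat_payload(
    "meta/llama3-8b-instruct",
    "You are a polite and respectful chatbot helping people plan a vacation.",
    "What should I do for a 4 day vacation in Spain?",
)
```

The same payload works over the ALB ingress added later; only the base URL changes.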
--- .../aws/eks/nim-operator-setup.md | 50 +++++-------------- 1 file changed, 13 insertions(+), 37 deletions(-) diff --git a/cloud-service-providers/aws/eks/nim-operator-setup.md b/cloud-service-providers/aws/eks/nim-operator-setup.md index b46d3c8..a0aa260 100644 --- a/cloud-service-providers/aws/eks/nim-operator-setup.md +++ b/cloud-service-providers/aws/eks/nim-operator-setup.md @@ -3,36 +3,9 @@ Please see the NIM Operator documentation before you proceed: https://docs.nvidia.com/nim-operator/latest/index.html This repository is dedicated to testing NVIDIA NIM Operator on AWS EKS (Elastic Kubernetes Service). -## AWS Infrastructure setup: - -1:Refer this high-level architecture diagram for an overview of the setup -![High level architecture diagram](aws-eks-architecture.png) - -2: Refer this to prepare your local environment to run cdk -https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html - -There are four stacks in the nim-eks-cdk/lib directory. You can either deploy all of them using `cdk deploy --all` or deploy each stack by specifying the stack name -like `cdk deploy vpc-stack` - -Stack Deployment Order: - - cdk deploy vpc-stack - cdk deploy eks-cluster-stack - cdk deploy efs-stack - -Note: Ensure that you are in the nim-eks-cdk directory to run the above mentioned commands - -ClusterAdmin user is created by the cdk. Create access keys for this user to run kubectl and helm commands. Once secret credentials are created run the following command to manage the cluster - - aws eks update-kubeconfig --name --region --profile - -Note: Ensure that you are in the cloud-service-providers/aws/eks directory for the rest of this guide. - ## Cluster setup for inference: -You can either use the NIM Helm Chart or NIM Operator. If you would like to use the NIM Operator, please see the instructions here. 
- -To install the NIM Helm Chart, please follow the steps below: +To install the pre-requisites for the NIM Operator, please follow the steps below: 1: Install NVIDIA Device Plugin: Install NVIDIA device plugins to run GPU workloads. Check CUDA base image version for compatibility. @@ -47,7 +20,7 @@ We can also validate if the plugin installation was successful. 2: Install the GPU Operator. https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#procedure - helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v23.6.0 + helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v23.6.0 --set toolkit.enabled=false 3: Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator @@ -58,26 +31,29 @@ We can also validate if the plugin installation was successful. Note: This setup script (directory: nim-deploy/setup)creates two storage classes- EFS and EBS. The necessary csi drivers are installed as add-ons by the CDK. -2. Use Helm to deploy the custom-values.yaml. +2. Follow the instructions in the docs (https://docs.nvidia.com/nim-operator/latest/cache.html#procedure) using the sample yaml files below. + a) EBS volume: - helm install nim-llm ../../../helm/nim-llm/ -f storage/custom-values-ebs-sc.yaml + kubectl apply -n nim-service -f storage/nim-operator-nim-cache-ebs.yaml b) EFS storage: - helm install nim-llm ../../../helm/nim-llm/ -f storage/custom-values-efs-sc.yaml + kubectl apply -n nim-service -f storage/nim-operator-nim-cache-efs.yaml - c) Host path storage: + +# Creating a NIM Service - helm install nim-llm ../../../helm/nim-llm/ -f storage/custom-values-host-path.yaml +1. Follow the instructions in the [docs](https://docs.nvidia.com/nim-operator/latest/service.html#procedure) using the sample yaml file below. 
- Note: Since we are running pods as non-root user, cache path specified in the custom-values-host-path.yaml should be created on the EC2 instance prior to installing helm. Also the directory ownership should be assigned to 1000:1000 (or any no root uid:gid as specified in the custom-values.yaml) - -3. Use ingress.yaml to add an alb ingress controller. + kubectl apply -n nim-service -f storage/nim-operator-nim-service.yaml + +2. Use ingress.yaml to add an alb ingress controller. kubectl apply -f ingress.yaml # Sample request and response: + Get the DNS of the Load Balancer created in the previous step: ``` ELB_DNS=$(aws elbv2 describe-load-balancers --query "LoadBalancers[*].{DNSName:DNSName}") From feed7d016c6859cb828a0deec4b3f452a88e2ed7 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 08:47:29 -0500 Subject: [PATCH 09/21] Update nim-operator-setup.md Fixed formatting --- cloud-service-providers/aws/eks/nim-operator-setup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cloud-service-providers/aws/eks/nim-operator-setup.md b/cloud-service-providers/aws/eks/nim-operator-setup.md index a0aa260..937d352 100644 --- a/cloud-service-providers/aws/eks/nim-operator-setup.md +++ b/cloud-service-providers/aws/eks/nim-operator-setup.md @@ -20,7 +20,7 @@ We can also validate if the plugin installation was successful. 2: Install the GPU Operator. 
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#procedure - helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v23.6.0 --set toolkit.enabled=false + helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v23.6.0 --set toolkit.enabled=false 3: Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator From e0571bf8ca57f0f5bfaeee8cdee88927279d9e6e Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 8 Oct 2024 09:58:56 -0500 Subject: [PATCH 10/21] Update nim-operator-setup.md Formatting of setup command fixed --- cloud-service-providers/aws/eks/nim-operator-setup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cloud-service-providers/aws/eks/nim-operator-setup.md b/cloud-service-providers/aws/eks/nim-operator-setup.md index 937d352..ac9e40b 100644 --- a/cloud-service-providers/aws/eks/nim-operator-setup.md +++ b/cloud-service-providers/aws/eks/nim-operator-setup.md @@ -27,7 +27,7 @@ We can also validate if the plugin installation was successful. # Caching Models -1. bash setup/setup.sh +1. bash setup/setup.sh Note: This setup script (directory: nim-deploy/setup) creates two storage classes - EFS and EBS. The necessary CSI drivers are installed as add-ons by the CDK.
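One detail worth double-checking when adapting the NIMCache samples: EBS volumes attach to a single node, so an EBS-backed PVC should request the ReadWriteOnce access mode, while EFS supports ReadWriteMany. A small illustrative sketch of that mapping (the field names follow spec.storage.pvc in the sample NIMCache manifests):

```python
# EBS CSI volumes support only single-node attachment (ReadWriteOnce);
# EFS allows concurrent mounts from many nodes (ReadWriteMany).
ACCESS_MODES = {
    "ebs-sc": "ReadWriteOnce",
    "efs-sc": "ReadWriteMany",
}

def nimcache_pvc_spec(storage_class: str, size: str = "50Gi") -> dict:
    # Shape follows spec.storage.pvc in the sample NIMCache manifests.
    return {
        "create": True,
        "storageClass": storage_class,
        "size": size,
        "volumeAccessMode": ACCESS_MODES[storage_class],
    }
```

Requesting ReadWriteMany on an ebs-sc PVC would leave the claim unbindable, so the NIMCache job would stay pending.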
From 2049188f386c4fe067bfe3cfdcb46a8ea199f330 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Wed, 9 Oct 2024 09:55:42 -0500 Subject: [PATCH 11/21] Update nim-operator-setup.md Added nim-service namespace to ingress creation --- cloud-service-providers/aws/eks/nim-operator-setup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cloud-service-providers/aws/eks/nim-operator-setup.md b/cloud-service-providers/aws/eks/nim-operator-setup.md index ac9e40b..b3de995 100644 --- a/cloud-service-providers/aws/eks/nim-operator-setup.md +++ b/cloud-service-providers/aws/eks/nim-operator-setup.md @@ -50,7 +50,7 @@ We can also validate if the plugin installation was successful. 2. Use ingress.yaml to add an alb ingress controller. - kubectl apply -f ingress.yaml + kubectl apply -f ingress.yaml -n nim-service # Sample request and response: From d455d54d5b497ebf06c2c5f3fee002fac88f0a4b Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 22 Oct 2024 16:14:26 -0500 Subject: [PATCH 12/21] Update README.md Changed the text to use impersonal tense --- cloud-service-providers/aws/eks/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cloud-service-providers/aws/eks/README.md b/cloud-service-providers/aws/eks/README.md index e7fedfc..6228a44 100644 --- a/cloud-service-providers/aws/eks/README.md +++ b/cloud-service-providers/aws/eks/README.md @@ -29,9 +29,9 @@ Note: Ensure that you are in the cloud-service-providers/aws/eks directory for t ## Cluster setup for inference: -You can either use the NIM Helm Chart or NIM Operator. If you would like to use the NIM Operator, please see the instructions [here](nim-operator-setup.md). +For the NIM Operator, here are the [instructions](nim-operator-setup.md). -To install the NIM Helm Chart, please follow the steps below: +To install the NIM Helm Chart, here are the steps: 1: Install NVIDIA Device Plugin: Install NVIDIA device plugins to run GPU workloads. 
Check CUDA base image version for compatibility. From 915cc1d566d94889fef9550a82280716787bd525 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 22 Oct 2024 19:47:57 -0500 Subject: [PATCH 13/21] Update nim-operator-setup.md Updated based on the PR feedback. (Formatting + removed device plugin install) --- .../aws/eks/nim-operator-setup.md | 19 ++++--------------- 1 file changed, 4 insertions(+), 15 deletions(-) diff --git a/cloud-service-providers/aws/eks/nim-operator-setup.md b/cloud-service-providers/aws/eks/nim-operator-setup.md index b3de995..947d8fe 100644 --- a/cloud-service-providers/aws/eks/nim-operator-setup.md +++ b/cloud-service-providers/aws/eks/nim-operator-setup.md @@ -7,22 +7,11 @@ This repository is dedicated to testing NVIDIA NIM Operator on AWS EKS (Elastic To install the pre-requisites for the NIM Operator, please follow the steps below: -1: Install NVIDIA Device Plugin: Install NVIDIA device plugins to run GPU workloads. Check CUDA base image version for compatibility. - - kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml - -Note: The CDK in this repo spins up AWS G5 instances with OS - Amazon linux 2. -`v0.14.1 version for Nvidia device plugins is compatible with Amazon linux 2` - -We can also validate if the plugin installation was successful. - - kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" - -2: Install the GPU Operator. https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#procedure +1: Install the GPU Operator. 
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#procedure helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v23.6.0 --set toolkit.enabled=false -3: Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator +2: Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator # Caching Models @@ -39,7 +28,7 @@ We can also validate if the plugin installation was successful. b) EFS storage: - kubectl apply -n nim-service -f storage/nim-operator-nim-cache-efs.yaml + kubectl apply -n nim-service -f storage/nim-operator-nim-cache-efs.yaml # Creating a NIM Service @@ -118,7 +107,7 @@ Response: kubectl apply -f perf/gen-ai-perf.yaml -ssh into the triton pod +exec into the triton pod kubectl exec -it triton -- bash From a800276211f501cb9fba238fde7a1fc5606fe5d6 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Tue, 22 Oct 2024 20:01:01 -0500 Subject: [PATCH 14/21] Update README.md Add AWS EKS NIM Operator deployment --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 55488fd..6dd6777 100644 --- a/README.md +++ b/README.md @@ -22,6 +22,7 @@ This repo showcases different ways NVIDIA NIMs can be deployed. 
This repo contai | | **Amazon Web Services** | | | | | [EKS Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/aws/eks) | | | | | [Amazon SageMaker](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/aws/sagemaker) | | +| | | [EKS Managed Kubernetes - NIM Operator](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/aws/eks/nim-operator-setup.md) | | | | **Google Cloud Platform** | | | | | [GKE Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/gke) | | | | | [Google Cloud Vertex AI](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/vertexai/python) | | From c211a4329ce914a6e5561ff05ac0973e2ff7c470 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Fri, 15 Nov 2024 10:50:31 -0600 Subject: [PATCH 15/21] Create nim-operator-setup.md Initial commit --- .../azure/aks/nim-operator-setup.md | 131 ++++++++++++++++++ 1 file changed, 131 insertions(+) create mode 100644 cloud-service-providers/azure/aks/nim-operator-setup.md diff --git a/cloud-service-providers/azure/aks/nim-operator-setup.md b/cloud-service-providers/azure/aks/nim-operator-setup.md new file mode 100644 index 0000000..8d96e7a --- /dev/null +++ b/cloud-service-providers/azure/aks/nim-operator-setup.md @@ -0,0 +1,131 @@ +# NVIDIA NIM Operator on Azure AKS: + +Please see the NIM Operator documentation before you proceed: https://docs.nvidia.com/nim-operator/latest/index.html +This repository is dedicated to testing NVIDIA NIM Operator on Azure AKS. + +## Cluster setup for inference: + +To install the pre-requisites for the NIM Operator, please follow the steps below: + +1: Install the GPU Operator. 
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#procedure + + helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v23.6.0 --set toolkit.enabled=false + +2: Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator + + +# Caching Models + +1. bash setup/setup.sh + + Note: This setup script (directory: nim-deploy/setup)creates two storage classes- EFS and EBS. The necessary csi drivers are installed as add-ons by the CDK. + +2. Follow the instructions in the docs (https://docs.nvidia.com/nim-operator/latest/cache.html#procedure) using the sample yaml files below. + + a) ABS volume: + + kubectl apply -n nim-service -f storage/nim-operator-nim-cache-abs.yaml + + b) AFS storage: + + kubectl apply -n nim-service -f storage/nim-operator-nim-cache-afs.yaml + + +# Creating a NIM Service + +1. Follow the instructions in the [docs](https://docs.nvidia.com/nim-operator/latest/service.html#procedure) using the sample yaml file below. + + kubectl apply -n nim-service -f storage/nim-operator-nim-service.yaml + +2. Use ingress.yaml to add an ingress controller. 
+ + kubectl apply -f ingress.yaml -n nim-service + +# Sample request and response: + +Get the DNS of the Load Balancer created in the previous step: +``` +ELB_DNS=$() +``` +Send as sample request: + +``` +curl -X 'POST' \ + "http://${ELB_DNS}/v1/chat/completions" \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json' \ + -d '{ + "messages": [ + { + "content": "You are a polite and respectful chatbot helping people plan a vacation.", + "role": "system" + }, + { + "content": "What should I do for a 4 day vacation in Spain?", + "role": "user" + } + ], + "model": "meta/llama3-8b-instruct", + "max_tokens": 16, + "top_p": 1, + "n": 1, + "stream": false, + "stop": "\n", + "frequency_penalty": 0.0 +}' + +``` +Response: + +``` + { + "id": "cmpl-ba02077a544e411f8ba2ff9f38a6917a", + "object": "chat.completion", + "created": 1717642306, + "model": "meta/llama3-8b-instruct", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "Spain is a wonderful destination! 
With four days, you can easily explore one or" + }, + "logprobs": null, + "finish_reason": "length", + "stop_reason": null + } + ], + "usage": { + "prompt_tokens": 42, + "total_tokens": 58, + "completion_tokens": 16 + } +} +``` + +# Gen-ai perf tool + + kubectl apply -f perf/gen-ai-perf.yaml + +exec into the triton pod + + kubectl exec -it triton -- bash + +Run the following command + + NIM_MODEL_NAME="meta/llama3-8b-instruct" + server_url=http://nim-llm-service:8000 + concurrency=20 + input_tokens=128 + output_tokens=10 + + genai-perf -m $NIM_MODEL_NAME --endpoint v1/chat/completions --endpoint-type chat \ + --service-kind openai --streaming \ + -u $server_url \ + --num-prompts 100 --prompt-source synthetic \ + --synthetic-input-tokens-mean $input_tokens \ + --synthetic-input-tokens-stddev 50 \ + --concurrency $concurrency \ + --extra-inputs max_tokens:$output_tokens \ + --extra-input ignore_eos:true \ + --profile-export-file test_chat_${concurrency} From 29a4a34cc1546864a799dff350973f18015323b1 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Fri, 15 Nov 2024 10:54:06 -0600 Subject: [PATCH 16/21] Update README.md - Added NIM Operator link - Fixed typos --- cloud-service-providers/azure/aks/README.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/cloud-service-providers/azure/aks/README.md b/cloud-service-providers/azure/aks/README.md index 90c2166..d3e4da0 100644 --- a/cloud-service-providers/azure/aks/README.md +++ b/cloud-service-providers/azure/aks/README.md @@ -9,12 +9,16 @@ After you are ready to create AKS, the next thing is to choose the right GPU ins ## Prerequisites -Please follow [Pre-rquirement instruction](./prerequisites/README.md) to get ready for AKS creation. +Please follow [Pre-requirement instruction](./prerequisites/README.md) to get ready for AKS creation. ## Create AKS Please follow [Create AKS instruction](./setup/README.md) to create AKS. 
-## Deploy NIM +## Deploy NIM using Helm Chart -Please follow [Deploy NIM instruction](../../../helm/README.md) to create AKS. +Please follow [Deploy NIM instruction](../../../helm/README.md) to deploy NIM. + +## Deploy NIM using NIM Operator + +Please follow [Deploy NIM Operator instruction](../nim-operator-setup.md) to deploy NIM Operator. From 2728808b6e39a2b4eed6367ffe837d5af9bebe13 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Fri, 15 Nov 2024 10:54:58 -0600 Subject: [PATCH 17/21] Update README.md Fixed typos and added the link for NIM Operator --- cloud-service-providers/azure/aks/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/cloud-service-providers/azure/aks/README.md b/cloud-service-providers/azure/aks/README.md index d3e4da0..e739558 100644 --- a/cloud-service-providers/azure/aks/README.md +++ b/cloud-service-providers/azure/aks/README.md @@ -9,16 +9,16 @@ After you are ready to create AKS, the next thing is to choose the right GPU ins ## Prerequisites -Please follow [Pre-requirement instruction](./prerequisites/README.md) to get ready for AKS creation. +Please follow [Pre-requirement instructions](./prerequisites/README.md) to get ready for AKS creation. ## Create AKS -Please follow [Create AKS instruction](./setup/README.md) to create AKS. +Please follow [Create AKS instructions](./setup/README.md) to create AKS. ## Deploy NIM using Helm Chart -Please follow [Deploy NIM instruction](../../../helm/README.md) to deploy NIM. +Please follow [Deploy NIM instructions](../../../helm/README.md) to deploy NIM. ## Deploy NIM using NIM Operator -Please follow [Deploy NIM Operator instruction](../nim-operator-setup.md) to deploy NIM Operator. +Please follow [Deploy NIM Operator instructions](nim-operator-setup.md) to deploy NIM Operator. 
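The sample response shown in these guides is a standard OpenAI-style chat completion. A short sketch of pulling out the assistant text and sanity-checking the token accounting (values copied from the sample response earlier in the guide):

```python
# Abridged copy of the sample chat-completion response from this guide.
response = {
    "model": "meta/llama3-8b-instruct",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Spain is a wonderful destination! With four days, you can easily explore one or",
            },
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 42, "total_tokens": 58, "completion_tokens": 16},
}

def assistant_text(resp: dict) -> str:
    # First choice's message body; finish_reason "length" means the
    # reply hit max_tokens and was cut off mid-sentence.
    return resp["choices"][0]["message"]["content"]

usage = response["usage"]
# total_tokens should equal prompt + completion tokens (42 + 16 = 58).
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
```

Because the sample request sets max_tokens to 16, a finish_reason of "length" rather than "stop" is expected here.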
From fef57f982f97edff5d5c890e20948f5792a5d7dd Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Fri, 15 Nov 2024 16:16:56 -0600 Subject: [PATCH 18/21] Update README.md Formatting of headers --- cloud-service-providers/azure/aks/README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/cloud-service-providers/azure/aks/README.md b/cloud-service-providers/azure/aks/README.md index e739558..9692219 100644 --- a/cloud-service-providers/azure/aks/README.md +++ b/cloud-service-providers/azure/aks/README.md @@ -15,10 +15,11 @@ Please follow [Pre-requirement instructions](./prerequisites/README.md) to get r Please follow [Create AKS instructions](./setup/README.md) to create AKS. -## Deploy NIM using Helm Chart +## Deploy NIM +### Using Helm Chart Please follow [Deploy NIM instructions](../../../helm/README.md) to deploy NIM. -## Deploy NIM using NIM Operator +### Using NIM Operator Please follow [Deploy NIM Operator instructions](nim-operator-setup.md) to deploy NIM Operator. From 597dd41b0596ad00e5813b73d9386c26cbd0e4cf Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Fri, 15 Nov 2024 16:37:36 -0600 Subject: [PATCH 19/21] Update nim-operator-setup.md Added the NIMCache instructions --- .../azure/aks/nim-operator-setup.md | 31 ++++++++----------- 1 file changed, 13 insertions(+), 18 deletions(-) diff --git a/cloud-service-providers/azure/aks/nim-operator-setup.md b/cloud-service-providers/azure/aks/nim-operator-setup.md index 8d96e7a..6ec39bb 100644 --- a/cloud-service-providers/azure/aks/nim-operator-setup.md +++ b/cloud-service-providers/azure/aks/nim-operator-setup.md @@ -1,35 +1,30 @@ # NVIDIA NIM Operator on Azure AKS: Please see the NIM Operator documentation before you proceed: https://docs.nvidia.com/nim-operator/latest/index.html -This repository is dedicated to testing NVIDIA NIM Operator on Azure AKS. +The files in this repo are for reference, for the official NVIDIA AI Enterprise supported release, see NGC and the official documentation. 
+Helm a and GPU Operator should be installed in the cluster before proceeding with the steps below. +Pre-requisites: https://docs.nvidia.com/nim-operator/latest/install.html#prerequisites -## Cluster setup for inference: - -To install the pre-requisites for the NIM Operator, please follow the steps below: - -1: Install the GPU Operator. https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#procedure - - helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v23.6.0 --set toolkit.enabled=false - -2: Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator +Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator # Caching Models -1. bash setup/setup.sh +1. Set your NGC_API_KEY and create secrets as show below: - Note: This setup script (directory: nim-deploy/setup)creates two storage classes- EFS and EBS. The necessary csi drivers are installed as add-ons by the CDK. -2. Follow the instructions in the docs (https://docs.nvidia.com/nim-operator/latest/cache.html#procedure) using the sample yaml files below. - - a) ABS volume: +If you have not set up NGC, see the [NGC Setup](https://ngc.nvidia.com/setup) topic. +Set the **NGC_API_KEY** environment variable to your NGC API key, as shown in the following example. - kubectl apply -n nim-service -f storage/nim-operator-nim-cache-abs.yaml +```bash +export NGC_API_KEY="key from ngc" +``` - b) AFS storage: - kubectl apply -n nim-service -f storage/nim-operator-nim-cache-afs.yaml +2. Follow the instructions in the docs (https://docs.nvidia.com/nim-operator/latest/cache.html#procedure) using the sample yaml files below. 
+
+The image and the model files are fairly large (> 10GB, typically), so ensure that however you are managing the storage for your helm release, you have enough space to host both the image and the model files. If you have a persistent volume setup available to you, as you do in most cloud providers, it is recommended that you use it. If you need to be able to deploy pods quickly and would like to be able to skip the model download step, there is an advantage to using a shared volume such as NFS as your storage setup. To try this out, it is simplest to use a normal persistent volume claim. See the Kubernetes Persistent Volumes documentation for more information.

# Creating a NIM Service

From 41b057d38a71d5fa37893054224b1f0b7cb04c0d Mon Sep 17 00:00:00 2001
From: Eda Johnson
Date: Fri, 15 Nov 2024 19:41:15 -0600
Subject: [PATCH 20/21] Update nim-operator-setup.md

Added nimcache and nimservice instructions

---
 .../azure/aks/nim-operator-setup.md           | 169 ++++++++----------
 1 file changed, 75 insertions(+), 94 deletions(-)

diff --git a/cloud-service-providers/azure/aks/nim-operator-setup.md b/cloud-service-providers/azure/aks/nim-operator-setup.md
index 6ec39bb..d863a8f 100644
--- a/cloud-service-providers/azure/aks/nim-operator-setup.md
+++ b/cloud-service-providers/azure/aks/nim-operator-setup.md
@@ -2,7 +2,7 @@
 Please see the NIM Operator documentation before you proceed: https://docs.nvidia.com/nim-operator/latest/index.html
 The files in this repo are for reference, for the official NVIDIA AI Enterprise supported release, see NGC and the official documentation.
-Helm a and GPU Operator should be installed in the cluster before proceeding with the steps below.
+Helm and GPU Operator should be installed in the cluster before proceeding with the steps below.
Pre-requisites: https://docs.nvidia.com/nim-operator/latest/install.html#prerequisites Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator @@ -10,117 +10,98 @@ Follow the instructions for the NIM Operator installation: https://docs.nvidia.c # Caching Models -1. Set your NGC_API_KEY and create secrets as show below: - - -If you have not set up NGC, see the [NGC Setup](https://ngc.nvidia.com/setup) topic. -Set the **NGC_API_KEY** environment variable to your NGC API key, as shown in the following example. - -```bash -export NGC_API_KEY="key from ngc" -``` - - - -2. Follow the instructions in the docs (https://docs.nvidia.com/nim-operator/latest/cache.html#procedure) using the sample yaml files below. +Follow the instructions in the docs (https://docs.nvidia.com/nim-operator/latest/cache.html#procedure) using the sample manifest file below. -The image and the model files are fairly large (> 10GB, typically), so ensure that however you are managing the storage for your helm release, you have enough space to host both the image. If you have a persistent volume setup available to you, as you do in most cloud providers, it is recommended that you use it. If you need to be able to deploy pods quickly and would like to be able to skip the model download step, there is an advantage to using a shared volume such as NFS as your storage setup. To try this out, it is simplest to use a normal persistent volume claim. See the Kubernetes Persistent Volumes documentation for more information. +The image and the model files are fairly large (> 10GB, typically), so ensure that however you are managing the storage for your helm release, you have enough space to host both the image. If you have a persistent volume setup available to you, as you do in most cloud providers, it is recommended that you use it. 
If you need to be able to deploy pods quickly and would like to be able to skip the model download step, there is an advantage to using a shared volume such as NFS as your storage setup. To try this out, it is simplest to use a normal persistent volume claim. See the Kubernetes [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) documentation for more information. + +```yaml +apiVersion: apps.nvidia.com/v1alpha1 +kind: NIMCache +metadata: + name: meta-llama3-8b-instruct +spec: + source: + ngc: + modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3 + pullSecret: ngc-secret + authSecret: ngc-api-secret + model: + engine: tensorrt_llm + tensorParallelism: "1" + storage: + pvc: + create: true + storageClass: azurefile-csi-premium + size: "50Gi" + volumeAccessMode: ReadWriteMany + resources: {} +``` # Creating a NIM Service 1. Follow the instructions in the [docs](https://docs.nvidia.com/nim-operator/latest/service.html#procedure) using the sample yaml file below. - kubectl apply -n nim-service -f storage/nim-operator-nim-service.yaml - -2. Use ingress.yaml to add an ingress controller. - - kubectl apply -f ingress.yaml -n nim-service +```yaml +apiVersion: apps.nvidia.com/v1alpha1 +kind: NIMService +metadata: + name: meta-llama3-8b-instruct +spec: + image: + repository: nvcr.io/nim/meta/llama3-8b-instruct + tag: 1.0.3 + pullPolicy: IfNotPresent + pullSecrets: + - ngc-secret + authSecret: ngc-api-secret + storage: + nimCache: + name: meta-llama3-8b-instruct + profile: '' + replicas: 1 + resources: + limits: + nvidia.com/gpu: 1 + expose: + service: + type: ClusterIP + port: 8000 +``` # Sample request and response: -Get the DNS of the Load Balancer created in the previous step: -``` -ELB_DNS=$() -``` -Send as sample request: +Avoid setting up external ingress without adding an authentication layer. This is because NIM doesn't provide authentication on its own. The chart provides options for basic ingress. 
+ +Since this example assumes you aren't using an ingress controller, simply port-forward the service so that you can try it out directly. +```bash +kubectl -n nim port-forward service/my-nim-nim-llm 8000:8000 ``` + +Then try a request: + +```bash curl -X 'POST' \ - "http://${ELB_DNS}/v1/chat/completions" \ + 'http://localhost:8000/v1/chat/completions' \ -H 'accept: application/json' \ -H 'Content-Type: application/json' \ -d '{ - "messages": [ + "messages": [ { - "content": "You are a polite and respectful chatbot helping people plan a vacation.", - "role": "system" + "content": "You are a polite and respectful chatbot helping people plan a vacation.", + "role": "system" }, { - "content": "What should I do for a 4 day vacation in Spain?", - "role": "user" + "content": "What should I do for a 4 day vacation in Spain?", + "role": "user" } - ], - "model": "meta/llama3-8b-instruct", - "max_tokens": 16, - "top_p": 1, - "n": 1, - "stream": false, - "stop": "\n", - "frequency_penalty": 0.0 + ], + "model": "meta/llama3-8b-instruct", + "max_tokens": 16, + "top_p": 1, + "n": 1, + "stream": false, + "stop": "\n", + "frequency_penalty": 0.0 }' - -``` -Response: - -``` - { - "id": "cmpl-ba02077a544e411f8ba2ff9f38a6917a", - "object": "chat.completion", - "created": 1717642306, - "model": "meta/llama3-8b-instruct", - "choices": [ - { - "index": 0, - "message": { - "role": "assistant", - "content": "Spain is a wonderful destination! 
With four days, you can easily explore one or" - }, - "logprobs": null, - "finish_reason": "length", - "stop_reason": null - } - ], - "usage": { - "prompt_tokens": 42, - "total_tokens": 58, - "completion_tokens": 16 - } -} ``` - -# Gen-ai perf tool - - kubectl apply -f perf/gen-ai-perf.yaml - -exec into the triton pod - - kubectl exec -it triton -- bash - -Run the following command - - NIM_MODEL_NAME="meta/llama3-8b-instruct" - server_url=http://nim-llm-service:8000 - concurrency=20 - input_tokens=128 - output_tokens=10 - - genai-perf -m $NIM_MODEL_NAME --endpoint v1/chat/completions --endpoint-type chat \ - --service-kind openai --streaming \ - -u $server_url \ - --num-prompts 100 --prompt-source synthetic \ - --synthetic-input-tokens-mean $input_tokens \ - --synthetic-input-tokens-stddev 50 \ - --concurrency $concurrency \ - --extra-inputs max_tokens:$output_tokens \ - --extra-input ignore_eos:true \ - --profile-export-file test_chat_${concurrency} From 84f3de519f98830e5c3fa4935b523a023857d9b4 Mon Sep 17 00:00:00 2001 From: Eda Johnson Date: Thu, 21 Nov 2024 12:25:17 -0600 Subject: [PATCH 21/21] Update nim-operator-setup.md Created a custom storage class based on nfs protocol to create nimcache --- .../azure/aks/nim-operator-setup.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/cloud-service-providers/azure/aks/nim-operator-setup.md b/cloud-service-providers/azure/aks/nim-operator-setup.md index d863a8f..e0052a7 100644 --- a/cloud-service-providers/azure/aks/nim-operator-setup.md +++ b/cloud-service-providers/azure/aks/nim-operator-setup.md @@ -3,16 +3,20 @@ Please see the NIM Operator documentation before you proceed: https://docs.nvidia.com/nim-operator/latest/index.html The files in this repo are for reference, for the official NVIDIA AI Enterprise supported release, see NGC and the official documentation. Helm and GPU Operator should be installed in the cluster before proceeding with the steps below. 
-Pre-requisites: https://docs.nvidia.com/nim-operator/latest/install.html#prerequisites +[Pre-requisites](https://docs.nvidia.com/nim-operator/latest/install.html#prerequisites) Follow the instructions for the NIM Operator installation: https://docs.nvidia.com/nim-operator/latest/install.html#install-nim-operator # Caching Models -Follow the instructions in the docs (https://docs.nvidia.com/nim-operator/latest/cache.html#procedure) using the sample manifest file below. +Follow the instructions in the [docs](https://docs.nvidia.com/nim-operator/latest/cache.html#procedure) using the sample manifest file below. -The image and the model files are fairly large (> 10GB, typically), so ensure that however you are managing the storage for your helm release, you have enough space to host both the image. If you have a persistent volume setup available to you, as you do in most cloud providers, it is recommended that you use it. If you need to be able to deploy pods quickly and would like to be able to skip the model download step, there is an advantage to using a shared volume such as NFS as your storage setup. To try this out, it is simplest to use a normal persistent volume claim. See the Kubernetes [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) documentation for more information. +The image and the model files are fairly large (> 10GB, typically), so ensure that however you are managing the storage for your helm release, you have enough space to host both the image. If you have a persistent volume setup available to you, as you do in most cloud providers, it is recommended that you use it. If you need to be able to deploy pods quickly and would like to be able to skip the model download step, there is an advantage to using a shared volume such as NFS as your storage setup. 
+Follow instructions to create a custom storage class that uses the NFS protocol: https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/storage/fail-to-mount-azure-file-share#solution-2-create-a-pod-that-can-be-scheduled-on-a-fips-enabled-node
+
+Create a NIMCache using the sample manifest that leverages the custom storage class you created (e.g. azurefile-sc-fips):
 
 ```yaml
 apiVersion: apps.nvidia.com/v1alpha1
@@ -31,7 +35,7 @@ spec:
   storage:
     pvc:
       create: true
-      storageClass: azurefile-csi-premium
+      storageClass: azurefile-sc-fips
       size: "50Gi"
       volumeAccessMode: ReadWriteMany
   resources: {}
@@ -75,7 +79,7 @@ Avoid setting up external ingress without adding an authentication layer. This i
 
 Since this example assumes you aren't using an ingress controller, simply port-forward the service so that you can try it out directly.
 
 ```bash
-kubectl -n nim port-forward service/my-nim-nim-llm 8000:8000
+kubectl -n nim-service port-forward service/meta-llama3-8b-instruct 8000:8000
 ```
 
 Then try a request:
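With the port-forward active, the request can be sent with curl as shown in the earlier patch, or directly from the Python standard library. A sketch — the payload mirrors the documented curl example, and the network call itself is left commented out so the snippet runs without a live service:

```python
import json
import urllib.request

# Payload mirrors the chat-completions curl example in this series; the
# localhost:8000 endpoint assumes the kubectl port-forward is running.
payload = {
    "messages": [
        {"role": "system",
         "content": "You are a polite and respectful chatbot helping people plan a vacation."},
        {"role": "user",
         "content": "What should I do for a 4 day vacation in Spain?"},
    ],
    "model": "meta/llama3-8b-instruct",
    "max_tokens": 16,
    "top_p": 1,
    "n": 1,
    "stream": False,
    "stop": "\n",
    "frequency_penalty": 0.0,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"accept": "application/json", "Content-Type": "application/json"},
    method="POST",
)

print(req.get_method(), req.full_url)  # POST http://localhost:8000/v1/chat/completions

# With the port-forward running, send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The response has the usual OpenAI-compatible shape, so the generated text sits at `choices[0].message.content`, as in the sample response shown earlier in this series.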