GenAIOps Lab Deployment

Infrastructure-as-code for deploying a multi-tenant GenAIOps learning environment on OpenShift.

Overview

This repository provides a complete deployment solution for the AI501 GenAIOps lab environment. It enables instructors to rapidly provision a multi-tenant AI/ML lab where students can experiment with Large Language Models, prompt engineering, model optimization, and AI governance.

Prerequisites

OpenShift 4.119+ cluster with cluster-admin access
Helm 3.x installed
oc CLI configured and authenticated
(Optional) AWS credentials for GPU machine provisioning

GPU Requirements

This lab requires 3 GPU nodes with two different taints:

Component	GPU Count	Instance Type	Taint
Docling Serve	1	g4dn (T4)	`nvidia.com/gpu=g4dn`
Llama 3.2 (cloud-model)	1	g5 (A10G)	`nvidia.com/gpu=g5`
Llama 3.2 FP8 (quantized-model)	1	g5 (A10G)	`nvidia.com/gpu=g5`

If you have different taints, make sure you update the necessary template for the deployments' tolerations.

Quick Start

Clone the repository

git clone <repository-url>
cd deploy-lab

Configure your deployment (Required)

Edit student-content/values.yaml before running the installation:

cluster_domain: apps.your-cluster.example.com  # Your OpenShift apps domain
attendees: 20                                   # Number of students to create

To find your cluster domain:

oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}'

Run the installation
```
./install.sh
```

Configuration

Edit student-content/values.yaml before installation:

Parameter	Description	Default
`cluster_domain`	OpenShift apps domain (e.g., `apps.mycluster.example.com`)	`apps.example.com`
`modelName`	Default LLM model name	`llama32`
`attendees`	Number of student environments to create	`20`

Repository Structure

deploy-lab/
├── install.sh                 # Main deployment script
├── operators/                 # Helm chart: OpenShift operators
├── toolings/                  # Helm chart: shared infrastructure
├── student-content/           # Helm chart: student environments
│   ├── values.yaml           # Configuration values
│   └── templates/            # Kubernetes manifests
│       ├── tinyllama/        # TinyLlama model serving
│       ├── cloud-model/      # Larger model serving
│       ├── exercise-app/     # Learning portal
│       ├── gradio/           # Interactive LLM demos
│       ├── canopy-board/     # Gamification dashboard
│       ├── argocd-instance/  # Per-user GitOps
│       └── ...
├── custom-codeserver/         # Custom workbench image
├── exercise-app/              # FastAPI exercise portal
├── gradio-app/                # Gradio LLM interfaces
├── canopy-board/              # Gamification system
├── user-signup-app/           # Student registration
├── prompt-tracker/            # Git monitoring dashboard
└── rdu-website/               # Demo marketing site

Components

Operators

Installs required OpenShift operators:

Red Hat OpenShift AI (RHOAI)
OpenShift Pipelines (Tekton)
OpenShift GitOps (ArgoCD)
NVIDIA GPU Operator
Node Feature Discovery

Toolings

Deploys shared infrastructure:

MinIO (S3-compatible storage)
TrustyAI (AI explainability)
Observability stack (Prometheus, Grafana, Tempo)
Custom workbench templates

Student Content

Per-student resources:

ArgoCD instance for GitOps workflows
Data Science project workspace
Access to shared LLM models
Interactive learning applications

Applications

Application	Description	Path
Exercise Portal	FastAPI gateway to learning modules	`exercise-app/`
Gradio App	Interactive LLM experimentation	`gradio-app/`
Canopy Board	Gamification with leaderboards	`canopy-board/`
User Signup	Student registration system	`user-signup-app/`
Prompt Tracker	Git-based prompt versioning	`prompt-tracker/`

Model Serving

Pre-configured model deployments using KServe and vLLM:

TinyLlama 1.1B: CPU-optimized for basic inference
Llama 3.2: Full-featured model for production use
Quantized Models: FP8 compressed for efficiency

Post-Installation

After running install.sh:

Verify operators are ready:
```
oc get csv -n openshift-operators
```
Check student namespaces:
```
oc get namespaces | grep user
```
Access the signup app to distribute credentials to students
Monitor ArgoCD for GitOps sync status:
```
oc get applications -A
```

GPU Provisioning (AWS)

For GPU workloads, use the machine provisioning script:

./machineset.sh

Supports: T4, A10G, A100, H100, L40 instances

Troubleshooting

Check Helm releases

helm list -n ai501

View pod status

oc get pods -n ai501

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GenAIOps Lab Deployment

Overview

Prerequisites

GPU Requirements

Quick Start

Configuration

Repository Structure

Components

Operators

Toolings

Student Content

Applications

Model Serving

Post-Installation

GPU Provisioning (AWS)

Troubleshooting

Check Helm releases

View pod status

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 221 Commits
ai-orientation-app		ai-orientation-app
canopy-board		canopy-board
custom-codeserver		custom-codeserver
exercise-app		exercise-app
llama-stack-image		llama-stack-image
operators		operators
prompt-tracker		prompt-tracker
rdu-website		rdu-website
student-content		student-content
toolings		toolings
university-data		university-data
user-signup-app		user-signup-app
.gitignore		.gitignore
README.md		README.md
install.sh		install.sh
machineset.sh		machineset.sh

Folders and files

Latest commit

History

Repository files navigation

GenAIOps Lab Deployment

Overview

Prerequisites

GPU Requirements

Quick Start

Configuration

Repository Structure

Components

Operators

Toolings

Student Content

Applications

Model Serving

Post-Installation

GPU Provisioning (AWS)

Troubleshooting

Check Helm releases

View pod status

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages