This repository contains a comprehensive tutorial for deploying and managing AI inference workloads on Kubernetes clusters with AMD GPUs.
- Ubuntu/Debian server with AMD GPUs
- Root/sudo access
- At least 2GB RAM and 20GB free disk space
- Internet connectivity for package downloads
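If you want to eyeball these requirements before running the automated check below, a few standard commands cover them. This is just a manual sanity check; the `lspci` filter is a loose heuristic and the connectivity URL is only one example endpoint:

```bash
# Confirm the OS release (Ubuntu/Debian expected)
lsb_release -a

# Check memory and free disk space against the 2GB / 20GB minimums
free -h
df -h /

# Look for AMD GPUs on the PCI bus (works before any ROCm tooling is installed)
lspci | grep -i amd | grep -iE 'vga|display|processing accelerator'

# Rough connectivity check for package downloads (any reliable endpoint works)
curl -sI https://pkgs.k8s.io >/dev/null && echo "network reachable"
```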
Before starting the installation, run the enhanced system check to ensure your system is ready and resolve any potential issues:
```bash
sudo ./check-system-enhanced.sh
```

This educational script will:
- Auto-detect and resolve update manager lock issues (a common problem; see the sketch after this list)
- Explain what each check does and why it's important
- Verify system requirements (OS, disk space, memory, network)
- Detect container environment vs bare metal
- Check Kubernetes installation status
- Provide learning recommendations and next steps
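The lock handling is worth a closer look, since a background `unattended-upgrades` run is the usual culprit when apt appears stuck. Below is a minimal sketch of that kind of check; the actual logic in check-system-enhanced.sh may differ, but the lock paths are the standard Debian/Ubuntu ones:

```bash
#!/usr/bin/env bash
# Sketch: detect (and wait out) apt/dpkg lock contention before installing packages.
LOCKS=(/var/lib/dpkg/lock-frontend /var/lib/dpkg/lock /var/lib/apt/lists/lock)

for lock in "${LOCKS[@]}"; do
    # fuser exits 0 if any process currently holds the lock file
    if sudo fuser "$lock" >/dev/null 2>&1; then
        echo "Another package manager is holding $lock (likely unattended-upgrades)."
        echo "Waiting for it to finish..."
        while sudo fuser "$lock" >/dev/null 2>&1; do sleep 5; done
    fi
done
echo "No package manager locks detected; safe to proceed."
```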
Once the system check passes, install Kubernetes:

```bash
sudo ./install-kubernetes.sh
```

This script will:
- Install vanilla Kubernetes 1.28+ on Ubuntu/Debian
- Configure containerd container runtime
- Set up Calico CNI networking
- Configure single-node cluster (removes control-plane taints)
- Disable swap and configure kernel settings (sketched after this list)
- Verify cluster functionality
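The swap and kernel configuration follows the standard kubeadm prerequisites. A condensed sketch of that portion is shown here, assuming the usual containerd/Calico setup; the exact commands in install-kubernetes.sh may differ:

```bash
# Disable swap now and keep it off across reboots
sudo swapoff -a
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab

# Load kernel modules needed by containerd and Calico
echo -e "overlay\nbr_netfilter" | sudo tee /etc/modules-load.d/k8s.conf
sudo modprobe overlay
sudo modprobe br_netfilter

# Let iptables see bridged traffic and enable IP forwarding
cat <<'EOF' | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system

# Single-node cluster: allow workloads to schedule on the control-plane node
kubectl taint nodes --all node-role.kubernetes.io/control-plane- || true
```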
With the cluster running, install the AMD GPU Operator:

```bash
./install-amd-gpu-operator.sh
```

This script will:
- Install Helm (if not present)
- Install cert-manager (prerequisite)
- Install AMD GPU Operator
- Configure device settings for vanilla Kubernetes
- Set up persistent storage for AI models
- Verify the installation
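A few quick checks confirm the operator is healthy before moving on. The namespace and label filters below assume a default install; `amd.com/gpu` is the resource name the AMD device plugin registers with Kubernetes:

```bash
# cert-manager should be Running before the GPU operator components start
kubectl get pods -n cert-manager

# Each GPU should appear in the node's allocatable resources as amd.com/gpu
kubectl describe node | grep -i 'amd.com/gpu'

# The Node Labeller adds GPU-related labels (device ID, VRAM, etc.) to the node
kubectl get nodes --show-labels | tr ',' '\n' | grep -i 'amd'
```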
Finally, deploy the vLLM inference server:

```bash
./deploy-vllm-inference.sh
```

This script will:
- Install MetalLB load balancer
- Deploy vLLM inference server with Llama-3.2-1B model
- Create LoadBalancer service for external access
- Generate test scripts for API validation
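Once the pod is up, the server speaks vLLM's OpenAI-compatible API, so a couple of curl calls make a quick manual smoke test. The service name below is a placeholder for whatever the deployment creates, port 8000 is vLLM's default, and the model identifier must match what `/v1/models` reports:

```bash
# External IP assigned by MetalLB (replace the service name with the real one
# from `kubectl get svc`)
VLLM_IP=$(kubectl get svc vllm-inference \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# List the models the server exposes
curl -s "http://${VLLM_IP}:8000/v1/models"

# Send a short completion request
curl -s "http://${VLLM_IP}:8000/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.2-1B", "prompt": "Kubernetes is", "max_tokens": 32}'
```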
The components fit together in four layers:

```
┌─────────────────────────────────────────────┐
│                Applications                 │
│  ┌─────────────┐   ┌─────────────┐          │
│  │    vLLM     │   │   Jupyter   │   ...    │
│  │  Inference  │   │  Notebooks  │          │
│  └─────────────┘   └─────────────┘          │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│              Kubernetes Layer               │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  │
│  │   Pods   │  │ Services │  │Deployments│  │
│  └──────────┘  └──────────┘  └───────────┘  │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│              AMD GPU Operator               │
│  ┌──────────────┐   ┌─────────────────┐     │
│  │Device Plugin │   │  Node Labeller  │     │
│  └──────────────┘   └─────────────────┘     │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│           Hardware Infrastructure           │
│  ┌─────────────────────────────────────┐    │
│  │      AMD Instinct MI300X GPUs       │    │
│  │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐    │    │
│  │  │GPU 0│ │GPU 1│ │GPU 2│ │GPU 3│    │    │
│  │  └─────┘ └─────┘ └─────┘ └─────┘    │    │
│  └─────────────────────────────────────┘    │
└─────────────────────────────────────────────┘
```