Open
Changes from 6 commits
Commits
30 commits
edd2004
Create AKS folder and SKILL.md
julia-yin Feb 25, 2026
a4eab8e
Add azure-kubernetes to skill.json
julia-yin Feb 25, 2026
2cf0363
Update skills.json
julia-yin Feb 25, 2026
da83ce2
Merge branch 'main' into main
julia-yin Feb 25, 2026
59186b0
Fix issue of postgres skill missing from skills.json
julia-yin Feb 25, 2026
ac9301a
Fix skills.json
julia-yin Feb 25, 2026
f24eb8e
Add AKS to architecture.md and testing for AKS skill
julia-yin Feb 27, 2026
16c29c8
Update plugin/skills/azure-kubernetes/SKILL.md
julia-yin Feb 28, 2026
9dc9578
Update SKILL.md
julia-yin Feb 28, 2026
278d7a0
Merge branch 'main' of https://github.com/julia-yin/GitHub-Copilot-fo…
julia-yin Feb 28, 2026
6e2ab85
Merge branch 'main' into main
julia-yin Feb 28, 2026
3f6e3a6
Remove trailing empty lines
julia-yin Feb 28, 2026
3428b30
Merge branch 'main' of https://github.com/julia-yin/GitHub-Copilot-fo…
julia-yin Feb 28, 2026
35c636c
Add AKS to integration test schedule
julia-yin Feb 28, 2026
4995afd
Fix pr.yaml creating leading space
julia-yin Feb 28, 2026
2f940d5
Update SKILL.md
julia-yin Feb 28, 2026
1a92efd
Update triggers.test.ts.snap
julia-yin Feb 28, 2026
d58b49b
Add in missing best practices (ephemeral disk, auto upgrades, reliabi…
julia-yin Mar 2, 2026
fc92679
Add security best practices
julia-yin Mar 2, 2026
f63b19d
Merge branch 'main' into main
julia-yin Mar 2, 2026
b47bed8
Streamline and reduce token count
julia-yin Mar 2, 2026
a2acfc2
Add azure-kubernetes to skills.json
julia-yin Mar 2, 2026
1b8e483
Fix naming issues
julia-yin Mar 2, 2026
2b11b8c
Update trigger and unit tests
julia-yin Mar 2, 2026
4a0a598
Bump azure-prepare version to 1.0.1
julia-yin Mar 3, 2026
7dc8f22
Fix metadata.version
julia-yin Mar 3, 2026
0862041
Add metadata to azure-kubernetes skill
julia-yin Mar 3, 2026
77758cd
Merge branch 'main' into main
julia-yin Mar 3, 2026
a74df39
Apply suggestion from @Copilot
julia-yin Mar 3, 2026
9ac4bda
Apply suggestion from @Copilot
julia-yin Mar 3, 2026
252 changes: 252 additions & 0 deletions plugin/skills/azure-kubernetes/SKILL.md
@@ -0,0 +1,252 @@
---
name: azure-kubernetes
description: >-
Plan and create production-ready Azure Kubernetes Service (AKS) clusters. Covers Day-0 decisions (networking, API server access, pod IP model), Day-1 configuration (identity, secrets, governance, observability), cluster SKUs (Automatic vs Standard), workload identity, Key Vault CSI, Azure Policy, deployment safeguards, monitoring with Prometheus/Grafana, upgrade strategies, and cost analysis.
USE FOR: create AKS cluster, AKS cluster planning, AKS networking design, security design, upgrade settings, autoscaling, AKS monitoring, AKS cost analysis, AKS production best practices, AKS Automatic vs Standard, AKS add-ons
DO NOT USE FOR: debugging AKS issues (use azure-diagnostics), deploying applications to AKS (use azure-deploy), creating other Azure resources (use azure-prepare), setting up general monitoring (use azure-observability), general cost optimization strategies (use azure-cost-optimization)
---
Comment on lines +1 to +7
Copilot AI Mar 3, 2026

The skill description/WHEN list includes many generic words (e.g., “plan”, “create”, “best”, “practices”, “deploy”). In this repo’s trigger tests, TriggerMatcher adds every description word >3 chars as a keyword and triggers on >=2 matches, which increases the chance of false positives (e.g., unrelated prompts containing “create” + “deploy” + “container”). Consider tightening the description/WHEN phrases to be more AKS-specific so keyword extraction stays discriminative.
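
The matching rule described in this comment can be sketched as follows. This is a minimal Python illustration of the behavior as stated here (words longer than 3 characters become keywords, 2 or more matches trigger), not the repo's actual `TriggerMatcher`; the function names and the tokenization regex are assumptions.

```python
import re


def extract_keywords(description: str) -> set[str]:
    """Collect lower-cased words longer than 3 characters (assumed tokenization)."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return {w for w in words if len(w) > 3}


def should_trigger(prompt: str, keywords: set[str], min_matches: int = 2) -> bool:
    """Trigger when the prompt shares at least `min_matches` words with the keywords."""
    prompt_words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    return len(prompt_words & keywords) >= min_matches


# Generic wording matches an unrelated prompt; AKS-specific wording does not.
generic = extract_keywords("Plan and create production-ready clusters, then deploy workloads")
specific = extract_keywords("AKS Automatic, AKS networking, Key Vault CSI")
unrelated_prompt = "create a storage account and deploy my website"
```

Under this sketch, `should_trigger(unrelated_prompt, generic)` is true because of "create" and "deploy", while the AKS-specific keyword set does not match. Note that a three-letter token such as "aks" would be dropped by the length filter as modeled here.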


# Azure Kubernetes Service

> **AUTHORITATIVE GUIDANCE — MANDATORY COMPLIANCE**
>
> This document is the **official source** for setting up Azure Kubernetes Service clusters according to best practices. Follow these instructions to create and configure AKS clusters that are aligned with the user's requirements.

## Triggers
Activate this skill when the user wants to:
- Create a new AKS cluster
- Plan AKS cluster configuration for production workloads
- Design AKS networking (API server access, pod IP model, egress)
Comment on lines +15 to +27
Copilot AI Mar 2, 2026

This SKILL.md doesn’t follow the repository’s Skill File Authoring Guidelines required section structure (Quick Reference, When to Use This Skill, MCP Tools, Workflow/Steps, Error Handling). Please restructure the document to include those sections/tables so it’s consistent with other plugin skills and easier to scan.

- Set up AKS identity and secrets management
- Configure AKS governance (Azure Policy, Deployment Safeguards)
- Enable AKS observability (monitoring, Prometheus, Grafana)
- Define AKS upgrade and patching strategy
- Enable AKS cost visibility and analysis
- Understand AKS Automatic vs Standard SKU differences
- Get a Day-0 checklist for AKS cluster setup and configuration

## Rules

1. Start with the user's requirements for provisioning compute, networking, security, and other settings.
2. Use the AKS MCP server for invoking Azure API and kubectl commands when applicable during the cluster setup and operations processes.
Copilot AI Feb 25, 2026

Rule 2 refers to an "AKS MCP server", but repo MCP config only defines a generic azure MCP server (plugin/.mcp.json). This will lead agents to look for a non-existent server; please update the rule to reference the Azure MCP server and the relevant AKS-related MCP tools (or CLI) explicitly.

Suggested change
2. Use the AKS MCP server for invoking Azure API and kubectl commands when applicable during the cluster setup and operations processes.
2. Use the `azure` MCP server and its AKS-related MCP tools to invoke Azure APIs and perform AKS and kubectl operations whenever possible during cluster setup and ongoing operations; if required functionality is not available via MCP tools, fall back to Azure CLI and kubectl commands.

3. Determine if AKS Automatic or Standard SKU is more appropriate based on the user's need for control vs convenience. Default to AKS Automatic unless specific customizations are required.
4. Document decisions and rationale for cluster configuration choices, especially for Day-0 decisions that are hard to change later (networking, API server access).

Comment on lines +15 to +41
Copilot AI Mar 3, 2026

Missing required MCP Tools section: the repo’s skill authoring guidelines require an explicit “MCP Tools” section with a table of commands/parameters (not just listing tool names in Quick Reference). See .github/instructions/skill-files.instructions.md (Required Sections #3).

---

## Overview
This skill guides a user through planning and creating an Azure Kubernetes Service (AKS) cluster using public best practices for:
- cluster mode selection (Automatic vs Standard),
- networking (API server access, egress, pod IP model),
- identity (Microsoft Entra + Workload Identity),
- secrets management (Key Vault CSI),
- governance (Azure Policy + Deployment Safeguards),
- observability (Azure Monitor, Managed Prometheus, Managed Grafana),
- upgrades/patching (auto-upgrade channels, maintenance windows),
- cost visibility (AKS Cost Analysis).

References are public and included at the end.

---

## When to Use
Copilot AI Feb 25, 2026

Section header "## When to Use" is inconsistent with the repo’s common "## When to Use This Skill" heading (e.g., plugin/skills/appinsights-instrumentation/SKILL.md:18 and plugin/skills/azure-messaging/SKILL.md:25). Aligning the heading improves consistency and navigation across skills.

Suggested change
## When to Use
## When to Use This Skill

Use this skill when a user asks:
- “What do I need to decide before creating AKS?”
- “Create an AKS cluster plan/design for production”
- “AKS networking: overlay vs pod subnet vs node subnet”
- “How do I set up Workload Identity / Key Vault CSI / Azure Policy?”
- “How do I configure upgrades, patching, and observability on AKS?”

---

## Goals / Outcomes
1. Produce a **recommended AKS cluster configuration** based on user requirements (security, scale, connectivity, compliance).
2. Provide a **Day-0 checklist** (decisions that are hard to change later, like networking and API server exposure).
3. Provide a **Day-1 checklist** (baseline add-ons and settings for production readiness).
4. Optionally output a **command/IaC skeleton** (placeholders only unless user provides values).

---

## Required Inputs (Ask only what’s needed)
If the user is unsure, use safe defaults.

### A) Environment & scale
- Environment: `dev/test` or `production`
- Region(s) + availability zones needed?
- Expected scale: node count / cluster count (single vs multi)

### B) Networking requirements (Day-0 critical)
- API server access:
  - Public API server or Private cluster?
- Pod IP model:
  - Do pods need **direct routable IPs in the VNet**?
- Egress control:
  - Default outbound, NAT Gateway, or UDR + firewall/NVA?

### C) Identity & security posture
- Microsoft Entra RBAC required?
- Need pod-to-Azure access with **Workload Identity**?
- Regulated environment needs (private cluster, policy enforcement, restricted egress)?
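
One way to track which of the inputs above are still open is sketched below. The class and field names are hypothetical and not part of the skill; `None` stands for "not answered yet", so the agent can ask only the Day-0 questions that remain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterRequirements:
    """Hypothetical container for the inputs in sections A to C; None = unanswered."""
    environment: Optional[str] = None           # "dev/test" or "production"
    private_api_server: Optional[bool] = None   # Day-0: public vs private API server
    pods_need_vnet_ips: Optional[bool] = None   # Day-0: pod IP model
    egress: Optional[str] = None                # Day-0: default outbound, NAT Gateway, UDR
    entra_rbac: Optional[bool] = None
    workload_identity: Optional[bool] = None


# Networking answers are the ones flagged as hard to change later.
DAY0_FIELDS = ("private_api_server", "pods_need_vnet_ips", "egress")


def unanswered_day0(req: ClusterRequirements) -> list[str]:
    """Return the Day-0 questions that still need an answer or a default."""
    return [name for name in DAY0_FIELDS if getattr(req, name) is None]


req = ClusterRequirements(environment="production", private_api_server=True)
remaining = unanswered_day0(req)
```

Here `remaining` lists the pod IP model and egress questions, which are the ones to ask next or to fill with a stated default.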

---

## Outputs (What the Skill Produces)
### Primary Output: “AKS Setup Plan”
1. Cluster type recommendation (Automatic vs Standard)
2. Networking plan (control plane access, egress choice, pod IP model)
3. Node pools + scaling plan
4. Security baseline (identity, secrets, policy)
5. Observability baseline (metrics/logs/dashboards/alerts)
6. Upgrade & patching plan
7. Cost controls baseline
8. Day-0 checklist + Day-1 checklist

### Optional Outputs
- CLI skeleton (placeholders)
- IaC outline (Bicep/Terraform module list)

---

## Decision Framework (Defaults when user is unsure)

### 1) Cluster Type
- Prefer **AKS Automatic** when you want a production-oriented, opinionated setup with many best practices preconfigured.
- Prefer **AKS Standard** when you need maximum control and customizations.
Docs: AKS Automatic overview: https://learn.microsoft.com/azure/aks/intro-aks-automatic

### 2) Pod Networking Model (Key Day-0 decision)
- Prefer **Azure CNI Overlay** for scalability and conserving VNet IP space.
Docs: https://learn.microsoft.com/azure/aks/azure-cni-overlay

If pods must be directly addressable/routable in your VNet, use VNet-based Azure CNI options:
- Azure CNI with the pod subnet or node subnet model (see the Azure CNI Overlay doc above and the related AKS networking docs)

### 3) Dataplane / Network Policy
- Consider **Azure CNI powered by Cilium** for eBPF-based performance and policy/observability features.
Docs: https://learn.microsoft.com/azure/aks/azure-cni-powered-by-cilium

### 4) Workload Identity (Preferred for pod-to-Azure auth)
- Prefer **Microsoft Entra Workload ID** for workloads calling Azure services without secrets.
Docs: https://learn.microsoft.com/azure/aks/workload-identity-overview

### 5) Secrets
- Prefer Azure Key Vault via **Secrets Store CSI Driver** provider.
Docs: https://learn.microsoft.com/azure/aks/csi-secrets-store-driver

### 6) Governance
- Enable **Azure Policy** (prereq) and **Deployment Safeguards** for workload best-practice enforcement.
Docs: Deployment Safeguards: https://learn.microsoft.com/azure/aks/deployment-safeguards

### 7) Observability
- Enable AKS monitoring through Azure Monitor (logs, managed Prometheus, managed Grafana).
Docs: https://learn.microsoft.com/azure/azure-monitor/containers/kubernetes-monitoring-enable
Prometheus overview: https://learn.microsoft.com/azure/azure-monitor/metrics/prometheus-metrics-overview

### 8) Upgrades & Patching
- Establish an upgrade strategy and ensure workloads are upgrade-safe (PDBs, probes, etc.).
Docs: AKS patch/upgrade guidance: https://learn.microsoft.com/azure/architecture/operator-guides/aks/aks-upgrade-practices

For node OS patching:
- Node OS auto-upgrade channels: https://learn.microsoft.com/azure/aks/auto-upgrade-node-os-image
For cluster version auto-upgrades:
- Cluster auto-upgrade channels: https://learn.microsoft.com/azure/aks/auto-upgrade-cluster
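
The defaults in this framework could be encoded roughly as below. This is a sketch only: the function name, parameters, and dictionary keys are illustrative assumptions, and the returned strings simply restate the preferences listed in sections 1, 2, 4, 5, and 6.

```python
def recommend_defaults(needs_custom_control: bool = False,
                       pods_need_vnet_ips: bool = False) -> dict[str, str]:
    """Map two yes/no answers to the default choices named in the framework."""
    return {
        # Section 1: Automatic unless specific customizations are required.
        "cluster_type": "AKS Standard" if needs_custom_control else "AKS Automatic",
        # Section 2: Overlay unless pods must be routable in the VNet.
        "pod_networking": ("Azure CNI (pod subnet or node subnet)"
                           if pods_need_vnet_ips else "Azure CNI Overlay"),
        # Sections 4 to 6: fixed preferences.
        "workload_auth": "Microsoft Entra Workload ID",
        "secrets": "Azure Key Vault via Secrets Store CSI Driver",
        "governance": "Azure Policy + Deployment Safeguards",
    }


unsure_user = recommend_defaults()
routable_pods = recommend_defaults(needs_custom_control=True, pods_need_vnet_ips=True)
```

A user who is unsure about everything gets AKS Automatic with Azure CNI Overlay; only an explicit requirement moves the recommendation away from those defaults.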

---

## Step-by-Step Execution (Agent Behavior)

### Step 1 — Classify scenario
Identify environment, compliance posture, region/AZ needs, scale, and workload types.

### Step 2 — Recommend cluster type
Recommend AKS Automatic or Standard with short rationale.
- AKS Automatic intro: https://learn.microsoft.com/azure/aks/intro-aks-automatic

### Step 3 — Lock networking (Day-0)
Ask:
- Public vs Private API server?
- Pod IP model: overlay vs VNet-routable requirement?
- Egress: load balancer (default outbound) vs NAT Gateway vs UDR + firewall?

Reference: Azure CNI Overlay setup: https://learn.microsoft.com/azure/aks/azure-cni-overlay

### Step 4 — Node pools and compute
Recommend:
- system node pool + user node pools
- separate pools for GPU/batch/stateful if applicable
- capacity planning considerations (max pods per node affects IP planning, upgrades)
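
To illustrate why max pods per node matters for IP planning, here is a rough estimate in Python. It assumes a simplified model: with overlay networking only nodes consume VNet addresses, while in a VNet-routable model every pod also takes one; `surge_nodes` is an assumed allowance for extra nodes created during upgrades. It ignores addresses Azure reserves in each subnet and any load balancer IPs, so treat the result as a lower bound.

```python
def vnet_ips_needed(nodes: int, max_pods: int,
                    surge_nodes: int = 1, overlay: bool = True) -> int:
    """Estimate VNet IPs consumed by nodes (and pods, if they are VNet-routable)."""
    total_nodes = nodes + surge_nodes
    if overlay:
        # Pods draw from a separate overlay CIDR, so only nodes use VNet IPs.
        return total_nodes
    # Each node uses one IP for itself plus one per pod it may host.
    return total_nodes * (1 + max_pods)


overlay_ips = vnet_ips_needed(nodes=50, max_pods=30)                   # 51
routable_ips = vnet_ips_needed(nodes=50, max_pods=30, overlay=False)   # 51 * 31 = 1581
```

For the same 50-node cluster, the VNet-routable estimate is about 30 times larger, which is the kind of difference that makes the pod IP model a Day-0 decision.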

### Step 5 — Configure autoscaling
Recommend:
- HPA for pods
- Cluster Autoscaler / node scaling strategy
- If the user asks for more automation, discuss Node Auto Provisioning where available

### Step 6 — Identity and secrets
- Enable Workload Identity:
https://learn.microsoft.com/azure/aks/workload-identity-overview
- Use Key Vault CSI Driver:
https://learn.microsoft.com/azure/aks/csi-secrets-store-driver

### Step 7 — Policy & safeguards
- Turn on Azure Policy and Deployment Safeguards (warn/enforce).
Docs: https://learn.microsoft.com/azure/aks/deployment-safeguards

### Step 8 — Observability baseline
- Enable monitoring using Azure Monitor guidance:
https://learn.microsoft.com/azure/azure-monitor/containers/kubernetes-monitoring-enable
- Managed Prometheus overview:
https://learn.microsoft.com/azure/azure-monitor/metrics/prometheus-metrics-overview

### Step 9 — Upgrades & patching
- Define upgrade approach:
https://learn.microsoft.com/azure/architecture/operator-guides/aks/aks-upgrade-practices
- Configure node OS upgrade channels:
https://learn.microsoft.com/azure/aks/auto-upgrade-node-os-image
- Configure cluster auto-upgrade channels:
https://learn.microsoft.com/azure/aks/auto-upgrade-cluster

### Step 10 — Cost visibility
- Enable AKS cost analysis add-on (OpenCost-based):
https://learn.microsoft.com/azure/aks/cost-analysis

Return a final output with:
- recommended config
- Day-0 checklist
- Day-1 checklist
- optional command/IaC skeleton

---

## Guardrails / Safety
- Do not request or output secrets or sensitive identifiers (tokens, keys, subscription IDs).
- If requirements are ambiguous, propose 2–3 safe options with tradeoffs and choose a conservative default.
- Do not promise zero downtime; advise workload safeguards (PDBs, probes, replicas) and staged upgrades.
- If user asks for actions that require privileged access, provide a plan and commands with placeholders.
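
A placeholder-only command skeleton, in line with the guardrails above, might be generated like this. The helper name is hypothetical, and the `az aks create` flags shown should be verified against the current Azure CLI reference before use. API server access and egress flags are deliberately left out, since they depend on Day-0 answers.

```python
def aks_create_skeleton(automatic: bool = True) -> str:
    """Build an `az aks create` command using placeholders instead of real values."""
    args = [
        "az", "aks", "create",
        "--resource-group", "<resource-group>",
        "--name", "<cluster-name>",
        "--location", "<region>",
    ]
    if automatic:
        # AKS Automatic preconfigures most of the baseline.
        args += ["--sku", "automatic"]
    else:
        # AKS Standard: opt in to the defaults from the decision framework.
        args += [
            "--network-plugin", "azure",
            "--network-plugin-mode", "overlay",
            "--enable-oidc-issuer",
            "--enable-workload-identity",
            "--enable-addons", "azure-keyvault-secrets-provider",
        ]
    return " ".join(args)


automatic_cmd = aks_create_skeleton()
standard_cmd = aks_create_skeleton(automatic=False)
```

The output contains no subscription IDs, keys, or tokens; the user substitutes the angle-bracket placeholders and runs the command with their own credentials.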

---

## Quality Bar
A high-quality answer:
- flags Day-0 irreversible choices (networking, API server access),
- includes identity/secrets/policy defaults (Workload ID + Key Vault CSI + safeguards),
- includes observability baseline,
- includes upgrade/patch plan,
- includes cost visibility.

---

## References (Public)
- AKS Automatic overview: https://learn.microsoft.com/azure/aks/intro-aks-automatic
- Azure CNI Overlay (setup and parameters): https://learn.microsoft.com/azure/aks/azure-cni-overlay
- Azure CNI powered by Cilium: https://learn.microsoft.com/azure/aks/azure-cni-powered-by-cilium
- Microsoft Entra Workload ID on AKS: https://learn.microsoft.com/azure/aks/workload-identity-overview
- Key Vault provider for Secrets Store CSI Driver: https://learn.microsoft.com/azure/aks/csi-secrets-store-driver
- Deployment Safeguards: https://learn.microsoft.com/azure/aks/deployment-safeguards
- Enable AKS monitoring (Prometheus + Grafana + logs): https://learn.microsoft.com/azure/azure-monitor/containers/kubernetes-monitoring-enable
- Azure Monitor managed Prometheus overview: https://learn.microsoft.com/azure/azure-monitor/metrics/prometheus-metrics-overview
- AKS patch & upgrade practices (Day-2 guidance): https://learn.microsoft.com/azure/architecture/operator-guides/aks/aks-upgrade-practices
- Node OS auto-upgrade channels: https://learn.microsoft.com/azure/aks/auto-upgrade-node-os-image
- Cluster auto-upgrade channels: https://learn.microsoft.com/azure/aks/auto-upgrade-cluster
- AKS cost analysis (OpenCost-based): https://learn.microsoft.com/azure/aks/cost-analysis
3 changes: 2 additions & 1 deletion tests/skills.json
@@ -8,6 +8,7 @@
"azure-deploy",
"azure-diagnostics",
"azure-hosted-copilot-sdk",
"azure-kubernetes",
"azure-kusto",
"azure-messaging",
"azure-observability",
@@ -23,6 +24,6 @@
"integrationTestSchedule": {
"0 5 * * *": "microsoft-foundry",
"0 8 * * *": "azure-deploy",
"0 12 * * *": "appinsights-instrumentation,azure-ai,azure-aigateway,azure-compliance,azure-cost-optimization,azure-diagnostics,azure-hosted-copilot-sdk,azure-kusto,azure-messaging,azure-observability,azure-prepare,azure-rbac,azure-resource-lookup,azure-resource-visualizer,azure-storage,azure-validate,entra-app-registration"
"0 12 * * *": "appinsights-instrumentation,azure-ai,azure-aigateway,azure-compliance,azure-cost-optimization,azure-diagnostics,azure-hosted-copilot-sdk,azure-kubernetes,azure-kusto,azure-messaging,azure-observability,azure-prepare,azure-rbac,azure-resource-lookup,azure-resource-visualizer,azure-storage,azure-validate,entra-app-registration"
}
}
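
Since this change adds the skill in two places (the skill list and the schedule), a small consistency check could catch one being updated without the other. The sketch below is an assumption-laden illustration: the `skills` key name is hypothetical (the list's key is not visible in this diff), and it assumes every listed skill is expected to appear in some schedule entry.

```python
def unscheduled_skills(config: dict) -> list[str]:
    """Return listed skills that do not appear in any integrationTestSchedule entry."""
    scheduled: set[str] = set()
    for names in config["integrationTestSchedule"].values():
        # Entries are comma-separated; strip to tolerate stray spaces.
        scheduled.update(name.strip() for name in names.split(","))
    return sorted(s for s in config["skills"] if s not in scheduled)


# Trimmed-down sample in the same shape as tests/skills.json ("skills" key assumed).
sample = {
    "skills": ["azure-deploy", "azure-kubernetes", "azure-kusto"],
    "integrationTestSchedule": {
        "0 8 * * *": "azure-deploy",
        "0 12 * * *": "azure-kusto",
    },
}
missing = unscheduled_skills(sample)
```

With the sample above, `missing` contains `azure-kubernetes`; adding it to the `0 12 * * *` entry, as this diff does, would make the list empty.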