Changes from 25 commits
31 commits
edd2004
Create AKS folder and SKILL.md
julia-yin Feb 25, 2026
a4eab8e
Add azure-kubernetes to skill.json
julia-yin Feb 25, 2026
2cf0363
Update skills.json
julia-yin Feb 25, 2026
da83ce2
Merge branch 'main' into main
julia-yin Feb 25, 2026
59186b0
Fix issue of postgres skill missing from skills.json
julia-yin Feb 25, 2026
ac9301a
Fix skills.json
julia-yin Feb 25, 2026
f24eb8e
Add AKS to architecture.md and testing for AKS skill
julia-yin Feb 27, 2026
16c29c8
Update plugin/skills/azure-kubernetes/SKILL.md
julia-yin Feb 28, 2026
9dc9578
Update SKILL.md
julia-yin Feb 28, 2026
278d7a0
Merge branch 'main' of https://github.com/julia-yin/GitHub-Copilot-fo…
julia-yin Feb 28, 2026
6e2ab85
Merge branch 'main' into main
julia-yin Feb 28, 2026
3f6e3a6
Remove trailing empty lines
julia-yin Feb 28, 2026
3428b30
Merge branch 'main' of https://github.com/julia-yin/GitHub-Copilot-fo…
julia-yin Feb 28, 2026
35c636c
Add AKS to integration test schedule
julia-yin Feb 28, 2026
4995afd
Fix pr.yaml creating leading space
julia-yin Feb 28, 2026
2f940d5
Update SKILL.md
julia-yin Feb 28, 2026
1a92efd
Update triggers.test.ts.snap
julia-yin Feb 28, 2026
d58b49b
Add in missing best practices (ephemeral disk, auto upgrades, reliabi…
julia-yin Mar 2, 2026
fc92679
Add security best practices
julia-yin Mar 2, 2026
f63b19d
Merge branch 'main' into main
julia-yin Mar 2, 2026
b47bed8
Streamline and reduce token count
julia-yin Mar 2, 2026
a2acfc2
Add azure-kubernetes to skills.json
julia-yin Mar 2, 2026
1b8e483
Fix naming issues
julia-yin Mar 2, 2026
2b11b8c
Update trigger and unit tests
julia-yin Mar 2, 2026
4a0a598
Bump azure-prepare version to 1.0.1
julia-yin Mar 3, 2026
7dc8f22
Fix metadata.version
julia-yin Mar 3, 2026
0862041
Add metadata to azure-kubernetes skill
julia-yin Mar 3, 2026
77758cd
Merge branch 'main' into main
julia-yin Mar 3, 2026
a74df39
Apply suggestion from @Copilot
julia-yin Mar 3, 2026
9ac4bda
Apply suggestion from @Copilot
julia-yin Mar 3, 2026
e7e5c15
Bump azure-prepare skill version
julia-yin Mar 4, 2026
1 change: 1 addition & 0 deletions .github/workflows/pr.yml
@@ -223,6 +223,7 @@ jobs:
SKILL_NAME=$(basename "$(dirname "$file")")
SKILLS+=("$SKILL_NAME")
done
SKILLS="${SKILLS# }"
Copilot AI Mar 2, 2026

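SKILLS is a bash array here, so SKILLS="${SKILLS# }" reads and rewrites only its first element (an unsubscripted array reference is equivalent to index 0). The line therefore trims a leading space from the first skill name at most and does nothing for the other entries passed to npm run frontmatter -- "${SKILLS[@]}". Remove this line, or (if you intended to trim) strip the space from each element, or from a separate scalar variable, and keep SKILLS as an array.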

Suggested change
SKILLS="${SKILLS# }"

Copilot uses AI. Check for mistakes.
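
A quick local check can show what this assignment does to the array before deciding how to fix it. The sketch below assumes the step runs under bash and uses made-up skill names; in bash, an unsubscripted reference to an array reads and writes element 0 only.

```bash
#!/usr/bin/env bash
# Illustrative skill names; the first one carries a stray leading space.
SKILLS=(" azure-kubernetes" "azure-prepare" "azure-deploy")

# Unsubscripted, this touches element 0 only: it reads ${SKILLS[0]},
# strips one leading space, and writes the result back to SKILLS[0].
SKILLS="${SKILLS# }"

echo "count: ${#SKILLS[@]}"
printf '[%s]\n' "${SKILLS[@]}"

# To trim a leading space from every element instead, loop over the indices.
for i in "${!SKILLS[@]}"; do
  SKILLS[i]="${SKILLS[i]# }"
done
```

The array keeps all of its elements, so the line is mostly a no-op; if the stray space comes from how the list is built, fixing it there is likely cleaner than trimming afterwards.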
# Run frontmatter spec validation
if OUTPUT=$(npm run frontmatter -- "${SKILLS[@]}" 2>&1); then
124 changes: 124 additions & 0 deletions plugin/skills/azure-kubernetes/SKILL.md
@@ -0,0 +1,124 @@
---
name: azure-kubernetes
description: "Plan and create production-ready Azure Kubernetes Service (AKS) clusters. Covers Day-0 decisions and Day-1 configuration, cluster SKUs (Automatic vs Standard), security, monitoring, reliability/performance best practices, upgrades, and networking. WHEN: create AKS cluster, plan AKS configuration, design AKS networking, AKS Automatic vs Standard, AKS security, AKS upgrade strategy, AKS autoscaling, AKS monitoring setup, AKS cost analysis, Day-0 checklist."
---
Comment on lines +1 to +7
Copilot AI Mar 3, 2026

The skill description/WHEN list includes many generic words (e.g., “plan”, “create”, “best”, “practices”, “deploy”). In this repo’s trigger tests, TriggerMatcher adds every description word >3 chars as a keyword and triggers on >=2 matches, which increases the chance of false positives (e.g., unrelated prompts containing “create” + “deploy” + “container”). Consider tightening the description/WHEN phrases to be more AKS-specific so keyword extraction stays discriminative.
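
The false-positive risk described here can be reproduced with a rough stand-in for the matching rule. The sketch below is not the repository's TriggerMatcher: `extract_keywords` and `count_matches` are hypothetical helpers that only mimic the rule as stated in this comment (words longer than 3 characters become keywords, 2 or more matches trigger), and the description and prompt are shortened examples.

```bash
#!/usr/bin/env bash
# Hypothetical re-creation of the rule described in the comment,
# NOT the repository's TriggerMatcher implementation.

# Lowercase the text, split it into words, keep words longer than 3 characters,
# and emit them sorted and de-duplicated (one per line).
extract_keywords() {
  tr '[:upper:]' '[:lower:]' <<<"$1" | tr -cs 'a-z0-9-' '\n' | awk 'length($0) > 3' | sort -u
}

# Count how many description keywords also appear in the prompt.
count_matches() {
  comm -12 <(extract_keywords "$1") <(extract_keywords "$2") | wc -l | tr -d ' '
}

DESCRIPTION="Plan and create production-ready AKS clusters. WHEN: create AKS cluster, AKS upgrade strategy."
UNRELATED_PROMPT="Create a container image and plan the upgrade of my laptop"

MATCHES=$(count_matches "$DESCRIPTION" "$UNRELATED_PROMPT")
echo "matches: $MATCHES"
if [ "$MATCHES" -ge 2 ]; then
  echo "would trigger on an unrelated prompt"
fi
```

Under this rule the generic words alone ("create", "plan", "upgrade") clear the threshold, which is why phrasing the description around AKS-specific terms keeps the extracted keywords discriminative.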


# Azure Kubernetes Service

> **AUTHORITATIVE GUIDANCE — MANDATORY COMPLIANCE**
>
> This skill produces a **recommended AKS cluster configuration** based on user requirements, distinguishing **Day-0 decisions** (networking, API server — hard to change later) from **Day-1 features** (can enable post-creation). See [CLI reference](./references/cli-reference.md) for commands.

## Quick Reference
| Property | Value |
|----------|-------|
| Best for | AKS cluster planning and Day-0 decisions |
| MCP Tools | `mcp_azure_mcp_aks`, `mcp_aks_mcp_az_aks_operations` |
Copilot AI Mar 3, 2026

The MCP tool name mcp_aks_mcp_az_aks_operations doesn’t match the mcp_azure_mcp_* naming used elsewhere in this repo, and it isn’t referenced anywhere else. If this is meant to be an Azure MCP tool, rename it to the correct tool identifier (or remove it) so the skill doesn’t instruct agents to call a non-existent tool.

Suggested change
| MCP Tools | `mcp_azure_mcp_aks`, `mcp_aks_mcp_az_aks_operations` |
| MCP Tools | `mcp_azure_mcp_aks`, `mcp_azure_mcp_az_aks_operations` |

| CLI | `az aks create`, `az aks show` |
| Related skills | azure-diagnostics (troubleshooting), azure-deploy (app deployment) |
Comment on lines +18 to +21
Copilot AI Mar 3, 2026

The MCP tool names in Quick Reference (mcp_azure_mcp_aks, mcp_aks_mcp_az_aks_operations) don’t match the MCP tool naming used elsewhere in this repo (e.g., aks is referenced as the dedicated AKS MCP tool in azure-resource-lookup). These identifiers also aren’t referenced anywhere else in the repo, so they’re likely incorrect. Please align this row with the actual tool name(s) used by the AKS MCP server (or remove the row if tool support isn’t available).


## When to Use This Skill
Activate this skill when the user wants to:
- Create a new AKS cluster
- Plan AKS cluster configuration for production workloads
- Design AKS networking (API server access, pod IP model, egress)
Comment on lines +15 to +27
Copilot AI Mar 2, 2026

This SKILL.md doesn’t follow the repository’s Skill File Authoring Guidelines required section structure (Quick Reference, When to Use This Skill, MCP Tools, Workflow/Steps, Error Handling). Please restructure the document to include those sections/tables so it’s consistent with other plugin skills and easier to scan.

- Set up AKS identity and secrets management
- Configure AKS governance (Azure Policy, Deployment Safeguards)
- Enable AKS observability (monitoring, Prometheus, Grafana)
- Define AKS upgrade and patching strategy
- Enable AKS cost visibility and analysis
- Understand AKS Automatic vs Standard SKU differences
- Get a Day-0 checklist for AKS cluster setup and configuration

## Rules
1. Start with the user's requirements for provisioning compute, networking, security, and other settings.
2. Use the AKS MCP server for invoking Azure API and kubectl commands when applicable during the cluster setup and operations processes.
Copilot AI Feb 25, 2026

Rule 2 refers to an "AKS MCP server", but repo MCP config only defines a generic azure MCP server (plugin/.mcp.json). This will lead agents to look for a non-existent server; please update the rule to reference the Azure MCP server and the relevant AKS-related MCP tools (or CLI) explicitly.

Suggested change
2. Use the AKS MCP server for invoking Azure API and kubectl commands when applicable during the cluster setup and operations processes.
2. Use the `azure` MCP server and its AKS-related MCP tools to invoke Azure APIs and perform AKS and kubectl operations whenever possible during cluster setup and ongoing operations; if required functionality is not available via MCP tools, fall back to Azure CLI and kubectl commands.

3. Determine if AKS Automatic or Standard SKU is more appropriate based on the user's need for control vs convenience. Default to AKS Automatic unless specific customizations are required.
4. Document decisions and rationale for cluster configuration choices, especially for Day-0 decisions that are hard to change later (networking, API server access).

Comment on lines +15 to +41
Copilot AI Mar 3, 2026

Missing required MCP Tools section: the repo’s skill authoring guidelines require an explicit “MCP Tools” section with a table of commands/parameters (not just listing tool names in Quick Reference). See .github/instructions/skill-files.instructions.md (Required Sections #3).


## Required Inputs (Ask only what’s needed)
If the user is unsure, use safe defaults.
- Cluster environment: dev/test or production
- Region(s), availability zones, preferred node VM sizes
- Expected scale (node/cluster count, workload size)
- Networking requirements (API server access, pod IP model, ingress/egress control)
- Security and identity requirements, including image registry
- Upgrade and observability preferences
- Cost constraints

## Workflow

### 1. Cluster Type
- **AKS Automatic** (default): Best for most production workloads; provides a curated experience with preconfigured best practices for security, reliability, and performance. Use it unless you have specific custom requirements for networking, autoscaling, or node pool configuration that Node Auto Provisioning (NAP) does not support.
- **AKS Standard**: Use if you need full control over cluster configuration; expect additional overhead to set up and manage.

### 2. Networking (Pod IP, Egress, Ingress, Dataplane)

**Pod IP Model** (Key Day-0 decision):
- **Azure CNI Overlay** (recommended): pod IPs come from a private overlay range and are not VNet-routable; scales to large clusters and suits most workloads
- **Azure CNI (VNet-routable)**: pod IPs directly from VNet (pod subnet or node subnet), use when pods must be directly addressable from VNet or on-prem
- Docs: https://learn.microsoft.com/azure/aks/azure-cni-overlay

**Dataplane & Network Policy**:
- **Azure CNI powered by Cilium** (recommended): eBPF-based for high-performance packet processing, network policies, and observability

**Egress**:
- **Static Egress Gateway** for stable, predictable outbound IPs
- For restricted egress: UDR + Azure Firewall or NVA

**Ingress**:
- **App Routing addon with Gateway API** — recommended default for HTTP/HTTPS workloads
- **Istio service mesh with Gateway API** — for advanced traffic management, mTLS, canary deployments
- **Application Gateway for Containers** — for L7 load balancing with WAF integration

**DNS**:
- Enable **LocalDNS** on all node pools for reliable, performant DNS resolution
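
Because the pod IP model and dataplane are Day-0 choices, it can help to see them together as create-time flags. The sketch below is a dry run that only prints an `az aks create` command; `CLUSTER` and `RG` are placeholders, and the flags should be checked against the installed Azure CLI version before use.

```bash
#!/usr/bin/env bash
# Placeholders; replace before running the printed command.
CLUSTER="CLUSTER"
RG="RG"

# Day-0 networking choices from this section as create-time flags:
# Azure CNI Overlay for pod IPs and the Cilium dataplane for network policy.
NETWORK_ARGS=(
  --network-plugin azure
  --network-plugin-mode overlay
  --network-dataplane cilium
)

# Dry run: print the command for review instead of creating anything.
echo az aks create --name "$CLUSTER" --resource-group "$RG" "${NETWORK_ARGS[@]}"
```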

### 3. Security
- Use **Microsoft Entra ID** everywhere (control plane, Workload Identity for pods, node access). Avoid static credentials.
- Azure Key Vault via **Secrets Store CSI Driver** for secrets
- Enable **Azure Policy** + **Deployment Safeguards**
- Enable **Encryption at rest** for etcd/API server; **in-transit** for node-to-node
- Allow only signed, policy-approved images (Azure Policy + Ratify), prefer **Azure Container Registry**
- **Isolation**: Use namespaces, network policies, scoped logging

### 4. Observability
- Use Azure Monitor and Container Insights to enable AKS monitoring (logs + Prometheus + Grafana).

### 5. Upgrades & Patching
- Configure **Maintenance Windows** for controlled upgrade timing
- Enable **auto-upgrades** for cluster and node OS to stay up-to-date with security patches and Kubernetes versions
- Consider **LTS versions** for enterprise stability (2-year support) by upgrading your cluster to the AKS Premium tier
- **Multi-cluster upgrades**: Use **AKS Fleet Manager** for staged rollout across test → production clusters
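
As a rough illustration of the first two bullets, auto-upgrade channels and a weekly maintenance window could be set as follows. This is a dry run that prints the commands; the channel values, the Sunday 02:00 window, and the 4-hour duration are illustrative assumptions, not recommendations from the skill.

```bash
#!/usr/bin/env bash
# Placeholders; replace before running the printed commands.
CLUSTER="CLUSTER"
RG="RG"

# Auto-upgrade channels for the Kubernetes version and the node OS image.
UPGRADE_ARGS=(aks update --name "$CLUSTER" --resource-group "$RG"
  --auto-upgrade-channel stable
  --node-os-upgrade-channel NodeImage)

# Weekly maintenance window that bounds when cluster auto-upgrades may run.
WINDOW_ARGS=(aks maintenanceconfiguration add --cluster-name "$CLUSTER" --resource-group "$RG"
  --name aksManagedAutoUpgradeSchedule
  --schedule-type Weekly --day-of-week Sunday --interval-weeks 1
  --start-time 02:00 --duration 4)

# Dry run: print the commands for review. Drop the `echo` to execute them.
echo az "${UPGRADE_ARGS[@]}"
echo az "${WINDOW_ARGS[@]}"
```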

### 6. Performance
- Use **Ephemeral OS disks** (`--node-osdisk-type Ephemeral`) for faster node startup
- Select **Azure Linux** as node OS (smaller footprint, faster boot)
- Enable **KEDA** for event-driven autoscaling beyond HPA

### 7. Node Pools & Compute
- **Dedicated system node pool**: At least 2 nodes, tainted for system workloads only (`CriticalAddonsOnly`)
- Enable **Node Auto Provisioning (NAP)** on all pools for cost savings and responsive scaling
- Use **latest generation SKUs (v5/v6)** for host-level optimizations
- **Avoid B-series VMs** — burstable SKUs cause performance/reliability issues
- Use SKUs with **at least 4 vCPUs** for production workloads
- Set **topology spread constraints** to distribute pods across hosts/zones per SLO
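
A dedicated system node pool along these lines might be added as sketched below. The pool name `systempool` and the VM size are illustrative assumptions (the size is one example of a 4-vCPU v5 SKU), and the command is only printed so it can be reviewed first.

```bash
#!/usr/bin/env bash
# Placeholders; replace before running the printed command.
CLUSTER="CLUSTER"
RG="RG"
# Illustrative 4-vCPU v5 size; pick one that is available in your region.
VM_SIZE="Standard_D4ds_v5"

# System pool spread over three zones and tainted so that only critical
# add-ons schedule onto it; Ephemeral OS disk and Azure Linux follow the
# Performance section.
SYSTEM_POOL_ARGS=(aks nodepool add --cluster-name "$CLUSTER" --resource-group "$RG"
  --name systempool --mode System --node-count 3 --zones 1 2 3
  --node-vm-size "$VM_SIZE"
  --node-taints CriticalAddonsOnly=true:NoSchedule
  --node-osdisk-type Ephemeral --os-sku AzureLinux)

# Dry run: print the command for review. Drop the `echo` to execute it.
echo az "${SYSTEM_POOL_ARGS[@]}"
```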

### 8. Reliability
- Deploy across **3 Availability Zones** (`--zones 1 2 3`)
- Use **Standard tier** for zone-redundant control plane + 99.95% SLA for API server availability
- Enable **Microsoft Defender for Containers** for runtime protection
- Configure **PodDisruptionBudgets** for all production workloads
- Use **topology spread constraints** to ensure pod distribution across failure domains
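
For the PodDisruptionBudget bullet, one way to generate a starting manifest is `kubectl create poddisruptionbudget` with a client-side dry run. The workload label `app=web`, the name `web-pdb`, and the minimum of 2 available pods are hypothetical values chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical workload label and budget name.
APP_LABEL="app=web"
PDB_NAME="web-pdb"

# Keep at least 2 matching pods available during voluntary disruptions
# such as node drains in an upgrade.
PDB_ARGS=(create poddisruptionbudget "$PDB_NAME"
  "--selector=$APP_LABEL"
  --min-available=2
  --dry-run=client -o yaml)

# Print the command for review; running it without `echo` emits YAML only,
# because --dry-run=client does not change the cluster.
echo kubectl "${PDB_ARGS[@]}"
```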

### 9. Cost Controls
- Use **Spot node pools** for batch/interruptible workloads (up to 90% savings)
- **Stop/Start** dev/test clusters: `az aks stop` / `az aks start`
- Consider **Reserved Instances** or **Savings Plans** for steady-state workloads
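
The first two cost levers can be sketched as commands. The pool name `spotpool` and the node count are assumptions, `--spot-max-price -1` means "pay up to the on-demand price", and everything is printed rather than executed.

```bash
#!/usr/bin/env bash
# Placeholders; replace before running the printed commands.
CLUSTER="CLUSTER"
RG="RG"

# Spot user pool for batch or interruptible workloads; evicted nodes are deleted.
SPOT_POOL_ARGS=(aks nodepool add --cluster-name "$CLUSTER" --resource-group "$RG"
  --name spotpool --priority Spot --eviction-policy Delete
  --spot-max-price -1 --node-count 1)

# Dry run: print the commands for review. Drop the `echo` to execute them.
echo az "${SPOT_POOL_ARGS[@]}"

# Stop a dev/test cluster outside working hours and start it again later.
echo az aks stop --name "$CLUSTER" --resource-group "$RG"
echo az aks start --name "$CLUSTER" --resource-group "$RG"
```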

## Guardrails / Safety
- Do not request or output secrets (tokens, keys, subscription IDs).
- If requirements are ambiguous for Day-0 critical decisions, ask the user clarifying questions. For Day-1 features, propose 2–3 safe options with tradeoffs and choose a conservative default.
- Do not promise zero downtime; advise workload safeguards (PDBs, probes, replicas) and staged upgrades along with best practices for reliability and performance.
- If user asks for actions that require privileged access, provide a plan and commands with placeholders.
33 changes: 33 additions & 0 deletions plugin/skills/azure-kubernetes/references/cli-reference.md
@@ -0,0 +1,33 @@
# CLI Reference for AKS

```bash
# List AKS clusters
az aks list --output table

# Show cluster details
az aks show --name CLUSTER --resource-group RG

# Get available Kubernetes versions
az aks get-versions --location LOCATION --output table

# Create AKS Automatic cluster
az aks create --name CLUSTER --resource-group RG --sku automatic \
  --network-plugin azure --network-plugin-mode overlay \
  --enable-oidc-issuer --enable-workload-identity

# Create AKS Standard cluster
az aks create --name CLUSTER --resource-group RG \
  --node-count 3 --zones 1 2 3 \
  --network-plugin azure --network-plugin-mode overlay \
  --enable-cluster-autoscaler --min-count 1 --max-count 10

# Get credentials
az aks get-credentials --name CLUSTER --resource-group RG

# List node pools
az aks nodepool list --cluster-name CLUSTER --resource-group RG --output table

# Enable monitoring
az aks enable-addons --name CLUSTER --resource-group RG \
  --addons monitoring --workspace-resource-id WORKSPACE_ID
```
1 change: 1 addition & 0 deletions plugin/skills/azure-prepare/SKILL.md
@@ -1,5 +1,6 @@
---
name: azure-prepare
version: 1.0.1
description: "Default entry point for Azure application development EXCEPT cross-cloud migration — use azure-cloud-migrate instead. Analyzes your project and prepares it for Azure deployment by generating infrastructure code (Bicep/Terraform), azure.yaml, and Dockerfiles. WHEN: \"create an app\", \"build a web app\", \"create API\", \"create frontend\", \"create backend\", \"add a feature\", \"build a service\", \"develop a project\", \"modernize my code\", \"update my application\", \"add database\", \"add authentication\", \"add caching\", \"deploy to Azure\", \"host on Azure\", \"Azure with terraform\", \"Azure with azd\", \"generate azure.yaml\", \"generate Bicep\", \"generate Terraform\", \"create Azure Functions app\", \"create serverless HTTP API\", \"create function app\", \"create event-driven function\", \"create and deploy to Azure\", \"create Azure Functions and deploy\", \"create function app and deploy\"."
---

38 changes: 33 additions & 5 deletions plugin/skills/azure-prepare/references/architecture.md
@@ -21,18 +21,46 @@ Select hosting stack and map components to Azure services.
| Long-running processes | ✓✓ | | ✓ |
| Minimal ops overhead | | ✓✓ | ✓ |

### Container Hosting: Container Apps vs AKS

| Factor | Container Apps | AKS |
|--------|:--------------:|:---:|
| **Scale to zero** | ✓✓ | |
| **Kubernetes API access** | | ✓✓ |
| **Custom operators/CRDs** | | ✓✓ |
| **Service mesh** | Dapr (built-in) | Istio, Cilium |
| **GPU workloads** | | ✓✓ |
| **Best for** | Microservices, event-driven | Full K8s control, complex workloads |

#### When to Use Container Apps
- Microservices without Kubernetes complexity
- Event-driven workloads (KEDA built-in)
- Need scale-to-zero for cost optimization
- Teams without Kubernetes expertise

#### When to Use AKS
- Need Kubernetes API/kubectl access
- Require custom operators or CRDs
- Service mesh requirements (Istio, Linkerd)
- GPU/ML workloads
- Complex networking or multi-tenant architectures

> **AKS Planning:** For AKS SKU selection (Automatic vs Standard), networking, identity, scaling, and security configuration, invoke the **azure-kubernetes** skill.

## Service Mapping

### Hosting

| Component Type | Primary Service | Alternatives |
|----------------|-----------------|--------------|
| SPA Frontend | Static Web Apps | Blob + CDN |
| SSR Web App | Container Apps | App Service |
| REST/GraphQL API | Container Apps | App Service, Functions |
| Background Worker | Container Apps | Functions |
| Scheduled Task | Functions (Timer) | Container Apps Jobs |
| Event Processor | Functions | Container Apps |
| SSR Web App | Container Apps | App Service, AKS |
| REST/GraphQL API | Container Apps | App Service, Functions, AKS |
| Background Worker | Container Apps | Functions, AKS |
| Scheduled Task | Functions (Timer) | Container Apps Jobs, AKS CronJob |
| Event Processor | Functions | Container Apps, AKS + KEDA |
| Microservices (full K8s) | AKS | Container Apps |
| GPU/ML Workloads | AKS | Azure ML |

### Data

103 changes: 103 additions & 0 deletions tests/azure-kubernetes/__snapshots__/triggers.test.ts.snap
@@ -0,0 +1,103 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`azure-kubernetes - Trigger Tests Trigger Keywords Snapshot skill description triggers match snapshot 1`] = `
{
"description": "Plan and create production-ready Azure Kubernetes Service (AKS) clusters. Covers Day-0 decisions and Day-1 configuration, cluster SKUs (Automatic vs Standard), security, monitoring, reliability/performance best practices, upgrades, and networking. WHEN: create AKS cluster, plan AKS configuration, design AKS networking, AKS Automatic vs Standard, AKS security, AKS upgrade strategy, AKS autoscaling, AKS monitoring setup, AKS cost analysis, Day-0 checklist.",
"extractedKeywords": [
"aks",
"analysis",
"automatic",
"autoscaling",
"azure",
"best",
"checklist",
"cli",
"cluster",
"clusters",
"configuration",
"container",
"cost",
"covers",
"create",
"day-0",
"day-1",
"decisions",
"deploy",
"design",
"diagnostic",
"entra",
"identity",
"key vault",
"kubernetes",
"mcp",
"monitor",
"monitoring",
"networking",
"observability",
"performance",
"plan",
"practices",
"production-ready",
"reliability",
"security",
"service",
"setup",
"skus",
"standard",
"strategy",
"upgrade",
"upgrades",
"when",
],
"name": "azure-kubernetes",
}
`;

exports[`azure-kubernetes - Trigger Tests Trigger Keywords Snapshot skill keywords match snapshot 1`] = `
[
"aks",
"analysis",
"automatic",
"autoscaling",
"azure",
"best",
"checklist",
"cli",
"cluster",
"clusters",
"configuration",
"container",
"cost",
"covers",
"create",
"day-0",
"day-1",
"decisions",
"deploy",
"design",
"diagnostic",
"entra",
"identity",
"key vault",
"kubernetes",
"mcp",
"monitor",
"monitoring",
"networking",
"observability",
"performance",
"plan",
"practices",
"production-ready",
"reliability",
"security",
"service",
"setup",
"skus",
"standard",
"strategy",
"upgrade",
"upgrades",
"when",
]
`;