diff --git a/.claude/skills/dev-cluster/SKILL.md b/.claude/skills/dev-cluster/SKILL.md index 157b2f834..26e7fd71a 100644 --- a/.claude/skills/dev-cluster/SKILL.md +++ b/.claude/skills/dev-cluster/SKILL.md @@ -1,658 +1,263 @@ --- name: dev-cluster -description: Manages Ambient Code Platform development clusters (kind/minikube) for testing changes +description: Manages Ambient Code Platform development clusters (kind/minikube) for testing changes. Handles cluster lifecycle, image builds, port forwarding with zombie cleanup, and deployment verification. --- -# Development Cluster Management Skill +# Development Cluster Management -You are an expert **Ambient Code Platform (ACP) DevOps Specialist**. Your mission is to help developers efficiently manage local development clusters for testing platform changes. +Manage local Kubernetes clusters for testing Ambient Code Platform changes. -## Your Role +## Components -Help developers test their code changes in local Kubernetes clusters (kind or minikube) by: -1. Understanding what components have changed -2. Determining which images need to be rebuilt -3. Managing cluster lifecycle (create, update, teardown) -4. 
Verifying deployments and troubleshooting issues +| Component | Location | Image | Deployment | +|-----------|----------|-------|------------| +| Backend | `components/backend` | `vteam_backend:latest` | `backend-api` | +| Frontend | `components/frontend` | `vteam_frontend:latest` | `frontend` | +| Operator | `components/operator` | `vteam_operator:latest` | `agentic-operator` | +| Runner | `components/runners/ambient-runner` | `vteam_claude_runner:latest` | (Job pods) | +| State Sync | `components/runners/state-sync` | `vteam_state_sync:latest` | (Job pods) | +| Public API | `components/public-api` | `vteam_public_api:latest` | `public-api` | -## Platform Architecture Understanding +## Port Forwarding Management -The Ambient Code Platform consists of these containerized components: +Port forwarding is the #1 source of dev-cluster pain. **Always use the manager script** — never run `kubectl port-forward` directly. -| Component | Location | Image Name | Purpose | -|-----------|----------|------------|---------| -| **Backend** | `components/backend` | `vteam_backend:latest` | Go API for K8s CRD management | -| **Frontend** | `components/frontend` | `vteam_frontend:latest` | NextJS web interface | -| **Operator** | `components/operator` | `vteam_operator:latest` | Kubernetes operator (Go) | -| **Runner** | `components/runners/claude-code-runner` | `vteam_claude_runner:latest` | Python Claude Code runner | -| **State Sync** | `components/runners/state-sync` | `vteam_state_sync:latest` | S3 persistence service | -| **Public API** | `components/public-api` | `vteam_public_api:latest` | External API gateway | +### The Script -## Development Cluster Options +Located at: `scripts/port-forward-manager.sh` (relative to this skill directory) -### Kind (Recommended) -**Best for:** Quick testing, CI/CD alignment, lightweight clusters - -**Commands:** -- `make kind-up` - Create cluster, deploy with Quay.io images -- `make kind-down` - Destroy cluster -- `make kind-port-forward` - 
Setup port forwarding (if needed) - -**Characteristics:** -- Uses production Quay.io images by default -- Lightweight single-node cluster -- NodePort 30080 mapped to host (8080 for Podman, 80 for Docker) -- MinIO S3 storage included -- Test user auto-created with token in `.env.test` - -**Access:** http://localhost:8080 (or http://localhost with Docker) - -### Minikube (Feature-rich) -**Best for:** Testing with local builds, full feature development - -**Commands:** -- `make local-up` - Create cluster, build and load local images -- `make local-down` - Stop services (keeps cluster) -- `make local-clean` - Destroy cluster -- `make local-rebuild` - Rebuild all components and restart -- `make local-reload-backend` - Rebuild and reload backend only -- `make local-reload-frontend` - Rebuild and reload frontend only -- `make local-reload-operator` - Rebuild and reload operator only -- `make local-status` - Check pod status -- `make local-logs-backend` - Follow backend logs -- `make local-logs-frontend` - Follow frontend logs -- `make local-logs-operator` - Follow operator logs - -**Characteristics:** -- Builds images locally from source -- Uses `localhost/` image prefix -- Includes ingress and storage-provisioner addons -- Authentication disabled (`DISABLE_AUTH=true`) -- Automatic port forwarding on macOS with Podman - -**Access:** http://localhost:3000 (frontend) / http://localhost:8080 (backend) - -## Workflow: Setting Up from a PR - -When a user provides a PR URL or number, follow this process: - -### Step 1: Fetch PR Details -```bash -# Get PR metadata (title, branch, changed files, state) -gh pr view --json title,headRefName,files,state,body -``` - -### Step 2: Checkout the PR Branch ```bash -git fetch origin -git checkout +SCRIPT=".claude/skills/dev-cluster/scripts/port-forward-manager.sh" ``` -### Step 3: Determine Affected Components -Analyze the changed files from the PR to identify which components need rebuilding (see component mapping below). 
Then follow the appropriate cluster workflow (Kind or Minikube). +### Standard Port Assignments -## Detecting the Container Engine +| Service | Local Port | Cluster Service | Used By | +|---------|-----------|-----------------|---------| +| backend | 8081 | backend-service:8080 | MCP servers, API testing | +| public-api | 8082 | public-api-service:8081 | mcp-acp, external clients | +| frontend | 8080 | frontend-service:3000 | Browser (optional — NodePort often sufficient) | -**Before any build step**, detect which container engine is available: +**Kind with Docker**: Frontend is accessible at `http://localhost` via NodePort (30080→80). No port-forward needed for frontend. Only forward backend and public-api. -```bash -# Check which engine is available -if command -v docker &>/dev/null && docker info &>/dev/null 2>&1; then - CONTAINER_ENGINE=docker -elif command -v podman &>/dev/null && podman info &>/dev/null 2>&1; then - CONTAINER_ENGINE=podman -else - echo "ERROR: No container engine available" - exit 1 -fi -``` +**Kind with Podman**: Frontend at `http://localhost:8080` via NodePort. Forward backend and public-api. -**Always pass `CONTAINER_ENGINE=` to make commands:** -```bash -make build-frontend CONTAINER_ENGINE=docker -make build-all CONTAINER_ENGINE=docker -``` +### Mandatory Procedures -## Detecting the Access URL +#### Before Starting Port Forwards -After deployment, **check the actual port mapping** instead of assuming a fixed port: +**Always** run preflight first. 
This kills zombie processes and validates ports: ```bash -# For kind with Docker: check the container's published ports -docker ps --filter "name=ambient-local" --format "{{.Ports}}" -# Example output: 0.0.0.0:80->30080/tcp → access at http://localhost -# Example output: 0.0.0.0:8080->30080/tcp → access at http://localhost:8080 - -# Quick connectivity test -curl -s -o /dev/null -w "%{http_code}" http://localhost:80 +$SCRIPT preflight # Check default services (backend, public-api) +$SCRIPT preflight frontend # Also include frontend ``` -**Port mapping depends on the container engine:** -- **Docker**: host port 80 → http://localhost -- **Podman**: host port 8080 → http://localhost:8080 - -## Workflow: Testing Changes in Kind +Preflight does: +1. Finds and kills ALL `kubectl port-forward` processes for the `ambient-code` namespace +2. Removes stale PID files from `/tmp/ambient-code/port-forward/` +3. Checks each target port with `lsof` — fails if a non-kubectl process holds a port +4. Validates cluster reachability and namespace existence -When a user says something like "test this changeset in kind", follow this process: +#### Starting Port Forwards -### Step 1: Analyze Changes ```bash -# Check what files have changed -git status -git diff --name-only main...HEAD +$SCRIPT start # Start backend + public-api (default) +$SCRIPT start frontend # Start just frontend +$SCRIPT start backend public-api frontend # Start all three ``` -Determine which components are affected: -- Changes in `components/backend/` → backend -- Changes in `components/frontend/` → frontend -- Changes in `components/operator/` → operator -- Changes in `components/runners/claude-code-runner/` → runner -- Changes in `components/runners/state-sync/` → state-sync -- Changes in `components/public-api/` → public-api - -### Step 2: Explain the Plan -Tell the user: -``` -I found changes in: [list of components] - -To test these in kind, I'll: -1. Build the affected images: [list components] -2. 
Push them to a local registry or load into kind -3. Update the kind cluster to use these images -4. Verify the deployment - -Note: By default, kind uses production Quay.io images. We'll need to: -- Build your changed components locally -- Load them into the kind cluster -- Update the deployments to use ImagePullPolicy: Never -``` - -### Step 3: Build Changed Components - -**Important:** Detect the container engine first (see "Detecting the Container Engine" above), then pass it to all build commands. +#### Checking Health ```bash -# Build specific components — always pass CONTAINER_ENGINE -# Build backend (if changed) -make build-backend CONTAINER_ENGINE=$CONTAINER_ENGINE - -# Build frontend (if changed) -make build-frontend CONTAINER_ENGINE=$CONTAINER_ENGINE - -# Build operator (if changed) -make build-operator CONTAINER_ENGINE=$CONTAINER_ENGINE - -# Build runner (if changed) -make build-runner CONTAINER_ENGINE=$CONTAINER_ENGINE - -# Build state-sync (if changed) -make build-state-sync CONTAINER_ENGINE=$CONTAINER_ENGINE - -# Build public-api (if changed) -make build-public-api CONTAINER_ENGINE=$CONTAINER_ENGINE - -# Or build all at once -make build-all CONTAINER_ENGINE=$CONTAINER_ENGINE +$SCRIPT status # Shows PID, port, and HTTP health for each service ``` -### Step 4: Setup/Update Kind Cluster +#### Stopping -**If cluster doesn't exist:** ```bash -# Create kind cluster -make kind-up +$SCRIPT stop # Kills tracked processes + any untracked zombies ``` -**If cluster exists, load new images:** -```bash -# Load images into kind -kind load docker-image localhost/vteam_backend:latest --name ambient-local -kind load docker-image localhost/vteam_frontend:latest --name ambient-local -kind load docker-image localhost/vteam_operator:latest --name ambient-local -# ... 
for each rebuilt component
-```
+#### Full Restart (Stop + Preflight + Start)

-### Step 5: Update Deployments
 ```bash
-# Update deployments to use local images and Never pull policy
-kubectl set image deployment/backend backend=localhost/vteam_backend:latest -n ambient-code
-kubectl set image deployment/frontend frontend=localhost/vteam_frontend:latest -n ambient-code
-kubectl set image deployment/operator operator=localhost/vteam_operator:latest -n ambient-code
-
-# Update image pull policy
-kubectl patch deployment backend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"backend","imagePullPolicy":"Never"}]}}}}'
-kubectl patch deployment frontend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"frontend","imagePullPolicy":"Never"}]}}}}'
-kubectl patch deployment operator -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"operator","imagePullPolicy":"Never"}]}}}}'
-
-# Restart deployments to pick up new images
-kubectl rollout restart deployment/backend -n ambient-code
-kubectl rollout restart deployment/frontend -n ambient-code
-kubectl rollout restart deployment/operator -n ambient-code
+$SCRIPT restart                              # Restart default services
+$SCRIPT restart backend public-api frontend  # Restart all
 ```

-### Step 6: Verify Deployment
-```bash
-# Wait for rollout to complete
-kubectl rollout status deployment/backend -n ambient-code
-kubectl rollout status deployment/frontend -n ambient-code
-kubectl rollout status deployment/operator -n ambient-code
+### When to Run Port Forwarding Operations

-# Check pod status
-kubectl get pods -n ambient-code
+| Event | Action |
+|-------|--------|
+| After `make kind-up` | `$SCRIPT preflight && $SCRIPT start` |
+| After `make kind-down` | `$SCRIPT stop` (also kills zombies from dead cluster) |
+| After reloading a component | `$SCRIPT restart <service>` |
+| Before building/deploying | `$SCRIPT status` (informational) |
+| User reports "connection refused" | `$SCRIPT restart` |
+| Starting a new session | `$SCRIPT status` then `$SCRIPT restart` if unhealthy | -# Check for errors -kubectl get events -n ambient-code --sort-by='.lastTimestamp' +### Never Do This -# Get pod details if issues -kubectl describe pod -l app=backend -n ambient-code -kubectl logs -l app=backend -n ambient-code --tail=50 -``` - -### Step 7: Provide Access Info - -**Detect the actual URL** by checking the kind container's port mapping (see "Detecting the Access URL" above), then provide the correct URL to the user. +- `kubectl port-forward ... &` — creates untracked zombies +- `make kind-port-forward` — launches untracked background processes with `wait` +- `pkill -f "kubectl port-forward"` — use `$SCRIPT stop` instead (it also cleans PID files) +- Assume ports are free without checking -``` -✓ Deployment complete! - -Access the platform at: -- Frontend: -- Test credentials: Check .env.test for the token +## Cluster Lifecycle -To view logs: - kubectl logs -f -l app=backend -n ambient-code - kubectl logs -f -l app=frontend -n ambient-code - kubectl logs -f -l app=operator -n ambient-code +### Kind (Recommended) -To teardown: - make kind-down +```bash +make kind-up # Create cluster + deploy (Quay.io images) +make kind-down # Destroy cluster ``` -## Workflow: Testing Changes in Minikube +Kind with Docker maps NodePort 30080 to host port 80 (frontend accessible at http://localhost). 
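+Whether you got the port-80 (Docker) or port-8080 (Podman) mapping can be checked mechanically instead of guessed. The sketch below parses the engine's published-ports string, as shown in Step 8; the `frontend_url` function name is illustrative, not an existing repo helper:
+
+```bash
+# frontend_url: derive the frontend URL from a container engine's
+# published-ports string, e.g. "0.0.0.0:80->30080/tcp".
+# Illustrative helper, not part of the repo's Makefile tooling.
+frontend_url() {
+  local ports=$1 host_port
+  # Extract the host port that maps to NodePort 30080
+  host_port=$(echo "$ports" | grep -o '[0-9]*->30080/tcp' | cut -d'-' -f1)
+  if [ -z "$host_port" ]; then
+    echo "no 30080 mapping found" >&2
+    return 1
+  fi
+  if [ "$host_port" = "80" ]; then
+    echo "http://localhost"       # Docker default mapping
+  else
+    echo "http://localhost:$host_port"  # e.g. Podman maps 8080
+  fi
+}
+```
+
+Feed it the output of `docker ps --filter "name=ambient-local" --format "{{.Ports}}"` (or the `podman ps` equivalent).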
-When a user wants to test in minikube:
+### Minikube

-### Full Rebuild and Deploy
 ```bash
-cd /workspace/repos/platform
+make local-up              # Create + build + deploy
+make local-down            # Stop (keep cluster)
+make local-clean           # Destroy cluster
+make local-rebuild         # Rebuild all + restart
+make local-reload-backend  # Rebuild/reload one component
+make local-reload-frontend
+make local-reload-operator
+```

-# If cluster doesn't exist, this will create it and build everything
-make local-up
+## Workflow: Testing a PR in Kind

-# If cluster exists and you want to rebuild everything
-make local-rebuild
-```
+### Step 1: Get PR and Checkout

-### Incremental Updates (Faster)
 ```bash
-# Just rebuild and reload specific components
-make local-reload-backend   # If only backend changed
-make local-reload-frontend  # If only frontend changed
-make local-reload-operator  # If only operator changed
+gh pr view <pr-number> --json title,headRefName,files,state,body
+gh pr checkout <pr-number>
 ```

-### Check Status
-```bash
-# Quick status check
-make local-status
+### Step 2: Determine Affected Components

-# Detailed troubleshooting
-make local-troubleshoot
+Map changed files to components:
+- `components/backend/` → backend
+- `components/frontend/` → frontend
+- `components/operator/` → operator
+- `components/runners/ambient-runner/` → runner
+- `components/runners/state-sync/` → state-sync
+- `components/public-api/` → public-api

-# Follow logs
-make local-logs-backend
-make local-logs-frontend
-make local-logs-operator
-```
-
-## Common Tasks
+### Step 3: Detect Container Engine

-### "Bring up a fresh cluster"
 ```bash
-# With kind (uses Quay.io images)
-make kind-up
-
-# With minikube (builds from source)
-make local-up
+if command -v docker &>/dev/null && docker info &>/dev/null 2>&1; then
+  CONTAINER_ENGINE=docker
+elif command -v podman &>/dev/null && podman info &>/dev/null 2>&1; then
+  CONTAINER_ENGINE=podman
+else
+  echo "ERROR: No container engine available" >&2
+  exit 1
+fi
 ```

-### "Rebuild everything and test"
-```bash
-# With minikube
-cd /workspace/repos/platform
-make local-rebuild
-
-# With kind (requires manual steps)
-cd /workspace/repos/platform
-make build-all
-# Then load images and update deployments (see Step 4-5 above)
-```
+Always pass `CONTAINER_ENGINE=<engine>` to make commands.

-### "Just rebuild the backend"
-```bash
-# With minikube
-make local-reload-backend
-
-# With kind
-make build-backend
-kind load docker-image localhost/vteam_backend:latest --name ambient-local
-kubectl set image deployment/backend backend=localhost/vteam_backend:latest -n ambient-code
-kubectl rollout restart deployment/backend -n ambient-code
-kubectl rollout status deployment/backend -n ambient-code
-```
+### Step 4: Create Cluster (if needed)

-### "Show me the logs"
 ```bash
-# With minikube
-make local-logs-backend
-make local-logs-frontend
-make local-logs-operator
-
-# With kind (or minikube, direct kubectl)
-kubectl logs -f -l app=backend -n ambient-code
-kubectl logs -f -l app=frontend -n ambient-code
-kubectl logs -f -l app=operator -n ambient-code
+make kind-up CONTAINER_ENGINE=$CONTAINER_ENGINE
 ```

-### "Tear down the cluster"
-```bash
-# With kind
-make kind-down
-
-# With minikube (keep cluster)
-make local-down
-
-# With minikube (delete cluster)
-make local-clean
-```
+### Step 5: Build Changed Components

-### "Check if cluster is healthy"
 ```bash
-# With minikube
-make local-status
-make local-test-quick
-
-# With kind or any cluster
-kubectl get pods -n ambient-code
-kubectl get events -n ambient-code --sort-by='.lastTimestamp'
-kubectl get deployments -n ambient-code
+make build-backend CONTAINER_ENGINE=$CONTAINER_ENGINE
+make build-public-api CONTAINER_ENGINE=$CONTAINER_ENGINE
+# etc.
— only build what changed ``` -## Troubleshooting - -### Pods stuck in ImagePullBackOff -**Cause:** Cluster trying to pull images from registry but they don't exist or aren't accessible +### Step 6: Load and Deploy -**Solution for kind:** ```bash -# Ensure images are built locally -make build-all - # Load images into kind -kind load docker-image localhost/vteam_backend:latest --name ambient-local -kind load docker-image localhost/vteam_frontend:latest --name ambient-local -kind load docker-image localhost/vteam_operator:latest --name ambient-local +kind load docker-image vteam_backend:latest --name ambient-local +kind load docker-image vteam_public_api:latest --name ambient-local -# Update image pull policy -kubectl patch deployment backend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"backend","imagePullPolicy":"Never"}]}}}}' -``` - -**Solution for minikube:** -```bash -# Minikube should handle this automatically, but if issues persist: -make local-rebuild -``` - -### Pods stuck in CrashLoopBackOff -**Cause:** Application is crashing on startup - -**Solution:** -```bash -# Check logs for the failing pod -kubectl logs -l app=backend -n ambient-code --tail=100 +# Update deployments +kubectl set image deployment/backend-api backend-api=vteam_backend:latest -n ambient-code +kubectl patch deployment backend-api -n ambient-code \ + -p '{"spec":{"template":{"spec":{"containers":[{"name":"backend-api","imagePullPolicy":"Never"}]}}}}' -# Check pod events -kubectl describe pod -l app=backend -n ambient-code +kubectl set image deployment/public-api public-api=vteam_public_api:latest -n ambient-code +kubectl patch deployment public-api -n ambient-code \ + -p '{"spec":{"template":{"spec":{"containers":[{"name":"public-api","imagePullPolicy":"Never"}]}}}}' -# Common issues: -# - Missing environment variables -# - Database connection failures -# - Invalid configuration +# Wait for rollout +kubectl rollout status deployment/backend-api -n ambient-code 
+kubectl rollout status deployment/public-api -n ambient-code ``` -### Port forwarding not working -**Cause:** Port already in use or forwarding process died +### Step 7: Setup Port Forwarding -**Solution for minikube:** ```bash -# Kill existing port-forward processes -pkill -f "kubectl port-forward" - -# Restart port forwarding -make local-up # Will setup port forwarding again -``` - -**Solution for kind:** -```bash -# Check NodePort mapping -kubectl get svc -n ambient-code - -# Manually setup port forwarding if needed -make kind-port-forward +$SCRIPT preflight && $SCRIPT start ``` -### Changes not reflected -**Cause:** Old image cached or deployment not restarted +### Step 8: Verify and Report -**Solution:** ```bash -# Force rebuild -make build-backend # (or whatever component) - -# Reload into cluster -kind load docker-image localhost/vteam_backend:latest --name ambient-local - -# Force restart -kubectl rollout restart deployment/backend -n ambient-code -kubectl rollout status deployment/backend -n ambient-code - -# Verify new pods are running -kubectl get pods -n ambient-code -l app=backend -kubectl describe pod -l app=backend -n ambient-code | grep Image: +kubectl get pods -n ambient-code +$SCRIPT status ``` -## Environment Variables - -Key environment variables that affect cluster behavior: - +Detect the frontend URL: ```bash -# Container runtime (detect automatically — see "Detecting the Container Engine") -CONTAINER_ENGINE=docker # or podman - -# Build platform -PLATFORM=linux/amd64 # or linux/arm64 - -# Namespace -NAMESPACE=ambient-code - -# Registry (for pushing images) -REGISTRY=quay.io/your-org +docker ps --filter "name=ambient-local" --format "{{.Ports}}" +# 0.0.0.0:80->30080/tcp → http://localhost ``` -## Fast Inner-Loop: Run Frontend Locally (No Image Rebuilds) +Report to user: +- Frontend URL (from port mapping) +- Backend: http://localhost:8081 +- Public API: http://localhost:8082 +- Test token: `kubectl get secret test-user-token -n ambient-code 
-o jsonpath='{.data.token}' | base64 -d` + +## Fast Inner-Loop: Frontend Dev Server -For **frontend-only changes**, skip image rebuilds entirely. Run NextJS locally with hot-reload against the backend in the kind cluster: +For frontend-only changes, skip image rebuilds: ```bash -# Terminal 1: port-forward backend from kind cluster -kubectl port-forward svc/backend-service 8081:8080 -n ambient-code +# Port-forward backend +$SCRIPT preflight && $SCRIPT start backend -# Terminal 2: set up frontend with auth token +# Run NextJS locally cd components/frontend -npm install # first time only - -# Create .env.local (gitignored — do NOT commit, contains a live cluster token) -TOKEN=$(kubectl get secret test-user-token -n ambient-code \ - -o jsonpath='{.data.token}' | base64 -d) +npm install +TOKEN=$(kubectl get secret test-user-token -n ambient-code -o jsonpath='{.data.token}' | base64 -d) cat > .env.local < +kubectl logs -l app=backend-api -n ambient-code --tail=100 +kubectl describe pod -l app=backend-api -n ambient-code ``` -**When to use:** -- Frontend-only changes (components, styles, pages, API routes) -- Iterating on UI features rapidly -- Debugging frontend issues - -**When NOT to use:** -- Backend, operator, or runner changes (those still need image rebuild + load) -- Testing changes to container configuration or deployment manifests - -## Best Practices - -1. **Use local dev server for frontend**: Fastest feedback loop, no image rebuilds needed -2. **Use kind for backend/operator validation**: When you need to rebuild non-frontend components -3. **Use minikube for development**: Better tooling for iterative development with `local-reload-*` commands -4. **Always check logs**: After deploying, verify pods started successfully -5. **Clean up when done**: `make kind-down` or `make local-clean` to free resources -6. **Check what changed first**: Use `git status` and `git diff` to understand scope -7. 
**Build only what changed**: Don't rebuild everything if only one component changed -8. **Verify image pull policy**: Ensure deployments use `imagePullPolicy: Never` for local images - -## Quick Reference - -### Decision Tree: Which Cluster Type? - -``` -Do you need to test local code changes? -├─ No → Use kind (make kind-up) -│ Fast, uses production images -│ -└─ Yes → Is the change frontend-only? - ├─ Yes → Run locally with npm run dev - │ Instant hot-reload, no image builds - │ - └─ No → Do you need to iterate frequently? - ├─ No → Use kind with manual image loading - │ Good for one-off tests - │ - └─ Yes → Use minikube (make local-up) - Best for development with hot-reload +### Port Forwarding Not Working +```bash +$SCRIPT status # Diagnose +$SCRIPT restart # Fix ``` -### Cheat Sheet - -| Task | Kind | Minikube | -|------|------|----------| -| Create cluster | `make kind-up` | `make local-up` | -| Rebuild all | Build + load + update | `make local-rebuild` | -| Rebuild backend | Build + load + restart | `make local-reload-backend` | -| Check status | `kubectl get pods -n ambient-code` | `make local-status` | -| View logs | `kubectl logs -f -l app=backend -n ambient-code` | `make local-logs-backend` | -| Tear down | `make kind-down` | `make local-clean` | -| Access URL | Detect from port mapping (Docker: `:80`, Podman: `:8080`) | http://localhost:3000 | - -## When to Invoke This Skill - -Invoke the dev-cluster skill when users say things like: -- "Test this in kind" -- "Deploy my changes to a local cluster" -- "Bring up a development environment" -- "I want to test these changes" -- "Create a kind cluster with my changes" -- "Rebuild and deploy the backend" -- "How do I test platform changes locally?" - -## Example Interactions - -### Example 1: Quick Test in Kind - -User: "Test this changeset in kind" - -Assistant (using dev-cluster skill): -1. Checks git status → finds backend changes -2. Explains: "I see changes in components/backend. 
I'll build the backend image, create a kind cluster, and deploy your changes." -3. Runs: `make build-backend` -4. Runs: `make kind-up` -5. Loads image: `kind load docker-image localhost/vteam_backend:latest --name ambient-local` -6. Updates deployment with local image and ImagePullPolicy: Never -7. Verifies: `kubectl rollout status deployment/backend -n ambient-code` -8. Provides access URL and log commands - -Result: User can test their backend changes at the detected URL (http://localhost for Docker, http://localhost:8080 for Podman) - -### Example 2: Incremental Development with Minikube - -User: "I'm working on the frontend, set me up for quick iterations" - -Assistant (using dev-cluster skill): -1. Runs: `make local-up` (creates cluster, builds all images) -2. Explains the reload commands available -3. User makes frontend changes -4. User says: "Reload the frontend" -5. Runs: `make local-reload-frontend` -6. Verifies deployment -7. User continues iterating with quick reload cycles - -Result: Fast iteration loop for frontend development - -### Example 3: Troubleshooting Deployment - -User: "The backend pod is crash looping" - -Assistant (using dev-cluster skill): -1. Checks pod status: `kubectl get pods -n ambient-code` -2. Gets logs: `kubectl logs -l app=backend -n ambient-code --tail=100` -3. Analyzes error messages -4. Checks events: `kubectl get events -n ambient-code --sort-by='.lastTimestamp'` -5. Identifies issue (e.g., missing env var, bad configuration) -6. Suggests fix -7. 
After fix applied, verifies: `kubectl rollout status deployment/backend -n ambient-code` - -Result: Issue diagnosed and resolved - -## Integration with Makefile - -This skill knows all the relevant Makefile targets in /workspace/repos/platform: - -- `make kind-up` - Create kind cluster -- `make kind-down` - Destroy kind cluster -- `make local-up` - Create minikube cluster with local builds -- `make local-down` - Stop minikube services -- `make local-clean` - Delete minikube cluster -- `make local-rebuild` - Rebuild all and restart -- `make local-reload-backend` - Rebuild/reload backend only -- `make local-reload-frontend` - Rebuild/reload frontend only -- `make local-reload-operator` - Rebuild/reload operator only -- `make build-all` - Build all container images -- `make build-backend` - Build backend image only -- `make build-frontend` - Build frontend image only -- `make build-operator` - Build operator image only -- `make local-status` - Check pod status -- `make local-logs-backend` - Follow backend logs -- `make local-logs-frontend` - Follow frontend logs -- `make local-logs-operator` - Follow operator logs \ No newline at end of file +### Changes Not Reflected +```bash +make build-backend CONTAINER_ENGINE=$CONTAINER_ENGINE +kind load docker-image vteam_backend:latest --name ambient-local +kubectl rollout restart deployment/backend-api -n ambient-code +kubectl rollout status deployment/backend-api -n ambient-code +``` diff --git a/.claude/skills/dev-cluster/scripts/port-forward-manager.sh b/.claude/skills/dev-cluster/scripts/port-forward-manager.sh new file mode 100755 index 000000000..94b596751 --- /dev/null +++ b/.claude/skills/dev-cluster/scripts/port-forward-manager.sh @@ -0,0 +1,293 @@ +#!/usr/bin/env bash +# Port-forward manager for Ambient Code Platform dev clusters. +# Handles preflight validation, clean startup, health checks, and teardown. 
+#
+# Usage:
+#   port-forward-manager.sh preflight             # Validate ports and kill zombies
+#   port-forward-manager.sh start [services...]   # Start port-forwards (default: all)
+#   port-forward-manager.sh stop                  # Stop all port-forwards
+#   port-forward-manager.sh status                # Check health of port-forwards
+#   port-forward-manager.sh restart [services...] # Stop + preflight + start
+#
+# Services: backend, public-api, frontend
+# Default:  backend public-api (frontend uses NodePort on kind)
+
+set -euo pipefail
+
+PID_DIR="/tmp/ambient-code/port-forward"
+NAMESPACE="${NAMESPACE:-ambient-code}"
+LOCK="${PID_DIR}/.lock"
+
+# Service definitions: name -> local_port:svc_name:svc_port
+# Note: associative arrays require bash 4+ (macOS ships bash 3.2 by default).
+declare -A SERVICES=(
+  [backend]="8081:backend-service:8080"
+  [public-api]="8082:public-api-service:8081"
+  [frontend]="8080:frontend-service:3000"
+)
+
+DEFAULT_SERVICES=(backend public-api)
+
+# --- helpers ---
+
+log()  { echo " $*"; }
+ok()   { echo " ✓ $*"; }
+warn() { echo " ⚠ $*" >&2; }
+fail() { echo " ✗ $*" >&2; exit 1; }
+
+acquire_lock() {
+  mkdir -p "$PID_DIR"
+  if ! mkdir "$LOCK" 2>/dev/null; then
+    # Stale lock? Check if holder is alive.
+    local holder
+    holder=$(cat "$LOCK/pid" 2>/dev/null || echo "")
+    if [ -n "$holder" ] && kill -0 "$holder" 2>/dev/null; then
+      fail "Another port-forward-manager is running (PID $holder)"
+    fi
+    rm -rf "$LOCK"
+    mkdir "$LOCK"
+  fi
+  echo $$ > "$LOCK/pid"
+  trap 'rm -rf "$LOCK"' EXIT
+}
+
+port_owner() {
+  # Returns the PID using a port, or empty string.
+  local port=$1
+  lsof -ti "tcp:$port" -sTCP:LISTEN 2>/dev/null | head -1 || true
+}
+
+is_our_process() {
+  # Check if a PID is one of our managed port-forwards.
+  local pid=$1
+  local cmd
+  cmd=$(ps -p "$pid" -o command= 2>/dev/null || true)
+  [[ "$cmd" == *"kubectl port-forward"*"$NAMESPACE"* ]]
+}
+
+pid_file() { echo "$PID_DIR/$1.pid"; }
+
+read_pid() {
+  local f
+  f=$(pid_file "$1")
+  [ -f "$f" ] && cat "$f" || true
+}
+
+is_alive() {
+  local pid=$1
+  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
+}
+
+# --- commands ---
+
+cmd_preflight() {
+  echo "Port-forward preflight check"
+
+  # 1. Kill any zombie kubectl port-forward processes for our namespace
+  local zombies
+  zombies=$(pgrep -f "kubectl port-forward.*${NAMESPACE}" 2>/dev/null || true)
+  if [ -n "$zombies" ]; then
+    local count
+    count=$(echo "$zombies" | wc -l | tr -d ' ')
+    log "Found $count existing port-forward process(es), cleaning up..."
+    echo "$zombies" | while read -r pid; do
+      local cmd
+      cmd=$(ps -p "$pid" -o command= 2>/dev/null || true)
+      kill "$pid" 2>/dev/null && log "Killed PID $pid: $cmd" || true
+    done
+    sleep 0.5
+  fi
+
+  # 2. Clean stale PID files
+  rm -f "$PID_DIR"/*.pid 2>/dev/null || true
+
+  # 3. Check target ports are free
+  local services=("${@:-${DEFAULT_SERVICES[@]}}")
+  local blocked=0
+  for svc in "${services[@]}"; do
+    local spec="${SERVICES[$svc]:-}"
+    [ -z "$spec" ] && { warn "Unknown service: $svc"; continue; }
+    local port="${spec%%:*}"
+    local owner
+    owner=$(port_owner "$port")
+    if [ -n "$owner" ]; then
+      local cmd
+      cmd=$(ps -p "$owner" -o command= 2>/dev/null || echo "unknown")
+      warn "Port $port is in use by PID $owner ($cmd)"
+      blocked=1
+    else
+      ok "Port $port is free ($svc)"
+    fi
+  done
+
+  # 4. Verify cluster is reachable
+  if ! kubectl cluster-info >/dev/null 2>&1; then
+    fail "Cluster is not reachable (kubectl cluster-info failed)"
+  fi
+  ok "Cluster is reachable"
+
+  # 5. Verify namespace exists
+  if ! kubectl get namespace "$NAMESPACE" >/dev/null 2>&1; then
+    fail "Namespace '$NAMESPACE' does not exist"
+  fi
+  ok "Namespace '$NAMESPACE' exists"
+
+  if [ "$blocked" -eq 1 ]; then
+    fail "One or more ports are in use. Free them before starting port-forwards."
+  fi
+
+  ok "Preflight passed"
+}
+
+cmd_start() {
+  local services=("${@:-${DEFAULT_SERVICES[@]}}")
+
+  acquire_lock
+
+  echo "Starting port-forwards"
+
+  mkdir -p "$PID_DIR"
+
+  for svc in "${services[@]}"; do
+    local spec="${SERVICES[$svc]:-}"
+    [ -z "$spec" ] && { warn "Unknown service: $svc, skipping"; continue; }
+
+    local port="${spec%%:*}"
+    local rest="${spec#*:}"
+    local svc_name="${rest%%:*}"
+    local svc_port="${rest#*:}"
+
+    # Skip if already running and healthy
+    local existing_pid
+    existing_pid=$(read_pid "$svc")
+    if is_alive "$existing_pid"; then
+      ok "$svc already running (PID $existing_pid, localhost:$port)"
+      continue
+    fi
+
+    # Verify the service exists before forwarding (single check, no wait)
+    if ! kubectl get svc "$svc_name" -n "$NAMESPACE" >/dev/null 2>&1; then
+      warn "Service $svc_name not found in $NAMESPACE, skipping $svc"
+      continue
+    fi
+
+    # Start the port-forward
+    kubectl port-forward -n "$NAMESPACE" "svc/$svc_name" "$port:$svc_port" \
+      >"$PID_DIR/$svc.log" 2>&1 &
+    local pid=$!
+    echo "$pid" > "$(pid_file "$svc")"
+
+    # Verify it started (give it a moment to bind or fail)
+    sleep 0.5
+    if is_alive "$pid"; then
+      ok "$svc → localhost:$port (PID $pid)"
+    else
+      warn "$svc failed to start. Check $PID_DIR/$svc.log"
+      tail -n 3 "$PID_DIR/$svc.log" 2>/dev/null | while read -r line; do
+        log "  $line"
+      done
+    fi
+  done
+}
+
+cmd_stop() {
+  echo "Stopping port-forwards"
+
+  # 1. Kill tracked processes
+  for svc in "${!SERVICES[@]}"; do
+    local pid
+    pid=$(read_pid "$svc")
+    if is_alive "$pid"; then
+      kill "$pid" 2>/dev/null && ok "Stopped $svc (PID $pid)" || true
+    fi
+  done
+
+  # 2. Kill any untracked kubectl port-forward for our namespace (zombies)
+  local zombies
+  zombies=$(pgrep -f "kubectl port-forward.*${NAMESPACE}" 2>/dev/null || true)
+  if [ -n "$zombies" ]; then
+    echo "$zombies" | while read -r pid; do
+      kill "$pid" 2>/dev/null && log "Killed untracked PID $pid" || true
+    done
+  fi
+
+  # 3. Clean up state
+  rm -f "$PID_DIR"/*.pid "$PID_DIR"/*.log 2>/dev/null || true
+  ok "All port-forwards stopped"
+}
+
+cmd_status() {
+  echo "Port-forward status"
+
+  for svc in "${!SERVICES[@]}"; do
+    local spec="${SERVICES[$svc]}"
+    local port="${spec%%:*}"
+    local pid
+    pid=$(read_pid "$svc")
+
+    if is_alive "$pid"; then
+      # Verify the port is actually accepting connections (any HTTP response = healthy)
+      local http_code
+      http_code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 2 "http://localhost:$port" 2>/dev/null || echo "000")
+      if [ "$http_code" != "000" ]; then
+        ok "$svc: healthy (PID $pid, localhost:$port, HTTP $http_code)"
+      else
+        warn "$svc: running but not responding (PID $pid, localhost:$port)"
+      fi
+    else
+      local port_pid
+      port_pid=$(port_owner "$port")
+      if [ -n "$port_pid" ]; then
+        warn "$svc: not managed but port $port in use by PID $port_pid"
+      else
+        log "$svc: not running (port $port free)"
+      fi
+    fi
+  done
+
+  # Check for untracked port-forwards
+  local untracked
+  untracked=$(pgrep -f "kubectl port-forward.*${NAMESPACE}" 2>/dev/null || true)
+  if [ -n "$untracked" ]; then
+    local tracked_pids=""
+    for svc in "${!SERVICES[@]}"; do
+      local p
+      p=$(read_pid "$svc")
+      [ -n "$p" ] && tracked_pids="$tracked_pids $p"
+    done
+    echo "$untracked" | while read -r pid; do
+      if ! echo "$tracked_pids" | grep -qw "$pid"; then
+        local cmd
+        cmd=$(ps -p "$pid" -o command= 2>/dev/null || echo "unknown")
+        warn "Untracked port-forward: PID $pid ($cmd)"
+      fi
+    done
+  fi
+
+  return 0
+}
+
+cmd_restart() {
+  cmd_stop
+  echo ""
+  cmd_preflight "$@"
+  echo ""
+  cmd_start "$@"
+}
+
+# --- main ---
+
+case "${1:-}" in
+  preflight) shift; cmd_preflight "$@" ;;
+  start)     shift; cmd_start "$@" ;;
+  stop)      cmd_stop ;;
+  status)    cmd_status ;;
+  restart)   shift; cmd_restart "$@" ;;
+  *)
+    echo "Usage: $0 {preflight|start|stop|status|restart} [services...]"
+    echo "Services: ${!SERVICES[*]}"
+    echo "Default: ${DEFAULT_SERVICES[*]}"
+    exit 1
+    ;;
+esac
diff --git a/.github/workflows/acp-triage.yml b/.github/workflows/acp-triage.yml
new file mode 100644
index 000000000..9e4633d29
--- /dev/null
+++ b/.github/workflows/acp-triage.yml
@@ -0,0 +1,96 @@
+name: ACP Triage
+
+on:
+  schedule:
+    - cron: '0 0 * * *'
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  issues: write
+
+env:
+  ACP_API_URL: https://ambient-code.apps.rosa.vteam-uat.0ksl.p3.openshiftapps.com
+  ACP_PROJECT: jeder-workspace
+
+jobs:
+  triage-issues:
+    runs-on: ubuntu-latest
+    timeout-minutes: 75
+
+    steps:
+      - name: Set date
+        id: date
+        run: echo "date=$(date -u +%Y-%m-%d)" >> "$GITHUB_OUTPUT"
+
+      - name: Create ACP triage session
+        id: session
+        uses: ambient-code/ambient-action@v0.0.2
+        with:
+          api-url: ${{ env.ACP_API_URL }}
+          api-token: ${{ secrets.AMBIENT_BOT_TOKEN }}
+          project: ${{ env.ACP_PROJECT }}
+          prompt: Triage the backlog for https://github.com/ambient-code/platform
+          workflow: '{"gitUrl": "https://github.com/ambient-code/workflows", "branch": "main", "path": "workflows/triage"}'
+          display-name: Triage Report ${{ steps.date.outputs.date }}
+          wait: 'true'
+          timeout: '60'
+          labels: '{"type": "triage", "target": "platform", "trigger": "scheduled"}'
+
+      - name: Post step summary
+        if: always()
+        env:
+          SESSION_NAME: ${{ steps.session.outputs.session-name }}
+          SESSION_UID: ${{ steps.session.outputs.session-uid }}
+          SESSION_PHASE: ${{ steps.session.outputs.session-phase }}
+          SESSION_RESULT: ${{ steps.session.outputs.session-result }}
+        run: |
+          SESSION_URL="${ACP_API_URL}/projects/${ACP_PROJECT}/sessions/${SESSION_NAME}"
+          {
+            echo "### ACP Triage Session"
+            echo ""
+            echo "| Field | Value |"
+            echo "|-------|-------|"
+            echo "| **Session** | \`${SESSION_NAME}\` |"
+            echo "| **UID** | \`${SESSION_UID}\` |"
+            echo "| **Phase** | ${SESSION_PHASE} |"
+            echo "| **Result** | ${SESSION_RESULT:-N/A} |"
+            echo "| **UI** | [View in ACP](${SESSION_URL}) |"
+          } >> "$GITHUB_STEP_SUMMARY"
+
+      - name: Create triage report issue
+        if: steps.session.outputs.session-name != ''
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          SESSION_NAME: ${{ steps.session.outputs.session-name }}
+          SESSION_PHASE: ${{ steps.session.outputs.session-phase }}
+          SESSION_RESULT: ${{ steps.session.outputs.session-result }}
+          DATE: ${{ steps.date.outputs.date }}
+        run: |
+          SESSION_URL="${ACP_API_URL}/projects/${ACP_PROJECT}/sessions/${SESSION_NAME}"
+
+          gh label create triage-report --color 0E8A16 --description "Automated triage report" 2>/dev/null || true
+
+          cat > /tmp/triage-body.md <