Project Purpose: Learn best practices, software architecture patterns, DevOps workflows, and AWS deployment
Focus Areas: Backend Architecture • DevOps • AWS Cloud • Production-Grade Patterns
- Phase 1: Security & Production Readiness (Weeks 1-2) ✅ Completed
- Phase 2: Local Infrastructure & Containerization (Weeks 2-3) ✅ Completed
  - 2.1: Docker Multi-Stage Build ✅
  - 2.2: Docker Compose - Full Local Stack ✅
- Phase 3: Observability & Monitoring (Weeks 3-4) ✅ Completed
  - 3.1: Structured Logging with Context ✅
  - 3.2: Metrics with Prometheus ✅
  - 3.3: Distributed Tracing (OpenTelemetry + Jaeger) ✅
- Phase 4: Architecture Patterns & Caching (Weeks 4-5)
- Phase 5: Kubernetes & Helm (Weeks 5-7)
- Phase 6: AWS Deployment Foundation (Weeks 7-9)
- Phase 7: Advanced DevOps & CI/CD (Weeks 9-11)
- Phase 8: Advanced Architecture Patterns (Weeks 11-13)
- Phase 9: Production Operations & Scale (Weeks 13-14+)
Duration: Weeks 1-2
Theme: Build a secure, production-ready API
What you'll learn:
- JWT tokens (access + refresh tokens)
- Password hashing (bcrypt/argon2)
- Middleware for protected routes
- RBAC (Role-Based Access Control)
- Security best practices (OWASP Top 10)
Implementation Tasks:
- Add `password_hash` to users table (migration)
- Create password hashing utility
- Implement JWT token generation and validation
- Create `/auth/register` endpoint
- Create `/auth/login` endpoint
- Create `/auth/refresh` endpoint (refresh token rotation)
- Add JWT middleware to protect todo routes
- Implement user ownership (users can only CRUD their own todos)
- Add password strength validation
- Ensure OWASP Top 10 security best practices are implemented
- Add CORS middleware and security headers
- Implement rate limiting on auth endpoints (prevent brute force)
Why this first: Almost every real application needs authentication. It touches all layers (API → Service → Database) and teaches security fundamentals.
Learning Resources:
- OWASP Authentication Cheat Sheet
- JWT best practices
- Go bcrypt/argon2 documentation
What you'll learn:
- OpenAPI 3.0 specification
- Automatic API documentation
- API versioning best practices
- Spec-first vs code-first approaches
Implementation Tasks:
- Choose approach: `swaggo/swag` (code-first) or `oapi-codegen` (spec-first) ✅ Chose swaggo/swag
- Add Swagger annotations to all endpoints ✅ Auth endpoints fully documented
- Generate OpenAPI spec ✅ Generated docs/swagger.json and swagger.yaml
- Set up Swagger UI endpoint (`/swagger`) ✅ Available at /swagger/index.html
- Document request/response schemas ✅ Created swagger_models.go
- Add authentication documentation ✅ JWT Bearer authentication documented
- Document error responses ✅ StandardErrorResponse model created
- Version your API (v1, v2 strategy) ✅ Using /v1 prefix, ready for v2
Deliverable: Beautiful, interactive API documentation at /swagger
What you'll learn:
- Liveness vs Readiness probes (K8s concepts)
- Signal handling (SIGTERM, SIGINT)
- Connection draining
- Zero-downtime deployments
Implementation Tasks:
- Add `/health` endpoint (liveness probe) ✅ Available at /health
- Add `/ready` endpoint (readiness probe - checks DB, Redis, etc.) ✅ Available at /ready
- Implement graceful shutdown handler ✅
- Add timeout for in-flight requests ✅
- Test shutdown behavior with active connections ✅
Why this matters: Required for Kubernetes/ECS deployments. Prevents dropping requests during deploys.
Architecture Pattern: Graceful degradation
Duration: Weeks 2-3
Theme: Learn containerization and local orchestration
What you learned:
- Builder pattern for Go apps
- Layer caching optimization
- Security: minimal base images, non-root users
- `.dockerignore` optimization
Implementation Tasks:
- Create multi-stage Dockerfile ✅
  - Stage 1: Build (golang:1.24-alpine)
  - Stage 2: Runtime (alpine:3.19)
- Optimize layer caching (copy go.mod first) ✅
- Add non-root user (appuser:1000) ✅
- Set proper file permissions (--chown flag) ✅
- Configure `.dockerignore` (74 exclusion rules) ✅
- Test build size (Achieved: ~58MB) ✅
- Add labels (version, commit SHA, build date) ✅
Deliverable: Production-ready Dockerfile (~58MB vs 1.2GB naive build) ✅
Files Created:
- `infra/docker/dockerfile.service` - Production multi-stage Dockerfile
- `.dockerignore` - Build context optimization (96% reduction)
- `infra/docker/DOCKER_MULTISTAGE_IMPLEMENTATION.md` - Complete documentation (400+ lines)
- `infra/docker/QUICK_REFERENCE.md` - Quick command reference
- `infra/docker/VISUAL_GUIDE.md` - Visual architecture diagrams and flowcharts
- `infra/docker/PHASE_2.1_COMPLETION_SUMMARY.md` - Phase completion summary with metrics
- `infra/docker/test-docker-setup.sh` - Automated validation test suite (15 tests)
- Updated `cmd/doit/main.go` - Added version/commit/buildDate variables
- Updated `Makefile` - Docker automation commands (already present)
Documentation:
- 📖 Complete Implementation Guide - Everything you need to know about Docker multi-stage builds
- 📋 Quick Reference - Common commands and workflows
- 🎨 Visual Guide - Architecture diagrams and visualizations
- 📊 Completion Summary - Phase results and metrics
- ✅ Test Suite - Run `./infra/docker/test-docker-setup.sh` to validate
Quick Start:
make docker-build # Build with metadata
make docker-size # Check size (~58MB)
make docker-run # Run locally
make docker-inspect # View metadata
./infra/docker/test-docker-setup.sh # Run all validation tests
Results Achieved:
- ✅ Image size: 58MB (95% reduction from 1.2GB)
  - Alpine base: 8MB
  - Binary: 30MB (includes all dependencies)
  - Runtime deps: 5MB (ca-certificates, tzdata)
  - Timezone data: 1.5MB
- ✅ Build time (cached): 30 seconds (90% faster)
- ✅ Security: Non-root user (UID 1000) verified
- ✅ Layer caching: Optimized (go.mod separate layer)
- ✅ Metadata: Full traceability (version, commit, date) verified
- ✅ Production ready: Multi-stage, minimal attack surface
Note: Binary is 30MB due to application dependencies (Swagger, SQLC, PostgreSQL drivers, Redis, JWT, etc.). Still 95% smaller than naive build (1.2GB). To achieve <20MB, consider switching to distroless base or removing Swagger from production builds.
What you learned:
- Multi-container orchestration with Docker Compose
- Container networking (custom bridge network with DNS)
- Volume management (named volumes vs bind mounts)
- Environment variable configuration (.env file)
- Health checks and dependency ordering (depends_on with conditions)
- Local development workflow with hot reload
- Monitoring stack setup (Prometheus + Grafana)
Services Implemented:
- ✅ PostgreSQL 16 Alpine (with health checks and data persistence)
- ✅ Redis 7 Alpine (with AOF persistence and health checks)
- ✅ Your Go API application (with hot reload volume mount)
- ✅ Prometheus (metrics collection, 30-day retention)
- ✅ Grafana (metrics visualization with pre-configured datasource)
- ✅ Adminer (DB management UI - optional with --profile tools)
- ⏸️ Jaeger (deferred to Phase 3.3 - Distributed Tracing)
Implementation Completed:
- ✅ Created `docker-compose.yml` with 6 services
- ✅ Set up custom bridge network (`doit_network`)
- ✅ Configured 4 named volumes (postgres, redis, prometheus, grafana)
- ✅ Added comprehensive health checks to all services
- ✅ Used `depends_on` with `service_healthy` conditions
- ✅ Created `.env.example` template
- ✅ Added 20+ Makefile targets (`compose-up`, `compose-down`, etc.)
- ⏸️ Run migrations automatically (deferred to Phase 2.3)
- ✅ Configured Prometheus to scrape API at /metrics
- ✅ Set up Grafana with provisioned datasource and sample dashboard
Documentation Created:
- 📖 Mental Model Guide - Complete conceptual understanding
- 📖 Implementation Guide - Detailed implementation walkthrough
- 📋 Quick Reference - Command cheat sheet
Quick Start:
# Setup (first time)
make compose-setup # Create .env file
# Start entire stack
make compose-up # All services in background
# Check status
make compose-ps # List services
make compose-health # Health check all services
# View logs
make compose-logs # All services
make compose-logs-api # API only
# Stop stack
make compose-down # Stop (keeps data)
make compose-down-v # Stop and remove volumes
Service URLs:
- 🔹 API: http://localhost:8080
- 🔹 Swagger: http://localhost:8080/swagger/index.html
- 🔹 Health: http://localhost:8080/health
- 🔹 Metrics: http://localhost:8080/metrics
- 🔹 Grafana: http://localhost:3000 (admin/admin)
- 🔹 Prometheus: http://localhost:9090
- 🔹 Adminer: http://localhost:8081 (with `--profile tools`)
Results Achieved:
- ✅ Single command starts entire stack (`make compose-up`)
- ✅ Service discovery via Docker DNS (service names)
- ✅ Proper startup ordering (DB healthy → Redis healthy → API starts)
- ✅ Data persistence across restarts (named volumes)
- ✅ Hot reload for development (source code mounted)
- ✅ Comprehensive monitoring (Prometheus + Grafana)
- ✅ Easy database access (Adminer GUI)
- ✅ Production-ready patterns (health checks, restart policies)
Deliverable: ✅ Single command (`make compose-up`) brings up entire stack
Why this matters: This is your local production environment. Everything you learn here translates directly to K8s and AWS ECS.
Duration: Weeks 3-4
Theme: Make your application observable and debuggable
What you achieved:
- Full observability stack with the three pillars: Logs, Metrics, Traces
- Prometheus for metrics collection and alerting
- Grafana as single pane of glass (metrics + traces)
- Jaeger for distributed tracing via OpenTelemetry
- Hybrid development workflow (local Go + Docker infrastructure)
What you'll learn:
- Request ID propagation
- Contextual logging (user ID, trace ID)
- Log levels and sampling
- JSON structured logs for parsing
Implementation Tasks:
- Enhance existing logger with structured fields
- Add request ID middleware (X-Request-ID header)
- Propagate request ID through context
- Add user ID to log context (after auth)
- Log important events (auth attempts, data mutations)
- Configure log levels by environment
- Add log sampling for high-volume endpoints
- Format logs as JSON for production
Architecture Pattern: Context propagation through middleware stack
What you'll learn:
- The 4 golden signals (latency, traffic, errors, saturation)
- Metric types: Counter, Gauge, Histogram, Summary
- Service-level indicators (SLIs)
- Instrumentation best practices
Metrics to Add:
- HTTP request duration (histogram)
- Request count by method/path/status (counter)
- Active database connections (gauge)
- Database query duration (histogram)
- Todo operations count (create/update/delete/read)
- Cache hit/miss ratio (counter)
- Active goroutines (gauge)
- Memory usage (gauge)
Implementation Tasks:
- Add `prometheus/client_golang` dependency
- Create metrics middleware
- Expose `/metrics` endpoint
- Instrument all HTTP handlers
- Instrument database queries
- Add custom business metrics
- Configure Prometheus scraping
- Create Grafana dashboards
  - Request rate and latency
  - Error rate
  - Database performance
Deliverable: Beautiful Grafana dashboards showing real-time metrics
What you learned:
- Trace context propagation (W3C Trace Context standard)
- Span creation and relationships (parent/child spans)
- Performance bottleneck identification
- Distributed systems debugging
- OpenTelemetry SDK architecture (Provider, Exporter, Propagator)
- OTLP protocol for trace export
- Sampling strategies (development vs production)
Implementation Completed:
- Add OpenTelemetry SDK (`go.opentelemetry.io/otel`)
- Configure OTLP exporter to Jaeger
- Add tracing middleware (creates root HTTP spans)
- Instrument HTTP handlers (automatic via middleware)
- Instrument database operations (all SQLC queries)
- Instrument Redis/cache operations (all cache methods)
- Add service layer tracing (TodoService, UserService, TokenService)
- Propagate trace context across services (W3C headers)
- Test trace visualization in Jaeger UI
- Add span attributes (user ID, todo ID, operation type, etc.)
- Integrate Jaeger datasource in Grafana
- Create hybrid development setup (local Go + Docker infrastructure)
Files Created/Modified:
- `internal/tracing/tracing.go` - OpenTelemetry provider initialization
- `internal/tracing/helpers.go` - Tracing helper functions (StartDBSpan, StartCacheSpan, etc.)
- `internal/middlewares/tracing_middleware.go` - HTTP tracing middleware
- `internal/service/todo_service.go` - Service layer tracing
- `internal/service/user_service.go` - User service tracing
- `internal/service/token_service.go` - Token/auth service tracing
- `internal/cache/redis.go` - Cache operation tracing
- `internal/config/config.go` - Tracing configuration
- `docker-compose.yml` - Jaeger service added
- `docker-compose.infra.yml` - Infrastructure-only compose for hybrid dev
- `infra/docker/grafana/provisioning-dev/datasources/prometheus.yaml` - Jaeger + Prometheus datasources
Documentation Created:
- 📖 Prometheus Mental Model - Metrics concepts
- 📖 Grafana Mental Model - Visualization concepts
- 📖 Distributed Tracing Mental Model - Tracing concepts
- 📖 Observability Overview - Three pillars summary
Service URLs:
- 🔹 Jaeger UI: http://localhost:16686
- 🔹 Grafana (with Jaeger): http://localhost:3000 (Explore → Jaeger)
- 🔹 Prometheus: http://localhost:9090
Quick Start (Hybrid Development):
# Start infrastructure only (DB, Redis, Jaeger, Prometheus, Grafana)
make dev-infra
# Run migrations
make dev-migrate
# Run Go API locally (connects to Docker infrastructure)
make dev-run
# View traces in Jaeger
open http://localhost:16686
# View traces in Grafana
open http://localhost:3000 # Explore → Select Jaeger
Trace Structure Example:
HTTP POST /api/v1/todos (250ms)
├── TodoService.CreateTodo (200ms)
│   ├── cache.GET (5ms) - cache miss
│   ├── db.INSERT (150ms) - todos table
│   └── cache.SET (10ms) - cache update
└── Response sent
Why this matters: AWS X-Ray uses similar concepts. OpenTelemetry is vendor-neutral and industry standard. Skills transfer directly to any cloud provider.
Architecture Pattern: Observability through instrumentation
Deliverable: ✅ Full distributed tracing with Jaeger, integrated into Grafana as a single pane of glass
Duration: Weeks 4-5
Theme: Apply software architecture patterns for scalability
What you'll learn:
- Cache-aside pattern
- Write-through vs write-back strategies
- TTL (Time To Live) strategies
- Cache invalidation patterns
- Cache stampede / thundering herd problem
- Distributed caching considerations
Implementation Tasks:
- Add Redis client to your database package
- Implement cache-aside pattern for user lookups
- Cache todo lists (per user)
- Set appropriate TTLs (user: 1h, todos: 5min)
- Implement cache invalidation on updates/deletes
- Add cache warming for frequently accessed data
- Handle cache misses gracefully
- Add cache metrics (hit rate, miss rate)
- Test cache behavior under load
- Document caching strategy
Architecture Evolution:
Before: [Handler] → [Service] → [Database]
After:  [Handler] → [Service] → [Repository] → [Database]
                                      ↓
                               [Redis Cache]
Advanced (Optional):
- Implement write-through caching for writes
- Add distributed locking for cache updates (prevent stampede)
- Implement cache sharding strategy
What you'll learn:
- Separation of concerns
- Dependency inversion principle
- Swappable implementations
- Testing strategies
Implementation Tasks:
- Create repository interfaces (UserRepository, TodoRepository)
- Implement PostgreSQL repository (existing querier)
- Implement cached repository wrapper
- Update services to use repositories
- Create repository tests
- Document when to use each pattern
Benefits: Can swap PostgreSQL for DynamoDB later without changing business logic
What you'll learn:
- Command Query Responsibility Segregation
- Read vs Write model separation
- Eventual consistency concepts
- When CQRS makes sense (spoiler: not always!)
Implementation Tasks:
- Separate read and write services for todos
- Write operations: TodoCommandService
- Read operations: TodoQueryService (uses cache)
- Update handlers to use appropriate services
- Document trade-offs and when to use CQRS
- Test eventual consistency scenarios
Why this matters: Prepares you for microservices and event-driven architectures
What you'll learn:
- Domain events
- Event bus pattern
- Pub/Sub with Redis (or NATS)
- Async processing
- Decoupled systems
Events to Implement:
- `UserRegistered` event
- `TodoCreated` event
- `TodoCompleted` event
- `TodoDeleted` event
Implementation Tasks:
- Create event bus interface
- Implement Redis Pub/Sub event bus
- Create event publisher
- Create event subscribers
- Add event handlers:
- Audit log handler (logs all events)
- Analytics handler (counts events)
- Notification handler (future: send emails)
- Handle subscriber failures gracefully
- Add retry logic for failed events
- Test event flow end-to-end
Architecture:
User creates todo →
1. Save to DB
2. Emit "TodoCreated" event →
- Analytics service listens
- Audit log service listens
- Notification service listens (future)
Why this matters: Prepares you for AWS EventBridge, SQS, SNS
What you'll learn:
- Testcontainers (spin up real PostgreSQL)
- Database fixtures and cleanup
- Test isolation strategies
- E2E testing patterns
Implementation Tasks:
- Add `testcontainers-go` dependency
- Create integration test helpers
- Write integration tests for auth flow
- Write integration tests for todo CRUD
- Test caching behavior
- Test event publishing
- Add to CI pipeline
- Document when to use unit vs integration tests
Deliverable: High confidence in your full application stack
Duration: Weeks 5-7
Theme: Master container orchestration with Kubernetes
Why Learn This Now:
- You already understand Docker containers (Phase 2.1)
- You've orchestrated services with Docker Compose (Phase 2.2)
- You have observability in place (Phase 3)
- Now learn production-grade orchestration before cloud deployment
Learning Path: Local K8s → Manifests → Helm → Production Patterns
What you'll learn:
- Kubernetes architecture (Control Plane, Nodes, Pods)
- Core concepts: Pods, Deployments, Services, ConfigMaps, Secrets
- kubectl CLI and context management
- Declarative vs imperative configuration
- Kubernetes namespaces and resource organization
- Label selectors and annotations
Setup Tasks:
- Install Docker Desktop with Kubernetes enabled (or minikube/kind)
- Verify installation: `kubectl version`
- Explore with: `kubectl get nodes`, `kubectl cluster-info`
- Install k9s (terminal UI for K8s - highly recommended!)
- Understand kubectl contexts: `kubectl config get-contexts`
- Create a test namespace: `kubectl create namespace test`
Learning Exercises:
- Deploy nginx with `kubectl run` (imperative)
- Expose nginx with `kubectl expose` (imperative)
- Delete and recreate with YAML (declarative)
- Understand the difference: imperative vs declarative
Architecture Understanding:
Kubernetes Cluster
├── Control Plane
│   ├── API Server (kubectl talks to this)
│   ├── Scheduler (assigns Pods to Nodes)
│   ├── Controller Manager (maintains desired state)
│   └── etcd (cluster state storage)
└── Nodes (Worker machines)
    └── Pods (smallest deployable unit)
        └── Containers (your Docker images)
What you'll learn:
- Writing production-ready Kubernetes manifests
- Resource limits and requests
- Liveness and readiness probes
- ConfigMaps for configuration
- Secrets for sensitive data
- Multi-container pods
- Init containers for migrations
Project Structure:
k8s/
├── base/                      # Base manifests
│   ├── namespace.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── postgres-deployment.yaml
│   ├── postgres-service.yaml
│   ├── postgres-pvc.yaml
│   ├── redis-deployment.yaml
│   └── redis-service.yaml
├── overlays/                  # Environment-specific
│   ├── dev/
│   │   └── kustomization.yaml
│   ├── staging/
│   │   └── kustomization.yaml
│   └── prod/
│       └── kustomization.yaml
└── README.md
Implementation Tasks:
- Create namespace manifest (`k8s/base/namespace.yaml`)
apiVersion: v1
kind: Namespace
metadata:
  name: doit
  labels:
    app: doit
    environment: dev
- Create ConfigMap for non-sensitive config
apiVersion: v1
kind: ConfigMap
metadata:
  name: doit-config
  namespace: doit
data:
  APP_ENVIRONMENT: "production"
  APP_NAME: "doit-api"
  LOG_LEVEL: "info"
  DB_HOST: "postgres-service"
  DB_PORT: "5432"
  DB_NAME: "doit"
  REDIS_ADDR: "redis-service:6379"
- Create Secret for sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: doit-secrets
  namespace: doit
type: Opaque
stringData:
  DB_USER: "doit"
  DB_PASSWORD: "changeme"
  JWT_SECRET: "your-super-secret-key"
  REDIS_PASSWORD: ""
- Learn about sealed-secrets for GitOps (store secrets safely in Git)
- Create Deployment manifest
- Define resource requests and limits:
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"
- Add liveness probe (is app alive?)
livenessProbe:
  httpGet:
    path: /health/liveness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
- Add readiness probe (is app ready for traffic?)
readinessProbe:
  httpGet:
    path: /health/readiness
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
- Configure environment variables from ConfigMap and Secret
- Set replica count: 3 (for high availability)
- Add pod anti-affinity (spread across nodes)
- Create Service to expose API
- Type: ClusterIP (internal) or LoadBalancer (external)
- Configure selectors to match Deployment labels
- Expose port 80 → targetPort 8080
- Create PersistentVolumeClaim for database storage
- Create PostgreSQL Deployment
- Create PostgreSQL Service (ClusterIP - internal only)
- Add init container for database initialization
- Configure resource limits
- Create Redis Deployment
- Create Redis Service
- Configure persistence (if needed)
- Set resource limits
Testing:
- Apply all manifests: `kubectl apply -f k8s/base/`
- Check resources: `kubectl get all -n doit`
- View logs: `kubectl logs -n doit deployment/doit-api`
- Port-forward to test: `kubectl port-forward -n doit svc/doit-api 8080:80`
- Test API: `curl http://localhost:8080/health`
What you'll learn:
- Horizontal Pod Autoscaler (HPA)
- Ingress controllers for routing
- Network Policies for security
- Pod Disruption Budgets (PDB)
- Resource Quotas and Limits
- StatefulSets vs Deployments
What you'll learn:
- Auto-scale based on CPU/memory
- Custom metrics (requests per second)
Implementation:
- Install metrics-server (if not present): `kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml`
- Create HPA manifest:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: doit-api-hpa
  namespace: doit
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: doit-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
- Test auto-scaling with load (use `hey` or `ab`)
- Watch scaling: `kubectl get hpa -n doit --watch`
What you'll learn:
- L7 load balancing
- Path-based routing
- TLS/SSL termination
- Multiple services behind one IP
Implementation:
- Install ingress-nginx controller: `kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.2/deploy/static/provider/cloud/deploy.yaml`
- Create Ingress manifest:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: doit-ingress
  namespace: doit
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.doit.example.com
    secretName: doit-tls
  rules:
  - host: api.doit.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: doit-api
            port:
              number: 80
- Test ingress routing
- (Optional) Install cert-manager for automatic TLS certificates
What you'll learn:
- Pod-to-pod network security
- Zero-trust networking
- Ingress and egress rules
Implementation:
- Create NetworkPolicy to restrict PostgreSQL access:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-network-policy
  namespace: doit
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: doit-api # Only API can access Postgres
    ports:
    - protocol: TCP
      port: 5432
- Test that external access is blocked
- Create similar policy for Redis
What you'll learn:
- Ensure availability during voluntary disruptions
- Rolling updates without downtime
Implementation:
- Create PDB manifest:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: doit-api-pdb
  namespace: doit
spec:
  minAvailable: 2 # Always keep 2 pods running
  selector:
    matchLabels:
      app: doit-api
What you'll learn:
- Helm architecture (Charts, Releases, Repositories)
- Chart structure and templating
- Values files for different environments
- Helm hooks (pre-install, post-install)
- Chart dependencies
- Helm best practices
Why Helm:
- Reusable templates (deploy to dev/staging/prod with different values)
- Version control for releases
- Easy rollbacks
- Share charts with team
- Industry standard for K8s package management
Setup:
- Install Helm: `brew install helm` (macOS) or download from helm.sh
- Verify: `helm version`
- Add popular repos:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Learning Exercises:
- Install PostgreSQL with Helm: `helm install my-postgres bitnami/postgresql -n doit`
- List releases: `helm list -n doit`
- Get values: `helm get values my-postgres -n doit`
- Upgrade: `helm upgrade my-postgres bitnami/postgresql --set auth.password=newpass -n doit`
- Rollback: `helm rollback my-postgres -n doit`
- Uninstall: `helm uninstall my-postgres -n doit`
Chart Structure:
helm/
└── doit-api/
    ├── Chart.yaml             # Chart metadata
    ├── values.yaml            # Default values
    ├── values-dev.yaml        # Dev environment overrides
    ├── values-staging.yaml    # Staging overrides
    ├── values-prod.yaml       # Production overrides
    ├── templates/
    │   ├── NOTES.txt          # Post-install notes
    │   ├── _helpers.tpl       # Template helpers
    │   ├── deployment.yaml
    │   ├── service.yaml
    │   ├── configmap.yaml
    │   ├── secret.yaml
    │   ├── ingress.yaml
    │   ├── hpa.yaml
    │   ├── serviceaccount.yaml
    │   └── tests/
    │       └── test-connection.yaml
    └── .helmignore
Implementation Tasks:
- Create chart skeleton: `helm create helm/doit-api`
- Customize `Chart.yaml`:
apiVersion: v2
name: doit-api
description: A Helm chart for DoIt REST API
type: application
version: 0.1.0
appVersion: "1.0.0"
keywords:
  - doit
  - api
  - golang
  - rest
maintainers:
  - name: Your Name
    email: your.email@example.com
- Define `values.yaml` with sensible defaults:
replicaCount: 3

image:
  repository: doit-api
  pullPolicy: IfNotPresent
  tag: "latest"

service:
  type: ClusterIP
  port: 80
  targetPort: 8080

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: api.doit.local
      paths:
        - path: /
          pathType: Prefix
  tls: []

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

config:
  appEnvironment: "production"
  logLevel: "info"
  dbHost: "postgres-service"
  dbPort: "5432"
  dbName: "doit"
  redisAddr: "redis-service:6379"

secrets:
  dbUser: "doit"
  dbPassword: "changeme"
  jwtSecret: "your-secret-key"

postgresql:
  enabled: true
  auth:
    username: doit
    password: changeme
    database: doit
  primary:
    persistence:
      enabled: true
      size: 8Gi

redis:
  enabled: true
  auth:
    enabled: false
- Create environment-specific values files:
  - `values-dev.yaml`: Lower resources, debug logging
  - `values-staging.yaml`: Medium resources, realistic data
  - `values-prod.yaml`: Full resources, monitoring enabled
What you'll learn:
- Go templating syntax
- Built-in objects (`.Values`, `.Chart`, `.Release`)
- Template functions (default, required, quote, toYaml)
- Control structures (if, range, with)
- Named templates and helpers
Example: Templated Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "doit-api.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "doit-api.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "doit-api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "doit-api.selectorLabels" . | nindent 8 }}
    spec:
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - name: http
          containerPort: {{ .Values.service.targetPort }}
          protocol: TCP
        env:
        - name: APP_ENVIRONMENT
          value: {{ .Values.config.appEnvironment | quote }}
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: {{ include "doit-api.fullname" . }}-secret
              key: dbPassword
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
Tasks:
- Template all manifests
- Create `_helpers.tpl` with reusable functions
- Use `required` for mandatory values
- Add conditional blocks (e.g., ingress enabled/disabled)
- Test rendering: `helm template doit-api helm/doit-api`
- Lint chart: `helm lint helm/doit-api`
What you'll learn:
- Including other charts as dependencies
- Subchart values override
- Managing external dependencies
Implementation:
- Add dependencies to `Chart.yaml`:
dependencies:
  - name: postgresql
    version: "12.x.x"
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled
  - name: redis
    version: "17.x.x"
    repository: https://charts.bitnami.com/bitnami
    condition: redis.enabled
  - name: prometheus
    version: "25.x.x"
    repository: https://prometheus-community.github.io/helm-charts
    condition: prometheus.enabled
- Update dependencies: `helm dependency update helm/doit-api`
- This downloads subcharts to the `charts/` directory
- Override subchart values in your `values.yaml`
What you'll learn:
- Run jobs before/after install, upgrade, delete
- Database migrations as pre-upgrade hooks
- Cleanup jobs as post-delete hooks
Use Cases:
- Run database migrations before deploying new version
- Seed initial data on first install
- Clean up resources on uninstall
Implementation:
- Create migration job with hook:
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "doit-api.fullname" . }}-migration
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migration
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        command: ["migrate"]
        args: ["-path", "/migrations", "-database", "$(DB_URL)", "up"]
        env:
        - name: DB_URL
          value: "postgres://$(DB_USER):$(DB_PASSWORD)@$(DB_HOST):$(DB_PORT)/$(DB_NAME)?sslmode=disable"
- Test hook execution during install/upgrade
- Create test in `templates/tests/test-connection.yaml`:
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "doit-api.fullname" . }}-test"
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
  - name: wget
    image: busybox
    command: ['wget']
    args: ['{{ include "doit-api.fullname" . }}:{{ .Values.service.port }}/health']
- Run tests: `helm test doit-api -n doit`
Installation:
# Install to dev environment
helm install doit-api helm/doit-api \
-f helm/doit-api/values-dev.yaml \
-n doit-dev \
--create-namespace
# Install to production
helm install doit-api helm/doit-api \
-f helm/doit-api/values-prod.yaml \
-n doit-prod \
--create-namespace
Upgrade:
# Upgrade with new values
helm upgrade doit-api helm/doit-api \
-f helm/doit-api/values-prod.yaml \
-n doit-prod
# Upgrade with specific image tag
helm upgrade doit-api helm/doit-api \
--set image.tag=v1.2.3 \
-n doit-prod
Rollback:
# View history
helm history doit-api -n doit-prod
# Rollback to previous version
helm rollback doit-api -n doit-prod
# Rollback to specific revision
helm rollback doit-api 3 -n doit-prod
Uninstall:
helm uninstall doit-api -n doit-prod
Tasks:
- Document installation procedure
- Create Makefile targets:
  - `make helm-install-dev`
  - `make helm-install-prod`
  - `make helm-upgrade-dev`
  - `make helm-test`
- Version your chart (update Chart.yaml on changes)
- Package chart: `helm package helm/doit-api`
- (Optional) Publish to chart repository
What you'll learn:
- Prometheus Operator
- Grafana in K8s
- Service Monitors
- Custom dashboards for K8s metrics
Implementation:
- Install kube-prometheus-stack via Helm:
```bash
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring \
  --create-namespace
```

This installs:
- Prometheus Operator
- Grafana
- Alertmanager
- Node Exporter
- kube-state-metrics

- Create a ServiceMonitor for your API:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: doit-api
  namespace: doit
spec:
  selector:
    matchLabels:
      app: doit-api
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```

- Access Grafana:

```bash
kubectl port-forward -n monitoring svc/kube-prometheus-grafana 3000:80
```

- Import Kubernetes dashboards
- Create custom dashboard for your API
What you'll learn:
- Resource quotas per namespace
- Limit ranges
- Pod security policies/standards
- RBAC (Role-Based Access Control)
- Service accounts
- Security contexts
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: doit-quota
  namespace: doit-prod
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "5"
    services.loadbalancers: "2"
```

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: doit-prod
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

- Create ServiceAccount for your app
- Create Role with minimal permissions
- Bind Role to ServiceAccount
- Use ServiceAccount in Deployment
Tasks:
- Implement all security best practices
- Document security model
- Run security scans (kubesec, kube-bench)
Tools to master:
- k9s - Terminal UI for K8s

```bash
brew install k9s
k9s
```

- stern - Multi-pod log tailing

```bash
brew install stern
stern doit-api -n doit
```

- kubectx/kubens - Context/namespace switching

```bash
brew install kubectx
kubectx docker-desktop
kubens doit
```

- kustomize - Template-free customization

```bash
kubectl apply -k k8s/overlays/dev/
```

- helm diff - Preview changes

```bash
helm plugin install https://github.com/databus23/helm-diff
helm diff upgrade doit-api helm/doit-api -n doit
```
What you'll learn:
- Automated Helm deployments
- Image tagging strategies
- ArgoCD for GitOps (optional)
GitHub Actions Workflow:
```yaml
name: Deploy to Kubernetes
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker image
        run: |
          docker build -t doit-api:${{ github.sha }} .
      - name: Push to registry
        run: |
          echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login -u ${{ secrets.REGISTRY_USERNAME }} --password-stdin
          docker push doit-api:${{ github.sha }}
      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
      - name: Setup Helm
        uses: azure/setup-helm@v3
      - name: Deploy with Helm
        run: |
          helm upgrade --install doit-api helm/doit-api \
            --set image.tag=${{ github.sha }} \
            -f helm/doit-api/values-dev.yaml \
            -n doit-dev \
            --create-namespace
```

Tasks:
- Set up CI/CD pipeline for K8s
- Implement proper image tagging (git SHA, semver)
- Add smoke tests after deployment
- Configure rollback on failure
Duration: Weeks 7-9
Theme: Deploy to real cloud infrastructure
Note: Now that you understand Kubernetes, you can choose between ECS (simpler, managed) or EKS (Kubernetes on AWS). Both paths are covered below.
What you'll learn:
- AWS account best practices
- IAM users, roles, and policies (least privilege)
- VPC, subnets, security groups
- AWS CLI configuration
- Cost management and billing alerts
Setup Tasks:
- Create AWS account (use free tier)
- Enable MFA on root account
- Create IAM admin user (don't use root!)
- Configure AWS CLI with profiles
- Set up billing alerts
- Understand AWS Free Tier limits
- Create budget alerts ($10, $20, $50)
Security Tasks:
- Set up CloudTrail (audit logging)
- Enable AWS Config (compliance)
- Review IAM Access Analyzer
What you'll learn:
- Declarative infrastructure
- State management (local, S3 backend)
- Modules and reusability
- Workspaces (dev/staging/prod)
- Terraform best practices
Project Structure:
infrastructure/
terraform/
modules/
networking/ # VPC, subnets, security groups
compute/ # ECS, EC2, or EKS
database/ # RDS, ElastiCache
monitoring/ # CloudWatch, alarms
storage/ # S3 buckets
environments/
dev/
staging/
prod/
Implementation Tasks:
- Install Terraform
- Create S3 bucket for Terraform state
- Set up DynamoDB table for state locking
- Create networking module (VPC)
- VPC with public/private subnets
- Internet Gateway
- NAT Gateway (or NAT instance for free tier)
- Security groups
- Create database module (RDS)
- PostgreSQL RDS instance
- Subnet group
- Security group rules
- Create cache module (ElastiCache)
- Redis cluster
- Subnet group
- Create variables and outputs
- Test `terraform plan` and `terraform apply`
- Document infrastructure
What you'll learn:
- Container orchestration on AWS
- ECS task definitions
- ECS services and clusters
- Application Load Balancer (ALB)
- Service discovery
- Auto-scaling policies
- CloudWatch integration
Architecture:
Internet
  ↓
Application Load Balancer (ALB)
  ↓
ECS Fargate Tasks (your Go app - auto-scaled)
  ↓
  ├── RDS PostgreSQL (private subnet)
  └── ElastiCache Redis (private subnet)
Implementation Tasks:
- Create ECR repository for Docker images
- Create ECS cluster
- Write ECS task definition (JSON)
- Define container specs
- Set environment variables
- Configure secrets (from Secrets Manager)
- Set health check command
- Create Application Load Balancer
- Configure target group
- Set up health checks
- Configure listeners (HTTP/HTTPS)
- Create ECS service
- Link to task definition
- Configure desired count
- Set up service discovery
- Configure auto-scaling
- Target tracking scaling (CPU/memory)
- Request count per target
- Set up CloudWatch log groups
- Test deployment
- Configure custom domain (Route 53)
Terraform Modules:
- ALB module
- ECS cluster module
- ECS task definition module
- ECS service module
What you'll learn:
- EKS cluster provisioning with Terraform
- AWS-specific Kubernetes integrations
- AWS Load Balancer Controller
- EKS IAM roles for service accounts (IRSA)
- Amazon EBS CSI driver for storage
- AWS Secrets Manager integration
- EKS managed node groups
- Cluster autoscaler
- Cost optimization strategies
Why Choose EKS:
- ✅ You already know Kubernetes (Phase 5)
- ✅ Portable skills (works on any K8s cluster)
- ✅ More control and flexibility
- ✅ Strong ecosystem (Helm, operators, etc.)
- ✅ Multi-cloud strategy possible
- ⚠️ More complex than ECS
- ⚠️ More expensive (control plane + nodes)
Architecture:
Internet
  ↓
AWS Load Balancer (ALB - created by Ingress)
  ↓
EKS Cluster
├─ Control Plane (AWS managed)
└─ Worker Nodes (EC2 instances - auto-scaled)
   ├─ doit-api Pods (3+ replicas)
   ├─ Ingress Controller Pods
   └─ Monitoring Pods (Prometheus, Grafana)

Connected to:
├── RDS PostgreSQL (private subnet)
├── ElastiCache Redis (private subnet)
└── AWS Secrets Manager
Implementation Tasks:
- Create VPC module for EKS
```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "doit-eks-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = false # High availability
  enable_dns_hostnames = true

  # Tags for EKS
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}
```

- Create EKS cluster module
```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "doit-eks"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # OIDC provider for IRSA
  enable_irsa = true

  # Managed node groups
  eks_managed_node_groups = {
    general = {
      desired_size = 2
      min_size     = 2
      max_size     = 10

      instance_types = ["t3.medium"]
      capacity_type  = "ON_DEMAND"

      labels = {
        role = "general"
      }
      tags = {
        Environment = "production"
      }
    }
  }

  # Cluster add-ons
  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      most_recent = true
    }
  }
}
```

- Apply Terraform:

```bash
cd infrastructure/terraform/environments/prod
terraform init
terraform plan
terraform apply
```

- Configure kubectl:

```bash
aws eks update-kubeconfig --name doit-eks --region us-east-1
kubectl get nodes
```

What it does: Creates AWS ALB/NLB from Kubernetes Ingress
- Create IAM role for controller (IRSA):

```hcl
module "aws_load_balancer_controller_irsa_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name                              = "aws-load-balancer-controller"
  attach_load_balancer_controller_policy = true

  oidc_providers = {
    ex = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:aws-load-balancer-controller"]
    }
  }
}
```

- Install controller via Helm:

```bash
helm repo add eks https://aws.github.io/eks-charts
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=doit-eks \
  --set serviceAccount.create=true \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::ACCOUNT:role/aws-load-balancer-controller"
```

- Verify installation:

```bash
kubectl get deployment -n kube-system aws-load-balancer-controller
```

- Create production values for EKS (`values-eks-prod.yaml`):
```yaml
image:
  repository: ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/doit-api
  tag: "v1.0.0"

replicaCount: 3

ingress:
  enabled: true
  className: alb # Use AWS ALB
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/healthcheck-path: /health
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:REGION:ACCOUNT:certificate/CERT_ID
  hosts:
    - host: api.doit.example.com
      paths:
        - path: /
          pathType: Prefix

config:
  dbHost: doit-prod.xxxxx.us-east-1.rds.amazonaws.com # RDS endpoint
  dbPort: "5432"
  dbName: doit
  redisAddr: doit-redis.xxxxx.cache.amazonaws.com:6379 # ElastiCache endpoint

# Don't deploy PostgreSQL/Redis in K8s - use AWS managed services
postgresql:
  enabled: false
redis:
  enabled: false

# Use AWS Secrets Manager via External Secrets Operator (see below)
externalSecrets:
  enabled: true
```

- Deploy:

```bash
helm upgrade --install doit-api helm/doit-api \
  -f helm/doit-api/values-eks-prod.yaml \
  -n doit-prod \
  --create-namespace
```

- Verify:

```bash
kubectl get all -n doit-prod
kubectl get ingress -n doit-prod
```

Why: Store secrets in AWS Secrets Manager, not in K8s Secrets
- Install External Secrets Operator:

```bash
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets \
  -n external-secrets-system \
  --create-namespace
```

- Create IAM role for External Secrets (IRSA)
- Create SecretStore:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: doit-prod
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
```

- Create ExternalSecret:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: doit-secrets
  namespace: doit-prod
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: doit-secrets
    creationPolicy: Owner
  data:
    - secretKey: dbPassword
      remoteRef:
        key: doit/prod/database
        property: password
    - secretKey: jwtSecret
      remoteRef:
        key: doit/prod/jwt
        property: secret
```

What it does: Automatically adds/removes nodes based on demand
- Create IAM role for Cluster Autoscaler (IRSA)
- Install via Helm:

```bash
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  -n kube-system \
  --set autoDiscovery.clusterName=doit-eks \
  --set awsRegion=us-east-1 \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::ACCOUNT:role/cluster-autoscaler"
```

- Test autoscaling:

```bash
# Scale up workload
kubectl scale deployment doit-api -n doit-prod --replicas=20
# Watch nodes being added
kubectl get nodes --watch
```

- Install kube-prometheus-stack (if not already installed from Phase 5):
```bash
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring \
  --create-namespace \
  -f values-prometheus-eks.yaml
```

- Expose Grafana via Ingress:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - host: grafana.doit.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kube-prometheus-grafana
                port:
                  number: 80
```

- Access Grafana at https://grafana.doit.example.com
- Use Spot Instances for non-critical workloads:

```hcl
eks_managed_node_groups = {
  spot = {
    desired_size = 2
    min_size     = 1
    max_size     = 10

    instance_types = ["t3.medium", "t3a.medium"]
    capacity_type  = "SPOT"

    labels = {
      role = "spot"
    }
    taints = [{
      key    = "spot"
      value  = "true"
      effect = "NoSchedule"
    }]
  }
}
```

- Configure pod tolerations for spot:

```yaml
tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```

- Use Karpenter for more advanced autoscaling (optional)
- Set resource requests accurately
- Use Horizontal Pod Autoscaler (already configured in Phase 5)
- Use Vertical Pod Autoscaler for right-sizing
- Enable EKS control plane logging
- Configure pod security standards
- Set up network policies
- Enable EKS Secrets encryption (KMS)
- Configure IAM roles for service accounts (IRSA) for all workloads
- Set up AWS CloudWatch Container Insights
- Configure cluster and node backups (Velero)
- Set up multi-AZ deployment
- Document incident response procedures
- Test disaster recovery
Deliverable: Production-grade Kubernetes cluster on AWS with your application running! 🎉
Comparison: ECS vs EKS
| Aspect | ECS Fargate (6.3) | EKS (6.4) |
|---|---|---|
| Complexity | Lower | Higher |
| Cost | $$ | $$$ |
| Learning Curve | Easier | Steeper |
| Portability | AWS only | Any cloud |
| Control | Less | More |
| Ecosystem | Limited | Rich (Helm, Operators) |
| Best For | AWS-first teams | K8s-first teams |
Recommendation:
- Choose ECS if you want simpler ops and AWS lock-in is OK
- Choose EKS if you want Kubernetes skills and multi-cloud portability
- Create PostgreSQL RDS instance
- Configure automated backups
- Set up Multi-AZ for high availability
- Configure parameter groups
- Set up read replica (optional, costs extra)
- Connect app to RDS
- Create Redis cluster
- Configure cluster mode (disabled for free tier)
- Set up parameter groups
- Connect app to ElastiCache
- Create secrets in AWS Secrets Manager:
- Database credentials
- JWT secret
- Redis password
- Configure ECS task to fetch secrets
- Update app to use secrets from env vars
- Configure log groups for ECS tasks
- Set up log retention policies
- Create CloudWatch dashboards
- Set up alarms:
- High CPU usage
- High memory usage
- HTTP 5xx errors
- Database connection errors
- Configure SNS for alarm notifications
- Configure security groups (least privilege)
- Set up AWS WAF (Web Application Firewall)
- Enable VPC Flow Logs
- Configure AWS Shield (DDoS protection)
- Set up AWS Config rules
Deliverable: Fully deployed, production-ready app on AWS! 🎉
Duration: Weeks 9-11
Theme: Automate everything
Current State: CI only (testing, security scanning)
Goal: Full CI/CD with automated deployments
Pipeline Flow:
Code Push to GitHub
  ↓
GitHub Actions CI:
1. Run tests ✅
2. Security scan ✅
3. Code generation verification ✅
4. Build Docker image
5. Push to ECR
6. Update ECS task definition (or K8s manifests)
7. Deploy to dev environment
8. Run smoke tests
9. (Manual approval for prod)
10. Deploy to production
11. Run smoke tests
12. Rollback if failed
Implementation Tasks:
- Add Docker build step to CI
- Configure AWS credentials in GitHub secrets
- Add ECR push step
- Create deployment job (separate from CI)
- Add environment-specific workflows (dev, staging, prod)
- Implement smoke tests (health check after deploy)
- Add rollback automation
- Set up deployment approvals for prod
- Add deployment notifications (Slack, email)
- Create deployment dashboards
Advanced:
- Blue-green deployments
- Canary deployments (10% → 50% → 100%)
- Feature flags for gradual rollouts
What you'll learn:
- Running migrations in production safely
- Zero-downtime migration patterns
- Rollback strategies
- Migration automation
Decision: Use golang-migrate CLI in CD pipeline
Implementation Tasks:
- Add migration step to CD pipeline
- Run migrations before deploying new app version
- Implement safe migration patterns:
- Backward compatible migrations
- Separate data from schema changes
- Test rollback scenarios
- Add migration health checks
- Document migration process
Migration Patterns:
- Additive changes (add column with default)
- Expanding then contracting (multi-step changes)
- Data migrations in separate steps
What you'll learn:
- Multi-environment strategy (dev, staging, prod)
- Configuration management
- Secrets per environment
- Environment parity
Environments to Set Up:
- Dev (development, auto-deploy from main)
- Staging (pre-production, auto-deploy from releases)
- Production (manual approval required)
Configuration:
- Use Terraform workspaces or separate state files
- Environment-specific variables (AWS SSM Parameter Store)
- Separate databases per environment
- Separate AWS accounts (best practice) or VPCs
- Document promotion process (dev → staging → prod)
What you'll learn:
- RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
- Backup strategies
- Point-in-time recovery
- Multi-AZ and Multi-Region
Implementation Tasks:
- Enable automated RDS backups (daily)
- Test RDS restore from backup
- Set up RDS snapshots before major changes
- Configure Redis persistence (AOF or RDB)
- Document disaster recovery procedures
- Test failover scenarios
- Set up Multi-AZ for RDS (high availability)
- (Optional) Set up cross-region replication
Recovery Testing:
- Test database restore
- Test application recovery
- Measure actual RTO and RPO
- Document lessons learned
Duration: Weeks 11-13
Theme: Scale and resilience patterns
What you'll learn:
- Gateway as single entry point
- Request routing and transformation
- Rate limiting at edge
- Authentication at gateway
Options:
- AWS API Gateway (managed service)
- Build your own simple gateway (learning exercise)
Implementation Tasks:
- Create gateway service
- Implement request routing
- Add rate limiting at gateway level
- Move JWT validation to gateway
- Add request/response transformation
- Implement API versioning (v1, v2)
- Add CORS handling
- Test gateway under load
What you'll learn:
- Asynchronous processing
- Message queues (SQS)
- Worker patterns
- Dead letter queues
- Retry strategies with exponential backoff
Use Cases:
- Send email notifications (don't block HTTP requests)
- Update analytics (eventual consistency is fine)
- Trigger webhooks
- Image processing (if you add file uploads)
Implementation Tasks:
- Set up AWS SQS queues:
- Main queue (email notifications)
- Dead letter queue (failed jobs)
- Create worker service
- Implement job processors:
- Email sender
- Analytics updater
- Webhook dispatcher
- Add retry logic with exponential backoff
- Monitor queue depth (CloudWatch)
- Set up auto-scaling for workers (based on queue depth)
- Test failure scenarios
Architecture:
HTTP Request (Create Todo)
  ↓
API: Save to DB, return 201
  ↓
Publish to SQS queue
  ↓
Worker: Process async (send email, update analytics)
Advanced (Optional):
- Use AWS SNS for pub/sub (fan-out pattern)
- Implement priority queues
- Add scheduled jobs (cron-like)
What you'll learn:
- Protecting services from overload
- Circuit breaker pattern (prevent cascading failures)
- Bulkhead pattern (isolate failures)
- Retry with exponential backoff (you have this in `pkg/retry`)
Rate Limiting:
- Implement rate limiting per user (100 req/min)
- Implement rate limiting per IP (1000 req/min)
- Use Redis for distributed rate limiting
- Add rate limit headers (X-RateLimit-Remaining)
- Return 429 Too Many Requests with Retry-After
Circuit Breaker:
- Add circuit breaker for database calls
- Add circuit breaker for Redis calls
- Add circuit breaker for external APIs (if you add any)
- Configure thresholds (fail 5 times → open circuit for 30s)
- Add health checks that report circuit state
- Test behavior during failures
Library: Use sony/gobreaker or build your own
What you'll learn:
- Deploy code without releasing features
- A/B testing
- Gradual rollouts
- Kill switches for problematic features
Implementation Tasks:
- Create feature flag service
- Store flags in database or AWS AppConfig
- Implement flag evaluation
- Add flags to key features:
- New caching layer (toggle on/off)
- Event publishing (toggle on/off)
- New API endpoints (beta access)
- Create admin API to toggle flags
- Add flag status to health check
- Document flag lifecycle
Use Cases:
- Beta features for specific users
- Gradual rollout (5% → 25% → 50% → 100%)
- Kill switch for buggy features
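Flag evaluation for a percentage rollout is mostly deterministic bucketing: hash the (flag, user) pair so the same user always lands in the same bucket, making a 25% rollout stable rather than random per request. A self-contained sketch (the in-memory map stands in for the database or AWS AppConfig source of truth):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Flags maps flag name -> rollout percentage (0-100).
type Flags struct {
	rollout map[string]uint32
}

// Enabled reports whether flag is on for userID. Hashing flag+user keeps
// the decision deterministic, so a gradual rollout doesn't flicker.
func (f *Flags) Enabled(flag, userID string) bool {
	pct, ok := f.rollout[flag]
	if !ok {
		return false // unknown or deleted flags default to off (kill switch)
	}
	h := fnv.New32a()
	h.Write([]byte(flag + ":" + userID))
	return h.Sum32()%100 < pct
}

func main() {
	f := &Flags{rollout: map[string]uint32{
		"new-cache": 100, // fully rolled out
		"beta-api":  0,   // killed
	}}
	fmt.Println(f.Enabled("new-cache", "user-42")) // true
	fmt.Println(f.Enabled("beta-api", "user-42"))  // false
}
```

Including the flag name in the hash means different flags roll out to different user slices, which avoids always beta-testing on the same cohort.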
What you'll learn:
- Active-active vs active-passive
- Data replication strategies
- Latency-based routing (Route 53)
- Conflict resolution (last-write-wins, CRDTs)
- Global load balancing
Planning Tasks:
- Document multi-region strategy
- Identify stateless vs stateful components
- Plan database replication (RDS cross-region read replica)
- Plan cache replication (Redis Global Datastore)
- Document trade-offs (consistency vs availability)
- Design conflict resolution strategy
- Calculate costs for multi-region
Optional Implementation:
- Deploy to second AWS region (e.g., us-west-2)
- Set up Route 53 latency-based routing
- Configure cross-region RDS replica
- Test failover scenarios
Duration: Weeks 11-12+
Theme: Operating at scale
What you'll learn:
- Profiling Go applications
- Database query optimization
- Connection pooling tuning
- Memory optimization
Tasks:
- Set up Go profiling (pprof)
- Profile CPU usage under load
- Profile memory allocations
- Identify slow database queries (pg_stat_statements)
- Add database indexes where needed
- Optimize connection pool settings
- Reduce allocations in hot paths
- Benchmark improvements
What you'll learn:
- Load testing tools (k6, Gatling)
- Identifying bottlenecks
- Capacity planning
- Auto-scaling tuning
Tasks:
- Install k6 or similar tool
- Create load test scenarios:
- Steady load (100 RPS)
- Spike test (0 → 1000 RPS)
- Soak test (sustained load for 1 hour)
- Run tests against staging
- Analyze results (latency, error rate, throughput)
- Identify bottlenecks
- Tune auto-scaling policies
- Test again, iterate
What you'll learn:
- Right-sizing instances
- Spot instances for non-critical workloads
- Reserved capacity planning
- Cost monitoring and alerts
Tasks:
- Analyze AWS Cost Explorer
- Identify biggest cost drivers
- Right-size RDS instances (don't over-provision)
- Use Spot instances for workers
- Consider Reserved Instances (if usage is stable)
- Set up cost anomaly detection
- Implement cost allocation tags
- Document cost optimization strategies
What you'll learn:
- Penetration testing basics
- OWASP Top 10 mitigation
- Security scanning automation
- Compliance frameworks
Tasks:
- Run OWASP ZAP security scan
- Fix any found vulnerabilities
- Implement security headers:
- X-Content-Type-Options
- X-Frame-Options
- Strict-Transport-Security
- Content-Security-Policy
- Enable AWS GuardDuty (threat detection)
- Set up AWS Security Hub
- Review IAM policies (principle of least privilege)
- Rotate secrets regularly (automate with Lambda)
- Document security practices
What you'll learn:
- Audit logging
- Compliance frameworks (SOC2, GDPR concepts)
- Data retention policies
- Access controls
Tasks:
- Implement comprehensive audit logging
- Log all data mutations (who, what, when)
- Set up log retention policies
- Implement GDPR-style data export
- Implement data deletion (right to be forgotten)
- Document data handling procedures
- Review access controls
- Create compliance documentation
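"Who, what, when" audit logging can ride on the same structured-logging pipeline built in Phase 3. A minimal sketch using `log/slog` (the event shape and field names are illustrative, not the project's actual model):

```go
package main

import (
	"log/slog"
	"os"
	"time"
)

// AuditEvent captures who did what, when, to which resource - the three
// fields the compliance checklist asks for on every data mutation.
type AuditEvent struct {
	Actor    string
	Action   string
	Resource string
	At       time.Time
}

func newAuditEvent(actor, action, resource string) AuditEvent {
	return AuditEvent{Actor: actor, Action: action, Resource: resource, At: time.Now().UTC()}
}

// logAudit ships the event through slog so audit entries share the same
// structured pipeline (and retention policy) as application logs.
func logAudit(l *slog.Logger, e AuditEvent) {
	l.Info("audit",
		slog.String("actor", e.Actor),
		slog.String("action", e.Action),
		slog.String("resource", e.Resource),
		slog.Time("at", e.At),
	)
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	logAudit(logger, newAuditEvent("user-42", "todo.delete", "todos/1337"))
}
```

Calling this from the service layer (not the HTTP layer) ensures mutations are recorded regardless of which endpoint triggered them.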
- API Documentation Guide - Swagger/OpenAPI implementation
- Security Implementation Guide - OWASP Top 10 & best practices
- Security Summary - Quick security reference
- OWASP Top 10 Guide - Comprehensive OWASP coverage
- Model Pattern - Domain model architecture
- Health Endpoints Implementation
- Health Endpoints Documentation
- Health Quick Reference
Docker Multi-Stage Build:
- Complete Implementation Guide - Detailed walkthrough
- Quick Reference - Essential commands
- Visual Guide - Architecture diagrams
- Docker README - Docker documentation index
Docker Compose:
- Mental Model Guide - Conceptual understanding
- Implementation Guide - Complete walkthrough
- Quick Reference - Command cheat sheet
Phase Summaries:
- Phase 2 Completion Summary - Overall results
- Kubernetes Roadmap - Future Kubernetes plans
Mental Model Guides:
- Prometheus Mental Model - Metrics concepts & PromQL
- Grafana Mental Model - Dashboards & visualization
- Distributed Tracing Mental Model - OpenTelemetry & Jaeger
- Observability Overview - Three pillars summary
Implementation Files:
- `internal/tracing/tracing.go` - OpenTelemetry provider setup
- `internal/tracing/helpers.go` - Tracing helper functions
- `internal/middlewares/tracing_middleware.go` - HTTP tracing middleware
- `infra/docker/grafana/provisioning-dev/datasources/prometheus.yaml` - Grafana datasources (Prometheus + Jaeger)
- `infra/docker/grafana/dashboards/api-overview.json` - API metrics dashboard
- `infra/docker/grafana/dashboards/database-performance.json` - Database metrics dashboard
Development Workflow:
- `docker-compose.infra.yml` - Infrastructure-only compose for hybrid development
- `env.local.example` - Local environment variables template
- Migration Guide - Database migrations
- Seeding Guide - Database seeding
- Seeding Setup Summary
- SQLC/PGX Cheatsheet - Database queries
- Quick Reference - Main project reference
- Real World Examples - Practical examples
- Integration Example - Integration testing
- Learning Methodology Prompt - How to learn with AI assistance
- "Designing Data-Intensive Applications" by Martin Kleppmann (architecture patterns - MUST READ)
- "The Phoenix Project" by Gene Kim (DevOps culture and practices)
- "Site Reliability Engineering" by Google (SRE practices)
- "Release It!" by Michael Nygard (production-ready software)
- "Building Microservices" by Sam Newman (distributed systems)
- "Domain-Driven Design" by Eric Evans (software architecture)
- AWS Skill Builder (free courses)
- AWS Well-Architected Framework (read this!)
- AWS Solutions Library (reference architectures)
- AWS Whitepapers (security, performance, cost optimization)
- AWS re:Invent videos on YouTube
- Go official blog (concurrency patterns)
- Effective Go (official guide)
- Practical Go (Dave Cheney's blog)
- The Twelve-Factor App (methodology)
- Terraform documentation and tutorials
- Docker documentation
- Kubernetes documentation (kubernetes.io)
- CNCF landscape (cloud native tools)
- Prometheus documentation
- Grafana tutorials
- OpenTelemetry documentation
- Google's SRE books (free online)
Since you want to learn DevOps, Backend Architecture, and AWS, here's the optimal path:
- Authentication (Phase 1.1) - 4 days
- Health checks (Phase 1.3) - 1 day
- Docker + Docker Compose (Phase 2) - 3 days
- Observability basics (Phase 3.1-3.2) - 4 days
Result: Secure API with monitoring, running in containers
- Redis caching (Phase 4.1) - 3 days
- Repository pattern (Phase 4.2) - 2 days
- Events (Phase 4.4) - 3 days
- API docs (Phase 1.2) - 2 days
Result: Well-architected, documented API with caching
- Terraform basics (Phase 5.2) - 5 days
- Deploy to ECS (Phase 5.3) - 7 days
- Monitoring on AWS (Phase 5.5) - 2 days
Result: Production app running on AWS!
Pick what interests you most from Phases 6-8
- ECS: 50 GB/month free
- RDS: 750 hours/month t2.micro or t3.micro
- ElastiCache: 750 hours/month t2.micro or t3.micro
- ALB: 15 LCUs per month
- CloudWatch: 10 custom metrics
- Minimal: $20-30/month (single small instance)
- Dev environment: $50-75/month
- Production-like: $150-200/month (multi-AZ, monitoring, etc.)
- Use Terraform to destroy environments when not in use
- Use AWS Budgets and alerts
- Start with smallest instance sizes
- Use Spot instances for workers
By the end of this roadmap, you will have:
- ✅ Production-ready Go REST API with authentication
- ✅ Comprehensive observability (logs, metrics, traces)
- ✅ Full Docker and Docker Compose setup
- ✅ Applied multiple architecture patterns (CQRS, events, caching, repository)
- ✅ Deployed to AWS with IaC (Terraform)
- ✅ CI/CD pipeline with automated testing and deployment
- ✅ Understanding of 10+ AWS services
- ✅ Real-world DevOps experience
- ✅ Portfolio project to show employers
- ✅ Deep understanding of production systems
Update this file as you go!
- Mark checkboxes as you complete tasks
- Add notes on what you learned
- Document challenges and solutions
- Track time spent on each phase
- Celebrate wins! 🎉
Additional Tracking:
- Keep a learning journal (daily or weekly)
- Take notes on problems you solved
- Document architecture decisions (ADRs)
- Build a portfolio README showcasing what you built
Pick a phase and dive in! I recommend starting with Phase 1.1 (Authentication) - it's immediately useful and touches all layers of your application.
Good luck on your learning journey! 🚀
Last Updated: December 26, 2025
Project: doit (Go REST API with PostgreSQL)
Focus: Backend Architecture • DevOps • AWS • Containerization • Observability