Skip to content

bhaktofmahakal/microservices-YC

Repository files navigation

Microservices Observability Stack

Kubernetes Prometheus Grafana Loki Go License

A production-grade Kubernetes observability stack built on top of Google's microservices-demo. Provides full-stack visibility — metrics, logs, and dashboards — with custom application instrumentation for the productcatalogservice.


Table of Contents


Architecture

┌─────────────────────────────────────────────────────────┐
│                    Kubernetes Cluster                    │
│                                                         │
│  ┌──────────────── hipster-shop namespace ───────────┐  │
│  │  productcatalogservice (:8080 gRPC, :8888 metrics) │  │
│  │  + 10 other microservices (frontend, cart, …)     │  │
│  └────────────────────────────────────────────────────┘  │
│                          │ scrape /metrics               │
│  ┌──────────────── monitoring namespace ─────────────┐  │
│  │  Prometheus  ◄── ServiceMonitor (30s interval)    │  │
│  │  Grafana     ←── dashboards (drdroid-dashboard)   │  │
│  │  Loki        ←── Promtail (pod log shipping)      │  │
│  └────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
Layer Component Role
Orchestration Kubernetes (Kind) Container scheduling & service discovery
Application microservices-demo (Hipster Shop) 11-service e-commerce workload
Metrics Prometheus + kube-prometheus-stack Collection, storage, alerting
Visualization Grafana Dashboards & alerting UI
Logs Loki + Promtail Log aggregation & querying
Instrumentation Go promhttp in productcatalogservice Custom process & runtime metrics

Key Features

  • Custom Application Metrics — Process memory, goroutines, and GC pause times from productcatalogservice via a dedicated /metrics endpoint.
  • Kubernetes Infrastructure Metrics — Pod CPU/memory and node-level metrics via kube-prometheus-stack.
  • Centralized Logging — Structured log aggregation from all pods via Loki + Promtail.
  • Pre-built Grafana Dashboard — 7-panel dashboard covering application, infrastructure, and log observability (importable from drdroid-dashboard.json).
  • Automatic Service Discovery — Prometheus discovers the metrics endpoint through a ServiceMonitor CRD — no static scrape config required.

Prerequisites

Ensure the following tools are installed and available in your PATH before proceeding:

Tool Minimum Version Install
Docker 20.10+ docs.docker.com
kubectl 1.27+ kubernetes.io
Kind 0.20+ kind.sigs.k8s.io
Helm 3.12+ helm.sh

Note: Any CNCF-conformant Kubernetes cluster (EKS, GKE, AKS, k3s, minikube) can be used in place of Kind.


Quick Start

1. Create a local cluster

kind create cluster --name observability
kubectl config use-context kind-observability

2. Create namespaces

kubectl create namespace hipster-shop
kubectl create namespace monitoring

3. Deploy the microservices

# Clone Google's microservices-demo if not already present
git clone https://github.com/GoogleCloudPlatform/microservices-demo.git

kubectl apply -f microservices-demo/release/kubernetes-manifests.yaml -n hipster-shop

4. Add Helm repositories

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

5. Install the monitoring stack

# kube-prometheus-stack (Prometheus + Grafana + Alertmanager)
helm install kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword="${GRAFANA_ADMIN_PASSWORD:-changeme}" \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

# Loki log aggregation (reuses existing Grafana instance)
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set grafana.enabled=false \
  --set promtail.enabled=true

6. Apply the ServiceMonitor

kubectl apply -f productcatalogservice-servicemonitor.yaml

7. Import the Grafana dashboard

# Port-forward Grafana
kubectl port-forward svc/kps-grafana 3000:80 -n monitoring &

# Import dashboard via API
curl -s -u "admin:${GRAFANA_ADMIN_PASSWORD:-changeme}" \
  -X POST http://localhost:3000/api/dashboards/import \
  -H "Content-Type: application/json" \
  -d "{\"dashboard\": $(cat drdroid-dashboard.json), \"overwrite\": true, \"folderId\": 0}"

8. Access Grafana

Open http://localhost:3000 and log in with admin / <GRAFANA_ADMIN_PASSWORD>.


Production Deployment

Resource Limits

Always set resource requests and limits for monitoring components to protect cluster stability:

helm upgrade kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.resources.requests.cpu=200m \
  --set prometheus.prometheusSpec.resources.requests.memory=512Mi \
  --set prometheus.prometheusSpec.resources.limits.cpu=1000m \
  --set prometheus.prometheusSpec.resources.limits.memory=2Gi \
  --set grafana.resources.requests.cpu=100m \
  --set grafana.resources.requests.memory=128Mi

Persistent Storage

Enable persistent volumes so metrics and dashboards survive pod restarts:

helm upgrade kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=20Gi \
  --set grafana.persistence.enabled=true \
  --set grafana.persistence.size=5Gi

High Availability

For production clusters, run multiple replicas of Grafana and use Thanos or Cortex for Prometheus HA:

helm upgrade kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.replicas=2

TLS / Ingress

Expose Grafana behind an Ingress controller with TLS rather than using kubectl port-forward:

# Example Ingress (cert-manager + nginx-ingress)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts: [grafana.example.com]
      secretName: grafana-tls
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kps-grafana
                port:
                  number: 80

Alerting

Configure Alertmanager to route critical alerts to your notification channel (Slack, PagerDuty, email):

helm upgrade kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set alertmanager.config.global.slack_api_url="https://hooks.slack.com/services/YOUR/WEBHOOK"

Custom Instrumentation

productcatalogservice metrics endpoint

productcatalogservice exposes Prometheus metrics on port 8888 via a lightweight HTTP server added to server.go:

go func() {
    http.Handle("/metrics", promhttp.Handler())
    log.Infof("Starting metrics server on :8888")
    if err := http.ListenAndServe(":8888", nil); err != nil {
        log.Errorf("Metrics server error: %v", err)
    }
}()

Exposed metrics:

Metric Type Description
process_resident_memory_bytes Gauge Resident set size in bytes
process_virtual_memory_bytes Gauge Virtual memory size in bytes
go_goroutines Gauge Number of active goroutines
go_gc_duration_seconds Summary GC stop-the-world pause durations
go_threads Gauge Number of OS threads created

ServiceMonitor

The productcatalogservice-servicemonitor.yaml file registers the metrics endpoint with Prometheus Operator so that Prometheus automatically discovers and scrapes it every 30 seconds — no manual scrape config required.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: productcatalogservice-monitor
  namespace: hipster-shop
  labels:
    release: kps          # must match kube-prometheus-stack release name
spec:
  selector:
    matchLabels:
      app: productcatalogservice
  endpoints:
    - port: metrics
      interval: 30s

If your Helm release name differs from kps, update the release label accordingly.


Dashboard

The drdroid-dashboard.json file contains a ready-to-import Grafana dashboard with seven panels:

# Panel Data Source
1 Product Catalog — Resident Memory Prometheus
2 Product Catalog — Virtual Memory Prometheus
3 Product Catalog — Active Goroutines Prometheus
4 Product Catalog — GC 99th Percentile Pause Prometheus
5 All Pods — CPU Usage Rate Prometheus
6 All Pods — Memory Usage Prometheus
7 Application Logs — All Microservices Loki

Import via Grafana UI: Dashboards → Import → Upload JSON file → select drdroid-dashboard.json.


Verification

Check Prometheus targets

kubectl port-forward svc/kps-kube-prometheus-stack-prometheus 9090:9090 -n monitoring

Open http://localhost:9090/targets and verify productcatalogservice-metrics shows UP.

Query custom metrics

# Resident memory
process_resident_memory_bytes{job="productcatalogservice-metrics"}

# GC rate
rate(go_gc_duration_seconds_count{job="productcatalogservice-metrics"}[5m])

# Active goroutines
go_goroutines{job="productcatalogservice-metrics"}

Query logs

# All hipster-shop logs
{namespace="hipster-shop"}

# Product catalog logs only
{namespace="hipster-shop", pod=~"productcatalogservice-.*"}

Validate metrics endpoint directly

POD=$(kubectl get pod -n hipster-shop -l app=productcatalogservice -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n hipster-shop "$POD" -- wget -qO- http://localhost:8888/metrics | head -20

Configuration Reference

Parameter Default Description
GRAFANA_ADMIN_PASSWORD changeme Grafana admin password (set via env var or Helm value)
Prometheus scrape interval 30s Configurable per ServiceMonitor .spec.endpoints[].interval
Prometheus retention 10d Set via --set prometheus.prometheusSpec.retention=30d
Loki retention 744h (31 days) Set via --set loki.config.table_manager.retention_period=720h
Metrics port 8888 Port exposed by productcatalogservice for /metrics

Troubleshooting

Metrics not appearing in Prometheus

# 1. Confirm the ServiceMonitor exists
kubectl get servicemonitor -n hipster-shop

# 2. Confirm productcatalogservice pods are running
kubectl get pods -n hipster-shop -l app=productcatalogservice

# 3. Verify the /metrics endpoint is reachable inside the cluster
POD=$(kubectl get pod -n hipster-shop -l app=productcatalogservice -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n hipster-shop "$POD" -- wget -qO- http://localhost:8888/metrics | grep process_

# 4. Check Prometheus is discovering the target
kubectl port-forward svc/kps-kube-prometheus-stack-prometheus 9090:9090 -n monitoring
# Open http://localhost:9090/targets

Logs not appearing in Loki

# 1. Check Loki and Promtail pods are healthy
kubectl get pods -n monitoring -l app=loki
kubectl get pods -n monitoring -l app=promtail

# 2. Check Promtail logs for shipping errors
kubectl logs -n monitoring -l app=promtail --tail=50

# 3. Verify Loki data source in Grafana
# Grafana → Configuration → Data Sources → Loki → Test

Grafana shows "No data"

  • Confirm the Loki data source URL matches the service: http://loki:3100
  • Verify the Prometheus data source URL: http://kps-kube-prometheus-stack-prometheus:9090
  • Check that the dashboard time range covers a period when metrics were being generated.

Kind cluster out of resources

# Check node resource usage
kubectl top nodes

# Describe a pod that is Pending
kubectl describe pod <pod-name> -n hipster-shop

Security Considerations

  • Change default credentials — replace the default changeme Grafana password with a strong, unique secret. Store it in a Kubernetes Secret and reference it in Helm:

    # Create the secret first
    kubectl create secret generic grafana-admin-secret \
      --namespace monitoring \
      --from-literal=admin-password="$(openssl rand -base64 24)"
    
    # Reference the secret in Helm (instead of --set grafana.adminPassword)
    helm install kps prometheus-community/kube-prometheus-stack \
      --namespace monitoring \
      --set grafana.admin.existingSecret=grafana-admin-secret \
      --set grafana.admin.passwordKey=admin-password
  • Restrict network access — do not expose Prometheus, Grafana, or Loki directly on public IPs. Use an authenticated Ingress or a VPN.

  • RBACkube-prometheus-stack creates the necessary RBAC resources; avoid granting cluster-admin to monitoring service accounts.

  • TLS — terminate TLS at the Ingress layer (see Production Deployment → TLS / Ingress).

  • Image scanning — regularly scan all container images for CVEs using tools such as Trivy or Grype.

  • Secrets rotation — rotate Grafana admin passwords and any webhook tokens on a regular schedule.


Contributing

Contributions are welcome. Please follow these steps:

  1. Fork the repository and create a feature branch: git checkout -b feature/your-change
  2. Make your changes with clear, descriptive commits.
  3. Ensure any new Kubernetes manifests pass kubectl apply --dry-run=client -f <file>.
  4. Open a Pull Request with a clear description of the problem and solution.

Project Structure

.
├── drdroid-dashboard.json                  # Grafana dashboard (7 panels)
├── productcatalogservice-servicemonitor.yaml  # Prometheus ServiceMonitor CRD
└── README.md                               # This file

Tech Stack

Component Purpose Version
Kubernetes Container orchestration 1.27+
Kind Local cluster provisioning 0.20+
Prometheus Metrics collection & storage 2.45+
Grafana Visualization & dashboards 10.0+
Loki Log aggregation 2.9+
Promtail Log shipping agent 2.9+
Go productcatalogservice runtime 1.21
Helm Package management 3.12+

License

This project builds on Google Cloud Platform's microservices-demo, which is licensed under the Apache License 2.0.

Additions and modifications in this repository are also released under the Apache License 2.0.

About

A complete Kubernetes-based observability solution for Google's microservices-demo with custom application instrumentation, centralized logging, and real-time dashboards.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors