A production-grade Kubernetes observability stack built on top of Google's microservices-demo. Provides full-stack visibility — metrics, logs, and dashboards — with custom application instrumentation for the productcatalogservice.
## Table of Contents

- Architecture
- Key Features
- Prerequisites
- Quick Start
- Production Deployment
- Custom Instrumentation
- Dashboard
- Verification
- Configuration Reference
- Troubleshooting
- Security Considerations
- Contributing
- License
## Architecture

```text
┌─────────────────────────────────────────────────────────┐
│                   Kubernetes Cluster                    │
│                                                         │
│  ┌──────────────── hipster-shop namespace ───────────┐  │
│  │ productcatalogservice (:8080 gRPC, :8888 metrics) │  │
│  │ + 10 other microservices (frontend, cart, …)      │  │
│  └───────────────────────────────────────────────────┘  │
│                      │ scrape /metrics                  │
│  ┌──────────────── monitoring namespace ─────────────┐  │
│  │ Prometheus ◄── ServiceMonitor (30s interval)      │  │
│  │ Grafana    ◄── dashboards (drdroid-dashboard)     │  │
│  │ Loki       ◄── Promtail (pod log shipping)        │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
```
| Layer | Component | Role |
|---|---|---|
| Orchestration | Kubernetes (Kind) | Container scheduling & service discovery |
| Application | microservices-demo (Hipster Shop) | 11-service e-commerce workload |
| Metrics | Prometheus + kube-prometheus-stack | Collection, storage, alerting |
| Visualization | Grafana | Dashboards & alerting UI |
| Logs | Loki + Promtail | Log aggregation & querying |
| Instrumentation | Go `promhttp` in productcatalogservice | Custom process & runtime metrics |
## Key Features

- **Custom Application Metrics** — process memory, goroutines, and GC pause times from `productcatalogservice` via a dedicated `/metrics` endpoint.
- **Kubernetes Infrastructure Metrics** — pod CPU/memory and node-level metrics via kube-prometheus-stack.
- **Centralized Logging** — structured log aggregation from all pods via Loki + Promtail.
- **Pre-built Grafana Dashboard** — 7-panel dashboard covering application, infrastructure, and log observability (importable from `drdroid-dashboard.json`).
- **Automatic Service Discovery** — Prometheus discovers the metrics endpoint through a `ServiceMonitor` CRD — no static scrape config required.
## Prerequisites

Ensure the following tools are installed and available in your PATH before proceeding:
| Tool | Minimum Version | Install |
|---|---|---|
| Docker | 20.10+ | docs.docker.com |
| kubectl | 1.27+ | kubernetes.io |
| Kind | 0.20+ | kind.sigs.k8s.io |
| Helm | 3.12+ | helm.sh |
Note: Any CNCF-conformant Kubernetes cluster (EKS, GKE, AKS, k3s, minikube) can be used in place of Kind.
## Quick Start

### 1. Create a local cluster

```bash
kind create cluster --name observability
kubectl config use-context kind-observability
```

### 2. Create namespaces

```bash
kubectl create namespace hipster-shop
kubectl create namespace monitoring
```

### 3. Deploy the application

```bash
# Clone Google's microservices-demo if not already present
git clone https://github.com/GoogleCloudPlatform/microservices-demo.git
kubectl apply -f microservices-demo/release/kubernetes-manifests.yaml -n hipster-shop
```

### 4. Install the monitoring stack

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# kube-prometheus-stack (Prometheus + Grafana + Alertmanager)
helm install kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword="${GRAFANA_ADMIN_PASSWORD:-changeme}" \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

# Loki log aggregation (reuses the existing Grafana instance)
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set grafana.enabled=false \
  --set promtail.enabled=true
```

### 5. Register the custom metrics endpoint

```bash
kubectl apply -f productcatalogservice-servicemonitor.yaml
```

### 6. Import the Grafana dashboard

```bash
# Port-forward Grafana
kubectl port-forward svc/kps-grafana 3000:80 -n monitoring &

# Import dashboard via API
curl -s -u "admin:${GRAFANA_ADMIN_PASSWORD:-changeme}" \
  -X POST http://localhost:3000/api/dashboards/import \
  -H "Content-Type: application/json" \
  -d "{\"dashboard\": $(cat drdroid-dashboard.json), \"overwrite\": true, \"folderId\": 0}"
```

Open http://localhost:3000 and log in with `admin` / `<GRAFANA_ADMIN_PASSWORD>`.
## Production Deployment

### Resource requests and limits

Always set resource requests and limits for monitoring components to protect cluster stability:

```bash
helm upgrade kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.resources.requests.cpu=200m \
  --set prometheus.prometheusSpec.resources.requests.memory=512Mi \
  --set prometheus.prometheusSpec.resources.limits.cpu=1000m \
  --set prometheus.prometheusSpec.resources.limits.memory=2Gi \
  --set grafana.resources.requests.cpu=100m \
  --set grafana.resources.requests.memory=128Mi
```

### Persistence

Enable persistent volumes so metrics and dashboards survive pod restarts:
```bash
helm upgrade kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=20Gi \
  --set grafana.persistence.enabled=true \
  --set grafana.persistence.size=5Gi
```

### High availability

For production clusters, run multiple replicas of Grafana and use Thanos or Cortex for Prometheus HA:
```bash
helm upgrade kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.replicas=2
```

### TLS / Ingress

Expose Grafana behind an Ingress controller with TLS rather than using `kubectl port-forward`:
```yaml
# Example Ingress (cert-manager + nginx-ingress)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts: [grafana.example.com]
      secretName: grafana-tls
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kps-grafana
                port:
                  number: 80
```

### Alerting

Configure Alertmanager to route critical alerts to your notification channel (Slack, PagerDuty, email):
```bash
helm upgrade kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set alertmanager.config.global.slack_api_url="https://hooks.slack.com/services/YOUR/WEBHOOK"
```

## Custom Instrumentation

`productcatalogservice` exposes Prometheus metrics on port 8888 via a lightweight HTTP server added to `server.go`:
```go
go func() {
    http.Handle("/metrics", promhttp.Handler())
    log.Infof("Starting metrics server on :8888")
    if err := http.ListenAndServe(":8888", nil); err != nil {
        log.Errorf("Metrics server error: %v", err)
    }
}()
```

Exposed metrics:
| Metric | Type | Description |
|---|---|---|
| `process_resident_memory_bytes` | Gauge | Resident set size in bytes |
| `process_virtual_memory_bytes` | Gauge | Virtual memory size in bytes |
| `go_goroutines` | Gauge | Number of active goroutines |
| `go_gc_duration_seconds` | Summary | GC stop-the-world pause durations |
| `go_threads` | Gauge | Number of OS threads created |
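A scrape of the `/metrics` endpoint returns these metrics in the Prometheus text exposition format: one `name value` sample per line, plus `# HELP`/`# TYPE` comment lines. As a stdlib-only illustration (not the official client library, and limited to un-labelled samples), here is a sketch of pulling one value out of a scraped body:

```go
package main

import (
    "bufio"
    "fmt"
    "strconv"
    "strings"
)

// parseGauge extracts the value of an un-labelled sample (e.g. go_goroutines)
// from a Prometheus text-exposition body. The second return value is false
// when the metric is absent or unparsable.
func parseGauge(body, name string) (float64, bool) {
    sc := bufio.NewScanner(strings.NewReader(body))
    for sc.Scan() {
        line := strings.TrimSpace(sc.Text())
        if line == "" || strings.HasPrefix(line, "#") {
            continue // skip HELP/TYPE comments and blank lines
        }
        fields := strings.Fields(line)
        if len(fields) == 2 && fields[0] == name {
            v, err := strconv.ParseFloat(fields[1], 64)
            return v, err == nil
        }
    }
    return 0, false
}

func main() {
    sample := `# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
process_resident_memory_bytes 2.4576e+07`
    if v, ok := parseGauge(sample, "go_goroutines"); ok {
        fmt.Println(v) // prints 8
    }
}
```

Real consumers should prefer `github.com/prometheus/common/expfmt`, which also handles labelled samples, histograms, and summaries.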
### Service discovery

The `productcatalogservice-servicemonitor.yaml` file registers the metrics endpoint with Prometheus Operator so that Prometheus automatically discovers and scrapes it every 30 seconds — no manual scrape config required.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: productcatalogservice-monitor
  namespace: hipster-shop
  labels:
    release: kps  # must match kube-prometheus-stack release name
spec:
  selector:
    matchLabels:
      app: productcatalogservice
  endpoints:
    - port: metrics
      interval: 30s
```

If your Helm release name differs from `kps`, update the `release` label accordingly.
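Note that `matchLabels` selection is a subset check: the target Service must carry every label in the selector with an identical value, while extra labels on the Service are ignored. A toy Go sketch of the rule:

```go
package main

import "fmt"

// matches reports whether a Service's labels satisfy a matchLabels selector:
// every selector key must be present with an identical value; labels on the
// Service that the selector does not mention are ignored.
func matches(selector, labels map[string]string) bool {
    for k, v := range selector {
        if labels[k] != v {
            return false
        }
    }
    return true
}

func main() {
    selector := map[string]string{"app": "productcatalogservice"}
    svc := map[string]string{"app": "productcatalogservice", "tier": "backend"}
    fmt.Println(matches(selector, svc)) // prints true
}
```

This is why a ServiceMonitor can silently match nothing: a single typo in the Service's `app` label breaks discovery without any error being reported.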
## Dashboard

The `drdroid-dashboard.json` file contains a ready-to-import Grafana dashboard with seven panels:
| # | Panel | Data Source |
|---|---|---|
| 1 | Product Catalog — Resident Memory | Prometheus |
| 2 | Product Catalog — Virtual Memory | Prometheus |
| 3 | Product Catalog — Active Goroutines | Prometheus |
| 4 | Product Catalog — GC 99th Percentile Pause | Prometheus |
| 5 | All Pods — CPU Usage Rate | Prometheus |
| 6 | All Pods — Memory Usage | Prometheus |
| 7 | Application Logs — All Microservices | Loki |
Import via Grafana UI: Dashboards → Import → Upload JSON file → select drdroid-dashboard.json.
## Verification

### Prometheus target

```bash
kubectl port-forward svc/kps-kube-prometheus-stack-prometheus 9090:9090 -n monitoring
```

Open http://localhost:9090/targets and verify `productcatalogservice-metrics` shows **UP**.
### Sample queries

PromQL (Prometheus UI or Grafana Explore):

```promql
# Resident memory
process_resident_memory_bytes{job="productcatalogservice-metrics"}

# GC rate
rate(go_gc_duration_seconds_count{job="productcatalogservice-metrics"}[5m])

# Active goroutines
go_goroutines{job="productcatalogservice-metrics"}
```

LogQL (Grafana Explore with the Loki data source):

```logql
# All hipster-shop logs
{namespace="hipster-shop"}

# Product catalog logs only
{namespace="hipster-shop", pod=~"productcatalogservice-.*"}
```
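The `rate()` function in the GC query computes the per-second increase of a counter over the window, compensating for counter resets (e.g. pod restarts). A simplified Go sketch of that calculation over raw samples (PromQL additionally extrapolates to the window boundaries, which this sketch omits):

```go
package main

import "fmt"

// sample is one scraped counter value with its unix timestamp in seconds.
type sample struct {
    ts    float64
    value float64
}

// counterRate approximates PromQL rate(): the per-second increase of a
// counter across the window. A drop to a lower value is treated as a
// counter reset, so the post-reset value is counted from zero.
func counterRate(samples []sample) float64 {
    if len(samples) < 2 {
        return 0
    }
    increase := 0.0
    for i := 1; i < len(samples); i++ {
        delta := samples[i].value - samples[i-1].value
        if delta < 0 { // counter reset detected
            delta = samples[i].value
        }
        increase += delta
    }
    window := samples[len(samples)-1].ts - samples[0].ts
    return increase / window
}

func main() {
    // Four 30s scrapes of a GC-count counter, with a reset at t=60.
    s := []sample{{0, 100}, {30, 130}, {60, 10}, {90, 40}}
    fmt.Println(counterRate(s)) // (30 + 10 + 30) / 90 seconds
}
```

This is also why graphing a raw counter is rarely useful: the monotonically increasing value hides the rate of change that `rate()` surfaces.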
### Raw endpoint check

```bash
POD=$(kubectl get pod -n hipster-shop -l app=productcatalogservice -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n hipster-shop "$POD" -- wget -qO- http://localhost:8888/metrics | head -20
```

## Configuration Reference

| Parameter | Default | Description |
|---|---|---|
| `GRAFANA_ADMIN_PASSWORD` | `changeme` | Grafana admin password (set via env var or Helm value) |
| Prometheus scrape interval | `30s` | Configurable per ServiceMonitor `.spec.endpoints[].interval` |
| Prometheus retention | `10d` | Set via `--set prometheus.prometheusSpec.retention=30d` |
| Loki retention | `744h` (31 days) | Set via `--set loki.config.table_manager.retention_period=720h` |
| Metrics port | `8888` | Port exposed by productcatalogservice for `/metrics` |
## Troubleshooting

### No metrics from productcatalogservice

```bash
# 1. Confirm the ServiceMonitor exists
kubectl get servicemonitor -n hipster-shop

# 2. Confirm productcatalogservice pods are running
kubectl get pods -n hipster-shop -l app=productcatalogservice

# 3. Verify the /metrics endpoint is reachable inside the cluster
POD=$(kubectl get pod -n hipster-shop -l app=productcatalogservice -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n hipster-shop "$POD" -- wget -qO- http://localhost:8888/metrics | grep process_

# 4. Check Prometheus is discovering the target
kubectl port-forward svc/kps-kube-prometheus-stack-prometheus 9090:9090 -n monitoring
# Open http://localhost:9090/targets
```

### No logs in Loki

```bash
# 1. Check Loki and Promtail pods are healthy
kubectl get pods -n monitoring -l app=loki
kubectl get pods -n monitoring -l app=promtail

# 2. Check Promtail logs for shipping errors
kubectl logs -n monitoring -l app=promtail --tail=50

# 3. Verify the Loki data source in Grafana:
#    Grafana → Configuration → Data Sources → Loki → Test
```

### Empty dashboard panels

- Confirm the Loki data source URL matches the service: `http://loki:3100`
- Verify the Prometheus data source URL: `http://kps-kube-prometheus-stack-prometheus:9090`
- Check that the dashboard time range covers a period when metrics were being generated.

### Pods stuck in Pending

```bash
# Check node resource usage
kubectl top nodes

# Describe a pod that is Pending
kubectl describe pod <pod-name> -n hipster-shop
```
## Security Considerations

- **Change default credentials** — replace the default `changeme` Grafana password with a strong, unique secret. Store it in a Kubernetes `Secret` and reference it in Helm:

  ```bash
  # Create the secret first
  kubectl create secret generic grafana-admin-secret \
    --namespace monitoring \
    --from-literal=admin-password="$(openssl rand -base64 24)"

  # Reference the secret in Helm (instead of --set grafana.adminPassword)
  helm install kps prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --set grafana.admin.existingSecret=grafana-admin-secret \
    --set grafana.admin.passwordKey=admin-password
  ```

- **Restrict network access** — do not expose Prometheus, Grafana, or Loki directly on public IPs. Use an authenticated Ingress or a VPN.
- **RBAC** — `kube-prometheus-stack` creates the necessary RBAC resources; avoid granting `cluster-admin` to monitoring service accounts.
- **TLS** — terminate TLS at the Ingress layer (see Production Deployment → TLS / Ingress).
- **Image scanning** — regularly scan all container images for CVEs using tools such as Trivy or Grype.
- **Secrets rotation** — rotate the Grafana admin password and any webhook tokens on a regular schedule.
## Contributing

Contributions are welcome. Please follow these steps:

- Fork the repository and create a feature branch: `git checkout -b feature/your-change`
- Make your changes with clear, descriptive commits.
- Ensure any new Kubernetes manifests pass `kubectl apply --dry-run=client -f <file>`.
- Open a Pull Request with a clear description of the problem and solution.
Repository layout:

```text
.
├── drdroid-dashboard.json                     # Grafana dashboard (7 panels)
├── productcatalogservice-servicemonitor.yaml  # Prometheus ServiceMonitor CRD
└── README.md                                  # This file
```
| Component | Purpose | Version |
|---|---|---|
| Kubernetes | Container orchestration | 1.27+ |
| Kind | Local cluster provisioning | 0.20+ |
| Prometheus | Metrics collection & storage | 2.45+ |
| Grafana | Visualization & dashboards | 10.0+ |
| Loki | Log aggregation | 2.9+ |
| Promtail | Log shipping agent | 2.9+ |
| Go | `productcatalogservice` runtime | 1.21 |
| Helm | Package management | 3.12+ |
## License

This project builds on Google Cloud Platform's microservices-demo, which is licensed under the Apache License 2.0.
Additions and modifications in this repository are also released under the Apache License 2.0.