Monitoring redpanda inside Kubernetes #2759

rauanmayemir · 2021-10-23T12:10:59Z

Redpanda has great tooling when it comes to monitoring your cluster with Grafana and Prometheus.

However, it requires knowing your observability stack and being able to configure Prometheus, which I don't and can't. I wanted something simple that would just show me the metrics, graphs, and whatnot.

Let's assume that we already have Redpanda up and running in the redpanda-ns namespace, and jump into setting up our monitoring.

Prometheus Operator

We're going to use Prometheus Operator. One would think that using a k8s operator complicates the stack rather than simplifying it, but to me it turned out to be a very simple and robust solution once you grasp the concept of service monitors.

Installing the operator takes a single YAML bundle that contains everything, but for some reason it inconveniently installs it all into the default namespace. Unless that doesn't bother you, we're going to patch it to use a monitoring-ns namespace instead (which we need to create ourselves).

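# Create the target namespace first; the bundle does not create it.
kubectl create namespace monitoring-ns
# -f makes curl fail on HTTP errors instead of saving an error page.
curl -fsSL https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.51.2/bundle.yaml > prom_operator-v0.51.2.yaml
sed -i "s/namespace: default/namespace: monitoring-ns/g" prom_operator-v0.51.2.yaml
kubectl apply -f ./prom_operator-v0.51.2.yaml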
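
Before applying the patched bundle, it can help to see what the sed expression actually rewrites. Here is a minimal sketch that runs the same substitution on a made-up fragment; the sample content is illustrative only and is not the real bundle:

```shell
# Hypothetical fragment standing in for bundle.yaml; the real file is much larger.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: default
---
kind: ClusterRoleBinding
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: default
EOF

# Same expression as in the install step, but without -i, so the file is not modified.
patched="$(sed "s/namespace: default/namespace: monitoring-ns/g" "$sample")"

# Print every namespace line (with its line number) after the substitution.
echo "$patched" | grep -n "namespace:"
rm -f "$sample"
```

Note that `sed -i` with no suffix argument is GNU sed syntax; BSD/macOS sed expects `sed -i ''` instead.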

Let's also create a service account, a cluster role, and a binding between the two:

apiVersion: v1  
kind: ServiceAccount  
metadata:  
  name: prometheus-agent  
  namespace: monitoring-ns
---  
apiVersion: rbac.authorization.k8s.io/v1  
kind: ClusterRole  
metadata:  
  name: prometheus-agent  
rules:  
- apiGroups: [""]  
  resources:  
  - namespaces  
  - endpoints  
  - services  
  - nodes  
  - nodes/metrics  
  - nodes/proxy  
  - pods  
  verbs: ["get", "list", "watch"]  
- apiGroups: [""]  
  resources:  
  - configmaps  
  verbs: ["get"]  
- apiGroups:  
  - networking.k8s.io  
  resources:  
  - ingresses  
  verbs: ["get", "list", "watch"]  
- nonResourceURLs: ["/metrics", "/api/*"]  
  verbs: ["get"]  
---  
apiVersion: rbac.authorization.k8s.io/v1  
kind: ClusterRoleBinding  
metadata:  
  name: prometheus-agent  
roleRef:  
  apiGroup: rbac.authorization.k8s.io  
  kind: ClusterRole  
  name: prometheus-agent  
subjects:  
- kind: ServiceAccount  
  name: prometheus-agent  
  namespace: monitoring-ns

I just dug these rules up from somewhere, so our agent might need fewer permissions.
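
One way to see what the service account can actually do is `kubectl auth can-i` with impersonation. The sketch below checks only the three resource types that endpoint-based service discovery relies on. The function name and the `KUBECTL` override variable are my own additions for illustration, and `--as` requires that your own user is allowed to impersonate service accounts:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Ask the API server whether the prometheus-agent service account may
# list each resource type in the redpanda-ns namespace.
check_agent_rbac() {
  local sa="system:serviceaccount:monitoring-ns:prometheus-agent"
  local resource
  for resource in services endpoints pods; do
    printf '%s: ' "$resource"
    "$KUBECTL" auth can-i list "$resource" --as="$sa" --namespace=redpanda-ns
  done
}
```

Calling `check_agent_rbac` against a cluster where the binding is in place should answer yes for each resource. Trimming the cluster role and re-running the check is a cheap way to find out which rules are really needed.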

Prometheus instance

Prometheus operator operates prometheuses, and we need an actual Prometheus instance that's going to collect the metrics.

apiVersion: monitoring.coreos.com/v1  
kind: Prometheus  
metadata:  
  name: redpanda  
  namespace: monitoring-ns
  labels:  
    app: prometheus-redpanda  
spec:  
  image: quay.io/prometheus/prometheus:v2.30.2  
  replicas: 1  
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000  
  serviceAccountName: prometheus-agent  
  version: v2.30.2
  serviceMonitorSelector:  
    matchLabels:  
      monitor: prometheus-redpanda  
  enableAdminAPI: true  
---  
apiVersion: v1  
kind: Service  
metadata:  
  name: prometheus-redpanda  
  namespace: monitoring-ns
  labels:  
    app: prometheus-redpanda  
spec:  
  ports:  
  - name: web  
    port: 9090  
    targetPort: web  
  selector:  
    prometheus: redpanda

This will create a Prometheus StatefulSet (that is how the operator runs each instance) and a dedicated service, in case we want to use separate instances for our workloads.
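
To confirm that the instance came up and that the service's selector matches it, something like the following could be used. This is a sketch: the function name and the `KUBECTL` override are hypothetical, and it relies on the `prometheus: redpanda` pod label that the service selector above already assumes:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# List the pods carrying the label the Service selects on, then show the
# Service's endpoints; an empty endpoints list means the selector matched nothing.
show_prometheus_instance() {
  "$KUBECTL" --namespace=monitoring-ns get pods --selector=prometheus=redpanda
  "$KUBECTL" --namespace=monitoring-ns get endpoints prometheus-redpanda
}
```

If `show_prometheus_instance` lists a running pod and the endpoints entry shows an address on port 9090, the service is wired to the instance.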

Service Monitor

Now we need to tell our prometheus to start monitoring Redpanda.

apiVersion: monitoring.coreos.com/v1  
kind: ServiceMonitor  
metadata:  
  name: redpanda-metrics-monitor  
  namespace: monitoring-ns
  labels:  
    monitor: prometheus-redpanda  
spec:  
  endpoints:  
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token  
    path: /metrics  
    port: admin
    scheme: http
  namespaceSelector:  
    matchNames:
    - redpanda-ns
  selector:  
    matchLabels:  
      app.kubernetes.io/component: redpanda  
      app.kubernetes.io/instance: redpanda  
      app.kubernetes.io/name: redpanda

The important part here is setting the right namespace and label selectors. We assume that the cluster is called redpanda.
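
Since a service monitor silently matches nothing when the labels are off, it is worth checking which services the selector would pick up. A rough sketch, assuming the same three labels and namespace as in the manifest above (the function name and `KUBECTL` override are illustrative):

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# List Services in redpanda-ns that carry all three labels from the
# ServiceMonitor selector, printing each Service name and its port names.
show_redpanda_scrape_targets() {
  "$KUBECTL" --namespace=redpanda-ns get services \
    --selector='app.kubernetes.io/component=redpanda,app.kubernetes.io/instance=redpanda,app.kubernetes.io/name=redpanda' \
    --output=jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.ports[*].name}{"\n"}{end}'
}
```

The output of `show_redpanda_scrape_targets` should include at least one service with a port named admin, because that is the port name the service monitor's endpoint refers to.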

Grafana

Data is now being collected in Prometheus; next we need to show it in Grafana.

First, we create a prometheus datasource and set its http url to http://prometheus-redpanda.monitoring-ns:9090.
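
If you would rather script this than click through the UI, Grafana's HTTP API can create the datasource. The following is only a sketch: the Grafana address, the admin:admin credentials, and the datasource name prometheus-redpanda are assumptions and not something from my setup, so adjust them to yours:

```shell
# CURL can be overridden (e.g. CURL=echo) to preview the request without sending it.
CURL="${CURL:-curl}"
# Assumptions: Grafana is reachable at this address and accepts these basic-auth credentials.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"

# POST a Prometheus datasource definition pointing at the in-cluster service URL.
create_prometheus_datasource() {
  "$CURL" --silent --fail --user "$GRAFANA_AUTH" \
    --header 'Content-Type: application/json' \
    --request POST "$GRAFANA_URL/api/datasources" \
    --data '{
      "name": "prometheus-redpanda",
      "type": "prometheus",
      "access": "proxy",
      "url": "http://prometheus-redpanda.monitoring-ns:9090"
    }'
}
```

The proxy access mode makes Grafana's backend issue the queries, which is why the in-cluster service name works as the URL when Grafana runs in the same cluster.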

Then we generate a grafana dashboard:

rpk generate grafana-dashboard --datasource='datasource-uid' --metrics-endpoint='redpanda.redpanda-ns.svc.cluster.local:9644'

The dashboard will show the state of the Redpanda cluster, but frankly I don't understand most of the graphs because they're all about low-level maintenance.

We want something higher-level that also monitors our actual data: topics, partitions, etc.

Kminion

Kminion is a prometheus exporter that we need to install and operate separately, but the value it gives is so worth it.

helm repo add kminion https://raw.githubusercontent.com/cloudhut/kminion/master/charts/archives
helm repo update
helm install -f ./helm-kminion-values.yaml -n redpanda-ns kminion kminion/kminion

The helm-kminion-values.yaml file referenced above customizes the Helm values and sets our brokers and SASL config:

image:
  tag: "master-dff9857"
deployment:  
  env:  
    secretKeyRefs:  
     - name: KAFKA_SASL_PASSWORD  
       secretName: redpanda-credentials-secret
       secretKey: kafka-sasl-password
kminion:  
  config:  
    kafka:  
      brokers:  
      - redpanda-0.redpanda.redpanda-ns.svc.cluster.local:9092
      - redpanda-n.redpanda.redpanda-ns.svc.cluster.local:9092
      clientId: "kminion"
      sasl:
        enabled: true
        username: "redpanda-user"
        mechanism: "SCRAM-SHA-256"
    minion:
      consumerGroups:
        enabled: true
        scrapeMode: adminApi
        granularity: partition
      topics:
        granularity: partition
        allowedTopics: ["/.*/"]
        infoMetric:
          configKeys: ["cleanup.policy"]
      logDirs:
        enabled: true
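
The values above read the SASL password from a secret named redpanda-credentials-secret under the key kafka-sasl-password, so that secret has to exist in redpanda-ns before the kminion pod can start. If it doesn't exist yet, a sketch like this would create it (the function name and `KUBECTL` override are illustrative):

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Create the Secret that deployment.env.secretKeyRefs points at.
# $1 is the SASL password, passed as an argument so it is not hardcoded here.
create_kminion_secret() {
  local password="$1"
  "$KUBECTL" --namespace=redpanda-ns create secret generic redpanda-credentials-secret \
    --from-literal=kafka-sasl-password="$password"
}
```

It would be called as `create_kminion_secret 'the-password-of-redpanda-user'`. Keep in mind that a password given on the command line can end up in your shell history.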

Here's the part where we finally enjoy the help of the prometheus operator.

apiVersion: monitoring.coreos.com/v1  
kind: ServiceMonitor  
metadata:  
  name: kminion  
  namespace: monitoring-ns
  labels:  
    monitor: prometheus-redpanda  
spec:  
  endpoints:  
  - port: metrics  
    path: /metrics  
    honorLabels: true  
  namespaceSelector:  
    matchNames:  
    - redpanda-ns
  selector:  
    matchLabels:  
      app.kubernetes.io/instance: kminion  
      app.kubernetes.io/name: kminion

We created one more service monitor that will use the same Prometheus instance. And since we already have it as a datasource in Grafana, we just need to install dashboards to visualize the data collected by kminion. Fortunately, all the dashboards are published to the official Grafana registry.

Kminion gives us full observability into our cluster and data. We could get realtime telemetry by enabling end-to-end testing, where kminion periodically produces and consumes messages to measure latency and uptime, but for some reason it didn't work for me. As I don't have much need for it yet, I've shelved the idea and will tinker with it in the future.
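
Before importing dashboards, a quick way to check that kminion data is really arriving is to ask Prometheus which metric names it knows. The sketch below assumes Prometheus is reachable on localhost:9090, for example through a port-forward running in another terminal; the function name and the `CURL` and `PROM_URL` variables are my own and only illustrative:

```shell
# Assumes something like this is running in another terminal:
#   kubectl --namespace=monitoring-ns port-forward service/prometheus-redpanda 9090:9090
CURL="${CURL:-curl}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Fetch all metric names known to Prometheus and keep only those
# starting with "kminion_", one per line, sorted and de-duplicated.
list_kminion_metrics() {
  "$CURL" --silent "$PROM_URL/api/v1/label/__name__/values" \
    | grep -o '"kminion_[A-Za-z0-9_]*"' \
    | tr -d '"' \
    | sort -u
}
```

An empty result from `list_kminion_metrics` usually means the service monitor's labels or port name don't match the kminion service, or that the first scrape hasn't happened yet.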

emaxerrno (Maintainer) · 2021-10-23T17:39:54Z

This is a great piece @rauanmayemir !!! Very cool
