Etcd monitoring #9

daniellee · 2016-11-04T10:36:13Z

Every k8s cluster has an etcd cluster. The important thing to measure is latency, especially for the leader.
There are a couple of resources for this:

snap collector plugin for etcd - https://github.com/intelsdi-x/snap-plugin-collector-etcd
etcd metrics endpoint - https://coreos.com/etcd/docs/latest/metrics.html
datadog etcd integration as an example of what's possible - https://www.datadoghq.com/blog/monitor-etcd-performance/
etcd dashboard for prometheus on gnet - https://grafana.net/dashboards/178
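
For the metrics endpoint listed above, a minimal Python sketch of pulling etcd samples out of the Prometheus text format might look like the following. The sample text and its values are made up for illustration, and metric names differ between etcd versions, so treat the names here as assumptions and check them against your own `/metrics` output.

```python
def parse_metrics(text, prefix="etcd_"):
    """Return {series: value} for samples whose name starts with `prefix`.

    Handles the basic exposition line shape: name[{labels}] value [timestamp].
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            cut = line.rindex("}") + 1
            series, rest = line[:cut], line[cut:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if not rest or not series.startswith(prefix):
            continue
        samples[series] = float(rest[0])
    return samples


# Illustrative sample only: the values are invented, not real measurements.
SAMPLE_METRICS = """\
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
etcd_server_leader_changes_seen_total 3
etcd_disk_wal_fsync_duration_seconds_sum 0.5
etcd_disk_wal_fsync_duration_seconds_count 100
go_goroutines 42
"""

if __name__ == "__main__":
    m = parse_metrics(SAMPLE_METRICS)
    avg_fsync = (m["etcd_disk_wal_fsync_duration_seconds_sum"]
                 / m["etcd_disk_wal_fsync_duration_seconds_count"])
    print("has leader:", m["etcd_server_has_leader"] == 1.0)
    print("average WAL fsync seconds:", avg_fsync)
```

In practice a Prometheus server or a client library would do this parsing; the sketch only shows what kind of data the endpoint exposes.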

daniellee · 2016-11-04T10:53:59Z

Leader stats: /v2/stats/leader
Follower stats: /v2/stats/self

Example leader stats:

{
  "leader": "54be60531c7f6892",
  "followers": {
    "3e0d82ced9501c94": {
      "latency": {
        "current": 0.00397,
        "average": 0.22196975104523,
        "standardDeviation": 8.8368979766015,
        "minimum": 0.00124,
        "maximum": 446.408304
      },
      "counts": {
        "fail": 93,
        "success": 2631
      }
    },
    "8697515a5e606ffe": {
      "latency": {
        "current": 0.00322,
        "average": 0.0069664165131983,
        "standardDeviation": 0.012247814765413,
        "minimum": 0.00139,
        "maximum": 0.2215
      },
      "counts": {
        "fail": 0,
        "success": 3258
      }
    }
  }
}

https://coreos.com/etcd/docs/latest/api.html#leader-statistics
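
As a rough sketch of how the leader stats above could be turned into something alertable, the Python below computes a per-follower failure rate and pulls out the latency fields. The function names and the base URL in `fetch_leader_stats` are assumptions for illustration; only the payload shape comes from the example above.

```python
import json
import urllib.request


def fetch_leader_stats(base_url):
    """Fetch /v2/stats/leader from an etcd member (base_url is an assumption,
    e.g. "http://127.0.0.1:2379"; adjust for your cluster)."""
    with urllib.request.urlopen(base_url + "/v2/stats/leader") as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_followers(leader_stats):
    """Return per-follower latency fields and failure rate."""
    summary = {}
    for follower_id, stats in leader_stats.get("followers", {}).items():
        counts = stats["counts"]
        total = counts["fail"] + counts["success"]
        summary[follower_id] = {
            "current_latency": stats["latency"]["current"],
            "average_latency": stats["latency"]["average"],
            "max_latency": stats["latency"]["maximum"],
            # Guard against a follower that has not been contacted yet.
            "failure_rate": counts["fail"] / total if total else 0.0,
        }
    return summary


# Trimmed copy of the example payload from this thread.
SAMPLE = json.loads("""
{
  "leader": "54be60531c7f6892",
  "followers": {
    "3e0d82ced9501c94": {
      "latency": {"current": 0.00397, "average": 0.22196975104523,
                  "minimum": 0.00124, "maximum": 446.408304},
      "counts": {"fail": 93, "success": 2631}
    },
    "8697515a5e606ffe": {
      "latency": {"current": 0.00322, "average": 0.0069664165131983,
                  "minimum": 0.00139, "maximum": 0.2215},
      "counts": {"fail": 0, "success": 3258}
    }
  }
}
""")

if __name__ == "__main__":
    for follower_id, s in summarize_followers(SAMPLE).items():
        print(follower_id, "failure rate: %.4f" % s["failure_rate"],
              "max latency:", s["max_latency"])
```

With the example payload, the first follower stands out on both failure count and maximum latency, which is the kind of signal a dashboard or alert would want to surface.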

woodsaj · 2016-11-04T17:27:23Z

This will need some thought. I wouldn't worry about it for the initial release of the kubernetes-app.

Right now, snap is deployed to every node, and every snap instance runs the same task(s). For monitoring specific services (etcd, kube-api, elasticsearch, etc.) we would not need every snap instance to perform the checks. I am not sure of the best way to tackle this, but it is more aligned with our long-term strategy to have g.net be the repository for snap collector plugins, task manifests, and associated dashboards.

daniellee · 2016-11-17T12:16:15Z

Monitoring etcd with Prometheus blog post: https://coreos.com/blog/developing-prometheus-alerts-for-etcd.html

Vince-Cercury · 2017-09-01T02:06:03Z

Has anyone built an etcd Grafana dashboard recently? Dashboard 178 is mostly out of date. I've tried updating the metric names, but I'm still not getting much out of this dashboard.

Any thoughts on what an etcd dashboard should look like and which key metrics it should display?

Vince-Cercury · 2017-09-01T06:00:09Z

Went ahead and built one based on the CoreOS etcd docs (https://coreos.com/etcd/docs/latest/metrics.html).

Available here: https://grafana.com/dashboards/3070

Also available on GitHub: https://github.com/VinceMD/Grafana-Dashboards/blob/master/etcd-prometheus-dashboard.json

daniellee added the enhancement label Feb 22, 2017
