Etcd monitoring #9

Open
daniellee opened this issue Nov 4, 2016 · 5 comments

@daniellee
Contributor

Every k8s cluster has an etcd cluster. The important thing to measure is latency, especially for the leader.
There are a couple of resources for this:

image

image

@daniellee
Contributor Author

Leader stats: /v2/stats/leader
Follower stats: /v2/stats/self

Example leader stats:

{
  "leader": "54be60531c7f6892",
  "followers": {
    "3e0d82ced9501c94": {
      "latency": {
        "current": 0.00397,
        "average": 0.22196975104523,
        "standardDeviation": 8.8368979766015,
        "minimum": 0.00124,
        "maximum": 446.408304
      },
      "counts": {
        "fail": 93,
        "success": 2631
      }
    },
    "8697515a5e606ffe": {
      "latency": {
        "current": 0.00322,
        "average": 0.0069664165131983,
        "standardDeviation": 0.012247814765413,
        "minimum": 0.00139,
        "maximum": 0.2215
      },
      "counts": {
        "fail": 0,
        "success": 3258
      }
    }
  }
}

https://coreos.com/etcd/docs/latest/api.html#leader-statistics
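
A minimal Python sketch of how the leader stats payload could be turned into per-follower numbers for a dashboard. The `/v2/stats/leader` path comes from the comment above; the base URL, the function names, and the derived `fail_ratio` field are assumptions made for illustration, and the request is assumed to be sent to the current leader.

```python
import json
from urllib.request import urlopen

# Path taken from the comment above.
LEADER_STATS_PATH = "/v2/stats/leader"


def fetch_leader_stats(base_url="http://127.0.0.1:2379", timeout=5.0):
    """Fetch and decode leader stats.

    base_url is a hypothetical client endpoint; adjust it for your cluster.
    """
    url = base_url.rstrip("/") + LEADER_STATS_PATH
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_followers(stats):
    """Return one row per follower: latency figures plus a failure ratio."""
    rows = []
    for follower_id, data in sorted(stats.get("followers", {}).items()):
        counts = data.get("counts", {})
        fail = counts.get("fail", 0)
        success = counts.get("success", 0)
        total = fail + success
        latency = data.get("latency", {})
        rows.append({
            "follower": follower_id,
            "current": latency.get("current"),
            "average": latency.get("average"),
            "maximum": latency.get("maximum"),
            # Derived field (not part of the etcd payload).
            "fail_ratio": fail / total if total else 0.0,
        })
    return rows


# Trimmed copy of the example payload above.
SAMPLE = {
    "leader": "54be60531c7f6892",
    "followers": {
        "3e0d82ced9501c94": {
            "latency": {"current": 0.00397, "average": 0.22196975104523,
                        "maximum": 446.408304},
            "counts": {"fail": 93, "success": 2631},
        },
        "8697515a5e606ffe": {
            "latency": {"current": 0.00322, "average": 0.0069664165131983,
                        "maximum": 0.2215},
            "counts": {"fail": 0, "success": 3258},
        },
    },
}

if __name__ == "__main__":
    for row in summarize_followers(SAMPLE):
        print(row)
```

In the sample, the first follower stands out on both counts: a nonzero failure ratio and a maximum latency far above its average, which is the kind of signal a leader-latency panel would surface.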

@woodsaj
Contributor

woodsaj commented Nov 4, 2016

This will need some thought; I wouldn't worry about it for the initial release of the kubernetes-app.

Right now, snap is deployed to every node, and every snap instance runs the same task(s). For monitoring specific services (etcd, kube-api, elasticsearch, etc.), we would not need every snap instance to perform the checks. I am not sure of the best way to tackle this, but it is more aligned with our long-term strategy to have g.net be the repository for snap collector plugins, task manifests, and associated dashboards.

@daniellee
Contributor Author

Monitoring etcd with Prometheus blog post: https://coreos.com/blog/developing-prometheus-alerts-for-etcd.html

@Vince-Cercury

Has anyone built an etcd Grafana dashboard recently? Dashboard 178 is mostly out of date. I've tried updating the metric names, but I'm not getting much data from this dashboard.

Any thoughts on what an etcd dashboard should look like and which key metrics to display?

@Vince-Cercury

Went ahead and built one based on the CoreOS etcd doc (https://coreos.com/etcd/docs/latest/metrics.html)

Available here: https://grafana.com/dashboards/3070

Also available on GitHub: https://github.com/VinceMD/Grafana-Dashboards/blob/master/etcd-prometheus-dashboard.json
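
For anyone checking which metric names their etcd version actually exposes before wiring up a dashboard, here is a rough Python sketch that parses Prometheus text-format output into a dictionary. The parser is a simplified assumption (it ignores timestamps and does not handle every escaping rule), `parse_samples` is a hypothetical helper name, and the sample values are illustrative rather than taken from a real cluster.

```python
def parse_samples(text):
    """Parse Prometheus text-format samples into {series: float}.

    Simplified: comment lines are skipped, timestamps are ignored,
    and label sets are kept verbatim as part of the key.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled series: everything up to the last '}' is the key.
            key, _, rest = line.rpartition("}")
            key += "}"
        else:
            # Unlabelled series: the name ends at the first space.
            key, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            samples[key] = float(fields[0])
        except ValueError:
            continue  # Not a numeric sample; skip it.
    return samples


# Illustrative scrape output; the values are made up.
SAMPLE_METRICS = """\
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
# TYPE etcd_server_leader_changes_seen_total counter
etcd_server_leader_changes_seen_total 3
# TYPE etcd_server_proposals_failed_total counter
etcd_server_proposals_failed_total 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 5
"""

if __name__ == "__main__":
    for name, value in sorted(parse_samples(SAMPLE_METRICS).items()):
        print(name, value)
```

Comparing the keys of the parsed result against the metric names a dashboard queries is a quick way to spot renamed or missing series.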
