Etcd monitoring #9
Comments
Example leader stats:
{
  "leader": "54be60531c7f6892",
  "followers": {
    "3e0d82ced9501c94": {
      "latency": {
        "current": 0.00397,
        "average": 0.22196975104523,
        "standardDeviation": 8.8368979766015,
        "minimum": 0.00124,
        "maximum": 446.408304
      },
      "counts": {
        "fail": 93,
        "success": 2631
      }
    },
    "8697515a5e606ffe": {
      "latency": {
        "current": 0.00322,
        "average": 0.0069664165131983,
        "standardDeviation": 0.012247814765413,
        "minimum": 0.00139,
        "maximum": 0.2215
      },
      "counts": {
        "fail": 0,
        "success": 3258
      }
    }
  }
}
https://coreos.com/etcd/docs/latest/api.html#leader-statistics
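To make the leader stats above easier to act on, here is a minimal Python sketch that reduces the document to a per-follower failure ratio and average latency, and flags followers that look unhealthy. The function name `summarize_followers` and both thresholds are assumptions made for illustration; they do not come from etcd or from this thread, and the latency values are compared as-is because the sample does not state their unit.

```python
# Sketch only: the thresholds below are hypothetical and should be tuned.
SAMPLE = {
    "leader": "54be60531c7f6892",
    "followers": {
        "3e0d82ced9501c94": {
            "latency": {"current": 0.00397, "average": 0.22196975104523},
            "counts": {"fail": 93, "success": 2631},
        },
        "8697515a5e606ffe": {
            "latency": {"current": 0.00322, "average": 0.0069664165131983},
            "counts": {"fail": 0, "success": 3258},
        },
    },
}


def summarize_followers(leader_stats, max_fail_ratio=0.01, max_avg_latency=0.1):
    """Summarize each follower in a leader-stats document.

    Returns a dict keyed by follower ID with the failure ratio, the average
    latency, and a 'suspect' flag set when either assumed threshold is exceeded.
    """
    summary = {}
    for follower_id, stats in leader_stats.get("followers", {}).items():
        counts = stats["counts"]
        total = counts["fail"] + counts["success"]
        # Guard against a follower that has not received any requests yet.
        fail_ratio = counts["fail"] / total if total else 0.0
        avg_latency = stats["latency"]["average"]
        summary[follower_id] = {
            "fail_ratio": fail_ratio,
            "avg_latency": avg_latency,
            "suspect": fail_ratio > max_fail_ratio or avg_latency > max_avg_latency,
        }
    return summary


if __name__ == "__main__":
    for follower_id, info in summarize_followers(SAMPLE).items():
        print(follower_id, info)
```

With the sample values, the first follower is flagged (93 failures out of 2724 requests, and a higher average latency), while the second is not. In a real check, the input would be the JSON returned by the leader statistics endpoint rather than a hard-coded dict.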
This will need some thought; I wouldn't worry about it for the initial release of the kubernetes-app. Right now, snap is deployed to every node, and every snap instance runs the same task(s). For monitoring specific services (etcd, kube-api, elasticsearch, etc.), we would not need every snap instance to perform the checks. I am not sure of the best way to tackle this, but it is more aligned with our long-term strategy to have g.net be the repository for snap collector plugins, task manifests, and associated dashboards.
Monitoring etcd with Prometheus blog post: https://coreos.com/blog/developing-prometheus-alerts-for-etcd.html
Has anyone built an etcd Grafana dashboard recently? The 178 dashboard is mostly out of date. I've tried updating the metric names, but I'm still not getting much from this dashboard. Any thoughts on what an etcd dashboard should look like and which key metrics it should display?
Went ahead and built one based on the CoreOS etcd doc (https://coreos.com/etcd/docs/latest/metrics.html). Available here: https://grafana.com/dashboards/3070. Also available on GitHub: https://github.com/VinceMD/Grafana-Dashboards/blob/master/etcd-prometheus-dashboard.json
Every k8s cluster has an etcd cluster. The important thing to measure is latency, especially for the leader.
There are a couple of resources for this: