Zookeeper duration #541
base: master
@@ -0,0 +1,16 @@
module: zookeeper
name: zookeeper-health
Suggested change:
- name: zookeeper-health
+ name: health
@@ -0,0 +1,22 @@
module: zookeeper
name: zookeeper-latency
Suggested change:
- name: zookeeper-latency
+ name: latency
lasting_duration: "5m"
latency_disabled: "false"
major:
threshold: 250000
The critical and major thresholds seem to be too close to each other.
name: zookeeper-health
transformation: false
aggregation: true
exclude_not_running_vm: true
Is this necessary?
name: zookeeper-latency
transformation: false
aggregation: true
exclude_not_running_vm: true
Is this necessary?
@@ -26,7 +26,7 @@ EOF
max_delay = var.heartbeat_max_delay
}

resource "signalfx_detector" "zookeeper_health" {
/*resource "signalfx_detector" "zookeeper_health" {
This should be removed.
@@ -44,7 +44,7 @@ variable "heartbeat_aggregation_function" {
default = ""
}

# zookeeper_health detector
/*# zookeeper_health detector
This should be removed.
@@ -0,0 +1,16 @@
module: zookeeper
This detector should aggregate over all servers in the cluster, trigger a major alert on the loss of part of the servers (half? a third?), and trigger a critical alert on the loss of more than that.
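For example:
```
# Average service health across the servers sharing each plugin_instance
signal = data('gauge.zk_service_health', filter=filter('env', 'preprod') and filter('sfx_monitored', 'true')).mean(by=['plugin_instance']).publish('signal')
# Critical: average health stays below 0.66 for 5 minutes
detect(when(signal < 0.66, lasting='5m', at_least=1)).publish('CRIT')
# Major: average health stays below 1 (not every server fully healthy) for 5 minutes
detect(when(signal < 1, lasting='5m', at_least=1)).publish('MAJ')
```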
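The reviewer's rule can be checked offline before it is deployed. Below is a minimal Python sketch, not the module's code. It assumes each server reports gauge.zk_service_health as 1 when healthy and 0 otherwise, and it approximates lasting='5m' as "the condition held on every sample in the window". The function names and the sample layout are hypothetical.

```python
from statistics import mean
from typing import List, Optional

# Thresholds copied from the reviewer's example; the 0/1 health encoding is an assumption.
CRIT_BELOW = 0.66  # roughly a third or more of the servers lost
MAJ_BELOW = 1.0    # at least one server not healthy


def cluster_health(server_values: List[float]) -> float:
    """Mean health across servers, like .mean(by=['plugin_instance'])."""
    return mean(server_values)


def severity(window: List[List[float]]) -> Optional[str]:
    """Return 'CRIT', 'MAJ' or None for a window of samples.

    Each element of `window` holds one health value per server. A rule
    fires only if its condition held on every sample in the window
    (a simplified stand-in for lasting='5m', at_least=1).
    """
    if not window:
        return None
    means = [cluster_health(sample) for sample in window]
    if all(m < CRIT_BELOW for m in means):
        return "CRIT"
    if all(m < MAJ_BELOW for m in means):
        return "MAJ"
    return None
```

With three servers, losing one gives a mean of about 0.667, which stays above 0.66, so only the major fires; losing two gives about 0.333 and fires the critical. Unlike SignalFlow, where both detect() calls are evaluated independently, the sketch only returns the highest severity.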
signal:
metric: "gauge.zk_avg_latency"
rules:
critical:
I'm not sure that high latency on a single server should trigger a critical alert.
Maybe two detectors:
- one that triggers major and critical if all servers in a cluster have high latency
- one that triggers major for high latency on a single server
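The proposed split can be sketched the same way. This Python sketch assumes per-server gauge.zk_avg_latency values already grouped by cluster; the function names and the threshold arguments are placeholders rather than the module's defaults (the major threshold of 250000 in this PR would be one possible value). A cluster alerts only when all of its servers exceed a threshold, which is the same as its fastest server exceeding it, while every slow server gets its own major alert.

```python
from typing import Dict, List, Tuple

# cluster name -> server name -> average latency (hypothetical layout)
Latencies = Dict[str, Dict[str, float]]


def cluster_latency_alerts(latencies: Latencies, major: float,
                           critical: float) -> Dict[str, str]:
    """Alert on a cluster only when every one of its servers is slow."""
    alerts: Dict[str, str] = {}
    for cluster, servers in latencies.items():
        if not servers:
            continue
        # All servers exceed a threshold exactly when the fastest one does.
        fastest = min(servers.values())
        if fastest > critical:
            alerts[cluster] = "CRIT"
        elif fastest > major:
            alerts[cluster] = "MAJ"
    return alerts


def server_latency_alerts(latencies: Latencies,
                          major: float) -> List[Tuple[str, str]]:
    """Return the (cluster, server) pairs that deserve a major alert."""
    return sorted(
        (cluster, server)
        for cluster, servers in latencies.items()
        for server, value in servers.items()
        if value > major
    )
```

Using the minimum keeps a single slow server from raising a cluster-wide alert. The cluster-latency config in this PR aggregates with a mean instead, which can fire when one outlier is slow enough to pull the average over the threshold.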
Work in progress; this still needs some cleanup, but the detector was split as recommended and behaves as expected.
Cleanup is done. Please check our latest changes and let us know if everything is right now.
We also split the server-health detector: one critical detector for the cluster and one major detector for a single server.
Hello,
@@ -0,0 +1,14 @@
module: zookeeper
name: server-health
disabled: false
Not needed, this is the default value
@@ -0,0 +1,14 @@
module: zookeeper
name: cluster-health
disabled: false
Not needed, this is the default value
module: zookeeper
name: server-latency
aggregation: false
disabled: false
Not needed, this is the default value
module: zookeeper
name: cluster-latency
aggregation: ".mean(by=['kubernetes_cluster'])"
disabled: false
Not needed, this is the default value
comparator: ">"
description: "Zookeeper cluster latency is too high"
lasting_duration: "5m"
latency_disabled: "false"
The latency_disabled variable does not exist; I think this line should be deleted.
comparator: ">"
description: "Zookeeper server latency is too high"
lasting_duration: "5m"
latency_disabled: "false"
The latency_disabled variable does not exist; I think this line should be deleted.
comparator: "=="
description: "Zookeeper cluster is not running"
lasting_duration: "5m"
health_disabled: "false"
The health_disabled variable does not exist; I think this line should be deleted.
comparator: "!="
description: "Zookeeper server is not running"
lasting_duration: "5m"
health_disabled: "false"
The health_disabled variable does not exist; I think this line should be deleted.
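The last few review comments (variables that do not exist, and keys that only restate a default) could be caught mechanically. Below is a minimal Python sketch, assuming one level of the YAML has already been loaded into a dict. The known-key sets and defaults passed in are hypothetical stand-ins for whatever the module actually accepts; only "disabled: false is the default" comes from the review above.

```python
from typing import Any, Dict, List, Set


def lint_keys(config: Dict[str, Any], known: Set[str],
              defaults: Dict[str, Any]) -> List[str]:
    """Report unknown keys and keys that merely restate a default value.

    `known` and `defaults` are supplied by the caller; they are
    assumptions, not the module's real variable list.
    """
    problems: List[str] = []
    for key, value in config.items():
        if key not in known:
            problems.append(f"{key}: unknown variable, delete this line")
        elif key in defaults and defaults[key] == value:
            problems.append(f"{key}: equals the default value, not needed")
    return problems
```

For example, passing the server-latency rule above with known = {"comparator", "description", "lasting_duration"} reports only latency_disabled, and passing a header with disabled: False and defaults = {"disabled": False} reports the redundant disabled line.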
Hello,
Hello,
Any update, please?
Add zookeeper-health and zookeeper-latency parameters