diff --git a/docs/severity.md b/docs/severity.md index 63a84a315..98d368b84 100644 --- a/docs/severity.md +++ b/docs/severity.md @@ -75,6 +75,7 @@ - [kubernetes-velero](#kubernetes-velero) - [kubernetes-volumes](#kubernetes-volumes) - [kubernetes-workloads-count](#kubernetes-workloads-count) +- [logstash](#logstash) - [mdadm](#mdadm) - [memcached](#memcached) - [mongodb](#mongodb) @@ -818,6 +819,20 @@ |Kubernetes workloads count|-|-|X|X|-| +## logstash + +|Detector|Critical|Major|Minor|Warning|Info| +|---|---|---|---|---|---| +|Logstash heartbeat|X|-|-|-|-| +|Logstash events in high|-|-|X|X|-| +|Logstash events in low|-|-|X|X|-| +|Logstash events out high|-|-|X|X|-| +|Logstash events out low|-|-|X|X|-| +|Logstash cpu percent|-|-|X|X|-| +|Logstash queued events|-|-|X|X|-| +|Logstash queued disk|-|-|X|X|-| + + ## mdadm |Detector|Critical|Major|Minor|Warning|Info| diff --git a/modules/smart-agent_logstash/README.md b/modules/smart-agent_logstash/README.md new file mode 100644 index 000000000..1f4a3eb09 --- /dev/null +++ b/modules/smart-agent_logstash/README.md @@ -0,0 +1,142 @@ +# LOGSTASH SignalFx detectors + + + +:link: **Contents** + +- [How to use this module?](#how-to-use-this-module) +- [What are the available detectors in this module?](#what-are-the-available-detectors-in-this-module) +- [How to collect required metrics?](#how-to-collect-required-metrics) + - [Monitors](#monitors) + - [Examples](#examples) + - [Metrics](#metrics) +- [Related documentation](#related-documentation) + + + +## How to use this module? 
+ +This directory defines a [Terraform](https://www.terraform.io/) +[module](https://www.terraform.io/docs/modules/usage.html) you can use in your +existing [stack](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#stack) by adding a +`module` configuration and setting its `source` parameter to the URL of this folder: + +```hcl +module "signalfx-detectors-smart-agent-logstash" { + source = "github.com/chungktran/terraform-signalfx-detectors.git//modules/smart-agent_logstash?ref={revision}" + + environment = var.environment + notifications = local.notifications +} +``` + +Note the following parameters: + +* `source`: Use this parameter to specify the URL of the module. The double slash (`//`) is intentional and required. + Terraform uses it to specify subfolders within a Git repo (see [module + sources](https://www.terraform.io/docs/modules/sources.html)). The `ref` parameter specifies a specific Git tag in + this repository. It is recommended to use the latest "pinned" version in place of `{revision}`. Avoid using a branch + like `master` except for testing purposes. Note that all modules in this repository are available on the Terraform + [registry](https://registry.terraform.io/modules/claranet/detectors/signalfx) and we recommend using it as source + instead of `git` which is more flexible but less future-proof. + +* `environment`: Use this parameter to specify the + [environment](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#environment) used by this + instance of the module. + Its value will be added to the `prefixes` list at the start of the [detector + name](https://github.com/claranet/terraform-signalfx-detectors/wiki/Templating#example). 
+ In general, it will also be used in the `filtering` internal sub-module to [apply + filters](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance#filtering) based on our default + [tagging convention](https://github.com/claranet/terraform-signalfx-detectors/wiki/Tagging-convention). + +* `notifications`: Use this parameter to define where alerts should be sent depending on their severity. It consists + of a Terraform [object](https://www.terraform.io/docs/configuration/types.html#object-) where each key represents an + available [detector rule severity](https://docs.signalfx.com/en/latest/detect-alert/set-up-detectors.html#severity) + and its value is a list of recipients. Every recipient must respect the [detector notification + format](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector#notification-format). + Check the [notification binding](https://github.com/claranet/terraform-signalfx-detectors/wiki/Notifications-binding) + documentation to understand the recommended role of each severity. + +These 3 parameters along with all variables defined in [common-variables.tf](common-variables.tf) are common to all +[modules](../) in this repository. Other variables, specific to this module, are available in +[variables-gen.tf](variables-gen.tf). +In general, the default configuration "works" but all of these Terraform +[variables](https://www.terraform.io/docs/configuration/variables.html) make it possible to +customize the detectors' behavior to better fit your needs. + +Most of them represent usual tips and rules detailed in the +[guidance](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance) documentation and listed in the +common [variables](https://github.com/claranet/terraform-signalfx-detectors/wiki/Variables) dedicated documentation. 
+ +Feel free to explore the [wiki](https://github.com/claranet/terraform-signalfx-detectors/wiki) for more information about +general usage of this repository. + +## What are the available detectors in this module? + +This module creates the following SignalFx detectors which could contain one or multiple alerting rules: + +|Detector|Critical|Major|Minor|Warning|Info| +|---|---|---|---|---|---| +|Logstash heartbeat|X|-|-|-|-| +|Logstash events in high|-|-|X|X|-| +|Logstash events in low|-|-|X|X|-| +|Logstash events out high|-|-|X|X|-| +|Logstash events out low|-|-|X|X|-| +|Logstash cpu percent|-|-|X|X|-| +|Logstash queued events|-|-|X|X|-| +|Logstash queued disk|-|-|X|X|-| + +## How to collect required metrics? + +This module uses metrics available from +[monitors](https://docs.signalfx.com/en/latest/integrations/agent/monitors/_monitor-config.html) +available in the [SignalFx Smart +Agent](https://github.com/signalfx/signalfx-agent). Check the [Related documentation](#related-documentation) section for more +information including the official documentation of this monitor. + + +Check the [integration +documentation](https://docs.signalfx.com/en/latest/integrations/agent/monitors/logstash.html) +in addition to the monitor one which it uses. 
+ +### Monitors + +You have to enable the following `extraMetrics` in your monitor configuration: + +* `node.stats.pipelines.queue.queue_size_in_bytes` + +### Examples + +```yaml + - type: logstash + extraMetrics: + - node.stats.pipelines.queue.queue_size_in_bytes +``` + + +### Metrics + + +To filter only required metrics for the detectors of this module, add the +[datapointsToExclude](https://docs.signalfx.com/en/latest/integrations/agent/filtering.html) parameter to +the corresponding monitor configuration: + +```yaml + datapointsToExclude: + - metricNames: + - '*' + - '!node.stats.events.events.in' + - '!node.stats.events.events.out' + - '!node.stats.pipelines.queue.events_count' + - '!node.stats.pipelines.queue.queue_size_in_bytes' + - '!node.stats.process.process.cpu.percent' + +``` + + + +## Related documentation + +* [Terraform SignalFx provider](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs) +* [Terraform SignalFx detector](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector) +* [Smart Agent monitor](https://docs.signalfx.com/en/latest/integrations/agent/monitors/logstash.html) diff --git a/modules/smart-agent_logstash/common-filters.tf b/modules/smart-agent_logstash/common-filters.tf new file mode 120000 index 000000000..4df54e41e --- /dev/null +++ b/modules/smart-agent_logstash/common-filters.tf @@ -0,0 +1 @@ +../../common/module/filters-smart-agent.tf \ No newline at end of file diff --git a/modules/smart-agent_logstash/common-locals.tf b/modules/smart-agent_logstash/common-locals.tf new file mode 120000 index 000000000..5672d21ab --- /dev/null +++ b/modules/smart-agent_logstash/common-locals.tf @@ -0,0 +1 @@ +../../common/module/locals.tf \ No newline at end of file diff --git a/modules/smart-agent_logstash/common-modules.tf b/modules/smart-agent_logstash/common-modules.tf new file mode 120000 index 000000000..8c81ef377 --- /dev/null +++ 
b/modules/smart-agent_logstash/common-modules.tf @@ -0,0 +1 @@ +../../common/module/modules.tf \ No newline at end of file diff --git a/modules/smart-agent_logstash/common-variables.tf b/modules/smart-agent_logstash/common-variables.tf new file mode 120000 index 000000000..f3037a584 --- /dev/null +++ b/modules/smart-agent_logstash/common-variables.tf @@ -0,0 +1 @@ +../../common/module/variables.tf \ No newline at end of file diff --git a/modules/smart-agent_logstash/common-versions.tf b/modules/smart-agent_logstash/common-versions.tf new file mode 120000 index 000000000..fa7f5509f --- /dev/null +++ b/modules/smart-agent_logstash/common-versions.tf @@ -0,0 +1 @@ +../../common/module/versions.tf \ No newline at end of file diff --git a/modules/smart-agent_logstash/conf/00-heartbeat.yaml b/modules/smart-agent_logstash/conf/00-heartbeat.yaml new file mode 100644 index 000000000..4eaa4e883 --- /dev/null +++ b/modules/smart-agent_logstash/conf/00-heartbeat.yaml @@ -0,0 +1,12 @@ +module: logstash +name: heartbeat + +transformation: false +aggregation: true + +signals: + signal: + metric: node.stats.events.events.in +rules: + critical: + diff --git a/modules/smart-agent_logstash/conf/01-events_in_high.yaml b/modules/smart-agent_logstash/conf/01-events_in_high.yaml new file mode 100644 index 000000000..ef705b4ee --- /dev/null +++ b/modules/smart-agent_logstash/conf/01-events_in_high.yaml @@ -0,0 +1,21 @@ +module: logstash +name: events in high + +transformation: ".min(over='10m')" +aggregation: true + +signals: + signal: + metric: node.stats.events.events.in + rollup: delta +rules: + warning: + description: is high + threshold: 25000 + comparator: '>=' + dependency: minor + minor: + description: is too high + threshold: 30000 + comparator: '>=' + diff --git a/modules/smart-agent_logstash/conf/02-events_in_low.yaml b/modules/smart-agent_logstash/conf/02-events_in_low.yaml new file mode 100644 index 000000000..4f27ac6ed --- /dev/null +++ 
b/modules/smart-agent_logstash/conf/02-events_in_low.yaml @@ -0,0 +1,21 @@ +module: logstash +name: events in low + +transformation: ".min(over='10m')" +aggregation: true + +signals: + signal: + metric: node.stats.events.events.in + rollup: delta +rules: + warning: + description: is low + threshold: 100 + comparator: '<=' + dependency: minor + minor: + description: is too low + threshold: 0 + comparator: '<=' + diff --git a/modules/smart-agent_logstash/conf/03-events_out_high.yaml b/modules/smart-agent_logstash/conf/03-events_out_high.yaml new file mode 100644 index 000000000..7d8fe1505 --- /dev/null +++ b/modules/smart-agent_logstash/conf/03-events_out_high.yaml @@ -0,0 +1,21 @@ +module: logstash +name: events out high + +transformation: ".min(over='10m')" +aggregation: true + +signals: + signal: + metric: node.stats.events.events.out + rollup: delta +rules: + warning: + description: is high + threshold: 25000 + comparator: '>=' + dependency: minor + minor: + description: is too high + threshold: 30000 + comparator: '>=' + diff --git a/modules/smart-agent_logstash/conf/04-events_out_low.yaml b/modules/smart-agent_logstash/conf/04-events_out_low.yaml new file mode 100644 index 000000000..47bd4464a --- /dev/null +++ b/modules/smart-agent_logstash/conf/04-events_out_low.yaml @@ -0,0 +1,21 @@ +module: logstash +name: events out low + +transformation: ".min(over='10m')" +aggregation: true + +signals: + signal: + metric: node.stats.events.events.out + rollup: delta +rules: + warning: + description: is low + threshold: 100 + comparator: '<=' + dependency: minor + minor: + description: is too low + threshold: 0 + comparator: '<=' + diff --git a/modules/smart-agent_logstash/conf/05-cpu_percent.yaml b/modules/smart-agent_logstash/conf/05-cpu_percent.yaml new file mode 100644 index 000000000..8e5569803 --- /dev/null +++ b/modules/smart-agent_logstash/conf/05-cpu_percent.yaml @@ -0,0 +1,19 @@ +module: logstash +name: cpu percent + +transformation: ".min(over='10m')" 
+aggregation: true + +signals: + signal: + metric: node.stats.process.process.cpu.percent +rules: + warning: + description: is high + threshold: 90 + comparator: '>=' + minor: + description: is too high + threshold: 100 + comparator: '>=' + diff --git a/modules/smart-agent_logstash/conf/06-queued_events.yaml b/modules/smart-agent_logstash/conf/06-queued_events.yaml new file mode 100644 index 000000000..091a45783 --- /dev/null +++ b/modules/smart-agent_logstash/conf/06-queued_events.yaml @@ -0,0 +1,21 @@ +module: logstash +name: queued events + +transformation: ".min(over='10m')" +aggregation: true + +signals: + signal: + metric: node.stats.pipelines.queue.events_count + rollup: latest +rules: + warning: + description: is high + threshold: 1000000 + comparator: '>=' + dependency: minor + minor: + description: is too high + threshold: 2000000 + comparator: '>=' + diff --git a/modules/smart-agent_logstash/conf/07-queued_disk.yaml b/modules/smart-agent_logstash/conf/07-queued_disk.yaml new file mode 100644 index 000000000..80ea3fe20 --- /dev/null +++ b/modules/smart-agent_logstash/conf/07-queued_disk.yaml @@ -0,0 +1,23 @@ +module: logstash +name: queued disk + +transformation: ".min(over='10m')" +aggregation: true + +signals: + disk: + metric: node.stats.pipelines.queue.queue_size_in_bytes + rollup: latest + signal: + formula: (disk / 1000000) +rules: + warning: + description: is high + threshold: 8000 + comparator: '>=' + dependency: minor + minor: + description: is too high + threshold: 10000 + comparator: '>=' + diff --git a/modules/smart-agent_logstash/conf/readme.yaml b/modules/smart-agent_logstash/conf/readme.yaml new file mode 100644 index 000000000..0a5f48894 --- /dev/null +++ b/modules/smart-agent_logstash/conf/readme.yaml @@ -0,0 +1,23 @@ +documentations: + - name: Smart Agent monitor + url: 'https://docs.signalfx.com/en/latest/integrations/agent/monitors/logstash.html' + +source_doc: | + Check the [integration + 
documentation](https://docs.signalfx.com/en/latest/integrations/agent/monitors/logstash.html) + in addition to the monitor one which it uses. + + ### Monitors + + You have to enable the following `extraMetrics` in your monitor configuration: + + * `node.stats.pipelines.queue.queue_size_in_bytes` + + ### Examples + + ```yaml + - type: logstash + extraMetrics: + - node.stats.pipelines.queue.queue_size_in_bytes + ``` + diff --git a/modules/smart-agent_logstash/detectors-gen.tf b/modules/smart-agent_logstash/detectors-gen.tf new file mode 100644 index 000000000..406e89bbe --- /dev/null +++ b/modules/smart-agent_logstash/detectors-gen.tf @@ -0,0 +1,295 @@ +resource "signalfx_detector" "heartbeat" { + name = format("%s %s", local.detector_name_prefix, "Logstash heartbeat") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + max_delay = 900 + + program_text = <<-EOF + from signalfx.detectors.not_reporting import not_reporting + signal = data('node.stats.events.events.in', filter=${module.filtering.signalflow})${var.heartbeat_aggregation_function}.publish('signal') + not_reporting.detector(stream=signal, resource_identifier=None, duration='${var.heartbeat_timeframe}', auto_resolve_after='${local.heartbeat_auto_resolve_after}').publish('CRIT') +EOF + + rule { + description = "has not reported in ${var.heartbeat_timeframe}" + severity = "Critical" + detect_label = "CRIT" + disabled = coalesce(var.heartbeat_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.heartbeat_notifications, "critical", []), var.notifications.critical) + runbook_url = try(coalesce(var.heartbeat_runbook_url, var.runbook_url), "") + tip = var.heartbeat_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject_novalue : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } +} + +resource "signalfx_detector" "events_in_high" { + name = format("%s %s", local.detector_name_prefix, "Logstash events in high") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + signal = data('node.stats.events.events.in', filter=${module.filtering.signalflow}, rollup='delta')${var.events_in_high_aggregation_function}${var.events_in_high_transformation_function}.publish('signal') + detect(when(signal >= ${var.events_in_high_threshold_warning}, lasting=%{if var.events_in_high_lasting_duration_warning == null}None%{else}'${var.events_in_high_lasting_duration_warning}'%{endif}, at_least=${var.events_in_high_at_least_percentage_warning}) and (not when(signal >= ${var.events_in_high_threshold_minor}, lasting=%{if var.events_in_high_lasting_duration_minor == null}None%{else}'${var.events_in_high_lasting_duration_minor}'%{endif}, at_least=${var.events_in_high_at_least_percentage_minor}))).publish('WARN') + detect(when(signal >= ${var.events_in_high_threshold_minor}, lasting=%{if var.events_in_high_lasting_duration_minor == null}None%{else}'${var.events_in_high_lasting_duration_minor}'%{endif}, at_least=${var.events_in_high_at_least_percentage_minor})).publish('MINOR') +EOF + + rule { + description = "is high >= ${var.events_in_high_threshold_warning}" + severity = "Warning" + detect_label = "WARN" + disabled = coalesce(var.events_in_high_disabled_warning, var.events_in_high_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.events_in_high_notifications, "warning", []), var.notifications.warning) + runbook_url = try(coalesce(var.events_in_high_runbook_url, var.runbook_url), "") + tip = var.events_in_high_tip + parameterized_subject = var.message_subject == "" ? 
local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too high >= ${var.events_in_high_threshold_minor}" + severity = "Minor" + detect_label = "MINOR" + disabled = coalesce(var.events_in_high_disabled_minor, var.events_in_high_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.events_in_high_notifications, "minor", []), var.notifications.minor) + runbook_url = try(coalesce(var.events_in_high_runbook_url, var.runbook_url), "") + tip = var.events_in_high_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } +} + +resource "signalfx_detector" "events_in_low" { + name = format("%s %s", local.detector_name_prefix, "Logstash events in low") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + signal = data('node.stats.events.events.in', filter=${module.filtering.signalflow}, rollup='delta')${var.events_in_low_aggregation_function}${var.events_in_low_transformation_function}.publish('signal') + detect(when(signal <= ${var.events_in_low_threshold_warning}, lasting=%{if var.events_in_low_lasting_duration_warning == null}None%{else}'${var.events_in_low_lasting_duration_warning}'%{endif}, at_least=${var.events_in_low_at_least_percentage_warning}) and (not when(signal <= ${var.events_in_low_threshold_minor}, lasting=%{if var.events_in_low_lasting_duration_minor == null}None%{else}'${var.events_in_low_lasting_duration_minor}'%{endif}, at_least=${var.events_in_low_at_least_percentage_minor}))).publish('WARN') + detect(when(signal <= ${var.events_in_low_threshold_minor}, lasting=%{if var.events_in_low_lasting_duration_minor == 
null}None%{else}'${var.events_in_low_lasting_duration_minor}'%{endif}, at_least=${var.events_in_low_at_least_percentage_minor})).publish('MINOR') +EOF + + rule { + description = "is low <= ${var.events_in_low_threshold_warning}" + severity = "Warning" + detect_label = "WARN" + disabled = coalesce(var.events_in_low_disabled_warning, var.events_in_low_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.events_in_low_notifications, "warning", []), var.notifications.warning) + runbook_url = try(coalesce(var.events_in_low_runbook_url, var.runbook_url), "") + tip = var.events_in_low_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too low <= ${var.events_in_low_threshold_minor}" + severity = "Minor" + detect_label = "MINOR" + disabled = coalesce(var.events_in_low_disabled_minor, var.events_in_low_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.events_in_low_notifications, "minor", []), var.notifications.minor) + runbook_url = try(coalesce(var.events_in_low_runbook_url, var.runbook_url), "") + tip = var.events_in_low_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } +} + +resource "signalfx_detector" "events_out_high" { + name = format("%s %s", local.detector_name_prefix, "Logstash events out high") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + signal = data('node.stats.events.events.out', filter=${module.filtering.signalflow}, rollup='delta')${var.events_out_high_aggregation_function}${var.events_out_high_transformation_function}.publish('signal') + detect(when(signal >= ${var.events_out_high_threshold_warning}, lasting=%{if var.events_out_high_lasting_duration_warning == null}None%{else}'${var.events_out_high_lasting_duration_warning}'%{endif}, at_least=${var.events_out_high_at_least_percentage_warning}) and (not when(signal >= ${var.events_out_high_threshold_minor}, lasting=%{if var.events_out_high_lasting_duration_minor == null}None%{else}'${var.events_out_high_lasting_duration_minor}'%{endif}, at_least=${var.events_out_high_at_least_percentage_minor}))).publish('WARN') + detect(when(signal >= ${var.events_out_high_threshold_minor}, lasting=%{if var.events_out_high_lasting_duration_minor == null}None%{else}'${var.events_out_high_lasting_duration_minor}'%{endif}, at_least=${var.events_out_high_at_least_percentage_minor})).publish('MINOR') +EOF + + rule { + description = "is high >= ${var.events_out_high_threshold_warning}" + severity = "Warning" + detect_label = "WARN" + disabled = coalesce(var.events_out_high_disabled_warning, var.events_out_high_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.events_out_high_notifications, "warning", []), var.notifications.warning) + runbook_url = try(coalesce(var.events_out_high_runbook_url, var.runbook_url), "") + tip = var.events_out_high_tip + parameterized_subject = var.message_subject == "" ? 
local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too high >= ${var.events_out_high_threshold_minor}" + severity = "Minor" + detect_label = "MINOR" + disabled = coalesce(var.events_out_high_disabled_minor, var.events_out_high_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.events_out_high_notifications, "minor", []), var.notifications.minor) + runbook_url = try(coalesce(var.events_out_high_runbook_url, var.runbook_url), "") + tip = var.events_out_high_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } +} + +resource "signalfx_detector" "events_out_low" { + name = format("%s %s", local.detector_name_prefix, "Logstash events out low") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + signal = data('node.stats.events.events.out', filter=${module.filtering.signalflow}, rollup='delta')${var.events_out_low_aggregation_function}${var.events_out_low_transformation_function}.publish('signal') + detect(when(signal <= ${var.events_out_low_threshold_warning}, lasting=%{if var.events_out_low_lasting_duration_warning == null}None%{else}'${var.events_out_low_lasting_duration_warning}'%{endif}, at_least=${var.events_out_low_at_least_percentage_warning}) and (not when(signal <= ${var.events_out_low_threshold_minor}, lasting=%{if var.events_out_low_lasting_duration_minor == null}None%{else}'${var.events_out_low_lasting_duration_minor}'%{endif}, at_least=${var.events_out_low_at_least_percentage_minor}))).publish('WARN') + detect(when(signal <= ${var.events_out_low_threshold_minor}, lasting=%{if var.events_out_low_lasting_duration_minor == 
null}None%{else}'${var.events_out_low_lasting_duration_minor}'%{endif}, at_least=${var.events_out_low_at_least_percentage_minor})).publish('MINOR') +EOF + + rule { + description = "is low <= ${var.events_out_low_threshold_warning}" + severity = "Warning" + detect_label = "WARN" + disabled = coalesce(var.events_out_low_disabled_warning, var.events_out_low_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.events_out_low_notifications, "warning", []), var.notifications.warning) + runbook_url = try(coalesce(var.events_out_low_runbook_url, var.runbook_url), "") + tip = var.events_out_low_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too low <= ${var.events_out_low_threshold_minor}" + severity = "Minor" + detect_label = "MINOR" + disabled = coalesce(var.events_out_low_disabled_minor, var.events_out_low_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.events_out_low_notifications, "minor", []), var.notifications.minor) + runbook_url = try(coalesce(var.events_out_low_runbook_url, var.runbook_url), "") + tip = var.events_out_low_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } +} + +resource "signalfx_detector" "cpu_percent" { + name = format("%s %s", local.detector_name_prefix, "Logstash cpu percent") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + signal = data('node.stats.process.process.cpu.percent', filter=${module.filtering.signalflow})${var.cpu_percent_aggregation_function}${var.cpu_percent_transformation_function}.publish('signal') + detect(when(signal >= ${var.cpu_percent_threshold_warning}, lasting=%{if var.cpu_percent_lasting_duration_warning == null}None%{else}'${var.cpu_percent_lasting_duration_warning}'%{endif}, at_least=${var.cpu_percent_at_least_percentage_warning})).publish('WARN') + detect(when(signal >= ${var.cpu_percent_threshold_minor}, lasting=%{if var.cpu_percent_lasting_duration_minor == null}None%{else}'${var.cpu_percent_lasting_duration_minor}'%{endif}, at_least=${var.cpu_percent_at_least_percentage_minor})).publish('MINOR') +EOF + + rule { + description = "is high >= ${var.cpu_percent_threshold_warning}" + severity = "Warning" + detect_label = "WARN" + disabled = coalesce(var.cpu_percent_disabled_warning, var.cpu_percent_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.cpu_percent_notifications, "warning", []), var.notifications.warning) + runbook_url = try(coalesce(var.cpu_percent_runbook_url, var.runbook_url), "") + tip = var.cpu_percent_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } + + rule { + description = "is too high >= ${var.cpu_percent_threshold_minor}" + severity = "Minor" + detect_label = "MINOR" + disabled = coalesce(var.cpu_percent_disabled_minor, var.cpu_percent_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.cpu_percent_notifications, "minor", []), var.notifications.minor) + runbook_url = try(coalesce(var.cpu_percent_runbook_url, var.runbook_url), "") + tip = var.cpu_percent_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } +} + +resource "signalfx_detector" "queued_events" { + name = format("%s %s", local.detector_name_prefix, "Logstash queued events") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + signal = data('node.stats.pipelines.queue.events_count', filter=${module.filtering.signalflow}, rollup='latest')${var.queued_events_aggregation_function}${var.queued_events_transformation_function}.publish('signal') + detect(when(signal >= ${var.queued_events_threshold_warning}, lasting=%{if var.queued_events_lasting_duration_warning == null}None%{else}'${var.queued_events_lasting_duration_warning}'%{endif}, at_least=${var.queued_events_at_least_percentage_warning}) and (not when(signal >= ${var.queued_events_threshold_minor}, lasting=%{if var.queued_events_lasting_duration_minor == null}None%{else}'${var.queued_events_lasting_duration_minor}'%{endif}, at_least=${var.queued_events_at_least_percentage_minor}))).publish('WARN') + detect(when(signal >= ${var.queued_events_threshold_minor}, lasting=%{if var.queued_events_lasting_duration_minor == null}None%{else}'${var.queued_events_lasting_duration_minor}'%{endif}, 
at_least=${var.queued_events_at_least_percentage_minor})).publish('MINOR') +EOF + + rule { + description = "is high >= ${var.queued_events_threshold_warning}" + severity = "Warning" + detect_label = "WARN" + disabled = coalesce(var.queued_events_disabled_warning, var.queued_events_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.queued_events_notifications, "warning", []), var.notifications.warning) + runbook_url = try(coalesce(var.queued_events_runbook_url, var.runbook_url), "") + tip = var.queued_events_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too high >= ${var.queued_events_threshold_minor}" + severity = "Minor" + detect_label = "MINOR" + disabled = coalesce(var.queued_events_disabled_minor, var.queued_events_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.queued_events_notifications, "minor", []), var.notifications.minor) + runbook_url = try(coalesce(var.queued_events_runbook_url, var.runbook_url), "") + tip = var.queued_events_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } +} + +resource "signalfx_detector" "queued_disk" { + name = format("%s %s", local.detector_name_prefix, "Logstash queued disk") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + disk = data('node.stats.pipelines.queue.queue_size_in_bytes', filter=${module.filtering.signalflow}, rollup='latest')${var.queued_disk_aggregation_function}${var.queued_disk_transformation_function} + signal = (disk / 1000000).publish('signal') + detect(when(signal >= ${var.queued_disk_threshold_warning}, lasting=%{if var.queued_disk_lasting_duration_warning == null}None%{else}'${var.queued_disk_lasting_duration_warning}'%{endif}, at_least=${var.queued_disk_at_least_percentage_warning}) and (not when(signal >= ${var.queued_disk_threshold_minor}, lasting=%{if var.queued_disk_lasting_duration_minor == null}None%{else}'${var.queued_disk_lasting_duration_minor}'%{endif}, at_least=${var.queued_disk_at_least_percentage_minor}))).publish('WARN') + detect(when(signal >= ${var.queued_disk_threshold_minor}, lasting=%{if var.queued_disk_lasting_duration_minor == null}None%{else}'${var.queued_disk_lasting_duration_minor}'%{endif}, at_least=${var.queued_disk_at_least_percentage_minor})).publish('MINOR') +EOF + + rule { + description = "is high >= ${var.queued_disk_threshold_warning}" + severity = "Warning" + detect_label = "WARN" + disabled = coalesce(var.queued_disk_disabled_warning, var.queued_disk_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.queued_disk_notifications, "warning", []), var.notifications.warning) + runbook_url = try(coalesce(var.queued_disk_runbook_url, var.runbook_url), "") + tip = var.queued_disk_tip + parameterized_subject = var.message_subject == "" ? 
local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too high >= ${var.queued_disk_threshold_minor}" + severity = "Minor" + detect_label = "MINOR" + disabled = coalesce(var.queued_disk_disabled_minor, var.queued_disk_disabled, var.detectors_disabled) + notifications = coalescelist(lookup(var.queued_disk_notifications, "minor", []), var.notifications.minor) + runbook_url = try(coalesce(var.queued_disk_runbook_url, var.runbook_url), "") + tip = var.queued_disk_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } +} + diff --git a/modules/smart-agent_logstash/outputs.tf b/modules/smart-agent_logstash/outputs.tf new file mode 100644 index 000000000..a5f192ba4 --- /dev/null +++ b/modules/smart-agent_logstash/outputs.tf @@ -0,0 +1,40 @@ +output "cpu_percent" { + description = "Detector resource for cpu_percent" + value = signalfx_detector.cpu_percent +} + +output "events_in_high" { + description = "Detector resource for events_in_high" + value = signalfx_detector.events_in_high +} + +output "events_in_low" { + description = "Detector resource for events_in_low" + value = signalfx_detector.events_in_low +} + +output "events_out_high" { + description = "Detector resource for events_out_high" + value = signalfx_detector.events_out_high +} + +output "events_out_low" { + description = "Detector resource for events_out_low" + value = signalfx_detector.events_out_low +} + +output "heartbeat" { + description = "Detector resource for heartbeat" + value = signalfx_detector.heartbeat +} + +output "queued_disk" { + description = "Detector resource for queued_disk" + value = signalfx_detector.queued_disk +} + +output "queued_events" { + description = "Detector resource for queued_events" + value = signalfx_detector.queued_events +} + diff --git 
a/modules/smart-agent_logstash/tags.tf b/modules/smart-agent_logstash/tags.tf new file mode 100644 index 000000000..3397396a6 --- /dev/null +++ b/modules/smart-agent_logstash/tags.tf @@ -0,0 +1,4 @@ +locals { + tags = ["smart-agent", "logstash"] +} + diff --git a/modules/smart-agent_logstash/variables-gen.tf b/modules/smart-agent_logstash/variables-gen.tf new file mode 100644 index 000000000..5b682b59a --- /dev/null +++ b/modules/smart-agent_logstash/variables-gen.tf @@ -0,0 +1,626 @@ +# heartbeat detector + +variable "heartbeat_notifications" { + description = "Notification recipients list per severity overridden for heartbeat detector" + type = map(list(string)) + default = {} +} + +variable "heartbeat_aggregation_function" { + description = "Aggregation function and group by for heartbeat detector (i.e. \".mean(by=['host'])\")" + type = string + default = "" +} + +variable "heartbeat_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "heartbeat_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "heartbeat_disabled" { + description = "Disable all alerting rules for heartbeat detector" + type = bool + default = null +} + +variable "heartbeat_timeframe" { + description = "Timeframe for heartbeat detector (i.e. \"10m\")" + type = string + default = "20m" +} + +# events_in_high detector + +variable "events_in_high_notifications" { + description = "Notification recipients list per severity overridden for events_in_high detector" + type = map(list(string)) + default = {} +} + +variable "events_in_high_aggregation_function" { + description = "Aggregation function and group by for events_in_high detector (i.e. 
\".mean(by=['host'])\")" + type = string + default = "" +} + +variable "events_in_high_transformation_function" { + description = "Transformation function for events_in_high detector (i.e. \".mean(over='5m')\")" + type = string + default = ".min(over='10m')" +} + +variable "events_in_high_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "events_in_high_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "events_in_high_disabled" { + description = "Disable all alerting rules for events_in_high detector" + type = bool + default = null +} + +variable "events_in_high_disabled_warning" { + description = "Disable warning alerting rule for events_in_high detector" + type = bool + default = null +} + +variable "events_in_high_disabled_minor" { + description = "Disable minor alerting rule for events_in_high detector" + type = bool + default = null +} + +variable "events_in_high_threshold_warning" { + description = "Warning threshold for events_in_high detector" + type = number + default = 25000 +} + +variable "events_in_high_lasting_duration_warning" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "events_in_high_at_least_percentage_warning" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "events_in_high_threshold_minor" { + description = "Minor threshold for events_in_high detector" + type = number + default = 30000 +} + +variable "events_in_high_lasting_duration_minor" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "events_in_high_at_least_percentage_minor" { + description = 
"Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# events_in_low detector + +variable "events_in_low_notifications" { + description = "Notification recipients list per severity overridden for events_in_low detector" + type = map(list(string)) + default = {} +} + +variable "events_in_low_aggregation_function" { + description = "Aggregation function and group by for events_in_low detector (i.e. \".mean(by=['host'])\")" + type = string + default = "" +} + +variable "events_in_low_transformation_function" { + description = "Transformation function for events_in_low detector (i.e. \".mean(over='5m')\")" + type = string + default = ".min(over='10m')" +} + +variable "events_in_low_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "events_in_low_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "events_in_low_disabled" { + description = "Disable all alerting rules for events_in_low detector" + type = bool + default = null +} + +variable "events_in_low_disabled_warning" { + description = "Disable warning alerting rule for events_in_low detector" + type = bool + default = null +} + +variable "events_in_low_disabled_minor" { + description = "Disable minor alerting rule for events_in_low detector" + type = bool + default = null +} + +variable "events_in_low_threshold_warning" { + description = "Warning threshold for events_in_low detector" + type = number + default = 100 +} + +variable "events_in_low_lasting_duration_warning" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "events_in_low_at_least_percentage_warning" { + description = "Percentage of lasting that conditions must be true before raising 
alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "events_in_low_threshold_minor" { + description = "Minor threshold for events_in_low detector" + type = number + default = 0 +} + +variable "events_in_low_lasting_duration_minor" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "events_in_low_at_least_percentage_minor" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# events_out_high detector + +variable "events_out_high_notifications" { + description = "Notification recipients list per severity overridden for events_out_high detector" + type = map(list(string)) + default = {} +} + +variable "events_out_high_aggregation_function" { + description = "Aggregation function and group by for events_out_high detector (i.e. \".mean(by=['host'])\")" + type = string + default = "" +} + +variable "events_out_high_transformation_function" { + description = "Transformation function for events_out_high detector (i.e. 
\".mean(over='5m')\")" + type = string + default = ".min(over='10m')" +} + +variable "events_out_high_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "events_out_high_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "events_out_high_disabled" { + description = "Disable all alerting rules for events_out_high detector" + type = bool + default = null +} + +variable "events_out_high_disabled_warning" { + description = "Disable warning alerting rule for events_out_high detector" + type = bool + default = null +} + +variable "events_out_high_disabled_minor" { + description = "Disable minor alerting rule for events_out_high detector" + type = bool + default = null +} + +variable "events_out_high_threshold_warning" { + description = "Warning threshold for events_out_high detector" + type = number + default = 25000 +} + +variable "events_out_high_lasting_duration_warning" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "events_out_high_at_least_percentage_warning" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "events_out_high_threshold_minor" { + description = "Minor threshold for events_out_high detector" + type = number + default = 30000 +} + +variable "events_out_high_lasting_duration_minor" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "events_out_high_at_least_percentage_minor" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# events_out_low detector + +variable 
"events_out_low_notifications" { + description = "Notification recipients list per severity overridden for events_out_low detector" + type = map(list(string)) + default = {} +} + +variable "events_out_low_aggregation_function" { + description = "Aggregation function and group by for events_out_low detector (i.e. \".mean(by=['host'])\")" + type = string + default = "" +} + +variable "events_out_low_transformation_function" { + description = "Transformation function for events_out_low detector (i.e. \".mean(over='5m')\")" + type = string + default = ".min(over='10m')" +} + +variable "events_out_low_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "events_out_low_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "events_out_low_disabled" { + description = "Disable all alerting rules for events_out_low detector" + type = bool + default = null +} + +variable "events_out_low_disabled_warning" { + description = "Disable warning alerting rule for events_out_low detector" + type = bool + default = null +} + +variable "events_out_low_disabled_minor" { + description = "Disable minor alerting rule for events_out_low detector" + type = bool + default = null +} + +variable "events_out_low_threshold_warning" { + description = "Warning threshold for events_out_low detector" + type = number + default = 100 +} + +variable "events_out_low_lasting_duration_warning" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "events_out_low_at_least_percentage_warning" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "events_out_low_threshold_minor" { + description = "Minor threshold for 
events_out_low detector" + type = number + default = 0 +} + +variable "events_out_low_lasting_duration_minor" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "events_out_low_at_least_percentage_minor" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# cpu_percent detector + +variable "cpu_percent_notifications" { + description = "Notification recipients list per severity overridden for cpu_percent detector" + type = map(list(string)) + default = {} +} + +variable "cpu_percent_aggregation_function" { + description = "Aggregation function and group by for cpu_percent detector (i.e. \".mean(by=['host'])\")" + type = string + default = "" +} + +variable "cpu_percent_transformation_function" { + description = "Transformation function for cpu_percent detector (i.e. \".mean(over='5m')\")" + type = string + default = ".min(over='10m')" +} + +variable "cpu_percent_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "cpu_percent_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "cpu_percent_disabled" { + description = "Disable all alerting rules for cpu_percent detector" + type = bool + default = null +} + +variable "cpu_percent_disabled_warning" { + description = "Disable warning alerting rule for cpu_percent detector" + type = bool + default = null +} + +variable "cpu_percent_disabled_minor" { + description = "Disable minor alerting rule for cpu_percent detector" + type = bool + default = null +} + +variable "cpu_percent_threshold_warning" { + description = "Warning threshold for cpu_percent detector" + type = number + default = 90 +} + +variable 
"cpu_percent_lasting_duration_warning" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "cpu_percent_at_least_percentage_warning" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "cpu_percent_threshold_minor" { + description = "Minor threshold for cpu_percent detector" + type = number + default = 100 +} + +variable "cpu_percent_lasting_duration_minor" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "cpu_percent_at_least_percentage_minor" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# queued_events detector + +variable "queued_events_notifications" { + description = "Notification recipients list per severity overridden for queued_events detector" + type = map(list(string)) + default = {} +} + +variable "queued_events_aggregation_function" { + description = "Aggregation function and group by for queued_events detector (i.e. \".mean(by=['host'])\")" + type = string + default = "" +} + +variable "queued_events_transformation_function" { + description = "Transformation function for queued_events detector (i.e. 
\".mean(over='5m')\")" + type = string + default = ".min(over='10m')" +} + +variable "queued_events_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "queued_events_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "queued_events_disabled" { + description = "Disable all alerting rules for queued_events detector" + type = bool + default = null +} + +variable "queued_events_disabled_warning" { + description = "Disable warning alerting rule for queued_events detector" + type = bool + default = null +} + +variable "queued_events_disabled_minor" { + description = "Disable minor alerting rule for queued_events detector" + type = bool + default = null +} + +variable "queued_events_threshold_warning" { + description = "Warning threshold for queued_events detector" + type = number + default = 1000000 +} + +variable "queued_events_lasting_duration_warning" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "queued_events_at_least_percentage_warning" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "queued_events_threshold_minor" { + description = "Minor threshold for queued_events detector" + type = number + default = 2000000 +} + +variable "queued_events_lasting_duration_minor" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "queued_events_at_least_percentage_minor" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# queued_disk detector + +variable "queued_disk_notifications" { + description = 
"Notification recipients list per severity overridden for queued_disk detector" + type = map(list(string)) + default = {} +} + +variable "queued_disk_aggregation_function" { + description = "Aggregation function and group by for queued_disk detector (i.e. \".mean(by=['host'])\")" + type = string + default = "" +} + +variable "queued_disk_transformation_function" { + description = "Transformation function for queued_disk detector (i.e. \".mean(over='5m')\")" + type = string + default = ".min(over='10m')" +} + +variable "queued_disk_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "queued_disk_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "queued_disk_disabled" { + description = "Disable all alerting rules for queued_disk detector" + type = bool + default = null +} + +variable "queued_disk_disabled_warning" { + description = "Disable warning alerting rule for queued_disk detector" + type = bool + default = null +} + +variable "queued_disk_disabled_minor" { + description = "Disable minor alerting rule for queued_disk detector" + type = bool + default = null +} + +variable "queued_disk_threshold_warning" { + description = "Warning threshold for queued_disk detector" + type = number + default = 8000 +} + +variable "queued_disk_lasting_duration_warning" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "queued_disk_at_least_percentage_warning" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "queued_disk_threshold_minor" { + description = "Minor threshold for queued_disk detector" + type = number + default = 10000 +} + +variable "queued_disk_lasting_duration_minor" { 
+ description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "queued_disk_at_least_percentage_minor" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +}