Logstash detectors module #327

Draft
wants to merge 1 commit into
base: master
15 changes: 15 additions & 0 deletions docs/severity.md
@@ -75,6 +75,7 @@
- [kubernetes-velero](#kubernetes-velero)
- [kubernetes-volumes](#kubernetes-volumes)
- [kubernetes-workloads-count](#kubernetes-workloads-count)
- [logstash](#logstash)
- [mdadm](#mdadm)
- [memcached](#memcached)
- [mongodb](#mongodb)
@@ -818,6 +819,20 @@
|Kubernetes workloads count|-|-|X|X|-|


## logstash

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
|Logstash heartbeat|X|-|-|-|-|
|Logstash events in high|-|-|X|X|-|
|Logstash events in low|-|-|X|X|-|
|Logstash events out high|-|-|X|X|-|
|Logstash events out low|-|-|X|X|-|
|Logstash cpu percent|-|-|X|X|-|
|Logstash queued events|-|-|X|X|-|
|Logstash queued disk|-|-|X|X|-|


## mdadm

|Detector|Critical|Major|Minor|Warning|Info|
142 changes: 142 additions & 0 deletions modules/smart-agent_logstash/README.md
@@ -0,0 +1,142 @@
# LOGSTASH SignalFx detectors

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
:link: **Contents**

- [How to use this module?](#how-to-use-this-module)
- [What are the available detectors in this module?](#what-are-the-available-detectors-in-this-module)
- [How to collect required metrics?](#how-to-collect-required-metrics)
- [Monitors](#monitors)
- [Examples](#examples)
- [Metrics](#metrics)
- [Related documentation](#related-documentation)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## How to use this module?

This directory defines a [Terraform](https://www.terraform.io/)
[module](https://www.terraform.io/docs/modules/usage.html) you can use in your
existing [stack](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#stack) by adding a
`module` configuration and setting its `source` parameter to the URL of this folder:

```hcl
module "signalfx-detectors-smart-agent-logstash" {
  source = "github.com/chungktran/terraform-signalfx-detectors.git//modules/smart-agent_logstash?ref={revision}"

  environment   = var.environment
  notifications = local.notifications
}
```

Note the following parameters:

* `source`: Use this parameter to specify the URL of the module. The double slash (`//`) is intentional and required.
Terraform uses it to specify subfolders within a Git repo (see [module
sources](https://www.terraform.io/docs/modules/sources.html)). The `ref` parameter specifies a specific Git tag in
this repository. It is recommended to use the latest "pinned" version in place of `{revision}`. Avoid using a branch
like `master` except for testing purposes. Note that all modules in this repository are available on the Terraform
[registry](https://registry.terraform.io/modules/claranet/detectors/signalfx) and we recommend using it as the source
instead of `git`, which is more flexible but less future-proof.

* `environment`: Use this parameter to specify the
[environment](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#environment) used by this
instance of the module.
Its value will be added to the `prefixes` list at the start of the [detector
name](https://github.com/claranet/terraform-signalfx-detectors/wiki/Templating#example).
It is generally also used in the `filtering` internal sub-module to [apply
filters](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance#filtering) based on our default
[tagging convention](https://github.com/claranet/terraform-signalfx-detectors/wiki/Tagging-convention).

* `notifications`: Use this parameter to define where alerts should be sent depending on their severity. It consists
of a Terraform [object](https://www.terraform.io/docs/configuration/types.html#object-) where each key represents an
available [detector rule severity](https://docs.signalfx.com/en/latest/detect-alert/set-up-detectors.html#severity)
and its value is a list of recipients. Every recipient must respect the [detector notification
format](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector#notification-format).
Check the [notification binding](https://github.com/claranet/terraform-signalfx-detectors/wiki/Notifications-binding)
documentation to understand the recommended role of each severity.
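
As a minimal sketch of the `notifications` object described above, assuming one key per severity (`critical`, `major`, `minor`, `warning`, `info`) and hypothetical email recipients written in the provider notification format:

```hcl
locals {
  # Assumption: one key per detector rule severity, each holding a list of
  # recipients in the SignalFx provider notification format.
  # The addresses below are placeholders, not real recipients.
  notifications = {
    critical = ["Email,oncall@example.com"]
    major    = ["Email,oncall@example.com"]
    minor    = ["Email,team@example.com"]
    warning  = ["Email,team@example.com"]
    info     = []
  }
}
```

The routing shown here is only illustrative; the notification binding documentation describes the recommended role of each severity.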

These 3 parameters, along with all variables defined in [common-variables.tf](common-variables.tf), are common to all
[modules](../) in this repository. Other variables, specific to this module, are available in
[variables-gen.tf](variables-gen.tf).
In general, the default configuration "works" but all of these Terraform
[variables](https://www.terraform.io/docs/configuration/variables.html) make it possible to
customize the detectors' behavior to better fit your needs.

Most of them represent usual tips and rules detailed in the
[guidance](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance) documentation and listed in the
common [variables](https://github.com/claranet/terraform-signalfx-detectors/wiki/Variables) dedicated documentation.

Feel free to explore the [wiki](https://github.com/claranet/terraform-signalfx-detectors/wiki) for more information about
general usage of this repository.
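
To illustrate the customization mentioned above, the sketch below overrides two thresholds of the "events in high" detector. The variable names and values are assumptions derived from the detector and severity names; the authoritative names are the ones declared in [variables-gen.tf](variables-gen.tf).

```hcl
module "signalfx-detectors-smart-agent-logstash" {
  source = "github.com/chungktran/terraform-signalfx-detectors.git//modules/smart-agent_logstash?ref={revision}"

  environment   = var.environment
  notifications = local.notifications

  # Hypothetical variable names and values: tune them to the event rate
  # that is normal for your own Logstash pipelines.
  events_in_high_threshold_warning = 50000
  events_in_high_threshold_minor   = 60000
}
```

Setting thresholds explicitly like this is likely the safer choice for rate-based detectors, since a sensible value depends on each environment.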

## What are the available detectors in this module?

This module creates the following SignalFx detectors, each of which can contain one or more alerting rules:

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
|Logstash heartbeat|X|-|-|-|-|
|Logstash events in high|-|-|X|X|-|
|Logstash events in low|-|-|X|X|-|
|Logstash events out high|-|-|X|X|-|
|Logstash events out low|-|-|X|X|-|
|Logstash cpu percent|-|-|X|X|-|
|Logstash queued events|-|-|X|X|-|
|Logstash queued disk|-|-|X|X|-|

## How to collect required metrics?

This module uses metrics provided by
[monitors](https://docs.signalfx.com/en/latest/integrations/agent/monitors/_monitor-config.html)
of the [SignalFx Smart
Agent](https://github.com/signalfx/signalfx-agent). Check the [Related documentation](#related-documentation) section for more
information, including the official documentation of this monitor.


Check the [integration
documentation](https://docs.signalfx.com/en/latest/integrations/agent/monitors/logstash.html)
in addition to the documentation of the monitor it uses.

### Monitors

You have to enable the following `extraMetrics` in your monitor configuration:

* `node.stats.pipelines.queue.queue_size_in_bytes`

### Examples

```yaml
- type: logstash
  extraMetrics:
    - node.stats.pipelines.queue.queue_size_in_bytes
```


### Metrics


To keep only the metrics required by the detectors of this module, add the
[datapointsToExclude](https://docs.signalfx.com/en/latest/integrations/agent/filtering.html) parameter to
the corresponding monitor configuration:

```yaml
datapointsToExclude:
  - metricNames:
      - '*'
      - '!node.stats.events.events.in'
      - '!node.stats.events.events.out'
      - '!node.stats.pipelines.queue.events_count'
      - '!node.stats.pipelines.queue.queue_size_in_bytes'
      - '!node.stats.process.process.cpu.percent'
```



## Related documentation

* [Terraform SignalFx provider](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs)
* [Terraform SignalFx detector](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector)
* [Smart Agent monitor](https://docs.signalfx.com/en/latest/integrations/agent/monitors/logstash.html)
1 change: 1 addition & 0 deletions modules/smart-agent_logstash/common-filters.tf
1 change: 1 addition & 0 deletions modules/smart-agent_logstash/common-locals.tf
1 change: 1 addition & 0 deletions modules/smart-agent_logstash/common-modules.tf
1 change: 1 addition & 0 deletions modules/smart-agent_logstash/common-variables.tf
1 change: 1 addition & 0 deletions modules/smart-agent_logstash/common-versions.tf
12 changes: 12 additions & 0 deletions modules/smart-agent_logstash/conf/00-heartbeat.yaml
@@ -0,0 +1,12 @@
module: logstash
name: heartbeat

transformation: false
aggregation: true

signals:
  signal:
    metric: node.stats.events.events.in
rules:
  critical:

21 changes: 21 additions & 0 deletions modules/smart-agent_logstash/conf/01-events_in_high.yaml
@@ -0,0 +1,21 @@
module: logstash
name: events in high

transformation: ".min(over='10m')"
aggregation: true

signals:
  signal:
    metric: node.stats.events.events.in
    rollup: delta
rules:
  warning:
    description: is high
Contributor:

I would leave the description undefined, since it adds nothing beyond the default. The same applies to all other similar descriptions.

Author:

I'd like to keep the description because the default says "too high" for both critical and major, which does not seem right to me. When it's major I want it to say "high", and when it's critical I want it to say "too high", which makes perfect sense to me.

Contributor:

But regardless of their severity, both rules are too high compared to their respective thresholds. That is true for everybody; it is not a matter of personal preference.

I would prefer to keep modules homogeneous to give users consistent behavior.
That said, we could open a feature request to create variables that let users customize the rule description as they want.

    threshold: 25000
Contributor:

As this detector's thresholds depend heavily on each environment, I would leave the threshold undefined to force users to set it according to their needs.

This can apply to all other detectors, except maybe the critical "too low" ones (equal to 0).

Author:

I want the module to be as useful as possible out of the box, without too much configuration. With your suggestion, the end user will get an error when running terraform plan/apply, and that's very unfriendly.

Contributor:

I agree with the goal of making all modules as useful as possible out of the box, which is why I suggest this in addition to my other suggestion to prefer relative detectors (such as percentages).

There are 2 possibilities:

  • either the thresholds you chose follow a universal or common logic, in which case you can keep them but need to explain them (as notes in the readme, as a tip on the detector, or both). E.g. on AWS RDS, the max connections value is calculated from the instance type, so we could specify a default value that should work for any user who did not set a custom limit on their RDS.
  • or these thresholds were chosen from your environment, with your requirements, needs and criteria, so they have no meaning or legitimacy for others and should not be set as defaults in a generic module.

The modules should be as plug and play as possible, meaning a user can deploy detectors without customizing anything and they work; users are used to this.

If you keep default values for thresholds which must always be customized, this is misleading for users and a trap, because they will not notice the problem until the moment they fail to get an alert they expected.

If you do not set default values, these variables will automatically be added to the module usage example in the readme to inform users that they are mandatory. And yes, if they forget them, they will get an explicit and understandable error forcing them to set the values, so there is no bad surprise later from having deployed detectors which will never work in their environment (in my opinion, that is more unfriendly).

    comparator: '>='
    dependency: minor
  minor:
    description: is too high
    threshold: 30000
    comparator: '>='

21 changes: 21 additions & 0 deletions modules/smart-agent_logstash/conf/02-events_in_low.yaml
@@ -0,0 +1,21 @@
module: logstash
name: events in low

transformation: ".min(over='10m')"
aggregation: true

signals:
  signal:
    metric: node.stats.events.events.in
    rollup: delta
rules:
  warning:
    description: is low
    threshold: 100
    comparator: '<='
    dependency: minor
  minor:
    description: is too low
    threshold: 0
    comparator: '<='

21 changes: 21 additions & 0 deletions modules/smart-agent_logstash/conf/03-events_out_high.yaml
@@ -0,0 +1,21 @@
module: logstash
name: events out high

transformation: ".min(over='10m')"
aggregation: true

signals:
  signal:
    metric: node.stats.events.events.out
    rollup: delta
rules:
  warning:
    description: is high
    threshold: 25000
    comparator: '>='
    dependency: minor
  minor:
    description: is too high
    threshold: 30000
    comparator: '>='

21 changes: 21 additions & 0 deletions modules/smart-agent_logstash/conf/04-events_out_low.yaml
@@ -0,0 +1,21 @@
module: logstash
name: events out low

transformation: ".min(over='10m')"
aggregation: true

signals:
  signal:
    metric: node.stats.events.events.out
    rollup: delta
rules:
  warning:
    description: is low
    threshold: 100
    comparator: '<='
    dependency: minor
  minor:
    description: is too low
    threshold: 0
    comparator: '<='

19 changes: 19 additions & 0 deletions modules/smart-agent_logstash/conf/05-cpu_percent.yaml
@@ -0,0 +1,19 @@
module: logstash
name: cpu percent

transformation: ".min(over='10m')"
aggregation: true

signals:
  signal:
    metric: node.stats.process.process.cpu.percent
rules:
  warning:
    description: is high
    threshold: 90
    comparator: '>='
  minor:
    description: is too high
    threshold: 100
    comparator: '>='

21 changes: 21 additions & 0 deletions modules/smart-agent_logstash/conf/06-queued_events.yaml
@@ -0,0 +1,21 @@
module: logstash
name: queued events

transformation: ".min(over='10m')"
aggregation: true

signals:
  signal:
    metric: node.stats.pipelines.queue.events_count
    rollup: latest
rules:
  warning:
    description: is high
    threshold: 1000000
    comparator: '>='
    dependency: minor
  minor:
    description: is too high
    threshold: 2000000
    comparator: '>='

23 changes: 23 additions & 0 deletions modules/smart-agent_logstash/conf/07-queued_disk.yaml
@@ -0,0 +1,23 @@
module: logstash
name: queued disk

transformation: ".min(over='10m')"
aggregation: true

signals:
  disk:
    metric: node.stats.pipelines.queue.queue_size_in_bytes
    rollup: latest
  signal:
    formula: (disk / 1000000)
rules:
  warning:
    description: is high
    threshold: 8000
    comparator: '>='
    dependency: minor
  minor:
    description: is too high
    threshold: 10000
    comparator: '>='

23 changes: 23 additions & 0 deletions modules/smart-agent_logstash/conf/readme.yaml
@@ -0,0 +1,23 @@
documentations:
  - name: Smart Agent monitor
    url: 'https://docs.signalfx.com/en/latest/integrations/agent/monitors/logstash.html'

source_doc: |
  Check the [integration
  documentation](https://docs.signalfx.com/en/latest/integrations/agent/monitors/logstash.html)
  in addition to the documentation of the monitor it uses.

  ### Monitors

  You have to enable the following `extraMetrics` in your monitor configuration:

  * `node.stats.pipelines.queue.queue_size_in_bytes`

  ### Examples

  ```yaml
  - type: logstash
    extraMetrics:
      - node.stats.pipelines.queue.queue_size_in_bytes
  ```
