Merge pull request #349 from stackhpc/monitoring-docs
alertmanager example and docs
markgoddard authored Jan 31, 2023
2 parents 751045a + 452f6af commit dc8d331
Showing 2 changed files with 99 additions and 3 deletions.
55 changes: 52 additions & 3 deletions doc/source/configuration/monitoring.rst
@@ -7,9 +7,20 @@ Monitoring Configuration

StackHPC kayobe config includes a reference monitoring and alerting stack based
on Prometheus, Alertmanager, Grafana, Fluentd, Elasticsearch & Kibana. These
services by default come enabled and configured.

Monitoring hosts, usually the controllers, should be added to the monitoring
group. The group definition can be applied in several places. For example,
this configuration could be added to ``etc/kayobe/inventory/groups``:

.. code-block:: ini

   [monitoring:children]
   controllers

The central Elasticsearch cluster collects OpenStack logs, with an option to
receive operating system logs too. To enable this, run the custom playbook
after deployment:

.. code-block:: console
@@ -78,3 +89,41 @@ on the overcloud hosts:
SMART reporting should now be enabled along with a Prometheus alert for
unhealthy disks and a Grafana dashboard called ``Hardware Overview``.

Alertmanager and Slack
======================

StackHPC Kayobe configuration comes bundled with an array of alerts but does not
enable any receivers for notifications by default. Various receivers can be
configured for Alertmanager. Slack is currently the most common.

To set up a receiver, create a ``prometheus-alertmanager.yml`` file under
``etc/kayobe/kolla/config/prometheus/``. An example config is stored in this
directory. The example configuration uses two Slack channels. One channel
receives all alerts while the other only receives alerts tagged as critical. It
also adds a silence button to temporarily mute alerts. To use the example in a
deployment, you will need to generate two webhook URLs, one for each channel.

To generate a Slack webhook, `create a new app
<https://api.slack.com/apps/new>`__ in the workspace you want to add alerts to.
From the Features page, toggle **Activate incoming webhooks** on. Click **Add
new webhook to workspace**. Pick a channel that the app will post to, then
click **Authorise**. You only need one app to generate both webhooks.

Both URLs should be encrypted using Ansible Vault, as they allow anyone who
holds them to post to your Slack channels. The standard practice is to store
them in ``kayobe/secrets.yml`` as:

.. code-block:: yaml

   secrets_slack_notification_channel_url: <some_webhook_url>
   secrets_slack_critical_notification_channel_url: <some_other_webhook_url>

These should then be set as the ``slack_api_url`` and ``api_url`` for the
regular and critical alerts channels respectively. Both Slack channel names
need to be set, and the proxy URL should either be set or removed.
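
As a rough illustration, the relevant keys in ``prometheus-alertmanager.yml``
might end up looking like the sketch below. It assumes the file is rendered as
a Jinja template so that the secret variables can be referenced directly; that
templating is an assumption here, not something confirmed by this document.
The channel names are the placeholders from the example configuration, and all
unrelated keys are omitted.

.. code-block:: yaml

   # Sketch only: assumes Jinja templating of this file with the secrets
   # variables in scope.
   global:
     # Webhook for the channel that receives all alerts
     slack_api_url: "{{ secrets_slack_notification_channel_url }}"

   receivers:
     - name: 'slack-notifications'
       slack_configs:
         - channel: '#notifications'
     - name: 'slack-critical-notifications'
       slack_configs:
         - channel: '#notifications-critical'
           # Per-receiver webhook overrides the global slack_api_url
           api_url: "{{ secrets_slack_critical_notification_channel_url }}"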

If you want to add an alerting rule, many good examples are available
`here <https://awesome-prometheus-alerts.grep.to/>`__. They simply need to be
added to one of the ``*.rules`` files in the Prometheus configuration
directory.
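
For orientation, a rule in the standard Prometheus rule file format might look
like the following minimal sketch. The group name, alert name and threshold
are hypothetical and chosen only for illustration. The ``severity: critical``
label would match the ``severity=~"critical|alert"`` matcher in the example
Alertmanager configuration, so such an alert would be routed to the critical
channel.

.. code-block:: yaml

   # Hypothetical example rule; names and timings are illustrative only.
   groups:
     - name: example-custom-alerts
       rules:
         - alert: ExampleInstanceDown
           # "up" is 0 when Prometheus fails to scrape a target
           expr: up == 0
           for: 5m
           labels:
             severity: critical
           annotations:
             summary: "A scrape target has been unreachable for 5 minutes"

The annotation above deliberately avoids Prometheus ``{{ ... }}`` templating,
since such expressions may need escaping if the rules file is itself rendered
as a template.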
@@ -0,0 +1,47 @@
---
global:
  resolve_timeout: 5m
  smtp_require_tls: true
  slack_api_url: 'https://hooks.slack.com/services/example/alerts/webhook'

route:
  receiver: 'slack-notifications'
  group_by: [alertname]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

  routes:
    - matchers:
        - severity=~"critical|alert"
      receiver: 'slack-critical-notifications'


receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#notifications'
        actions:
          - type: button
            text: 'Silence 🔕'
            url: {{ '{% raw %}' }}{% raw %} '{{ template "__alert_silence_link" . }}'
            {% endraw %}{{ '{% endraw %}' }}
        send_resolved: true
        http_config:
          proxy_url: http://1.2.3.4:3128
  - name: 'slack-critical-notifications'
    slack_configs:
      - channel: '#notifications-critical'
        actions:
          - type: button
            text: 'Silence 🔕'
            url: {{ '{% raw %}' }}{% raw %} '{{ template "__alert_silence_link" . }}'
            {% endraw %}{{ '{% endraw %}' }}
        send_resolved: true
        http_config:
          proxy_url: http://1.2.3.4:3128
        api_url: 'https://hooks.slack.com/services/example/alerts/webhook-critical'


templates:
  - '/etc/prometheus/*.tmpl'
