Convert some RST pages to markdown

satterly · satterly · commit 31fa4299782c · 2023-03-21T00:36:29.000+01:00
diff --git a/README.md b/README.md
@@ -32,6 +32,7 @@ References
   * Alabaster https://github.com/bitprophet/alabaster
   * Pygments syntax highlighter https://pygments.org/languages/
   * MyST Parser https://myst-parser.readthedocs.io/en/latest/intro.html
+  * MyST Parser Sphinx roles test fixture https://raw.githubusercontent.com/executablebooks/MyST-Parser/602470ebdaf81fbea999fcc0f0cf1b8e7784ec15/tests/test_renderers/fixtures/sphinx_roles.md
 
 License
 -------
diff --git a/source/about.md b/source/about.md
@@ -0,0 +1,37 @@
+## About
+
+Alerta started at [The Guardian](https://www.theguardian.com/media/2011/jan/28/wikileaks-julian-assange-alan-rusbridger)
+out of necessity as a replacement for a [legacy monitoring tool](https://www.quest.com/foglight/)
+but only after exhaustively evaluating [all](https://www.solarwinds.com/)
+[credible](https://www.nagios.org/) [alternatives](https://www.zabbix.com/)
+first.
+
+Initially all we wanted was to be able to create alert thresholds against the
+hundreds of thousands of [Ganglia metrics](https://github.com/ganglia/monitor-core/wiki)
+collected for the website and view the alerts in a web console, ie. a Ganglia
+"alerter". As there was no proper name for this
+[metrics and monitoring system](https://www.theguardian.com/info/developer-blog/2012/oct/04/winning-the-metrics-battle),
+the working name of "an alerter" stuck, and a simple homophone was chosen
+to aid future Google searches.
+
+In the end, the thresholding of metrics proved very difficult to scale so we
+eventually split the project in two and metric thresholding was given to Riemann
+(see [riemann-config](https://github.com/guardian/riemann-config)) and the
+alert correlation, de-duplication and visualisation became the "Alerta" project.
+
+Over the years the project has evolved to meet the constantly changing needs of
+the [Guardian developer teams](https://www.theguardian.com/info/2021/oct/29/running-a-post-decade-innovation-retrospective)
+as they moved to a more agile, dynamic, "[swimlaned](http://akfpartners.com/growth-blog/fault-isolative-architectures-or-swimlaning)"
+architecture. For the operations team this has meant a shift from static,
+self-hosted infrastructure to an internal OpenStack cloud and then finally to an
+external cloud service.
+
+In that time certain key features of Alerta have been deprecated as requirements
+changed (eg. the message bus, Ganglia, Riemann) and others added (eg. OAuth2 login,
+CloudWatch, Pingdom, PagerDuty integration). In the process it has been slimmed
+down to fewer core components making it easier to understand, deploy and manage.
+
+As a result, Alerta is now quite different to the somewhat "over-engineered" initial
+solution, but the core aim of being a flexible, easy-to-use tool remains and
+it is now a "cloud-ready" solution adapted to the challenges of a fast-changing
+environment.
diff --git a/source/about.rst b/source/about.rst
diff --git a/source/conventions.md b/source/conventions.md
@@ -0,0 +1,99 @@
+# Conventions
+
+Always favour convention over configuration, and give any configuration
+that is needed sensible defaults.
+
+## Naming Conventions
+
+### Resources
+
+The key alert attribute name of `resource` was specifically chosen
+so as not to be host centric. A resource *can* be a hostname, but it
+might also be an EC2 instance ID, a Docker container ID or some other
+type of non-host unique identifier.
+
+### Environments & Services
+
+The environment attribute is used to [namespace](https://en.wikipedia.org/wiki/Namespace)
+the alert resource. This allows you to have two resources with the same
+name (eg. `web01`) but that are differentiated by their environments
+(eg. `Production` and `Development`).
+
+Choose a set of environments and enforce them, eg. `PROD`, `DEV`
+or `Production`, `Development` but not both. The same applies to services:
+`MobileAPI`, `Mobile-API` and `mobile api` are all valid
+but needlessly different, which makes them impossible to query consistently
+or to generate aggregate metrics for.
+
+Note that the **_service attribute is a list_** because it is common
+for infrastructure (ie. a resource) to be used by more than one service.
+That is, a single component failure could cause an
+outage in multiple services.
+
+### Event Names
+
+It can be useful to define a convention when it comes to naming
+events. Possible options are:
+
+* Camel case - `DiskUtilHigh`
+* Hierarchy - `NW:INTERFACE:DOWN`
+* SNMP - `cpuAlarmHigh`
+
+Querying for all disk utilisation alerts using the `alerta` CLI
+is then relatively straightforward:
+
+    $ alerta query --filter event=~DiskUtil
+
+### Event Groups
+
+Another consideration is to ensure you make use of the event group
+which gives you the ability to group related alerts.
+
+Some suggested event groups with possible events are listed below.
+
+| Event Groups    | Events (examples)                          |
+|-----------------|--------------------------------------------|
+| `Service`       | failures with entire services              |
+| `Application`   | errors from application logs               |
+| `OS`            | disk space, time sync failing              |
+| `Performance`   | system load, swap utilisation high         |
+| `Configuration` | config mgmt tool alerts eg. Puppet or Chef |
+| `Web`           | web server errors                          |
+| `Syslog`        | unix system log messages                   |
+| `Hardware`      | hardware errors                            |
+| `Storage`       | NFS, SAN, NAS storage infrastructure       |
+| `Database`      | database errors, table space utilisation   |
+| `Security`      | security/authorization messages            |
+| `Network`       | network devices and infrastructure         |
+| `Cloud`         | cloud-based services or infrastructure     |
+
+Querying for all performance-related alerts using the `alerta` CLI
+could then become:
+
+    $ alerta query --filter group=Performance
+
+### Severity Levels
+
+Agree on a subset of [severity levels](api/alert.rst#alert-severities) and
+be consistent with what they mean. For example, if severity levels are used
+consistently then integrating with a paging or email system becomes easier.
+
+| Severity   | Service Level                        | Notification                 |
+|------------|--------------------------------------|------------------------------|
+| `critical` | service unavailable                  | immediate page out           |
+| `major`    | service impaired but still available | page during business hours   |
+| `minor`    | component failure                    | email only                   |
+| `warning`  | everything else                      | consolidate into daily email |
+
+## Enforcing Conventions
+
+Once a set of naming conventions has been agreed, it can be enforced by
+using a simple "pre-receive" plugin, similar to a [`git` hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks).
+
+A full working example called [reject][reject] can be found in the plugins
+directory of the project code repository and is installed by default.
+The server configuration settings {envvar}`ORIGIN_BLACKLIST` and
+{envvar}`ALLOWED_ENVIRONMENTS` can be used to tailor it for your
+circumstances or it can be disabled completely.
+
+[reject]: https://github.com/alerta/alerta/blob/master/alerta/plugins/reject.py
diff --git a/source/conventions.rst b/source/conventions.rst
diff --git a/source/index.rst b/source/index.rst
@@ -45,15 +45,14 @@ The ``alerta`` command-line tool can also be used to generate alerts.
 The required API key is ``demo-key``.
 
 .. toctree::
-   :caption: Introduction
+   :caption: Basics
    :maxdepth: 2
    :hidden:
 
    quick-start
-   design
    server
-   webui
    cli
+   webui
    configuration
    authentication
    authorization
@@ -76,16 +75,32 @@ The required API key is ``demo-key``.
    :maxdepth: 2
    :hidden:
 
+   design
    conventions
    development
    Tutorials <tutorials>
    resources
+   faq
+   release-notes
+
+.. toctree::
+   :caption: API
+   :glob:
+   :maxdepth: 2
+   :hidden:
 
    api/reference
    api/query-syntax
    api/alert
    api/heartbeat
 
+.. toctree::
+   :caption: More
+   :maxdepth: 2
+   :hidden:
+
+   about
+
 Contribute
 ----------
 
@@ -102,25 +117,11 @@ Support
 * :ref:`Frequently Asked Questions <faq>`
 * Issue Tracker: https://github.com/alerta/alerta/issues
 
-.. toctree::
-   :caption: More
-   :maxdepth: 1
-   :hidden:
-
-   faq
-
 License
 -------
 
 This project is licensed under the Apache license, Version 2.0 .
 
-.. toctree::
-   :maxdepth: 2
-   :hidden:
-
-   release-notes
-   about
-
 Indices and tables
 ==================