Tagging convention
Tagging is a practice that brings context and detailed information to further describe the resource to which a tag is attached. The name (tags, dimensions, metadata, labels, ...) changes depending on the tool, but the goal always remains the same.
These tags are very common in the cloud-native (very dynamic) world and they are useful for many things:
- identify, list, explore and discover resources
- compliance checks
- bulk automation
- define billing targets
- set the scope of responsibility (which team, which on-call level, ...)
- ...
Some of these tags are used so often that they become common to all of our projects. At that point they form a convention on which we can base rule sets for different purposes.
In SignalFx these are called metadata, and they are often automatically synced from the source of metrics (like AWS tags, GCP labels, ...).
In monitoring, these tags are especially useful to determine the scope of each alert. This is even more true for next-gen monitoring tools like SignalFx, which monitor dynamic and high-cardinality resources (like containers).
SignalFx provides rich filtering capabilities based on these tags (metadata) to make it possible to apply a detector to a specific set of resources and customize its behavior depending on their constraints.
This project provides pre-configured detectors which are filtered by default following a tagging convention suited to the monitored metrics and resources. This default filtering policy is highly opinionated and you can:
- either add the required tags to your resources (AWS, GCP, Azure, Smart Agent, ...) to match it
- or override this default filtering with a specific one more suited to your existing tagging convention
The `env` tag uses the value of the environment variable to filter the time series of a detector's signals down to one specific environment.
The key used can depend on the source and its metadata, but it should end with `env`.
The `sfx_monitored` tag limits the resources to which detectors apply, so you can see this tag as an "alerting" filter.
Indeed, even in a production environment we sometimes want to collect metrics from resources without alerting on them.
We use this tag by convention in the filtering policy of the detector modules to apply a sort of "whitelisting" policy to resources for alerting by default.
This is not mandatory: you can set a custom filtering when you import detector modules, which helps because sometimes we cannot change or update tags on the infrastructure side (like AWS EC2 instances).
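As a rough illustration of this default policy, the sketch below builds a SignalFlow filter expression from an environment name. The function name `default_filter` and its parameters are hypothetical and not taken from the project. Only the `env` and `sfx_monitored` keys come from the convention; the `prefix` argument is an assumption to cover sources that prefix their keys, and `custom` stands for a user-supplied filtering that replaces the default.

```python
def default_filter(environment, prefix="", custom=None):
    """Return a SignalFlow filter expression following the tagging convention.

    Hypothetical helper: `prefix` covers sources that prefix their keys
    (for example "aws_tag_"), `custom` replaces the default policy entirely.
    """
    if custom is not None:
        # An explicit filter overrides the opinionated default.
        return custom
    env_key = f"{prefix}env"
    monitored_key = f"{prefix}sfx_monitored"
    return (
        f"filter('{env_key}', '{environment}') "
        f"and filter('{monitored_key}', 'true')"
    )


print(default_filter("production"))
print(default_filter("production", prefix="aws_tag_"))
print(default_filter("production", custom="filter('team', 'data')"))
```

The last call shows the override case: when tags cannot be changed on the infrastructure side, a custom expression is used as-is and the convention keys are ignored.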
Beyond this main purpose, this tag can be used to implement other rules depending on your needs, such as:
- to automatically add resources to monitoring without changing the monitoring configuration. You only have to add this tag to your infrastructure resource when it is ready. This logic can be applied to other optional tags for a richer tagging convention, like the on-call level (BH/NBH), the responsible team, ... Each case will have its detector with the corresponding filters. Put the right tags on your resource to have the right detectors apply to it.
- to filter metrics collection for cloud VMs and prevent overbilling (see this module as an example).
- to list all monitored resources from SignalFx (or from the original sources like the AWS account).
- to easily add or remove a resource from the alerting scope (similar to a muting rule). For example, the CD could automatically change the value to `false` at the start of the deployment and restore it to `true` at the end (and as a fallback in case of error). This avoids alerts caused by deployments known to cause short downtime.
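The deployment case from the last item can be sketched as a context manager. This is only an illustration under assumptions: `alerting_paused` is a hypothetical name, and `set_tag(resource_id, key, value)` is a hypothetical callable standing in for whatever API updates tags on your infrastructure.

```python
from contextlib import contextmanager


@contextmanager
def alerting_paused(set_tag, resource_id, key="sfx_monitored"):
    """Set the monitored flag to "false" during a deployment, then restore it.

    `set_tag(resource_id, key, value)` is a hypothetical callable, not a real
    SignalFx or cloud provider API.
    """
    set_tag(resource_id, key, "false")
    try:
        yield
    finally:
        # Runs on success and on error, which gives the fallback behaviour.
        set_tag(resource_id, key, "true")


# Usage with an in-memory stand-in for a real tagging API.
tags = {}


def fake_set_tag(resource_id, key, value):
    tags[(resource_id, key)] = value


with alerting_paused(fake_set_tag, "my-instance"):
    pass  # run the deployment steps here

print(tags[("my-instance", "sfx_monitored")])
```

Using `finally` means the flag is restored even when the deployment raises, so a failed deployment does not leave the resource silently out of the alerting scope.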
- identify and separate environments: even if we monitor the same things in every environment, we often want a different configuration for each one, such as notification recipients.
- define and delimit the scope of alerting on specific resources (i.e. apply detectors only to a set of resources in the same account, for multi-party or multi-team usage).
- both `env` and `sfx_monitored` represent the general policy, but the default filtering can change per module for multiple reasons. For example, labels of GCP managed services are not synced as dimensions into SignalFx, which makes it impossible to use `gcp_label_env` as we did for `aws_tag_env`, so we use the `project_id` dimension instead.
Dimensions can be freely defined in the agent, except for some very specific ones like `host`.
So the keys are simply `env` for the environment name and `sfx_monitored` for the monitored flag.
It is recommended to configure these dimensions on the agent at the global level:
globalDimensions:
  sfx_monitored: true
  env: myEnv
Tags from cloud providers are also synced as dimensions in SignalFx, but a prefix is added, which changes their key. Also, some cloud providers have known limitations and use a default filtering different from this targeted tagging convention.
SignalFx adds the prefix `aws_tag` to any tag synced from the AWS integration.
So the keys are `aws_tag_env` for the environment name and `aws_tag_sfx_monitored` for the monitored flag.
A few services are known not to sync AWS tags as SignalFx dimensions. In this case, their corresponding module uses `aws_account_id` instead of the usual convention.
SignalFx adds the prefix `azure_tag` to any tag synced from the Azure integration.
So the keys are `azure_tag_env` for the environment name and `azure_tag_sfx_monitored` for the monitored flag.
SignalFx adds the prefix `gcp_label` to any label synced from the GCP integration.
So the keys should be `gcp_label_env` for the environment name and `gcp_label_sfx_monitored` for the monitored flag. Sadly, SignalFx supports label syncing for only 2 services: Spanner and Storage buckets.
For GCE instances we have a workaround using GCP metadata instead of labels, so we can use `gcp_metadata_sfx_env` and `gcp_metadata_sfx_monitored`: https://github.com/claranet/terraform-signalfx-detectors/blob/master/modules/integration_gcp-compute-engine/modules.tf#L4.
But it is not possible to follow the tagging convention and its usual filtering policy for other services as long as SignalFx does not sync their labels. This is why we only use the `project_id` dimension as default filtering for now.
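To recap the keys per source, here is a small lookup sketch. The dimension keys are the ones listed on this page; the source names used as dictionary keys (`agent`, `aws`, `azure`, `gcp_label`, `gce`) and the function `convention_keys` are hypothetical labels chosen for the example.

```python
# (environment key, monitored flag key) per metrics source.
# Source names are hypothetical labels; the dimension keys come from this page.
CONVENTION_KEYS = {
    "agent": ("env", "sfx_monitored"),
    "aws": ("aws_tag_env", "aws_tag_sfx_monitored"),
    "azure": ("azure_tag_env", "azure_tag_sfx_monitored"),
    # Only available for services whose labels are synced.
    "gcp_label": ("gcp_label_env", "gcp_label_sfx_monitored"),
    # Workaround for GCE instances based on GCP metadata.
    "gce": ("gcp_metadata_sfx_env", "gcp_metadata_sfx_monitored"),
}

# Dimension used for default filtering when tags or labels are not synced.
FALLBACK_DIMENSION = {
    "aws": "aws_account_id",
    "gcp": "project_id",
}


def convention_keys(source):
    """Return (environment key, monitored flag key) for a source name."""
    try:
        return CONVENTION_KEYS[source]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None


for source in CONVENTION_KEYS:
    env_key, monitored_key = convention_keys(source)
    print(f"{source}: {env_key}, {monitored_key}")
```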