Horizon Process Monitoring
Dave Wong edited this page Dec 16, 2020 · 2 revisions
- SSH into the worker you want to set up monitoring for
- Create this configuration file: /etc/datadog-agent/conf.d/process.d/conf.yaml
- You need sudo to create that configuration
## All options defined here are available to all instances.
#
init_config:
## @param pid_cache_duration - integer - optional - default: 120
## Changes the check refresh rate of the matching pid list every X seconds except if it
## detects a change before. You might want to set it low if you want to
## alert on process service checks.
#
# pid_cache_duration: 120
## @param access_denied_cache_duration - integer - optional - default: 120
## The check maintains a list of PIDs for which it got access denied. It won't try to look at them again for the
## duration in seconds specified by access_denied_cache_duration.
#
# access_denied_cache_duration: 120
## @param shared_process_list_cache_duration - integer - optional - default: 120
## The check maintains a list of running processes shared among all instances, that is used to generate the
## matching pid list on each instance. It won't try to look at them again for the duration in seconds
## specified by shared_process_list_cache_duration.
#
# shared_process_list_cache_duration: 120
## @param procfs_path - string - optional
## Used to override the default procfs path, e.g. for docker containers with the outside fs mounted at /host/proc
## DEPRECATED: please specify `procfs_path` globally in `datadog.conf` instead
#
# procfs_path: /proc
## @param service - string - optional
## Attach the tag `service:<SERVICE>` to every metric, event, and service check emitted by this integration.
##
## Additionally, this sets the default `service` for every log source.
#
# service: <SERVICE>
## Every instance is scheduled independent of the others.
#
instances:
    ## @param name - string - required
    ## Used to uniquely identify your metrics as they are tagged with this name in Datadog.
    #
  - name: Horizon
    ## @param search_string - list of strings - optional
    ## If one of the elements in the list matches, it returns the count of
    ## all the processes that match the string exactly by default. Change this behavior with the
    ## parameter `exact_match: false`.
    ##
    ## Note: One and only one of search_string, pid or pid_file must be specified per instance.
    #
    search_string:
      - horizon
    ## @param exact_match - boolean - optional - default: true
    ## Matches your search_string on proc.name().
    ## If you want to match on a substring within proc.cmdline(), set this to false
    ## Regex is also supported when this flag is set to `false`.
    ##
    ## Note: agent v6.11+ on windows runs as an unprivileged `ddagentuser` that does not have access to the full
    ## command line of processes running under a different user. This option cannot be used in such cases.
    ## https://docs.datadoghq.com/integrations/process/#configuration
    #
    exact_match: false
    ## @param thresholds - mapping - optional
    ## The threshold parameter is composed of two ranges: critical and warning
    ## * warning: (optional) List of two values: If the number of processes found is below the first value or
    ## above the second one, the process check returns WARNING. To make a semi-unbounded interval,
    ## use `.inf` for the upper bound.
    ## * critical: (optional) List of two values: If the number of processes found is below the first value or
    ## above the second one, the process check returns CRITICAL. To make a semi-unbounded interval,
    ## use `.inf` for the upper bound.
    #
    # thresholds:
    #   warning:
    #     - <BELOW_VALUE>
    #     - <TOP_VALUE>
    #   critical:
    #     - <BELOW_VALUE>
    #     - <TOP_VALUE>
    ## @param collect_children - boolean - optional - default: false
    ## If true, the check also collects metrics from all child processes of a matched process.
    ## Please be aware that the collection is recursive, and might take some time depending on the use case.
    #
    # collect_children: false
    ## @param tags - list of strings - optional
    ## A list of tags to attach to every metric and service check emitted by this instance.
    ##
    ## Learn more about tagging at https://docs.datadoghq.com/tagging
    #
    # tags:
    #   - <KEY_1>:<VALUE_1>
    #   - <KEY_2>:<VALUE_2>
    ## @param service - string - optional
    ## Attach the tag `service:<SERVICE>` to every metric, event, and service check emitted by this integration.
    ##
    ## Overrides any `service` defined in the `init_config` section.
    #
    # service: <SERVICE>
    ## @param min_collection_interval - number - optional - default: 15
    ## This changes the collection interval of the check. For more information, see:
    ## https://docs.datadoghq.com/developers/write_agent_check/#collection-interval
    #
    min_collection_interval: 60
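Stripped of its comments, the configuration above sets only four instance options. As a rough sketch, the same file could be written by a small shell function; `write_process_conf` is a hypothetical helper name (not part of the Datadog Agent), and the empty `init_config:` assumes the defaults described in the comments are acceptable.

```shell
#!/bin/sh
# Sketch: write the minimal process check config into a target directory.
# write_process_conf is a hypothetical helper name, not a Datadog command.
write_process_conf() {
  conf_dir="$1"
  mkdir -p "$conf_dir" || return 1
  # The quoted heredoc delimiter keeps the YAML literal (no shell expansion).
  cat > "$conf_dir/conf.yaml" <<'EOF'
init_config:

instances:
  - name: Horizon
    search_string:
      - horizon
    exact_match: false
    min_collection_interval: 60
EOF
}
```

On a worker the target directory would be /etc/datadog-agent/conf.d/process.d, so the function has to run as root, for example from a script started with sudo.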
- Restart the Datadog agent
sudo service datadog-agent restart
- On Datadog, go to the Soapbox API - DO dashboard
- Copy and paste one of the soapbox-worker-1XX Horizon widgets
- Edit the new widget (click the pencil)
- In the "reported by" dropdown, look for the new worker
- Sometimes it takes a few minutes for the new monitor to register on Datadog
- If you still don't see it after a few minutes, check the logs on the server for errors:
tail -f /var/log/datadog/agent.log
- To force the agent to rerun all the checks, restart it and then tail the logs
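When the log is busy, filtering for problem lines can be quicker than watching the raw tail. The sketch below assumes the agent writes lines in a `date | component | LEVEL | message` layout; `recent_agent_problems` is a hypothetical helper name, and the 200-line default is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: show ERROR/WARN lines from the tail of an agent log.
# recent_agent_problems is a hypothetical helper name, not a Datadog command.
recent_agent_problems() {
  log_file="${1:-/var/log/datadog/agent.log}"
  line_count="${2:-200}"
  # Assumes the "date | component | LEVEL | message" log line layout.
  tail -n "$line_count" "$log_file" | grep -E '\| (ERROR|WARN) \|'
}
```

Calling `recent_agent_problems` with no arguments reads the log path used above. The agent's own `sudo datadog-agent status` and `sudo -u dd-agent -- datadog-agent check process` commands may also help confirm that the process check loaded and what it reports.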