Full stack of tools for monitoring.
This stack is composed of:
- Netdata: Real Time Monitoring
  - Visualise real-time system status
- Prometheus: Database
  - Store collected metrics
- Docker
  - Base container solution
- cAdvisor: Container metrics exporter
  - Expose metrics of your running containers
- Grafana: Analytics platform
  - Lets you query and understand collected metrics
- Node_Exporter: OS & Hardware metrics exporter
  - Expose OS & hardware metrics
- AlertManager
  - Handle alerts sent by Prometheus
- Slack
  - Easy integration with your teammates
- BlackBox Exporter: Endpoint Probes
  - Probing of multiple endpoints over HTTP, HTTPS, DNS, TCP and ICMP
To follow the steps below you need:
- Docker in Swarm mode
You can simply run the following to start your standalone swarm cluster:
$ docker swarm init --advertise-addr YOUR_HOST_IP_HERE
Swarm initialized: current node (dxn1zf6l61qsb1josjja83ngz) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c \
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Do not forget to replace "YOUR_HOST_IP_HERE" ;)
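Before continuing, it can help to confirm that the node really is in Swarm mode. A minimal sketch, assuming the `docker` CLI is on the PATH; the helper names `swarm_state` and `swarm_is_active` are made up for this example:

```shell
#!/bin/sh
# Print the local Swarm state as reported by the Docker daemon
# ("active" on a node that is part of a swarm).
swarm_state() {
    docker info --format '{{.Swarm.LocalNodeState}}'
}

# Succeed only when this node is part of a swarm.
swarm_is_active() {
    [ "$(swarm_state)" = "active" ]
}

# Usage:
#   swarm_is_active && echo "Swarm mode is enabled"
```

If the check fails, re-run `docker swarm init` as shown above.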
# git clone [email protected]:jeskz0rd/monitoring.git
cd monitoring
Since version 1.1 you can run Netdata as a container deployed with the stack in global mode, exposing metrics from every node in your cluster straight to Prometheus.
As described by titpetric in additional-notes:
You will not get detailed application metrics (mysql, ups, etc.) from other containers or from the
host if running netdata in a container.
It may be possible to get some of those metrics, but it might not be easy, and most likely not worth it.
For most detailed metrics, netdata needs to share the same environment as the application server
it monitors. This means it would need to run either in the same container (not even remotely practical),
or in the same virtual machine (no containers).
If you do not intend to collect this kind of metric through Netdata, the containerized deployment meets the requirements and automates the rollout across your cluster. Otherwise, install Netdata directly on the host by following the steps below.
# bash <(curl -Ss https://my-netdata.io/kickstart.sh) all
# echo 1 >/sys/kernel/mm/ksm/run
# echo 1000 >/sys/kernel/mm/ksm/sleep_millisecs
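The two `echo` commands above turn on KSM (kernel same-page merging) and set how long the merging daemon sleeps between scans. Values written to sysfs do not survive a reboot, so a quick way to read them back is useful. A small sketch; the helper name `ksm_status` and its optional directory argument are assumptions made for illustration:

```shell
#!/bin/sh
# Print the current KSM settings.
# $1: directory holding the KSM control files; defaults to the
#     standard sysfs location. The argument exists only so the
#     helper can be pointed at a test directory.
ksm_status() {
    ksm_dir="${1:-/sys/kernel/mm/ksm}"
    echo "run=$(cat "$ksm_dir/run")"
    echo "sleep_millisecs=$(cat "$ksm_dir/sleep_millisecs")"
}

# Usage:
#   ksm_status
```

After the commands above, `run` should read 1 and `sleep_millisecs` should read 1000.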
# vim /conf/prometheus/prometheus.yml
  - job_name: 'netdata'
    metrics_path: '/api/v1/allmetrics'
    params:
      format: [prometheus]
    honor_labels: true
    scrape_interval: 20s
    static_configs:
      - targets: ['YOUR_NETDATA_IP:19999']
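To check the endpoint by hand before Prometheus scrapes it, the same path and `format` parameter can be requested with `curl`. A sketch only; `netdata_metrics_url` is a hypothetical helper, and `YOUR_NETDATA_IP` remains a placeholder:

```shell
#!/bin/sh
# Build the URL Prometheus will scrape, from the metrics_path and
# format parameter used in the job above.
# $1: Netdata host or IP, $2: port (defaults to 19999)
netdata_metrics_url() {
    echo "http://$1:${2:-19999}/api/v1/allmetrics?format=prometheus"
}

# Usage (replace YOUR_NETDATA_IP first):
#   curl -s "$(netdata_metrics_url YOUR_NETDATA_IP)" | head
```

If the request returns plain-text metric lines, the scrape job should be able to read them as well.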
Add the "prom" label to the Prometheus Swarm node.
docker node update YOUR_PROMETHEUS_SWARM_NODE --label-add "prom=true"
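The label is presumably matched by a placement constraint in docker-compose.yml (an assumption, since the compose file is not shown here). To confirm it was applied, the node can be inspected. `node_prom_label` is a name invented for this sketch:

```shell
#!/bin/sh
# Print the value of the "prom" label on a Swarm node.
# $1: node name or ID
node_prom_label() {
    docker node inspect "$1" --format '{{ index .Spec.Labels "prom" }}'
}

# Usage:
#   node_prom_label YOUR_PROMETHEUS_SWARM_NODE
```

After the `docker node update` command above, this should print `true`.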
From Grafana 5.1 onwards the default user ID is 472. As described in the Grafana documentation, the steps below are needed for it to run properly.
# docker container run --rm --user root --name grafana_temp -it -v ~/monitoring/volumes/grafana/data:/var/lib/grafana --entrypoint bash grafana/grafana:5.1.3
Still inside the container you just started, run the following:
$ chown -R root:root /etc/grafana && \
chmod -R a+r /etc/grafana && \
chown -R grafana:grafana /var/lib/grafana && \
chown -R grafana:grafana /usr/share/grafana
Changing the permissions takes a while...
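Once the container exits, the result can be checked from the host, because the data directory is a bind mount. A minimal sketch assuming GNU `stat` (as on most Linux distributions); `owner_uid` is a hypothetical helper:

```shell
#!/bin/sh
# Print the numeric owner (UID) of a path.
# Uses GNU stat syntax.
owner_uid() {
    stat -c '%u' "$1"
}

# Usage:
#   owner_uid ~/monitoring/volumes/grafana/data
```

With the default Grafana user mentioned above, the expected value is 472.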
Configure Prometheus to probe HTTP endpoints.
Add your HTTP targets to prometheus.yml:
####### BLACKBOX MONITORING ########
  - job_name: 'blackbox'
    params:
      module:
        - http_2xx
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /probe
    scheme: http
    static_configs:
      - targets:
          - http://example.com
          - http://www.example.com
          - http://your.web.app
          - http://your.web.app/check
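A target can also be probed by hand through the exporter's `/probe` endpoint, which is a quick way to see whether the `http_2xx` module accepts it. A sketch under assumptions: the helper names are invented, the exporter is taken to listen on port 9115, and the target is passed without URL encoding, so it should not contain `&` or spaces:

```shell
#!/bin/sh
# Build a Blackbox Exporter probe URL.
# $1: exporter host, $2: target to probe, $3: module (default http_2xx)
probe_url() {
    echo "http://$1:9115/probe?target=$2&module=${3:-http_2xx}"
}

# Read probe output on stdin; succeed if the probe reported success.
probe_succeeded() {
    grep -q '^probe_success 1$'
}

# Usage (YOUR_HOST_IP is a placeholder):
#   curl -s "$(probe_url YOUR_HOST_IP http://example.com)" | probe_succeeded && echo OK
```

A `probe_success 0` line in the output means the target was reachable by the exporter's rules or not at all; the other `probe_*` metrics in the same response usually show why.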
# docker stack deploy -c docker-compose.yml monitoring
Check deployed services:
# docker service ls
ypjvzrdzs760 monitoring_alertmanager replicated 1/1 jesk/alertmanager_alpine:1.0 *:9093->9093/tcp
x54ardi5blgn monitoring_blackbox-exporter global 1/1 prom/blackbox-exporter:v0.12.0 *:9115->9115/tcp
nnpqv7k297g4 monitoring_cadvisor global 1/1 google/cadvisor:v0.30.0 *:8080->8080/tcp
gpn2qklfmra6 monitoring_grafana replicated 1/1 grafana/grafana:5.1.3 *:3000->3000/tcp
7xgth29zggfb monitoring_netdata global 1/1 firehol/netdata:alpine *:19999->19999/tcp
31q4t856ciua monitoring_node-exporter global 1/1 jesk/node-exporter_alpine:1.0 *:9100->9100/tcp
z2jd4eprumd8 monitoring_prometheus replicated 1/1 jesk/prometheus_alpine:1.0 *:9090->9090/tcp
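With several services it is easy to miss one that has not reached its desired replica count. The sketch below filters the listing for such services; `unhealthy_services` is a made-up helper that only parses text, so it can be tried on any saved output:

```shell
#!/bin/sh
# Read "NAME CURRENT/DESIRED" lines on stdin and print the names of
# services whose running replica count differs from the desired one.
unhealthy_services() {
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | unhealthy_services
```

No output means every service reports as many running replicas as it wants.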
Accessing Prometheus interface on browser:
Accessing AlertManager interface on browser:
Accessing Grafana interface on browser:
user: admin
passwd: admin
Accessing Netdata interface on browser:
Accessing Node_exporter metrics on browser:
Accessing Blackbox_exporter on browser:
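The addresses can be derived from the published ports in the `docker service ls` output above. The following sketch prints them for a given host; `stack_urls` is a hypothetical helper, and the host address is an assumption you supply yourself:

```shell
#!/bin/sh
# Print one "service URL" line per web interface, using the ports
# published in the `docker service ls` output.
# $1: address of a Swarm node (placeholder, not given in the text)
stack_urls() {
    for entry in prometheus:9090 alertmanager:9093 grafana:3000 \
                 netdata:19999 node-exporter:9100 blackbox-exporter:9115
    do
        echo "${entry%%:*} http://$1:${entry##*:}"
    done
}

# Usage:
#   stack_urls YOUR_HOST_IP_HERE
```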
Create a Slack channel and add your Slack API information to the AlertManager configuration:
# vim /conf/alertmanager/config.yml
route:
  receiver: 'slack'
receivers:
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        username: 'YOUR USERNAME'
        channel: '#YOURCHANNEL'
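Before relying on AlertManager to deliver alerts, the Slack side can be tested on its own by posting a message to an incoming webhook. A sketch under assumptions: the webhook URL comes from your own Slack workspace, the helper names are invented, and the message text must not contain double quotes or backslashes because no JSON escaping is done:

```shell
#!/bin/sh
# Build a minimal Slack message payload.
# Assumes the text contains no double quotes or backslashes.
slack_payload() {
    printf '{"text":"%s"}\n' "$1"
}

# Post a test message to a Slack incoming webhook.
# $1: webhook URL (placeholder), $2: message text
slack_test() {
    curl -s -X POST -H 'Content-type: application/json' \
         --data "$(slack_payload "$2")" "$1"
}

# Usage:
#   slack_test YOUR_SLACK_WEBHOOK_URL "AlertManager test message"
```

If the message shows up in the channel, any remaining delivery problem is more likely in the AlertManager configuration than in Slack.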