TIG Stack
Dan Davies edited this page Oct 1, 2025 · 5 revisions
A TIG (Telegraf, InfluxDB, Grafana) stack has been set up to monitor disk usage over time. A public Grafana dashboard (internal to Imperial) shows disk usage, RAM and CPU over time.

Disk usage above 85% should trigger a Slack message in the #alerts channel of the aichemy Slack workspace. The alert is configured directly in Grafana (see below).
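
The threshold logic can be illustrated outside Grafana. The sketch below is not the Grafana alert rule itself; it only shows the kind of comparison such a rule makes. It assumes the alert is driven by a used-percentage figure computed as used / (used + free), which is how Telegraf's disk plugin is understood to derive `used_percent`. The function names are hypothetical.

```python
import shutil

# Threshold taken from this page; everything else here is illustrative.
THRESHOLD_PERCENT = 85.0


def used_percent(used: int, free: int) -> float:
    """Used space as a percentage of used + free (assumed definition)."""
    total = used + free
    if total == 0:
        return 0.0
    return 100.0 * used / total


def should_alert(used: int, free: int, threshold: float = THRESHOLD_PERCENT) -> bool:
    """True when usage is strictly above the threshold."""
    return used_percent(used, free) > threshold


if __name__ == "__main__":
    # shutil.disk_usage reports total, used and free bytes for a path.
    usage = shutil.disk_usage("/")
    pct = used_percent(usage.used, usage.free)
    print(f"/ is {pct:.1f}% used; alert: {should_alert(usage.used, usage.free)}")
```

Running this on the server gives a quick manual cross-check of the figure shown on the dashboard.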

The docker-compose.yaml and other config files are on the server at /tig-stack/.
Example docker-compose.yaml:
version: '3.8'
services:
  influxdb:
    image: influxdb:2.7
    container_name: influxdb
    restart: unless-stopped
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=<redacted>
      - DOCKER_INFLUXDB_INIT_ORG=default-org
      - DOCKER_INFLUXDB_INIT_BUCKET=telegraf
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=<redacted>
  telegraf:
    image: telegraf:latest
    container_name: telegraf
    restart: unless-stopped
    depends_on:
      - influxdb
    volumes:
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /:/hostfs:ro
  grafana:
    image: grafana/grafana-oss
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
volumes:
  influxdb_data:
  grafana_data:
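
Once the containers are up, a quick way to confirm that InfluxDB and Grafana are responding is to call their health endpoints on the ports published above. The following is a minimal sketch, assuming it is run on the server itself (hence `localhost`) and that the standard `/health` (InfluxDB 2.x) and `/api/health` (Grafana) endpoints are reachable. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Ports come from the compose file; "localhost" is an assumption.
ENDPOINTS = {
    "influxdb": "http://localhost:8086/health",
    "grafana": "http://localhost:3000/api/health",
}


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_healthy(service: str, payload: dict) -> bool:
    """Interpret each service's health payload."""
    if service == "influxdb":
        # InfluxDB 2.x reports "status": "pass" when ready.
        return payload.get("status") == "pass"
    if service == "grafana":
        # Grafana reports "database": "ok" when its backing store is reachable.
        return payload.get("database") == "ok"
    raise ValueError(f"unknown service: {service}")


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        try:
            state = "ok" if is_healthy(name, fetch_json(url)) else "unhealthy"
        except (urllib.error.URLError, OSError, ValueError) as exc:
            state = f"unreachable ({exc})"
        print(f"{name}: {state}")
```

Telegraf publishes no port in this setup, so it is not probed here; its container logs are the place to look if no data arrives.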
Example telegraf.conf:
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  omit_hostname = false

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "<redacted>"
  organization = "default-org"
  bucket = "telegraf"

[[inputs.disk]]
  mount_points = ["/"]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
  fielddrop = ["inodes*"]

[[inputs.diskio]]

[[inputs.mem]]

[[inputs.system]]

[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states
  report_active = false
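
To check that the disk metrics are reaching InfluxDB, the bucket can be queried directly over the HTTP API. This page does not show the query Grafana uses, so the Flux below is only a plausible sketch. It assumes the disk plugin's default measurement name (`disk`), its `used_percent` field and its `path` tag. The `localhost` URL and the `INFLUX_TOKEN` environment variable are also assumptions.

```python
import os
import urllib.request
from urllib.parse import urlencode

INFLUX_URL = "http://localhost:8086"  # assumption: run on the server itself
ORG = "default-org"                   # from the config above
BUCKET = "telegraf"                   # from the config above


def build_disk_query(bucket: str = BUCKET, path: str = "/", window: str = "-1h") -> str:
    """Flux query for the latest used_percent value of one mount point."""
    return (
        f'from(bucket: "{bucket}")\n'
        f"  |> range(start: {window})\n"
        '  |> filter(fn: (r) => r._measurement == "disk")\n'
        '  |> filter(fn: (r) => r._field == "used_percent")\n'
        f'  |> filter(fn: (r) => r.path == "{path}")\n'
        "  |> last()\n"
    )


def build_request(token: str, query: str, base_url: str = INFLUX_URL,
                  org: str = ORG) -> urllib.request.Request:
    """Prepare a POST to the InfluxDB v2 query endpoint."""
    params = urlencode({"org": org})
    return urllib.request.Request(
        f"{base_url}/api/v2/query?{params}",
        data=query.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/vnd.flux",
            "Accept": "application/csv",
        },
    )


if __name__ == "__main__":
    query = build_disk_query()
    token = os.environ.get("INFLUX_TOKEN")  # hypothetical variable name
    if token is None:
        print(query)  # no token available: just show the query
    else:
        with urllib.request.urlopen(build_request(token, query), timeout=10) as resp:
            print(resp.read().decode("utf-8"))  # annotated CSV rows
```

An empty result usually means Telegraf is not writing to the bucket, in which case the token, organization and bucket values in telegraf.conf are the first things to compare against the InfluxDB setup.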