TIG Stack

TIG (Telegraf/InfluxDB/Grafana) Stack

A TIG stack has been set up to monitor disk usage over time. A public Grafana dashboard (reachable only from within Imperial) shows disk, RAM and CPU usage over time.

Disk usage above 85% should trigger a Slack message in the #alerts channel of the aichemy Slack workspace. The alert is configured directly within Grafana (see below).
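To cross-check the dashboard by hand, the 85% rule can be sketched in a few lines of Python. This is not the Grafana alert itself, only an illustration of the threshold logic. It assumes the percentage is computed as used / (used + free), which is believed to match how Telegraf's disk input derives its `used_percent` field. The names `used_percent`, `should_alert` and `ALERT_THRESHOLD_PERCENT` are hypothetical.

```python
import shutil

# Threshold quoted on this page; the constant name is an assumption.
ALERT_THRESHOLD_PERCENT = 85.0


def used_percent(used: int, free: int) -> float:
    """Return the percentage of space in use, as used / (used + free)."""
    total = used + free
    if total == 0:
        return 0.0
    return 100.0 * used / total


def should_alert(used: int, free: int,
                 threshold: float = ALERT_THRESHOLD_PERCENT) -> bool:
    """Return True when usage is strictly above the threshold."""
    return used_percent(used, free) > threshold


if __name__ == "__main__":
    # Check the root filesystem of the machine this script runs on.
    usage = shutil.disk_usage("/")
    pct = used_percent(usage.used, usage.free)
    print(f"/ is {pct:.1f}% full; alert: {should_alert(usage.used, usage.free)}")
```

Run on the server, the printed percentage should be close to the value shown in Grafana for the same mount point. A large difference may indicate that Telegraf is measuring a different filesystem than expected.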

The docker-compose.yaml and other config files are on the server at /tig-stack/.

Example docker-compose.yaml:

version: '3.8'

services:
  influxdb:
    image: influxdb:2.7
    container_name: influxdb
    restart: unless-stopped
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=<redacted>
      - DOCKER_INFLUXDB_INIT_ORG=default-org
      - DOCKER_INFLUXDB_INIT_BUCKET=telegraf
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=<redacted>

  telegraf:
    image: telegraf:latest
    container_name: telegraf
    restart: unless-stopped
    depends_on:
      - influxdb
    volumes:
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /:/hostfs:ro

  grafana:
    image: grafana/grafana-oss
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  influxdb_data:
  grafana_data:
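Once the containers are up, a quick way to confirm that InfluxDB is reachable on the published port 8086 is to query its `/health` endpoint, which in InfluxDB 2.x returns a JSON document with a `status` field. The sketch below assumes it is run on the server itself (hence `localhost`). The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the script runs on the host that publishes port 8086.
INFLUXDB_URL = "http://localhost:8086"


def is_healthy(payload: dict) -> bool:
    """Return True if a /health response body reports status "pass"."""
    return payload.get("status") == "pass"


def fetch_health(base_url: str = INFLUXDB_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON body of the InfluxDB /health endpoint.

    Raises urllib.error.URLError if the service cannot be reached, and
    urllib.error.HTTPError if it answers with an error status.
    """
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)
```

On a working stack, `is_healthy(fetch_health())` would be expected to return `True`. An exception usually means the container is not running or the port is not published.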

Example telegraf.conf:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  omit_hostname = false

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "<redacted>"
  organization = "default-org"
  bucket = "telegraf"

[[inputs.disk]]
  mount_points = ["/"]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
  fielddrop = ["inodes*"]

[[inputs.diskio]]
[[inputs.mem]]
[[inputs.system]]

[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states
  report_active = false
