Grafana Agent logo

Grafana Agent is a telemetry collector for sending metrics, logs, and trace data to the opinionated Grafana observability stack. It works best with:

Users of Prometheus operating at a massive scale (i.e., millions of active series) can struggle to run an unsharded singleton Prometheus instance: it becomes a single point of failure and requires a giant machine with a lot of resources allocated to it. Even with proper sharding across multiple Prometheus instances, using Prometheus to send data to a cloud vendor can seem redundant: why pay for cloud storage if data is already stored locally?

The Grafana Agent uses the same code as Prometheus, but tackles these issues by only using the most relevant parts of Prometheus for interaction with hosted metrics:

  1. Service Discovery
  2. Scraping
  3. Write Ahead Log (WAL)
  4. Remote Write
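
As a rough illustration of how these four parts fit together, the following Python sketch wires them into a single loop. It is a toy model, not the Agent's code: `ToyAgent`, its callbacks, and the `Sample` shape are hypothetical names, and an in-memory list stands in for the on-disk WAL.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sample shape for this sketch: (series name, value).
Sample = Tuple[str, float]


@dataclass
class ToyAgent:
    discover: Callable[[], List[str]]        # 1. service discovery: list target addresses
    scrape: Callable[[str], List[Sample]]    # 2. scraping: collect samples from one target
    send: Callable[[List[Sample]], bool]     # 4. remote write: True if the batch was accepted
    # 3. WAL: an in-memory stand-in; a real WAL is persisted to disk.
    wal: List[Sample] = field(default_factory=list)

    def run_once(self) -> None:
        """Run one discover -> scrape -> buffer -> send cycle."""
        for target in self.discover():
            self.wal.extend(self.scrape(target))
        # Samples are dropped from the buffer only after the remote
        # endpoint accepts them, so a failed send is retried next cycle.
        if self.wal and self.send(list(self.wal)):
            self.wal.clear()
```

Nothing in this loop stores data for querying or evaluates rules; samples only pass through the buffer on their way to the remote endpoint.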

On top of these, the Grafana Agent provides simpler sharding mechanisms that let users shard Agents across their cluster and lower the memory requirements per machine.
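
One way to picture per-machine sharding is host filtering, where each Agent keeps only the targets running on its own node. The Python sketch below assumes that approach for illustration; `targets_for_host` and the `node` label name are hypothetical and not taken from the Agent's configuration.

```python
from typing import Dict, List

# A discovered target, represented as a label set (hypothetical shape).
Target = Dict[str, str]


def targets_for_host(
    targets: List[Target], hostname: str, node_label: str = "node"
) -> List[Target]:
    """Keep only the targets whose node label matches this machine."""
    return [t for t in targets if t.get(node_label) == hostname]


# Every Agent sees the same discovered targets but scrapes only its own share.
discovered = [
    {"address": "10.0.0.1:9100", "node": "node-a"},
    {"address": "10.0.0.2:9100", "node": "node-b"},
    {"address": "10.0.0.3:8080", "node": "node-a"},
]
local_targets = targets_for_host(discovered, "node-a")
```

With this kind of scheme, each Agent holds series only for its own node, which is where the per-machine memory savings would come from.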

A typical deployment of the Grafana Agent for Prometheus metrics can see up to a 40% reduction in memory usage with equal scrape loads.

The Grafana Agent can be used to send Prometheus metrics to any system that supports the Prometheus remote_write API.

Trade-offs

By heavily optimizing Prometheus for remote write and resource reduction, some trade-offs have been made:

  • You can't query the Agent; you can only query metrics from the remote write storage.
  • Recording rules aren't supported.
  • Alerts aren't supported.
  • When sharding the Agent, if a node has problems that interrupt metric availability, the metrics that track that node won't be sent, so they can't be used for alerting.

While the Agent can't use recording rules and alerts, remote_write systems such as Cortex currently support server-side rules and alerts. Note that this trade-off means that the reliability of alerts is tied to the reliability of the remote system, and alerts will be delayed by at least the time it takes for samples to reach the remote system.

Roadmap

  • Prometheus metrics
  • A second clustering mode to solve sharding monitoring availability problems.
  • Support for integrations (embedded exporters/automatic scrape configs)
  • Promtail for Loki logs
  • Tempo traces
  • carbon-relay-ng for Graphite metrics.
  • All-in-one installation script (metrics, logs, and traces)

Getting Started

When using Kubernetes, this link offers the best guide.

Other installation methods can be found in our Production documentation.

More detailed documentation is provided as part of the repository.

Example

The example/ folder contains docker-compose configs and a local k3d/Tanka environment. Both examples deploy the Agent, Cortex, and Grafana so that you can test the Agent. See the docker-compose README and the k3d example README for more information.

Prometheus Vendoring

The Grafana Agent vendors a downstream Prometheus repository maintained by Grafana Labs. This is done so that experimental features Grafana Labs wants to contribute upstream can first be tested and iterated on quickly within the Agent. We aim to always base our vendored copy on a recent official Prometheus release and to keep experimental changes that are not available in the upstream repository to a minimum.

Please refer to the pinned Prometheus Vendor Update Tracking issue for our current vendored Prometheus release.

For more context on our vendoring strategy, read our repo maintenance guide.

Getting Help

If you have any questions or feedback regarding the Grafana Agent: