Meta monitoring revamp #155

Draft. Imshelledin21 wants to merge 40 commits into main.

Imshelledin21 commented Sep 25, 2024

Major overhaul of this chart to expand its capabilities for meta-monitoring Grafana products.
The Alloy deployment is split into 4 separate deployments:

  1. Alloy Metrics
  2. Alloy Logs
  3. Alloy Events
  4. Alloy Traces (configs pending)

The configs are relatively simple for now, mostly so I could prove out the concept and have something functioning to build on and refine.

Alloy Metrics
Can be enabled or disabled
Configurable to enable scraping and remote-writing metrics from the following:

  • Grafana
  • Mimir
  • Loki
  • Tempo
  • Alloy
  • Self
  • Node Exporter
  • Kube State Metrics
  • cAdvisor

Currently, each component can scrape metrics from a specified namespace or list of namespaces. Each component's discovery/scrape configs are modular: the chart creates a configmap for the component-specific configs and adds a config section to the alloy-metrics configmap only if that component is enabled.
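
For illustration, a per-component module scoped to a namespace list could look roughly like the sketch below. This is not the chart's actual config: the component labels, the `mimir` namespace, the `http-metrics` port name, and the remote-write URL are all placeholders I picked for the example.

```alloy
// Discover pods only in the namespaces configured for this component.
discovery.kubernetes "mimir_pods" {
  role = "pod"

  namespaces {
    names = ["mimir"] // hypothetical namespace list
  }
}

// Keep only the container port that serves metrics (port name is an assumption).
discovery.relabel "mimir_metrics_ports" {
  targets = discovery.kubernetes.mimir_pods.targets

  rule {
    source_labels = ["__meta_kubernetes_pod_container_port_name"]
    regex         = "http-metrics"
    action        = "keep"
  }
}

// Scrape the filtered targets and hand the samples to remote write.
prometheus.scrape "mimir" {
  targets    = discovery.relabel.mimir_metrics_ports.output
  job_name   = "mimir"
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "https://mimir.example.com/api/v1/push" // placeholder endpoint
  }
}
```

With one such module per component, disabling a component just means its configmap and its section in the alloy-metrics configmap are not rendered.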

Alloy Logs
Can be enabled or disabled
Configurable to collect pod logs from a list of specified namespaces.
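
As a rough sketch of namespace-scoped pod log collection, the config below uses `loki.source.kubernetes`, which tails logs through the Kubernetes API. The chart may collect logs differently (for example by tailing files on the node); the namespaces and the Loki URL are placeholders.

```alloy
// Discover pods in the namespaces whose logs should be collected.
discovery.kubernetes "pods" {
  role = "pod"

  namespaces {
    names = ["loki", "mimir"] // hypothetical namespace list
  }
}

// Tail container logs for the discovered pods via the Kubernetes API.
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "https://loki.example.com/loki/api/v1/push" // placeholder endpoint
  }
}
```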

Alloy Events
Can be enabled or disabled
Configurable to collect Kubernetes event logs from a list of specified namespaces.
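
A minimal sketch of what namespace-scoped event collection can look like with `loki.source.kubernetes_events`. The namespaces and the Loki URL are placeholders, not values from this chart.

```alloy
// Watch Kubernetes events in the listed namespaces and turn them into log lines.
loki.source.kubernetes_events "events" {
  namespaces = ["loki", "mimir"] // hypothetical namespace list
  log_format = "logfmt"
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "https://loki.example.com/loki/api/v1/push" // placeholder endpoint
  }
}
```

Because every Alloy replica watching the same namespaces would emit the same events, this kind of collector is usually run as a single instance, which fits having a separate Alloy Events deployment.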

Alloy Traces
Can be enabled or disabled
Will be configured to collect OTEL/Jaeger traces
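
The traces configs are still pending, so the following is only a guess at the general shape: OTLP and Jaeger receivers forwarding to an OTLP exporter. The exporter label and the endpoint are placeholders, and a real pipeline would likely add processors such as batching.

```alloy
// Accept OTLP traces over gRPC and HTTP on the default ports.
otelcol.receiver.otlp "default" {
  grpc {}
  http {}

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

// Accept Jaeger traces over gRPC and Thrift HTTP.
otelcol.receiver.jaeger "default" {
  protocols {
    grpc {}
    thrift_http {}
  }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

// Send everything on to a Tempo-compatible OTLP endpoint.
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo.example.com:4317" // placeholder endpoint
  }
}
```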

Next steps for Alloy configs

  1. Complete Alloy Traces implementation
  2. Allow further refining of target discovery per component based on Kubernetes labels/annotations
  3. Allow further refinement of metrics/logs, e.g. metrics to keep and regexes for logs to drop.
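
To make items 2 and 3 concrete, here is one possible shape for them, not a committed design. The label selector, the metric-name regex, the log-drop regex, and both endpoints are illustrative placeholders.

```alloy
// Item 2: narrow discovery to pods carrying a given label.
discovery.kubernetes "loki_pods" {
  role = "pod"

  selectors {
    role  = "pod"
    label = "app.kubernetes.io/name=loki" // hypothetical label selector
  }
}

// Item 3 (metrics): keep only metric names matching the regex, drop the rest.
// A prometheus.scrape component would forward to prometheus.relabel.keep_metrics.receiver.
prometheus.relabel "keep_metrics" {
  forward_to = [prometheus.remote_write.default.receiver]

  rule {
    source_labels = ["__name__"]
    regex         = "up|cortex_.*|loki_.*"
    action        = "keep"
  }
}

// Item 3 (logs): drop log lines matching the regex before they are written.
// A loki.source.* component would forward to loki.process.drop_noise.receiver.
loki.process "drop_noise" {
  forward_to = [loki.write.default.receiver]

  stage.drop {
    expression = ".*level=debug.*"
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "https://mimir.example.com/api/v1/push" // placeholder endpoint
  }
}

loki.write "default" {
  endpoint {
    url = "https://loki.example.com/loki/api/v1/push" // placeholder endpoint
  }
}
```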

Next steps for the chart overall

  1. Add the ability to create PrometheusRule Kubernetes resources for Mimir/Loki recording and alert rules, and have Alloy load them into Grafana Cloud or local Mimir and Loki instances.
  2. Put some work into the dashboards for Grafana/Mimir/Loki/Tempo/Alloy to refine them for this chart
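
For item 1, Alloy already ships components that sync PrometheusRule resources to a ruler API, so the Alloy side might end up looking something like this sketch. The addresses and the `meta_monitoring` label are placeholders, and authentication and tenant settings are omitted.

```alloy
// Sync PrometheusRule resources matching the selector into a Mimir ruler.
mimir.rules.kubernetes "default" {
  address = "https://mimir.example.com" // placeholder ruler address

  rule_selector {
    match_labels = { meta_monitoring = "true" } // hypothetical label on the PrometheusRule
  }
}

// Same idea for Loki rules.
loki.rules.kubernetes "default" {
  address = "https://loki.example.com" // placeholder ruler address

  rule_selector {
    match_labels = { meta_monitoring = "true" } // hypothetical label on the PrometheusRule
  }
}
```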

- Included Grafana, Mimir and Tempo dashboards.
- Updated Loki dashboards
- Reorganized dashboards by product
- Added Mimir and Tempo recording rules
- Updated values.yaml to enable metric collection of Mimir and Loki
- Update metrics-config.yaml to utilize module configmaps
- `Alloy` section dispersed to each Alloy Deployment.
- Major cleanup of the helm chart and configurations.
- Minor cleanup of various items