@0xinterface
Contributor

This pull request refactors the Alma and Debian modules to publish metrics and logs to remote endpoints via Grafana Alloy, replacing the previous node-exporter-based metrics collection. It adds new configuration options, updates documentation, and implements the file generation and templates needed for Alloy-based telemetry, including remote log and metrics endpoints. It also improves the handling of remote files and system configuration files in both modules.

Telemetry and Alloy Integration

  • Added a new telemetry variable (object with enabled, loki_addr, and prometheus_addr) to both Alma and Debian modules, allowing remote logs and metrics publishing via Grafana Alloy. This replaces the previous expose_metrics boolean and updates documentation accordingly (variables.tf, README.md).
  • Added logic to generate Alloy configuration files (/etc/alloy/config.alloy) and service defaults (/etc/default/alloy), using new templates and conditional rendering based on telemetry settings.
  • Added template for Alloy configuration (config.alloy.tftpl) with detailed blocks for metrics scraping, log collection, and remote write endpoints for Loki and Prometheus.
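
For illustration, the telemetry object and its conditional file generation might look roughly like the sketch below. The field names enabled, loki_addr, and prometheus_addr and the paths /etc/alloy/config.alloy and config.alloy.tftpl come from this PR; the bool type for enabled, the local name alloy_files, the template variables, and the example endpoints are assumptions, not the module's actual code.

```hcl
# Sketch only: field names are from the PR; everything else is assumed.
variable "telemetry" {
  type = object({
    enabled         = bool # assumed type
    loki_addr       = string
    prometheus_addr = string
  })
  description = "Remote logs and metrics publishing via Grafana Alloy, e.g. { enabled = true, loki_addr = 'https://loki.example.com/loki/api/v1/push', prometheus_addr = 'https://prometheus.example.com/api/v1/write' }"
}

locals {
  # Hypothetical local: render the Alloy config only when telemetry is enabled.
  alloy_files = var.telemetry.enabled ? [
    {
      path = "/etc/alloy/config.alloy"
      content = templatefile("${path.module}/templates/config.alloy.tftpl", {
        loki_addr       = var.telemetry.loki_addr
        prometheus_addr = var.telemetry.prometheus_addr
      })
    },
  ] : []
}
```

With enabled = false the list is empty, so no Alloy files would be written to the instance.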

Remote File Handling

  • Improved handling of remote files: added logic to collect remote files from substrate definitions, fetch Alloy system extension from a remote artifact, and generate a script (fetch-remote-files.sh) to download and configure these files at boot.

System Configuration Updates

  • Updated network configuration templates and file generation logic, including removal of the old static network template and introduction of new conditional systemd network and resolver configuration files for Debian.

File and Directory Generation Refactor

  • Refactored file and directory generation logic to correctly handle remote files (using URIs instead of base64 content) and ensure proper ordering and tagging for cloud-init.
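
One way to read the inline-versus-remote split is sketched below. The files variable shape, the source attribute, and the local names are hypothetical; they only illustrate carrying a URI for remote files while base64-encoding inline content, and the module's actual data model may differ.

```hcl
# Hypothetical input shape: each file has either inline content or a source URI.
variable "files" {
  type = list(object({
    path    = string
    content = optional(string) # inline content
    source  = optional(string) # remote URI, fetched at boot instead
    owner   = optional(string, "root")
    group   = optional(string, "root")
  }))
  default = []
}

locals {
  # Inline files: ship the content itself, base64-encoded for cloud-init write_files.
  inline_files = [
    for f in var.files : {
      path     = f.path
      encoding = "b64"
      content  = base64encode(f.content)
      owner    = "${f.owner}:${f.group}"
    } if f.source == null && f.content != null
  ]

  # Remote files: keep only the URI and the destination path.
  remote_files = [
    for f in var.files : {
      path = f.path
      uri  = f.source
    } if f.source != null
  ]
}
```

A list like remote_files could then feed a generated download script, while inline_files goes straight into the cloud-init payload.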

Miscellaneous

  • Updated local variables and package installation logic to support the new telemetry approach and removed legacy node-exporter handling.

Together, these changes let the Alma and Debian modules ship logs and metrics to remote Loki and Prometheus endpoints through Grafana Alloy, with remote file fetching and updated system configuration to support it.

@0xinterface force-pushed the modules/alma-debian-telemetry branch 2 times, most recently from cdd583e to bf88c7a, on September 27, 2025 11:08
@0xinterface force-pushed the modules/alma-debian-telemetry branch from f9daabc to 3e1ab3f on November 3, 2025 14:46
Copilot AI review requested due to automatic review settings on November 3, 2025 14:46
Copilot AI left a comment

Pull Request Overview

This pull request migrates from prometheus-node-exporter to Grafana Alloy for telemetry collection across Debian, AlmaLinux, and Flatcar modules. The key changes replace a simple boolean expose_metrics variable with a structured telemetry object that enables centralized logging to Loki and metrics to Prometheus.

  • Replaced expose_metrics boolean with structured telemetry object containing Loki and Prometheus endpoints
  • Introduced Grafana Alloy system extension deployment across all OS modules
  • Updated DNS configuration to use systemd-resolved with DNS-over-TLS support

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| modules/debian/variables.tf | Replaced expose_metrics with telemetry object variable, updated nameservers to support DNS-over-TLS, changed default tag from "ignition" to "cloud-init" |
| modules/debian/config.tf | Added Alloy configuration files, systemd-resolved config, unified network configuration, and remote file fetching for Alloy extension |
| modules/debian/cloudinit.tf | Added Alloy remote files and enabled systemd-sysext service |
| modules/debian/templates/*.tftpl | Removed static.network template, added resolved.conf, default.network, and config.alloy templates |
| modules/alma/variables.tf | Mirrored telemetry variable changes from Debian module |
| modules/alma/config.tf | Added Alloy configuration and remote file fetching mechanism via systemd service |
| modules/alma/cloudinit.tf | Enabled fetch-remote-files systemd service for Alloy extension download |
| modules/alma/templates/*.tftpl | Added fetch-remote-files.sh script and config.alloy template, removed static.network |
| modules/*/README.md | Updated documentation to reflect telemetry variable changes |
| .github/workflows/*.yml | Restricted workflows to main branch only |

  owner = optional(string, "root")
  group = optional(string, "root")
- tags = optional(string, "ignition") # only ignition is specified for backward compatibility
+ tags = optional(string, "cloud-init") # only ignition is specified for backward compatibility
Copilot AI Nov 3, 2025

The comment states 'only ignition is specified for backward compatibility' but the default value was changed to 'cloud-init'. The comment is now misleading and should be updated to reflect the actual default or removed.

Suggested change
- tags = optional(string, "cloud-init") # only ignition is specified for backward compatibility
+ tags = optional(string, "cloud-init")

Copilot uses AI. Check for mistakes.
loki_addr = string
prometheus_addr = string
})
description = "Whether to enable alloy logging to Loki endpoint, e.g. { enabled = true, loki_endpoint = 'https://loki.example.com/loki/api/v1/push' }"
Copilot AI Nov 3, 2025

The example uses 'loki_endpoint', but the field on the telemetry object is actually named 'loki_addr'. The example should use 'loki_addr'.

Suggested change
- description = "Whether to enable alloy logging to Loki endpoint, e.g. { enabled = true, loki_endpoint = 'https://loki.example.com/loki/api/v1/push' }"
+ description = "Whether to enable alloy logging to Loki endpoint, e.g. { enabled = true, loki_addr = 'https://loki.example.com/loki/api/v1/push' }"

loki_addr = string
prometheus_addr = string
})
description = "Whether to enable alloy logging to Loki endpoint, e.g. { enabled = true, loki_endpoint = 'https://loki.example.com/loki/api/v1/push' }"
Copilot AI Nov 3, 2025

The example uses 'loki_endpoint', but the field on the telemetry object is actually named 'loki_addr'. The example should use 'loki_addr'.

Suggested change
- description = "Whether to enable alloy logging to Loki endpoint, e.g. { enabled = true, loki_endpoint = 'https://loki.example.com/loki/api/v1/push' }"
+ description = "Whether to enable alloy logging to Loki endpoint, e.g. { enabled = true, loki_addr = 'https://loki.example.com/loki/api/v1/push' }"


// Define how to scrape metrics from the node_exporter
prometheus.scrape "integrations_node_exporter" {
scrape_interval = "15s"
Copilot AI Nov 3, 2025

[nitpick] The scrape_interval is hardcoded to 15s. Consider making this configurable via a template variable to allow users to adjust the collection frequency based on their needs.

Suggested change
- scrape_interval = "15s"
+ scrape_interval = "${scrape_interval:-15s}"
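
The `:-` default expansion is shell syntax. Inside a Terraform .tftpl template, `${...}` holds a Terraform expression, so the default would more likely live on the Terraform side. A minimal sketch, assuming a new optional scrape_interval field on the telemetry object (not part of this PR), an assumed bool type for enabled, and illustrative local and template variable names:

```hcl
variable "telemetry" {
  type = object({
    enabled         = bool # assumed type
    loki_addr       = string
    prometheus_addr = string
    scrape_interval = optional(string, "15s") # hypothetical field, defaults to the current value
  })
}

locals {
  # Hypothetical local: pass the interval into the template alongside the endpoints.
  alloy_config = templatefile("${path.module}/templates/config.alloy.tftpl", {
    loki_addr       = var.telemetry.loki_addr
    prometheus_addr = var.telemetry.prometheus_addr
    scrape_interval = var.telemetry.scrape_interval
  })
}
```

Under these assumptions, the template line would read scrape_interval = "${scrape_interval}", and callers that omit the field keep the 15s behavior.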
