
Conversation

@adrian-salas adrian-salas commented Nov 28, 2025

Added caution about Kubernetes API saturation risk when running Alloy as a DaemonSet, along with recommended solutions

PR Description

Which issue(s) this PR fixes

Proposed fix for #4787 #4793

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

Comment on lines +18 to 37
{{< admonition type="caution" >}}
**Kubernetes API Saturation Risk**

When running Alloy as a DaemonSet with the default configuration, **each Alloy pod will watch logs for all pods in the cluster**.
For example, on a 20-node cluster, all 20 Alloy pods each watch every pod's logs, placing substantial load on the Kubernetes API and cluster resources.
On large or resource-constrained clusters, this can cause excessive API requests, memory usage, and may even prevent new objects from being created.

#### Recommended Solutions

- **Restrict Alloy pods to only collect logs for pods on their local node.**
See the example in the [Limit to only Pods on the same node](#limit-to-only-pods-on-the-same-node) section below for a configuration snippet that uses label selectors and environment variables to achieve this.
- **Clustering mode:** For larger deployments, consider setting up Alloy in clustering mode.
- **Monitor resource consumption:** Regularly check API server throttling, memory usage, and inflight requests, especially on cloud-managed clusters (e.g., Azure AKS).

Failure to properly configure Alloy can result in degraded cluster performance, increased cloud costs, and operational risk.
Please review your configuration carefully and consult the examples below.
{{< /admonition >}}

If you supply no connection information, this component defaults to an in-cluster configuration.
A kubeconfig file or manual connection settings can be used to override the defaults.
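
As a rough illustration of the node-local restriction recommended above, the following sketch assumes a `NODE_NAME` environment variable injected into the Alloy Pod through the Kubernetes downward API (`fieldRef: spec.nodeName`). The component label and variable name are illustrative, not taken from this PR:

```alloy
// Discover only Pods scheduled on the same node as this Alloy instance.
// NODE_NAME is assumed to be injected via the downward API (spec.nodeName).
discovery.kubernetes "local_pods" {
  role = "pod"

  selectors {
    role  = "pod"
    field = "spec.nodeName=" + sys.env("NODE_NAME")
  }
}
```

With this field selector, each DaemonSet member asks the API server only for Pods on its own node instead of watching the whole cluster. Older Alloy releases may expose the environment lookup as `env(...)` rather than `sys.env(...)`; check the standard library reference for your version.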
Suggested change
{{< admonition type="caution" >}}
**Kubernetes API Saturation Risk**

When running Alloy as a DaemonSet with the default configuration, **each Alloy pod will watch logs for all pods in the cluster**.
This means if you have 20 nodes, each Alloy pod will watch every pod's logs, resulting in substantial load on the Kubernetes API and cluster resources.
On large or resource-constrained clusters, this can cause excessive API requests, memory usage, and may even prevent new objects from being created.

#### Recommended Solutions

- **Restrict Alloy pods to only collect logs for pods on their local node.**
See the example in the [Limit to only Pods on the same node](#limit-to-only-pods-on-the-same-node) section below for a configuration snippet that uses label selectors and environment variables to achieve this.
- **Clustering mode:** For larger deployments, consider setting up Alloy in clustering mode.
- **Monitor resource consumption:** Regularly check API server throttling, memory usage, and inflight requests, especially on cloud-managed clusters (e.g., Azure AKS).

Failure to properly configure Alloy can result in degraded cluster performance, increased cloud costs, and operational risk.
Please review your configuration carefully and consult the examples below.
{{< /admonition >}}

If you supply no connection information, this component defaults to an in-cluster configuration.
A kubeconfig file or manual connection settings can be used to override the defaults.

## Performance considerations

By default, `discovery.kubernetes` discovers resources across all namespaces in your cluster.
In DaemonSet deployments, this means every {{< param "PRODUCT_NAME" >}} Pod watches all resources, which can increase API server load.

For better performance and reduced API load:

- Use the [`namespaces`](#namespaces) block to limit discovery to specific namespaces.
- Use [`selectors`](#selectors) to filter resources by labels or fields.
- Consider the node-local example in [Limit to only Pods on the same node](#limit-to-only-pods-on-the-same-node).
- Use clustering mode for larger deployments to distribute the discovery load.
- Monitor API server metrics like request rate, throttling, and memory usage, especially on managed clusters.
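
The first two bullets can be sketched together as follows; the namespace names and label value are illustrative placeholders, not from this PR:

```alloy
// Discover Pods only in selected namespaces, further narrowed by label,
// instead of watching every resource in the cluster.
discovery.kubernetes "app_pods" {
  role = "pod"

  // Limit discovery to specific namespaces.
  namespaces {
    names = ["production", "staging"]
  }

  // Filter the watch server-side by label.
  selectors {
    role  = "pod"
    label = "app.kubernetes.io/name=my-app"
  }
}
```

Because both the namespace list and the selectors are applied by the API server, they reduce watch traffic at the source rather than filtering discovered targets after the fact.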

How about this? We simplify the information and give the reader some really clear things they can do to handle the performance issues. This is active (it gives specific, clear steps) and I think it says the same thing the Caution did.

@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label Nov 28, 2025