Kepler does not report metrics on resources outside of some system namespaces #1771

Open
BoyanBanev opened this issue Sep 9, 2024 · 0 comments
Labels
kind/bug (report bug issue)

What happened?

We have deployed Kepler in two clusters. The only difference between them is that Cluster A is dual-stack and Cluster B is IPv6-only.

Kepler reports metrics correctly from Cluster A. From Cluster B, we only see metrics for some system namespaces (e.g. kube-system) and for Kepler itself.

ClusterA.log
ClusterA_metrics.txt
ClusterB.log
ClusterB_metrics.txt
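
To make the difference between the two metrics dumps easier to see, a small script along these lines could list which namespaces appear in each file. This is only a sketch: it assumes the dumps are in the Prometheus text exposition format and that Kepler's per-container metrics carry a `container_namespace` label. The helper names `namespaces_in_metrics` and `missing_namespaces` are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: Kepler's container metrics expose a `container_namespace` label.
NAMESPACE_LABEL = re.compile(r'container_namespace="([^"]*)"')


def namespaces_in_metrics(text: str) -> Counter:
    """Count sample lines per namespace in a Prometheus text-format dump."""
    counts = Counter()
    for line in text.splitlines():
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = NAMESPACE_LABEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def missing_namespaces(reference: str, suspect: str) -> set:
    """Namespaces seen in the reference dump but absent from the suspect dump."""
    return set(namespaces_in_metrics(reference)) - set(namespaces_in_metrics(suspect))


if __name__ == "__main__":
    # Usage: python compare_namespaces.py ClusterA_metrics.txt ClusterB_metrics.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(path, dict(namespaces_in_metrics(handle.read())))
```

The comparison is only meaningful if both clusters run comparable workloads, which is what the description above implies.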

What did you expect to happen?

I expect Kepler to report metrics for all resources on Cluster B.

How can we reproduce it (as minimally and precisely as possible)?

Run Kepler in an IPv6-only Kubernetes cluster.
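
To confirm which IP family a cluster's pods actually use when reproducing, something like the following could classify the addresses taken from a pod's `status.podIPs` (for example via `kubectl get pod <name> -o jsonpath='{.status.podIPs[*].ip}'`). It is a minimal sketch; the function name `ip_family` is hypothetical and the sample addresses are invented.

```python
import ipaddress


def ip_family(pod_ips):
    """Classify a list of pod IP strings as ipv4-only, ipv6-only, or dual-stack."""
    # ip_address() raises ValueError for strings that are not valid IPs.
    versions = {ipaddress.ip_address(ip).version for ip in pod_ips}
    if versions == {4, 6}:
        return "dual-stack"
    if versions == {6}:
        return "ipv6-only"
    if versions == {4}:
        return "ipv4-only"
    return "unknown"  # empty input


if __name__ == "__main__":
    # Invented sample addresses, not taken from either cluster.
    print(ip_family(["10.244.0.12", "fd00:10:244::c"]))
    print(ip_family(["fd00:10:244::c"]))
```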

Anything else we need to know?

No response

Kepler image tag

0.7.10

Kubernetes version

$ kubectl version
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4

Cloud provider or bare metal

bare metal

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Kepler deployment config

Name:           kepler
Selector:       app.kubernetes.io/component=exporter,app.kubernetes.io/name=kepler
Node-Selector:  kubernetes.io/os=linux
Labels:         app.kubernetes.io/component=exporter
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=kepler
                app.kubernetes.io/version=release-0.7.10
                helm.sh/chart=kepler-0.5.7
                helm.toolkit.fluxcd.io/name=kepler
                helm.toolkit.fluxcd.io/namespace=kepler
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: kepler
                meta.helm.sh/release-namespace: kepler
                telegraf.influxdata.com/class: app
                telegraf.influxdata.com/env-fieldref-HOSTIP: status.hostIP
                telegraf.influxdata.com/env-fieldref-NAMESPACE: metadata.namespace
                telegraf.influxdata.com/env-fieldref-PODIP: status.podIP
                telegraf.influxdata.com/env-fieldref-PODNAME: metadata.name
                telegraf.influxdata.com/volume-mounts: {"cdi-user":"/var/local"}
Desired Number of Nodes Scheduled: 6
Current Number of Nodes Scheduled: 6
Number of Nodes Scheduled with Up-to-date Pods: 6
Number of Nodes Scheduled with Available Pods: 6
Number of Nodes Misscheduled: 0
Pods Status:  6 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/component=exporter
                    app.kubernetes.io/name=kepler
                    monitoring=aid
  Annotations:      telegraf.influxdata.com/class: app
                    telegraf.influxdata.com/env-fieldref-HOSTIP: status.hostIP
                    telegraf.influxdata.com/env-fieldref-NAMESPACE: metadata.namespace
                    telegraf.influxdata.com/env-fieldref-PODIP: status.podIP
                    telegraf.influxdata.com/env-fieldref-PODNAME: metadata.name
                    telegraf.influxdata.com/volume-mounts: {"cdi-user":"/var/local"}
  Service Account:  kepler
  Containers:
   kepler-exporter:
    Image:      artifactory.devops.telekom.de/dtt-cbdev-boyanslab-dev-docker/kepler:0.7.10
    Port:       9102/TCP
    Host Port:  9102/TCP
    Args:
      -v=$(KEPLER_LOG_LEVEL)
    Liveness:   http-get http://:9102/healthz delay=10s timeout=10s period=60s #success=1 #failure=5
    Environment:
      NODE_IP:                      (v1:status.hostIP)
      NODE_NAME:                    (v1:spec.nodeName)
      METRIC_PATH:                  /metrics
      BIND_ADDRESS:                 0.0.0.0:9102
      BIND_ADDRESS:                 0.0.0.0:9102
      CGROUP_METRICS:               *
      CPU_ARCH_OVERRIDE:
      ENABLE_EBPF_CGROUPID:         true
      ENABLE_GPU:                   false
      ENABLE_PROCESS_METRICS:       false
      ENABLE_QAT:                   true
      EXPOSE_CGROUP_METRICS:        true
      EXPOSE_HW_COUNTER_METRICS:    true
      EXPOSE_IRQ_COUNTER_METRICS:   true
      KEPLER_LOG_LEVEL:             6
      METRIC_PATH:                  /metrics
    Mounts:
      /lib/modules from lib-modules (rw)
      /proc from proc (rw)
      /sys from tracing (rw)
      /usr/src from usr-src (rw)
  Volumes:
   lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  DirectoryOrCreate
   tracing:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:  Directory
   proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  Directory
   usr-src:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/src
    HostPathType:  Directory
   cdi-user:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cdi-user-appmetrics
    Optional:    false
  Node-Selectors:  kubernetes.io/os=linux
  Tolerations:     node-role.kubernetes.io/control-plane:NoSchedule

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@BoyanBanev BoyanBanev added the kind/bug report bug issue label Sep 9, 2024