Kubernetes Scaphandre Deployment reporting 0 W #353

eduardogomescampos1 · 2024-01-23T14:13:24Z

Bug description

First of all, I would like to thank the whole Scaphandre team for this tool. It has been extremely helpful so far! The bug is that some nodes in my local k8s cluster report 0W of consumption. To illustrate the issue, the Screenshots section includes a screenshot of the official Scaphandre Grafana dashboard.
Each color represents a node and, as you can see, 3 of them report 0W. The most intriguing part is that if I run Scaphandre locally, I get actual values. The Screenshots section also includes the logs of a local execution of Scaphandre on one of the nodes that report 0W in the k8s version.
So Scaphandre can obtain those metrics locally, but the pods in the k8s cluster cannot. Running "kubectl logs <scaphandre pod>" has been of no help, since it just returns:
" Scaphandre prometheus exporter
Sending ⚡ metrics
Press CTRL-C to stop scaphandre "
And describing the pods does not return anything worth mentioning either.
It is relevant to note that the firewall is disabled on all cluster machines.
Could you give any insights on solving this, please?

To Reproduce

Create a k8s cluster using Calico CNI following its documentation
Create a deployment for Grafana and Prometheus (following these tutorials: https://devopscube.com/setup-grafana-kubernetes/ and https://devopscube.com/setup-prometheus-monitoring-on-kubernetes/)
Deploy Scaphandre from its Helm Chart
Open Scaphandre Grafana dashboard and verify that some nodes report 0W

Expected behavior

The Grafana dashboard should report the same values obtained from the local execution rather than 0W

Screenshots

Scaphandre Grafana Dashboard
Local execution of Scaphandre reports values other than 0W.

Environment

Linux distribution version on all machines: Ubuntu 22.04.3
Kernel version on all machines: 5.15.0-91-generic

Additional context

One interesting aspect is that all of the malfunctioning machines were formatted quite recently, so I'm guessing there might be a misconfiguration somewhere.

mmadoo · 2024-01-23T16:36:46Z

Which docker tag are you using, and what is the value of the scaph_self_version metric?

I am using the dev tag and got version 0.5. My metrics for scaph_process_power_consumption_microwatts are fine.
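
For anyone who wants to compare these two metrics across nodes, here is a minimal Python sketch that pulls one metric out of a Prometheus text-format dump. The function name `read_metric`, the sample text, and its label names and values are made up for illustration and are not taken from an actual Scaphandre export. The real text would come from whatever address the exporter is serving on in your setup.

```python
def read_metric(exposition_text, metric_name):
    """Return (labels, value) pairs for one metric in Prometheus text format.

    `labels` is the raw text between the braces, or "" when the sample
    has no labels. Comment lines (# HELP, # TYPE) are skipped.
    """
    samples = []
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # name{label="..."} value [timestamp]
            name = line[: line.index("{")]
            labels = line[line.index("{") + 1 : line.rindex("}")]
            value_part = line[line.rindex("}") + 1 :]
        else:
            # name value [timestamp]
            name, _, value_part = line.partition(" ")
            labels = ""
        if name != metric_name:
            continue
        samples.append((labels, float(value_part.split()[0])))
    return samples


# Hypothetical sample, for illustration only (not real exporter output).
SAMPLE = """\
# TYPE scaph_self_version gauge
scaph_self_version 0.5
# TYPE scaph_process_power_consumption_microwatts gauge
scaph_process_power_consumption_microwatts{pid="1",exe="init"} 0
scaph_process_power_consumption_microwatts{pid="42",exe="my app"} 1500
"""

if __name__ == "__main__":
    for metric in ("scaph_self_version",
                   "scaph_process_power_consumption_microwatts"):
        print(metric, "->", read_metric(SAMPLE, metric))
```

A node where every sample of the power metric comes back as 0 while scaph_self_version is still reported would match the symptom described in this issue.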

eduardogomescampos1 · 2024-01-23T17:28:27Z

> Which docker tag are you using, and what is the value of the scaph_self_version metric?
>
> I am using the dev tag and got version 0.5. My metrics for scaph_process_power_consumption_microwatts are fine.

All nodes return 0.5 for this metric. I installed the Helm chart from the dev branch, using the dev tag as well. I also noticed that whenever I run the quick docker version (as in https://hubblo-org.github.io/scaphandre-documentation/tutorials/installation-linux), one of the malfunctioning nodes reports 0W there too. I suspect the container is not allowed to access the files it needs, even though I have disabled all firewalls and run chmod 777 on both /sys/class/powercap and /proc (for testing purposes). I'm wondering why only one node is able to get the measurements correctly.

Docker quick version output

eduardogomescampos1 · 2024-01-23T17:41:58Z

I have now tried running the dev image locally, and there is a warning:
"scaphandre::sensors: Could'nt read record from /sys/class/powercap/intel-rapl:0/energy_uj, error was: Os { code: 2, kind: NotFound, message: "No such file or directory" }"
However, as I have stated before, I have used the chmod -R 777 command on this folder and disabled the firewall. What could be causing this?
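
To narrow down a warning like this one, it can help to try the same read outside of Scaphandre, both on the host and from inside the container. Below is a minimal Python sketch that does this. It assumes the powercap zones are directories named `intel-rapl:*` under /sys/class/powercap, as in the path from the warning. The function name `probe_rapl` and the wording of the result strings are made up for illustration. It separates OS error code 2 (file not found) from code 13 (permission denied), since the two point at different causes.

```python
import errno
import os
import sys


def probe_rapl(base="/sys/class/powercap"):
    """Try to read energy_uj in every intel-rapl:* zone under `base`.

    Returns a dict mapping each energy_uj path to either the integer
    counter value or a short string describing why the read failed.
    """
    results = {}
    try:
        zones = sorted(e for e in os.listdir(base)
                       if e.startswith("intel-rapl:"))
    except OSError as exc:
        return {base: f"cannot list directory: {exc.strerror}"}
    for zone in zones:
        path = os.path.join(base, zone, "energy_uj")
        try:
            with open(path) as fh:
                results[path] = int(fh.read().strip())
        except ValueError:
            results[path] = "file content is not an integer"
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                results[path] = "not found (code 2)"
            elif exc.errno == errno.EACCES:
                results[path] = "permission denied (code 13)"
            else:
                results[path] = f"error: {exc.strerror}"
    return results


if __name__ == "__main__":
    # Optional first argument overrides the base directory.
    base = sys.argv[1] if len(sys.argv) > 1 else "/sys/class/powercap"
    for path, outcome in probe_rapl(base).items():
        print(f"{path}: {outcome}")
```

If the script reads the counter on the host but fails inside the container, the problem is on the container side rather than in the RAPL files themselves.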

eduardogomescampos1 · 2024-01-24T12:42:20Z

It is indeed a permission issue. When I came back to the office and ran "kubectl logs <scaphandre pod>", this time I got a warning message stating:
"scaphandre::sensors: Could'nt read record from /sys/class/powercap/intel-rapl:0/energy_uj, error was: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }"

eduardogomescampos1 · 2024-02-20T19:18:04Z

I think it had something to do with the containerd container runtime. In the project I'm taking part in, we decided to switch from containerd to CRI-O, and the problem was solved afterwards. All nodes report sensible values now.

bpetit · 2024-10-17T09:40:05Z

Hi, this seems related to #391, which was merged into dev a few days ago.

If anyone wants to give it a try with the containerd runtime, that would be interesting.

> Now I've tried to run the dev image locally and there is a warning "scaphandre::sensors: Could'nt read record from /sys/class/powercap/intel-rapl:0/energy_uj, error was: Os { code: 2, kind: NotFound, message: "No such file or directory" }" However, as I have stated before, I have used the chmod -R 777 command on this folder and disabled the firewall. What could be causing this?

This would be related to an intel-rapl module issue, not Scaphandre itself.

> It is indeed a permission issue. As I came back to office and typed "kubectl logs 'scaphandre pod'", this time a got a warning message stating:
> "scaphandre::sensors: Could'nt read record from /sys/class/powercap/intel-rapl:0/energy_uj, error was: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }"

This is probably related to #391.

eduardogomescampos1 added the bug Something isn't working label Jan 23, 2024

bpetit added this to General Jun 19, 2024

bpetit moved this to Triage in General Jun 19, 2024
