What happened?
sum(rate(kepler_container_joules_total{container_namespace='test', mode='dynamic'}[60s]))
sum(rate(kepler_container_joules_total{container_namespace='test', mode='idle'}[60s]))
sum(rate(kepler_container_platform_joules_total{mode='dynamic'}[60s]))
sum(rate(kepler_container_platform_joules_total{mode='idle'}[60s]))
The sum of the node approximations is pretty close to the values measured by the socket meter. However, the approximated values for a whole namespace are off by a lot. When the GPU workload pauses for a few minutes, the socket meter measures about 260 W, so the actual consumption of the GPU workload should not be much higher than 250 W (~490 W at the socket meter with the GPU workload running, minus ~260 W socket meter idle). Should I be using another PromQL query to obtain the power consumption of a whole namespace?
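As a rough sanity check, I would expect something like the following to show whether the per-namespace attribution adds up to the node total (the node-level metric name and the by-namespace grouping below are my assumption from the Kepler docs and may differ between Kepler versions):

# Node total as reported by the platform power meter (metric name assumed; may vary by version)
sum(rate(kepler_node_platform_joules_total[60s]))

# Everything attributed to containers, broken down per namespace
sum by (container_namespace) (rate(kepler_container_joules_total[60s]))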
I think the idle power consumption attributed to the GPU workload comes out a lot lower than it should because the idle power is currently divided by the number of processes rather than by the amount of resources each one uses, which is acknowledged and planned as a future work item (line 179). Since the CPU workload is a microservice benchmark that runs a lot of pods, most of the node's idle power ends up attributed to those pods instead of the GPU workload.
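To illustrate the even split, a per-pod breakdown of the idle component should show roughly the same value for every pod, independent of how many resources each pod actually uses (the pod_name label is an assumption on my part and may be named differently in other Kepler versions):

# Idle power attributed to each pod in the namespace; with the current per-process split these come out roughly equal
sum by (pod_name) (rate(kepler_container_joules_total{container_namespace='test', mode='idle'}[60s]))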
What did you expect to happen?
Kepler reports the power consumption per namespace as accurately as it does for the entire node (which it does very well, props to the devs).
How can we reproduce it (as minimally and precisely as possible)?
Run any kind of workload (preferably a GPU workload, as the effect is more dramatic) and execute the Prometheus queries above.
Anything else we need to know?
No response
Kepler image tag
Kubernetes version
Cloud provider or bare metal
OS version
Install tools
Kepler deployment config
No response
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response