I have successfully installed Scaphandre on my Kubernetes cluster using the provided documentation here. The installation command enables the ServiceMonitor and sets its interval to 2 seconds:

helm install scaphandre helm/scaphandre --set serviceMonitor.enabled=true --set serviceMonitor.interval=2s
Additionally, I have set up Prometheus and adjusted its configuration to a global scraping interval of 2 seconds with a timeout of 1 second.
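Not part of the original setup, but a quick way to confirm that all three Scaphandre targets are actually being scraped at the expected interval is Prometheus's /api/v1/targets endpoint. A minimal Python sketch, assuming Prometheus is reachable at localhost:9090 as in the query script below; the check on the scrape pool name containing 'scaphandre' is a guess based on the Helm release name:

import requests

# Assumes Prometheus is port-forwarded to localhost:9090, as in the query script.
prometheus = 'http://localhost:9090'

resp = requests.get(prometheus + '/api/v1/targets')
targets = resp.json().get('data', {}).get('activeTargets', [])

# Print scrape health for each target whose scrape pool looks like Scaphandre's.
for t in targets:
    if 'scaphandre' in t.get('scrapePool', ''):
        print(t['labels'].get('instance'),
              t.get('health'),
              t.get('lastError') or 'no error')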
My objective is to monitor the energy usage metric of each node, for which I created a Python script executed in a Jupyter Notebook. The script queries Prometheus for the 'scaph_host_energy_microjoules' metric in a loop:
import requests
import time

prometheus = 'http://localhost:9090'
action = 0  # node index: 0, 1, or 2, one per node in the cluster

while True:
    energy_query = 'scaph_host_energy_microjoules'
    response_energy = requests.get(prometheus + '/api/v1/query',
                                   params={'query': energy_query})
    result_energy = response_energy.json().get('data', {}).get('result', [])
    # The IndexError occurs on the next line after some runtime
    energy_usage = float(result_energy[action]['value'][1])
    time.sleep(5)
After running the script for approximately 40 minutes, an 'IndexError: list index out of range' occurs. This suggests that Prometheus is not able to scrape metrics from all three nodes consistently: the Scaphandre pod gathering a node's metrics periodically goes down and restarts, so for brief moments (around a second) only two of the three pods are reporting.
Additional details:
- The cluster is created using Kind with a basic configuration (2 worker nodes and 1 master).
- The action index in the script can be 0, 1, or 2, corresponding to the three nodes in the cluster.
I suspect that the problem might be related to the scrape interval. Your insights and suggestions on resolving this issue would be greatly appreciated. Thank you in advance for your assistance.
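For reference, here is a guarded variant of the loop (my own sketch, not the original script): it keys each sample by its instance label instead of a positional index and skips rounds where a node's series is missing, so a failed scrape no longer raises the IndexError:

import requests
import time

prometheus = 'http://localhost:9090'
energy_query = 'scaph_host_energy_microjoules'

while True:
    response = requests.get(prometheus + '/api/v1/query',
                            params={'query': energy_query})
    result = response.json().get('data', {}).get('result', [])

    # Key each sample by its 'instance' label instead of relying on
    # list position, so a missing node does not shift the indices.
    energy_by_node = {
        sample['metric'].get('instance', 'unknown'): float(sample['value'][1])
        for sample in result
    }

    if len(energy_by_node) < 3:
        # One or more Scaphandre targets returned no sample this round;
        # skip the iteration instead of crashing.
        print('Incomplete result, got nodes:', sorted(energy_by_node))
    else:
        print(energy_by_node)

    time.sleep(5)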
uname -a
Linux my_pc 6.1.0-1029-oem #29-Ubuntu SMP PREEMPT_DYNAMIC Tue Jan 9 21:07:34 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/cpuinfo
model : 186
model name : 13th Gen Intel(R) Core(TM) i5-1335U
I am using a scrape interval of 5 minutes and a scrape timeout of 2 minutes.
What is the reason for setting such a short interval?
The risk with such a short interval is that Scaphandre uses all the CPU if you have a lot of pods.
I'm using machine learning, so I need up-to-date data.
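One possible compromise, sketched below under the assumption that a somewhat longer scrape interval is acceptable: since scaph_host_energy_microjoules is a counter, rate() turns it into average power (microwatts) over a window, so the data stays usable without a 2 s scrape. The 2m window here is an illustrative value, not something configured by the chart:

import requests

prometheus = 'http://localhost:9090'

# scaph_host_energy_microjoules is a counter, so rate() over a window
# gives average power in microwatts; the window should span a few scrapes.
power_query = 'rate(scaph_host_energy_microjoules[2m])'

resp = requests.get(prometheus + '/api/v1/query', params={'query': power_query})
for sample in resp.json().get('data', {}).get('result', []):
    instance = sample['metric'].get('instance', 'unknown')
    microwatts = float(sample['value'][1])
    print(f'{instance}: {microwatts / 1e6:.2f} W')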