In the application-controller logs, I do not see any specific warning or error about this.
The metric values look like this:
# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="172.25.60.1:443",method="GET"} 18783
rest_client_requests_total{code="200",host="172.25.60.1:443",method="PATCH"} 6237
rest_client_requests_total{code="200",host="172.25.60.1:443",method="PUT"} 4
rest_client_requests_total{code="201",host="172.25.60.1:443",method="POST"} 12
rest_client_requests_total{code="404",host="172.25.60.1:443",method="GET"} 36
rest_client_requests_total{code="422",host="172.25.60.1:443",method="POST"} 18
So my question is: how do I find out which REST requests are failing, so that I can then focus on why they fail? Right now, the alert based on this metric does not give enough information to fix the problem.
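As a first step, the counters above can be split by status code and method to see which series could be feeding the alert. Below is a minimal Python sketch, assuming the metrics text was copied from the controller's metrics endpoint and that the alert treats 4xx and 5xx codes as errors (not verified against the linked rule). The helper names `parse_samples` and `non_2xx_breakdown` are hypothetical and not part of Argo CD or client-go.

```python
import re
from collections import defaultdict

# Sample copied from the metrics output quoted in this post.
SAMPLE = """\
# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="172.25.60.1:443",method="GET"} 18783
rest_client_requests_total{code="200",host="172.25.60.1:443",method="PATCH"} 6237
rest_client_requests_total{code="200",host="172.25.60.1:443",method="PUT"} 4
rest_client_requests_total{code="201",host="172.25.60.1:443",method="POST"} 12
rest_client_requests_total{code="404",host="172.25.60.1:443",method="GET"} 36
rest_client_requests_total{code="422",host="172.25.60.1:443",method="POST"} 18
"""

LINE_RE = re.compile(
    r'^rest_client_requests_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (labels, value) pairs for rest_client_requests_total."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip any other metric
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((labels, float(match.group("value"))))
    return samples


def non_2xx_breakdown(samples):
    """Return (total request count, {(code, method): count}) for 4xx/5xx codes."""
    total = sum(value for _, value in samples)
    failing = defaultdict(float)
    for labels, value in samples:
        # Assumption: only 4xx and 5xx responses count as errors.
        if labels.get("code", "")[:1] in ("4", "5"):
            failing[(labels["code"], labels["method"])] += value
    return total, dict(failing)


samples = parse_samples(SAMPLE)
total, failing = non_2xx_breakdown(samples)
error_pct = 100.0 * sum(failing.values()) / total

for (code, method), count in sorted(failing.items()):
    print(f"{method} -> {code}: {int(count)} requests")
print(f"cumulative 4xx/5xx share: {error_pct:.2f}%")
```

On the sample above, the only non-2xx series are GET with 404 (36 requests) and POST with 422 (18 requests). These are cumulative counters since the pod started, so the share computed here will differ from an alert evaluated as a rate over a short window. The metric carries only `code`, `host`, and `method` labels, so it shows which kind of call fails but not which resource or path.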
I tried restarting the application-controller pods, but the error comes back.
I suspect some kind of application configuration issue, because I see this error in only one cluster.
Any insight on how to figure out the cause would be great!
Currently, we are getting alerted that the ArgoCD application controller pod is experiencing some percentage of errors when connecting to the Kubernetes API server.
The alert expression is based on this alert rule:
https://samber.github.io/awesome-prometheus-alerts/rules.html#rule-kubernetes-1-31