Application-controller - debug rest_client_requests_total related alert #19929

dharapvj · 2024-09-13T10:04:21Z

Currently, we are getting alerted that the ArgoCD application-controller pod is seeing some percentage of errors when connecting to the Kubernetes API server.

The alert expression is based on this alert rule:
https://samber.github.io/awesome-prometheus-alerts/rules.html#rule-kubernetes-1-31

In the application-controller logs, I do not see any specific warning or error related to this.

The metric values look like this:

# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="172.25.60.1:443",method="GET"} 18783
rest_client_requests_total{code="200",host="172.25.60.1:443",method="PATCH"} 6237
rest_client_requests_total{code="200",host="172.25.60.1:443",method="PUT"} 4
rest_client_requests_total{code="201",host="172.25.60.1:443",method="POST"} 12
rest_client_requests_total{code="404",host="172.25.60.1:443",method="GET"} 36
rest_client_requests_total{code="422",host="172.25.60.1:443",method="POST"} 18
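
To make the sample above easier to read, here is a minimal sketch (not part of ArgoCD) that breaks the counters down by status code and method. It assumes the alert treats 4xx and 5xx responses as errors, and it works on the cumulative counter values rather than on a rate over a time window as a Prometheus rule would, so the percentage is only indicative. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Metric sample copied from the post above.
SAMPLE = """\
# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="172.25.60.1:443",method="GET"} 18783
rest_client_requests_total{code="200",host="172.25.60.1:443",method="PATCH"} 6237
rest_client_requests_total{code="200",host="172.25.60.1:443",method="PUT"} 4
rest_client_requests_total{code="201",host="172.25.60.1:443",method="POST"} 12
rest_client_requests_total{code="404",host="172.25.60.1:443",method="GET"} 36
rest_client_requests_total{code="422",host="172.25.60.1:443",method="POST"} 18
"""

LINE_RE = re.compile(r'^rest_client_requests_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (labels, value) pairs for rest_client_requests_total."""
    samples = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skips HELP/TYPE comments and unrelated metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((labels, float(match.group("value"))))
    return samples


def error_breakdown(samples):
    """Return (total requests, {(code, method): count}) for 4xx/5xx codes."""
    total = sum(value for _, value in samples)
    errors = defaultdict(float)
    for labels, value in samples:
        # Assumption: any 4xx or 5xx status counts as an error.
        if labels.get("code", "")[:1] in ("4", "5"):
            errors[(labels["code"], labels.get("method", ""))] += value
    return total, dict(errors)


def error_percentage(samples):
    """Share of 4xx/5xx responses among all requests, in percent."""
    total, errors = error_breakdown(samples)
    return 100.0 * sum(errors.values()) / total if total else 0.0


if __name__ == "__main__":
    parsed = parse_samples(SAMPLE)
    total, errors = error_breakdown(parsed)
    for (code, method), count in sorted(errors.items()):
        print(f"{method} {code}: {count:.0f} of {total:.0f} requests")
    print(f"error share: {error_percentage(parsed):.2f}%")
```

In this sample, the only non-2xx entries are GET requests returning 404 and POST requests returning 422. The metric labels do not include the request path, so this narrows things down to a method and status code but not to a specific resource.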

So my question is: how do I find out which REST requests are failing, so that I can then focus on why they fail? Right now, the alert based on this metric does not give enough information to fix the problem.

I tried restarting the application-controller pods, but the error comes back.

I suspect some kind of application configuration issue, because I see this error in only one cluster.

Any insight on how to figure out the cause would be great!
