
Keda RabbitMQ Autoscaler Not Working and K8S API Server Timeout #6268

Closed

oguzhansrky opened this issue Oct 24, 2024 · 3 comments
Labels
bug Something isn't working stale All issues that are marked as stale due to inactivity

Comments

@oguzhansrky

oguzhansrky commented Oct 24, 2024

Report

E1024 14:34:07.470495       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E1024 14:34:07.471060       1 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
E1024 14:34:07.471715       1 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
E1024 14:34:07.472363       1 timeout.go:141] post-timeout activity - time-elapsed: 8.018879ms, GET "/apis/external.metrics.k8s.io/v1beta1" result: <nil>
E1024 14:34:07.472951       1 timeout.go:141] post-timeout activity - time-elapsed: 4.440412ms, GET "/apis/external.metrics.k8s.io/v1beta1" result: <nil>
E1024 14:34:07.473569       1 timeout.go:141] post-timeout activity - time-elapsed: 9.043293ms, GET "/apis/external.metrics.k8s.io/v1beta1" result: <nil>
W1024 14:34:12.113708       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "keda-operator.keda.svc.cluster.local:9666",
  "ServerName": "keda-operator.keda.svc.cluster.local:9666",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.43.71.178:9666: connect: connection refused"
E1024 14:34:18.480827       1 wrap.go:53] timeout or abort while handling: method=GET URI="/apis/external.metrics.k8s.io/v1beta1" audit-ID="9a09bcaf-3a0e-41fc-8642-aadb2298f2c3"
E1024 14:34:18.480968       1 writers.go:118] apiserver was unable to write a JSON response: http: Handler timeout
E1024 14:34:18.482389       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout

Expected Behavior

.

Actual Behavior

.

Steps to Reproduce the Problem

Logs from KEDA operator

2024-10-24T14:39:51Z	ERROR	scalers_cache	error getting scale decision	{"scaledobject.Name": "bug", "scaledObject.Namespace": "bug", "scaleTarget.Name": "bug", "error": "error inspecting rabbitMQ: Exception (404) Reason: \"NOT_FOUND - no queue 'bug' in vhost '/'\""}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetScaledObjectState
	/workspace/pkg/scaling/cache/scalers_cache.go:155
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:360
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:162
2024-10-24T14:39:51Z	ERROR	scalers_cache	error getting scale decision	{"scaledobject.Name": "bug", "scaledObject.Namespace": "bug", "scaleTarget.Name": "bug", "error": "error inspecting rabbitMQ: Exception (404) Reason: \"NOT_FOUND - no queue 'bug' in vhost '/'\""}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetScaledObjectState
	/workspace/pkg/scaling/cache/scalers_cache.go:155
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:360
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:162

KEDA Version

2.11.2

Kubernetes Version

1.28

Platform

Other

Scaler Details

RabbitMQ

Anything else?

I have been using KEDA for a long time. Recently, the system has started to fail to respond. There are nearly 3,000 ScaledObjects in the system, so maybe that is the cause, but I cannot find where the timeout occurs and I cannot intervene in KEDA.

@oguzhansrky oguzhansrky added the bug Something isn't working label Oct 24, 2024
@JorTurFer
Member

Hello,
There have been a lot of performance improvements since KEDA v2.11.2, so I'd suggest upgrading to a recent version; the issues in the metrics server may then disappear.

About the RabbitMQ issue: does the queue exist in the vhost? The error reports a 404 when looking up the queue.
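One quick way to verify the queue exists is the RabbitMQ management HTTP API, where the default vhost `/` must be URL-encoded as `%2F` in the path. A minimal sketch, assuming a hypothetical host `rabbitmq.example.local` and default `guest` credentials:

```shell
# Build the management-API URL for the queue; the vhost "/" becomes %2F.
VHOST="/"
QUEUE="bug"
ENCODED_VHOST=$(printf '%s' "$VHOST" | sed 's|/|%2F|g')
URL="http://rabbitmq.example.local:15672/api/queues/${ENCODED_VHOST}/${QUEUE}"
echo "$URL"

# Against a live broker (management plugin enabled, valid credentials),
# a 404 here confirms the queue really is missing:
# curl -sf -u guest:guest "$URL" || echo "queue not found (HTTP 404)"
```

If the queue is created lazily by a consumer, the scaler will log this 404 until the first consumer declares it.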

About the load (~3k ScaledObjects): that's quite normal, and KEDA should be able to handle it once the kube-client parameters are configured to allow more requests through its local rate limiter -> https://keda.sh/docs/2.15/operate/cluster/#kubernetes-client-parameters
The improvements I mentioned above also include changes in how KEDA handles the status of the ScaledObject, reducing the requests to the Kubernetes API server.
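For reference, the kube-client parameters from the linked docs are set as flags on the operator container. A sketch of a Deployment fragment, with illustrative values (the exact defaults and supported flags are per the documentation above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator
  namespace: keda
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          args:
            # Raise the client-side rate limiter for the Kubernetes API client.
            # Values below are illustrative, not recommendations.
            - --kube-api-qps=50
            - --kube-api-burst=100
```

With ~3k ScaledObjects, raising QPS/burst reduces client-side throttling, which can otherwise surface as the handler timeouts seen in the metrics-server logs.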


stale bot commented Jan 3, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Jan 3, 2025

stale bot commented Jan 10, 2025

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Jan 10, 2025
@github-project-automation github-project-automation bot moved this from To Triage to Ready To Ship in Roadmap - KEDA Core Jan 10, 2025