Significant memory spike at startup when cluster has many resources #3919

Open
jgoldschrafe opened this issue Oct 29, 2024 · 0 comments
Describe the bug
At startup, aws-load-balancer-controller experiences a significant spike in memory usage: utilization is briefly much higher than the normal runtime level, then returns to normal so quickly that cadvisor probably won't even capture it. (In my prod environment, the spike is 8x.) Memory requests and limits must be sized to accommodate this spike, resulting in significant waste.

Memory usage seemingly improved by some double-digit percentage between 2.8.0 and 2.9.2, presumably due to the Go SDK v2 upgrade, but is still an order of magnitude removed from expected behavior.

I suspect this is related to deserializing a large apiserver response; the behavior can be triggered by creating a large number of resources of certain types.
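To get a feel for how large such a list response could be, here is a rough back-of-the-envelope sketch. It is not taken from the controller's code; the function name and the certificate and key sizes passed to it are assumptions for illustration. Real objects also carry extra metadata (for example managedFields), so this is a lower bound on the wire size and says nothing about how many copies the decoder holds in memory.

```python
import base64
import json


def estimate_list_payload_bytes(count: int, cert_len: int, key_len: int) -> int:
    """Approximate JSON size of a list of `count` kubernetes.io/tls Secrets.

    cert_len and key_len are the assumed PEM sizes in bytes (hypothetical
    inputs, not measured values from the issue).
    """
    item = {
        "apiVersion": "v1",
        "kind": "Secret",
        # Placeholder name and namespace, used only to size the metadata.
        "metadata": {"name": "repro-tls-0000", "namespace": "kube-system"},
        "type": "kubernetes.io/tls",
        "data": {
            # Secret data is base64-encoded, which inflates it by about 4/3.
            "tls.crt": base64.b64encode(b"x" * cert_len).decode("ascii"),
            "tls.key": base64.b64encode(b"x" * key_len).decode("ascii"),
        },
    }
    return count * len(json.dumps(item))
```

For example, `estimate_list_payload_bytes(6000, 1200, 1700)` gives the approximate size for 6,000 secrets under the assumption of a 1,200-byte certificate and a 1,700-byte key.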

Steps to reproduce

  • Create 6,000 kubernetes.io/tls secrets populated with material (can be the same key) in the aws-load-balancer-controller namespace
  • Start a controller pod with a memory limit that, based on historical usage, should be comfortably sufficient (512 MiB is reasonable)
  • Observe the pod crashing into Error state before cadvisor can even log the increase in memory usage
  • Remove the pod memory limit
  • Observe the pod starting successfully
  • Remove the secret resources
  • Set a much lower memory limit (~256 MiB)
  • Observe the pod starting successfully
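The first step above can be scripted. The following is a minimal sketch, assuming you already have a PEM certificate and key on disk (the same pair is reused for every secret, as the step allows). The function name `build_tls_secret_list` and the `repro-tls` name prefix are hypothetical; the namespace is whatever namespace the controller runs in.

```python
import base64
import json


def build_tls_secret_list(count, cert_pem, key_pem, namespace, prefix="repro-tls"):
    """Build a v1 List of `count` kubernetes.io/tls Secret manifests.

    cert_pem and key_pem are bytes; every secret reuses the same material.
    """
    crt = base64.b64encode(cert_pem).decode("ascii")
    key = base64.b64encode(key_pem).decode("ascii")
    items = [
        {
            "apiVersion": "v1",
            "kind": "Secret",
            # Zero-padded index keeps names unique and DNS-compatible.
            "metadata": {"name": f"{prefix}-{i:04d}", "namespace": namespace},
            "type": "kubernetes.io/tls",
            "data": {"tls.crt": crt, "tls.key": key},
        }
        for i in range(count)
    ]
    return {"apiVersion": "v1", "kind": "List", "items": items}
```

To use it, write `json.dumps(build_tls_secret_list(6000, cert, key, namespace))` to a file and pass that file to `kubectl create -f`. Deleting the secrets by the same name prefix afterwards covers the cleanup step.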

Expected outcome
Following a leader election, aws-load-balancer-controller uses a reasonably consistent amount of memory throughout the pod's lifecycle.

Environment
AWS Load Balancer Controller 2.8.0-2.9.2
Kubernetes 1.28-1.31
EKS eks.6

Additional Context:
