Significant memory spike at startup when cluster has many resources #3919

jgoldschrafe · 2024-10-29T18:45:18Z

Describe the bug
At startup, aws-load-balancer-controller experiences a significant spike in memory usage: utilization briefly rises substantially above normal runtime usage, then returns to normal so quickly that cadvisor probably won't even catch it. (In my prod environment, the spike is 8x.) Memory requests and limits must be sized to accommodate this spike, resulting in significant waste.

Memory usage seemingly improved by some double-digit percentage between 2.8.0 and 2.9.2, presumably due to the Go SDK v2 upgrade, but is still an order of magnitude removed from expected behavior.

I suspect this is related to deserializing a large apiserver response; the behavior can be triggered by creating a large number of resources of certain types.

Steps to reproduce

1. Create 6,000 kubernetes.io/tls secrets populated with material (can be the same key) in the aws-load-balancer-controller namespace
2. Start a controller pod with a memory limit that should comfortably cover historical usage (512 MiB is reasonable)
3. Observe the pod crashing into Error state before cadvisor can even log the increase in memory usage
4. Remove the pod memory limit
5. Observe the pod starting successfully
6. Remove the secret resources
7. Set a much lower memory limit (~256 MiB)
8. Observe the pod starting successfully
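
For the first reproduction step, a minimal sketch of one way to generate the secrets, assuming you already have a certificate and key on disk. The function name `build_secret_list`, the `repro-tls-` name prefix, and the `repro=issue-3919` label are made up for illustration; the namespace is whatever namespace your controller runs in.

```python
import base64
import json
import sys


def build_secret_list(count, namespace, crt_pem, key_pem, prefix="repro-tls-"):
    """Build a v1 List of kubernetes.io/tls Secrets that all share one cert/key pair.

    The prefix and the "repro" label are hypothetical names chosen for this sketch;
    the label only exists to make cleanup easy later.
    """
    # Secret "data" values must be base64-encoded strings.
    crt_b64 = base64.b64encode(crt_pem).decode("ascii")
    key_b64 = base64.b64encode(key_pem).decode("ascii")
    items = [
        {
            "apiVersion": "v1",
            "kind": "Secret",
            "type": "kubernetes.io/tls",
            "metadata": {
                "name": f"{prefix}{i:04d}",
                "namespace": namespace,
                "labels": {"repro": "issue-3919"},
            },
            "data": {"tls.crt": crt_b64, "tls.key": key_b64},
        }
        for i in range(count)
    ]
    return {"apiVersion": "v1", "kind": "List", "items": items}


# Usage (assumed file names): python gen_secrets.py <namespace> tls.crt tls.key > secrets.json
if __name__ == "__main__" and len(sys.argv) == 4:
    namespace, crt_path, key_path = sys.argv[1:4]
    with open(crt_path, "rb") as crt_file:
        crt = crt_file.read()
    with open(key_path, "rb") as key_file:
        key = key_file.read()
    json.dump(build_secret_list(6000, namespace, crt, key), sys.stdout)
```

The resulting file can be loaded with `kubectl create -f secrets.json`, and the secrets removed again with `kubectl delete secret -n <namespace> -l repro=issue-3919`.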

Expected outcome
Following a leader election, aws-load-balancer-controller uses a reasonably consistent amount of memory throughout the pod's lifecycle.

Environment
AWS Load Balancer Controller 2.8.0-2.9.2
Kubernetes 1.28-1.31
EKS eks.6

Additional Context:

shraddhabang added the triage/needs-investigation label Oct 30, 2024
