Work on API memory exhaustion / stability issues #1471

Open
3 tasks done
sjahl opened this issue Apr 3, 2024 · 0 comments


sjahl commented Apr 3, 2024

I've been working on this in the background for a while; I just wanted an issue on the board to track it.

What we've been seeing, and trying to mitigate:

  • Pods had a JS heap limit that was larger than the container memory limit, so pods were getting OOMKilled before they did any garbage collection
  • Setting a 3GB heap, below the container limit, resulted in heap allocation failures because that heap wasn't large enough. It also drove CPU usage very high, since we were constantly garbage collecting
  • Setting a 7GB heap resulted in fewer heap allocation failures and improved CPU utilization, but we were still getting the occasional container kill on very large short-term heap allocations
  • Setting a 10GB heap seems to fix most of the heap allocation failures, except for some outliers.
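
For reference, the relationship between the heap limit and the container limit described above can be sanity-checked from inside the process. This is only a rough sketch: `heapHeadroomBytes`, `heapFitsContainer`, and the 20% headroom figure are hypothetical and not something we run today, and the container limit is assumed to be passed in by the caller. `v8.getHeapStatistics().heap_size_limit` is the real Node.js call, and it reflects `--max-old-space-size` when that flag is set.

```javascript
const v8 = require('node:v8');

const GIB = 1024 * 1024 * 1024;

// Hypothetical helper: memory left in the container for everything that
// is not JS heap (Buffers, native allocations, thread stacks, etc.).
function heapHeadroomBytes(heapLimitBytes, containerLimitBytes) {
  return containerLimitBytes - heapLimitBytes;
}

// Hypothetical policy: the heap limit "fits" if at least
// minHeadroomFraction of the container limit stays free for non-heap use.
// The 0.2 default is an illustrative assumption, not a measured value.
function heapFitsContainer(heapLimitBytes, containerLimitBytes, minHeadroomFraction = 0.2) {
  const headroom = heapHeadroomBytes(heapLimitBytes, containerLimitBytes);
  return headroom >= containerLimitBytes * minHeadroomFraction;
}

// The heap limit V8 is actually running with, in bytes.
const currentHeapLimitBytes = v8.getHeapStatistics().heap_size_limit;

// Illustrative values only: a heap limit above the container limit is
// the first failure mode in the list (OOMKilled before any GC happens).
console.log(heapFitsContainer(8 * GIB, 6 * GIB));
console.log(heapFitsContainer(3 * GIB, 4 * GIB));
console.log(`current heap limit: ${(currentHeapLimitBytes / GIB).toFixed(2)} GiB`);
```

A check like this only covers the first bullet. It says nothing about whether the heap is large enough for the workload, which is what the 3GB and 7GB attempts ran into.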

We're still tracking down some of our other pod restarts. Not all of them seem to be memory related, and we still get roughly two pod restarts per day with the latest resource limit increase.

Instability patterns that we're seeing regularly:

  • Pods fail their health check, which removes them from the load balancing pool. All requests then go to the other API server, which overwhelms it and causes it to become unhealthy as well. The two swap back and forth throughout the day
  • API pods crash with a RangeError stack trace. The common cause appears to be attempting to serialize very large JSON objects into strings (https://the-tgg.slack.com/archives/C03P7FA3W3T/p1710772444516509)
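
One possible direction for the RangeError case, as a sketch rather than a fix we've settled on: V8 throws `RangeError: Invalid string length` when a single string would exceed its maximum length, so serializing a large array element by element avoids ever building the whole payload as one string. `jsonArrayChunks` and `sendJsonArray` are hypothetical names, and this assumes the large payload is a top-level array whose individual elements are each small enough to stringify on their own.

```javascript
const { Readable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Hypothetical helper: yields the JSON text of an array in small pieces,
// so the full payload never has to exist as a single string.
function* jsonArrayChunks(items) {
  yield '[';
  let first = true;
  for (const item of items) {
    if (!first) yield ',';
    first = false;
    // JSON.stringify returns undefined for values such as undefined or
    // functions; inside an array those are encoded as null.
    yield JSON.stringify(item) ?? 'null';
  }
  yield ']';
}

// Hypothetical helper: streams the chunks into any writable stream
// (for example an HTTP response) and respects backpressure.
async function sendJsonArray(items, writable) {
  await pipeline(Readable.from(jsonArrayChunks(items)), writable);
}

// For small inputs the joined chunks match JSON.stringify exactly.
const sample = [{ id: 1 }, 'two', undefined, [3, 4]];
console.log([...jsonArrayChunks(sample)].join('') === JSON.stringify(sample));
```

This only helps when the oversized value is an array at the top level. A single huge object or a deeply nested structure would need a streaming serializer, or the endpoint would need pagination.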