k8s: check and possibly optimise the launch of pending pods #681
More information about situations of this kind. Here is a node which appears memory-busy:

```console
$ kubectl top nodes | grep 'node-3 '
mycluster-node-3   4521m   56%   10261Mi   73%
```

and it is running four user jobs:

```console
$ kgp | grep 'node-3 '
reana-run-job-232b1162-c04e-46d6-9018-843ee84a072c-g5mzz   2/2   Running   0   14h    10.0.0.0   mycluster-node-3   <none>   <none>
reana-run-job-6a86b3ae-c0a1-4b14-ae53-e8b8fb15d97a-5thvb   2/2   Running   0   4h6m   10.0.0.0   mycluster-node-3   <none>   <none>
reana-run-job-a3137e4c-48c7-4d3a-a073-335b18761231-bjpns   2/2   Running   0   4m5s   10.0.0.0   mycluster-node-3   <none>   <none>
reana-run-job-ee000c25-32f1-4375-900c-e4ae8a9fb30c-wrmxt   2/2   Running   0   11m    10.0.0.0   mycluster-node-3   <none>   <none>
```

Each job requests 3 Gi of memory, i.e. about 12 Gi in total, but the jobs actually use much less, only about 5 Gi:
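One way to compare what the scheduler has reserved on such a node with what the job pods actually consume is sketched below; these are standard kubectl commands, with the node name taken from the output above:

```console
$ # memory the scheduler has reserved on the node (sums of pod requests/limits):
$ kubectl describe node mycluster-node-3 | grep -A 8 'Allocated resources:'
$ # memory the job pods scheduled on that node are actually using:
$ kubectl get pods --field-selector spec.nodeName=mycluster-node-3 \
    --no-headers -o custom-columns=NAME:.metadata.name |
    xargs -n 1 kubectl top pod --no-headers
```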
This is because the jobs by default request 3 Gi if the user does not specify any other value. If, instead of silently adding a 3 Gi memory requirement to each job, we let the jobs consume as much memory as they wish, and ran a parallel "memory watcher" daemonset on the nodes that would monitor and kill any user job pods that start to consume a lot of memory, we would be able to pack twice as many jobs onto the nodes as we do now, in this very example. (And if a user does ask for, say, 4 Gi, we would simply respect that; this would only change the default behaviour when a user does not ask for any specific memory limit.)
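For illustration, here is a minimal sketch of what such a per-node memory watcher loop could look like, assuming it runs with kubectl access to the cluster; the 10 Gi per-pod threshold, the 60-second interval and the pod-name-prefix matching are assumptions made for the example, not values decided in this issue:

```sh
#!/bin/sh
# Hypothetical per-node memory watcher (sketch only).
# NODE_NAME would normally be injected via the downward API in a daemonset.
NODE_NAME="${NODE_NAME:-mycluster-node-3}"
THRESHOLD_MI=10240   # assumed per-pod memory ceiling (10 Gi), in Mi

while true; do
  # user job pods currently running on this node
  kubectl get pods --no-headers \
    --field-selector spec.nodeName="${NODE_NAME}",status.phase=Running \
    -o custom-columns=NAME:.metadata.name |
  grep '^reana-run-job-' |
  while read -r pod; do
    # kubectl top reports pod memory in Mi, e.g. "1234Mi"
    mem=$(kubectl top pod "${pod}" --no-headers | awk '{print $3}')
    mem_mi="${mem%Mi}"
    if [ "${mem_mi:-0}" -gt "${THRESHOLD_MI}" ] 2>/dev/null; then
      echo "deleting ${pod}: working set ${mem_mi}Mi exceeds threshold ${THRESHOLD_MI}Mi"
      kubectl delete pod "${pod}" --grace-period=30
    fi
  done
  sleep 60
done
```

Whether to delete such pods outright or only to report them would of course be a policy choice; the point is that jobs would no longer need to reserve 3 Gi up front.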
Sorry to jump in! Average memory and (max)PeakRSS metric monitoring might help to define min
There was a situation in a cluster running many concurrent workflows, which generated many jobs, where many jobs were pending because the cluster did not have enough memory resources to run them all.
For example, here is one snapshot in time:
This means that only 60% of the jobs could be running; the remaining 40% were pending, some of them for many hours.
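A quick way to take such a snapshot of running versus pending job pods (the `reana-run-job-` name prefix matches the job pods shown elsewhere in this issue):

```console
$ kubectl get pods --no-headers | grep '^reana-run-job-' | awk '{print $3}' | sort | uniq -c
```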
Some nodes were really busy, for example:
However, other nodes were less so, for example:
It seems that our pending pods aren't scheduled as rapidly as they in theory could be (e.g. node-33 and node-35 above had free capacity).
Here is one such Pending pod described:
Let's verify our Kubernetes cluster settings related to the behaviour of Pending pods, and let's see whether we could make the memory checks and the scheduling of these pending pods faster.
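As a starting point for that check, we can look at why the scheduler is currently rejecting the pending pods; these are standard kubectl commands and change no settings:

```console
$ # list only the pending pods:
$ kubectl get pods --field-selector status.phase=Pending
$ # inspect the Events section of one of them for FailedScheduling messages:
$ kubectl describe pod <pending-pod-name>
$ # or query the FailedScheduling events for the whole namespace:
$ kubectl get events --field-selector reason=FailedScheduling --sort-by=.lastTimestamp
```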