update the capacity to zero on shutdown/reset #502
b542bb8
to
ee1e0e9
Compare
@@ -309,6 +322,14 @@ func (rs *resourceServer) restart() error {
// Send terminate signal to ListAndWatch()
rs.termSignal <- true

// wait for the terminated signal or 5 second
In resourceServer.Stop() we do the termSignal-terminatedSignal handshake before stopping the grpcServer. In restart(), we do it after rs.grpcServer.Stop(). Is it intentional?
It's not strictly related to this PR, but with these new changes I think it can raise an error on L189, stream.Send(resp).
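The ordering concern above can be illustrated with a small sketch. It is not the code in this PR: the `watcher` type, `newWatcher`, `shutdown`, and `waitForWatcher` are hypothetical names, and the `stopServer` callback only stands in for `rs.grpcServer.Stop()`. The point it tries to show is that the stop/finished handshake completes (or times out) before the server is stopped, so the watcher is not still sending on a stream that has already gone away.

```go
package main

import (
	"fmt"
	"time"
)

// waitForWatcher mirrors the 5 second wait mentioned in the diff comment.
const waitForWatcher = 5 * time.Second

// watcher is a hypothetical stand-in for the ListAndWatch goroutine.
type watcher struct {
	stop     chan bool // plays the role of termSignal
	finished chan bool // plays the role of terminatedSignal
	events   []string  // records the order of operations, for illustration only
}

// newWatcher uses buffered channels so neither side blocks forever
// if the other side has already given up.
func newWatcher() *watcher {
	return &watcher{stop: make(chan bool, 1), finished: make(chan bool, 1)}
}

// run blocks until a stop request arrives, then acknowledges it.
func (w *watcher) run() {
	<-w.stop
	w.events = append(w.events, "stream drained")
	w.finished <- true
}

// shutdown requests a stop, waits for the acknowledgement or the timeout,
// and only then calls stopServer. It reports whether the watcher acknowledged.
func (w *watcher) shutdown(timeout time.Duration, stopServer func()) bool {
	w.stop <- true
	acked := false
	select {
	case <-w.finished:
		acked = true
	case <-time.After(timeout):
	}
	stopServer()
	return acked
}

func main() {
	w := newWatcher()
	go w.run()
	acked := w.shutdown(waitForWatcher, func() {
		w.events = append(w.events, "grpc server stopped")
	})
	fmt.Println(acked, w.events)
}
```

Doing the handshake first means a timeout is the only path on which the server can be stopped while the watcher is still active.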
you are right! sorry
ee1e0e9
to
0e7b699
Compare
@zeeke please give it another look :)
Hmm, I'm not sure that's what kubelet expects (suddenly being told there are no devices when the device plugin shuts down). Will it remove entries from the checkpoint file?
pkg/resources/server.go
Outdated
@@ -41,6 +41,7 @@ type resourceServer struct {
resourceNamePrefix string
grpcServer *grpc.Server
termSignal chan bool
terminatedSignal chan bool
nit: termSignal and terminatedSignal only relate to the ListAndWatch function. They are not about terminating the resourceServer struct itself or checking whether it has terminated.
Renaming those two channels to listAndWatchStopSignal and listAndWatchFinishedSignal (or something similar) might improve the readability of this file a little.
From my tests, it doesn't remove the checkpoint file and running pods continue to run (I will try to leave it down for a longer period to be sure there is no reconcile or something else that will kill the running pods). Without this change, we have an issue: when we take down the device plugin pod and a pod is allocated to that node, that pod will not be able to start.
When the device plugin is restarted, kubelet marks the resource as unhealthy, but still reports the resource as existing for a grace period (5 mins). If a pod is scheduled before the device plugin comes up, the pod creation fails without a retry loop, with the error message "Pod was rejected: Allocate failed due to no healthy devices present; cannot allocate unhealthy devices <DEVICE_NAME>", which is unexpected.
This commit allows the device plugin to send an empty list of devices before the reset or shutdown.
Signed-off-by: Sebastian Sch <[email protected]>
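A minimal sketch of the idea in the commit message, under stated assumptions: `Device`, `Response`, and `listAndWatch` here are simplified, hypothetical stand-ins, not the device plugin API types or the actual code in pkg/resources/server.go. The `send` callback plays the role of `stream.Send(resp)`. On a stop request the loop sends one last response with an empty device list and then returns.

```go
package main

import "fmt"

// Device and Response are hypothetical, trimmed-down stand-ins for the
// messages a device plugin streams to kubelet.
type Device struct {
	ID      string
	Healthy bool
}

type Response struct {
	Devices []Device
}

// listAndWatch sends the initial device list, resends on every update, and
// on a stop request sends a final empty list before returning.
func listAndWatch(devices []Device, updates <-chan []Device, stop <-chan bool, send func(Response) error) error {
	if err := send(Response{Devices: devices}); err != nil {
		return err
	}
	for {
		select {
		case devs := <-updates:
			if err := send(Response{Devices: devs}); err != nil {
				return err
			}
		case <-stop:
			// Advertise zero devices so the reported capacity drops to zero
			// instead of leaving the devices listed as unhealthy.
			return send(Response{Devices: []Device{}})
		}
	}
}

func main() {
	var sent []Response
	send := func(r Response) error {
		sent = append(sent, r)
		return nil
	}

	// Request a stop up front; a nil updates channel never fires.
	stop := make(chan bool, 1)
	stop <- true

	err := listAndWatch([]Device{{ID: "dev0", Healthy: true}}, nil, stop, send)
	fmt.Println(err, len(sent), len(sent[len(sent)-1].Devices))
}
```

Returning the result of the final `send` surfaces a failed last update to the caller, which matters if the stream has already been closed by the time the stop request is handled.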
0e7b699
to
c1a6852
Compare
related to openshift/sriov-network-operator#812
@SchSeba I've been digging a bit into the kubelet code.
kubelet reports the updated node status to the API server every 10 seconds[1] (every NodeStatusUpdateFrequency[2]). Once the device plugin exits (its endpoint is no longer valid), all devices are deemed unhealthy[3].
[1] https://github.com/kubernetes/kubernetes/blob/d61cbac69aae97db1839bd2e0e86d68f26b353a7/pkg/kubelet/kubelet.go#L1637
So, at most 10 seconds after the plugin has exited, the node will report zero allocatable resources.
What I suggest is that, in sriov-network-operator, after we remove the device plugin we spin up a goroutine that keeps deleting pods in admission error if they consume resources from the device plugin, until the new device plugin is up. Alternatively (an even better option, IMO), once the device plugin is up again, clean up any pods in admission error that consume device plugin resources, once. LMK what you think.
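The selection step of the cleanup suggested above could look roughly like the sketch below. Everything in it is an assumption made for illustration: `Pod` is a hypothetical trimmed-down struct rather than the Kubernetes API type, `podsToCleanUp` is not an existing operator function, the `example.com/` prefix is made up, and the `UnexpectedAdmissionError` reason string should be checked against what kubelet actually sets on rejected pods. A real implementation would list and delete pods through the Kubernetes client.

```go
package main

import (
	"fmt"
	"strings"
)

// admissionErrorReason is an assumption about the status reason kubelet
// sets on pods rejected at admission; verify it before relying on it.
const admissionErrorReason = "UnexpectedAdmissionError"

// Pod is a hypothetical, trimmed-down view of a pod: just the fields the
// filter needs.
type Pod struct {
	Name      string
	Phase     string
	Reason    string
	Resources []string // resource names requested by the pod's containers
}

// podsToCleanUp returns the names of failed pods that were rejected at
// admission and request at least one resource with the given prefix.
func podsToCleanUp(pods []Pod, resourcePrefix string) []string {
	var names []string
	for _, p := range pods {
		if p.Phase != "Failed" || p.Reason != admissionErrorReason {
			continue
		}
		for _, r := range p.Resources {
			if strings.HasPrefix(r, resourcePrefix) {
				names = append(names, p.Name)
				break
			}
		}
	}
	return names
}

func main() {
	pods := []Pod{
		{Name: "a", Phase: "Failed", Reason: admissionErrorReason, Resources: []string{"example.com/sriov_net"}},
		{Name: "b", Phase: "Running", Resources: []string{"example.com/sriov_net"}},
		{Name: "c", Phase: "Failed", Reason: admissionErrorReason, Resources: []string{"cpu"}},
	}
	fmt.Println(podsToCleanUp(pods, "example.com/"))
}
```

Filtering on both the rejection reason and the resource prefix keeps the cleanup from touching pods that failed for unrelated reasons.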
Interesting, I will try to implement a POC in the operator. But before that, a question: do you see any issue with changing the number to 0 when we reboot? Can that affect something?
IIUC, if the device plugin exits and leaves the resources unhealthy (for a while, I guess), any deployed Pod will get an admission error. If the Pod comes from a Deployment or a ReplicaSet, the kube controller spawns a lot of subsequent Pods, creating a little junk:
Setting the device count to 0 will prevent the scheduler from selecting the node, making it retry the same Pod in an exponential backoff fashion.
Pull Request Test Coverage Report for Build 6377485447
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls