feat: drain and volume detachment status conditions #1876
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: jmdeal
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from 4bb4d97 to fb3ac47

Pull Request Test Coverage Report for Build 12284394559
💛 - Coveralls
/assign @engedaam
if cloudprovider.IsNodeClaimNotFoundError(err) {
return reconcile.Result{}, c.removeFinalizer(ctx, node)
stored := nodeClaim.DeepCopy()
if modified := nodeClaim.StatusConditions().SetFalse(v1.ConditionTypeDrained, "Draining", "Draining"); modified { |
Do we want both Reason and Message to be Draining? Any extra details we can add here?
Scoping this to the drain error handling block also means that we're not going to be adding this status condition if the node was empty in the first place. From a functionality perspective this is fine, but also makes it a bit confusing to trace the steps in history later. Thoughts on this?
Do we want both Reason and Message to be Draining? Any extra details we can add here?
It would be nice, but would result in a lot of additional writes to the resource. That's why I opted to leave additional information on the event where it can be appropriately deduped.
Scoping this to the drain error handling block also means that we're not going to be adding this status condition if the node was empty in the first place.
Yeah, this is intentional. If there were no drainable pods on the Node in the first place, it wouldn't make sense to transition the status condition to false. We should transition from unknown -> true.
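For illustration, that intent could be factored into a helper along these lines (a sketch, not the PR's code; the helper name and the drained flag are hypothetical, while the condition type and the SetTrue/SetFalse plus optimistic-lock patch pattern come from the diff):

// setDrainedCondition (hypothetical): Drained stays Unknown until drain starts,
// goes False only while drainable pods remain, and moves to True once the node
// has no drainable pods, including the case where it was empty from the start.
func (c *Controller) setDrainedCondition(ctx context.Context, nodeClaim *v1.NodeClaim, drained bool) error {
	stored := nodeClaim.DeepCopy()
	var modified bool
	if drained {
		modified = nodeClaim.StatusConditions().SetTrue(v1.ConditionTypeDrained)
	} else {
		modified = nodeClaim.StatusConditions().SetFalse(v1.ConditionTypeDrained, "Draining", "Draining")
	}
	if !modified {
		return nil
	}
	// Optimistic locking matches the status patches elsewhere in this diff.
	return c.kubeClient.Status().Patch(ctx, nodeClaim, client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{}))
}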
It would be nice, but would result in a lot of additional writes to the resource. That's why I opted to leave additional information on the event where it can be appropriately deduped.
Maybe we can do this as a follow-up, but it'd be interesting to have our reason here call back to which group of pods we're currently draining (e.g. non-critical daemon, critical daemon, non-critical non-daemon, critical non-daemon).
I agree with your second point.
Discussed some additional follow-ups: we could set the reason based on the group of pods currently being evicted (e.g. critical, system-critical, etc.). This will result in up to 4 additional writes per Node. We're going to decouple for now, but I'll open an issue to track this as an additional feature once this has merged.
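If that follow-up lands, the reason could be derived from the eviction group, roughly like this (a sketch; the group names and function are illustrative, not from this PR):

// drainReasonForGroup (illustrative): map the group of pods currently being
// evicted to a condition reason, giving at most four extra status writes per Node.
func drainReasonForGroup(group string) string {
	switch group {
	case "non-critical-non-daemon":
		return "DrainingPods"
	case "non-critical-daemon":
		return "DrainingDaemonPods"
	case "critical-non-daemon":
		return "DrainingCriticalPods"
	case "critical-daemon":
		return "DrainingCriticalDaemonPods"
	default:
		return "Draining"
	}
}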
if err := c.kubeClient.Delete(ctx, nodeClaim); err != nil {
return reconcile.Result{}, client.IgnoreNotFound(fmt.Errorf("deleting nodeclaim, %w", err)) |
Wouldn't we want to client.IgnoreNotFound in the error handling block so that we continue to the rest of the controller?
If the nodeClaim isn't found, we shouldn't be able to proceed with the rest of the loop anyway. This is just a short-circuit. Same answer for anywhere else we short circuit on NotFound.
Discussed offline, we'll return and requeue if the NodeClaim isn't found. An appropriate error message will then be printed by the initial get call, and we'll return without requeue.
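In the delete path above, that agreement would look roughly like this (a sketch of the intent, not the merged diff):

// Sketch: short-circuit with a requeue when the NodeClaim is already gone; the
// next reconcile's initial get logs the NotFound and returns without requeueing.
if err := c.kubeClient.Delete(ctx, nodeClaim); err != nil {
	if errors.IsNotFound(err) {
		return reconcile.Result{Requeue: true}, nil
	}
	return reconcile.Result{}, fmt.Errorf("deleting nodeclaim, %w", err)
}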
if err := c.kubeClient.Status().Patch(ctx, nodeClaim, client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{})); err != nil {
if errors.IsConflict(err) {
return reconcile.Result{Requeue: true}, nil
}
return reconcile.Result{}, fmt.Errorf("getting nodeclaim, %w", err)
return reconcile.Result{}, client.IgnoreNotFound(err) |
Shouldn't we be doing ignore not found in this block and just continuing if it doesn't exist?
NodesDrainedTotal.Inc(map[string]string{
metrics.NodePoolLabel: node.Labels[v1.NodePoolLabelKey],
}) |
Just an observation that we still emit this metric even if we didn't do any draining.
I think this is what we want. I'm considering "drained" the end state, not the process. It would also be confusing / concerning to me as an operator if total nodes drained was less than the total nodes terminated, since that would indicate to me that Karpenter is terminating nodes without draining them.
We do need to check though that it drained successfully, and we haven't passed over the drain block due to TGP expiration. If that's what you were calling out, you're right and I'll address that.
Discussed offline, this isn't actually an issue with drain, but it is an issue with VolumesDetached. We shouldn't set that condition to true if we proceeded due to TGP expiration.
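A sketch of that guard, where volumesDetached and tgpExpired are hypothetical stand-ins for however the controller tracks detachment completion and terminationGracePeriod expiration:

// Sketch: only record VolumesDetached=True when detachment actually completed;
// if we fell through because the terminationGracePeriod expired, the condition
// is left as-is so status history doesn't claim the volumes detached cleanly.
if volumesDetached && !tgpExpired {
	stored := nodeClaim.DeepCopy()
	if modified := nodeClaim.StatusConditions().SetTrue(v1.ConditionTypeVolumesDetached); modified {
		if err := c.kubeClient.Status().Patch(ctx, nodeClaim, client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{})); err != nil {
			return reconcile.Result{}, client.IgnoreNotFound(err)
		}
	}
}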
// getting the NodeClaim again. This prevents conflict errors on subsequent writes.
// USE CAUTION when determining whether to increase this timeout or remove this line
time.Sleep(time.Second)
nodeClaim, err = nodeutils.NodeClaimForNode(ctx, c.kubeClient, node) |
Do you think we can just return here and requeue the controller?
The main reason I didn't do that was the additional testability burden. This would increase the number of reconciliations required for the termination controller. Requiring multiple reconciliations for instance termination can already be hard enough to reason about; I'd really rather not increase this further.
Long-term I'm still tracking #1837, which will split these stages into individual controllers or subreconcilers and address this.
Discussed offline, we'll return and requeue after 1 second rather than doing the sleep.
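i.e., roughly this in place of the sleep (sketch):

// Sketch: let controller-runtime requeue instead of blocking the reconciler,
// giving the cache a second to observe the status patch before the next pass.
return reconcile.Result{RequeueAfter: time.Second}, nil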
c.recorder.Publish(terminatorevents.NodeAwaitingVolumeDetachmentEvent(node))
stored := nodeClaim.DeepCopy()
if modified := nodeClaim.StatusConditions().SetFalse(v1.ConditionTypeVolumesDetached, "AwaitingVolumeDetachment", "AwaitingVolumeDetachment"); modified {
if err := c.kubeClient.Status().Patch(ctx, nodeClaim, client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{})); err != nil { |
same here on ignoring NotFound here rather than at L194
// getting the NodeClaim again. This prevents conflict errors on subsequent writes.
// USE CAUTION when determining whether to increase this timeout or remove this line
time.Sleep(time.Second)
nodeClaim, err = nodeutils.NodeClaimForNode(ctx, c.kubeClient, node) |
same here on just returning
Fixes #N/A
Description
Adds status conditions for node drain and volume detachment to improve observability for the individual termination stages. This is a scoped-down version of #1837, which includes these changes along with splitting each termination stage into a separate controller. I will continue to work on that refactor, but I'm decoupling these changes so I can focus on higher priority work.
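For context, the two condition types referenced throughout the diff would be registered alongside the existing NodeClaim status conditions, roughly like this (the exact file and surrounding declarations are assumptions):

// Assumed shape of the new condition type constants referenced in the diff;
// each condition moves Unknown -> (False while in progress) -> True as the
// corresponding termination stage completes.
const (
	ConditionTypeDrained         = "Drained"
	ConditionTypeVolumesDetached = "VolumesDetached"
)

Operators would then be able to watch each stage's progress directly on the NodeClaim, e.g. via kubectl get nodeclaim <name> -o jsonpath='{.status.conditions}'.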
How was this change tested?
make presubmit
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.