
[improvement] Remove severity levels from controller logic #589

Merged · komer3 merged 10 commits into main from remove-severity-levels on Dec 13, 2024

Conversation

@komer3 (Contributor) commented Dec 9, 2024

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

@komer3 komer3 changed the title Remove severity levels from controller logic [improvement] Remove severity levels from controller logic Dec 9, 2024
// Diff hunk under review: the severity-based check on the first line is replaced by the stale-condition check that follows.
if !reconciler.HasConditionSeverity(clusterScope.LinodeCluster, clusterv1.ReadyCondition, clusterv1.ConditionSeverityError) {
if !reconciler.HasStaleCondition(clusterScope.LinodeCluster,
	clusterv1.ReadyCondition,
	reconciler.DefaultTimeout(r.ReconcileTimeout, reconciler.DefaultClusterControllerReconcileTimeout)) {
	logger.Info("re-queuing cluster/nb deletion")
	return ctrl.Result{RequeueAfter: reconciler.DefaultClusterControllerReconcileDelay}, nil
@rahulait (Collaborator) commented Dec 10, 2024
I wonder what our approach should be here. Currently we are doing:

if err != nil {
    if hasStaleCondition() {
        # try delete every 5 seconds
        return {5 sec}, nil
    }
    # try delete with exponential backoff
    return {...}, err
}

I wonder whether, for delete, we should just retry every 5 or 10 seconds and not have exponential backoff at all. With exponential backoff, deletes could get stuck for up to 1000 secs (16+ mins) if something goes wrong. Looking for thoughts from others.
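
For context on the two retry styles above: in controller-runtime, returning a non-nil error from Reconcile sends the request back through the workqueue's rate limiter (exponential backoff by default), while returning a Result with RequeueAfter set and a nil error requeues after a fixed delay. A minimal sketch of the pattern being discussed, not the project's actual code:

package controllers

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// reconcileDelete illustrates the two requeue styles: a fixed delay via
// RequeueAfter versus the workqueue's exponential backoff via a returned error.
func reconcileDelete(deleteErr error, stale bool) (ctrl.Result, error) {
	if deleteErr != nil {
		if stale {
			// Fixed delay: retry the delete every 5 seconds.
			return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
		}
		// Exponential backoff: returning the error lets the workqueue's
		// rate limiter pick the (growing) requeue delay.
		return ctrl.Result{}, deleteErr
	}
	return ctrl.Result{}, nil
}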

@komer3 (Contributor, Author)

I was thinking about this yesterday. Do we know how many times it has to fail before it gets stuck at 1000+ sec?

@rahulait (Collaborator) commented Dec 10, 2024

This is what I have seen in the past for exponential backoff (requeue delays in seconds):

   0.004881143569946289
   0.022043943405151367
   0.04145383834838867
   0.08641910552978516
   0.1618349552154541
   0.3228580951690674
   0.6422019004821777
   1.2825469970703125
   2.561889171600342
   5.122460842132568
   10.242242097854614
   20.482059001922607
   40.962461948394775
   81.92205500602722
   0.0189058780670166
   163.84187817573547
   655.36194896698
   1000.0018429756165
   1000.0022790431976
   1000.0020320415497
   1000.0018348693848

So roughly 16-17 failures before it hits the cap. I see that in most places we do exponential backoff first and then switch to a fixed interval once the condition is stale; I wonder if that's the pattern we want to use here as well, or whether we want a fixed interval for delete.
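
Those delays line up with client-go's default per-item exponential failure rate limiter, which starts around 5ms and doubles on every failure up to a 1000s cap (assuming the controller uses the default rate limiter). A small standalone sketch of that schedule shows the cap is reached after roughly 17-18 consecutive failures:

package main

import (
	"fmt"
	"time"
)

// Prints the default exponential requeue schedule: 5ms doubling per failure,
// capped at 1000s, which matches the delays observed above.
func main() {
	base, maxDelay := 5*time.Millisecond, 1000*time.Second
	delay := base
	for i := 1; delay < maxDelay; i++ {
		fmt.Printf("failure %2d: requeue after %v\n", i, delay)
		delay *= 2
	}
	fmt.Printf("later failures: requeue after %v (cap)\n", maxDelay)
}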

@komer3 (Contributor, Author)

I think it would be okay to do 10 or more seconds, but I'm worried that a bad delete request could slow the controllers down from processing other requests. Or would the controller handle that well?

@rahulait (Collaborator)

Since controllers pick items off a queue, a bad request would just keep getting requeued: every 10 secs with fixed requeue logic, or with increasing delays with exponential logic. Since the default concurrency is 10, I don't think it will slow things down much, as it's only for delete and we don't have too many deletes at any given time. With exponential delete, the downside is that someone might not see their resources getting cleaned up in the UI quickly if the request is stuck in exponential backoff. But that is also a rare case. I am ok with the current approach, just wanted some extra eyes and thoughts here.
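
For reference, the concurrency mentioned here is controller-runtime's MaxConcurrentReconciles. A hedged sketch of how a controller sets it; the reconciler stub and the infrav1 import path are placeholders, not copied from this repo:

package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"

	infrav1 "github.com/linode/cluster-api-provider-linode/api/v1alpha2" // assumed import path
)

// LinodeClusterReconciler is a stub standing in for the real reconciler.
type LinodeClusterReconciler struct{}

func (r *LinodeClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

func (r *LinodeClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&infrav1.LinodeCluster{}).
		WithOptions(controller.Options{
			// Reconcile up to 10 objects in parallel, so one object stuck in
			// requeue/backoff does not block the rest of the queue.
			MaxConcurrentReconciles: 10,
		}).
		Complete(r)
}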

@komer3 (Contributor, Author)

I'm fine with doing a 10 sec delayed requeue. It's a rare case but will probably happen :)

// Diff hunk under review (partial context):
reconciler.DefaultTimeout(r.ReconcileTimeout, reconciler.DefaultClusterControllerReconcileTimeout)) {
conditions.MarkFalse(machineScope.LinodeMachine, ConditionPreflightLinodeVPCReady, string(cerrs.CreateClusterError), "", "%s", err.Error())
@rahulait (Collaborator)

Will this lead to lastTransitionTime getting updated for the condition? If yes, then in the next iteration HasStaleCondition() will return false, since it only looks at lastTransitionTime now. Previously, if it was already stale (determined by checking error severity), we kept marking it as stale even if lastTransitionTime changed. Not sure how it will behave here.
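
For readers outside the repo: the check under discussion treats a condition as stale when its lastTransitionTime is older than a timeout. A hypothetical sketch of such a check using cluster-api's conditions helpers; the real HasStaleCondition lives in this project's reconciler package and may differ:

package reconciler

import (
	"time"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
)

// hasStaleCondition reports whether the given condition has not transitioned
// for longer than timeout. Because it only inspects lastTransitionTime,
// anything that bumps that timestamp (e.g. MarkFalse with a changed message)
// resets the staleness clock.
func hasStaleCondition(obj conditions.Getter, condType clusterv1.ConditionType, timeout time.Duration) bool {
	cond := conditions.Get(obj, condType)
	if cond == nil {
		return false
	}
	return time.Since(cond.LastTransitionTime.Time) > timeout
}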

@komer3 (Contributor, Author)

Ah, good catch! Currently MarkFalse() will update the lastTransitionTime, but I won't be using this func anymore in the follow-up. I'll use the Set() func, which gives more granular control over how conditions are set, and I can just not update it then.

But is this something to look out for in a lot of different places?

@rahulait (Collaborator) commented Dec 10, 2024

Yeah, I see it being used in a couple of other places in the PR.

Looking at the current main branch's RecordDecayingCondition() method, I do see it calling conditions.MarkFalse() every time. So if it was previously finding the condition to be stale, it's possible conditions.MarkFalse() wasn't changing the lastTransitionTime as long as it matched the existing condition.

If that's the case, then with the latest code in this PR there will be no change to the condition whether we call conditions.MarkFalse() inside or outside of the HasStaleCondition() block. In that case, both of the examples below look the same to me:

if reconciler.HasStaleCondition(...) {
    conditions.MarkFalse(X, Y, Z, "", "%s", err.Error())
    return {...}, nil
}
conditions.MarkFalse(X, Y, Z, "", "%s", err.Error())
return {}, err

is the same as:

conditions.MarkFalse(X, Y, Z, "", "%s", err.Error())
if reconciler.HasStaleCondition(...) {
    return {...}, nil
}
return {}, err

@komer3 (Contributor, Author)

Yup, I confirmed that lastTransitionTime is not changed or updated if the condition stays the same. MarkFalse() uses the Set() func, which has this comment above it:

// NOTE: If a condition already exists, the LastTransitionTime is updated only if a change is detected
// in any of the following fields: Status, Reason, Severity and Message.
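
A hedged illustration of the Set()-based direction mentioned above, using cluster-api's v1beta1 conditions package; the function name and shape are hypothetical, not the PR's final code:

package controllers

import (
	corev1 "k8s.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
)

// markConditionFailed builds the condition explicitly and hands it to
// conditions.Set. Per the quoted comment, Set only bumps LastTransitionTime
// when Status, Reason, Severity, or Message actually change, so re-setting an
// identical condition keeps the existing transition time (and the staleness
// clock) intact.
func markConditionFailed(obj conditions.Setter, condType clusterv1.ConditionType, reason string, err error) {
	conditions.Set(obj, &clusterv1.Condition{
		Type:    condType,
		Status:  corev1.ConditionFalse,
		Reason:  reason,
		Message: err.Error(),
	})
}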

@rahulait (Collaborator)

Other than a couple of comments I have to test and confirm, the PR looks good to me.

@rahulait (Collaborator) left a comment

LGTM

@komer3 komer3 merged commit a431a05 into main Dec 13, 2024
10 checks passed
@komer3 komer3 deleted the remove-severity-levels branch December 13, 2024 18:47