Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconcile if ExcludeFromRemediation label is removed #329

Merged

Conversation

clobrano
Copy link
Contributor

Why we need this PR

In case ExcludeFromRemediation label is present, remediation is skipped,
however if the Node is still unhealthy when the label is removed it
should be reconciled.

Changes made

Watch for changes in the Node's labels. If ExcludeFromRemediation is removed reconcile the Node.

Which issue(s) this PR fixes

https://issues.redhat.com/browse/ECOPROJECT-1988

Test plan

Copy link
Contributor

openshift-ci bot commented May 20, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@clobrano clobrano changed the title Reconcile if ExcludeFromRemediation label is removed [WIP] Reconcile if ExcludeFromRemediation label is removed May 20, 2024
}

func labelsUpdateNeedsReconcile(oldLabels, newLabels map[string]string) bool {
// Check if the ExcludeFromRemediation label was removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it would make sense to reconcile a node with added label 🤔 It would at least give better visibility to what NHC is doing or not doing in the logs... WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me (I just need to bring a logger to this function)

we could add a log when the label change is detected like:

  • label added: "Label XYZ was added on Node ABC. Future remediations on this Node will be skipped
    until the label is present"
  • label removed: "Label XYZ was removed on Node ABC."

I'm just not sure about the "Future remediation" part, but I don't want to give the idea that the
Node needs a remediation right now, and we are talking about "potential" future remediations.

I'd like to modify also the log

"Skipped remediation because node ABC is marked to exclude remediations"

into something like

"Remediation skipped on node ABC: node has remediation.medik8s.io/exclude-from-remediation label"

(or a more generic Exclude from remediation label, so that we don't need to maintain this log if we
ever need to change the label's name in the future)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean returning true here in case the label was set. Reconcile we do the logging.
And +1 for improving the log message over there!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean returning true here in case the label was set.

Just to make sure I get it right.

  • you only mean commonLabels.ExcludeFromRemediation in that context and not other labels
  • We will trigger the reconcile for "Excluded" nodes in order to have clearer logs/events and later on the reconcile method itself will identify the situation and stop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct

@clobrano clobrano force-pushed the e1988-remediation-after-nmo-maintenance-0 branch from ec3f73e to 89303fb Compare May 22, 2024 13:38
@clobrano clobrano changed the title [WIP] Reconcile if ExcludeFromRemediation label is removed Reconcile if ExcludeFromRemediation label is removed May 22, 2024
Copy link
Member

@slintes slintes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but add a unit test please

In case ExcludeFromRemediation label is present, remediation is skipped.
However if the Node is still unhealthy when the label is removed, it
should be reconciled.

Signed-off-by: Carlo Lobrano <[email protected]>
@clobrano clobrano force-pushed the e1988-remediation-after-nmo-maintenance-0 branch from 89303fb to 00edf09 Compare May 23, 2024 09:25
@openshift-ci openshift-ci bot added the lgtm label May 23, 2024
Copy link
Contributor

openshift-ci bot commented May 23, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@clobrano clobrano marked this pull request as ready for review May 23, 2024 11:27
@openshift-ci openshift-ci bot requested review from razo7 and slintes May 23, 2024 11:27
@slintes
Copy link
Member

slintes commented May 23, 2024

ping tide

@slintes
Copy link
Member

slintes commented May 23, 2024

/lgtm cancel

@openshift-ci openshift-ci bot removed the lgtm label May 23, 2024
@slintes
Copy link
Member

slintes commented May 23, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 23, 2024
@slintes
Copy link
Member

slintes commented May 23, 2024

/lgtm cancel

@openshift-ci openshift-ci bot removed the lgtm label May 23, 2024
@slintes
Copy link
Member

slintes commented May 23, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 23, 2024
@slintes slintes merged commit b710d34 into medik8s:main May 23, 2024
18 checks passed
@clobrano
Copy link
Contributor Author

/cherry-pick release-0.8

@openshift-cherrypick-robot

@clobrano: new pull request created: #331

In response to this:

/cherry-pick release-0.8

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants