MHC created CR isn't deleted by the remediator #347
Skipping CI for Draft Pull Request. |
/test ? |
@mshitrit: The following commands are available to trigger required jobs:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test 4.16-openshift-e2e |
1. CR is created with regular (non-generated) machine name when using a multiple support template
2. Multiple template annotation placed on the CR contains the correct node/template name
Signed-off-by: Michael Shitrit <[email protected]>
This serves two purposes:
- It will be set on the NodeName annotation instead of the previously set Machine name for the MHC use case.
- It indicates MHC remediation, in which case the normal (non-generated) Machine name is set as the CR name.
Signed-off-by: Michael Shitrit <[email protected]>
4d047c2 to e96b911
/test 4.16-openshift-e2e |
…t machine name when checking NodeName annotation. Signed-off-by: Michael Shitrit <[email protected]>
/test 4.16-openshift-e2e |
Signed-off-by: Michael Shitrit <[email protected]>
/test 4.16-openshift-e2e |
/test 4.16-openshift-e2e |
/test 4.17-openshift-e2e |
/test 4.17-openshift-e2e |
as discussed and even noted in the description, this is blocked by https://issues.redhat.com//browse/ECOPROJECT-2187, running tests does not make any sense IMHO
You are probably right, but for some reason I had a vague memory that before the SNR issue we had green e2e for this PR. |
I'd like to understand why the e2e test is actually green. I would expect it to fail, because the MHC controller does not watch remediation CRs, so when the finalizer is removed by the remediator, no reconcile is triggered for lease deletion. The test logs are suspicious IMHO: there is a 1.5 minute delay between CR deletion and lease deletion. The latter should happen much quicker: MHC test, see time diff between lease duration step start, and exiting the test
NHC test for comparison, see time diff between lease duration step start, and CR deletion step start
I think it's coincidence that the last e2e test run succeeded, something else must have triggered reconcile, and once again increasing test timeouts without a reason shadows an issue... Unfortunately pod logs gathering failed, so we can't check NHC logs. Let's try again. /test 4.18-openshift-e2e |
Unfortunately the MHC logs aren't helpful; they don't reveal what's triggering the reconcile. However, there is still too long a delay between CR deletion and the lease deletion test step succeeding. I assume a Node or Machine update is triggering the MHC reconcile...
This is a fair assumption, however I think this shouldn't be blocked for merge by ECOPROJECT-2187
It does, it hides the issue by increasing the test timeout. Please revert that, then we can merge. |
e51cac5 to b45a5c5
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mshitrit, slintes The full list of commands accepted by this bot can be found here. The pull request process is described here
@mshitrit: /override requires failed status contexts, check run or a prowjob name to operate on.
Only the following failed contexts/checkruns were expected:
If you are trying to override a checkrun that has a space in it, you must put a double quote on the context. In response to this:
Tests are failing due to ECOPROJECT-2187 /override ci/prow/4.14-openshift-e2e |
@mshitrit: Overrode contexts on behalf of mshitrit: ci/prow/4.14-openshift-e2e, ci/prow/4.15-openshift-e2e, ci/prow/4.16-openshift-e2e, ci/prow/4.17-openshift-e2e, ci/prow/4.18-openshift-e2e In response to this:
Why we need this PR
When a CR is created by MHC, it is not deleted by the remediator (SNR) if the remediator supports multiple templates.
The reason is that MHC gives the CR a generated name, which does not match the machine name.
When the same applies to node-based remediation, the additional info (node name and template name) is stored in annotations on the CR, but for machine-based remediation the remediator doesn't access those annotations.
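To illustrate the name mismatch described above, here is a minimal Go sketch of a lookup that prefers an annotation over the CR name. It is not the code from this PR: the `remediationCR` type, the `targetName` helper, and the annotation key `example.medik8s.io/node-name` are all hypothetical and exist only for this example.

```go
package main

import "fmt"

// nodeNameAnnotation is a hypothetical annotation key used only in this sketch;
// the key actually used by the operators may differ.
const nodeNameAnnotation = "example.medik8s.io/node-name"

// remediationCR is a simplified stand-in for the metadata of a remediation CR.
type remediationCR struct {
	Name        string
	Annotations map[string]string
}

// targetName returns the name used to look up the remediation target.
// It prefers the annotation and falls back to the CR name when the
// annotation is missing or empty.
func targetName(cr remediationCR) string {
	if name, ok := cr.Annotations[nodeNameAnnotation]; ok && name != "" {
		return name
	}
	return cr.Name
}

func main() {
	// A CR with a generated name: the name alone does not identify the target.
	generated := remediationCR{
		Name:        "worker-0-abc12",
		Annotations: map[string]string{nodeNameAnnotation: "worker-0"},
	}
	// A CR whose name is the target name and which carries no annotation.
	plain := remediationCR{Name: "worker-1"}

	fmt.Println(targetName(generated))
	fmt.Println(targetName(plain))
}
```

A lookup that only ever reads the CR name would resolve the first CR to `worker-0-abc12` and find nothing to match, which is the kind of mismatch the description above refers to.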
Changes made
Which issue(s) this PR fixes
ECOPROJECT-2077
Test plan
Added a regression test
This is blocked by ECOPROJECT-2187
Context: