-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Recreate bootstrap token if it was cleaned up #11520
base: main
Are you sure you want to change the base?
🐛 Recreate bootstrap token if it was cleaned up #11520
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, just some nits
@@ -371,6 +371,11 @@ func (r *KubeadmConfigReconciler) refreshBootstrapTokenIfNeeded(ctx context.Cont | |||
|
|||
secret, err := getToken(ctx, remoteClient, token) | |||
if err != nil { | |||
if apierrors.IsNotFound(err) { | |||
log.Error(err, "token secret is gone, triggering creation of new token") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Error(err, "token secret is gone, triggering creation of new token") | |
log.Error(err, "Bootstrap token secret is gone, triggering creation of new token") |
Also, big nit, but should this really be an error log? Even if this not part of the normal flow, if we have a way to recover from it, should we surface it as an error? My concern is if this would be confusing from someone looking over the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, and done. I had initially made it an error since it's rare, but that's admittedly not a good reason.
return ctrl.Result{}, errors.Wrapf(err, "failed to refresh bootstrap token") | ||
} | ||
return ctrl.Result{ | ||
RequeueAfter: r.tokenCheckRefreshOrRotationInterval(), | ||
}, nil | ||
} | ||
|
||
func (r *KubeadmConfigReconciler) createNewBootstrapToken(ctx context.Context, config *bootstrapv1.KubeadmConfig, scope *Scope, remoteClient client.Client) (ctrl.Result, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? Given the message you log, looks like this method is not for the first token creation, but for when the token needs to be recreated.
func (r *KubeadmConfigReconciler) createNewBootstrapToken(ctx context.Context, config *bootstrapv1.KubeadmConfig, scope *Scope, remoteClient client.Client) (ctrl.Result, error) { | |
func (r *KubeadmConfigReconciler) recreateNewBootstrapToken(ctx context.Context, config *bootstrapv1.KubeadmConfig, scope *Scope, remoteClient client.Client) (ctrl.Result, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. But, talking about language nits 😆, I'm removing "new" since that else duplicates the "re" prefix.
Yes, it seems a dup of #11037, but it has the big advantage that this PR comes with unit tests. It would be great if @AndiDog and @archerwu9425 can join forcces and come out with one PR preserving test coverage and addressing this comment (re-create behaviour should be limited to MP owned bootstrap tokens). |
5350450
to
8716e12
Compare
Sorry, I wasn't aware there's already a PR. Please feel free to choose where to continue with reviews. I covered all places where the secret could be seen as not found – the |
8716e12
to
9d22a92
Compare
I limited the new behavior to machine pools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last two nits otherwise lgtm pending fix for linter errors
@@ -371,6 +371,11 @@ func (r *KubeadmConfigReconciler) refreshBootstrapTokenIfNeeded(ctx context.Cont | |||
|
|||
secret, err := getToken(ctx, remoteClient, token) | |||
if err != nil { | |||
if apierrors.IsNotFound(err) && scope.ConfigOwner.IsMachinePool() { | |||
log.Info("Bootstrap token secret is gone, triggering creation of new token") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
log.Info("Bootstrap token secret is gone, triggering creation of new token") | |
log.Info("Bootstrap token secret is not found, triggering creation of new token") |
same in L410
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to move this log line and the following line blanking out the Token into recreateBootstrapToken
q: can we drop blanking out the boolstrap token entirely, given that we are setting it to a new value inside recreateBootstrapToken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworded.
Semantically, the function name recreateBootstrapToken
doesn't denote that the secret wasn't found, so I'd rather leave the log line here where we determined that the secret wasn't found and therefore need to call the recreation functionality.
We could drop setting the field to ""
, but then if token creation fails, CAPI tries to read the non-existing secret again. If we set an empty string, the next reconciliation run will try to create the token. Your call here – I have no strong opinion which way to go.
@@ -1441,6 +1441,154 @@ func TestBootstrapTokenRotationMachinePool(t *testing.T) { | |||
g.Expect(foundNew).To(BeTrue()) | |||
} | |||
|
|||
func TestBootstrapTokenRefreshIfTokenSecretCleanedMachine(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a single test with two test run? (se we can have a better test description)
func TestBootstrapTokenRefreshIfTokenSecretCleanedMachine(t *testing.T) { | |
func TestBootstrapTokenRefresh(t *testing.T) { | |
t.Run("should not recreate the token for Machines", func(t *testing.T) { | |
... | |
}) | |
t.Run("should recreate the token for MachinePools", func(t *testing.T) | |
... | |
}) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
9d22a92
to
51a4011
Compare
What this PR does / why we need it:
If the join token was cleaned up by Kubernetes before CAPI was able to refresh its expiry, CAPI currently fails. This happened in a few cases for us, using machine pools, and required manual intervention to unset the token so that CAPI would recreate it. This change takes care to autoheal the situation by recreating the token.
Which issue(s) this PR fixes:
Seems to be the same issue as #11034.
/area bootstrap