Implement etcd member management in pre-terminate hook #435
Conversation
Currently testing. Getting errors in etcd after scaling down:
The etcd cluster is not accessible, and the API server is not recovering because of that.
Force-pushed from ef72b5a to 452143d
It seems the previous problem is fixed, but another problem appears on removal of the last machine.
There is an upstream discussion about a potentially related issue: https://kubernetes.slack.com/archives/C8TSNPY4T/p1725952675583209
It looks like a potential upstream fix will be available with the new v1.8.3 CAPI patch release scheduled for today; however, we have not yet bumped CAPI to the v1.8.x series in CAPRKE2.
Force-pushed from 47e23e5 to 2ed8d90
Force-pushed from 6788009 to 5257503
Watch logs and metric collection cause failures in CI.
Force-pushed from a547119 to 9f985d0
This PR depends on #440
var errs []error

for i := range machinesToDelete {
	m := machinesToDelete[i]
	logger := logger.WithValues("machine", m)

	// During RKE2CP deletion we don't care about forwarding etcd leadership or removing etcd members.
	// So we are removing the pre-terminate hook.
	// This is important because when deleting KCP we will delete all members of etcd and it's not possible
Suggested change:
- // This is important because when deleting KCP we will delete all members of etcd and it's not possible
+ // This is important because when deleting RKE2CP we will delete all members of etcd and it's not possible
}
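The loop above removes the pre-terminate hook during control-plane deletion so that Machine deletion does not wait for etcd leadership forwarding or member removal. A minimal, self-contained sketch of that annotation removal, with a hypothetical machine type and an annotation key following the Cluster API pre-terminate hook naming convention (the exact key CAPRKE2 uses may differ):

```go
package main

import "fmt"

// preTerminateHookAnnotation mirrors the Cluster API pre-terminate hook
// annotation convention; the exact key used by CAPRKE2 may differ.
const preTerminateHookAnnotation = "pre-terminate.delete.hook.machine.cluster.x-k8s.io/rke2-cleanup"

// machine is a hypothetical stand-in for the CAPI Machine object.
type machine struct {
	Name        string
	Annotations map[string]string
}

// removePreTerminateHook deletes the hook annotation so that machine
// deletion can proceed without the etcd cleanup steps. It reports
// whether the annotation was present.
func removePreTerminateHook(m *machine) bool {
	if _, ok := m.Annotations[preTerminateHookAnnotation]; !ok {
		return false // nothing to do
	}
	delete(m.Annotations, preTerminateHookAnnotation)
	return true
}

func main() {
	m := &machine{
		Name:        "cp-0",
		Annotations: map[string]string{preTerminateHookAnnotation: ""},
	}
	fmt.Println(removePreTerminateHook(m)) // true: annotation removed
	fmt.Println(removePreTerminateHook(m)) // false: already gone
}
```

In the real controller the updated Machine would then be patched back to the API server; that step is omitted here.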
// Note: Removing the etcd member will lead to the etcd and the kube-apiserver Pod on the Machine shutting down.
// If ControlPlaneKubeletLocalMode is used, the kubelet is communicating with the local apiserver and thus now
Isn't this comment kubeadm-specific?
Yes, this comment can be simplified.
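Before the hook can remove an etcd member, it has to identify which member corresponds to the Machine being deleted. A hedged sketch of that lookup, using a simplified stand-in for the etcd member list rather than the real etcd client API; the name-matching rule (member name derived from the node name) is an assumption for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// etcdMember is a minimal, hypothetical stand-in for an etcd cluster
// member as returned by an etcd member-list call.
type etcdMember struct {
	ID   uint64
	Name string
}

// memberForNode returns the etcd member matching the node name of the
// Machine being deleted. An exact match is tried first; the prefix
// match is an illustrative assumption about how member names may be
// derived from node names.
func memberForNode(members []etcdMember, nodeName string) (etcdMember, bool) {
	for _, m := range members {
		if m.Name == nodeName || strings.HasPrefix(m.Name, nodeName+"-") {
			return m, true
		}
	}
	return etcdMember{}, false
}

func main() {
	members := []etcdMember{
		{ID: 1, Name: "cp-0"},
		{ID: 2, Name: "cp-1"},
	}
	if m, ok := memberForNode(members, "cp-1"); ok {
		// In the real hook this ID would be passed to a member-remove call.
		fmt.Println(m.ID) // 2
	}
}
```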
Signed-off-by: Danil-Grigorev <[email protected]>
Signed-off-by: Danil-Grigorev <[email protected]>
- Lint fixes Signed-off-by: Danil-Grigorev <[email protected]>
When the last machine is in a deleting state, the cluster is also being removed. In such a scenario, waiting for draining is not feasible, because draining is performed only when node deletion is allowed, which it is not due to the cluster removal. Cluster API prevents draining with a "cluster is being deleted" error. Signed-off-by: Danil-Grigorev <[email protected]>
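The commit message above describes skipping the drain step when the cluster itself is being deleted. A minimal sketch of that decision, with hypothetical boolean inputs standing in for the real checks against the Cluster and Machine objects:

```go
package main

import "fmt"

// shouldSkipDrain sketches the decision described above: when the
// cluster is being deleted, Cluster API rejects node drain with a
// "cluster is being deleted" error, so waiting for the drain of the
// last deleting machine would block forever. The parameters are
// hypothetical stand-ins for deletion-timestamp checks.
func shouldSkipDrain(clusterDeleting, lastMachineDeleting bool) bool {
	return clusterDeleting && lastMachineDeleting
}

func main() {
	fmt.Println(shouldSkipDrain(true, true))  // true: skip drain, proceed with removal
	fmt.Println(shouldSkipDrain(false, true)) // false: normal scale-down, drain first
}
```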
Signed-off-by: Danil-Grigorev <[email protected]>
Force-pushed from 9f985d0 to fc6f21d
Thanks a lot for taking care of this issue!
Thanks for the fix @Danil-Grigorev!
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #431
Special notes for your reviewer:
Checklist: