
Add option to remove node after cordon/drain #719

Open
stevehipwell opened this issue Nov 3, 2022 · 10 comments
Labels
stalebot-ignore (To NOT let the stalebot update or close the Issue / PR)
Type: Enhancement (New feature or request)
V2 (Issues related to NTH V2)

Comments

@stevehipwell
Contributor

Describe the feature
I'd like NTH v2 to have the option to actually remove the node from the cluster (e.g. `kubectl delete node`) once cordon/drain has completed; the lifecycle would still terminate the instance.

Is the feature request related to a problem?
Controllers idiomatically work from caches and respond only to events, so it's important that the node removal be an actual Kubernetes deletion event so that other controllers know it has happened.

Describe alternatives you've considered
n/a
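A minimal sketch of the requested flow, assuming a hypothetical `client` with `cordon`/`drain`/`delete_node` methods (this is not NTH's actual API, just an illustration of the ordering):

```python
# Hypothetical sketch of the requested behaviour; the `client` interface
# (cordon/drain/delete_node) is an assumption, not NTH's real API.

def handle_termination(client, node_name):
    """Cordon and drain as NTH does today, then delete the Node object so
    other controllers observe a real Kubernetes deletion event. The ASG
    lifecycle hook still terminates the underlying instance afterwards."""
    client.cordon(node_name)
    client.drain(node_name)
    client.delete_node(node_name)  # equivalent of `kubectl delete node`
```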

@snay2 snay2 added Type: Enhancement New feature or request stalebot-ignore To NOT let the stalebot update or close the Issue / PR labels Nov 16, 2022
@vkruoso

vkruoso commented Mar 28, 2023

That would be great. I have a k3s cluster where machines are managed via a spot fleet, and the only thing missing is the node removal; I was expecting it to be removed automatically by default. I'm open to doing a PR if you can point me toward a good approach for the implementation.

@dcarrion87

dcarrion87 commented Jun 27, 2023

> That would be great. I have a k3s cluster where machines are managed via a spot fleet, and the only thing missing is the node removal; I was expecting it to be removed automatically by default. I'm open to doing a PR if you can point me toward a good approach for the implementation.

@vkruoso I started to set this up and ran into this too. Are you getting around this in a specific way at the moment?

@dcarrion87

dcarrion87 commented Jun 27, 2023

@stevehipwell also curious whether you solved this via a custom reaper?

@stevehipwell
Contributor Author

@dcarrion87 this is still an outstanding request with no solution.

@dcarrion87

@stevehipwell do you manually clean up nodes every now and then? We're thinking of putting in an additional reaper.

@stevehipwell
Contributor Author

@dcarrion87 we don't. If I had the time this would be something I'd like to contribute to NTH.

AFAIK Karpenter removes the nodes it manages, so if Karpenter were part of the EKS control plane, or could run on the nodes it manages, that would be the best solution.

@vkruoso

vkruoso commented Jun 28, 2023

> That would be great. I have a k3s cluster where machines are managed via a spot fleet, and the only thing missing is the node removal; I was expecting it to be removed automatically by default. I'm open to doing a PR if you can point me toward a good approach for the implementation.
>
> @vkruoso I started to set this up and ran into this too. Are you getting around this in a specific way at the moment?

At this moment we remove those nodes manually once in a while.

@dcarrion87

dcarrion87 commented Jun 28, 2023

Yeah, fair enough. Karpenter won't work for this use case. I'm going to implement a separate reaper alongside NTH using a combination of AWS and Kubernetes API calls, i.e. if a node matches the rules and its backing instance is terminated, delete it.
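A rough sketch of such a reaper's decision rule, as pure logic only (the label-matching rule and the `"terminated"` state string are assumptions for illustration, not a real NTH or AWS API):

```python
# Illustrative decision logic for an external reaper: delete the Node
# object only when it matches our rules (here, a simple label match) and
# its backing EC2 instance is already terminated.

def should_reap(node_labels, instance_state, required_labels):
    matches = all(node_labels.get(k) == v for k, v in required_labels.items())
    return matches and instance_state == "terminated"
```

A loop around this would list nodes via the Kubernetes API, look up each node's instance state via the EC2 API, and delete the Node object when `should_reap` returns true.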

@vkruoso

vkruoso commented Jun 28, 2023

> Yeah, fair enough. Karpenter won't work for this use case. I'm going to implement a separate reaper alongside NTH using a combination of AWS and Kubernetes API calls, i.e. if a node matches the rules and its backing instance is terminated, delete it.

Awesome. Please let me know if I can help in any way.

@LikithaVemulapalli LikithaVemulapalli added the V2 Issues related to NTH V2 label May 23, 2024
@migueleliasweb

Just an idea:

Create a daemonset (or place a script on the host) that acts as the health-check target for the EC2 machine. The script checks whether the node has been cordoned; if it has, it reports unhealthy, making the EC2 health checks fail.

This will trigger NTH to drain the instance, while AWS itself will terminate the EC2 instance in the end.
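The cordon check could look something like this (a sketch; the `node` dict mirrors the Kubernetes Node object's `spec.unschedulable` field, which `kubectl cordon` sets to true):

```python
# Sketch of the daemonset/host script's check: report unhealthy once the
# node is cordoned, so the EC2 health check fails and AWS eventually
# terminates the instance.

def is_healthy(node):
    return not node.get("spec", {}).get("unschedulable", False)
```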
