
Add metrics to show the number of nodes being tracked #785

Open
stevehipwell opened this issue Feb 28, 2023 · 5 comments
Labels: good first issue · stalebot-ignore · Type: Enhancement

Comments

@stevehipwell
Contributor

Describe the feature
I'd like to know the number of nodes that a NTH instance is currently managing and have it exported as a metric.

Is the feature request related to a problem?
After the tagging changes we had a defect where the tags weren't being set on the instances, causing our nodes to no longer be managed by NTH; this metric would have made it easy to see that NTH wasn't configured correctly.

Describe alternatives you've considered
n/a

@cjerad added the Type: Enhancement and stalebot-ignore labels Mar 8, 2023
@LikithaVemulapalli added the good first issue label Aug 19, 2024
@Lu-David self-assigned this and then unassigned it Aug 26, 2024
@phuhung273

Hi @stevehipwell, do you mean we only need this feature for SQS mode?

@stevehipwell
Contributor Author

@phuhung273 yes, this only makes sense for SQS mode.

@phuhung273

My first idea was to count instances based on the tag. But there is only one mandatory tag, key=aws-node-termination-handler/managed, and it alone cannot cover:

  • Case 1: an account with multiple clusters
  • Case 2: a cluster with both standalone instances and an ASG

So I came up with a second idea (a sketch follows this comment):

  • Step 1: kubectl get node to get all nodes belonging to that cluster (this covers both cases above)
  • Step 2: check how many of them have the key=aws-node-termination-handler/managed tag

May I know your thoughts, @stevehipwell?
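
A minimal Go sketch of those two steps, assuming client-go and the AWS SDK for Go v2 are available; countManagedNodes and everything around it are hypothetical illustrations, not NTH's actual code:

package main

import (
	"context"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// countManagedNodes (hypothetical) lists every node in the cluster (step 1)
// and counts how many are backed by an EC2 instance carrying the
// aws-node-termination-handler/managed tag (step 2).
func countManagedNodes(ctx context.Context, k8s kubernetes.Interface, ec2Client *ec2.Client) (int, error) {
	nodes, err := k8s.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}

	// Collect the IDs of all instances carrying the managed tag.
	// (Pagination is omitted to keep the sketch short.)
	tagged, err := ec2Client.DescribeInstances(ctx, &ec2.DescribeInstancesInput{
		Filters: []ec2types.Filter{{
			Name:   aws.String("tag-key"),
			Values: []string{"aws-node-termination-handler/managed"},
		}},
	})
	if err != nil {
		return 0, err
	}
	managed := map[string]bool{}
	for _, r := range tagged.Reservations {
		for _, i := range r.Instances {
			managed[aws.ToString(i.InstanceId)] = true
		}
	}

	// A node's provider ID looks like aws:///us-east-1a/i-0123456789abcdef0;
	// the last path segment is the backing instance ID.
	count := 0
	for _, n := range nodes.Items {
		parts := strings.Split(n.Spec.ProviderID, "/")
		if managed[parts[len(parts)-1]] {
			count++
		}
	}
	return count, nil
}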

@stevehipwell
Contributor Author

@phuhung273 the managed tag passed into NTH should be exclusive to the K8s cluster, as NTH will attempt to manage all EC2 instances with that tag. How about the following metrics, which would allow us to detect misconfigurations?

  • nth_managed_nodes - number of K8s nodes in the cluster with the NTH managed tag set
  • nth_managed_instances - number of running EC2 instances with the NTH managed tag
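
A hedged sketch of how these two gauges might be declared with Prometheus client_golang; the metric names come from the list above, but the wiring is an assumption and may not match NTH's actual metrics plumbing:

package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Proposed gauge: K8s nodes in the cluster with the NTH managed tag set.
	managedNodes = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "nth_managed_nodes",
		Help: "Number of K8s nodes in the cluster with the NTH managed tag set",
	})
	// Proposed gauge: running EC2 instances with the NTH managed tag.
	managedInstances = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "nth_managed_instances",
		Help: "Number of running EC2 instances with the NTH managed tag",
	})
)

func init() {
	// Register both gauges with the default registry so they appear on /metrics.
	prometheus.MustRegister(managedNodes, managedInstances)
}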

@phuhung273

> @phuhung273 the managed tag passed into NTH should be exclusive to the K8s cluster, as NTH will attempt to manage all EC2 instances with that tag. How about the following metrics, which would allow us to detect misconfigurations?
>
>   • nth_managed_nodes - number of K8s nodes in the cluster with the NTH managed tag set
>   • nth_managed_instances - number of running EC2 instances with the NTH managed tag

Amazing @stevehipwell. Is this pseudocode good enough?

for {
  nth_managed_nodes     = count(kubectl get nodes, keeping only nodes backed by an instance with the NTH managed tag)
  nth_managed_instances = count(aws ec2 describe-instances --filters "Name=tag-key,Values=aws-node-termination-handler/managed")
  sleep(5s)
}
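
One way that loop could look in Go, reusing the hypothetical countManagedNodes helper and managedNodes gauge from the sketches above, with a ticker plus context cancellation instead of a bare sleep so the loop can shut down cleanly; again a sketch, not NTH's implementation:

package main

import (
	"context"
	"time"

	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"k8s.io/client-go/kubernetes"
)

// pollManagedCounts (hypothetical) refreshes the proposed gauges every five
// seconds until the context is cancelled. A symmetric countManagedInstances
// helper for the EC2 side is assumed but not shown.
func pollManagedCounts(ctx context.Context, k8s kubernetes.Interface, ec2Client *ec2.Client) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if n, err := countManagedNodes(ctx, k8s, ec2Client); err == nil {
				managedNodes.Set(float64(n))
			}
			// countManagedInstances would be polled and exported the same way.
		}
	}
}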
