
Add metrics to show the number of nodes being tracked #785

Open
stevehipwell opened this issue Feb 28, 2023 · 5 comments
Labels: good first issue · stalebot-ignore · Type: Enhancement

Comments

@stevehipwell
Contributor

Describe the feature
I'd like to know the number of nodes that a NTH instance is currently managing and have it exported as a metric.

Is the feature request related to a problem?
After the tagging changes we had a defect where the tags weren't being set on the instances, causing our nodes to no longer be managed by NTH; this metric would have made it easy to see that NTH wasn't configured correctly.

Describe alternatives you've considered
n/a

@cjerad added the Type: Enhancement and stalebot-ignore labels Mar 8, 2023
@LikithaVemulapalli added the good first issue label Aug 19, 2024
@Lu-David self-assigned this and then unassigned it Aug 26, 2024
@phuhung273

Hi @stevehipwell, do you mean we only need this feature for SQS mode?

@stevehipwell
Contributor Author

@phuhung273 yes, this only makes sense for SQS mode.

@phuhung273

My first idea was to count instances based on the tag. But there is only one mandatory tag, key=aws-node-termination-handler/managed, and it alone cannot cover:

  • Case 1: an account with multiple clusters
  • Case 2: a cluster with both standalone instances and an ASG

So I came up with a second idea (a sketch follows this comment):

  • Step 1: kubectl get node to get all nodes belonging to that cluster (this covers both cases above)
  • Step 2: check how many of them have the key=aws-node-termination-handler/managed tag

May I know your thoughts, @stevehipwell?
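
A minimal Go sketch of those two steps, assuming client-go and the AWS SDK for Go v2 are available; countManagedNodes and everything around it are hypothetical illustrations, not NTH's actual code:

package main

import (
	"context"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// countManagedNodes (hypothetical) lists every node in the cluster (step 1)
// and counts how many are backed by an EC2 instance carrying the
// aws-node-termination-handler/managed tag (step 2).
func countManagedNodes(ctx context.Context, k8s kubernetes.Interface, ec2Client *ec2.Client) (int, error) {
	nodes, err := k8s.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}

	// Collect the IDs of all instances carrying the managed tag.
	// (Pagination is omitted to keep the sketch short.)
	tagged, err := ec2Client.DescribeInstances(ctx, &ec2.DescribeInstancesInput{
		Filters: []ec2types.Filter{{
			Name:   aws.String("tag-key"),
			Values: []string{"aws-node-termination-handler/managed"},
		}},
	})
	if err != nil {
		return 0, err
	}
	managed := map[string]bool{}
	for _, r := range tagged.Reservations {
		for _, i := range r.Instances {
			managed[aws.ToString(i.InstanceId)] = true
		}
	}

	// A node's provider ID looks like aws:///us-east-1a/i-0123456789abcdef0;
	// the last path segment is the backing instance ID.
	count := 0
	for _, n := range nodes.Items {
		parts := strings.Split(n.Spec.ProviderID, "/")
		if managed[parts[len(parts)-1]] {
			count++
		}
	}
	return count, nil
}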

@stevehipwell
Contributor Author

@phuhung273 the managed tag passed into NTH should be exclusive to the K8s cluster, as NTH will attempt to manage all EC2 instances with that tag. How about the following metrics, which would allow us to detect misconfigurations?

  • nth_managed_nodes - number of K8s nodes in the cluster with the NTH managed tag set
  • nth_managed_instances - number of running EC2 instances with the NTH managed tag
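
A hedged sketch of how these two gauges might be declared with Prometheus client_golang; the metric names come from the list above, but the wiring is an assumption and may not match NTH's actual metrics plumbing:

package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Proposed gauge: K8s nodes in the cluster with the NTH managed tag set.
	managedNodes = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "nth_managed_nodes",
		Help: "Number of K8s nodes in the cluster with the NTH managed tag set",
	})
	// Proposed gauge: running EC2 instances with the NTH managed tag.
	managedInstances = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "nth_managed_instances",
		Help: "Number of running EC2 instances with the NTH managed tag",
	})
)

func init() {
	// Register both gauges with the default registry so they appear on /metrics.
	prometheus.MustRegister(managedNodes, managedInstances)
}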

@phuhung273

> @phuhung273 the managed tag passed into NTH should be exclusive to the K8s cluster, as NTH will attempt to manage all EC2 instances with that tag. How about the following metrics, which would allow us to detect misconfigurations?
>
>   • nth_managed_nodes - number of K8s nodes in the cluster with the NTH managed tag set
>   • nth_managed_instances - number of running EC2 instances with the NTH managed tag

Amazing @stevehipwell. Is this pseudocode good enough?

for {
  nth_managed_nodes     = count(kubectl get nodes, keeping only nodes backed by an instance with the NTH managed tag)
  nth_managed_instances = count(aws ec2 describe-instances --filters "Name=tag-key,Values=aws-node-termination-handler/managed")
  sleep(5s)
}
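
One way that loop could look in Go, reusing the hypothetical countManagedNodes helper and managedNodes gauge from the sketches above, with a ticker plus context cancellation instead of a bare sleep so the loop can shut down cleanly; again a sketch, not NTH's implementation:

package main

import (
	"context"
	"time"

	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"k8s.io/client-go/kubernetes"
)

// pollManagedCounts (hypothetical) refreshes the proposed gauges every five
// seconds until the context is cancelled. A symmetric countManagedInstances
// helper for the EC2 side is assumed but not shown.
func pollManagedCounts(ctx context.Context, k8s kubernetes.Interface, ec2Client *ec2.Client) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if n, err := countManagedNodes(ctx, k8s, ec2Client); err == nil {
				managedNodes.Set(float64(n))
			}
			// countManagedInstances would be polled and exported the same way.
		}
	}
}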
