-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(Fix) Updating helm charts to support version 3 windows integration #754
base: main
Are you sure you want to change the base?
Conversation
enableWindows: false | ||
# -- List of Windows versions present in the cluster. Our Kubernetes integration supported Windows versions LTSC 2019 (1809), 20H2, and LTSC 2022 | ||
windowsOsList: [] | ||
# - version: 2019 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it intended for those to be commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nrepai Yes, WindowsOsList is required if enableWindows is set to True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nrepai I have updated comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. My only concern is that in the description it seems you only tested using Kubernetes v1.23:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-11-231.ec2.internal Ready <none> 3d21h v1.23.17-eks-a59e1f0 10.0.11.231 18.208.247.135 Windows Server 2022 Datacenter 10.0.20348.1668 docker://20.10.21
ip-10-0-110-39.ec2.internal Ready <none> 22d v1.23.17-eks-a59e1f0 10.0.110.39 <none> Windows Server 2019 Datacenter 10.0.17763.4252 docker://20.10.21
ip-10-0-27-190.ec2.internal Ready <none> 22d v1.23.17-eks-a59e1f0 10.0.27.190 3.95.250.93 Amazon Linux 2 5.4.238-148.347.amzn2.x86_64 docker://20.10.17
This version is no longer supported. If possible, can we test using a v1.26 cluster?
…inux nodes. And collecting metrics from Windows nodes will continue the same approach in V2
The added selectors look good to me and match others that we have included already. Anyone else has a different opinion? |
@@ -161,13 +161,15 @@ integrations that you have configured. | |||
| kubelet.config.retries | int | `3` | Number of retries after timeout expired | | |||
| kubelet.config.scraperMaxReruns | int | `4` | Max number of scraper rerun when scraper runtime error happens | | |||
| kubelet.config.timeout | string | `"10s"` | Timeout for the kubelet APIs contacted by the integration | | |||
| kubelet.enableWindows | bool | `false` | Enable Windows monitoring and update `windowsOsList` with the list of Windows versions present in the cluster. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure this readme is auto-generated by helm-docs
, otherwise, it will be overwritten
@@ -18,6 +18,7 @@ rules: | |||
- "services" | |||
- "nodes" | |||
- "namespaces" | |||
- "pods" # Needed for our Windows integration to be able to access KSM on a Linux node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RagalahariP From v2 to v3, the pods resource is removed for linux support for some reason as you mentioned during the meeting. The reason may be reducing the security surface.
What I mean during the meeting is if we can adding it back for windows only, that will be better .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xqi-nr after you left the meeting we agreed that this comment is a good compromise. @RagalahariP will contact @paologallinaharbur to request his review of this line change since he was involved in the project to upgrade V2 to V3 and may have historical context to validate this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello! Yes, I'll take a look!
In general, we have never added support for Windows due to the lack of a canary environment, tooling, and proper pipelines to build the images.
Consider to specify clearly that this is experimental and might change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, answering this particular point, it was a permission not needed anymore and therefore it was dropped.
You can reintroduce it without any issues, however if you want to ask for as few permission as possible you can add this specific line behind a "if windows.enabled=true" instead of a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting a different approach.
If merged you will end-up in a mixed scenario with both V2 a V3 images running (V2 on Windows and V3 on Linux).
This could lead to multiple issues if not well configured, for example, KSM could be scraped form two different places (from a linux and a Windows node).
I would have expected to see V3 Windows images
nodeSelector: | ||
kubernetes.io/os: linux |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we sure that kubernetes.io/os: linux
is present on all user nodes?
serviceAccountName: {{ include "newrelic.common.serviceAccount.name" $ }} | ||
containers: | ||
- name: agent | ||
image: newrelic/infrastructure-k8s:{{ .imageTag }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a V2 image, why not installing the nri-bundle-v2 next to the V3 then? In this way we are mixing them up in the same chart!
@@ -161,13 +161,15 @@ integrations that you have configured. | |||
| kubelet.config.retries | int | `3` | Number of retries after timeout expired | | |||
| kubelet.config.scraperMaxReruns | int | `4` | Max number of scraper rerun when scraper runtime error happens | | |||
| kubelet.config.timeout | string | `"10s"` | Timeout for the kubelet APIs contacted by the integration | | |||
| kubelet.enableWindows | bool | `false` | Enable Windows monitoring and update `windowsOsList` with the list of Windows versions present in the cluster. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are adding the V2 image under the Kubelet component. However, the V2 is a bundle of kubelet+KSM+controlPlane component
| kubelet.enabled | bool | `true` | Enable kubelet monitoring. Advanced users only. Setting this to `false` is not supported and will break the New Relic experience. | | ||
| kubelet.extraEnv | list | `[]` | Add user environment variables to the agent | | ||
| kubelet.extraEnvFrom | list | `[]` | Add user environment from configMaps or secrets as variables to the agent | | ||
| kubelet.extraVolumeMounts | list | `[]` | Defines where to mount volumes specified with `extraVolumes` | | ||
| kubelet.extraVolumes | list | `[]` | Volumes to mount in the containers | | ||
| kubelet.hostNetwork | bool | Not set | Sets pod's hostNetwork. When set bypasses global/common variable | | ||
| kubelet.tolerations | list | Schedules in all tainted nodes | Tolerations for the control plane DaemonSet. | | ||
| kubelet.windowsOsList | list | `[]` | `windowsOsList` is only required if `enableWindows` is true. It contains list of Windows versions present in the cluster. Our Kubernetes integration supported Windows versions LTSC 2019 (1809), 20H2, and LTSC 2022 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Our Kubernetes integration supported Windows versions"
I would always specify that it is alpha
template: | ||
metadata: | ||
annotations: | ||
checksum/nri-kubernetes: {{ include (print $.Template.BasePath "/kubelet/scraper-configmap.yaml") $ | sha256sum }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nri-kubernetes in V2 branch does not read any configuration, that was introduced in V3.
I wonder if you actually compiled the V3 branch and pushed it to the old repo newrelic/infrastructure-k8s
, is that the case?
There are some discrepancies:
- kubelet/scraper-configmap.yaml is V3
- NRIA_PASSTHROUGH_ENVIRONMENT is V2
- app.kubernetes.io/component: kubelet is V3
- having just one container is definitely V2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how a user could change the config mounted? or set a proxy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may not understand the approach. However, this seems a V2 component being merged into the V3 chart.
The image used is newrelic/infrastructure-k8s
, the is just one container, and the nri-kubernetes process seems controlled by the agent. Maybe this is the fastest way to get Windows support, but there is a big risk of tech dept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paologallinaharbur has raised valid and serious issues that need to be addressed before we can merge the PR. I'm just marking the PR such that we do not merge it by mistake before his comments are fully addressed
Updating helm charts to support version 3 windows integration
Testing : Tested in local EKS cluster with the following node config
Dashboard: https://onenr.io/0BQ18La75jx
Original PR (with comments): #744