Patch containerd to enable NRI before running NRI plugin #97

fmuyassarov · 2023-07-19T22:38:35Z

This commit adds an init container to enable NRI in containerd configuration file. Users installing plugin via Helm have an option to opt in for using that init container to enable the NRI before running plugin. By default, this option is disabled.

fmuyassarov · 2023-07-19T22:38:54Z

/cc @kad

fmuyassarov · 2023-07-20T08:20:06Z

/cc @marquiz

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml

deployment/helm/resource-management-policies/topology-aware/templates/configmap_emable_nri.yaml

fmuyassarov · 2023-08-08T08:07:27Z

@kad image is now pinned to ubuntu:22.04.

klihub · 2023-08-08T10:58:29Z

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml

+       nsenter -t 1 -m bash
+       systemctl restart containerd


Does this really work as intended ?

I wonder about that nsenter usage. AFAICT, here it starts bash in the (mount) namespace of init/pid 1. Then once that bash exits, the original patch_nri.sh script continues in its original namespaces and runs systemctl restart containerd. So what does that nsenter accomplish ?

Does this really work as intended ?

Yes

I wonder about that nsenter usage. AFAICT, here it starts bash in the (mount) namespace of init/pid 1. Then once that bash exits, the original patch_nri.sh script continues in its original namespaces and runs systemctl restart containerd. So what does that nsenter accomplish ?

We don't exit the bash of the pid1 until we run systemctl restart containerd. We enter the pid1 in nsenter -t 1 -m bash and we don't exit but instead we just kill the init container as a whole. So, nsenter is entering the mount namespace of the init process (PID 1) and launching a bash within that namespace. For having the access to the host pid, I added hostPID: true.

So is that systemctl restart containerd supposed to be run in the namespace(s) nsenter entered ?

I think that nsenter-started bash exits immediately (when running without a controlling tty) and systemctl restart containerd is run in the same namespace as the main script.

Anyway, I wonder what nsenter -t 1 -m would anyway change from the namespaces point of view. Namespaces are inherited across forks. If a process is not actively changing its namespaces, it should be running in the same namespace as its parent process, so transitively as PID 1/its init process.

Yeah, the nsenter bash looks like nop to me, too

It seems that relying on nsenter -t 1 -m wasn't sufficient to execute the systemctl... command from within a script, even though it functions well when manually executed in the current terminal session. I have now used nsenter -t 1 -m systemctl restart containerd instead, thanks to the help of @klihub .

Here is the short POC:

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml

marquiz · 2023-08-08T17:18:02Z

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml

@@ -0,0 +1,36 @@
+{{- if .Values.nri.enable }}
+apiVersion: v1
+kind: ConfigMap


Do we really need this configmap business, the script is not that super complex. I think it would be simpler and safer (no way to affect what get's executed in privileged mode in host pid namespace by just mangling a configmap).

I'd suggest to write the script in the init container spec, smth like

command: - "/bin/bassh" - "-c" args: - | echo Patching containerd config sed '/\[plugins.\"io.containerd.nri.v1.nri/,/^$/s/disable = false/disable = true/' -i /etc/containerd/config.toml systemctl restart containerd

I'm personally not fan of adding multi line commands into pod spec especially when the command is long like with sed here. It makes it hard to read the DaemonSet file.

and if you see the script, it is not just 3 lines but we also have some conditionals there.

I fail to see why the configmap is any more readable. What about the security concerns?

I mean you should be able to put the script in the pod spec formatted exactly like in the configmap

I fail to see why the configmap is any more readable. What about the security concerns?

what kind of security concerns do you mean?

I mean you should be able to put the script in the pod spec formatted exactly like in the configmap

Yes I know. I just didn't like having multi line, long command in the spec of the pod.

I tend to isolate things into their own entity(when possible), instead of having one big YAML.
But anyways, I don't mean to say it is incorrect, it is just my personal preference and I'm totally fine to do it in the spec.

what kind of security concerns do you mean?

The visibility and assurance about what acutally gets run in the privileged host-pid-ns container. This whole business of mounting "some" configmap into the container and running a shell script from there

marquiz · 2023-08-08T17:20:01Z

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml

+       nsenter -t 1 -m bash
+       systemctl restart containerd


Yeah, the nsenter bash looks like nop to me, too

This commit adds an init container to enable NRI in containerd configuration file. Users installing plugin via Helm have an option to opt in for using that init container to enable the NRI before running plugin. By default, this option is disabled. Signed-off-by: Feruzjon Muyassarov <[email protected]>

marquiz · 2023-08-09T10:23:21Z

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml

@@ -0,0 +1,36 @@
+{{- if .Values.nri.enable }}
+apiVersion: v1
+kind: ConfigMap


I fail to see why the configmap is any more readable. What about the security concerns?

marquiz · 2023-08-09T10:24:37Z

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml

+      echo "NRI has now been enabled..."
+   else
+      echo "NRI plugin entry doesn't exist in the continerd config file."
+      echo "Doing nothing..."


Shouldn't we add the nri plugin config in this case? The containerd config might be a stub, only containing settings that differ from the defaults.

I was not sure. If [plugins."io.containerd.nri.v1.nri"] is not there, we can of course add the following defaults.
WDYT @klihub ?

# Enable NRI support in containerd. disable = false # Allow connections from externally launched NRI plugins. disable_connections = false # plugin_config_path is the directory to search for plugin-specific configuration. plugin_config_path = "/etc/nri/conf.d" # plugin_path is the directory to search for plugins to launch on startup. plugin_path = "/opt/nri/plugins" # plugin_registration_timeout is the timeout for a plugin to register after connection. plugin_registration_timeout = "5s" # plugin_requst_timeout is the timeout for a plugin to handle an event/request. plugin_request_timeout = "2s" # socket_path is the path of the NRI socket to create for plugins to connect to. socket_path = "/var/run/nri/nri.sock"

OR just put disable = false as that's the only one needed, and not pretend that we necessarily know all the nri plugin config options

Of course this only makes sense if the containerd version that is running supports nri...

marquiz · 2023-08-09T10:29:11Z

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml

+data:
+  patch_nri.sh: |
+   #!/bin/bash
+


I think we should bail out on any error

Suggested change

set -e

marquiz · 2023-08-09T10:41:15Z

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml

+   cp /etc/containerd/config.toml "$config_file"
+
+   # Check if the disable field is set to false under [plugins."io.containerd.nri.v1.nri"]
+   status=$(awk -F ' = ' '/^\[plugins\."io\.containerd\.nri\.v1\.nri"\]/{flag=1} flag && /disable/{print $2; exit}' "$config_file")


Wouldn't it be simpler to just grep if [plugins."io.containerd.nri.v1.nri"] exists, do sed unconditioinally and use e.g. diff to see if anything was changed? WDYT?

Plus, in either of these cases there's still the corner case that the[plugins."io.containerd.nri.v1.nri"] section is present but the the disable setting hasn't been specified explicitly. Do we need to cover this(?) 🧐

Wouldn't it be simpler to just grep if [plugins."io.containerd.nri.v1.nri"] exists, do sed unconditioinally and use e.g. diff to see if anything was changed? WDYT?

It can be simpler, but it also means extra/unnecessary restart of the containerd.

EDIT: I missed your point, about checking the diff. So, diff would be used I guess to check if restart is needed or not right? Would not it be better instead of having diff just to check if we actually need to touch the file or not? Actually for me these two options (1. what I have, 2. what you suggest) don't really make a much difference. One is doing diff before touching the file the other is after touching the file.

Plus, in either of these cases there's still the corner case that the[plugins."io.containerd.nri.v1.nri"] section is present but the the disable setting hasn't been specified explicitly. Do we need to cover this(?) monocle_face

Is that a valid config? I mean, does containerd accept a config file without disable field set to either true/false?

So, diff would be used I guess to check if restart is needed or not right? Would not it be better instead of having diff just to check if we actually need to touch the file or not? Actually for me these two options (1. what I have, 2. what you suggest) don't really make a much difference. One is doing diff before touching the file the other is after touching the file.

Sure, if the diff is empty don't update the file. I was thinking about (in pseudo'ish bash cod)

sed ... /etc/containerd/config.toml > config.tmp if ! diff -q /etc/containerd/config.toml config.tmp; then mv config.tmp /etc/containerd/config.toml systemctl restart containerd fi

Plus, in either of these cases there's still the corner case that the[plugins."io.containerd.nri.v1.nri"] section is present but the the disable setting hasn't been specified explicitly. Do we need to cover this(?) monocle_face

Is that a valid config? I mean, does containerd accept a config file without disable field set to either true/false?

Yeah, it is. The config may be an empty file or just one line if at the extrema.

fmuyassarov · 2023-08-24T22:59:26Z

closing this in favor of #107.

fmuyassarov force-pushed the init-container branch from 5aff2b9 to 6916ac0 Compare July 20, 2023 08:18

kad reviewed Aug 1, 2023

View reviewed changes

deployment/helm/resource-management-policies/balloons/templates/configmap_enable_nri.yaml Outdated Show resolved Hide resolved

deployment/helm/resource-management-policies/topology-aware/templates/configmap_emable_nri.yaml Outdated Show resolved Hide resolved

fmuyassarov force-pushed the init-container branch from 6916ac0 to 6d4ba15 Compare August 2, 2023 09:36

fmuyassarov requested review from kad, jukkar and klihub August 2, 2023 09:37

fmuyassarov force-pushed the init-container branch from 6d4ba15 to e42f239 Compare August 8, 2023 08:06

klihub reviewed Aug 8, 2023

View reviewed changes

klihub requested a review from marquiz August 8, 2023 15:52

marquiz reviewed Aug 8, 2023

View reviewed changes

fmuyassarov force-pushed the init-container branch from e42f239 to a380696 Compare August 9, 2023 09:33

fmuyassarov requested review from klihub and marquiz August 9, 2023 10:08

marquiz reviewed Aug 9, 2023

View reviewed changes

fmuyassarov closed this Aug 24, 2023

fmuyassarov deleted the init-container branch October 23, 2023 14:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Patch containerd to enable NRI before running NRI plugin #97

Patch containerd to enable NRI before running NRI plugin #97

fmuyassarov commented Jul 19, 2023

fmuyassarov commented Jul 19, 2023

fmuyassarov commented Jul 20, 2023

fmuyassarov commented Aug 8, 2023

klihub Aug 8, 2023

fmuyassarov Aug 8, 2023

klihub Aug 8, 2023

klihub Aug 8, 2023

marquiz Aug 8, 2023

fmuyassarov Aug 9, 2023

marquiz Aug 8, 2023

fmuyassarov Aug 9, 2023

fmuyassarov Aug 9, 2023

marquiz Aug 9, 2023

marquiz Aug 9, 2023

fmuyassarov Aug 9, 2023

fmuyassarov Aug 9, 2023 •

edited

Loading

marquiz Aug 9, 2023

marquiz Aug 8, 2023

marquiz Aug 9, 2023

marquiz Aug 9, 2023

fmuyassarov Aug 9, 2023

marquiz Aug 9, 2023

marquiz Aug 9, 2023

marquiz Aug 9, 2023

fmuyassarov Aug 9, 2023 •

edited

Loading

marquiz Aug 9, 2023

fmuyassarov commented Aug 24, 2023

Patch containerd to enable NRI before running NRI plugin #97

Patch containerd to enable NRI before running NRI plugin #97

Conversation

fmuyassarov commented Jul 19, 2023

fmuyassarov commented Jul 19, 2023

fmuyassarov commented Jul 20, 2023

fmuyassarov commented Aug 8, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fmuyassarov Aug 9, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fmuyassarov Aug 9, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fmuyassarov commented Aug 24, 2023

fmuyassarov Aug 9, 2023 •

edited

Loading

fmuyassarov Aug 9, 2023 •

edited

Loading