Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace shell wrapper with a Go wrapper #700

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jfroy
Copy link

@jfroy jfroy commented Sep 18, 2024

This PR replaces the shell wrapper used by the container toolkit installed with a Go program. This allows the GPU operator to install the toolkit on distributions that do not have a shell, such as Talos Linux.

The implementation of the Go wrapper uses side files (.argv, .envv) to provide injected arguments and environment variables, which are prepended to those supplied to the wrapper.

For environment variables, special prefixes can be used to prepend or appent to a colon-delimited list variable, such a PATH. '<' is used to prepend and '>' is used to append. For example, <PATH=foo will prepend foo to PATH.

Finally, the wrapper includes special logic to detect that it is wrapping one of the runtime programs and if so check if the nvidia kernel modules are loaded. If not, the wrapper will execute runc instead.

@jfroy
Copy link
Author

jfroy commented Sep 18, 2024

An alternative to the side files would be to embed the arguments and environments in custom ELF sections. It felt more opaque than side files, but let me know if you'd prefer that approach and I'll implement it.

@elezar elezar self-requested a review September 18, 2024 19:56
@elezar elezar self-assigned this Sep 18, 2024
Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfroy one thought I would have here is whether it makes sense to rather pull this functionality into the main runtime and feature gate it or obfuscate it behind a mode flag. We already produce mode-specific binaries for cdi and legacy for use on the operator and this could be an extension of that.

Not sure what @klueska's thoughts are on this though.

@jfroy
Copy link
Author

jfroy commented Sep 18, 2024

The nvidia module detection logic seems like a natural fit to be moved in the runtime program. It's less clear that the argv and envv stuff would fit there -- the purpose of that indirection is to allow the gpu-operator to customize the installation of the toolkit to the cluster, and to its parameters (e.g. host-provided driver vs driver container, etc).

--

As a third alternative implementation, the wrapper could look for a dot-config file recursively up the directory tree, à la .gitconfig. This would be a good fit as the wrappers mostly all have the same argv and envv computed by the toolkit installer.

@elezar
Copy link
Member

elezar commented Oct 30, 2024

As a third alternative implementation, the wrapper could look for a dot-config file recursively up the directory tree, à la .gitconfig. This would be a good fit as the wrappers mostly all have the same argv and envv computed by the toolkit installer.

It may not have to be up the tree. We could create a single .env file in the "toolkit" root and read it in the current folder?

By the way, I have created #763 which should allow us to simplify this implementation further since we would not have to deal with command line arguments in addition to envvironment variables.

if err := scanner.Err(); err != nil {
log.Fatalf("failed to read argv file: %v", err)
}
return append(env, os.Environ()...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't later envvars override earlier ones? Should we not reverse the ordering of this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In golang's1 and glibc's2 implementations, the first entry wins. I assume this is true for other languages (they probably all followed glibc). POSIX3 is mute on the required behavior.

Footnotes

  1. https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/syscall/env_unix.go;l=34

  2. https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/getenv.c;hb=HEAD

  3. https://pubs.opengroup.org/onlinepubs/9799919799/functions/getenv.html

@jfroy
Copy link
Author

jfroy commented Oct 30, 2024

As a third alternative implementation, the wrapper could look for a dot-config file recursively up the directory tree, à la .gitconfig. This would be a good fit as the wrappers mostly all have the same argv and envv computed by the toolkit installer.

It may not have to be up the tree. We could create a single .env file in the "toolkit" root and read it in the current folder?

Assuming all the tools can use a single set of environment variables, that would work,

By the way, I have created #763 which should allow us to simplify this implementation further since we would not have to deal with command line arguments in addition to envvironment variables.

If I remember correctly, this means the patch could drop the ability to set arguments and only handle environment variables.

@jfroy jfroy force-pushed the go-wrapper branch 2 times, most recently from 753b731 to 79a3a99 Compare October 30, 2024 15:58
Some platforms and Kubernetes distributions do not include a shell. This
patch replaces the shell wrapper scripts with a small Go program.

Signed-off-by: Jean-Francois Roy <[email protected]>
This patch modifies the the container toolkit installer, used by the
GPU operator, to use the new Go wrapper program.

Signed-off-by: Jean-Francois Roy <[email protected]>
Signed-off-by: Jean-Francois Roy <[email protected]>
This will be handled by the runtime itself.

Signed-off-by: Jean-Francois Roy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants