
file access conflict for shim binaries #66

Open · phyrog opened this issue Mar 19, 2024 · 8 comments
Labels: area/node-installer, kind/bug (Something isn't working)

Comments

phyrog (Collaborator) commented Mar 19, 2024

In some cases, the node installer exits with this error:

ERROR failed to install error="failed to install shim 'spin-v2': open /mnt/node-root/opt/kwasm/bin/containerd-shim-spin-v2: text file busy"

This happens when containerd has already started a shim process with that binary and the installer wants to overwrite the binary file.

I believe we need to invert the current behavior of always copying the shim binaries and then detecting whether something changed: instead, first detect a change and only then copy the binary:

binPath, changed, err := shimConfig.Install(fileName)
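
A minimal sketch of that inversion, assuming a hypothetical helper rather than the actual node-installer API: compare a digest of the already-installed binary with the new one and only write when they differ. This avoids touching the file at all when nothing changed, though it does not help when a changed binary has to replace one that is currently executing:

```go
package installer

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
)

// fileDigest returns the SHA-256 of a file, or "" if the file does not exist.
func fileDigest(path string) (string, error) {
	f, err := os.Open(path)
	if os.IsNotExist(err) {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// installIfChanged copies src to dst only when the contents differ, so an
// unchanged shim binary is never rewritten and the no-op case cannot hit
// "text file busy". Hypothetical helper, not the actual Install signature.
func installIfChanged(src, dst string) (changed bool, err error) {
	oldSum, err := fileDigest(dst)
	if err != nil {
		return false, err
	}
	newSum, err := fileDigest(src)
	if err != nil {
		return false, err
	}
	if oldSum == newSum {
		return false, nil
	}
	data, err := os.ReadFile(src)
	if err != nil {
		return false, err
	}
	return true, os.WriteFile(dst, data, 0o755)
}
```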

phyrog added the kind/bug and area/node-installer labels on Mar 19, 2024
phyrog (Collaborator, Author) commented Mar 19, 2024

Generally, this raises the question of whether we can even update shim binaries that keep the same name (i.e. the same major version).

tillknuesting commented Mar 24, 2024

Could we rename the old binary out of the way and then place the new binary at the original path?

phyrog (Collaborator, Author) commented Mar 25, 2024

Maybe; I'll try it later. Alternatively, we could include a hash of the file in the binary name, and then we would only need to update the containerd config with the new binary path.
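
A rough sketch of the hash-in-the-name idea (names and layout here are assumptions, not the installer's actual code): write the binary under a digest-suffixed name and return the path so the containerd config can be pointed at it, which means a running shim never has to be overwritten in place:

```go
package installer

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// installWithHash writes the shim binary under a name that embeds a short
// SHA-256 prefix, e.g. containerd-shim-spin-v2-3f7a9c12, so a running shim
// keeps its old file and only the containerd config needs to be pointed at
// the new path. Old, unused copies still pile up and need separate cleanup.
func installWithHash(srcPath, destDir, shimName string) (string, error) {
	data, err := os.ReadFile(srcPath)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	dest := filepath.Join(destDir, fmt.Sprintf("%s-%x", shimName, sum[:4]))
	if _, err := os.Stat(dest); err == nil {
		return dest, nil // this exact build is already installed
	}
	if err := os.WriteFile(dest, data, 0o755); err != nil {
		return "", err
	}
	return dest, nil
}
```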

toksikk (Collaborator) commented Mar 25, 2024

@phyrog and I discussed the problem and decided to append a hash to the binary name as an interim solution. Unfortunately, we couldn't come up with a good solution to avoid accumulating leftover binaries after multiple updates.

According to https://iximiuz.com/en/posts/implementing-container-runtime-shim/ and some tests using the test manifests from #65 (comment), the shim process only lives while there is a workload, so we can't just delete any seemingly unused binary by simply using lsof to check whether a process is still accessing it.
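
For reference, an lsof-style check can be approximated from Go by scanning /proc (sketch below, assuming Linux and sufficient permissions); but as noted, a negative result only tells us that no shim is running from the binary right now, not that containerd won't need it for the next workload:

```go
package installer

import (
	"os"
	"path/filepath"
)

// binaryInUse reports whether any running process was started from binPath,
// by reading /proc/<pid>/exe (roughly what `lsof` shows for running shims).
// Per the caveat above, a false result only means no shim is running from
// the file right now, not that it is safe to delete.
func binaryInUse(binPath string) (bool, error) {
	procs, err := filepath.Glob("/proc/[0-9]*")
	if err != nil {
		return false, err
	}
	for _, p := range procs {
		exe, err := os.Readlink(filepath.Join(p, "exe"))
		if err != nil {
			continue // process already exited or /proc entry not readable
		}
		if exe == binPath {
			return true, nil
		}
	}
	return false, nil
}
```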

This issue still needs to be addressed.

rajatjindal (Member) commented:
When we update the shim binary, what is the expected (and current) behavior for workloads that are currently running?

  • should they be restarted (because we have a new shim version)? or
  • should they continue to use the current shim version until restarted by an external event (e.g. k8s pod restart/reschedule etc.)?

In my opinion, we should force the workloads to restart somehow when we have a new version, but I also look forward to hearing what others think about this and what the current behavior is.

phyrog (Collaborator, Author) commented May 1, 2024

For major version updates, existing workloads should be unaffected (the runtime class name includes the major version); handling of minor/patch version updates is currently undefined, as far as I know.

As for what the handling should be: I don't know, but I suppose we could make the behavior configurable via the CRD.
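
Purely as an illustration of that idea (the field name and values below are invented, not part of the actual Shim CRD), the spec could grow an update-strategy knob that the operator and node installer act on:

```go
package v1alpha1

// ShimUpdateSpec is an invented illustration of how the Shim CRD could make
// the update behavior configurable; the field name and values are not part
// of the actual API.
type ShimUpdateSpec struct {
	// UpdateStrategy controls what happens to workloads that are running when
	// the shim binary behind an existing runtime class is replaced:
	//   "recreate"     - restart pods using the runtime class so they pick up
	//                    the new shim version
	//   "onReschedule" - keep running pods on the old binary until an external
	//                    event (pod restart, reschedule) moves them
	UpdateStrategy string `json:"updateStrategy,omitempty"`
}
```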

devigned commented May 1, 2024

We had considered this question a while back in https://github.com/deislabs/containerd-wasm-shims/blob/main/proposals/2022-09-14-kubernetes-versioning.md. The binary names should probably have the version encoded in the file name. Then the only concerns are how to clean up old versions and when to update the containerd config mapping the runtime handler to the version-specific file name.

Replacing a file named shim-{version} with another file of the same name but containing different compiled code seems misleading. If that is a goal, then shim-{sha} seems more appropriate.

voigt (Contributor) commented May 20, 2024

I think I "accidentally" addressed the issue in #68.

I'm renaming the shim binary to containerd-shim-<shim-resource-name>. As the shim resource name in Kubernetes is unique, the binary name will be unique as well.

IMO this approach makes debugging easier, as the relation shim-resource-in-cluster <-> binary-on-node is clear.

This technically allows installing the same shim twice on the same node while addressing each copy with a different runtime class (useful if you want to run a debug build of the same version).


Edit: Argh, it is too late ;) Rereading the conversation, I realize that it does not actually fix the "update" scenario:

Shim Resource installed:

name: spin-v2
source: https://github.com/spinkube/containerd-shim-spin/releases/download/v0.14.1/containerd-shim-spin-v2-linux-aarch64.tar.gz

Updating the Shim resource with a new version of the shim:

name: spin-v2
source: https://github.com/spinkube/containerd-shim-spin/releases/download/**v0.15.0**/containerd-shim-spin-v2-linux-aarch64.tar.gz

This will then lead to the error mentioned above.

However, in a production scenario, I'd rather run two versions of the same runtime and switch apps one by one to the new version, as this leaves me with a rollback option.
