SpinKube x Rancher Desktop integration throws inconsistent results for more complex apps #80

Closed
divya-mohan0209 opened this issue Apr 29, 2024 · 20 comments

Comments

@divya-mohan0209

Context:

I tried out the SpinKube x Rancher Desktop integration detailed on this page. It works seamlessly for the hello-world application detailed there & on the Fermyon blog.

However, when I tried installing some of the other complex templates and containerizing them, such as

or even templates of my own

the behaviour is inconsistent: they sometimes work, but most of the time they don't.

This happens even though the Spin applications themselves run fine on my machine.

What is the error?

The pods enter the CrashLoopBackOff state and are terminated with the following message: Last state: Terminated with 137: Error.

Some additional notes

  • Last state: Terminated with 137: Error - Since this error typically points to a memory problem, I tried increasing the memory assigned to the pods, but it didn't help.

  • I tried resetting the Kubernetes cluster and restoring Rancher Desktop to its factory settings (individually, of course). Neither approach helped; in fact, both had the opposite effect. Templates that were working before the factory reset or the cluster reset stopped working afterwards. (Of course, I shouldn't have tried to fix what wasn't broken by resetting it 😆 But I did it anyway for the sake of reproducibility.)
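
As background on the 137 status: exit codes above 128 conventionally mean the process was terminated by a signal, with the signal number being the code minus 128. That makes 137 signal 9 (SIGKILL). An out-of-memory kill is one common source of SIGKILL, but not the only one. A minimal sketch of that arithmetic follows; `decode_exit_code` is a hypothetical helper name, not part of kubectl or any tool mentioned here.

```shell
# Hypothetical helper: explain a container exit code.
# Codes above 128 conventionally mean "terminated by signal (code - 128)".
decode_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  else
    echo "exited with status $code"
  fi
}

decode_exit_code 137   # prints: terminated by signal 9
```

Kubernetes typically reports an out-of-memory kill with the reason OOMKilled, whereas the reason reported above is Error, so raising memory limits may not be the relevant fix.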

Lastly, I wasn't sure where the error was, so I filed it against this repo. I'll also open an issue against the Rancher Desktop issues GitHub repo.

Infrastructure details

  • macOS Sonoma 14.1, Apple silicon
  • k8s versions tested on: 1.29.3, 1.28.9
  • Rancher Desktop version: 1.13.1
  • Spin version: 2.4.2

@radu-matei
Member

Thanks for opening this, @divya-mohan0209!
This should help triage where the issues should be (my assumption is most issues would come from https://github.com/spinkube/containerd-shim-spin).

Keeping this open until we create issues for each of these.

Thanks!

@divya-mohan0209
Author

Okie dokie, thank you @radu-matei! I shall keep this in mind next time I open issues :)

@radu-matei
Member

Hey, @divya-mohan0209, I just tried all the applications you referenced on a cluster with the latest release of SpinKube and the latest release of the shim, and couldn't reproduce it with any of them.

The most likely cause here I think would be running an old version of the shim -- which might come pre-baked into Rancher Desktop.

Could you please run kubectl annotate node --all kwasm.sh/kwasm-node=true one more time to force KWasm to update?

@divya-mohan0209
Author

It is listed as one of the steps, but how can I check whether the version has been updated? And which version should it be updated to?

@divya-mohan0209
Author

Also, the thing is that everything runs the first time you deploy the applications. But when you reset Rancher Desktop and repeat the steps from scratch, it no longer works.

@radu-matei
Member

Yeah, I think it has something to do with the shim version used.
@rajatjindal has a one-liner to verify the version.

In the meantime, tagging @tpmccallum who wrote the instructions for Rancher Desktop, if we need to update them.

@divya-mohan0209
Author

Also, retried it just now by rerunning the script. No luck.


@rajatjindal
Member

I created a Rancher Desktop cluster and saw the output below, which confirms that it is indeed running an old version of the shim.

@divya-mohan0209, could you please verify this on your cluster as well?

kubectl debug -it node/lima-rancher-desktop --image ubuntu:latest -n default -- /host/usr/local/containerd-shims/containerd-shim-spin-v2 -v

containerd-shim-spin-v2:
  Runtime: spin
  Version: 0.11.1
  Revision: 7058f601f3e92ee
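
For scripting this check, a minimal sketch that pulls just the version number out of the `-v` output shown above. `shim_version` is a hypothetical helper name; the sample input is copied from this thread.

```shell
# Hypothetical helper: read the shim's `-v` output on stdin and print the version.
# Carriage returns are stripped because `kubectl debug -it` runs with a TTY.
shim_version() {
  tr -d '\r' | awk '$1 == "Version:" { print $2; exit }'
}

# Sample output copied from this thread.
printf '%s\n' \
  'containerd-shim-spin-v2:' \
  '  Runtime: spin' \
  '  Version: 0.11.1' \
  '  Revision: 7058f601f3e92ee' | shim_version   # prints: 0.11.1
```

Against a live cluster, the output of the `kubectl debug` command above could be piped into `shim_version` in the same way.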

@divya-mohan0209
Author

divya-mohan0209 commented Apr 30, 2024

Just supplementing the error logs here, as well.

time="2024-04-30T12:59:26.854405727Z" level=info msg="CreateContainer within sandbox \"2939d8afcf2be62b2962cecfa7a0572f02f0da852f701bf1f9cf9260919e80a0\" for container &ContainerMetadata{Name:my-first-app,Attempt:0,}"
time="2024-04-30T12:59:26.856353602Z" level=info msg="CreateContainer within sandbox \"7b25fe8b3ff2a75d89bd1654396426f91f599aa00d7cc626089b28ffc8226dd3\" for &ContainerMetadata{Name:my-first-app,Attempt:0,} returns container id \"ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820\""
time="2024-04-30T12:59:26.856981436Z" level=info msg="StartContainer for \"ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820\""
time="2024-04-30T12:59:26.866778727Z" level=info msg="found manifest with WASM OCI image format."

time="2024-04-30T12:59:26.871746394Z" level=info msg="CreateContainer within sandbox \"2939d8afcf2be62b2962cecfa7a0572f02f0da852f701bf1f9cf9260919e80a0\" for &ContainerMetadata{Name:my-first-app,Attempt:0,} returns container id \"899a0e4ca27f947e6b02c5d431b7cd4b5fcb9bfaa7369c0978fe4c8279c33b45\""
time="2024-04-30T12:59:26.872443811Z" level=info msg="StartContainer for \"899a0e4ca27f947e6b02c5d431b7cd4b5fcb9bfaa7369c0978fe4c8279c33b45\""
time="2024-04-30T12:59:26.879040061Z" level=info msg="found manifest with WASM OCI image format."

time="2024-04-30T12:59:26.989514602Z" level=info msg="cgroup manager V2 will be used"

time="2024-04-30T12:59:26.997927102Z" level=info msg="cgroup manager V2 will be used"

time="2024-04-30T12:59:27.042846269Z" level=info msg="close_range; preserve_fds=0"

time="2024-04-30T12:59:27.043204978Z" level=warn msg="intermediate process already reaped"

time="2024-04-30T12:59:27.044068936Z" level=info msg="close_range; preserve_fds=0"

time="2024-04-30T12:59:27.044213894Z" level=warn msg="intermediate process already reaped"

time="2024-04-30T12:59:27.045217644Z" level=info msg="starting instance: ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820"

time="2024-04-30T12:59:27.045370853Z" level=info msg="calling start function"

time="2024-04-30T12:59:27.045397978Z" level=info msg="setting up wasi"

time="2024-04-30T12:59:27.046566228Z" level=info msg="starting instance: 899a0e4ca27f947e6b02c5d431b7cd4b5fcb9bfaa7369c0978fe4c8279c33b45"

time="2024-04-30T12:59:27.046550103Z" level=info msg=" >>> configuring spin oci application 111"

time="2024-04-30T12:59:27.046705936Z" level=info msg="calling start function"

time="2024-04-30T12:59:27.046745853Z" level=info msg="setting up wasi"
                              
time="2024-04-30T12:59:27.047655519Z" level=info msg="StartContainer for \"ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820\" returns successfully"
time="2024-04-30T12:59:27.046855769Z" level=info msg="writing artifact config to cache, near "/.cache/registry/manifests""
                              
time="2024-04-30T12:59:27.052521603Z" level=info msg="StartContainer for \"899a0e4ca27f947e6b02c5d431b7cd4b5fcb9bfaa7369c0978fe4c8279c33b45\" returns successfully"
time="2024-04-30T12:59:27.057878186Z" level=info msg=" >>> configuring spin oci application 111"
                              
time="2024-04-30T12:59:27.057913728Z" level=info msg="writing artifact config to cache, near "/.cache/registry/manifests""
                              
time="2024-04-30T12:59:27.060346811Z" level=info msg="writing spin oci config to "/spin.json""
                              
time="2024-04-30T12:59:27.064799728Z" level=info msg="writing spin oci config to "/spin.json""
                              
time="2024-04-30T12:59:27.111433603Z" level=info msg="error running start function: failed to resolve content for component "my-first-app""
                              
time="2024-04-30T12:59:27.112347144Z" level=info msg="error running start function: failed to resolve content for component "my-first-app""
                              
time="2024-04-30T12:59:27.114542228Z" level=info msg="no child process"
                              
time="2024-04-30T12:59:27.115303978Z" level=error msg="ttrpc: received message on inactive stream" stream=21
time="2024-04-30T12:59:27.115418061Z" level=info msg="deleting instance: ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820"
                              
time="2024-04-30T12:59:27.115589936Z" level=info msg="cgroup manager V2 will be used"
                              
time="2024-04-30T12:59:27.115984811Z" level=info msg="shim disconnected" id=ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820 namespace=k8s.io
time="2024-04-30T12:59:27.116002519Z" level=warning msg="cleaning up after shim disconnected" id=ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820 namespace=k8s.io                  
time="2024-04-30T12:59:27.116011894Z" level=info msg="cleaning up dead shim" namespace=k8s.io
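
One thing worth noting about the log above: the decisive line ("error running start function: failed to resolve content for component") is recorded at level=info, so filtering on level=error alone would miss it. A minimal sketch of a filter that also matches on the message text; `suspicious_lines` is a hypothetical helper name and the sample lines are copied from the log above.

```shell
# Hypothetical helper: keep lines logged at warn/warning/error level,
# plus lines whose message starts with "error" regardless of level.
suspicious_lines() {
  grep -E 'level=(warn|error)|msg="error '
}

# Prints the second and third sample lines; the first is routine.
suspicious_lines <<'EOF'
time="2024-04-30T12:59:27.060346811Z" level=info msg="writing spin oci config to "/spin.json""
time="2024-04-30T12:59:27.111433603Z" level=info msg="error running start function: failed to resolve content for component "my-first-app""
time="2024-04-30T12:59:27.115303978Z" level=error msg="ttrpc: received message on inactive stream" stream=21
EOF
```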

@divya-mohan0209
Author

It is!

kubectl debug -it node/lima-rancher-desktop --image ubuntu:latest -n default -- /host/usr/local/containerd-shims/containerd-shim-spin-v2 -v
Creating debugging pod node-debugger-lima-rancher-desktop-f6c9b with container debugger on node lima-rancher-desktop.
containerd-shim-spin-v2:
  Runtime: spin
  Version: 0.11.1
  Revision: 7058f601f3e92ee

But I have run kubectl annotate node --all kwasm.sh/kwasm-node=true twice, and it still doesn't update the shim.

@radu-matei
Member

radu-matei commented Apr 30, 2024

@divya-mohan0209 could you try:

kubectl annotate node --all kwasm.sh/kwasm-node-
kubectl annotate node --all kwasm.sh/kwasm-node=true

And check the jobs in the kwasm namespace, then the shim version?
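
The sequence above could be wrapped as a small script. This is only a sketch: `run` and `reprovision_kwasm` are hypothetical helper names, and the `DRY_RUN` switch is added here so the commands can be reviewed before they touch a cluster.

```shell
# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Remove and re-add the annotation so KWasm provisions the node again,
# then list what was created in the kwasm namespace.
reprovision_kwasm() {
  run kubectl annotate node --all kwasm.sh/kwasm-node-       # trailing "-" removes the annotation
  run kubectl annotate node --all kwasm.sh/kwasm-node=true   # re-adding it triggers provisioning
  run kubectl get jobs,pods -n kwasm
}

# Dry run: prints the three commands without contacting a cluster.
DRY_RUN=1
reprovision_kwasm
```

Unsetting `DRY_RUN` before calling `reprovision_kwasm` would run the commands for real.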

@divya-mohan0209
Author

Yep, not looking good still.

The jobs:

pod/lima-rancher-desktop-provision-kwasm-htwv8   0/1     Unknown     0             42s
pod/lima-rancher-desktop-provision-kwasm-cs2wf   0/1     Completed   0             31s

The shim version:

 ~ kubectl debug -it node/lima-rancher-desktop --image ubuntu:latest -n default -- /host/usr/local/containerd-shims/containerd-shim-spin-v2 -v
Creating debugging pod node-debugger-lima-rancher-desktop-rvrl2 with container debugger on node lima-rancher-desktop.
containerd-shim-spin-v2:
  Runtime: spin
  Version: 0.11.1
  Revision: 7058f601f3e92ee

@divya-mohan0209
Author

divya-mohan0209 commented Apr 30, 2024

Also, checked the kwasm logs for ya

2024-04-30T14:00:06.644825673Z stderr F {"level":"info","node":"lima-rancher-desktop","time":"2024-04-30T14:00:06Z","message":"Label removed. Removing Job."}
2024-04-30T14:00:13.891517177Z stderr F {"level":"info","node":"lima-rancher-desktop","time":"2024-04-30T14:00:13Z","message":"Trying to Deploy on lima-rancher-desktop"}
2024-04-30T14:00:13.897735427Z stderr F {"level":"info","time":"2024-04-30T14:00:13Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:13.95474801Z stderr F {"level":"info","time":"2024-04-30T14:00:13Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:17.702458053Z stderr F {"level":"info","time":"2024-04-30T14:00:17Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:17.707615387Z stderr F {"level":"info","time":"2024-04-30T14:00:17Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:24.010167056Z stderr F {"level":"info","time":"2024-04-30T14:00:24Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:26.643764558Z stderr F {"level":"info","time":"2024-04-30T14:00:26Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:26.648906849Z stderr F {"level":"info","time":"2024-04-30T14:00:26Z","message":"Job lima-rancher-desktop-provision-kwasm is Completed. Happy WASMing"

lima-rancher-desktop:/var/log/pods/kwasm_lima-rancher-desktop-provision-kwasm-cs2wf_80b08b4a-fdb9-4ee0-a5d0-e7da89625e38/kwasm-provision$ sudo tail -f 0.log
2024-04-30T14:00:24.459592432Z stdout F No change in containerd/config.toml

@rajatjindal
Member

It seems the latest version that kwasm-node-installer ships is indeed v0.11.1. I will open a PR so that kwasm-node-installer uses the latest version.

Having said that, the instructions at https://www.spinkube.dev/docs/spin-operator/tutorials/integrating-with-rancher-desktop/ refer to a different node-installer image, which points to the latest spin-shim version.

@divya-mohan0209, could you please confirm which command you used to install the kwasm-operator? Or is this the default version that comes with Rancher Desktop?

@divya-mohan0209
Author

> @divya-mohan0209, could you please confirm which command you used to install the kwasm-operator? Or is this the default version that comes with Rancher Desktop?

I used the one in the SpinKube docs that you've listed above.

@rajatjindal
Member

Could you please share the output of:

kubectl get pods -n kwasm -o wide

@divya-mohan0209
Author

I'll definitely do that once I log in tomorrow and the app crashes. I had reset the entire thing for today's live code stream 🤣

@divya-mohan0209
Author

Sorry for the delay in getting back! I had to re-do the steps :)

kubectl get pods -n kwasm -o wide
NAME                                         READY   STATUS      RESTARTS       AGE     IP           NODE                   NOMINATED NODE   READINESS GATES
lima-rancher-desktop-provision-kwasm-mxxfv   0/1     Completed   0              2d11h   <none>       lima-rancher-desktop   <none>           <none>
lima-rancher-desktop-provision-kwasm-5j7bt   0/1     Unknown     0              2d11h   <none>       lima-rancher-desktop   <none>           <none>
kwasm-operator-6c76c5f94b-hdb2h              1/1     Running     4 (5m1s ago)   2d11h   10.42.0.33   lima-rancher-desktop   <none>           <none>

@jandubois

I've verified that this issue is caused by the old shim version and is fixed by using 0.14.1: rancher-sandbox/rancher-desktop#6785 (comment)
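
To check an installed shim against the 0.14.1 version mentioned here, a minimal sketch of a dotted-version comparison in plain POSIX shell. `version_ge` is a hypothetical helper name, and it assumes purely numeric components.

```shell
# Hypothetical helper: succeed if dotted numeric version $1 >= $2.
version_ge() {
  _a=$1
  _b=$2
  while [ -n "$_a" ] || [ -n "$_b" ]; do
    _x=${_a%%.*}   # leading component of each version
    _y=${_b%%.*}
    # Drop the leading component; empty the string when it was the last one.
    if [ "$_a" = "$_x" ]; then _a=; else _a=${_a#*.}; fi
    if [ "$_b" = "$_y" ]; then _b=; else _b=${_b#*.}; fi
    _x=${_x:-0}    # a missing component counts as 0
    _y=${_y:-0}
    if [ "$_x" -gt "$_y" ]; then return 0; fi
    if [ "$_x" -lt "$_y" ]; then return 1; fi
  done
  return 0
}

# 0.11.1 is the version reported earlier in this thread.
if version_ge 0.11.1 0.14.1; then
  echo "shim is new enough"
else
  echo "shim is older than 0.14.1"   # this branch is taken
fi
```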

My comment there also shows how you can upgrade the shim version in Rancher Desktop, which manages shims itself and shouldn't need kwasm at all as long as you use the right RuntimeClass name in your SpinAppExecutor (spin instead of wasmtime-spin-v2). I've written about this on Slack at https://cloud-native.slack.com/archives/C06PC7JA1EE/p1714674606796679.
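
As an illustration of the RuntimeClass point, a sketch that generates a SpinAppExecutor manifest with a configurable RuntimeClass name. The manifest layout is an assumption based on the spin-operator sample executor and should be checked against the installed operator version; `executor_manifest` is a hypothetical helper name.

```shell
# Assumption: field layout follows the spin-operator sample SpinAppExecutor;
# verify it against the CRD installed in your cluster before applying.
executor_manifest() {
  runtime_class=${1:-spin}
  cat <<EOF
apiVersion: core.spinoperator.dev/v1alpha1
kind: SpinAppExecutor
metadata:
  name: containerd-shim-spin
spec:
  createDeployment: true
  deploymentConfig:
    runtimeClassName: ${runtime_class}
EOF
}

# Print the manifest using the RuntimeClass name suggested above.
executor_manifest spin
# To apply it: executor_manifest spin | kubectl apply -f -
```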

Note that the next release of Rancher Desktop (1.14) will have an option to install SpinKube (and the Spin CLI), so none of the manual setup should be necessary once it is released.

@bacongobbler
Collaborator

Thank you @jandubois for the confirmation. I just checked our documentation and it looks like we ask the user to install the latest version of Rancher Desktop. We will keep this ticket open until Rancher Desktop 1.14 has been released. We really appreciate you chiming in here and helping us confirm the issue. :)
