SpinApps don't work with images from ghcr.io #22

Closed
macolso opened this issue Feb 2, 2024 · 6 comments
Labels
documentation Improvements or additions to documentation


@macolso
Contributor

macolso commented Feb 2, 2024

Apply the following SpinApp to the cluster:

```yaml
apiVersion: spinoperator.dev/v1
kind: SpinApp
metadata:
  name: simple-spinapp
spec:
  image: "ghcr.io/fermyon/spin-operator/hello-world:latest"
  replicas: 1
```

Note: you'll have to make the image public in our packages repo, or publish your own image to your own account, e.g.:

```shell
spin build
spin registry push ghcr.io/username/app-name:latest
```

Observe that the SpinApp fails with ErrImagePull. The pod's events:

```
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m13s                  default-scheduler  Successfully assigned default/simple-spinapp-7b7d69df56-c2xnn to k3d-wasm-cluster-server-0
  Normal   Pulling    2m41s (x4 over 4m14s)  kubelet            Pulling image "ghcr.io/fermyon/spin-operator/hello-world:latest"
  Warning  Failed     2m41s (x4 over 4m13s)  kubelet            Failed to pull image "ghcr.io/fermyon/spin-operator/hello-world:latest": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/fermyon/spin-operator/hello-world:latest": failed to unpack image on snapshotter overlayfs: mismatched image rootfs and manifest layers
  Warning  Failed     2m41s (x4 over 4m13s)  kubelet            Error: ErrImagePull
  Warning  Failed     2m28s (x6 over 4m12s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    2m15s (x7 over 4m12s)  kubelet            Back-off pulling image "ghcr.io/fermyon/spin-operator/hello-world:latest"
```
@macolso
Contributor Author

macolso commented Feb 2, 2024

Previous Discussion

@lann

Possibly related (same error message!): deislabs/containerd-wasm-shims#191

@endocrimes

`spin registry push` being incompatible with older containerd releases does appear to be the main culprit. I guess for now we should adopt an internal policy of building release artifacts primarily with Docker, and switch to `spin registry push` once newer containerd releases are more prevalent?

@lann

We depend pretty heavily on some Docker-incompatible (I assume?) features at this point. @vdice

@vdice

There seem to be multiple things going on...

  1. `spin registry push` packages a Spin app into its locked app config, one or more wasm layers* (one per wasm module/component), and zero or more data layers** for any static assets. So the agent pulling/loading the app would, at a minimum, need to handle wasm layers to run even the simplest Spin apps. But I haven't tried pushing directly with Docker and running in containerd; maybe it does just work in some capacity?
  2. I didn't think deploying straight from a `spin registry push`'d Spin app even worked with the usual k3d image we use (ghcr.io/deislabs/containerd-wasm-shims/examples/k3d:v0.10.0), as that uses a k3s base image with a containerd version older than the one James mentions in deislabs/containerd-wasm-shims#191 (the "mismatched image rootfs and manifest layers" error).*** But then @calebschoepp mentioned he had success, albeit with the ttl.sh registry? (I'd be perplexed if the chosen registry somehow played a part in this...)
  3. Last I checked, the current containerd shim engine doesn't yet handle the data or archive layers that we (Spin's OCI client) use, or inlined data. I think we'd need to add support there if we want all types of Spin apps to run with the shim. Or maybe I'm mistaken: have we deployed more complex apps with the Spin Operator + shim combo yet, e.g. a static site or some such?
    * Currently `application/vnd.wasm.content.layer.v1+wasm`; waiting for upstream to define a canonical value.
    ** Or it may push archive layers if the total number of layers would exceed a maximum (500), and/or it may write small content inline into the config layer. Both are special cases that would need support in runtime engines.
    *** We've since bumped the k3s image in the shim, but that hasn't been included in a release yet... and I must admit I still haven't figured out how to build/produce the image locally 😂
    I'll do some testing so I'm better equipped to compare notes...
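
Given the layer types described above, one way to tell ahead of time whether an app is "wasm-only" (the case the shim was observed to handle) is to inspect the media types of its OCI manifest layers. The following is a minimal bash sketch, assuming you already have those media types one per line. `only_wasm_layers` is a hypothetical helper, not part of Spin or the shim. It treats only the wasm media type from the footnote as supported and flags every other layer type (data, archive, etc.). It cannot detect content inlined into the config layer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spin or the shim): reads OCI layer media
# types on stdin, one per line, and succeeds only if every layer is a wasm
# layer. Per this thread, other layer types (e.g. data/archive layers for
# static assets) were not handled by the shim at the time.
# Limitation: content inlined into the config layer is not visible here.

WASM_LAYER_TYPE="application/vnd.wasm.content.layer.v1+wasm"

only_wasm_layers() {
  local mt seen=0
  # "|| [ -n "$mt" ]" also processes a final line with no trailing newline
  while IFS= read -r mt || [ -n "$mt" ]; do
    [ -z "$mt" ] && continue            # ignore blank lines
    seen=1
    if [ "$mt" != "$WASM_LAYER_TYPE" ]; then
      echo "non-wasm layer: $mt" >&2    # report the first unsupported type
      return 1
    fi
  done
  # An empty layer list fails instead of passing silently
  [ "$seen" -eq 1 ]
}
```

For example, assuming `crane` and `jq` are installed and the reference resolves to a single (non-index) manifest: `crane manifest ghcr.io/username/app-name:latest | jq -r '.layers[].mediaType' | only_wasm_layers`. A hello-world app should pass. An app with static assets, such as Finicky Whiskers, would be expected to fail.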

@vdice

Some findings from testing today:

### ttl.sh works?

I was able to reproduce getting a simple hello-world app running when using the ttl.sh registry. I'm still confounded as to why this works... You'll note that it hits the same error but then says 'already present on machine' (which is weird, as the image wasn't, or shouldn't have been, present). It only works sometimes, though:
Working:

```
Normal Scheduled 6s default-scheduler Successfully assigned default/simple-spinapp-86457f7d84-2vzbh to k3d-wasm-cluster-agent-0
Normal Pulling 6s kubelet Pulling image "ttl.sh/hello:10m"
Warning Failed 6s kubelet Failed to pull image "ttl.sh/hello:10m": rpc error: code = Unknown desc = failed to pull and unpack image "ttl.sh/hello:10m": failed to unpack image on snapshotter overlayfs: mismatched image rootfs and manifest layers
Warning Failed 6s kubelet Error: ErrImagePull
Normal Pulled 5s kubelet Container image "ttl.sh/hello:10m" already present on machine
Normal Created 5s kubelet Created container simple-spinapp
Normal Started 5s kubelet Started container simple-spinapp
```

Not working:

```
Normal Scheduled 18s default-scheduler Successfully assigned default/simple-spinapp-86457f7d84-7x6xg to k3d-wasm-cluster-agent-1
Normal Pulling 18s kubelet Pulling image "ttl.sh/hello:10m"
Warning Failed 15s kubelet Failed to pull image "ttl.sh/hello:10m": rpc error: code = Unknown desc = failed to pull and unpack image "ttl.sh/hello:10m": failed to unpack image on snapshotter overlayfs: mismatched image rootfs and manifest layers
Warning Failed 15s kubelet Error: ErrImagePull
Normal Pulled 13s (x2 over 14s) kubelet Container image "ttl.sh/hello:10m" already present on machine
Normal Created 13s (x2 over 14s) kubelet Created container simple-spinapp
Normal Started 13s (x2 over 14s) kubelet Started container simple-spinapp
Warning BackOff 12s kubelet Back-off restarting failed container simple-spinapp in pod simple-spinapp-86457f7d84-7x6xg_default(f5cc12ff-55a3-4e33-a824-4c201dd70454)
```

Attempting to use images from ghcr.io or docker.io led to the same "failed to unpack image on snapshotter overlayfs: mismatched image rootfs and manifest layers" error and never worked, which, as stated above, I believe is due to a containerd version older than 1.7.7.
### need containerd 1.7.7+
I saw similar behavior with all of: k3d:v0.10.0, minikube, and kind clusters (default/latest). I tried the `hack/provision-minikube.sh` script, but as far as I can tell it doesn't bump the containerd version. The latest kind image uses 1.7.5 (https://github.com/kubernetes-sigs/kind/blob/main/images/base/Dockerfile#L121), so I couldn't get `spin registry push`'d apps running there either.
Following the instructions to [build a custom kind image](https://kind.sigs.k8s.io/docs/contributing/development/#building-the-base-image), I built one with containerd bumped to 1.7.12 and brought up a cluster with it (you can too; the image is public): `kind create cluster --image vdice/kind:latest`. Tada! Running the hello-world sample app from a ghcr.io ref works just fine:

```
Normal Scheduled 7s default-scheduler Successfully assigned default/simple-spinapp-5f96f88d74-rnz2s to kind-control-plane
Normal Pulling 7s kubelet Pulling image "ghcr.io/vdice/hello:latest"
Normal Pulled 6s kubelet Successfully pulled image "ghcr.io/vdice/hello:latest" in 844ms (844ms including waiting)
Normal Created 6s kubelet Created container simple-spinapp
Normal Started 6s kubelet Started container simple-spinapp
```

> ### still can't run apps w/ add'l non-wasm layers
> As mentioned, the shim doesn't support the other layer types that may be included in a Spin app, for instance Finicky Whiskers with its many static assets. The image is pulled/loaded fine, but the app crash-loops, presumably because of the missing/unloaded data layers:
> ```
>   Normal   Scheduled  15m                default-scheduler  Successfully assigned default/simple-spinapp-865755598c-g72mv to kind-control-plane
>   Normal   Pulled     15m                kubelet            Successfully pulled image "vdice/finicky-whiskers:latest" in 9.307s (9.307s including waiting)
>   Normal   Pulled     15m                kubelet            Successfully pulled image "vdice/finicky-whiskers:latest" in 1.263s (1.263s including waiting)
>   Normal   Pulled     14m                kubelet            Successfully pulled image "vdice/finicky-whiskers:latest" in 1.526s (1.527s including waiting)
>   Normal   Created    14m (x4 over 15m)  kubelet            Created container simple-spinapp
>   Normal   Started    14m (x4 over 15m)  kubelet            Started container simple-spinapp
>   Normal   Pulled     14m                kubelet            Successfully pulled image "vdice/finicky-whiskers:latest" in 1.424s (1.424s including waiting)
>   Normal   Pulling    13m (x5 over 15m)  kubelet            Pulling image "vdice/finicky-whiskers:latest"
>   Warning  BackOff    3s (x68 over 15m)  kubelet            Back-off restarting failed container simple-spinapp in pod simple-spinapp-865755598c-g72mv_default(f773b310-071e-4898-a32a-0518f48c53a6)
> ```
> I'd definitely be curious to know if others have had experiences different from mine.
> ### contingency plans?
> For Spin apps loaded directly from their `spin registry push`'d OCI references and using the shim:
> - A sufficient containerd version (1.7.7+ or 1.6.25+) is needed on the k8s cluster. I haven't even attempted to survey the cloud offerings, but even the latest local distros (k3d, minikube, kind) don't appear to ship with sufficient versions. So how do we ensure success for users/customers with their pre-existing k8s distros?
> - For full support of all Spin apps, it appears that we need to add logic to the shim to handle the additional layer types that Spin may include in an app's OCI reference. We'd then need a new shim release, and to ensure that version is the one installed on user/customer k8s clusters.
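
The version requirement in the first bullet can be checked mechanically. Here is a minimal bash sketch. `containerd_meets_minimum` is a hypothetical helper, not part of any tool mentioned in this thread. It encodes only the thresholds stated above (1.7.7+ or 1.6.25+). It also *assumes*, without verification here, that any containerd 2.x release includes the needed fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spin, the shim, or Kubernetes).
# Exit 0 if a containerd version meets the minimums discussed in this
# thread: 1.7.7+ on the 1.7 line or 1.6.25+ on the 1.6 line.
# Exit 1 for too-old versions, 2 for strings it cannot parse.
# Accepts "1.7.12", "v1.7.5", or "containerd://1.7.11-k3s2" (the form
# Kubernetes reports for a node's container runtime).
containerd_meets_minimum() {
  local v="${1#containerd://}"   # strip the runtime prefix, if present
  v="${v#v}"                     # strip a leading "v", if present
  local major minor patch
  IFS=. read -r major minor patch <<< "$v"
  patch="${patch%%[^0-9]*}"      # drop suffixes such as "-k3s2"
  patch="${patch:-0}"            # treat "1.7" as "1.7.0"
  [[ $major =~ ^[0-9]+$ && $minor =~ ^[0-9]+$ && $patch =~ ^[0-9]+$ ]] || return 2

  # Force base 10 so components like "08" are not parsed as octal
  major=$((10#$major)); minor=$((10#$minor)); patch=$((10#$patch))

  if (( major > 1 )); then
    return 0                     # ASSUMPTION: 2.x releases include the fix
  elif (( major < 1 )); then
    return 1
  elif (( minor > 7 )); then
    return 0
  elif (( minor == 7 )); then
    (( patch >= 7 ))             # 1.7.x needs 1.7.7 or later
  elif (( minor == 6 )); then
    (( patch >= 25 ))            # 1.6.x needs 1.6.25 or later
  else
    return 1                     # 1.5 and older
  fi
}
```

On a live cluster, `kubectl get nodes -o wide` shows each node's runtime in the CONTAINER-RUNTIME column (e.g. `containerd://1.7.5`). `kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.containerRuntimeVersion}'` prints just those strings. Either value can be passed to the function as-is. By these thresholds, the 1.7.5 (AKS, latest kind) and 1.7.2 (EKS) versions reported in this thread fall short, while 1.7.10 and 1.7.12 pass.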

@radu-matei 

> AKS on Ubuntu runs containerd 1.7.5: https://github.com/Azure/AKS/blob/master/vhd-notes/aks-ubuntu/AKSUbuntu-2204/202401.03.0.txt#L4

@vdice 

> For those who would like to test a k3d image with containerd bumped to the minimum version required to handle wasm layers, try ghcr.io/vdice/containerd-wasm-shims/examples/k3d:v0.10.1. (It is basically a snapshot of the project's main branch as of this writing, including https://github.com/deislabs/containerd-wasm-shims/pull/195.)

@radu

> After a quick search:
> - AKS runs 1.7.5 and will support 1.7.7+ relatively soon (https://github.com/Azure/AKS/blob/master/vhd-notes/aks-ubuntu/AKSUbuntu-2204/202401.03.0.txt#L4).
> - EKS currently runs 1.7.2 (https://github.com/awslabs/amazon-eks-ami/blob/main/CHANGELOG.md#L1157), but there has been no progress on this and no response from the EKS team (https://github.com/awslabs/amazon-eks-ami/issues/1526).
> - For GKE, I could not find the containerd version without creating a cluster (https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd).
>
> We need to talk with the EKS and GKE folks to understand their timelines for a supported containerd version. Also, containerd 2 is coming soon, and the feature we need is already several patch releases old (it landed in 1.7.7, and 1.7.13 was recently released).
>
> The shim should work on GKE out of the box (containerd 1.7.10): https://cloud.google.com/container-optimized-os/docs/release-notes/m109#cos-109-17800-66-54_

@vdice
Contributor

vdice commented Feb 13, 2024

This should be resolved by #48. Perhaps @calebschoepp (as original issue creator) can confirm and close?

@endocrimes
Contributor

Less "fixed" and more "our samples probably work if you use k3d". We need to document the containerd version requirements and include an example of what to do if you're using an older containerd.

@bacongobbler bacongobbler added the documentation Improvements or additions to documentation label Feb 20, 2024
@bacongobbler
Contributor

I'm going to throw this under "must have" as this seems like a fairly critical piece of documentation, and it should be pretty easy to add to our prerequisites page (if it isn't there already).

@calebschoepp
Contributor

@vdice Using the new k3d version, I was able to run an app with an image pointing to ghcr.io. Agreed that this is a documentation issue at this point.

I suggest that we close this issue and file a new one to track documenting the workarounds. If I don't hear any pushback over the next day or two, I'll go ahead and do that.

@calebschoepp
Contributor

Work is now tracked in #105


5 participants