feat: implement distributed image store #7120

Draft: wants to merge 1 commit into base: main

smira
Member

@smira smira commented Apr 21, 2023

This is a PoC/experiment.

Each Talos node runs registryd component which acts both as a registry and a fan-out service. For local requests, registryd serves manifests/blobs from the containerd content storage. For incoming requests, registryd fans out requests to other nodes (cluster members), finding the first one which has the content.

I had to disable content store deduplication, as otherwise containerd drops original layers immediately.

One question that is not fully solved is how to inject registryd. In my testing, I injected it as an endpoint in the registry mirror scheme, so if registryd has nothing, containerd falls back to the "upstream" registry/mirror. Some work is still needed to support this for * redirects.

There are unresolved issues with images protected by authorization. At the moment registryd never resolves tags (it defers that to the upstream registry), but it might still deliver images without pull secrets if given the proper digest.
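
To make the "never resolves tags" behaviour concrete, here is a small Go sketch of how a reference could be classified as a digest or a tag. The function name `IsDigestReference` is hypothetical, and the sketch assumes only sha256 digests are served; anything else is treated as a tag and left to the upstream registry.

```go
package main

import "regexp"

// sha256Digest matches references of the form "sha256:<64 lowercase hex chars>".
// Assumption: only sha256 digests are considered; any other reference is
// treated as a tag in this sketch.
var sha256Digest = regexp.MustCompile(`^sha256:[a-f0-9]{64}$`)

// IsDigestReference reports whether a manifest reference is a content digest
// (which could be served from the local content store) rather than a tag
// (which would be deferred to the upstream registry).
func IsDigestReference(ref string) bool {
	return sha256Digest.MatchString(ref)
}
```

A check like this does not address the authorization concern: anyone who knows the digest would still be served the content without pull secrets.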

How to secure registryd from access outside of the cluster?

Signed-off-by: Andrey Smirnov <[email protected]>
@smira smira added this to the v1.5 milestone Apr 21, 2023
@smira
Member Author

smira commented Apr 21, 2023

This requires something like the following in the machine config (the first endpoint is registryd; the second is my own registry mirror, unrelated):

            ghcr.io:
                # List of endpoints (URLs) for registry mirrors to use.
                endpoints:
                    - http://127.0.0.1:3172
                    - http://172.20.0.1:5004

@smira
Member Author

smira commented Apr 21, 2023

Of course, the final solution should be opt-in, configurable with a single flag that would:

  • reconfigure CRI not to drop image layers from content store
  • reconfigure mirror endpoints to inject "registryd" endpoint as the first one

Open questions:

  • images with pull credentials - need to dig more into that... how the auth is applied to blobs/manifests, what if the layer is shared?
  • securing access to registryd from outside of the cluster
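
The endpoint injection step in the first list above could look roughly like the Go sketch below. The helper name `injectRegistryd` is hypothetical; the registryd URL in the usage note is the one from the machine config example earlier in this thread. The sketch only reorders a list of endpoint URLs and does not deal with `overridePath` or `*` mirror entries.

```go
package main

// injectRegistryd returns a new endpoint list with registrydURL as the first
// entry, followed by the original endpoints in their original order. If the
// registryd endpoint is already present, it is not duplicated.
func injectRegistryd(endpoints []string, registrydURL string) []string {
	out := make([]string, 0, len(endpoints)+1)
	out = append(out, registrydURL)

	for _, endpoint := range endpoints {
		if endpoint != registrydURL {
			out = append(out, endpoint)
		}
	}

	return out
}
```

For example, `injectRegistryd([]string{"http://172.20.0.1:5004"}, "http://127.0.0.1:3172")` yields the two-entry list shown in the machine config above, with registryd tried first and the existing mirror kept as the fallback.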

break
}

if err != nil {
Member

this seems not needed, which err case is this handling?

Member Author

this handles the case when we get IsNotFound for all namespaces we checked for

@smira smira mentioned this pull request Jun 5, 2023
@smira smira modified the milestones: v1.5, v1.6 Aug 2, 2023
@smira smira mentioned this pull request Aug 2, 2023
@ruifung

ruifung commented Aug 27, 2023

> • reconfigure mirror endpoints to inject "registryd" endpoint as the first one

I just found this and had a few thoughts, how would this work if the config has something like this?

docker.io:
    overridePath: true
    endpoints:
      - https://harbor.example.com/v2/dockerhub/

And don't mirror endpoints require listing every single source separately?

@smira
Member Author

smira commented Aug 29, 2023

@ruifung I don't think I got your question, but this is an early PoC, not a real implementation yet, so some details are not known

@PrivatePuffin

> > • reconfigure mirror endpoints to inject "registryd" endpoint as the first one
>
> I just found this and had a few thoughts, how would this work if the config has something like this?
>
> docker.io:
>     overridePath: true
>     endpoints:
>       - https://harbor.example.com/v2/dockerhub/
>
> And don't mirror endpoints require listing every single source separately?

Afaik we can just use this as a forward for the image store?

@smira smira mentioned this pull request Dec 1, 2023
@smira smira modified the milestones: v1.6, v1.7 Dec 15, 2023
@onedr0p

onedr0p commented Feb 13, 2024

@smira I'm not sure if you saw this project or not but it works great on Talos. It seems like what you want to do here, maybe you'll find some ideas looking thru the source.

https://github.com/XenitAB/spegel

@PrivatePuffin

> @smira I'm not sure if you saw this project or not but it works great on Talos. It seems like what you want to do here, maybe you'll find some ideas looking thru the source.
>
> https://github.com/XenitAB/spegel

That's great software actually, thanks for the tip!

@smira
Member Author

smira commented Feb 13, 2024

> @smira I'm not sure if you saw this project or not but it works great on Talos. It seems like what you want to do here, maybe you'll find some ideas looking thru the source.
>
> https://github.com/XenitAB/spegel

yes, this was the inspiration, but there's probably more stuff we could do more easily; this is not done yet

@phillebaba

@smira is there something specific in Spegel that you do not want, which is causing you to implement your own embedded registry?

@smira
Member Author

smira commented Feb 14, 2024

> @smira is there something specific in Spegel that you do not want, which is causing you to implement your own embedded registry?

it's not that Spegel has anything wrong, but rather it's a generic solution, while on Talos Linux we have more control and more information, e.g. we have the discovery data. So it should be easier to implement and run it on Talos.

Also it's our philosophy to keep things simple for the end users, just flip the switch and you get a distributed image cache.

@phillebaba

I agree with you, my thought was that Talos could embed Spegel the same way k3s does. You don't even have to use the libp2p router if you have some other way of routing the traffic. Most components are interfaces so it should be pretty easy to just replace the router with a custom implementation.

@PrivatePuffin

> I agree with you, my thought was that Talos could embed Spegel the same way k3s does. You don't even have to use the libp2p router if you have some other way of routing the traffic. Most components are interfaces so it should be pretty easy to just replace the router with a custom implementation.

I think this is a great suggestion, which is also easier to maintain.

@smira smira modified the milestones: v1.7, v1.8 Apr 4, 2024
@PrivatePuffin

> > @smira is there something specific in Spegel that you do not want, which is causing you to implement your own embedded registry?
>
> it's not that Spegel has anything wrong, but rather it's a generic solution, while on Talos Linux we have more control and more information, e.g. we have the discovery data. So it should be easier to implement and run it on Talos.
>
> Also it's our philosophy to keep things simple for the end users, just flip the switch and you get a distributed image cache.

I've done some implementation of Spegel now, and I have to say: it basically does precisely what you describe here... It's pretty much "apply and forget".


This PR is stale because it has been open 45 days with no activity.

@github-actions github-actions bot added the Stale label Aug 13, 2024
@smira smira modified the milestones: v1.8, v1.9 Aug 30, 2024

This PR is stale because it has been open 45 days with no activity.

@github-actions github-actions bot added the Stale label Oct 15, 2024
@PrivatePuffin

I'm pretty sure this isn't stale and @smira is still working on or thinking about this.

@github-actions github-actions bot removed the Stale label Oct 16, 2024
@phillebaba

Has there been any decision made on Spegel vs. implementing your own solution?

@PrivatePuffin

PrivatePuffin commented Oct 17, 2024

> Has there been any decision made on Spegel vs. implementing your own solution?

He literally said above they had already made a decision.
Though I think they should just include Spegel and call it a day.


github-actions bot commented Dec 2, 2024

This PR is stale because it has been open 45 days with no activity.
