
Discussion thread for the next version of s6-overlay #358

Open
skarnet opened this issue Nov 28, 2021 · 50 comments

Comments

@skarnet
Contributor

skarnet commented Nov 28, 2021

I am in the process of writing a new version of s6-overlay. It is slowly shaping up, and although there is still a lot of work (and then a whole lot of testing!) ahead, I'm starting to feel happy about the result.

It's all the existing functionality of s6-overlay, modernized with current features of skaware, using s6-rc so people can migrate and take advantage of better dependency management if they choose to (but the current service format is obviously still supported), and made much smaller, faster, and more readable.

The drawback is that it's using shell scripts in several places, so it adds a dependency on a shell. I don't think it's a big drawback; if there are container images that do not include a shell, I will provide a package that installs a minimalistic standalone /bin/sh.

Anyway, I am opening this issue because I will have a lot of questions for @jprjr, and maybe s6-overlay users can answer as well. As everyone knows by now, I can handle the programming side of things, but the ops / Dockerfile / packaging / installation side isn't my area of expertise and I will need advice on how to make things run smoothly.

My first question is for s6-overlay users; it's about the logutil-newfifo utility. Does anyone actually use this? AIUI, it does two completely different things:

  • It creates a fifo with given permissions. This can easily be replaced with mkfifo and (in some cases) chown, or s6-mkfifo and s6-chown to avoid depending on anything coreutils-like. If it's just about creating a fifo, I don't think such a wrapper is necessary.
  • It stores the reading end of the fifo into a fdholder. But it does so not with the identifier that the user provides, but with a random identifier based on the user argument, and does not give the complete identifier back to the user! So it's unusable, because there is no way to retrieve the stored fd.

The way I see it, the custom fdholder is useless, and logutil-newfifo is only a bloated wrapper around mkfifo. Do people actually use the fd-holding part?

I want to scrap logutil-newfifo entirely, as well as the custom fdholder. I believe this is an early experiment from @glerchundi to store fifos into a fdholder, based on conversations we had at the time; but as is, it clearly doesn't work. s6-rc comes with its own automatically managed fdholder, invisible to the user; I believe that people who need reliable logging pipes will benefit from migrating their services to it in the new version of s6-overlay.

My second question is really for @jprjr and is about overlay isolation. Currently we install a lot of stuff into /usr/bin; I find this pretty intrusive for an overlay. Aesthetically, I would much prefer having the overlay stuff out of the way, especially all the internal stuff that users aren't supposed to interact with.

I really want to install skaware under /package, with access to binaries via /command. Not for slashpackage advocacy here, but simply because no existing Docker image uses slashpackage so everything would definitely be completely out of the way and guaranteed not to conflict with anything else. I also want to move the s6-overlay scripts themselves to some other location (under /package or not). Of course, for compatibility, we would need to provide symlinks in /usr/bin (and a /bin/execlineb symlink as well), but that would be an optional layer; people could remove all reliance on the /usr/bin absolute paths to s6-overlay commands, and just trust their PATH, and then they could do away with the symlink layer. Moving stuff out of /usr/bin would also eliminate any and all usrmerge woes.

I have already started meddling with paths, doing stuff in /run instead of /var/run, and ensuring there is a /var/run -> /run symlink for compatibility. It simplifies a few things, and even allows for automatically mounting a tmpfs on /run in the read-only-root case when the user has forgotten to do it in the Dockerfile. Configuration stuff goes under /etc/s6-overlay, if there's a conflict then it's 100% on the image.

This is becoming a brain dump, I'll stop here; I just need an answer to the logutil-newfifo question and hope to open the discussion on moving stuff around with compatibility symlinks.

@GreyXor

GreyXor commented Nov 29, 2021

Hello @skarnet,
As a regular and dedicated user of s6-overlay for my Docker images, it's a pleasure to see that this nice project is getting a new version. I will follow this closely and help out with contributions, tests, ideas and feedback.

I did not need to use the logutil-newfifo wrapper. If needed I would use mkfifo.

@jprjr
Member

jprjr commented Nov 29, 2021

Awesome! I'm excited to see what this new version looks like.

Re: packaging, the current layout of the repo could be re-worked to be more obvious (we have a root folder named "builder", and in that is "overlay-rootfs", which is where the actual code is). Having something that more closely resembles the layout of other skarnet packages (a "src" folder, "tools", etc) would be great, along with a configure script for local installation. I think one issue distros have/had is they'd prefer not to use our builds-with-binaries-included. We had the "nobin" version of the package, but that still creates a weird one-off type of package for them.

As far as CI/CD, we could probably simplify the binary-building. If I recall correctly, some of the earlier issues with Travis were:

  • timing/timeouts
  • Docker was unavailable

The timing/timeouts issue is basically gone now - we used to use qemu for the non-x86 builds (to generate the sysdeps folders), which was pretty slow and the main cause of timing, but that's been long-fixed. I don't 100% know if Travis ever got great Docker support (a lot of the CI/CD was written when Docker was still very new), but GitHub Actions definitely has way better support. We could likely switch to just building everything, from a single repo, in Docker, using a standard alpine image.

I agree on scrapping logutil-newfifo - I think I tried using it a handful of times, then resorted to just calling s6-mkfifo and s6-chown.

As far as slashpackage - the way I've always seen it, placing binaries/scripts into /usr/bin is preferable, because I just don't trust that PATH has been set correctly, or accidentally cleared, etc. My (anecdotal) experience is a lot of people building Docker images aren't that familiar with Unix concepts, and we'll likely get issues opened where somebody's PATH has been wiped out somehow. Whether the binaries are actually in /usr/bin, or they're somewhere else and then symlink'd in doesn't make a lot of difference to me.

As far as conflicts go - since users will likely expect all the s6-related programs to be in their PATH, if there is a conflict with an existing program/package, it will still be an issue. I don't think the physical location will address that problem (and it hasn't really been a problem since execline dropped the import program).

That whole usrmerge thing has been a huge source of frustration, so I would vote yes to slashpackage with compatibility symlinks in /usr/bin, and to try to stay out of /bin entirely.

I've got one question: how hard would it be for the s6-overlay execline scripts to have a template for the #!/ line?

If we made it so they could be updated to either #!/bin/execlineb or #!/command/execlineb at install-time (assuming we create say, a configure script), we could have the distributed, ready-to-run tarball use the #!/command/execlineb variant, and distros could package s6-overlay to use their preferred option.

The whole reason for the binary installer is for dealing with /bin/execlineb and usrmerge. All our other scripts go into /usr/bin, and tar will wind up removing /bin in the case where /bin is a symlink and we're extracting that /bin/execlineb symlink. So, I'd really like to find a way to not require /bin/execlineb - so we can keep it out of the tarball - and therefore drop the binary installer. So if we could build and install the binaries as slashpackage, and install the s6-overlay scripts with the slashpackage-based paths for the execline interpreter, that would seem ideal.

As far as /bin/sh goes, while I think it's super cool to be able to build images without a shell, I think in practice, 99.999% of images are going to have a shell. So if having parts of s6-overlay require /bin/sh makes it simpler/easier, I'm fine with it.

@skarnet
Contributor Author

skarnet commented Nov 29, 2021

Awesome! Trying to address everything, in no particular order.

  • I agree conflicts may exist no matter where binaries are installed, but I'd rather have PATH conflicts than overwrite /usr/bin/very-important-binary on a user's image. :-)
  • I have begun gathering your add-ons (justc-envdir, justc-installer and s6-overlay-preinit) in a new package called s6-overlay-helpers. The current implementation of s6-overlay-preinit is pretty dangerous (do you really want to have an invokable suid root binary hanging around?); so I have rewritten the early init so that:
    • s6-overlay-preinit is now basically a glorified execline if command that runs its child block as root and drops privileges afterwards. We can run a whole script as root now, even in USER mode, which makes it easier to maintain.
    • It will refuse to run unless it's pid 1, so it can only be invoked as pid 1 at the start of the container and it's harmless to have around.
    • /init now basically reads s6-overlay-preinit { preinit } { stage0 }, with preinit running as root, and stage0 running as whatever.
  • You seem to be saying that if execlineb is not in /bin, then we can do away with justc-installer entirely? That would be fantastic. With slashpackage, execlineb would be in /command, and nothing would ever touch /bin. I don't think it's worth it to make it configurable; we can hardcode policy on the stuff we provide in the overlay. Users who want to use execline can still do so; they can either use #!/command/execlineb, or install the compatibility symlinks and use #!/usr/bin/execlineb. Yes, that means that existing user scripts may need updating, but it's a single sed invocation (see the sketch after this list).
  • About PATH: PATH is set by init and inherited by the whole supervision tree as well as the stage 2 script, so it will be correct in all the places where our binaries or scripts are called, and if it is not, it's a bug (and we can fix it). The only time where PATH definition might be an issue is if someone logs into the container and their shell doesn't have the correct PATH (e.g. doesn't contain /command). That sounds like something that can be documented in a FAQ, possibly with a Dockerfile excerpt to ensure all shells have the correct PATH. And in the worst case, users can always install the /usr/bin symlinks.
  • So the way I see it, there would be three tarballs for users to install depending on their needs:
    • An architecture-dependent set of compiled skaware binaries (plus s6-overlay-helpers), that expands in /package and /command. Most people will use this, except those who don't want prepackaged binaries.
    • An architecture-independent overlay, containing scripts and configuration files, that expands in /package and /command (for scripts) as well as /etc/s6-overlay (for configuration files) and possibly /init if we don't want to change the Dockerfile instructions. That is the overlay itself, it only depends on the presence of the binaries, whether they are provided by the previous tarball or by the image.
    • An architecture-independent forest of symlinks, that expands in /usr/bin. This one is completely optional: for paranoid users, for users who are calling s6 or s6-overlay programs with hardcoded /usr/bin paths and need it to transition, and for people who are using an image where s6 binaries are available in /usr/bin (they shouldn't, they should be in /bin, but if there's one benefit to usrmerge, it's that it makes it not matter).
  • Even if the overlay expands to different locations, existing user files in /etc/services.d, /etc/cont-init.d etc. are obviously still supported (they're called "legacy services" in the init messages, because we all love to hate that kind of nagging).
  • I will delay writing my standalone-shell-builder package if in practice all images have a shell - but if we get a report that some shell-less image attempted to use s6-overlay and didn't work, it's something that I will absolutely fix.
  • I plan to make s6-overlay buildable without Docker and with a very reasonable amount of assumptions on the build machine; I want to be able to build it from my own VMs that don't have Docker, and if you want to use Docker or rewrite CI/CD for it, I intend to make the Dockerfiles or Travis files super simple to write, basically call make with the right options.
  • Building s6-overlay is not about compiling C, but about integration:
    • Fetch a cross-toolchain for the target arch (I have new ones on skarnet.org, we can host them on GitHub if you want)
    • Fetch and cross-build skaware
    • Create the tarball with the skaware binaries
    • Create the tarball with the overlay
    • Create the tarball with the symlinks
  • Because of that, the layout can't be similar to regular skaware packages, with a configure script, etc. However, it should be similar to (but a lot simpler than) lh-bootstrap, which is also an integration package. I will try hard to make everything controllable in a single Makefile, so your integration scripts can just be about invoking make.
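For illustration, the single sed invocation mentioned above might look something like this (a sketch, assuming user scripts live under /etc/cont-init.d and /etc/services.d and currently start with #!/usr/bin/execlineb):

# rewrite the interpreter line of existing user scripts to the slashpackage path
find /etc/cont-init.d /etc/services.d -type f \
  -exec sed -i 's|^#!/usr/bin/execlineb|#!/command/execlineb|' {} +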

What do you think?

@jprjr
Member

jprjr commented Nov 30, 2021

This all sounds awesome!

  • The "glorified if block that runs as root" sounds like a great replacement for the preinit.
  • Now that I've thought about it more, agree on using #!/command/execlineb as the interpreter line. If a distro wants to package the scripts with a different line, they can just use sed at install-time to modify them.
  • Being able to get the work done with just make sounds awesome. Don't care too much about where/how the cross-toolchain is hosted, maybe have both on skarnet.org and github as a backup?
  • The 3-tarball route sounds like a good solution, too - that way we don't have the whole "the one without binaries is the exception" situation we have right now, we instead have a definitive "this is the source" tarball that distro maintainers can pull down.

I mean long-story short, I'm in full agreement here.

@robinroestenburg

My first question is for s6-overlay users; it's about the logutil-newfifo utility. Does anyone actually use this?

@skarnet FWIW I have been using that in all containers that we use at @code4me, but only to set up log fifos for the correct user. It is always used like this in a cont-init.d script:

if { logutil-newfifo -o nginx /var/run/itrp/nginx-access-log-fifo }
if { logutil-newfifo -o nginx /var/run/itrp/nginx-error-log-fifo }

The logutil-newfifo was doing a bunch of things that seemed important, so I just used that to be sure I was 'doing it right'. No problem for me to just use s6-mkfifo and s6-chown 👍
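For reference, an equivalent without logutil-newfifo could be written directly in the same cont-init.d script, something like the sketch below (the 0600 mode is an assumption; plain chown is shown because s6-chown wants numeric uid/gid rather than the nginx account name):

if { s6-mkfifo -m 0600 /var/run/itrp/nginx-access-log-fifo }
if { chown nginx /var/run/itrp/nginx-access-log-fifo }
if { s6-mkfifo -m 0600 /var/run/itrp/nginx-error-log-fifo }
if { chown nginx /var/run/itrp/nginx-error-log-fifo }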

@jprjr
Member

jprjr commented Nov 30, 2021

We could probably just shorten/simplify logutil-newfifo, and remove all the fdholder parts - it's likely the users that are using it, aren't using any of the fdholder stuff that logutil-newfifo does (since that part is broken anyway). That way we keep existing scripts working. Maybe throw in a nag message saying it's deprecated?

@dreamcat4

Hey, sorry I missed this conversation. I don't have an opinion to give here exactly, but more of an observation: the Docker project itself has already created its own convention for where to put special files into a running container, for basic infrastructure stuff like DNS or whatever else is needed. And where does Docker put them? I believe they are hidden dot-files inside the / root folder, like /.dockerenv.

So if you were to view s6 here as a similar type of infrastructure (which is what I personally believe), then at least to me it would make sense to put almost everything (the binaries and toolchain) into a hidden root folder called /.s6, or something along those lines, and only have a minimum of other required files sitting outside of it: things such as /init and /etc/s6 or whatever else is required for these things to work properly.

Now I could be wrong about all this; I am just observing what Docker itself does behind the scenes. Of course Docker has since evolved into OCI (Open Container Initiative) images, which have developed into their own open standard, so maybe they can be consulted too, and people over there can give better guidance than me on whether there are any other applicable conventions or standards relevant to this new s6 toolchain.

Personally I am just really happy to hear about this ongoing development work! It is very helpful; thanks so much for getting around to it.

Elsewhere on the internet I also notice that Void Linux is getting the suite-66 higher-level user-facing tools, thanks to some porting from the Arch Linux versions. Which makes me wonder whether something like suite-66 on top would be of any value here later on, after you do your part. The reason I bring it up is just to ask whether it's actually worth keeping compatibility with those other higher-level user tools.

In terms of specific suggestions: sorry, I don't have much, because the existing s6-overlay was pretty good. But I am looking forward to the future improvements that come along with the rewrite. (I don't mind updating my container images!)

@skarnet
Contributor Author

skarnet commented Nov 30, 2021

s6-overlay is an overlay at the user level, so it's still part of the final user image; I don't think it would be proper to hide everything under a dot-directory. You want to access binaries in the overlay via PATH, after all, and having a dot-directory in your PATH is questionable. I think it's good policy for Docker itself, because whatever state it needs to keep is not officially part of the image so it needs to be as carefully hidden as possible, but not for s6-overlay.

66 is Obarun's higher-level user interface to s6, I didn't know that Void wanted to adopt it; good for them if they do, but I don't think it's suited to containers, which generally have a much lower complexity level than real machines and don't need a fancy UI. If anything, I'll probably port s6-overlay to the next version of s6-rc when it's out (probably late 2022), but even then there should be little gain.

@dreamcat4

  1. Ah yes, sorry for the confusion: the official Void core team has not made any commitment to using s6, at least not yet.

What I meant is that mobinmob has created some alternative service infrastructure for Void, as an additional side repo, void-66, which can be added to Void. It works alongside the existing runit service trees and is compatible with them (a mixture), at least at the granularity of an entire service tree. So this is a good opportunity for others to try it out now on the Void distro, with lower friction / less difficulty. But yes, hopefully it will continue to gain popularity over there.

Hmm... well yes, this has nothing to do with Void here. My original question was really me wondering whether suite-66 would be applicable inside containers, for maybe an easier service configuration (not knowing how the new suite-66 tools work myself). However, now I am thinking there is a different usage for it: we can have it on the outside, managing the running of the containers. This is in fact the Red Hat strategy for systemd + podman, whereby you write a systemd service file and it launches the podman container (and podman is a direct replacement for Docker here, in case you were not aware of it yet)...

So that would be pretty interesting too, imho. Of course it does not affect your porting efforts for s6-overlay here; it's just something of more general interest. Because quite frankly, all these people running around using systemd for podman (for switching over to podman)... it did not sound so appealing to me! hehe. And maybe the Obarun suite-66 could be useful for managing that side of things.

@skarnet
Contributor Author

skarnet commented Dec 2, 2021

Hey @jprjr,
Can you please remind me exactly what the point of fix-attrs is? What was the problem that it's supposed to solve?
Because as is:

  • I don't see how it can work with a USER directive, if we're trying to change stuff to root while running as a normal user it's going to crash and burn, so I must not be seeing the big picture.
  • More interestingly, depending on what exact problem it's currently solving, there is a distinct possibility that the new way of starting the containers and organizing the files will sidestep this problem entirely and that we can get rid of the whole fix-attrs shtick, which would be pretty nice.

Thanks!

@jprjr
Member

jprjr commented Dec 2, 2021

@glerchundi wrote most of fix-attrs, if I recall correctly. I'll be honest, I don't really use it, I tend to just write scripts in cont-init.d.

I think the intended use-case is volumes - if you're mapping in a volume, odds are it's going to have nonsense permissions (from the container's perspective). So say you want to have your nginx logs go to a volume on the host, you can use fix-attrs to ensure the log folder is always writable by the nginx user. You can ensure a configuration file is always readable, and so on.

It definitely won't work with the USER directive right now. But that new "run-things-as-root-in-an-if-block" program could be a solution.

@skarnet
Contributor Author

skarnet commented Dec 2, 2021

Well that's the thing, it cannot be a solution. There is only one root block, the preinit, and as the name implies it runs super early, like, first thing at init time, the supervision tree hasn't been started yet, printcontenv doesn't work yet, working copies of read-only stuff haven't been made in /run (previously /var/run) yet, etc. and there's no way the current fix-attrs files would work in preinit.

I'd rather keep it as is and mention that it won't work with USER containers, or, better yet, scrap it entirely and document that the user Dockerfile should make sure that host volumes are mounted with the proper permissions. Can I mark it as deprecated?

@jprjr
Member

jprjr commented Dec 2, 2021

I'm ok with deprecating it. Getting rid of it also lets us get rid of that forstdin fork, fix-attrs is the only thing using it.

The only thing fix-attrs does that's... maybe? useful is being able to provide fallback UID/GIDs if the requested user/group doesn't exist, but I doubt anybody uses that.

@skarnet
Contributor Author

skarnet commented Dec 3, 2021

Oh, I haven't even mentioned that yet :-) but with the new version you'll be able to get rid of:

  • justc-forstdin (the overlay works with new versions of execline, and some parts are in shell)
  • justc-installer (no /bin symlink shenanigans)
  • s6-overlay-preinit (replaced by the s6-overlay-helpers package containing a s6-overlay-suexec binary)
  • skaware (building skaware is done by the Makefile in the s6-overlay package instead)
  • socklog (new s6 includes a s6-socklog binary). On the flip side, we will probably have to rewrite socklog-overlay for people who want a syslogd service.
  • socklog-static (s6-socklog is a part of the arch-dependent s6-overlay tarball)
  • justc-envdir (the binary will be provided in s6-overlay-helpers package)
  • musl-cross-make (building s6-overlay will fetch prebuilt toolchains from the web, toolchain building should be completely orthogonal to this; we can host the useful toolchains on github if necessary, and I can probably upload my toolchain builder package at some point)

When I say "simpler", I mean it. ;-)

@skarnet
Contributor Author

skarnet commented Dec 5, 2021

I just pushed everything I have to the v3 branch of s6-overlay.

It builds. (I had to fix a few quirks in the skaware build system for this; this will require me to release new packages, but in the meantime, building with the latest git commits works. I need to make toolchains for more archs, but everything builds properly on the meager amount of supported archs I have.)

It is, however, wildly untested. Testing will happen over the next few weeks; I am submitting this early version so that @jprjr can start reworking the CI/CD stuff. I have written a little building primer here; as promised, it's just a call to make with some flavour variables. A builder should be easy to whip up, the requirements are here. (My main goal was to be able to build it directly on my server without making a container for it.) Any basic dev image should do.

Do not try to use a custom toolchain unless you know exactly what you are doing. clang isn't supported yet, may happen in a later version.

The main README.md file is inaccurate for v3 on several points. We'll fix them one by one in time.

@jprjr, what would make testing a lot easier is if I had access to a host with a Docker daemon running and an endlessly breakable, wipable and reinstallable container image. Do you think you could find that for me?

(Edit: forgot to mention that the s6-overlay-helpers package, which provides justc-envdir and s6-overlay-suexec, is now hosted here; it uses a skaware build system so these binaries are all included in the arch-dependent build.)

@jprjr
Member

jprjr commented Dec 10, 2021

Hey! Just providing some early feedback:

  • that build process was so easy! I suspect the CI/CD is going to be very simple, I'll try to get some work done on that in the next few days / this weekend.
  • I also really like the layout of /package, and being able to get the various package versions via filesystem
  • having everything just build in one repo is also very nice and simplifies everything so much, holy cow.

Regarding a Docker host, I'll see what I can find. I think I have a spare Raspberry Pi around, I could install Docker on that. Send me a public SSH key (email is fine), I'll get a fresh install of Ubuntu + Docker and get that all configured.

@skarnet
Contributor Author

skarnet commented Dec 28, 2021

(I eventually managed to get Docker working on an Alpine VM, and to practice with it. Thanks for the offer though!)

Hey @jprjr and folks,

I hope the holidays find y'all well. Testing has been going great and after a few fixes and modifications, everything - or, more precisely, almost everything - appears to be working.

The exception is USER, which has been a pain in the ass from the start. So today, let's talk about USER.

One of the best features of Unix is privilege separation - running different programs in different privilege domains, i.e. different uid/gids. Containerization is but an extension of privilege separation: different services are run in different namespaces so unwanted interaction is made even less possible.

When Docker - or any container manager, fwiw - is used to run a single process, the USER directive makes sense: not only is the process isolated from the host, but it doesn't even have root privileges in its own namespace, so any potential damage is mitigated even more. All good.

But although one-container-per-process was a policy that Docker tried their damnedest to make a thing, it never really caught on, and for good reason: this policy sucks. It's much heavier a system organization than it needs to be, and most importantly, it doesn't accurately map the natural and intuitive organization for privilege separation, which is one container per service.

One-container-per-service is, overwhelmingly, the way people use Docker nowadays, and it's a good policy. And under that policy, the USER directive still kinda makes sense: even if you have several processes in your service, it's still one logical unit, nobody wants to bother with privilege separation inside of that unit, so running everything as the same USER is fine.

s6-overlay, like other inits for Docker, aims to support that use case. If you're only running one application in your container, with a few support services that are all dedicated to the application, it's nice to have a real init, and a supervision tree, but privilege separation is secondary, and running the whole supervision tree as the USER is a bit hackish (I had to modify s6-linux-init significantly to make it work) but reasonable; so, despite it requiring effort, it makes sense to bend over backwards a bit so that it works. I still need to iron out a few kinks but everything should be smooth as silk by the end of the week.

However, the fact that we are now running real inits inside containers has had a predictable consequence: inevitably, we are now running whole systems inside containers. Not only single applications that need a few support services, but quasi-VMs with real initialization sequences, a syslog daemon, and two or three kitchen appliances. And this is where my teeth start grinding a bit.

I am totally okay with running whole systems inside containers. A container is a VM-lite; it's no more of a problem to run a full system in a container than it is to run one in a VM. s6 supports this; s6-overlay supports this; base images support this; all is fine and dandy. However, at some point, the container stops hosting a single service; and if you're running a whole distro inside a container, it's not a particularly good idea to run it all under a single user anymore. At some point, you want to envision privilege separation again. (And in a few years you'll want to run containers inside your container, obviously, because that's how tech "evolves". But I digress.)

And so, I think we need to draw a line saying: beyond that point, USER is not supported anymore.

It's already impossible to support all the old s6-overlay's features with USER: for instance, fix-attrs, whose whole concept is chowning stuff, will definitely not work with USER. We're deprecating it, so it's fine; all in all, I think it's reasonable to support USER for the main s6-overlay tarball in the long run. But I kinda want to draw the line at syslogd, and by extension, at any tarball we may provide in the future that implements services over the main overlay architecture.

syslogd adds two longruns, syslogd that reads and processes logs from the /dev/log socket, and syslogd-log that dispatches and stores them into several log directories depending on the priority and facility of the log messages just like a traditional syslogd would do. It also adds a oneshot, syslogd-prepare that makes sure the log directories are present and accessible. Those services are not support services for the application running in the container; they are system services. It makes sense to have a dedicated user running syslogd and another dedicated user running syslogd-log; and syslogd-prepare needs root privileges in order to create subdirectories in /var/log if they don't already exist.

Supporting USER in the syslogd-overlay tarball would require a lot of work and would not make much sense, I think. If an application requires a syslogd service, it basically requires a full system; the syslogd subsystem should work independently from the application, under different users, and the container hosting the application should run as root and implement its own privilege separation (which s6-overlay does quite well), and the admin should make sure the application runs under its own uid despite the container itself running as root. (It's as easy as ENV S6_CMD_ARG0="s6-setuidgid application_user", if the application runs as CMD!)
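For example, a minimal Dockerfile sketch of that setup (the base image, user name and application path are made up; /init still runs as root, but the CMD runs under application_user):

FROM ubuntu:latest
# ... install the s6-overlay tarballs and create application_user here ...
ENV S6_CMD_ARG0="s6-setuidgid application_user"
ENTRYPOINT ["/init"]
CMD ["/usr/bin/my-application"]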

So, what do y'all think? Is it okay if I happily let USER containers crash and burn when the syslogd-overlay tarball is installed? Can we make this a policy for potential future service tarballs? Can we hard limit USER support with electric pruning shears?

@skarnet
Contributor Author

skarnet commented Jan 19, 2022

Everything is now reasonably tested, and, I hope, reasonably complete. THE THING IS READY NOW.
I have updated the README.md as well as the CHANGELOG.md.
I have pushed my v3 branch to github, and merged it to master.

@jprjr, can you please take care of the CI/CD thing? So we can provide:

  • a tarball for the source (s6-overlay-3.0.0.0.tar.xz)
  • tarballs for the built versions, the noarch ones as well as one for each arch that is supported in conf/toolchains

And please take a look at the README.md to check for blatant omissions or errors. Please keep maintaining this thing as you've done for the past years; it should be much easier to maintain now, and modifications should be quick and painless - but I don't want to be in charge of it, at least not mainly.

Thanks!

@GreyXor

GreyXor commented Jan 19, 2022

@skarnet Thanks, this is awesome! I'm going to test it right away

@pvizeli

pvizeli commented Jan 20, 2022

Yes, it makes complete sense not to support USER, and let's use s6-setuidgid. I would also like to keep fix-attrs. Thanks for your work.

@skarnet
Contributor Author

skarnet commented Jan 20, 2022

I have tagged the release so a source tarball is available for download. Still need CI stuff to provide pre-built tarballs though.

@jprjr
Member

jprjr commented Jan 21, 2022

We've only got so many hours in a day, and it seems like I have way less free time than I used to - I am perfectly fine dropping USER support. I'll get CI done this weekend, it should be very straightforward

@skarnet
Contributor Author

skarnet commented Jan 21, 2022

USER is fine for the main overlay, it's just add-ons such as syslogd-overlay that I think benefit more from in-container privilege separation than from a whole USER container.

Welcome back John, and thanks for the CI! Can you read the list of architectures (and number of tarballs) automatically from the conf/toolchains file? That's the file I'm going to modify if I add a toolchain for another architecture (somebody requested s390x, I'll see if I can build that next week).

@jprjr
Member

jprjr commented Jan 23, 2022

Can you read the list of architectures (and number of tarballs) automatically from the conf/toolchains file?

I wasn't sure until today - but yep! You can create a matrix on-the-fly, that's handled by this part of the release.yml file - I transform the list of toolchains into a JSON array of strings.
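For the curious, the transformation can be as small as a step along these lines (a sketch of the idea rather than the actual release.yml contents; it assumes conf/toolchains lists one toolchain per line):

# turn conf/toolchains into a JSON array usable as a GitHub Actions matrix
toolchains=$(jq -R -s -c 'split("\n") | map(select(length > 0))' conf/toolchains)
echo "toolchains=${toolchains}" >> "$GITHUB_OUTPUT"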

@skarnet
Contributor Author

skarnet commented Jan 24, 2022

@jprjr I'm not sure where the issue was mentioned last time, but I added a special case so that the tarball for arm-linux-musleabihf is named armhf instead of arm. So in future versions you can remove your CI workaround.

@jprjr
Member

jprjr commented Jan 25, 2022

Awesome - updated in e45090e

@Bessonov

Bessonov commented Feb 3, 2022

Is there any reason to use xz? It would be nice to get rid of xz-utils.

@skarnet
Contributor Author

skarnet commented Feb 3, 2022

It's better than gz on both archive size and decompression speed. I expect there will come a point where xz-utils is part of every distribution's base just like gzip.
Busybox tar has an option to decompress .tar.xz archives without xz-utils.

@Bessonov

Bessonov commented Feb 3, 2022

@skarnet thank you for the explanation. But does it matter? I mean, it's not something like compressed js files served from a CDN to millions of users, where every single byte can reduce traffic significantly. It's just a one-shot operation to get the files. But it makes usage of s6-overlay more complex and less robust. For example, I overlooked this step (because I don't use nginx in the current container) and spent some time googling meaningless errors, like:

# tar -C / -Jxpf /tmp/s6-overlay-x86_64-3.0.0.2.tar.xz 
tar (child): xz: Cannot exec: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now

@skarnet
Contributor Author

skarnet commented Feb 3, 2022

I will make it clearer in the documentation that you need xz-utils for the archive extraction operation.
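For Debian/Ubuntu-based images, the fix is typically along these lines (a sketch; the package name varies by distro, and the tarball path matches the example above):

# install xz-utils so tar can decompress the .tar.xz archive, then extract it
RUN apt-get update && apt-get install -y xz-utils
RUN tar -C / -Jxpf /tmp/s6-overlay-x86_64-3.0.0.2.tar.xz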

@skarnet
Contributor Author

skarnet commented Mar 8, 2022

Finally, v3.1.0.1 is out, fixing - I hope - most of the issues that people have opened since the 3.0.0.0 release.
(It should have been v3.1.0.0, but I goofed and forgot to update the version for s6-overlay-helpers, so v3.1.0.0 misses a fix.)

I'm now going to try to work on something else for a couple days before the bug-reports for v3.1.0.1 and feature requests for future versions start coming in. 😁

@shinsenter

shinsenter commented Apr 21, 2022

@skarnet @jprjr

I think this might be a bug, but I'm not sure so allow me to discuss it in this thread.

I realized that after setting ENTRYPOINT ["/init"] in the Dockerfile, the commands I run with docker run don't fully see the environment variables.

For example:

Dockerfile

# s6-overlay v3.1.0.1 source image
FROM shinsenter/s6-overlay as s6
# Image Name: my-image
FROM ubuntu:latest

# skips some RUN commands here
# RUN ...

ENV TEST=my-image

COPY --from=s6 / /
ENTRYPOINT ["/init"]

My docker run command

docker run --rm my-image env

The result

s6-rc: info: service s6rc-fdholder: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service s6rc-fdholder successfully started
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service syslogd-prepare: starting
s6-rc: info: service syslogd-prepare successfully started
s6-rc: info: service syslogd-log: starting
s6-rc: info: service syslogd-log successfully started
s6-rc: info: service syslogd: starting
s6-rc: info: service syslogd successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
OLDPWD=/
PATH=/command:/composer/vendor/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/var/www/html
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service syslogd: stopping
s6-rc: info: service syslogd successfully stopped
s6-rc: info: service syslogd-log: stopping
s6-rc: info: service syslogd-log successfully stopped
s6-rc: info: service syslogd-prepare: stopping
s6-rc: info: service s6rc-fdholder: stopping
s6-rc: info: service syslogd-prepare successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-fdholder successfully stopped
s6-rc: info: service s6rc-oneshot-runner successfully stopped

But this was not the output I expected.


After searching all over the Internet, I added --entrypoint='', and it got fixed.

docker run --rm --entrypoint='' my-image env

Output

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=8ce2e69b808e
DEBIAN_FRONTEND=noninteractive
LANG=C.UTF-8
LANGUAGE=C.UTF-8
LC_ALL=C
TERM=xterm
HOME=/root
TEST=my-image

Sometimes I need to use the output of running a command in my container. I also don't want any s6-rc log in the output.

So I'm thinking it would be better if the /init script could maintain the same behavior when running the command docker run --entrypoint=''.

Best regards!

@skarnet
Contributor Author

skarnet commented Apr 21, 2022

You are basically asking s6-overlay to not be s6-overlay.

Having a separate ENTRYPOINT is the point. Using /init will set up the supervision infrastructure and start your services, before handing over to your CMD in a controlled environment. If you don't use /init, then your container starts with your CMD as pid 1, as intended by Docker.

When you run s6-overlay, the environment is not inherited by default, and that's a feature: the idea is that the supervision tree always runs with a minimal environment, to avoid potential control flow hijacks from the outside of the container. If you want your services to inherit the environment that you started the container with (via ENV declarations in your Dockerfile), that's what with-contenv is for. Try:

docker run --rm my-image with-contenv env

s6-overlay isn't meant for short-lived container sessions such as running a small command. For that, you're better off emptying the ENTRYPOINT indeed. The intended usage is to have a long-lived container run by s6-overlay, and then you would run your short-lived commands under docker exec.
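A sketch of that intended pattern (the container and image names are made up):

# start a long-lived container under /init, then run one-off commands in it
docker run -d --name my-container my-image
docker exec -it my-container with-contenv env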

@shinsenter

@skarnet Thank you for the explanation. I will try to find a solution that covers these two cases.

@pvizeli

pvizeli commented May 12, 2022

As we don't support USER, will fix-attrs then still exist?

@skarnet
Contributor Author

skarnet commented May 12, 2022

fix-attrs still exists and it's not going away, we're just not going to expand on it in the future.

@zmcandee

Is it possible to add back support for PIDs other than 1? A lot of Docker alternatives such as fly.io microVMs don't allow replacing PID 1, and v2 ran fine at other PIDs. At first glance, v3 requires PID 1 for security when elevating to root, but only preinit is run elevated. It seems that preinit might be able to survive running at user level if everything is properly chowned and set up in the Dockerfile, or if it is run as root to start with.

@skarnet
Contributor Author

skarnet commented Aug 25, 2022

No. s6-overlay was always meant to run as pid 1. In v2, it appeared to run as another pid, but it was just more subtly broken; any case where it worked was purely accidental. In v3, we purposefully prevent the boot if /init isn't running as pid 1, in order to avoid silent breakage.

Container managers that force you to run a pid 1 of their choice are going out of their lane and breaking the convention for containers. They do not qualify as container managers; they may be useful for other purposes, but you cannot expect projects meant to run with containers to properly interact with them.

@synoniem

synoniem commented Nov 5, 2022

I have been using v2 for a few years and intend to make the switch to v3. Only the s6-rc dependency structure worries me, as it is complex and error-prone imo. Would it be an idea to adopt the service file format used by systemd and let s6-overlay do the structuring?

@skarnet
Contributor Author

skarnet commented Nov 5, 2022

You're worried about the complexity of s6-rc and you want to adopt the service file format of systemd? 🤣

@synoniem

synoniem commented Nov 5, 2022

Yes. Of course I was talking about the dependency structure and maybe the run statement; that is the good part of it, and all the other systemd cruft in the service file should be ignored. It would give just one file per service, which is far easier than jumping through hoops with a 5- or 6-directory structure and even more files per service. And such files already exist for most services. Any other one-file-per-service approach would be great, of course.

@skarnet
Contributor Author

skarnet commented Nov 5, 2022

Oh, so you mean you have an issue with the interface. That has nothing to do with being complex and error-prone then. It's just that you prefer text files.

I agree. Text files are easier for humans to manipulate. But they're not as easy for computers to manipulate, and I haven't gotten to writing the human-friendlier interface yet, so, sorry, that's what you're getting for now.

@synoniem

synoniem commented Nov 5, 2022

Too bad. I am not a great programmer, so I will make a quick hack myself then, while waiting for you to have the time and resources to write the human-friendlier interface, because I really like this init system for my containers.

@flylan

flylan commented Jun 25, 2024

I also really like s6-overlay, but based on my experience, the existing configuration method is a bit cumbersome. Could s6-overlay support configuration via YAML in the future? For example, configuring a service list, and configuring restart or graceful-exit signals for each service, and so on. It seems that everything could be done through a YAML configuration file.

@skarnet
Contributor Author

skarnet commented Jun 25, 2024

The s6-rc source format is, on purpose, meant to be easily producible by automation. Nothing prevents you from having your configuration stored in a yaml file and building an s6-rc source directory out of it; on the contrary, you are encouraged to do it. A simple Python script could do the job, and it can entirely be done offline, with no involvement from s6-overlay and no dependency added to it.
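As a rough illustration of how little such a generator needs to emit, here is a sketch of its output for a single hypothetical longrun called myservice, assuming the /etc/s6-overlay/s6-rc.d layout and its user bundle (the daemon name is made up):

mkdir -p /etc/s6-overlay/s6-rc.d/myservice /etc/s6-overlay/s6-rc.d/user/contents.d
echo longrun > /etc/s6-overlay/s6-rc.d/myservice/type
cat > /etc/s6-overlay/s6-rc.d/myservice/run <<'EOF'
#!/command/execlineb -P
myservice-daemon
EOF
chmod 0755 /etc/s6-overlay/s6-rc.d/myservice/run
touch /etc/s6-overlay/s6-rc.d/user/contents.d/myservice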

@endersonmaia

I'm using s6-overlay in some projects, and I'd like to have a "distroless" container image without a shell.

I see you mention the /bin/sh dependency; are there any plans to remove this dependency?

@skarnet
Contributor Author

skarnet commented Jul 3, 2024

There's no plan to remove the /bin/sh dependency, sorry. The trade-off is just not worth it, especially if you use the busybox or alpine container base, where the shell is just super small.

Note that the shell dependency is only in the s6-overlay scripts themselves, though. If you're so inclined, you can have an image running s6-linux-init, s6 and s6-rc with no shell involvement at all during the boot procedure: that's what I run on my server, and you can see what it looks like by building lh-bootstrap. It uses a shell, but unlike s6-overlay, the shell is only there for interactive logins and if there's any amount of shell scripting in the boot script (which I'm not sure there is), it's absolutely minimal and can entirely be replaced with execline scripting.

@AntoineAtMistral

I'm new to s6-overlay so it might already exist, but a way to specify healthchecks for services would be really nice. I'm thinking of a /healthcheck file in the service definition folder containing a probe command, and the service would restart when the check fails. (We could also add /healthcheck-interval and /healthcheck-timeout files to configure the healthcheck behavior.)

@skarnet
Contributor Author

skarnet commented Oct 8, 2024

Health checking is part of monitoring, not supervision - the distinction is subtle but it's there, and I'm very cautious with scope creep, I don't want s6 to enter monitoring territory.

Health checking is better done as a separate service: to monitor foo, have a foo-check longrun service that runs your probe command then sleeps for some duration. Depending on the result of the probe command, send a signal to foo via an s6-svc command.
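To make that concrete, a minimal sketch of such a companion checker's run script (check-foo is a hypothetical probe command, 60 seconds is an arbitrary interval, and /run/service/foo assumes the scandir location s6-overlay uses for longruns):

#!/command/execlineb -P
# sleep, then probe; if the probe fails, signal foo so the supervisor
# restarts it. This script exits either way and is respawned by s6.
foreground { s6-sleep 60 }
if -n { check-foo }
s6-svc -t /run/service/foo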

That said, I realize it's a common need that wouldn't scope creep too hard, and it would be convenient for users to have syntactic sugar for this functionality, so yes, I'm thinking of ways to add this at some level in a future version.

@AntoineAtMistral

Thanks for the feedback! I'll definitely take a look at the service + s6-svc way, and would love to see it more conveniently integrated in the future.
