
Generic systemd service plugin is required #13

Open
matwey opened this issue Jan 11, 2023 · 8 comments

Comments

@matwey
Member

matwey commented Jan 11, 2023

Hello,

I am exploring MicroOS. I think health-checker is a little complicated at the moment. As far as I understand, I have to write a plugin (a bash script) that, in my case, just runs `systemctl status myservice.service`.

What if we had a generic plugin that checks whether the required services are up and running? From the user's point of view, I would expect to mark my critical services as WantedBy= some systemd target, and then the health-checker plugin would check that target.
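A minimal sketch of what such a generic check could look like. It assumes the critical services declare `WantedBy=critical-services.target` (a hypothetical target name) in their `[Install]` section, reads that target's Wants= list with `systemctl show`, and reports every unit that is not active. The `check_target_units` helper and the `run_checks` entry point are illustrative assumptions, not an existing health-checker API.

```bash
#!/bin/bash
# Hypothetical generic health-checker check (not shipped with health-checker).
# Assumption: critical services are enabled with WantedBy=critical-services.target,
# so enabling them adds them to the target's Wants= list.
CRITICAL_TARGET="${CRITICAL_TARGET:-critical-services.target}"

# check_target_units TARGET: return 1 if any unit wanted by TARGET is not active.
check_target_units() {
    local target="$1" unit failed=0
    # --value prints only the property value: a space-separated list of units.
    for unit in $(systemctl show -p Wants --value "$target"); do
        if ! systemctl is-active --quiet "$unit"; then
            echo "health-checker: $unit (wanted by $target) is not active" >&2
            failed=1
        fi
    done
    return "$failed"
}

run_checks() {
    check_target_units "$CRITICAL_TARGET"
}
```

Checking every unit before returning, instead of stopping at the first failure, means the log names all broken services at once.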

@laenion
Contributor

laenion commented Jan 12, 2023

@lkocman: Is this something for the GSOC project you were talking about today?

@LevitatingBusinessMan

LevitatingBusinessMan commented Feb 18, 2025

I agree that this could be useful. A plugin could check the After= dependencies of health-checker.service and verify that they are all up, or even just check that the system state is not degraded.

To also be able to stop these services, they would need to be part of a slice. However, the default behavior of health-checker is to either reboot or isolate emergency.target anyway.
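To illustrate the slice idea: if the critical services were moved into a common slice (here a hypothetical `critical.slice`, assigned with a `Slice=` drop-in), a plugin's stop hook could stop all of them with a single command. The `add_to_slice` helper, the drop-in file name, and the `stop_services` hook name are assumptions for this sketch.

```bash
#!/bin/bash
# Hypothetical slice that groups the critical services.
CRITICAL_SLICE="${CRITICAL_SLICE:-critical.slice}"
# Drop-in root; overridable so the sketch can be tried outside /etc.
SYSTEMD_DROPIN_ROOT="${SYSTEMD_DROPIN_ROOT:-/etc/systemd/system}"

# add_to_slice UNIT: place UNIT in the slice via a drop-in (run as root).
# The new slice only takes effect after the service is restarted.
add_to_slice() {
    local unit="$1"
    local dir="$SYSTEMD_DROPIN_ROOT/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nSlice=%s\n' "$CRITICAL_SLICE" > "$dir/50-critical-slice.conf" || return 1
    systemctl daemon-reload
}

# Stop hook: units with Slice= get implicit Requires= and After= dependencies
# on the slice, so stopping the slice also stops every service inside it.
stop_services() {
    systemctl stop "$CRITICAL_SLICE"
}
```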

@thkukuk
Contributor

thkukuk commented Feb 18, 2025

The problem is that I'm not aware of any API that provides the After dependencies of a service to a script without using D-Bus.
Besides that: in some cases, checking whether the service exited with an error code may be enough, but most of the time, the fact that a service has not quit does not mean it is working.

@LevitatingBusinessMan

LevitatingBusinessMan commented Feb 18, 2025

> The problem is that I'm not aware of any API that provides the After dependencies of a service to a script without using D-Bus.

Is avoiding D-Bus a requirement? Many of the current plugins already rely on systemctl.
You could get the After dependencies using `systemctl show`.
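A sketch of that approach, hedged: `systemctl show -p After --value` prints the units health-checker.service is ordered after. Since After= also lists targets, sockets, and units that were never started, the sketch only looks at `.service` entries and flags those in the failed state rather than requiring them to be active. Note that systemctl itself talks to systemd through its D-Bus interface, so this avoids calling D-Bus directly from the script rather than avoiding D-Bus altogether. The function name is hypothetical.

```bash
#!/bin/bash
# check_after_deps [UNIT]: return 1 if any service UNIT is ordered after has failed.
check_after_deps() {
    local unit="${1:-health-checker.service}" dep failed=0
    for dep in $(systemctl show -p After --value "$unit"); do
        # After= is only an ordering list; skip targets, sockets, mounts, etc.
        case "$dep" in
            *.service) ;;
            *) continue ;;
        esac
        # is-failed exits 0 when the unit is in the failed state. Using it
        # instead of is-active avoids flagging finished oneshots or units
        # that were never pulled into the boot transaction.
        if systemctl is-failed --quiet "$dep"; then
            echo "health-checker: dependency $dep has failed" >&2
            failed=1
        fi
    done
    return "$failed"
}
```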

But I was planning to experiment with a plugin that simply checks whether the system was already degraded before health-checker was invoked:

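```bash
run_checks() {
    # is-system-running exits non-zero unless the system state is "running"
    # (e.g. "degraded" when at least one unit has failed).
    systemctl -q is-system-running || exit 1
}
```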
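One caveat with the snippet above: `systemctl is-system-running` exits non-zero for every state other than `running`, including `starting` while boot jobs are still queued. If the plugin runs during boot, it could therefore fail even though nothing is broken. A variant, assuming only `degraded` should count as a failure, that also prints the failed units so the reason appears in the plugin's output:

```bash
#!/bin/bash
run_checks() {
    local state
    # Capture the printed state; the exit code alone cannot tell
    # "starting" apart from "degraded".
    state=$(systemctl is-system-running 2>/dev/null)
    case "$state" in
        degraded)
            # List the failed units on stderr for the journal/log.
            systemctl --failed --no-legend --plain >&2
            return 1
            ;;
        *)
            return 0
            ;;
    esac
}
```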

> Besides that: in some cases, checking whether the service exited with an error code may be enough, but most of the time, the fact that a service has not quit does not mean it is working.

An automatic update could introduce invalid configuration, missing dependencies, or similar issues. In these cases, a database or web server would outright quit, and a rollback could restore it.

One should ask, however, whether using health-checker for this is desirable. A rollback could restore full functionality to the system, but the default behavior of isolating to emergency.target when the rollback fails would be much worse, because it could prevent a sysadmin from even accessing the machine.

@matwey
Member Author

matwey commented Feb 20, 2025

> ...but most of the time, the fact that a service has not quit does not mean it is working.

I think that systemd, as the system manager, is the final source of truth about what is working and what is not. You are correct that some services can malfunction while the process is formally alive, but I think such services should be enhanced to use the sd_notify protocol for more fine-grained status exchange with the system manager.
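As a rough illustration of that idea, here is a sketch of a shell service wrapper that reports readiness and liveness through the sd_notify protocol with the `systemd-notify` helper. It assumes the unit uses `Type=notify`, `NotifyAccess=all` (the messages come from the `systemd-notify` child process rather than the main PID; its man page also describes caveats when it is used from shell scripts), and `WatchdogSec=30`. The `app_is_healthy` probe and the `/run/myapp/healthy` path are hypothetical placeholders.

```bash
#!/bin/bash
# Hypothetical application-level probe; replace with a real check of the service.
app_is_healthy() {
    [ -e /run/myapp/healthy ]
}

# notify_loop INTERVAL [ITERATIONS]: ITERATIONS=0 (the default) loops forever.
notify_loop() {
    local interval="${1:-10}" max="${2:-0}" i=0
    systemd-notify --ready --status="Started"
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        if app_is_healthy; then
            # Ping the watchdog only while the app really works. If pings stop
            # for WatchdogSec, systemd puts the unit into the failed state,
            # which a generic systemd-based health check can then see.
            systemd-notify WATCHDOG=1
        else
            systemd-notify --status="Self-check failed"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

A wrapper started via `ExecStart=` would launch the application and then call `notify_loop 10`, keeping the ping interval well below `WatchdogSec`.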

@matwey
Member Author

matwey commented Feb 20, 2025

> ...the default behavior of isolating to emergency.target when the rollback fails would be much worse, because it could prevent a sysadmin from even accessing the machine.

I think this should be disabled for MicroOS installations. Booting into emergency.target is the top issue I have with my MicroOS installations, and it is not what the user expects after reading the MicroOS concepts.

@thkukuk
Contributor

thkukuk commented Feb 20, 2025

> I think that systemd, as the system manager, is the final source of truth about what is working and what is not. You are correct that some services can malfunction while the process is formally alive, but I think such services should be enhanced to use the sd_notify protocol for more fine-grained status exchange with the system manager.

That is only theoretically correct. The best example is Kubernetes: the individual Kubernetes components can each be working fine, but not together. So you need to check, e.g., whether the node could join the cluster.
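An application-level check along those lines might ask the API server whether this node's `Ready` condition is `True`. This is a sketch, assuming `kubectl` is installed with a kubeconfig that may read Node objects and that the node is registered under the machine's hostname; the retry count and delay are arbitrary, since a freshly booted node needs some time before it reports Ready.

```bash
#!/bin/bash
# node_is_ready [NODE]: succeed if the node's Ready condition is "True".
node_is_ready() {
    local node="${1:-$(hostname)}" status
    status=$(kubectl get node "$node" \
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
    [ "$status" = "True" ]
}

run_checks() {
    local tries=0
    # Retry for a while before declaring the node broken (assumed values).
    while [ "$tries" -lt "${READY_RETRIES:-6}" ]; do
        node_is_ready && return 0
        tries=$((tries + 1))
        sleep "${READY_RETRY_DELAY:-10}"
    done
    echo "health-checker: node $(hostname) is not Ready in the cluster" >&2
    return 1
}
```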

> ...the default behavior of isolating to emergency.target when the rollback fails would be much worse, because it could prevent a sysadmin from even accessing the machine.

> I think this should be disabled for MicroOS installations. Booting into emergency.target is the top issue I have with my MicroOS installations, and it is not what the user expects after reading the MicroOS concepts.

MicroOS only goes into emergency.target if it cannot solve the problem itself with a rollback.
What would you propose we do instead? Booting normally and letting the node join, e.g., a k8s cluster even though we know it is not working correctly is not an option.

@LevitatingBusinessMan

LevitatingBusinessMan commented Feb 20, 2025

Plugins already have the stop functionality.

So in your example, if you could write a check for the node's health, you could stop the node and prevent it from joining the cluster. In this case, I think stopping the node via a plugin would be better than isolating to emergency.target.
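A rough skeleton of such a plugin, under several assumptions: health-checker invokes each plugin with `check` or `stop` as its first argument, the node agent runs as `kubelet.service` (k3s-based setups use other unit names), and the kubelet answers on its default local health endpoint `http://127.0.0.1:10248/healthz`. The `plugin_main` dispatcher is a hypothetical name.

```bash
#!/bin/bash
# Assumed node agent unit; adjust for the distribution in use.
NODE_AGENT_UNIT="${NODE_AGENT_UNIT:-kubelet.service}"

run_checks() {
    # The agent must be running and its local health endpoint must answer.
    systemctl is-active --quiet "$NODE_AGENT_UNIT" || return 1
    curl -fsS --max-time 5 http://127.0.0.1:10248/healthz >/dev/null
}

stop_services() {
    # Stopping only the agent keeps the node out of the cluster while the
    # rest of the system (sshd, mail, ...) stays reachable.
    systemctl stop "$NODE_AGENT_UNIT"
}

plugin_main() {
    case "$1" in
        check) run_checks ;;
        stop)  stop_services ;;
        *)     echo "Usage: $0 {check|stop}" >&2; return 2 ;;
    esac
}
```

In an actual plugin file, the last line would be `plugin_main "$@"`, so that the exit status of the check or stop action becomes the script's exit status.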

The problem with isolating to emergency.target is that the system might still be in a state where it can perform some of its functions, or at least send emails about its state and remain reachable over SSH.

In any case, as you said, a generic systemd service check wouldn't suffice for k8s applications anyway, so I am not sure it is relevant to this discussion.

I think a generic systemd service plugin could bring a lot of value, but likely only if the behavior of isolating to emergency.target is reconsidered.

Could you give an example scenario where emergency.target is really preferable to just stopping the services of all plugins? If there is a service that must not run while the system is unhealthy, it could be configured to be stopped by a plugin anyway.

With the current default plugins, the stop functionality is only configured for systemd-logind and rebootmgr (those services are stopped). But I don't think this has any effect, because the system changes target or reboots anyway.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants