Generic systemd service plugin is required #13

matwey · 2023-01-11T10:06:28Z

Hello,

I am exploring MicroOS. I think that health-checker is currently a little complicated. As far as I understand, I am forced to write a plugin (a bash script) that, in my case, just runs systemctl status myservice.service.

What if we had a generic plugin that checks that the required services are up and running? From a user's point of view, I would expect to mark my critical services as WantedBy some systemd target, and the health-checker plugin would then check this target.
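A minimal sketch of what such a generic check might look like, assuming the critical services are pulled in by a dedicated target. The target name `health-critical.target` and the function names are hypothetical; only `systemctl show` and `systemctl is-active` are real commands.

```shell
# Hypothetical target name; not something health-checker ships.
HEALTH_TARGET="${HEALTH_TARGET:-health-critical.target}"

# Print the units the target pulls in via Wants= (this includes units
# enabled with WantedBy=<target> in their [Install] section).
wanted_units() {
    systemctl show -p Wants --value "$1"
}

# Return 0 only if every wanted unit is currently active.
check_target() {
    local unit failed=0
    for unit in $(wanted_units "$1"); do
        if ! systemctl -q is-active "$unit"; then
            echo "unit not active: $unit" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A plugin could then call `check_target "$HEALTH_TARGET" || exit 1` from its check function. Note that a oneshot service without RemainAfterExit=yes is reported as inactive after it finishes successfully, so this sketch would flag it.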

laenion · 2023-01-12T19:34:02Z

@lkocman: Is this something for the GSoC project you were talking about today?

LevitatingBusinessMan · 2025-02-18T03:38:00Z

I agree that this could be useful. A plugin could check the After dependencies of health-checker.service and verify that all of them are up. Or it could even check that the system state is not degraded.

To also be able to stop these services, they would need to be part of a slice. However, the default behavior of health-checker is to either reboot or isolate the emergency target anyway.
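The After-dependency idea could be sketched roughly as follows. This is an illustration under assumptions, not an existing plugin: the function names are hypothetical, and it tests for units in the failed state rather than requiring every unit to be active, because After= commonly lists targets and units that are legitimately inactive.

```shell
# Print the units listed in After= for the given unit, one per line.
after_units() {
    systemctl show -p After --value "$1" | tr ' ' '\n' | sed '/^$/d'
}

# Return 0 if none of the .service units ordered before $1 has failed.
check_after_units() {
    local unit rc=0
    for unit in $(after_units "$1"); do
        case "$unit" in
            *.service)
                # is-failed exits 0 when the unit IS in the failed state.
                if systemctl -q is-failed "$unit"; then
                    echo "failed dependency: $unit" >&2
                    rc=1
                fi
                ;;
        esac
    done
    return "$rc"
}
```

For example, `check_after_units health-checker.service || exit 1` inside a plugin's check function.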

thkukuk · 2025-02-18T11:21:05Z

The problem is that I'm not aware of any API that provides the After dependencies of a service to a script without using D-Bus.
Besides that: in some cases, checking whether the service quit with an error code may be enough, but most of the time the fact that a service has not quit does not mean it is working.

LevitatingBusinessMan · 2025-02-18T11:30:26Z

> The problem is that I'm not aware of any API that provides the After dependencies of a service to a script without using D-Bus.

Is avoiding D-Bus a requirement? A lot of the current plugins already rely on systemctl.
You could get the After dependencies using systemctl show.

But I was planning on experimenting with a plugin that simply checks if the system has degraded before health-checker was invoked:

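# Fail the check unless systemd reports the overall system state as "running".
run_checks() {
    # Using || keeps the function's own exit status at 0 on success.
    systemctl -q is-system-running || exit 1
}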

> Besides that: in some cases, checking whether the service quit with an error code may be enough, but most of the time the fact that a service has not quit does not mean it is working.

An automatic update could cause invalid configurations, missing dependencies, or similar issues. In these cases a database or web server would quit outright, and a rollback could restore it.

One might wonder, however, whether using health-checker for this is desirable. A rollback could restore complete functionality to the system, but the default behavior of isolating to emergency.target when the rollback fails would be much worse, because it could prevent a sysadmin from even accessing the machine.

matwey · 2025-02-20T07:02:05Z

> but most of the time the fact that a service has not quit does not mean it is working.

I think that systemd is the system manager and the final source of information about what is working and what is not. You are correct that some services malfunction while the process is formally alive, but I think that these services should be enhanced to use the sd_notify protocol for a more fine-grained status exchange with the system manager.
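As an illustration of the sd_notify idea for a service written as a shell script, here is a hedged sketch using the `systemd-notify` tool. The `self_test` probe and the `SELF_TEST_FILE` variable are hypothetical placeholders for a real functional check of the service.

```shell
# Send a notification to systemd only when the service manager provided
# a notification socket (NOTIFY_SOCKET is set for such units).
notify_systemd() {
    if [ -n "${NOTIFY_SOCKET:-}" ]; then
        systemd-notify "$@"
    fi
}

# Hypothetical self-test; replace with a real probe of the service.
self_test() {
    [ -r "${SELF_TEST_FILE:-/etc/hostname}" ]
}

# Report readiness only after the self-test passes; otherwise publish a
# status text and return non-zero so the caller can exit with an error.
report_health() {
    if self_test; then
        notify_systemd --ready --status="self-test passed"
    else
        notify_systemd --status="self-test failed"
        return 1
    fi
}
```

This assumes a unit with Type=notify; because `systemd-notify` runs as a child process, the unit will likely also need NotifyAccess=all for the message to be accepted.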

matwey · 2025-02-20T07:04:45Z

> default behavior of isolating to emergency.target when this fails would be much worse because it could prevent a sysadmin from even accessing the machine.

I guess this should be disabled for MicroOS installations. Booting into emergency.target is the top issue with my MicroOS installations, and it is not what a user expects after reading about the MicroOS concepts.

thkukuk · 2025-02-20T09:22:57Z

> I think that systemd is the system manager and the final source of information about what is working and what is not. You are correct that some services malfunction while the process is formally alive, but I think that these services should be enhanced to use the sd_notify protocol for a more fine-grained status exchange with the system manager.

Only theoretically correct. Best example: Kubernetes. The individual Kubernetes applications can each be working fine, but not together. So you need to check whether the node can, for example, join the cluster.
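A functional check of this kind might look like the following sketch. It assumes `kubectl` is installed on the node with credentials that allow reading node objects, and that the node is registered under its hostname; neither assumption comes from this thread, and the function name is hypothetical.

```shell
# Return 0 if the API server reports the node's Ready condition as "True".
# $1: node name (defaults to the local hostname).
node_is_ready() {
    local node status
    node="${1:-$(hostname)}"
    status="$(kubectl get node "$node" \
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)"
    [ "$status" = "True" ]
}
```

Unlike a process-level check, this fails when the kubelet is running but the node never became Ready in the cluster.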

> > default behavior of isolating to emergency.target when this fails would be much worse because it could prevent a sysadmin from even accessing the machine.

> I guess this should be disabled for MicroOS installations. Booting into emergency.target is the top issue with my MicroOS installations, and it is not what a user expects after reading about the MicroOS concepts.

MicroOS will go into emergency.target if it is not able to solve the problem itself by a rollback.
What would you propose we do instead? Booting normally and letting the machine join, for example, a k8s cluster even though we know the node is not working correctly is not an option.

LevitatingBusinessMan · 2025-02-20T13:12:07Z

Plugins already have the stop functionality.

So for your example, if you could write a check for the node's health, you could stop the node and prevent it from joining a cluster. In this example I think stopping the node via a plugin would be better than isolating to emergency.target.

The problem with isolating to emergency.target is that the system might have been in a state where it could still perform some of its functions, or at least send emails about its state and provide SSH access.

In any case, like you said, a generic systemd service plugin wouldn't suffice for k8s applications anyway. So I am not sure whether that case is relevant to the discussion.

I think a generic systemd service plugin can bring a lot of value, but likely only if the behavior of isolating to emergency.target is reconsidered.

Could you give an example scenario where emergency.target is really preferred over only stopping all plugins? If there is a service that is not allowed to run when the system is unhealthy this service could be configured to be stopped with a plugin anyway.

The stopping functionality with the current default plugins is only configured for systemd-logind and rebootmgr (those services are stopped). But I don't think this does anything because the system changes target or reboots anyway.
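For reference, the check/stop split discussed above could be sketched like this. It assumes that health-checker invokes each plugin with either `check` or `stop` as its first argument; the unit name `mynode.service` and the `plugin_main` wrapper are hypothetical.

```shell
# Hypothetical unit guarded by this plugin.
UNIT="${UNIT:-mynode.service}"

# Check phase: succeed only while the unit is active.
run_checks() {
    systemctl -q is-active "$UNIT"
}

# Stop phase: keep the unit from running on an unhealthy system.
stop_services() {
    systemctl stop "$UNIT"
}

# Dispatch on the first argument; return 2 for unknown arguments.
plugin_main() {
    case "$1" in
        check) run_checks ;;
        stop)  stop_services ;;
        *)     echo "Usage: plugin {check|stop}" >&2; return 2 ;;
    esac
}
```

A plugin file would end with `plugin_main "$@"`, so that a failed check yields a non-zero exit status.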

LevitatingBusinessMan mentioned this issue Feb 18, 2025

Failed service not triggering rollback #25

Open
