Problem in the SOFT HARD check logic #368

Open
dirtyren opened this issue Aug 4, 2021 · 1 comment

Comments

@dirtyren
Contributor

dirtyren commented Aug 4, 2021

Hello,

I found the problem described below.
The host went down and naemon set the service to CRITICAL HARD, but after the host came back UP, naemon logged the service recovery as OK SOFT. This broke some availability reports that rely on HARD states for their calculations.
The question is: shouldn't the service be set to OK HARD once the host is back up?

Thanks.

[Fri Jul 23 03:39:31 2021] INITIAL SERVICE STATE: HOSTDEMO;SVCDEMO;OK;HARD;1;OK
[Fri Jul 23 21:41:11 2021] HOST ALERT: HOSTDEMO;DOWN;SOFT;1;CRITICAL - 192.168.54.32: rta nan, lost 100%
[Fri Jul 23 21:41:21 2021] HOST ALERT: HOSTDEMO;DOWN;SOFT;2;CRITICAL - 192.168.54.32: rta nan, lost 100%
[Fri Jul 23 21:41:37 2021] HOST ALERT: HOSTDEMO;DOWN;HARD;3;CRITICAL - 192.168.54.32: rta nan, lost 100%
[Fri Jul 23 21:42:57 2021] SERVICE INFO: HOSTDEMO;SVCDEMO; Service switch to hard down state due to host down.
[Fri Jul 23 21:42:57 2021] SERVICE ALERT: HOSTDEMO;SVCDEMO;CRITICAL;HARD;1;CRITICAL - cannot connect
[Fri Jul 23 21:46:57 2021] HOST ALERT: HOSTDEMO;UP;HARD;1;OK - 192.168.54.32: , rta 0.259ms, lost 0%
[Fri Jul 23 21:47:17 2021] SERVICE ALERT: HOSTDEMO;SVCDEMO;CRITICAL;SOFT;1;CRITICAL - cannot connect
[Fri Jul 23 21:49:17 2021] SERVICE ALERT: HOSTDEMO;SVCDEMO;CRITICAL;SOFT;2;CRITICAL - cannot connect
[Fri Jul 23 21:51:18 2021] SERVICE ALERT: HOSTDEMO;SVCDEMO;OK;SOFT;3;OK
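To make the effect on HARD-only reports concrete, here is a minimal Python sketch that replays the service lines from the log above and tracks only HARD states, roughly the way such a report might. The function names (`service_events`, `last_hard_state`, `soft_recoveries_after_hard_problem`) and the regular expression are my own illustration, not part of naemon, and the field layout is assumed from the lines shown here.

```python
import re

# Service-related lines copied from the log above.
LOG = """\
[Fri Jul 23 03:39:31 2021] INITIAL SERVICE STATE: HOSTDEMO;SVCDEMO;OK;HARD;1;OK
[Fri Jul 23 21:42:57 2021] SERVICE ALERT: HOSTDEMO;SVCDEMO;CRITICAL;HARD;1;CRITICAL - cannot connect
[Fri Jul 23 21:46:57 2021] HOST ALERT: HOSTDEMO;UP;HARD;1;OK - 192.168.54.32: , rta 0.259ms, lost 0%
[Fri Jul 23 21:47:17 2021] SERVICE ALERT: HOSTDEMO;SVCDEMO;CRITICAL;SOFT;1;CRITICAL - cannot connect
[Fri Jul 23 21:49:17 2021] SERVICE ALERT: HOSTDEMO;SVCDEMO;CRITICAL;SOFT;2;CRITICAL - cannot connect
[Fri Jul 23 21:51:18 2021] SERVICE ALERT: HOSTDEMO;SVCDEMO;OK;SOFT;3;OK
"""

# Assumed layout: [timestamp] KIND: host;service;state;state_type;attempt;output
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<kind>INITIAL SERVICE STATE|SERVICE ALERT): "
    r"(?P<host>[^;]*);(?P<svc>[^;]*);(?P<state>[^;]*);(?P<type>[^;]*);"
)


def service_events(log_text):
    """Return (timestamp, state, state_type) for every service state line."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:  # host lines and SERVICE INFO lines are skipped
            events.append((m["ts"], m["state"], m["type"]))
    return events


def last_hard_state(events):
    """State a HARD-only report would believe the service is in at the end."""
    hard = [state for _, state, state_type in events if state_type == "HARD"]
    return hard[-1] if hard else None


def soft_recoveries_after_hard_problem(events):
    """Timestamps of OK SOFT entries logged while the last HARD state was a problem."""
    flagged = []
    current_hard = None
    for ts, state, state_type in events:
        if state_type == "HARD":
            current_hard = state
        elif state == "OK" and current_hard not in (None, "OK"):
            flagged.append(ts)
    return flagged
```

For the log above, `last_hard_state(service_events(LOG))` stays at `CRITICAL`, and `soft_recoveries_after_hard_problem` flags the 21:51:18 entry, which is the recovery a HARD-only report never sees.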

@dirtyren
Contributor Author

I also saw a second behavior: naemon did not generate a state change to OK for the service, but the INITIAL SERVICE STATE entry logged after a restart showed OK, like this:
[Thu Jun 17 18:43:01 2021] SERVICE INFO: PABX;Port_8443; Service switch to hard down state due to host down.
[Thu Jun 17 18:43:01 2021] SERVICE ALERT: PABX;Port_8443;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[Thu Jun 17 18:50:21 2021] HOST ALERT: PABX;UP;HARD;1;OK - x.x.x.x: , rta 0.446ms, lost 0%
[Thu Jun 17 18:59:35 2021] INITIAL HOST STATE: PABX;UP;HARD;1;OK - x.x.x.x: , rta 0.234ms, lost 0%
[Thu Jun 17 18:59:35 2021] INITIAL SERVICE STATE: PABX;Port_8443;OK;HARD;1;TCP OK - 0.000 second response time on x.x.x.x on port 8443

Note that the plugin output for the service while it was CRITICAL was "CRITICAL - Socket timeout after 10 seconds". After naemon was restarted, the plugin output changed to the OK result, but no SERVICE ALERT for the OK HARD state was ever generated.
The host came back UP 9 minutes before naemon was restarted, and in that time no SERVICE ALERT with an OK state was generated for the service.
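This kind of gap can be found mechanically. The following is a rough Python sketch, assuming the log layout shown in the excerpt above; `unlogged_transitions` and the regular expression are hypothetical helpers of mine, not anything shipped with naemon. It reports every INITIAL SERVICE STATE entry whose state differs from the last state logged for the same service, meaning the change happened without a SERVICE ALERT.

```python
import re

# Service-related lines copied from the second log excerpt above.
LOG = """\
[Thu Jun 17 18:43:01 2021] SERVICE INFO: PABX;Port_8443; Service switch to hard down state due to host down.
[Thu Jun 17 18:43:01 2021] SERVICE ALERT: PABX;Port_8443;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[Thu Jun 17 18:50:21 2021] HOST ALERT: PABX;UP;HARD;1;OK - x.x.x.x: , rta 0.446ms, lost 0%
[Thu Jun 17 18:59:35 2021] INITIAL HOST STATE: PABX;UP;HARD;1;OK - x.x.x.x: , rta 0.234ms, lost 0%
[Thu Jun 17 18:59:35 2021] INITIAL SERVICE STATE: PABX;Port_8443;OK;HARD;1;TCP OK - 0.000 second response time on x.x.x.x on port 8443
"""

# Assumed layout: [timestamp] KIND: host;service;state;state_type;attempt;output
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<kind>INITIAL SERVICE STATE|SERVICE ALERT): "
    r"(?P<host>[^;]*);(?P<svc>[^;]*);(?P<state>[^;]*);(?P<type>[^;]*);"
)


def unlogged_transitions(log_text):
    """Find INITIAL SERVICE STATE lines that disagree with the last logged state.

    Returns tuples of (timestamp, host, service, previous, current), where
    previous and current are (state, state_type) pairs.
    """
    last_seen = {}  # (host, service) -> (state, state_type)
    gaps = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # host lines and SERVICE INFO lines are ignored
        key = (m["host"], m["svc"])
        current = (m["state"], m["type"])
        previous = last_seen.get(key)
        if (
            m["kind"] == "INITIAL SERVICE STATE"
            and previous is not None
            and previous != current
        ):
            gaps.append((m["ts"], m["host"], m["svc"], previous, current))
        last_seen[key] = current
    return gaps
```

On the excerpt above, `unlogged_transitions(LOG)` returns a single entry for PABX / Port_8443 at 18:59:35, going from CRITICAL HARD to OK HARD with no alert in between.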

Regards.
