OK (HARD) state changes missing #1203

Closed
exa-mk opened this issue Jul 19, 2021 · 2 comments

Comments

exa-mk commented Jul 19, 2021

Describe the bug
We have some services with an event handler that disables hosts with faulty services in a cluster service. The event handler is written to react only to HARD states.
Some of these services occasionally enter an UNKNOWN (HARD) state (e.g. no agent data for some time due to heavy load).
Unfortunately, when these services recover, there is sometimes no state change to OK (HARD), so the event handler that re-enables the hosts is never called.
I am not sure whether this also happens after CRITICAL (HARD) states.

root@openitc [core]: /opt/openitc/logs/nagios # zless nagios.log-2021071[0-9].gz nagios.log | perl -p -e 's/^\[([0-9]*)\]/"[".localtime($1)."]"/e' |grep d2d45d3f-61d9-4232-b05d-0be096e928e6 | grep HARD
[Fri Jul  9 06:03:20 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;CRITICAL;HARD;1;CRITICAL: [...]
[Fri Jul  9 06:03:20 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;CRITICAL;HARD;1;024210cb-94a3-4e4f-bc52-8f6b063db1f4
[Fri Jul  9 06:04:20 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;OK;HARD;1;OK: [...]
[Fri Jul  9 06:04:20 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;OK;HARD;1;024210cb-94a3-4e4f-bc52-8f6b063db1f4
[Thu Jul 15 04:45:35 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;UNKNOWN;HARD;1;UNKNOWN: No data received from agent
[Thu Jul 15 04:45:35 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;UNKNOWN;HARD;1;024210cb-94a3-4e4f-bc52-8f6b063db1f4
[Thu Jul 15 04:46:20 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;OK;HARD;1;OK: [...]
[Thu Jul 15 04:46:20 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;OK;HARD;1;024210cb-94a3-4e4f-bc52-8f6b063db1f4
[Sat Jul 17 05:57:23 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;UNKNOWN;HARD;3;UNKNOWN: Custom check [...] timed out after 10s seconds
[Sat Jul 17 05:57:23 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;UNKNOWN;HARD;3;024210cb-94a3-4e4f-bc52-8f6b063db1f4

After this, the service never becomes OK (HARD) again in the logs, which is also visible in the "State History" of the service (I could provide a screenshot, but it would be very long). In the "History" (see screenshot below), however, the service becomes OK (HARD) again just a few minutes later.
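To check which services are affected, the log filter above can be reduced to "what was the last HARD state logged for this service". The following is only a sketch: the function name `last_hard_state` is made up for illustration, and it assumes the semicolon-separated `SERVICE ALERT` line layout shown in the log excerpt above.

```shell
# Hypothetical helper (not part of openITCOKPIT or Naemon).
# Reads log lines from stdin and prints the state of the last
# HARD "SERVICE ALERT" entry for the given service UUID.
# Assumed line layout: [time] SERVICE ALERT: host;service;STATE;TYPE;attempt;output
last_hard_state() {
    awk -F';' -v svc="$1" '
        /SERVICE ALERT:/ && $2 == svc && $4 == "HARD" { state = $3 }
        END { if (state != "") print state }
    '
}
```

If this prints UNKNOWN or CRITICAL while the service is shown as OK in the web interface, the service is probably in the situation described here. With GNU gzip, something like `zcat -f nagios.log-2021071[0-9].gz nagios.log | last_hard_state d2d45d3f-61d9-4232-b05d-0be096e928e6` should feed it the same files as the command above.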

To Reproduce
No idea. Sometimes it works, sometimes it does not (see log).

Expected behavior
A proper state change to OK (HARD) and a call of the corresponding event handler.

Screenshots
image

Versions

  • openITCOKPIT Server Version: 4.2.1
  • Operating system: Ubuntu 20.04 LTS

Additional context
n/a

Member

nook24 commented Aug 9, 2021

Maybe this is related to naemon/naemon-core#368?

As a possible workaround, I would trigger the event handler on all OK states, not just HARD states.
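A rough sketch of what that decision logic could look like in a shell event handler is shown below. It is not the reporter's actual handler: the function name `decide_action` is hypothetical, it assumes the command definition passes `$SERVICESTATE$` and `$SERVICESTATETYPE$` as the first two arguments, and treating CRITICAL and UNKNOWN as the "faulty" states is a guess based on the log above.

```shell
# Hypothetical decision helper for an event handler script.
# $1 = service state (assumed to come from $SERVICESTATE$)
# $2 = state type    (assumed to come from $SERVICESTATETYPE$)
# Prints "enable", "disable" or "none".
decide_action() {
    state="$1"
    state_type="$2"
    case "$state" in
        OK)
            # Workaround: react to every OK result, SOFT or HARD,
            # so a missing OK (HARD) transition cannot block recovery.
            echo "enable"
            ;;
        CRITICAL|UNKNOWN)
            # Failures are still only acted on once they are HARD.
            if [ "$state_type" = "HARD" ]; then
                echo "disable"
            else
                echo "none"
            fi
            ;;
        *)
            echo "none"
            ;;
    esac
}
```

Because the handler may then run several times for the same recovery, the enable step should be safe to repeat (enabling an already enabled host should be a no-op).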

Member

nook24 commented Feb 9, 2023

Is this still an issue?
