Describe the bug
We have some services with an event handler that disables hosts with faulty services in a cluster service. The event handler is written to react only to HARD states.
Some of these services occasionally go into an UNKNOWN (HARD) state (e.g. no agent data for some time due to heavy load).
Unfortunately, when the services recover, there is sometimes no proper state change to OK (HARD), so the event handler that re-enables the hosts is never called.
We are not sure whether this also happens after CRITICAL (HARD) states.
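For context, the kind of HARD-only event handler described above could look roughly like the sketch below. It is not our actual handler: the function name `decide_action` and the printed action names are hypothetical, and it assumes the handler receives the standard Nagios macros `$SERVICESTATE$` and `$SERVICESTATETYPE$` as its first two arguments. How WARNING should be treated is also an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of an event handler that only reacts to HARD states.
# Assumed arguments: $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$

decide_action() {
    state="$1"
    state_type="$2"

    # SOFT states are ignored entirely; only HARD states trigger an action.
    if [ "$state_type" != "HARD" ]; then
        echo "ignore"
        return 0
    fi

    case "$state" in
        OK)               echo "enable" ;;   # re-enable the host in the cluster
        CRITICAL|UNKNOWN) echo "disable" ;;  # take the host out of the cluster
        *)                echo "ignore" ;;   # e.g. WARNING (assumption: no action)
    esac
}
```

With a handler of this shape, a recovery that is never logged as OK;HARD means the "enable" branch is never reached, which matches the behavior reported here.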
root@openitc [core]: /opt/openitc/logs/nagios # zless nagios.log-2021071[0-9].gz nagios.log | perl -p -e 's/^\[([0-9]*)\]/"[".localtime($1)."]"/e' |grep d2d45d3f-61d9-4232-b05d-0be096e928e6 | grep HARD
[Fri Jul 9 06:03:20 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;CRITICAL;HARD;1;CRITICAL: [...]
[Fri Jul 9 06:03:20 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;CRITICAL;HARD;1;024210cb-94a3-4e4f-bc52-8f6b063db1f4
[Fri Jul 9 06:04:20 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;OK;HARD;1;OK: [...]
[Fri Jul 9 06:04:20 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;OK;HARD;1;024210cb-94a3-4e4f-bc52-8f6b063db1f4
[Thu Jul 15 04:45:35 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;UNKNOWN;HARD;1;UNKNOWN: No data received from agent
[Thu Jul 15 04:45:35 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;UNKNOWN;HARD;1;024210cb-94a3-4e4f-bc52-8f6b063db1f4
[Thu Jul 15 04:46:20 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;OK;HARD;1;OK: [...]
[Thu Jul 15 04:46:20 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;OK;HARD;1;024210cb-94a3-4e4f-bc52-8f6b063db1f4
[Sat Jul 17 05:57:23 2021] SERVICE ALERT: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;UNKNOWN;HARD;3;UNKNOWN: Custom check [...] timed out after 10s seconds
[Sat Jul 17 05:57:23 2021] SERVICE EVENT HANDLER: be38e06a-b6ec-49dd-b191-c5a3f75c2f23;d2d45d3f-61d9-4232-b05d-0be096e928e6;UNKNOWN;HARD;3;024210cb-94a3-4e4f-bc52-8f6b063db1f4
After this, the service never becomes OK (HARD) again in the logs, which is also visible in the "State History" of the service (a screenshot could be provided but would be very long). In the "History" (see screenshot below), however, the service becomes OK (HARD) again just a few minutes later.
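To check whether other services are affected, a small filter like the following might help. It is only a sketch: the function name `last_hard_not_ok` is made up, and it assumes the `SERVICE ALERT` line format shown in the log excerpt above (semicolon-separated host, service, state, state type, attempt, output). It reads log lines from standard input and prints every host;service pair whose most recent HARD alert is not OK.

```shell
# Print each host;service pair whose latest HARD alert in the log is not OK.
last_hard_not_ok() {
    awk -F';' '
        / SERVICE ALERT: / && $4 == "HARD" {
            # $1 looks like "[timestamp] SERVICE ALERT: <host>"; keep only the host
            host = $1
            sub(/^.*SERVICE ALERT: /, "", host)
            # Remember the latest HARD state seen for this host;service pair
            last[host ";" $2] = $3
        }
        END {
            for (key in last)
                if (last[key] != "OK")
                    print key ";" last[key]
        }
    '
}
```

It can be fed the same log stream as the command above, for example `zcat -f nagios.log-2021071[0-9].gz nagios.log | last_hard_not_ok`. Services that are genuinely still failing will show up too, so the output needs to be compared against the current state shown in the web interface.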
To Reproduce
Unknown. Sometimes it works, sometimes it does not (see the log above).
Expected behavior
A proper state change to OK (HARD), followed by the corresponding event handler call.
Screenshots
Versions
openITCOKPIT Server Version: 4.2.1
Operating system: Ubuntu 20.04 LTS
Additional context
n/a