no notifications for service after downtime when host was soft-critical #10354

Open
TQQEU opened this issue Feb 28, 2025 · 1 comment
Labels
area/notifications Notification events

Comments

@TQQEU

TQQEU commented Feb 28, 2025

Bug Description

One of our Icinga2 checks is designed to immediately trigger a "hard Critical" state on the first attempt. Recently, there were maintenance operations on the underlying system, during which downtimes were scheduled.

Our expectation is that, if a service becomes "Critical" during the downtime, it should trigger its notifications at the end of the downtime. However, we encountered a scenario where this does not happen:

  • The service is configured to directly enter a "hard critical" state (no soft-state).
  • While the service is in a "Critical" state, the associated host is in a soft-state ("down").

This scenario is special/rare, but it has occurred in our production environment.

To Reproduce

  1. Set up a host and a service in Icinga2. The service should be configured to immediately enter a "hard Critical" state and have notifications enabled.

  2. Schedule a downtime.

  3. During the downtime, briefly transition the host into a "soft-down" state.

  4. While the host is in the "soft-down" state, cause the service to enter the "hard Critical" state.

  5. The host will return to the "UP" state (without ever reaching "hard-down"), and the downtime will end.

  6. At the end of the downtime, the host should be in the "UP" state, but the service remains in the "Critical" state.

Expected Behavior

The service should trigger a notification at the end of the downtime, because it entered the "Critical" state during the downtime and is still in that state.

Observed Behavior

No notification is triggered for the service, even though it is in the "Critical" state.

Sketch

Image

The notification in the diagram is currently not being triggered.

Your Environment


  • Version used (icinga2 --version): r2.14.5-1
  • Operating System and version: RHEL 8.10 (Ootpa)
  • Enabled features (icinga2 feature list): api checker ido-mysql influxdb mainlog notification
yhabteab added the area/notifications (Notification events) label on Feb 28, 2025
@yhabteab
Member

yhabteab commented Feb 28, 2025

Hi, thanks for reporting!

For future reference, this is probably due to the !wasLastParentRecoveryRecent.Get() condition in the following code block:

if (!state_suppressed && GetStateType() == StateTypeHard && !IsLikelyToBeCheckedSoon() && !wasLastParentRecoveryRecent.Get()) {
    if (cr->GetState() != GetStateBeforeSuppression()) {
        Checkable::OnNotificationsRequested(this, type, cr, "", "", nullptr);
    }

The implementation of wasLastParentRecoveryRecent() performs the following checks, among others:

if (!host->GetProblem() && host->GetLastStateChange() >= threshold) {
    return true;
}

It uses the parent checkable's GetLastStateChange() timestamp, but that timestamp is also updated on every soft state change. Thus, it should probably use host->GetLastHardStateChange() instead.
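
To make the suspected cause easier to follow, here is a minimal, self-contained sketch of the two variants of the check. It is not Icinga 2 code: HostModel, recoveryRecentCurrent, recoveryRecentProposed and wouldNotify are hypothetical names, the timestamps are plain numbers, and only the parent-recovery term of the condition quoted above is modelled. What threshold stands for in the real implementation is not shown in the excerpt, so it is just a parameter here.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the parent host (assumption, not Icinga 2's Host class).
struct HostModel {
    bool problem;               // true while the host is in a non-UP state
    double lastStateChange;     // updated on soft AND hard state changes
    double lastHardStateChange; // updated on hard state changes only
};

// Mirrors the quoted check: any state change at or after `threshold`
// (including a soft one) makes the parent look recently recovered.
bool recoveryRecentCurrent(const HostModel& host, double threshold) {
    return !host.problem && host.lastStateChange >= threshold;
}

// Variant suggested in the comment above: only hard state changes count.
bool recoveryRecentProposed(const HostModel& host, double threshold) {
    return !host.problem && host.lastHardStateChange >= threshold;
}

// Simplified decision: the other terms of the real condition
// (state_suppressed, hard state type, IsLikelyToBeCheckedSoon) are assumed to pass.
bool wouldNotify(bool parentRecoveryRecent) {
    return !parentRecoveryRecent;
}
```

With made-up values such as HostModel{false, 120.0, 0.0} and a threshold of 110.0 (host briefly soft-down and back up at t=120, no hard state change since t=0), recoveryRecentCurrent returns true and the notification is skipped, while recoveryRecentProposed returns false and the notification would be sent.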
