notification_interval in escalation is not used #303

Open
lgrn opened this issue Nov 12, 2018 · 1 comment

lgrn commented Nov 12, 2018

  • Configure a service to have notification_interval 5 and a unique contact.
  • Tie two "1:0" escalations ("start on first notification, never end") to the service with notification_interval 10 and 15 respectively, and unique contacts.

The service now has two overlapping escalations, each with unique contacts, as well as its own service-specific contact and notification_interval. Now make the service go CRITICAL.

Expected behavior: Since escalations are active, the service's own contacts and notification interval should presumably be overridden by what the escalation(s) dictate. Each contact should be notified at the interval of the escalation it belongs to (10 or 15 minutes). The service contact and service notification interval should not be used.

What happens instead: Contacts are correctly taken from the escalations (the service contact is not notified), but the notification_interval is taken from the service and applied to both contacts, instead of the notification_interval specified in each escalation.

Log example, where "CONTACT1" and "CONTACT2" are from the two escalations respectively. Service notifications are sent and logged at the same 600-second interval for both contacts (attributed to the service object's notification_interval rather than to either escalation).

 [1542029872] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.012ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542029872] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.012ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542030472] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542030472] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542031072] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542031072] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542031672] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542031672] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542032272] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542032272] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542032872] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.012ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542032872] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.012ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542033472] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.020ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542033472] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.020ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
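
The spacing between notifications can be checked directly from the epoch timestamps in the log. The following is a minimal sketch, not part of Naemon: the function name `notification_gaps` and the regular expression are illustrative assumptions, and the sample lines are the first three rounds from the excerpt above with the plugin output omitted. How many notification_interval units a gap represents depends on interval_length in the main configuration, which is not shown in this report; at 60 seconds per unit, 600 seconds would be 10 units.

```python
import re
from collections import defaultdict

# Matches "[<epoch>] SERVICE NOTIFICATION: <contact>;..." (assumed log layout,
# based only on the excerpt in this issue).
LINE_RE = re.compile(r"\[(\d+)\] SERVICE NOTIFICATION: ([^;]+);")


def notification_gaps(lines):
    """Return {contact: [seconds between consecutive notifications]}."""
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            times[match.group(2)].append(int(match.group(1)))
    return {
        contact: [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        for contact, stamps in times.items()
    }


# First three notification rounds from the log, plugin output left out.
SAMPLE = [
    "[1542029872] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify",
    "[1542029872] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify",
    "[1542030472] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify",
    "[1542030472] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify",
    "[1542031072] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify",
    "[1542031072] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify",
]

if __name__ == "__main__":
    for contact, gaps in sorted(notification_gaps(SAMPLE).items()):
        print(contact, gaps)
```

Both contacts show identical gaps, which is the point of the report: neither contact is notified on a schedule of its own.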

Escalations:

define serviceescalation{
     host_name                      apanarenapa
     service_description            ping
     contacts                       CONTACT2
     first_notification             1
     last_notification              0
     notification_interval          10
     escalation_options             c,r,u,w
     }
 
define serviceescalation{
     host_name                      apanarenapa
     service_description            ping
     contacts                       CONTACT1
     first_notification             1
     last_notification              0
     notification_interval          15
     escalation_options             c,r,u,w
     }

Service object:

define service{
     use                            default-service
     host_name                      apanarenapa
     service_description            ping
     check_command                  check_ping!500,90%!800,100%
     check_interval                 1
     notification_interval          5
     contacts                       never_used
     }

Template:

define service{
     is_volatile                    0
     max_check_attempts             3
     check_interval                 5
     retry_interval                 1
     active_checks_enabled          1
     passive_checks_enabled         1
     check_period                   24x7
     parallelize_check              0
     obsess                         0
     check_freshness                0
     event_handler_enabled          1
     flap_detection_enabled         1
     process_perf_data              1
     retain_status_information      1
     retain_nonstatus_information   1
     notification_interval          0
     notification_period            24x7
     notification_options           c,f,r,s,u,w
     notifications_enabled          1
     contacts                       never_used
     register                       0
     name                           default-service
     }

OP5 Jira: https://jira.op5.com/browse/MON-11356

@sni sni transferred this issue from naemon/naemon Aug 19, 2019

vpber commented Nov 3, 2021

This is somewhat by design. From the Naemon documentation (https://www.naemon.org/documentation/usersguide/objectdefinitions.html#serviceescalation):
Note: If multiple escalation entries for a host overlap for one or more notification ranges, the smallest notification interval from all escalation entries is used.

So actually the expected behavior is:
Since escalations are active, service settings in regards to contacts and notification intervals should presumably be overridden by what the escalation(s) dictate. The contacts should get notifications at the lowest defined escalation interval (10 min), regardless of what escalation they belong to (10 or 15 minutes) because they are overlapping. The service contact and service notification interval should not be used.

The error in your case is that Naemon does not use the lowest defined escalation interval (10 min) but uses the "original" service notification interval (5 min) instead.

Also I guess the documentation should say "multiple escalation entries for a service" instead of "host" here.
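
The documented rule ("the smallest notification interval from all escalation entries is used") can be sketched as follows. This is only an illustration of the behavior described in the documentation, not Naemon's actual implementation: the `Escalation` class and the functions `is_active` and `effective_interval` are hypothetical names, and escalation_options and escalation periods are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    """Hypothetical stand-in for a serviceescalation object."""
    contacts: tuple
    first_notification: int
    last_notification: int  # 0 means the escalation never ends
    notification_interval: int


def is_active(esc, notification_number):
    """True if the escalation's notification range covers this notification."""
    if notification_number < esc.first_notification:
        return False
    return esc.last_notification == 0 or notification_number <= esc.last_notification


def effective_interval(service_interval, escalations, notification_number):
    """Smallest interval among active escalations, else the service's own."""
    active = [e for e in escalations if is_active(e, notification_number)]
    if not active:
        return service_interval
    return min(e.notification_interval for e in active)


# The two overlapping "1:0" escalations from this issue.
ESCALATIONS = [
    Escalation(("CONTACT2",), 1, 0, 10),
    Escalation(("CONTACT1",), 1, 0, 15),
]

if __name__ == "__main__":
    # Service notification_interval is 5; both escalations are active from
    # the first notification onwards.
    print(effective_interval(5, ESCALATIONS, 1))
```

Under this reading, the service's interval of 5 would only apply when no escalation covers the current notification number; with both escalations active, the result is 10 for both contacts.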
