Take check timeout into account when rescheduling remote checks #10362
Hi, thanks for reporting!
How exactly did you do that? As far as I can see from the …
If you have debug logs enabled, you should be able to find such logs indicating that the checker will not schedule a check for this service.
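For context, whether the checker schedules a service at all is controlled by its active-check flag. A minimal sketch of a service with active checks disabled (object, host and command names are hypothetical):

```
object Service "test_service" {
  host_name = "example-host"       // hypothetical host
  check_command = "spawn_test"     // hypothetical long-running check script
  enable_active_checks = false     // the checker will not schedule this service
}
```

With `enable_active_checks = false`, the service only receives results when a check is triggered manually (e.g. via the GUI) or submitted passively.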
Hi yhabteab, you are right, the check was not enabled but was manually executed, because it was created for this test only. I found the issue with another check, but to make sure the command (script) content was not triggering anything unexpected, I created a new command and service explicitly to verify the behaviour. It shouldn't make a difference whether I trigger the check manually via the GUI or via a timer. Regards,
If you set … Even if …

icinga2/lib/icinga/checkable-check.cpp, lines 577 to 579 in 206d7cd
So if these processes are indeed started by Icinga 2 and all belong to the same service, please provide the debug logs of Icinga 2 (you can anonymise all sensitive information, such as host, service name etc.) and the output of e.g. the ps command on Linux.
You might be right here; I assume I had already deactivated the service after the tests when I ran that command. However, the check was triggered via the GUI, and that's not what needs to be discussed here.
But I can see them, if I search for the job name:
And you can see the relation when querying the service status:
I attached a debug.log from the endpoint node containing 3 consecutive starts. That matches the output above.
The debug logs don't say too much, as the commands seem to be triggered from a remote endpoint. In order to see if these processes belong to the very same service, I would need the debug logs from your satellite sending these commands.

[2025-03-10 09:36:09 +0100] notice/JsonRpcConnection: Received 'event::ExecuteCommand' message from identity '<satellite>'.
[2025-03-10 09:36:09 +0100] notice/Process: Running command '/usr/lib64/nagios/plugins/spawn_test': PID 1276294
...
[2025-03-10 09:37:39 +0100] notice/JsonRpcConnection: Received 'event::ExecuteCommand' message from identity '<satellite>'.
[2025-03-10 09:37:39 +0100] notice/Process: Running command '/usr/lib64/nagios/plugins/spawn_test': PID 1282168
...
[2025-03-10 09:39:09 +0100] notice/JsonRpcConnection: Received 'event::ExecuteCommand' message from identity '<satellite>'.
[2025-03-10 09:39:09 +0100] notice/Process: Running command '/usr/lib64/nagios/plugins/spawn_test': PID 1287278
Fresh run, endpoint values for reference:
Debug log of the satellite system is attached. Service name is "test_service".
Thanks for the debug logs! I can only partially confirm the issue as a bug in Icinga 2: with my previous comments I was assuming we were talking about normal, locally executed checks, but that is no longer the case. What I mean by partially is that Icinga 2 should not be scheduling a new check every minute on its own, even if you have a huge …

A side note on why Icinga 2 runs a check more or less every minute:

icinga2/lib/icinga/checkable-check.cpp, lines 635 to 639 in 206d7cd
In the forum post there is also a: …

Shouldn't this make any retry_interval obsolete?

Best Regards
Hi, as the new issue title implies, this has nothing to do with …
Describe the bug
Setting `retry_interval` for a service to a non-default value is not honored. If the service status is != 0, the check will be retried every minute, regardless of the retry_interval setting. It will even be retried before the current check run has finished (if it runs longer than 1 minute), although this might be intentional to prevent infinite waiting states.

To Reproduce
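The settings used for reproduction can be written up as a minimal Service definition. This is only a sketch, with hypothetical host and command names, using the interval values from this report:

```
object Service "test_service" {
  host_name = "example-host"    // hypothetical
  check_command = "spawn_test"  // hypothetical long-running plugin
  check_interval = 14400s       // 4 hours between regular checks
  retry_interval = 3600s        // expected: 1 hour between retries in a soft state
  check_timeout = 14400s        // allow the plugin to run up to 4 hours
}
```

With a plugin that takes around 10 minutes to return a non-OK result, the observed behaviour is a new check instance roughly every minute instead of one retry per `retry_interval`.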
Have a look at this community post please, it contains the test results.
Set `check_interval = 14400`, `retry_interval = 3600` and `check_timeout = 14400` (just examples). You'll see 1 additional instance of the command started (about) every minute, until the first started one comes back after 10 minutes and sets the service status to OK (= 0). From this moment on, no new process instances will be launched (because the check does not need to be rerun anymore); only the result string is updated in the Icinga database whenever one of the running instances finishes.

Expected behavior
Retry runs in case of a check result != 0 should not start 1 minute after entering that state, but after `retry_interval` (in this example, after 1 hour).

Screenshots
Have a look at this community post please, it contains the test results.
Your Environment
Include as many relevant details about the environment you experienced the problem in:

- `icinga2 --version`: r2.14.5-1
- `icinga2 feature list`: api checker ido-mysql mainlog notification
- `icinga2 daemon -C`: no errors
- `zones.conf` file (or `icinga2 object list --type Endpoint` and `icinga2 object list --type Zone`) from all affected nodes -> Sorry, not allowed to provide hostnames here.

Additional context