Question: Is there a recommendation for the smallest interval to submit the external command SAVE_STATE_INFORMATION? #435

Open
ccztux opened this issue Aug 2, 2023 · 2 comments

ccztux (Contributor) commented Aug 2, 2023

We run a monitoring cluster in which important files such as retention.dat are synced between nodes. Currently, the cluster software on the active node executes the external command SAVE_STATE_INFORMATION every 5 minutes.

The retention_update_interval in naemon.cfg is a value in minutes:

# RETENTION DATA UPDATE INTERVAL
# This setting determines how often (in minutes) that Naemon
# will automatically save retention data during normal operation.
# If you set this value to 0, Naemon will not save retention
# data at regular interval, but it will still save retention
# data before shutting down or restarting.  If you have disabled
# state retention, this option has no effect.

retention_update_interval=60

My question: is there a recommended minimum interval at which the external command SAVE_STATE_INFORMATION should be executed? We would like to decrease the current value of 5 minutes to something between 10 and 30 seconds.
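For context, a minimal sketch of how such a periodic submission might look. It assumes the usual external command line format `[timestamp] COMMAND` and a command file at `/var/lib/naemon/naemon.cmd`; the path and both helper names are illustrative, so check `command_file` in your naemon.cfg.

```python
import time

# Assumed location of the external command file; check `command_file` in naemon.cfg.
COMMAND_FILE = "/var/lib/naemon/naemon.cmd"


def format_save_state_command(now=None):
    """Build one external command line: "[<unix timestamp>] SAVE_STATE_INFORMATION"."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] SAVE_STATE_INFORMATION\n"


def submit_save_state(command_file=COMMAND_FILE, now=None):
    """Write the command to the command file (a named pipe on a live system)."""
    line = format_save_state_command(now)
    # On a named pipe, "w" simply opens the pipe for writing.
    with open(command_file, "w") as handle:
        handle.write(line)
    return line
```

The cluster software would then call `submit_save_state()` once per interval on the active node.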

sni (Contributor) commented Aug 13, 2023

The file is also written on shutdown, so the interval only matters when the cluster is suddenly split or the active core crashes.
As far as I know, without digging into the code, saving the retention information blocks the core and prevents it from scheduling new checks. Depending on the size of the installation and the disk performance, the save might take several seconds to complete.
I usually would not set this value to less than 5 minutes, but one minute should be OK as well if the file is written in less than 5 seconds.
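To put that rule of thumb into numbers, here is a small sketch that estimates how much of the time the core would spend saving. It assumes, as described above but not verified against the code, that the core is blocked for the full duration of each save; `blocked_fraction` is a hypothetical helper name.

```python
def blocked_fraction(write_seconds, interval_seconds):
    """Share of wall-clock time spent saving, if every save blocks the core."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return write_seconds / interval_seconds


# A 5 s write, triggered every 60 s, 30 s, and 10 s.
for interval in (60, 30, 10):
    print(f"{interval:>3} s interval: {blocked_fraction(5, interval):.0%} blocked")
```

With a 5 second write, this comes to roughly 8% at a one minute interval, 17% at 30 seconds, and 50% at 10 seconds.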

nook24 (Member) commented Aug 14, 2023

> As far as I know, without digging into the code, saving the retention information blocks the core and prevents scheduling new checks. And depending on the size of this installation (and the disk performance), it might take several seconds to complete the action.

That matches my understanding of the retention update process. It is scheduled and executed by the main process, so it should also block the core's main loop.

We measured an execution time of 30 to 40 seconds, so we decided to set the default update interval to 60 minutes on all of our systems. It's worth mentioning that these measurements were taken years ago, when most systems still used HDDs.
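Since those numbers are old, it may be worth re-measuring on current hardware. One rough way is sketched below. It assumes the retention file's modification time changes once the save has finished, and the result also includes the delay before the core picks up the command, so treat it as an upper bound. `time_until_rewritten` and its parameters are hypothetical names, not part of Naemon.

```python
import os
import time


def time_until_rewritten(path, trigger, timeout=120.0, poll=0.05):
    """Call trigger(), then return the seconds until path's mtime changes.

    Returns None if the file was not rewritten within `timeout` seconds.
    """
    before = os.stat(path).st_mtime_ns
    start = time.monotonic()
    trigger()
    while time.monotonic() - start < timeout:
        try:
            if os.stat(path).st_mtime_ns != before:
                return time.monotonic() - start
        except FileNotFoundError:
            pass  # the file may briefly vanish while it is being replaced
        time.sleep(poll)
    return None
```

Here `path` would be the retention file (see `state_retention_file` in naemon.cfg) and `trigger` any callable that submits SAVE_STATE_INFORMATION, for example by writing to the command file.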
