Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for expiring acknowledgements #391

Closed
wants to merge 19 commits into from

Conversation

roelvanmeer
Copy link
Contributor

This adds support for acknowledgements that can expire.

I've split everything up in small commits. Please let me know if you'd like to have this differently, or if any improvements can be made. For one thing, I was unsure about the duplicate declaration of the handle_(host|service)_acknowledgement_expire_event functions. It might be better to put these in a header file.

I'm currently testing this out on our systems, but I'd like some preliminary feedback if possible.

Closes #302

Parts of this code were based on work done in Icinga v1 by Ricardo
Bartels for Icinga/icinga-core#369.

Closes naemon#302.

Signed-off-by: Roel van Meer <[email protected]>
Signed-off-by: Roel van Meer <[email protected]>
@sni
Copy link
Contributor

sni commented Aug 16, 2022

very nice. I'll try this out here as well.

Copy link
Contributor

@jacobbaungard jacobbaungard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the contribution, and apologies I haven't been able to review this sooner. This is a nice feature. I have code-reviewed and I think it looks really good. I added one minor comment.

Also it would be nice if it is possible to add a test where the expiration of the acknowledgement happens. Create an acknowledgement which expires shortly, check the acknowledgement is there, sleep a a little while, check the acknowledgement has been correctly expired.

@sni did you manage to do some testing of this?

src/naemon/xrddefault.c Outdated Show resolved Hide resolved
@roelvanmeer
Copy link
Contributor Author

Also it would be nice if it is possible to add a test where the expiration of the acknowledgement happens. Create an acknowledgement which expires shortly, check the acknowledgement is there, sleep a a little while, check the acknowledgement has been correctly expired.

I would like to do this, but I wouldn't know how. I also haven't found any tests that checked downtime expiration, so I didn't find anything to base this on. Perhaps someone else can step in here, provide a basis for checking the result of events, or point me to test code that already does this?

Another way may be to test this the light way: schedule an ack that expires in two seconds, wait three, run the expire event manually, and check the results?

@sni
Copy link
Contributor

sni commented Sep 6, 2022

@sni did you manage to do some testing of this?

not yet, i am still on vacation. Will do that next week.

@jacobbaungard
Copy link
Contributor

Another way may be to test this the light way: schedule an ack that expires in two seconds, wait three, run the expire event manually, and check the results?

That sounds like a good way forward.

Copy link
Contributor Author

@roelvanmeer roelvanmeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a stripped down copy of the test_commands.c file. The test works, but the way the event queue is emptied is not very robust now. If this looks promising, I would like some advice on how to progress.
Ideas: create a test function that empties the event queue, and run that before the test, so after we run the external command we should have to poll the event queue only once. If we do it that way, these tests can also be done in test_commands again, now that I think about it.

@roelvanmeer
Copy link
Contributor Author

Update: function declaration has been de-duplicated, and the test has been added. I'd love to hear what you think.

Copy link
Contributor

@jacobbaungard jacobbaungard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for adding the additional test. I think the code looks good, except for two small indentation issues, and another very minor comment.

If everything works fine following some manual testing, I think this should be fine to merge. I might be able to find some time late in the week to do so, otherwise perhaps sni can do it next week.

Thanks again for the contribution!

src/naemon/nebstructs.h Outdated Show resolved Hide resolved
src/naemon/objects_host.h Outdated Show resolved Hide resolved
clear_event_queue();
target_host->current_state = STATE_DOWN;
time(&current_time);
nm_asprintf(&cmdstr, "[1234567890] ACKNOWLEDGE_HOST_PROBLEM_EXPIRE;host1;2;0;0;%llu;myself;expire in two seconds", (unsigned long long int)current_time+2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps 1234567890 -> current_time now that we anyway have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't done in any of the other tests, but I did update the commit and removed the current_time variable; it's not used in other tests either.

Signed-off-by: Roel van Meer <[email protected]>
If neamon starts and the retention data contains expiring acks that
still need to happen, ensure the expire events are properly scheduled.

Signed-off-by: Roel van Meer <[email protected]>
In tests, we want to be able to remove all events but not destroy the
queue.
After destroying the check events, references to those events must be
cleared. If not, scheduling a new check event will try to destroy the
event that is still referenced, which causes a segfault.
This adds a test that checks if the acknowledgement expiry event does
its work.

It starts by removing all events from the event queue. We then run the
command, with expiration set in two seconds. After waiting three
seconds, we process a single event (we should have only one event in the
queue at this point). After processing the event, the acknowledgement
should be gone.
Copy link
Contributor

@sni sni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks a lot

@sni
Copy link
Contributor

sni commented Sep 15, 2022

i had to rebase and push it manually. But it's merged now. Thanks a lot for this nice feature.

@sni sni closed this Sep 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Acknowledge problem temporarily
3 participants