Avoid SpoolReader/SpoolWriter races with synchronized (atomic-like) updates to spool files #983
+85
−3
`munin-asyncd` by default queries munin-node in turn for each plugin and writes to spool files. With `--fork` this happens in parallel. On the other hand, `munin-async` iterates through spool files and scans them for new data.

Nothing stops SpoolReader from reading a file while (or before) SpoolWriter writes to it in a particular round of updates. After such a spoolfetch, the master will have missed any updates written to spool files after SpoolReader visited them. If it did fetch data that had already been written in that round, the master will also have bumped the spoolfetch timestamp, so it will skip over the missed updates in subsequent spoolfetches, which means data loss.
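The data loss described above can be reproduced with a toy model. This is only an illustrative Python sketch, not Munin's Perl code: the `spoolfetch` function, the in-memory `spool` dictionary, and the assumptions that all entries of one round share the round's timestamp and that the master bumps its timestamp to the newest entry it saw are all hypothetical simplifications.

```python
def spoolfetch(spool, last_fetch):
    """Return entries newer than last_fetch, plus the bumped timestamp.

    Hypothetical model: the master remembers the newest timestamp it has
    seen and only asks for strictly newer entries next time.
    """
    fresh = sorted(
        (ts, service, value)
        for service, entries in spool.items()
        for ts, value in entries
        if ts > last_fetch
    )
    bumped = max((ts for ts, _, _ in fresh), default=last_fetch)
    return fresh, bumped


def simulate_race():
    """Interleave one writer round with a fetch and return what the master saw."""
    spool = {"cpu": [], "df": []}  # hypothetical service names
    seen = []
    last_fetch = 0

    # Round at t=300: the writer has finished "cpu" but not yet "df".
    spool["cpu"].append((300, 0.42))

    # The master fetches in the middle of the round and bumps its timestamp.
    fresh, last_fetch = spoolfetch(spool, last_fetch)
    seen += fresh

    # The writer now finishes the round; this entry arrives too late.
    spool["df"].append((300, 71.0))

    # Round at t=600 completes before the next fetch.
    spool["cpu"].append((600, 0.40))
    spool["df"].append((600, 72.0))
    fresh, last_fetch = spoolfetch(spool, last_fetch)
    seen += fresh

    return seen


if __name__ == "__main__":
    for entry in simulate_race():
        print(entry)
```

In this model the `df` entry for the round at 300 is never delivered: it was written after the first fetch, and the second fetch only asks for entries newer than 300.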
This race condition is particularly evident when spoolfetch and SpoolWriter coincide. For example, if `munin-update` runs on the master every 5 minutes and `munin-asyncd` wakes up at the same time, the nodes visited by `munin-update` in the first 30-60 seconds are most likely to exhibit this problem. `munin-asyncd --fork` does help, but if some munin plugins take a long time to run, the respective services will still be susceptible to data loss.

The proposed patch addresses the problem by emulating atomic-like updates to spool files. SpoolWriter writes updates to copies of spool files and moves (renames) them over in one go at the end. In fork mode, this involves pipe IPC, where `munin-asyncd` reads the files to be committed from its child processes' stdout.
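The write-to-a-copy-then-rename idea can be sketched as follows. This is a minimal Python illustration of the general technique, not the code in this patch: the names `stage_update` and `commit`, the `.tmp` suffix, and the spool file names are hypothetical. It assumes the temporary copy lives in the same directory as the spool file, so that the rename stays on one filesystem.

```python
import os
import tempfile


def stage_update(spool_dir, name, new_lines):
    """Write the current spool content plus new_lines to a temporary copy.

    Returns (tmp_path, target_path); the real spool file is left untouched.
    """
    target = os.path.join(spool_dir, name)
    fd, tmp_path = tempfile.mkstemp(prefix=name + ".", suffix=".tmp", dir=spool_dir)
    with os.fdopen(fd, "w") as tmp:
        if os.path.exists(target):
            with open(target) as current:
                tmp.write(current.read())
        tmp.writelines(line + "\n" for line in new_lines)
        tmp.flush()
        os.fsync(tmp.fileno())  # make sure the copy is on disk before renaming
    return tmp_path, target


def commit(staged):
    """Rename every staged copy over its spool file, one after another."""
    for tmp_path, target in staged:
        os.replace(tmp_path, target)  # atomic on POSIX within one filesystem


if __name__ == "__main__":
    spool_dir = tempfile.mkdtemp()
    staged = [
        stage_update(spool_dir, "cpu", ["timestamp 300", "user.value 42"]),
        stage_update(spool_dir, "df", ["timestamp 300", "root.value 71"]),
    ]
    # A reader scanning spool_dir/cpu or spool_dir/df now sees no partial round.
    commit(staged)
    print(sorted(os.listdir(spool_dir)))
```

Each rename is atomic on its own, but the batch as a whole is not, which is why the updates are only atomic-like: the window in which a reader can see a partly updated round shrinks from the duration of the plugin runs to the time needed for a handful of renames. In a forked setup, the staged paths would have to be reported back to the parent, for example over a pipe, so that the parent can perform the commit step.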