Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid SpoolReader/SpoolWriter races with synchronized (atomic-like) updates to spool files #983

Open
wants to merge 1 commit into
base: stable-2.0
Choose a base branch
from

Conversation

zmousm
Copy link
Contributor

@zmousm zmousm commented Jul 6, 2018

munin-asyncd by default queries munin-node in turn for each plugin and writes to spool files. With --fork this happens in parallel. On the other hand munin-async iterates through spool files and scans them for new data.

Nothing stops spoolreader from reading from a file while (or before) spoolwriter writes to it in a particular round of updates. After such a spoolfetch, the master will have missed any updates written to spool files after spoolreader visited them. If it did fetch data that had already been written in that round, the master will also have bumped the spoolfetch timestamp and therefore it will skip over such data in subsequent spoolfetch, which means data loss.

This race condition is particularly evident when spoolfetch and spoolwriter coincide. For example if munin-update runs on the master every 5 minutes and munin-asyncd wakes up at the same time, the nodes visited by munin-update in the first 30-60 seconds are most likely to exhibit this problem. munin-asyncd --fork does help, but again if some munin plugins take a long time to run the respective services will be susceptible to data loss.

The proposed patch addresses the problem by emulating atomic-like updates to spool files. SpoolWriter writes updates to copies of spool files and moves (renames) them over in one go at the end. In fork mode, this involves pipe IPC where munin-asyncd reads files to be committed from its' children processes stdout.

This serves to protect against SpoolWriter/SpoolReader race
conditions: Copy spool files, update them and move (rename) them over
in one go at the end. In fork mode, this involves pipe IPC.
@steveschnepp
Copy link
Member

while I do agree with the issue, I'm not really at ease with heavy file copying.

It would be much more convenient to add some kind of locking via flock().

I also thought about enhancing the protocol inside the spoolfetch files to be able for the reader to detect a half-wrote dataset.

@kimheino
Copy link
Contributor

Commit a48026a tries to fix this (or similar case?).

Well... I doesn't fix this, it is just a workaround to hide this problem.

sumpfralle added a commit that referenced this pull request May 29, 2019
2.0.48

The following changes are omitted:
* #1176: Ignore malformed lines when parsing "datafile"
* #983: Let the server use max timestamp rather than now for
  non-timestamped dirtyconfig values
    * this PR should be checked, whether it still applies for master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants