Health-Check might check for non-existing files, marking mirror down by mistake. Could be improved #152

elboulangero · 2023-11-02T04:42:59Z

Background - how does health-check work

How does the health-check work? It runs every minute by default. Mirrorbits sends an HTTP HEAD request for a random file served by the mirror. If the request succeeds, the mirror is marked as up; otherwise it is marked as down.

To go a bit more in depth: Mirrorbits picks the file from the hash HANDLEDFILES_<mirror-id>. This hash contains the files that are 1) on the mirror and 2) in the local repo (the source, in mirrorbits-speak), so it's the intersection of these two sets. It means that the HANDLEDFILES hash doesn't contain extra files that would be on the mirror but not in the source (if any), and doesn't contain files that are in the source but not on the mirror (if any). So it should be pretty good at picking a suitable file.

The value of HANDLEDFILES_<mirror-id> is updated every time a mirror scan completes.
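To make the mechanism above concrete, here is a minimal Python sketch of one health-check round. It is not mirrorbits' actual code (mirrorbits itself is written in Go); the function names, the injected `head` callable, and the treatment of an empty set are assumptions made purely for illustration.

```python
import random
from typing import Callable, Iterable, Optional, Set


def handled_files(mirror_files: Iterable[str], source_files: Iterable[str]) -> Set[str]:
    """Files present both on the mirror and in the source (the intersection)."""
    return set(mirror_files) & set(source_files)


def health_check(handled: Set[str], head: Callable[[str], int],
                 rng: Optional[random.Random] = None) -> bool:
    """Pick one random handled file, send a HEAD request, return True if up.

    `head` is a hypothetical callable that returns the HTTP status code
    for a given path on the mirror.
    """
    rng = rng or random.Random()
    if not handled:
        return False  # assumption: with nothing to check, report the mirror as down
    path = rng.choice(sorted(handled))
    return 200 <= head(path) < 300  # any 2xx status counts as success
```

The important point is that the candidate set is whatever the last scan recorded, not what the mirror serves right now.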

Issue - the theory

The issue lies with the last line. Every time a mirror is updated, HANDLEDFILES is outdated until the mirror is scanned again. Assuming mirrors are scanned every hour, there's a window of at most one hour during which HANDLEDFILES contains files that might not be on the mirror anymore. If the health-check picks one of those files, the mirror returns 404 and is marked as down. If the health-check picks a file that is still on the mirror, all good, the mirror is up. Assuming the health-check is done every minute, we have a one-hour window during which mirrors might appear "flaky", going up and down from one minute to the next.
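The stale window described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: each check picks uniformly at random from the outdated set, one check happens per minute, and no scan happens during the window. The file names are made up.

```python
import random


def simulate_stale_window(handled, on_mirror, checks=60, seed=0):
    """Return the up/down result of each check during the stale window.

    `handled` is the outdated HANDLEDFILES set; `on_mirror` is what the
    mirror really serves after its update.
    """
    rng = random.Random(seed)
    candidates = sorted(handled)
    return [rng.choice(candidates) in on_mirror for _ in range(checks)]


def count_flaps(statuses):
    """Number of up<->down transitions in a sequence of check results."""
    return sum(1 for a, b in zip(statuses, statuses[1:]) if a != b)


# Hypothetical repo: 25 files were replaced on the mirror, 75 were kept.
handled = {f"old-{i}" for i in range(25)} | {f"kept-{i}" for i in range(75)}
on_mirror = {f"kept-{i}" for i in range(75)} | {f"new-{i}" for i in range(25)}

statuses = simulate_stale_window(handled, on_mirror)
print(statuses.count(False), "of", len(statuses), "checks marked the mirror down")
print(count_flaps(statuses), "up/down transitions")
```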

Issue - in practice

Is it really an issue? Well, it depends on how many files disappear when the repo is updated, compared to the total number of files.

Let's look at the Kali Linux images, in numbers:

# cd /srv/mirrors/kali-images
# find -type f | wc -l
174
# find kali-weekly/ | wc -l
89
# find kali-weekly/ | grep -- -W41- | wc -l
42
# find kali-weekly/ | grep -- -W42- | wc -l
42

To say it in words:

in the Kali Linux image repo, there are around 175 files
50% are weekly images, and we have two weeks of weekly images
so 25% of the files are for the current weekly image, and 25% are for the last weekly image

Once a week, this repo is updated, a new weekly image is added, and the old weekly image is removed. Meaning: once a week, when the repo is updated, 25% of the files in the repo disappear.

For the health-check, it means that, during a one-hour window, each check has roughly a 1 in 4 chance of picking a file that is not on the mirror anymore, and therefore of marking the mirror down.
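As a rough worked example of that estimate, the numbers from the `find` output above can be plugged in directly. This assumes each check picks uniformly among the 174 files, that checks are independent, and that the stale window lasts the full hour (the worst case), so treat the results as an upper bound rather than a measurement.

```python
TOTAL_FILES = 174        # find -type f | wc -l
REMOVED_FILES = 42       # files of the outgoing weekly image (-W41-)
CHECKS_PER_WINDOW = 60   # one check per minute over a one-hour scan interval

# Chance that a single check lands on a file that is gone from the mirror.
p_miss = REMOVED_FILES / TOTAL_FILES
# Average number of failing checks over the whole window.
expected_misses = CHECKS_PER_WINDOW * p_miss
# Chance that the mirror is marked down at least once during the window.
p_at_least_one = 1 - (1 - p_miss) ** CHECKS_PER_WINDOW

print(f"chance that one check hits a removed file: {p_miss:.1%}")
print(f"expected failing checks in the window:     {expected_misses:.1f}")
print(f"chance of at least one failing check:      {p_at_least_one:.6f}")
```

In other words, with these numbers it is practically certain that the mirror gets marked down at least once during a full one-hour window.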

So, once a week, during a one-hour window, the mirrors seem to be flaky, and go up and down from mirrorbits' point of view. We can see it with this graph, which shows around 10 days of data and checks the availability of an image in the repo. We can clearly see the two moments when the repo was updated with a new weekly image, causing mirrors to be marked up/down by mirrorbits.

Mitigation and possible improvements

The easy mitigation for a mirrorbits user is just to reduce the scan interval (e.g. to 30 minutes). It works for the Kali Linux images: there are only 175 files in this repo, so scanning is quick, and it's OK to reduce the scan interval.

I think this issue could be mitigated in mirrorbits as well, here are a few ideas:

easy to implement: add a "404 counter" and mark a mirror down only after it has returned 404 X times in a row. We might want to expose this setting in the config file.
harder, and maybe not better: limit the health-check to files in a certain location of the repo, and expose this setting in the config file. Implementation-wise it's awkward, as mirrorbits currently just picks a random file (which is efficient); picking a random file within a directory might be less efficient.
limit the health-check to one particular file only. Might even make more sense than picking a random file, for some users?
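The first idea, the "404 counter", could look something like the sketch below. It is written in Python for brevity rather than in mirrorbits' Go, and the class name, the `threshold` parameter and its default of 3 are all hypothetical; no such setting exists in mirrorbits as far as this issue describes.

```python
class FailureCounter:
    """Mark a mirror down only after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0
        self.up = True

    def record(self, check_ok: bool) -> bool:
        """Feed one health-check result; return the resulting up/down state."""
        if check_ok:
            # Any success resets the streak and brings the mirror back up.
            self.consecutive_failures = 0
            self.up = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.up = False
        return self.up
```

The trade-off is that a mirror that is really down is only detected after `threshold` checks. Also, with a per-check miss chance around 24% as in the Kali example, three misses in a row are still possible (about 1.4% for any given run of three checks, assuming independent picks), so the counter reduces the flapping rather than eliminating it.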


jbkempf · 2023-11-07T13:45:31Z

I think it is a good idea, yes.

elboulangero · 2023-11-08T02:21:45Z

Which one? The 404 counter?

lazka · 2024-03-12T06:20:51Z

imo limiting it to a single file is good enough (similar to TraceFileLocation config-wise; even the same file could be used)

Another heuristic would be to check the "newest" file each mirror has according to the last scan, assuming only old files get removed. But that would fail if files constantly get added and removed again.
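A minimal sketch of that "newest file" heuristic, in Python for illustration only. It assumes the last scan left behind a mapping from path to modification time; the `scan_mtimes` name and its shape are hypothetical and not taken from mirrorbits.

```python
from typing import Mapping, Optional


def newest_file(scan_mtimes: Mapping[str, float]) -> Optional[str]:
    """Return the path with the most recent modification time, or None if empty.

    `scan_mtimes` is a hypothetical mapping path -> mtime recorded at the
    last scan. Ties are broken by path name to keep the choice deterministic.
    """
    if not scan_mtimes:
        return None
    return max(scan_mtimes, key=lambda path: (scan_mtimes[path], path))
```

As noted above, this only helps when removals hit old files; a repo where recent files are regularly replaced would defeat it.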

elboulangero changed the title ~~Health-Check might check for non-existing files, could be improved~~ Health-Check might check for non-existing files, marking mirror down by mistake. Could be improved Nov 2, 2023

jbkempf added enhancement help wanted labels Nov 7, 2023

elboulangero mentioned this issue Jun 18, 2024

New Release? #179

Open
