Health-Check might check for non-existing files, marking mirror down by mistake. Could be improved #152

Open
elboulangero opened this issue Nov 2, 2023 · 3 comments

Comments

@elboulangero
Contributor

Background - how does health-check work

How does the health-check work? It runs every minute by default. Mirrorbits sends an HTTP HEAD request for a random file served by the mirror. If the request succeeds, the mirror is marked as up; otherwise it is marked as down.

To go a bit more in depth: Mirrorbits picks the file from the hash HANDLEDFILES_<mirror-id>. This hash contains the files that are 1) on the mirror and 2) on the local repo (the source, in mirrorbits-speak), so it's the intersection of these two sets. It means that the HANDLEDFILES hash contains neither extra files that are on the mirror but not in the source (if any), nor files that are in the source but not on the mirror (if any). So it should be pretty good at picking a suitable file.

The value of HANDLEDFILES_<mirror-id> is updated every time a mirror scan completes.
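To make the selection logic above concrete, here is a minimal Python sketch. It is not mirrorbits' actual code: the function names, the file names and the `fake_head` stand-in for the HTTP HEAD request are all hypothetical, and treating only status 200 as success is an assumption.

```python
import random

def handled_files(source_files, mirror_files):
    """Return the files present both in the source repo and on the mirror."""
    return set(source_files) & set(mirror_files)

def health_check(handled, head_request, rng=random):
    """Probe one randomly chosen handled file.

    head_request is a hypothetical callable that takes a path and returns
    an HTTP status code. Returns True (mirror up) only for status 200.
    """
    if not handled:
        return False  # assumption: an empty set is treated as "down" here
    path = rng.choice(sorted(handled))
    return head_request(path) == 200

# Hypothetical file lists, for illustration only.
source = {"a.iso", "b.iso", "c.iso"}
mirror = {"b.iso", "c.iso", "extra.txt"}
handled = handled_files(source, mirror)

def fake_head(path):
    """Stand-in for a real HEAD request: 200 if the mirror has the file."""
    return 200 if path in mirror else 404

mirror_is_up = health_check(handled, fake_head)
```

As long as `handled` matches what is really on the mirror, every pick succeeds. The trouble starts when `handled` is computed once and the mirror content changes afterwards.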

Issue - the theory

The issue lies with that last point. Every time a mirror is updated, HANDLEDFILES becomes outdated until the mirror is scanned again. Assuming mirrors are scanned every hour, there's a window of at most one hour during which HANDLEDFILES contains files that might not be on the mirror anymore. If the health-check picks one of those files, the mirror returns 404 and is marked as down. If the health-check picks a file that is still on the mirror, all good, the mirror is up. Assuming the health-check runs every minute, we have a window of up to one hour during which mirrors might appear "flaky" and go up and down from one minute to the next.

Issue - in practice

Is it really an issue? Well, it depends on how many files disappear when the repo is updated, compared to the total number of files.

Let's look at the Kali Linux images, in numbers:

# cd /srv/mirrors/kali-images
# find -type f | wc -l
174
# find kali-weekly/ | wc -l
89
# find kali-weekly/ | grep -- -W41- | wc -l
42
# find kali-weekly/ | grep -- -W42- | wc -l
42

To put it in words:

  • in the Kali Linux image repo, there are around 175 files
  • 50% are weekly images, and we have two weeks of weekly images
  • so 25% of the files are for the current weekly image, and 25% are for the last weekly image

Once a week, this repo is updated, a new weekly image is added, and the old weekly image is removed. Meaning: once a week, when the repo is updated, 25% of the files in the repo disappear.

For the health-check, it means that, during a one hour window, it has 1 chance out of 4 to pick a file that is not on the mirror anymore, and to mark the mirror down.
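The arithmetic behind that estimate can be spelled out with the counts from the listing above. This is a rough sketch under simplifying assumptions: every check picks uniformly at random among all 174 files, picks are independent, and the stale window lasts the full hour (the worst case).

```python
total_files = 174        # from: find -type f | wc -l
removed_files = 42       # from: grep -- -W41- (the outgoing weekly image)
checks_in_window = 60    # one check per minute, one-hour scan interval

# Chance that a single health-check picks a file that no longer exists.
p_stale = removed_files / total_files

# Chance that at least one check in the window hits a removed file.
p_at_least_one_failure = 1 - (1 - p_stale) ** checks_in_window

# Average number of failed checks over the window.
expected_failures = p_stale * checks_in_window

print(f"per-check failure chance: {p_stale:.3f}")
print(f"expected failed checks per window: {expected_failures:.1f}")
```

Under these assumptions a mirror is all but certain to be marked down at least once during the window, which matches the up/down flapping described here.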

So, once a week, during a one-hour window, the mirrors seem to be flaky and go up and down from mirrorbits' point of view. We can see it in this graph, which shows around 10 days of data and tracks the availability of an image in the repo. We can clearly see the two moments when the repo was updated with a new weekly image, causing mirrors to be marked up/down by mirrorbits.

[Image: installer-2-weeks]

Mitigation and possible improvements

The easy mitigation for a mirrorbits user is just to reduce the scan interval (e.g. to 30 minutes). It works for the Kali Linux images: there are only 175 files in this repo, so scanning is quick and reducing the scan interval is OK.

I think this issue could be mitigated in mirrorbits as well, here are a few ideas:

  • easy to implement: add a "404 counter" and mark a mirror down only after it returned 404 X times in a row. Might want to expose this setting in the config file.
  • harder, and maybe not better: limit the health-check to files in a certain location of the repo, expose this setting in the config file. Implementation-wise, it's awkward as mirrorbits just picks a random file (efficient). Picking a random file within a directory might be less efficient.
  • limit health-check to one particular file only. Might even make more sense than picking a random file, for some users?
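The first idea, the "404 counter", could look roughly like the Python sketch below. Everything in it is hypothetical (the `MirrorHealth` class, the `threshold` parameter standing in for X, and the choice to mark the mirror down immediately on any non-404 failure); it only illustrates the state machine, not how mirrorbits would implement it.

```python
class MirrorHealth:
    """Track up/down state, tolerating a run of 404s below a threshold."""

    def __init__(self, threshold=3):
        self.threshold = threshold      # hypothetical config setting (X)
        self.consecutive_404 = 0
        self.up = True

    def record(self, status_code):
        """Record one health-check result and return the new up/down state."""
        if status_code == 200:
            # Success: reset the counter and mark the mirror up.
            self.consecutive_404 = 0
            self.up = True
        elif status_code == 404:
            # Possibly a stale file: only give up after `threshold` in a row.
            self.consecutive_404 += 1
            if self.consecutive_404 >= self.threshold:
                self.up = False
        else:
            # Assumption: any other failure marks the mirror down at once
            # and breaks the run of consecutive 404s.
            self.consecutive_404 = 0
            self.up = False
        return self.up


health = MirrorHealth(threshold=3)
states = [health.record(code) for code in [404, 404, 200, 404, 404, 404, 200]]
```

A larger threshold hides more stale-file 404s, at the cost of reacting more slowly when a mirror really is missing files.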
@elboulangero changed the title from "Health-Check might check for non-existing files, could be improved" to "Health-Check might check for non-existing files, marking mirror down by mistake. Could be improved" on Nov 2, 2023
@jbkempf
Collaborator

jbkempf commented Nov 7, 2023

I think it is a good idea, yes.

@elboulangero
Contributor Author

Which one? The 404 counter?

@lazka
Contributor

lazka commented Mar 12, 2024

imo limiting it to a single file is good enough (similar to TraceFileLocation config wise, even the same file could be used)

Another heuristic would be to check the "newest" file each mirror has according to the last scan, assuming only old files get removed. But that would fail if files constantly get added and removed again.
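A minimal sketch of that "newest file" heuristic, in Python. It assumes the last scan leaves behind a modification time per file, which is not confirmed for mirrorbits here; the `newest_file` function, the paths and the timestamps are all made up for illustration.

```python
def newest_file(scanned):
    """Return the path with the latest modification time, or None if empty.

    scanned maps path -> modification time (epoch seconds) as recorded by
    the last scan (hypothetical data layout). Ties are broken by path so
    the result is deterministic.
    """
    if not scanned:
        return None
    return max(scanned, key=lambda path: (scanned[path], path))


# Hypothetical scan result for one mirror.
last_scan = {
    "old-W41.iso": 100,
    "new-W42.iso": 200,
    "readme.txt": 50,
}
target = newest_file(last_scan)
```

The health-check would then probe `target` instead of a random file, which only helps as long as removals hit the oldest files first.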
