Review hash functions #859

Open
bnoordhuis opened this issue Jan 28, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

@bnoordhuis
Contributor

bnoordhuis commented Jan 28, 2025

I've observed that the test case from #456 (which has about 15,000 variables, IIRC) shows pretty bad collision rates, even after switching from the K&R hash to the Perl hash.

I suspect two things are in play:

  1. We use 32-bit hash functions. Per the birthday paradox, that means a 50% chance of at least one collision after ~77,000 elements (roughly 1.18 × 2^16), and growing rapidly[1]. A 64-bit hash function reaches that point only after ~5 billion elements (roughly 1.18 × 2^32).

  2. We use multiplicative hashes and those tend to have poor entropy in their low bits. Perl hash tries to mitigate that with a final shuffle, but I have a hunch that a 64-bit multiplicative hash with a large prime multiplier performs better.
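
For reference, the birthday-bound numbers behind (1) can be reproduced with a short sketch. It assumes an idealized, uniformly distributed hash, which real hash functions only approximate, and the helper names are made up for illustration; none of them come from the code base. Under that assumption the 50% point lands near 77,000 keys for a 32-bit hash and near 5 billion keys for a 64-bit hash.

```c
#include <stdint.h>

#define LN2 0.6931471805599453

/* 2^bits as a double, computed without libm. */
static double pow2(int bits)
{
    double r = 1.0;
    while (bits-- > 0)
        r *= 2.0;
    return r;
}

/* Expected number of colliding pairs among n keys, assuming a uniform
 * hash with 2^bits possible values: n(n-1)/2 pairs, each colliding
 * with probability 2^-bits. Requires n >= 1. */
static double expected_colliding_pairs(uint64_t n, int bits)
{
    return (double)n * (double)(n - 1) / (2.0 * pow2(bits));
}

/* Smallest n where P(at least one collision) ~= 1 - exp(-pairs)
 * reaches 50%, i.e. where the expected pair count reaches ln 2.
 * Found by binary search; the upper bound 2^40 is enough for bits <= 64. */
static uint64_t keys_for_even_odds(int bits)
{
    uint64_t lo = 1, hi = (uint64_t)1 << 40;
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (expected_colliding_pairs(mid, bits) >= LN2)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}
```

For ~15,000 keys, `expected_colliding_pairs(15000, 32)` comes out at about 0.026, so under the uniformity assumption collisions in the full 32-bit value should be rare at that size.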

Counterpoint to (2): we could get better distribution out of our 32-bit multiplicative hash functions with h >> (32-N), where 2^N is the table size, but only if we used power-of-two hash tables. But we don't because those are more memory hungry.
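
A minimal sketch of why the high bits matter, assuming a plain multiply-by-constant hash over integer keys (not the actual K&R or Perl hash in the code base; all names and the table size are hypothetical). The low N bits of `key * C` depend only on the low N bits of `key`, so keys that share their low bits all land in the same bucket when the index is taken from the bottom of the hash, while `h >> (32-N)` mixes in every input bit.

```c
#include <stdint.h>

#define TABLE_BITS 8  /* hypothetical table with 2^8 = 256 buckets */

/* Multiplicative hash; 0x9E3779B9 is 2^32 divided by the golden ratio,
 * a commonly used odd multiplier. Chosen here only for illustration. */
static uint32_t mul_hash32(uint32_t key)
{
    return key * UINT32_C(0x9E3779B9);
}

/* Bucket index from the low n bits of the hash. */
static uint32_t index_low_bits(uint32_t h, unsigned n)
{
    return h & ((UINT32_C(1) << n) - 1);
}

/* Bucket index from the high n bits: h >> (32 - n). Requires 1 <= n <= 31. */
static uint32_t index_high_bits(uint32_t h, unsigned n)
{
    return h >> (32 - n);
}

/* Count how many distinct buckets are hit by the keys 0, stride, 2*stride, ... */
static unsigned buckets_used(int use_high_bits, uint32_t stride, unsigned count)
{
    unsigned char used[1u << TABLE_BITS] = {0};
    unsigned distinct = 0;
    for (unsigned i = 0; i < count; i++) {
        uint32_t h = mul_hash32(i * stride);
        uint32_t b = use_high_bits ? index_high_bits(h, TABLE_BITS)
                                   : index_low_bits(h, TABLE_BITS);
        if (!used[b]) {
            used[b] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With keys that are multiples of 256, `buckets_used(0, 256, 1000)` reports a single bucket, whereas `buckets_used(1, 256, 1000)` spreads the same keys over most of the table.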


[1] ninja edit: which we then bucket - truncating the result, basically - compounding the effect
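
To put a rough number on the bucketing effect, here is a sketch that estimates how many insertions land in an already occupied bucket. It assumes keys are spread uniformly over the buckets, and both the function name and the table size used below are hypothetical, since the actual table sizes are not stated here.

```c
/* Expected number of insertions that hit an already occupied bucket when
 * n keys are spread uniformly over m buckets:
 *   n - m * (1 - (1 - 1/m)^n)
 * The power is computed with a loop to avoid depending on libm. */
static double expected_bucket_collisions(unsigned n, unsigned m)
{
    double empty = 1.0;                      /* P(a given bucket is still empty) */
    double keep = 1.0 - 1.0 / (double)m;     /* P(one key misses that bucket) */
    for (unsigned i = 0; i < n; i++)
        empty *= keep;
    return (double)n - (double)m * (1.0 - empty);
}
```

For 15,000 keys in a hypothetical table of 20,000 buckets, `expected_bucket_collisions(15000, 20000)` is roughly 4,450, which is far more than the full 32-bit hash values alone would be expected to produce at that size.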

bnoordhuis added the enhancement label Jan 28, 2025