You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've observed that the test case from #456 (which has about 15,000 variables, IIRC) shows pretty bad collision rates, even after switching from k&r hash to perl hash.
I suspect two things are in play:
We use 32 bits hash functions. Per the Birthday Paradox, that means a 50% chance of collision after ~25,000 elements, and growing rapidly1. A 64 bits hash function reaches that point only after 1.6 billion elements.
We use multiplicative hashes and those tend to have poor entropy in their low bits. Perl hash tries to mitigate that with a final shuffle but I have a hunch that a 64 bits big prime multiplicative hash performs better.
Counterpoint contra (2): we could get better distribution out of our 32 bits multiplicative hash functions with h >> (32-N), where N is a power of two, but only if we used power-ot-two hash tables. But we don't because those are more memory hungry.
1 ninja edit: which we then bucket - truncating the result, basically - compounding the effect
The text was updated successfully, but these errors were encountered:
I've observed that the test case from #456 (which has about 15,000 variables, IIRC) shows pretty bad collision rates, even after switching from k&r hash to perl hash.
I suspect two things are in play:
We use 32 bits hash functions. Per the Birthday Paradox, that means a 50% chance of collision after ~25,000 elements, and growing rapidly1. A 64 bits hash function reaches that point only after 1.6 billion elements.
We use multiplicative hashes and those tend to have poor entropy in their low bits. Perl hash tries to mitigate that with a final shuffle but I have a hunch that a 64 bits big prime multiplicative hash performs better.
Counterpoint contra (2): we could get better distribution out of our 32 bits multiplicative hash functions with
h >> (32-N)
, whereN
is a power of two, but only if we used power-ot-two hash tables. But we don't because those are more memory hungry.1 ninja edit: which we then bucket - truncating the result, basically - compounding the effect
The text was updated successfully, but these errors were encountered: