Generate hashes with random keys to avoid collisions #212

maurermi · 2023-03-20T16:10:48Z

Address TODO in src/util/common/hashmap.hpp which calls for randomly generated hash keys. Requires use of libsodium for crypto-safe random number generation.

NOTE: This causes certain tests to fail currently because of the added randomness. This will be fixed shortly.

maurermi · 2023-03-20T16:12:46Z

@HalosGhost some tests currently fail with this update because of the added randomness, but wanted to push it here first to get initial thoughts. My main question is whether libsodium is the most appropriate choice here, so if you have thoughts, I'd appreciate hearing them.

maurermi · 2023-03-20T16:15:06Z

src/util/common/hashmap.hpp

-            // TODO: pick the initialization key at random.
-            static constexpr std::array<uint64_t, 2> siphash_key{0x1337,
-                                                                 0x1337};
+            std::array<uint64_t, 2> siphash_key{randombytes_random(), randombytes_random()};


Also curious whether this seems appropriate. Since the random values can't be determined at compile time, this update required making this value non-constant. Dynamic RNG seems like a good choice to me, but curious if static would be more appropriate here.

My main question is whether libsodium is the most appropriate choice here, so if you have thoughts, I'd appreciate hearing them.

When in-doubt, libsodium (or monocypher, in my opinion) is a very good fallback.

However, we already have our own random_source which we use throughout the codebase. Since the purpose of choosing this key at-random is to avoid collision (rather than specifically a given security property with specific requirements), it probably makes more sense to just leverage our utility directly.

Also curious whether this seems appropriate. Since the random values can't be determined at compile time, this update required making this value non-constant. Dynamic RNG seems like a good choice to me, but curious if static would be more appropriate here.

I suppose my immediate wonder is if it would be possible to inject values into that process during the test harness. That way, we could keep our deterministic tests, but still leverage the randomness at runtime. However, we would have then created a divergence between what we're testing and what we're running—and it might be more prudent to modify the tests to handle the randomness (to remain inline).

Mull on that possibility for a little while, and I will attempt to take a look at the affected tests in reasonably short order to have a better handle on the lift.

Ah I completely neglected random_source -- jumped the gun on that one. One potential solution would be in testing, to seed random_source from a fixed file, rather than /dev/urandom as random_source.hpp suggests. However, this does still create divergence. Another, likely more difficult strategy would be when a random value is generated in a test, save it (either in a temporary variable or in a file) for evaluating whether the operation was performed correctly after the fact.

Also upon further inspection, some of the unit tests that are failing are not ones which I'd expect to fail on first glance, so I will need to delve into that more as well.

Potential food-for-thought: random_source takes a file source as its seed. We could, for the test suite, choose a stable, unchanging file to seed from (the most obvious option would be /dev/zero). This then leverages all the same code paths as would be used everywhere already, but because the seed is known, would allow the randomness to be predictable throughout the test suite.

This is only a very minimal divergence (and, at least in part, what seeding is for). Keep it in mind as a possibility as you explore the options.

Address TODO in src/util/common/hashmap.hpp which calls for randomly generated hash keys. Leverages random_source and /dev/urandom to generate crypto-safe randomness. NOTE: This causes certain tests to fail currently because of the added randomness. This will be fixed shortly. (still work in progress) Signed-off-by: Michael Maurer <[email protected]>

maurermi · 2023-04-03T14:42:46Z

Pushed an update to use random_source() based on /dev/urandom, but this is still very much a work in progress.

@HalosGhost here's some of my thoughts on what to do here -- it might be simplest to specify the generator file in a config file (not to add more dependence on config files), and we could create a predictable RNG for a more authentic testing environment -- then in an actual system outside of the tests we specify /dev/random as the generator

maurermi commented Mar 20, 2023

View reviewed changes

maurermi force-pushed the random_hash_keys branch from 526f193 to 7e3171d Compare March 20, 2023 16:27

maurermi force-pushed the random_hash_keys branch from 7e3171d to 8fab4e5 Compare April 3, 2023 14:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate hashes with random keys to avoid collisions #212

Generate hashes with random keys to avoid collisions #212

maurermi commented Mar 20, 2023

maurermi commented Mar 20, 2023

maurermi Mar 20, 2023

HalosGhost Mar 21, 2023 •

edited

Loading

maurermi Mar 21, 2023

HalosGhost Mar 22, 2023 •

edited

Loading

maurermi commented Apr 3, 2023

Generate hashes with random keys to avoid collisions #212

Are you sure you want to change the base?

Generate hashes with random keys to avoid collisions #212

Conversation

maurermi commented Mar 20, 2023

maurermi commented Mar 20, 2023

maurermi Mar 20, 2023

Choose a reason for hiding this comment

HalosGhost Mar 21, 2023 • edited Loading

Choose a reason for hiding this comment

maurermi Mar 21, 2023

Choose a reason for hiding this comment

HalosGhost Mar 22, 2023 • edited Loading

Choose a reason for hiding this comment

maurermi commented Apr 3, 2023

HalosGhost Mar 21, 2023 •

edited

Loading

HalosGhost Mar 22, 2023 •

edited

Loading