MCOL-5758 Bloom filter pre join GSOC 2025 #3682
Open
AestheticAkhmad wants to merge 22 commits into mariadb-corporation:stable-23.10 from AestheticAkhmad:MCOL-5758-BLOOM-FILTER-PRE-JOIN-GSOC-2025-REDESIGNED
Changes from 15 commits
Commits
f2e81f0 Implemented Bloom Filter's interface, and main methods (AestheticAkhmad)
9eca168 Merge branch 'stable-23.10' into MCOL-5758-BLOOM-FILTER-PRE-JOIN-GSOC… (mariadb-LeonidFedorov)
cf7c1ee Change vector to array (AestheticAkhmad)
7589dea Update vec to array due to fixed size, update datatypes of BF to 32bit (AestheticAkhmad)
b556725 Add Bloom filter to TupleHashJoinStep (AestheticAkhmad)
8ddfe83 Merge branch 'stable-23.10' into MCOL-5758-BLOOM-FILTER-PRE-JOIN-GSOC… (AestheticAkhmad)
8fd70a0 Add BF and populate in TupleJoiner (AestheticAkhmad)
5306a3b use consistent names for BFs in TupleJoiner (AestheticAkhmad)
5a10d7e Implemented Bloom filter's serialize/deserialize methods (AestheticAkhmad)
3383a46 Pass Bloom filter from TupleHashJoinStep to BatchPrimitiveProcessorJL (AestheticAkhmad)
3eecc51 Merge branch 'stable-23.10' into MCOL-5758-BLOOM-FILTER-PRE-JOIN-GSOC… (AestheticAkhmad)
2d54d55 Revert "Pass Bloom filter from TupleHashJoinStep to BatchPrimitivePro… (AestheticAkhmad)
8ca6269 Merge branch 'stable-23.10' into MCOL-5758-BLOOM-FILTER-PRE-JOIN-GSOC… (AestheticAkhmad)
389311c Pass BFs to TupleBPS for further steps (AestheticAkhmad)
6ea996f Pass BF to BPP through TBPS->BPPSeeder (AestheticAkhmad)
4895f26 Serialize BFs when in UM, type-alias structure of BFs (AestheticAkhmad)
071db76 Merge branch 'stable-23.10' into MCOL-5758-BLOOM-FILTER-PRE-JOIN-GSOC… (AestheticAkhmad)
0c01743 Add namespace joblist before BFs (AestheticAkhmad)
fd5d459 Serialize BFs size, so BPP is aware of it (AestheticAkhmad)
aff3a67 Merge branch 'stable-23.10' into MCOL-5758-BLOOM-FILTER-PRE-JOIN-GSOC… (AestheticAkhmad)
ab91611 Synchronize BF and BPP::execute, filter rows with BF (AestheticAkhmad)
ad805c1 Merge branch 'MCOL-5758-BLOOM-FILTER-PRE-JOIN-GSOC-2025-REDESIGNED' o… (AestheticAkhmad)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| #include "blockedbloomfilter.h" | ||
|
|
||
| namespace joblist | ||
| { | ||
|
|
||
| void BlockedBloomFilter::insert(uint32_t hash) | ||
| { | ||
| uint32_t blockIdx = hash % BLOOM_FILTER_BLOCK_COUNT; | ||
| uint64_t bitmask = 0; | ||
|
|
||
| for (const auto& salt : SALTS) | ||
| { | ||
| uint32_t mixed = mix32(hash ^ salt); | ||
| uint8_t bitIdx = mixed % 64; | ||
|
|
||
| bitmask |= (1ULL << bitIdx); | ||
| } | ||
|
|
||
| bloomFilter[blockIdx].fetch_or(bitmask, std::memory_order_relaxed); | ||
| } | ||
|
|
||
| bool BlockedBloomFilter::probe(uint32_t hash) const | ||
| { | ||
| uint32_t blockIdx = hash % BLOOM_FILTER_BLOCK_COUNT; | ||
| uint64_t block = bloomFilter[blockIdx].load(std::memory_order_relaxed); | ||
|
|
||
| for (const auto& salt : SALTS) | ||
| { | ||
| uint32_t mixed = mix32(hash ^ salt); | ||
| uint8_t bitIdx = mixed % 64; | ||
|
|
||
| if ((block & (1ULL << bitIdx)) == 0) | ||
| { | ||
| return false; | ||
| } | ||
|
|
||
| } | ||
|
|
||
| return true; | ||
| } | ||
|
|
||
| // MurmurHash3 32-bit finalizer (fmix32) | ||
| inline uint32_t BlockedBloomFilter::mix32(uint32_t hash) const | ||
| { | ||
| hash ^= hash >> 16; | ||
| hash *= 0x85ebca6b; | ||
| hash ^= hash >> 13; | ||
| hash *= 0xc2b2ae35; | ||
| hash ^= hash >> 16; | ||
|
|
||
| return hash; | ||
| } | ||
|
|
||
| void BlockedBloomFilter::serialize(messageqcpp::ByteStream& bs) const | ||
| { | ||
| for (const auto& block : bloomFilter) | ||
| { | ||
| bs << block.load(std::memory_order_relaxed); | ||
| } | ||
| } | ||
|
|
||
| void BlockedBloomFilter::deserialize(messageqcpp::ByteStream& bs) | ||
| { | ||
| for (auto& block : bloomFilter) | ||
| { | ||
| uint64_t val; | ||
| bs >> val; | ||
| block.store(val, std::memory_order_relaxed); | ||
| } | ||
| } | ||
|
|
||
| size_t BlockedBloomFilter::getSize() const | ||
| { | ||
| return bloomFilter.size(); | ||
| } | ||
|
|
||
| } // namespace joblist | ||
|
|
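
The `insert` and `probe` methods above share one pattern: the hash selects a 64-bit block, then eight salted mixes select bits inside that block. A minimal, scaled-down sketch of that pattern follows. `TinyBlockedBloomFilter`, `kBlockCount`, `blockMask`, and `checkNoFalseNegatives` are hypothetical names chosen for illustration and are not part of the PR; the block count is an arbitrary small value, and the probe compares against a precomputed mask instead of returning early, which should be equivalent but is not the PR's code.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a small block count so the demo filter fits on the stack.
constexpr std::size_t kBlockCount = 1024;

// Same salts as the PR's SALTS table.
constexpr uint32_t kSalts[8] = {0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
                                0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31};

// Same mixing steps as the PR's mix32.
inline uint32_t mix32(uint32_t h)
{
  h ^= h >> 16;
  h *= 0x85ebca6b;
  h ^= h >> 13;
  h *= 0xc2b2ae35;
  h ^= h >> 16;
  return h;
}

// Build the 64-bit mask with one bit per salt (bits may coincide).
inline uint64_t blockMask(uint32_t hash)
{
  uint64_t mask = 0;
  for (uint32_t salt : kSalts)
    mask |= 1ULL << (mix32(hash ^ salt) % 64);
  return mask;
}

// Hypothetical scaled-down stand-in for the PR's BlockedBloomFilter.
class TinyBlockedBloomFilter
{
 public:
  void insert(uint32_t hash)
  {
    blocks_[hash % kBlockCount].fetch_or(blockMask(hash), std::memory_order_relaxed);
  }

  bool probe(uint32_t hash) const
  {
    const uint64_t mask = blockMask(hash);
    const uint64_t block = blocks_[hash % kBlockCount].load(std::memory_order_relaxed);
    return (block & mask) == mask;  // true only if every selected bit is set
  }

 private:
  std::array<std::atomic<uint64_t>, kBlockCount> blocks_{};
};

// Every inserted hash must be reported as present.
inline void checkNoFalseNegatives()
{
  TinyBlockedBloomFilter filter;
  for (uint32_t i = 0; i < 1000; ++i)
    filter.insert(i * 2654435761u);
  for (uint32_t i = 0; i < 1000; ++i)
    assert(filter.probe(i * 2654435761u));
}
```

Because bits are only ever set, a probe can report a key that was never inserted but cannot miss one that was, which is the property a pre-join filter relies on.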
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| #pragma once | ||
|
|
||
| #include <array> | ||
| #include <atomic> | ||
| #include <cmath> | ||
| #include <cstddef> | ||
| #include <cstdint> | ||
|
|
||
| // Debug | ||
| #include <string> | ||
| #include <bitset> | ||
|
|
||
| #include "bytestream.h" | ||
|
|
||
| namespace joblist | ||
| { | ||
|
|
||
| class BlockedBloomFilter | ||
| { | ||
| public: | ||
| BlockedBloomFilter() = default; | ||
|
|
||
| void insert(uint32_t hash); | ||
| bool probe(uint32_t hash) const; | ||
|
|
||
| void serialize(messageqcpp::ByteStream& bs) const; | ||
| void deserialize(messageqcpp::ByteStream& bs); | ||
|
|
||
| size_t getSize() const; | ||
|
|
||
| private: | ||
| // Member variables | ||
| static constexpr uint8_t HASH_FUNC_COUNT = 8; | ||
| static constexpr uint32_t SALTS[HASH_FUNC_COUNT] = | ||
| { | ||
| 0x47b6137b, | ||
| 0x44974d91, | ||
| 0x8824ad5b, | ||
| 0xa2b7289d, | ||
| 0x705495c7, | ||
| 0x2df1424b, | ||
| 0x9efc4947, | ||
| 0x5c6bfb31 | ||
| }; | ||
|
|
||
| // Bloom filter parameters, computed at compile time | ||
| static constexpr uint8_t BLOCK_SIZE = 64; // bits per block (one uint64_t word) | ||
| static constexpr uint32_t EXTENT_SIZE = 8'000'000UL; // n: number of keys the filter is sized for | ||
| static constexpr uint32_t DOUBLE_EXTENT_SIZE = 2 * EXTENT_SIZE; | ||
| static constexpr double FALSE_POSITIVE_RATE = 0.01; | ||
| static constexpr double lnFP = 4.605170186; // lnFP <- |ln(FPR)| | ||
| static constexpr double ln2sqr = 0.4804530139; // pow(ln(2), 2) | ||
| // m = n * |ln(FPR)| / ln(2)^2, truncated to a whole number of bits | ||
| static constexpr uint32_t NUMBER_OF_BITS = static_cast<uint32_t>((EXTENT_SIZE * lnFP) / ln2sqr); | ||
| // Round up to whole blocks | ||
| static constexpr uint32_t BLOOM_FILTER_BLOCK_COUNT = (NUMBER_OF_BITS + BLOCK_SIZE - 1) / BLOCK_SIZE; | ||
|
|
||
| std::array<std::atomic<uint64_t>, BLOOM_FILTER_BLOCK_COUNT> bloomFilter = {}; | ||
|
|
||
| // Private member functions | ||
| inline uint32_t mix32(uint32_t hash) const; | ||
|
|
||
| }; | ||
|
|
||
|
|
||
|
|
||
|
|
||
| } // namespace joblist | ||
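
The compile-time parameters in this header follow the standard Bloom filter sizing formula, m = n * |ln p| / (ln 2)^2, rounded up to whole 64-bit blocks. As a sanity check on that arithmetic, the sketch below recomputes the derived values from the same constants. The `k`-prefixed names are hypothetical and only mirror the PR's constants; the bounds in the `static_assert`s are approximate ranges, not values stated in the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Values copied from the header; the k-prefixed names are illustrative only.
constexpr uint32_t kExtentSize = 8'000'000;  // n: keys the filter is sized for
constexpr double kLnFp = 4.605170186;        // |ln(0.01)|
constexpr double kLn2Sqr = 0.4804530139;     // (ln 2)^2
constexpr uint32_t kBlockBits = 64;          // bits per block

// m = n * |ln p| / (ln 2)^2, truncated to a whole number of bits.
constexpr uint32_t kNumberOfBits =
    static_cast<uint32_t>(kExtentSize * kLnFp / kLn2Sqr);

// Round up to whole 64-bit blocks.
constexpr uint32_t kFilterBlockCount = (kNumberOfBits + kBlockBits - 1) / kBlockBits;

// Each block is stored as one uint64_t word.
constexpr std::size_t kFilterBytes = std::size_t{kFilterBlockCount} * sizeof(uint64_t);

static_assert(kNumberOfBits > 76'000'000 && kNumberOfBits < 77'000'000,
              "about 9.6 bits per key at a 1% target rate");
static_assert(kFilterBlockCount > 1'190'000 && kFilterBlockCount < 1'200'000,
              "about 1.2 million blocks");
static_assert(kFilterBytes < 10'000'000, "under 10 MB per filter");
```

With these constants a single filter occupies roughly 9.6 MB, so an object holding the `std::array` directly is probably safer on the heap than on a thread stack; this is an observation from the arithmetic, not something the PR states.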