
Add bloom filter support. #1269

Closed
wants to merge 44 commits into from


@nvdbaranec (Collaborator) commented Jul 14, 2023

Adds support for Spark-style bloom filters via the BloomFilter class. The GPU implementation lives in spark-rapids-jni itself rather than in cudf.
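For readers unfamiliar with the Spark scheme, here is a minimal CPU-only sketch of how a Spark-style bloom filter inserts and probes 64-bit keys. It assumes the double-hashing scheme used by Apache Spark's `BloomFilterImpl` (Murmur3 x86_32, with `h1` seeded by 0, `h2` seeded by `h1`, and bit indices derived from `h1 + i * h2`). The class and method names are hypothetical, and this is not the GPU code in this PR.

```java
// Hypothetical CPU-only sketch of a Spark-style bloom filter for 64-bit keys.
// Assumption: mirrors the hashing scheme of Apache Spark's BloomFilterImpl
// (Murmur3 x86_32 plus double hashing); it is NOT the spark-rapids-jni GPU code.
public final class SparkStyleBloomFilterSketch {
    private final long[] words;   // bit storage, 64 bits per word
    private final long numBits;
    private final int numHashes;

    public SparkStyleBloomFilterSketch(int numHashes, long numBits) {
        if (numHashes <= 0 || numBits <= 0) {
            throw new IllegalArgumentException("numHashes and numBits must be positive");
        }
        this.numHashes = numHashes;
        this.numBits = numBits;
        this.words = new long[(int) ((numBits + 63) / 64)];
    }

    // Murmur3 x86_32 over the 8 bytes of a long, low 32 bits first.
    static int murmur3HashLong(long input, int seed) {
        int h1 = mixH1(seed, mixK1((int) input));
        h1 = mixH1(h1, mixK1((int) (input >>> 32)));
        return fmix(h1, 8);
    }

    private static int mixK1(int k1) {
        k1 *= 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        return k1 * 0x1b873593;
    }

    private static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    private static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Bit index for the i-th hash (1-based) via h1 + i * h2 double hashing.
    private long bitIndex(int h1, int h2, int i) {
        int combined = h1 + i * h2;
        if (combined < 0) {
            combined = ~combined;  // flip negative values to non-negative
        }
        return combined % numBits;
    }

    public void put(long item) {
        int h1 = murmur3HashLong(item, 0);
        int h2 = murmur3HashLong(item, h1);
        for (int i = 1; i <= numHashes; i++) {
            long bit = bitIndex(h1, h2, i);
            words[(int) (bit >>> 6)] |= 1L << (bit & 63);
        }
    }

    // May return a false positive, never a false negative.
    public boolean mightContain(long item) {
        int h1 = murmur3HashLong(item, 0);
        int h2 = murmur3HashLong(item, h1);
        for (int i = 1; i <= numHashes; i++) {
            long bit = bitIndex(h1, h2, i);
            if ((words[(int) (bit >>> 6)] & (1L << (bit & 63))) == 0) {
                return false;
            }
        }
        return true;
    }
}
```

Every key that was `put` probes as present; keys that were never inserted usually probe as absent, with a false-positive rate that depends on the bit size and number of hash functions.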

Added a benchmark for bloom_filter_put. On an A5000 we're getting about 140 GB/s of write throughput for bloom filter sizes of 512 KB, 1 MB, 2 MB, 4 MB, and 8 MB; inserting 150 million rows takes 12.5 milliseconds. It's not lightning fast, but it's serviceable.

Also fixed several benchmark build errors: the cudf changes that require null counts to always be provided and that add explicit stream/mr (memory resource) parameters broke a few of them.

…mur hash instead of the cudf version. Brought over C++ and Java tests.
…nents instead of an instance. Change BloomFilterInterfaces to take a BaseDeviceMemoryBuffer instead of a DeviceMemoryBuffer. Handle some exception cases. Reordered some function parameter lists for consistency/cleanliness.
revans2 previously approved these changes Jul 21, 2023

@revans2 (Collaborator) left a comment:

After chatting with Dave offline I now think this is good to go.

ttnghia previously approved these changes Jul 21, 2023
…oomFilter class to be more restrictive about bloom filter bit sizes: they must always be a multiple of 64 bits.
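A minimal sketch of the stricter size rule, assuming the bits are stored in a `long[]` (as Apache Spark's CPU `BitArray` does): a bit size that is a positive multiple of 64 maps onto a whole number of 64-bit words with no partial trailing word. The class and method names are hypothetical; this is not the check added in the PR.

```java
// Hypothetical helper illustrating the "multiple of 64 bits" rule.
// Returns the number of 64-bit words needed to back a filter of numBits bits.
public final class BloomFilterBitSizeCheck {
    private BloomFilterBitSizeCheck() {}

    public static int numLongsForBits(long numBits) {
        // Reject non-positive sizes and any size that would leave a partial word.
        if (numBits <= 0 || numBits % 64 != 0) {
            throw new IllegalArgumentException(
                "bloom filter bit size must be a positive multiple of 64, got " + numBits);
        }
        long numLongs = numBits / 64;
        // A Java array index is an int, so cap the word count accordingly.
        if (numLongs > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("bloom filter too large: " + numBits + " bits");
        }
        return (int) numLongs;
    }
}
```

For example, an 8 MB filter is 67,108,864 bits, which is exactly 1,048,576 longs.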
@jlowe (Member) commented Jul 27, 2023

build

… Handles nulls in the C++ code: build ignores null input values, and probe returns null for any null input value.
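The null rule can be sketched as follows, reading it as: build skips null rows, and probe maps each null input row to a null output row. Everything here is hypothetical: nullable `Long[]` arrays stand in for a column plus its validity mask, and hashing is a single placeholder function (`Long.hashCode` modulo the bit count) rather than the Spark/Murmur3 scheme, so the example only illustrates the null semantics.

```java
import java.util.BitSet;

// Hypothetical sketch of null handling for a bloom filter build/probe.
// Hashing is a one-function placeholder, NOT the Spark or GPU scheme.
public final class NullAwareBloomSketch {
    private final BitSet bits;
    private final int numBits;

    public NullAwareBloomSketch(int numBits) {
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    private int index(long v) {
        return Math.floorMod(Long.hashCode(v), numBits);
    }

    // Build: null entries are skipped, so they never set any bits.
    public void build(Long[] column) {
        for (Long v : column) {
            if (v != null) {
                bits.set(index(v));
            }
        }
    }

    // Probe: one Boolean per input row; a null input row yields a null result.
    public Boolean[] probe(Long[] column) {
        Boolean[] out = new Boolean[column.length];
        for (int i = 0; i < column.length; i++) {
            Long v = column[i];
            if (v == null) {
                out[i] = null;
            } else {
                out[i] = bits.get(index(v));
            }
        }
        return out;
    }

    public int bitsSet() {
        return bits.cardinality();
    }
}
```

Returning null rather than false for a null probe is in keeping with SQL semantics, where a predicate evaluated on a null value is unknown rather than false.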
@nvdbaranec nvdbaranec requested a review from jlowe July 28, 2023 15:57
@nvdbaranec nvdbaranec closed this Aug 2, 2023
@nvdbaranec (Collaborator, Author) commented:

Made obsolete by: #1303

5 participants