Add xxhash64 support #1248

Merged on Jul 12, 2023 (27 commits)

Changes from 26 commits

Commits
b03c47d
Back port spark-specific murmur32 hash code from cudf.
nvdbaranec Jun 30, 2023
03f18eb
Run pre-commit to format files. We were behind a bit.
nvdbaranec Jun 30, 2023
c06f9c1
Merge branch 'pre_commit_pass' into murmur_hash_move
nvdbaranec Jun 30, 2023
39cec08
Update pre-commit config to 16.0.1 to match cudf. Re-ran formatting.
nvdbaranec Jun 30, 2023
59aed1b
Merge branch 'pre_commit_pass' into murmur_hash_move
nvdbaranec Jun 30, 2023
963f475
Change jni bindings to use the spark-rapids-jni implementation of mur…
nvdbaranec Jun 30, 2023
029ce11
Documentation fix.
nvdbaranec Jun 30, 2023
acb834c
Fix cpp tests to actually call the spark_rapids_jni murmur hash.
nvdbaranec Jun 30, 2023
8db7b76
First pass at xxhash64. cpp tests passing.
nvdbaranec Jul 1, 2023
cb11e73
Improve cpp tests - null cases and more floating point edge cases.
nvdbaranec Jul 1, 2023
0b85a03
Add Java tests.
nvdbaranec Jul 1, 2023
a63e155
Moved murmur32 hash implementation from cudf to spark-rapids-jni
nvdbaranec Jul 5, 2023
b59cab4
PR review changes.
nvdbaranec Jul 5, 2023
622f89b
Fix copyright data in Hash.java
nvdbaranec Jul 5, 2023
8af2107
Enable 32 bit decimal hash test.
nvdbaranec Jul 5, 2023
55eafd0
Implement xxhash64 on the gpu
nvdbaranec Jul 5, 2023
3583eea
Merge branch 'branch-23.08' into murmur_hash_move
nvdbaranec Jul 5, 2023
14b9e29
Add missing newlines.
nvdbaranec Jul 5, 2023
7f3ed1e
Merge branch 'murmur_hash_move' into xxhash64_support
nvdbaranec Jul 6, 2023
ed4f54b
PR review changes.
nvdbaranec Jul 6, 2023
d7a7e16
Merge branch 'murmur_hash_move' into xxhash64_support
nvdbaranec Jul 6, 2023
6d67c00
Remove default xxhash64 class constructor. Remove unused parameter (…
nvdbaranec Jul 7, 2023
590886f
Merge branch 'branch-23.08' into xxhash64_support
nvdbaranec Jul 7, 2023
a344623
Fix issues with merge.
nvdbaranec Jul 7, 2023
198c000
Rectify thirdparty/cudf issues.
nvdbaranec Jul 10, 2023
e35d96e
Revert inadvertent change to pom.xml
nvdbaranec Jul 10, 2023
0b84b7d
PR review feedback changes.
nvdbaranec Jul 11, 2023
1 change: 1 addition & 0 deletions src/main/cpp/CMakeLists.txt
@@ -161,6 +161,7 @@ add_library(
  src/map_utils.cu
  src/murmur_hash.cu
  src/row_conversion.cu
  src/xxhash64.cu
  src/zorder.cu
)

17 changes: 17 additions & 0 deletions src/main/cpp/src/HashJni.cpp
@@ -36,4 +36,21 @@ JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_Hash_murmurHash32(
  }
  CATCH_STD(env, 0);
}

JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_Hash_xxhash64(JNIEnv* env,
                                                                       jclass,
                                                                       jlong seed,
                                                                       jlongArray column_handles)
{
  JNI_NULL_CHECK(env, column_handles, "array of column handles is null", 0);

  try {
    cudf::jni::auto_set_device(env);
    auto column_views =
      cudf::jni::native_jpointerArray<cudf::column_view>{env, column_handles}.get_dereferenced();
    return cudf::jni::release_as_jlong(
      spark_rapids_jni::xxhash64(cudf::table_view{column_views}, seed));
  }
  CATCH_STD(env, 0);
}
}
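
The binding above receives its columns as an array of Java `long` handles and turns them into `cudf::column_view` values before building the `cudf::table_view`. The snippet below is a minimal, self-contained sketch of that handle-to-object pattern only. It assumes the handles are native addresses stored in 64-bit integers, and every name in it (`column_view_stub`, `handle_t`, `to_handle`, `dereference_handles`) is hypothetical and stands in for the real cudf JNI helpers, whose implementation is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cudf::column_view; NOT the real type.
struct column_view_stub {
  int num_rows;
};

// Stand-in for a JNI jlong: a signed 64-bit integer assumed to carry a native address.
using handle_t = std::int64_t;

// Pack a native pointer into a handle, as the Java side would have received it.
handle_t to_handle(column_view_stub const* ptr)
{
  return static_cast<handle_t>(reinterpret_cast<std::intptr_t>(ptr));
}

// Turn each handle back into a pointer and copy the pointed-to view.
// This mirrors, under the assumptions above, what a "get dereferenced" helper would do.
std::vector<column_view_stub> dereference_handles(std::vector<handle_t> const& handles)
{
  std::vector<column_view_stub> views;
  views.reserve(handles.size());
  for (handle_t h : handles) {
    auto const* ptr = reinterpret_cast<column_view_stub const*>(static_cast<std::intptr_t>(h));
    assert(ptr != nullptr);  // a null handle would be a caller error
    views.push_back(*ptr);
  }
  return views;
}
```

Copying the views is cheap in this sketch because a view is assumed to be a small non-owning descriptor; the column data itself stays where it is.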
18 changes: 18 additions & 0 deletions src/main/cpp/src/hash.cuh
@@ -25,6 +25,8 @@

namespace spark_rapids_jni {

constexpr int64_t DEFAULT_XXHASH64_SEED = 42;

/**
 * @brief Converts a cudf decimal128 value to a java bigdecimal value.
 *
@@ -91,4 +93,20 @@ std::unique_ptr<cudf::column> murmur_hash3_32(
  rmm::cuda_stream_view stream        = cudf::get_default_stream(),
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
 * @brief Computes the xxhash64 hash value of each row in the input set of columns.
 *
 * @param input The table of columns to hash
 * @param seed Optional seed value to use for the hash function
 * @param stream CUDA stream used for device memory operations and kernel launches
 * @param mr Device memory resource used to allocate the returned column's device memory
 *
 * @returns A column where each row is the hash of the corresponding row of the input table.
 */
std::unique_ptr<cudf::column> xxhash64(
  cudf::table_view const& input,
  int64_t seed                        = DEFAULT_XXHASH64_SEED,
  rmm::cuda_stream_view stream        = cudf::get_default_stream(),
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

} // namespace spark_rapids_jni
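
For readers unfamiliar with what a row-wise xxhash64 with a seed computes, here is a host-side sketch for rows of 64-bit integers. It follows the public XXH64 algorithm for a single 8-byte input. It also assumes Spark-style row semantics: the running hash of one column becomes the seed for the next, and null values leave the running hash unchanged. This is not the code from `xxhash64.cu`, which is not shown in this diff, and the names `hash_long`, `hash_element`, `hash_row`, and `nullable_long` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// XXH64 prime constants from the public xxHash algorithm.
constexpr std::uint64_t PRIME64_1 = 0x9E3779B185EBCA87ULL;
constexpr std::uint64_t PRIME64_2 = 0xC2B2AE3D27D4EB4FULL;
constexpr std::uint64_t PRIME64_3 = 0x165667B19E3779F9ULL;
constexpr std::uint64_t PRIME64_4 = 0x85EBCA77C2B2AE63ULL;
constexpr std::uint64_t PRIME64_5 = 0x27D4EB2F165667C5ULL;

constexpr std::uint64_t rotl64(std::uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

// Final avalanche step of XXH64.
constexpr std::uint64_t finalize(std::uint64_t h)
{
  h ^= h >> 33;
  h *= PRIME64_2;
  h ^= h >> 29;
  h *= PRIME64_3;
  h ^= h >> 32;
  return h;
}

// XXH64 of one 8-byte value: the short-input path with length 8.
constexpr std::uint64_t hash_long(std::int64_t value, std::uint64_t seed)
{
  std::uint64_t h = seed + PRIME64_5 + 8ULL;
  h ^= rotl64(static_cast<std::uint64_t>(value) * PRIME64_2, 31) * PRIME64_1;
  h = rotl64(h, 27) * PRIME64_1 + PRIME64_4;
  return finalize(h);
}

// Hypothetical nullable element; `valid == false` models a null.
struct nullable_long {
  bool valid;
  std::int64_t value;
};

// Assumed Spark-style behaviour: nulls pass the running hash through untouched.
constexpr std::uint64_t hash_element(nullable_long e, std::uint64_t running)
{
  return e.valid ? hash_long(e.value, running) : running;
}

// Fold the columns of one row, feeding each result in as the next seed.
template <std::size_t N>
constexpr std::uint64_t hash_row(nullable_long const (&row)[N], std::uint64_t seed)
{
  std::uint64_t h = seed;
  for (std::size_t i = 0; i < N; ++i) { h = hash_element(row[i], h); }
  return h;
}
```

Calling `hash_row(row, 42)` uses the same default seed value as `DEFAULT_XXHASH64_SEED` in the header. The sketch works on unsigned 64-bit values, so a caller comparing against Java results would reinterpret the output as a signed `long`.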