Add xxhash64 support (#1248)
* Back port spark-specific murmur32 hash code from cudf.

* Run pre-commit to format files. We were behind a bit.

* Update pre-commit config to 16.0.1 to match cudf. Re-ran formatting.

* Change jni bindings to use the spark-rapids-jni implementation of murmur hash instead of the cudf version. Brought over cpp and java tests.

* Documentation fix.

* Fix cpp tests to actually call the spark_rapids_jni murmur hash.

* First pass at xxhash64. cpp tests passing.

* Improve cpp tests - null cases and more floating point edge cases.

* Add Java tests.

* Moved murmur32 hash implementation from cudf to spark-rapids-jni

Signed-off-by: db <[email protected]>

* PR review changes.

* Fix copyright date in Hash.java

* Enable 32 bit decimal hash test.

* Implement xxhash64 on the gpu

Signed-off-by: db <[email protected]>

* Add missing newlines.

* PR review changes.

* Remove default xxhash64 class constructor.  Remove unused parameter (row index) from remaining constructor.

* Fix issues with merge.

* Rectify thirdparty/cudf issues.

* Revert inadvertent change to pom.xml

* PR review feedback changes.

---------

Signed-off-by: db <[email protected]>
nvdbaranec authored Jul 12, 2023
1 parent df792c2 commit c77ab9d
Showing 7 changed files with 1,017 additions and 31 deletions.
1 change: 1 addition & 0 deletions src/main/cpp/CMakeLists.txt
@@ -161,6 +161,7 @@ add_library(
src/map_utils.cu
src/murmur_hash.cu
src/row_conversion.cu
src/xxhash64.cu
src/zorder.cu
)

17 changes: 17 additions & 0 deletions src/main/cpp/src/HashJni.cpp
@@ -36,4 +36,21 @@ JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_Hash_murmurHash32(
}
CATCH_STD(env, 0);
}

JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_Hash_xxhash64(JNIEnv* env,
                                                                       jclass,
                                                                       jlong seed,
                                                                       jlongArray column_handles)
{
  JNI_NULL_CHECK(env, column_handles, "array of column handles is null", 0);

  try {
    cudf::jni::auto_set_device(env);
    auto column_views =
      cudf::jni::native_jpointerArray<cudf::column_view>{env, column_handles}.get_dereferenced();
    return cudf::jni::release_as_jlong(
      spark_rapids_jni::xxhash64(cudf::table_view{column_views}, seed));
  }
  CATCH_STD(env, 0);
}
}
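
The binding above returns its result to Java as a `jlong` by way of `cudf::jni::release_as_jlong`. A minimal sketch of that general ownership-handoff pattern in plain C++ follows. It is an illustration only: `sketch::column`, `release_as_handle`, and `adopt_handle` are hypothetical names invented here, and the sketch assumes (without the source confirming it) that the handle is simply the raw pointer value.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

namespace sketch {

// Hypothetical stand-in for a native column object (not cudf::column).
struct column {
  int size;
};

// Give up ownership of the object and return its address as a 64-bit handle.
// After this call, whoever holds the handle is responsible for freeing it.
inline std::int64_t release_as_handle(std::unique_ptr<column> col)
{
  return reinterpret_cast<std::int64_t>(col.release());
}

// Take ownership back from a handle produced by release_as_handle.
// Each handle must be adopted at most once.
inline std::unique_ptr<column> adopt_handle(std::int64_t handle)
{
  assert(handle != 0);  // 0 is reserved as the "error / no object" value
  return std::unique_ptr<column>(reinterpret_cast<column*>(handle));
}

}  // namespace sketch
```

In this sketch 0 doubles as the failure value, mirroring the `0` passed to `JNI_NULL_CHECK` and `CATCH_STD` in the binding, so a caller can tell a valid handle from an error.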
18 changes: 18 additions & 0 deletions src/main/cpp/src/hash.cuh
@@ -25,6 +25,8 @@

namespace spark_rapids_jni {

constexpr int64_t DEFAULT_XXHASH64_SEED = 42;

/**
* @brief Converts a cudf decimal128 value to a java bigdecimal value.
*
@@ -91,4 +93,20 @@ std::unique_ptr<cudf::column> murmur_hash3_32(
  rmm::cuda_stream_view stream        = cudf::get_default_stream(),
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Computes the xxhash64 hash value of each row in the input set of columns.
*
* @param input The table of columns to hash
* @param seed Optional seed value to use for the hash function
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
*
* @returns A column where each row is the hash of the corresponding row of the input.
*/
std::unique_ptr<cudf::column> xxhash64(
  cudf::table_view const& input,
  int64_t seed                        = DEFAULT_XXHASH64_SEED,
  rmm::cuda_stream_view stream        = cudf::get_default_stream(),
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

} // namespace spark_rapids_jni
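
For readers unfamiliar with the hash itself, here is a hedged host-side sketch of the publicly documented XXH64 algorithm, restricted to inputs shorter than 32 bytes (enough for a single fixed-width value). It is not the GPU kernel added in `src/xxhash64.cu`, which this page does not show; the name `sketch::xxhash64_short` is hypothetical, and how the commit combines per-column hashes into a row hash is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// The five 64-bit primes defined by the XXH64 specification.
constexpr std::uint64_t P1 = 0x9E3779B185EBCA87ULL;
constexpr std::uint64_t P2 = 0xC2B2AE3D27D4EB4FULL;
constexpr std::uint64_t P3 = 0x165667B19E3779F9ULL;
constexpr std::uint64_t P4 = 0x85EBCA77C2B2AE63ULL;
constexpr std::uint64_t P5 = 0x27D4EB2F165667C5ULL;

// Rotate left; only called with 0 < r < 64.
inline std::uint64_t rotl(std::uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

// Little-endian reads that do not depend on host byte order or alignment.
inline std::uint64_t read64(unsigned char const* p)
{
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i) { v |= std::uint64_t{p[i]} << (8 * i); }
  return v;
}

inline std::uint32_t read32(unsigned char const* p)
{
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i) { v |= std::uint32_t{p[i]} << (8 * i); }
  return v;
}

// XXH64 for inputs of fewer than 32 bytes (the "short input" path only).
inline std::uint64_t xxhash64_short(void const* data, std::size_t len, std::uint64_t seed)
{
  assert(len < 32);
  assert(data != nullptr || len == 0);
  auto const* p = static_cast<unsigned char const*>(data);

  std::uint64_t h = seed + P5 + len;
  std::size_t i   = 0;

  // Consume 8-byte lanes.
  for (; i + 8 <= len; i += 8) {
    std::uint64_t const k = rotl(read64(p + i) * P2, 31) * P1;
    h ^= k;
    h = rotl(h, 27) * P1 + P4;
  }
  // Consume one 4-byte lane, if present.
  if (i + 4 <= len) {
    h ^= std::uint64_t{read32(p + i)} * P1;
    h = rotl(h, 23) * P2 + P3;
    i += 4;
  }
  // Consume the remaining bytes one at a time.
  for (; i < len; ++i) {
    h ^= std::uint64_t{p[i]} * P5;
    h = rotl(h, 11) * P1;
  }

  // Final avalanche.
  h ^= h >> 33;
  h *= P2;
  h ^= h >> 29;
  h *= P3;
  h ^= h >> 32;
  return h;
}

}  // namespace sketch
```

Hashing an 8-byte value with the default seed from the header would look like `sketch::xxhash64_short(&value, sizeof(value), 42)`. Because the seed is added in before any mixing, the same bytes hashed under different seeds give different results.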