Copy the spark-specific implementation of murmur32 hash from cudf into spark-rapids-jni #1246

nvdbaranec · 2023-06-30T21:29:24Z

This duplicates the implementation, cpp tests and java tests for the spark-specific murmur32 hash done in cudf by @rwlee. The jni bindings now point to this implementation instead of cudf so in theory we could deprecate what's left in cudf.
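For readers unfamiliar with why Spark needs its own variant: the sketch below is a plain-Java illustration of the Spark-style 32-bit murmur hash as I understand it, not the CUDA code in hash.cuh. The class and method names are hypothetical. The one Spark-specific quirk it shows is that trailing bytes (those past the last full 4-byte word) are each mixed individually as signed values, rather than being packed into a single word as in standard MurmurHash3. Spark's default seed is 42.

```java
// Illustrative sketch only; names are hypothetical and this is not the hash.cuh code.
final class SparkMurmur3Sketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    // Scramble one 32-bit input word.
    static int mixK1(int k1) {
        k1 *= C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;
        return k1;
    }

    // Fold a scrambled word into the running hash state.
    static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    // Final avalanche; `length` is the input size in bytes.
    static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    static int hashInt(int value, int seed) {
        return fmix(mixH1(seed, mixK1(value)), 4);
    }

    static int hashLong(long value, int seed) {
        int low = (int) value;
        int high = (int) (value >>> 32);
        int h1 = mixH1(seed, mixK1(low));
        h1 = mixH1(h1, mixK1(high));
        return fmix(h1, 8);
    }

    static int hashBytes(byte[] data, int seed) {
        int h1 = seed;
        int aligned = data.length - (data.length % 4);
        // Full 4-byte words are read little-endian, as in standard murmur3.
        for (int i = 0; i < aligned; i += 4) {
            int word = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            h1 = mixH1(h1, mixK1(word));
        }
        // Spark-style tail: each leftover byte is sign-extended and mixed on its own.
        for (int i = aligned; i < data.length; i++) {
            h1 = mixH1(h1, mixK1(data[i]));
        }
        return fmix(h1, data.length);
    }

    public static void main(String[] args) {
        System.out.println(hashInt(1, 42));
        System.out.println(hashBytes(new byte[] {1, 2, 3, 4, 5}, 42));
    }
}
```

Because of the tail handling, inputs whose length is not a multiple of 4 generally hash differently here than under a standard MurmurHash3 implementation, which is the reason a generic murmur hash cannot be substituted.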

One thing I didn't do was trim out the "Spark" name prefixes scattered around the code. Since the code is properly in Spark-land now there's no real need to be so verbose. If people want, I can clean it up.

Includes a small refactoring of the java decimal128 conversion code (see to_java_bigdecimal in hash.cuh) which will also be used by xxhash64.
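To illustrate what this conversion has to achieve: as far as I understand it, Spark hashes wide decimals through the bytes of `BigDecimal.unscaledValue().toByteArray()`, which are big-endian and trimmed to the minimal two's-complement length. A GPU implementation therefore has to turn a fixed 16-byte little-endian value into that same byte sequence. The Java sketch below shows the idea only; the class and method names are hypothetical and it is not the to_java_bigdecimal code itself.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical.
final class JavaBigDecimalBytesSketch {
    // Convert a little-endian two's-complement value into the minimal
    // big-endian byte array that BigInteger.toByteArray() would produce.
    static byte[] toJavaBytes(byte[] littleEndian) {
        int n = littleEndian.length;
        byte[] be = new byte[n];
        for (int i = 0; i < n; i++) {
            be[i] = littleEndian[n - 1 - i];
        }
        // Drop leading bytes that only repeat the sign bit of the next byte.
        int start = 0;
        while (start < n - 1) {
            boolean redundantZero = be[start] == 0 && be[start + 1] >= 0;
            boolean redundantOnes = be[start] == -1 && be[start + 1] < 0;
            if (!redundantZero && !redundantOnes) {
                break;
            }
            start++;
        }
        return Arrays.copyOfRange(be, start, n);
    }

    // Helper for the demo: widen a BigInteger to 16 little-endian bytes.
    // Assumes the value fits in 128 bits.
    static byte[] toLittleEndian16(BigInteger value) {
        byte[] be = value.toByteArray();
        byte[] le = new byte[16];
        Arrays.fill(le, (byte) (value.signum() < 0 ? 0xff : 0x00));
        for (int i = 0; i < be.length; i++) {
            le[i] = be[be.length - 1 - i];
        }
        return le;
    }

    public static void main(String[] args) {
        BigInteger v = new BigInteger("-12345678901234567890123");
        byte[] fromFixedWidth = toJavaBytes(toLittleEndian16(v));
        System.out.println(Arrays.equals(fromFixedWidth, v.toByteArray()));
    }
}
```

Keeping this step in one shared helper means any hash that must match Spark's decimal behavior can reuse the same byte layout.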

~~Dependent on #1244~~

jlowe

Marking this as a draft since it depends upon #1244.

This needs a signoff commit per the contributor guidelines.

Signed-off-by: db <[email protected]>

rwlee

Looks good to me; the code is largely identical to the cudf code with minimal refactoring. My only concern is whether to_java_bigdecimal belongs with the decimal utils, but I think that's a nit.

nvdbaranec · 2023-07-07T18:47:45Z

build

abellina · 2023-07-07T18:50:40Z

build

nvdbaranec · 2023-07-07T21:36:17Z

merge

nvdbaranec added 6 commits June 30, 2023 11:15

Back port spark-specific murmur32 hash code from cudf.

b03c47d

Run pre-commit to format files. We were behind a bit.

03f18eb

Merge branch 'pre_commit_pass' into murmur_hash_move

c06f9c1

Update pre-commit config to 16.0.1 to match cudf. Re-ran formatting.

39cec08

Merge branch 'pre_commit_pass' into murmur_hash_move

59aed1b

Change jni bindings to use the spark-rapids-jni implementation of murmur hash instead of the cudf version. Brought over cpp and java tests.

963f475

nvdbaranec added the enhancement New feature or request label Jun 30, 2023

nvdbaranec requested review from revans2 and rwlee June 30, 2023 21:29

nvdbaranec changed the title ~~Copy the spark-specific implementation of murmur32 has from cudf into spark-rapids-jni~~ Copy the spark-specific implementation of murmur32 hash from cudf into spark-rapids-jni Jun 30, 2023

nvdbaranec added 2 commits June 30, 2023 16:30

Documentation fix.

029ce11

Fix cpp tests to actually call the spark_rapids_jni murmur hash.

acb834c

jlowe reviewed Jul 3, 2023

jlowe marked this pull request as draft July 3, 2023 15:05

jlowe mentioned this pull request Jul 3, 2023

Add xxhash64 support #1248

Merged

nvdbaranec added 3 commits July 5, 2023 10:22

Moved murmur32 hash implementation from cudf to spark-rapids-jni

a63e155

Signed-off-by: db <[email protected]>

Merge branch 'branch-23.08' into murmur_hash_move

3583eea

Add missing newlines.

14b9e29

nvdbaranec marked this pull request as ready for review July 5, 2023 20:17

nvdbaranec requested a review from jlowe July 5, 2023 22:21

rwlee previously approved these changes Jul 6, 2023

src/main/cpp/tests/hash.cpp (outdated, resolved)

src/test/java/com/nvidia/spark/rapids/jni/HashTest.java (outdated, resolved)

src/main/cpp/src/hash.cuh (resolved)

jlowe reviewed Jul 6, 2023

src/main/cpp/src/HashJni.cpp (outdated, resolved)

src/main/cpp/src/HashJni.cpp (resolved)

src/main/java/com/nvidia/spark/rapids/jni/Hash.java (outdated, resolved)

PR review changes.

ed4f54b

nvdbaranec dismissed rwlee’s stale review via ed4f54b July 6, 2023 16:07

nvdbaranec requested review from jlowe and rwlee July 6, 2023 16:17

jlowe approved these changes Jul 7, 2023

rwlee approved these changes Jul 7, 2023

nvdbaranec merged commit f3588b9 into NVIDIA:branch-23.08 Jul 7, 2023
1 check passed
