Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy the spark-specific implementation of murmur32 hash from cudf into spark-rapids-jni #1246

Merged
merged 12 commits into from
Jul 7, 2023

Conversation

nvdbaranec
Copy link
Collaborator

@nvdbaranec nvdbaranec commented Jun 30, 2023

This duplicates the implementation, cpp tests and java tests for the spark-specific murmur32 hash done in cudf by @rwlee. The jni bindings now point to this implementation instead of cudf so in theory we could deprecate what's left in cudf.

One thing I didn't do was trim out the "Spark" name prefixes scattered around the code. Since the code is properly in Spark-land now there's no real need to be so verbose. If people want, I can clean it up.

Includes a small refactoring of the java decimal128 conversion code (see to_java_bigdecimal in hash.cuh) which will also be used by xxhash64.

Dependent on #1244

@nvdbaranec nvdbaranec added the enhancement New feature or request label Jun 30, 2023
@nvdbaranec nvdbaranec requested review from revans2 and rwlee June 30, 2023 21:29
@nvdbaranec nvdbaranec changed the title Copy the spark-specific implementation of murmur32 has from cudf into spark-rapids-jni Copy the spark-specific implementation of murmur32 hash from cudf into spark-rapids-jni Jun 30, 2023
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking this as a draft since it depends upon #1244.

This needs a signoff commit per the contributor guidelines.

@jlowe jlowe marked this pull request as draft July 3, 2023 15:05
@jlowe jlowe mentioned this pull request Jul 3, 2023
@nvdbaranec nvdbaranec marked this pull request as ready for review July 5, 2023 20:17
@nvdbaranec nvdbaranec requested a review from jlowe July 5, 2023 22:21
rwlee
rwlee previously approved these changes Jul 6, 2023
Copy link
Collaborator

@rwlee rwlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, code is largely identical to the cudf code with minimal refactoring. Only concern is if the to_java_bigdecimal belongs with the decimal utils, but I think that's a nit.

src/main/cpp/tests/hash.cpp Outdated Show resolved Hide resolved
src/test/java/com/nvidia/spark/rapids/jni/HashTest.java Outdated Show resolved Hide resolved
src/main/cpp/src/hash.cuh Show resolved Hide resolved
src/main/cpp/src/HashJni.cpp Outdated Show resolved Hide resolved
src/main/cpp/src/HashJni.cpp Show resolved Hide resolved
src/main/java/com/nvidia/spark/rapids/jni/Hash.java Outdated Show resolved Hide resolved
@nvdbaranec
Copy link
Collaborator Author

build

1 similar comment
@abellina
Copy link
Collaborator

abellina commented Jul 7, 2023

build

@nvdbaranec
Copy link
Collaborator Author

merge

@nvdbaranec nvdbaranec merged commit f3588b9 into NVIDIA:branch-23.08 Jul 7, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants