Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nvcomp revert followup #2497

Merged
merged 4 commits into from
Oct 11, 2024
Merged

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Oct 11, 2024

This includes rework from #2483 that didn't make it into 24.10 and an updated version of the patch. The old patch would not apply cleanly in 24.12. The new patch was created by applying the old one to CUDF in 24.10, and then up-merging that branch to 24.12.

@revans2
Copy link
Collaborator Author

revans2 commented Oct 11, 2024

build

@revans2
Copy link
Collaborator Author

revans2 commented Oct 11, 2024

I ran the full set of spark-rapids tests with this and it looks like everything passes.

@gerashegalov
Copy link
Collaborator

Currently we fail the 2nd+ build if we don't add clean every time.

[INFO]      [exec] ~/gits/NVIDIA/spark-rapids-jni.reviews/thirdparty/cudf ~/gits/NVIDIA/spark-rapids-jni.reviews
[INFO]      [exec] Error: CUDF repository has uncommitted changes. No patches will be applied...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.790 s
[INFO] Finished at: 2024-10-11T18:27:43Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.0.0:run (cudf patch) on project spark-rapids-

We could instead after, all patches successfully apply initially, generate a cumulative.patch somewhere under target.
and check if git -C thirdparty/cudf apply --check --reverse target/cumulative.patch succeeds. If yes, skip applying patches,

mythrocks
mythrocks previously approved these changes Oct 11, 2024
Copy link
Collaborator

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing that, @revans2.

gerashegalov
gerashegalov previously approved these changes Oct 11, 2024
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM already, the comment about detecting of prior patch applications can be another PR

case compression_type::DEFLATE: return "Deflate";
case compression_type::LZ4: return "LZ4";
- case compression_type::GZIP: return "GZIP";
diff --git a/cpp/src/strings/regex/regcomp.cpp b/cpp/src/strings/regex/regcomp.cpp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why we're patching regcomp and string extract files as part of fixing cudf to have nvcomp 3.x?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 24.12 build is totally busted, filed #2498 to track this issue. It's pretty nasty, since it silently doesn't build any changes. Running with outdated binaries relative to the source after a successful build is confusing to debug.

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no changes to thirdparty/cudf-pins/versions.json to have nvcomp be the old version, so I'm not sure this does what is intended.

@revans2 revans2 marked this pull request as draft October 11, 2024 19:20
@revans2
Copy link
Collaborator Author

revans2 commented Oct 11, 2024

I am not sure what I messed up but the patch is not good. I am working on a fix for it.

@revans2 revans2 dismissed stale reviews from gerashegalov and mythrocks via cdd4db0 October 11, 2024 20:11
@revans2 revans2 marked this pull request as ready for review October 11, 2024 20:11
@revans2
Copy link
Collaborator Author

revans2 commented Oct 11, 2024

Sorry @gerashegalov @mythrocks and @jlowe I messed up when making the patch. I was multitasking and ended up putting up a patch file that was for the wrong thing, but because the versions file was pointing at the new version of nvcomp, then in order for it to pass we just had to get the patch to apply and compile. So it was two wrongs making a right. I think I have that fixed now.

@revans2
Copy link
Collaborator Author

revans2 commented Oct 11, 2024

Or I should say two wrongs making something that compiled, but was still wrong.

@revans2
Copy link
Collaborator Author

revans2 commented Oct 11, 2024

build

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a lot better. No major changes to the patch file and the pin versions file is being updated to pull in the old nvcomp version.

This is a bit hostile to developers since you cannot build twice in a row without remembering to specify -Dsubmodule.patch.skip=true after the first build. Would like to see this be updated to automatically detect whether the patch has been applied or not if we're going to do this long-term.

@@ -518,6 +520,7 @@
<id>build-sparkrapidsjni</id>
<phase>validate</phase>
<configuration>
<skip>${submodule.patch.skip}</skip>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we skipping building the C++ code if we're skipping patching? This prevents building with subsequent changes on a followup build that refuses to proceed if it's already been patched.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerashegalov this was your suggestion. I assume that it was to avoid building CUDF in an unpatched state.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to get rid of unnecessary work in the validate phase of the original PR assuming that we don't want to build anything unless the repo is patched.

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock 24.12 builds.

@sameerz sameerz added the build label Oct 11, 2024
@revans2 revans2 merged commit 9356867 into NVIDIA:branch-24.12 Oct 11, 2024
3 checks passed
@revans2
Copy link
Collaborator Author

revans2 commented Oct 11, 2024

I have some more work to do to make this okay for development that I will do next week. Sorry to everyone who is having problems with this until then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants