feat(cudf): Add velox-cudf support for TopN #14797
0bfa61b to 2288ca9
Co-authored-by: Copilot <[email protected]>
Pull Request Overview
This PR adds GPU-accelerated TopN operator support to the velox-cudf integration. The implementation provides a CudfTopN operator that executes TopN operations on the GPU using cuDF, with tests that verify its results against the CPU implementation.
- Implements CudfTopN operator with batch-based optimization for managing memory and performance
- Adds comprehensive test suite covering various TopN scenarios including multi-key sorting, filtering, and edge cases
- Includes a bug fix for CudfFilterProject to properly handle null values in filter conditions
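To illustrate what null handling in a filter condition can mean, here is a minimal sketch that assumes the usual SQL-style rule: a row passes only when its predicate evaluates to a non-null true, so a null predicate drops the row. Whether this matches the exact change made to CudfFilterProject is an assumption; the function `selectPassingRows` and the use of `std::optional<bool>` to model a nullable boolean column are hypothetical and not taken from the PR.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper, not part of Velox or cuDF.
// Returns the indices of rows that pass the filter. A row passes only when
// its predicate is a non-null true; a null predicate is treated like false
// (assumed SQL-style semantics).
std::vector<std::size_t> selectPassingRows(
    const std::vector<std::optional<bool>>& predicate) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < predicate.size(); ++i) {
    // value_or(false) maps std::nullopt to false, so null rows are dropped.
    if (predicate[i].value_or(false)) {
      selected.push_back(i);
    }
  }
  return selected;
}
```

Under this rule, a predicate column of true, null, false, true selects rows 0 and 3.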
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
velox/experimental/cudf/exec/CudfTopN.h | Header file defining the CudfTopN operator class with batch management strategy |
velox/experimental/cudf/exec/CudfTopN.cpp | Implementation of GPU-accelerated TopN operations using cuDF sorting APIs |
velox/experimental/cudf/exec/ToCudf.cpp | Integration of CudfTopN into the operator replacement framework |
velox/experimental/cudf/tests/TopNTest.cpp | Comprehensive test suite for TopN functionality verification |
velox/experimental/cudf/tests/CMakeLists.txt | Build configuration for TopN tests |
velox/experimental/cudf/exec/CMakeLists.txt | Build configuration including CudfTopN source |
velox/experimental/cudf/exec/CudfFilterProject.cpp | Bug fix for null handling in filter conditions |
39e6aad to 1e100cd
This PR adds the CudfTopN operator, which runs TopN on the GPU.
Unit tests are added to verify the output.
CudfTopN keeps only the top N rows of each input batch. When the number of stored batches exceeds 5, the batches are concatenated and the top N rows of the result are stored back into the batch vector. getOutput concatenates all stored batches and returns the top N rows.
The tests are the same as the existing TopN tests, but the setup ensures they run on the GPU.
This PR also includes a fix (2288ca9) for CudfFilterProject that affects filter results involving null values.
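The batching strategy described above can be sketched on the CPU as follows. This is an illustrative stand-in only, not the CudfTopN code: the class `TopNAccumulator`, its member names, the use of plain `int` rows sorted ascending, and `std::partial_sort` in place of cuDF sorting are all assumptions. Only the threshold of 5 batches and the overall flow come from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU model of the batching strategy; not the CudfTopN code.
class TopNAccumulator {
 public:
  // Threshold of 5 taken from the PR description.
  static constexpr std::size_t kMaxBatches = 5;

  explicit TopNAccumulator(std::size_t n) : n_(n) {
    assert(n > 0);
  }

  // Keep only the top N rows of the incoming batch. If more than
  // kMaxBatches batches are stored, merge them into a single top-N batch.
  void addInput(std::vector<int> batch) {
    batches_.push_back(topN(std::move(batch)));
    if (batches_.size() > kMaxBatches) {
      std::vector<int> merged = topN(concatenate());
      batches_.clear();
      batches_.push_back(std::move(merged));
    }
  }

  // Concatenate all stored batches and return the overall top N rows.
  std::vector<int> getOutput() const {
    return topN(concatenate());
  }

  std::size_t numBatches() const {
    return batches_.size();
  }

 private:
  std::vector<int> concatenate() const {
    std::vector<int> all;
    for (const auto& b : batches_) {
      all.insert(all.end(), b.begin(), b.end());
    }
    return all;
  }

  // Returns the n_ smallest values in ascending order (fewer if the input
  // has fewer than n_ rows).
  std::vector<int> topN(std::vector<int> rows) const {
    const std::size_t k = std::min(n_, rows.size());
    std::partial_sort(
        rows.begin(),
        rows.begin() + static_cast<std::ptrdiff_t>(k),
        rows.end());
    rows.resize(k);
    return rows;
  }

  std::size_t n_;
  std::vector<std::vector<int>> batches_;
};
```

Trimming each batch to N rows on arrival bounds the retained data to at most N rows per stored batch, and the periodic merge keeps the number of stored batches small, so the final concatenation in `getOutput` works on a limited amount of data.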