[Java] Supports value indices for `contiguousSplitGroupsAndGenUniqKeys` #20391

res-life · 2025-10-28T10:06:10Z

Description

Adds support for value indices in `contiguousSplitGroupsAndGenUniqKeys`.

Contributes to NVIDIA/spark-rapids#13679.

Signed-off-by: Chong Gao [email protected]

API changes in detail

Original API: `contiguousSplitGroupsAndGenUniqKeys()`

Only key indices are specified. Each split table contains both the key columns and all other columns, in the original column order.
E.g.:
Input table is [c0, c1, c2, c3], key indices are [2, 0].
Output split table columns are [c0, c1, c2, c3], keeping the original column order.
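
One way to picture how the original column order can be kept: each grouped key column goes back to the index it had in the input, and the remaining slots are filled with the non-key columns. The sketch below models columns as plain strings. `restore_original_order` and its parameters are hypothetical names for illustration, not part of cuDF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not a cuDF API: columns are modeled as names.
// `keys` holds the key columns in key_indices order; `values` holds the
// non-key columns in ascending index order.
std::vector<std::string> restore_original_order(
    std::vector<std::size_t> const& key_indices,
    std::vector<std::string> const& keys,
    std::vector<std::string> const& values)
{
  assert(key_indices.size() == keys.size());
  std::size_t const num_cols = keys.size() + values.size();
  std::vector<std::string> out(num_cols);

  // Put every key column back at the index it had in the input table.
  for (std::size_t i = 0; i < key_indices.size(); ++i) {
    out.at(key_indices[i]) = keys[i];
  }

  // Fill the remaining slots, in order, with the non-key columns.
  std::size_t next_value = 0;
  for (std::size_t index = 0; index < num_cols; ++index) {
    bool const is_key =
      std::find(key_indices.begin(), key_indices.end(), index) != key_indices.end();
    if (!is_key) { out.at(index) = values.at(next_value++); }
  }
  return out;
}
```

With key indices [2, 0], keys [c2, c0], and values [c1, c3], this returns [c0, c1, c2, c3], matching the example above.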

New API: `contiguousSplitGroupsAndGenUniqKeys(valueIndices)`

Both key indices and value indices are specified. Only the value columns are output, in the order given by the value indices.
Input table is [c0, c1, c2, c3, c4], key indices are [2, 0], value indices are [3, 1].
Output split table columns are [c3, c1].
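
The projection described for the new API can be sketched as a simple index lookup. This is only a model of the column selection, with columns represented as names; `project_columns` is a hypothetical helper and not the cuDF implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not a cuDF API: pick the output columns by index,
// in the order the indices are listed.
std::vector<std::string> project_columns(
    std::vector<std::string> const& input_columns,
    std::vector<std::size_t> const& value_indices)
{
  std::vector<std::string> out;
  out.reserve(value_indices.size());
  for (std::size_t const index : value_indices) {
    assert(index < input_columns.size());
    out.push_back(input_columns.at(index));
  }
  return out;
}
```

For the input [c0, c1, c2, c3, c4] and value indices [3, 1], the result is [c3, c1]; the key indices do not affect which columns are selected.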

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-10-28T10:06:14Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

res-life · 2025-10-28T10:07:23Z

/ok to test

copy-pr-bot · 2025-10-28T10:07:26Z

/ok to test

@res-life, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

res-life · 2025-10-28T10:10:11Z

/ok to test 4485fda

res-life · 2025-10-29T02:37:30Z

build

res-life · 2025-10-29T02:39:35Z

/ok to test 996c9ae

res-life · 2025-10-29T05:52:20Z

/ok to test 175769c

mythrocks · 2025-11-04T18:39:58Z

It took me some time to understand the intent here. The problem I had was with the term "value indices". Values are usually associated with the table rows, not references to columns.

If we could refer to "value indices" with a different term, say "projection column indices", I think it will be a clearer change.

I will continue to review this change again under that new lens.

mythrocks · 2025-11-04T18:40:40Z

java/src/main/java/ai/rapids/cudf/Table.java

+     * Similar to the above {@link #contiguousSplitGroupsAndGenUniqKeys}.
+     *
+     * The diff with the above method is:
+     * - Provide an extra input `valueIndices` which defines the columns to output.


projectionColumnIndices would be a clearer term.

mythrocks · 2025-11-04T21:09:37Z

java/src/main/native/src/TableJni.cpp

-      if (std::find(key_indices.begin(), key_indices.end(), index) == key_indices.end()) {
-        // not key column, so adds it as value column.
-        value_indices.emplace_back(index);
+    auto num_value_cols = [&]() -> size_t {


Nit: We might be able to remove the parentheses:

Suggested change

-    auto num_value_cols = [&]() -> size_t {
+    auto num_value_cols = [&] -> size_t {

Removing `()` causes a warning, which the build treats as an error:

TableJni.cpp:4650:31: error: parameter declaration before lambda trailing return type only optional with ‘-std=c++2b’ or ‘-std=gnu++2b’ [-Werror=c++23-extensions]
[INFO] [exec]  4650 |     auto num_value_cols = [&] -> size_t {
[INFO] [exec]       |                               ^~
[INFO] [exec] cc1plus: all warnings being treated as errors
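
The compiler error quoted above reflects a language rule: before C++23, a lambda with a trailing return type must spell out its parameter list, even when that list is empty. A minimal sketch of the accepted form follows. `count_non_key_columns` is a hypothetical stand-in that assumes unique key indices; it is not the code from TableJni.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the counting logic; assumes unique key indices.
std::size_t count_non_key_columns(std::size_t num_columns,
                                  std::vector<std::size_t> const& key_indices)
{
  // Immediately invoked lambda: the empty `()` before `->` is required
  // before C++23 whenever a trailing return type is present.
  auto const num_value_cols = [&]() -> std::size_t {
    assert(key_indices.size() <= num_columns);
    return num_columns - key_indices.size();
  }();
  return num_value_cols;
}
```

An alternative that avoids the question is to drop the trailing return type and let the compiler deduce it, in which case `[&] { ... }()` is valid in earlier standards as well.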

mythrocks · 2025-11-04T21:34:50Z

java/src/main/native/src/TableJni.cpp

+    }();
+
+    std::vector<cudf::column_view> grouped_cols(num_grouped_cols);
+    [&]() -> void {


I don't see a need for an IIFE here. Why not just put the body here?

mythrocks · 2025-11-04T21:37:36Z

java/src/main/native/src/TableJni.cpp

+    auto num_grouped_cols = [&]() -> size_t {
+      if (jvalue_indices == NULL) {
+        // output both key columns and value columns
+        return key_indices.size() + num_value_cols;
+      } else {
+        // only output value columns
+        return num_value_cols;
+      }
+    }();


This might have a simpler phrasing.

Suggested change

-    auto num_grouped_cols = [&]() -> size_t {
-      if (jvalue_indices == NULL) {
-        // output both key columns and value columns
-        return key_indices.size() + num_value_cols;
-      } else {
-        // only output value columns
-        return num_value_cols;
-      }
-    }();
+    // Include key columns if output projection is not specified.
+    size_t const num_grouped_cols = num_value_cols + ((jvalue_indices == NULL) ? key_indices.size() : 0);
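
When the branch is folded into a single conditional expression, the conditional needs its own parentheses, because `+` binds more tightly than `?:`. The sketch below contrasts the intended count with what an unparenthesized `a + (cond) ? x : 0` evaluates to. Both function names and the `projection_given` flag are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration; parameter names are assumptions.

// Intended count: add the key columns only when no projection was given.
std::size_t grouped_cols_intended(std::size_t num_value_cols,
                                  std::size_t num_key_cols,
                                  bool projection_given)
{
  return num_value_cols + (projection_given ? 0 : num_key_cols);
}

// How `a + (cond) ? x : 0` groups without the extra parentheses:
// the sum becomes the condition, so the value count never reaches the result.
std::size_t grouped_cols_misparsed(std::size_t num_value_cols,
                                   std::size_t num_key_cols,
                                   bool projection_given)
{
  bool const no_projection = !projection_given;
  return (num_value_cols + static_cast<std::size_t>(no_projection)) != 0 ? num_key_cols : 0;
}
```

With three value columns and two key columns, the intended version gives 3 when a projection is given and 5 when it is not, while the misparsed grouping gives 2 in both cases.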

mythrocks · 2025-11-04T22:31:46Z

java/src/main/native/src/TableJni.cpp

+        auto key_view    = groups.keys->view();
+        auto key_view_it = key_view.begin();
+        for (auto key_id : key_indices) {
+          grouped_cols.at(key_id) = std::move(*key_view_it);


If we're just copying column views here, I think this might have been easier:

Suggested change

-    grouped_cols.at(key_id) = std::move(*key_view_it);
+    grouped_cols[key_id] = *key_view_it;

Column views are meant to be copied, IIRC. Maybe I've missed why this is written this way?

You're right, using std::move(view) here is meaningless.
It was written this way previously, so I followed it.
Let's use what you proposed.
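
To illustrate the copy-versus-move point with a lightweight view type, here is a minimal sketch. `ColumnRef` and `place_keys` are hypothetical; `ColumnRef` is a trivially copyable stand-in and does not reproduce `cudf::column_view`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical non-owning view used for illustration only; it is a
// stand-in and does not reproduce cudf::column_view.
struct ColumnRef {
  int const* data;
  std::size_t size;
};

// For a trivially copyable type, moving and copying are the same operation.
static_assert(std::is_trivially_copyable<ColumnRef>::value,
              "ColumnRef is expected to be trivially copyable");

// Copy each key view into the slot given by its original column index.
// Assumes every index in key_indices is smaller than num_columns.
std::vector<ColumnRef> place_keys(std::vector<ColumnRef> const& key_views,
                                  std::vector<std::size_t> const& key_indices,
                                  std::size_t num_columns)
{
  assert(key_views.size() == key_indices.size());
  std::vector<ColumnRef> grouped_cols(num_columns, ColumnRef{nullptr, 0});
  auto key_view_it = key_views.begin();
  for (auto const key_id : key_indices) {
    grouped_cols[key_id] = *key_view_it;  // plain copy; the source stays valid
    ++key_view_it;
  }
  return grouped_cols;
}
```

One trade-off in the suggested form: `operator[]` performs no bounds check, whereas `.at()` throws `std::out_of_range` for an invalid index, so the sketch relies on the stated assumption about `key_indices`.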

mythrocks

A couple of nits and questions. But I see where this is going.

res-life · 2025-11-06T06:33:05Z

/ok to test 398668a

res-life · 2025-11-06T09:51:22Z

/ok to test 398668a

res-life · 2025-11-06T10:48:31Z

/ok to test 45b4ef2

res-life · 2025-11-06T10:55:38Z

/ok to test cd929e6

Specify value indices for contiguousSplitGroupsAndGenUniqKeys

c84d453

Signed-off-by: Chong Gao <[email protected]>

res-life requested a review from a team as a code owner October 28, 2025 10:06

github-actions bot assigned res-life Oct 28, 2025

github-actions bot added the Java (Affects Java cuDF API) label Oct 28, 2025

res-life added the feature request (New feature or request) and non-breaking (Non-breaking change) labels and removed the Java (Affects Java cuDF API) label Oct 28, 2025

res-life marked this pull request as draft October 28, 2025 10:07

Typo

4485fda

Signed-off-by: Chong Gao <[email protected]>

github-actions bot added the Java (Affects Java cuDF API) label Oct 28, 2025

Fix bug

996c9ae

Signed-off-by: Chong Gao <[email protected]>

Fix bug

175769c

Signed-off-by: Chong Gao <[email protected]>

res-life marked this pull request as ready for review October 29, 2025 06:54

res-life requested review from mythrocks and ttnghia October 29, 2025 06:54

res-life mentioned this pull request Oct 29, 2025

Use new API to do Iceberg partition. NVIDIA/spark-rapids#13688

mythrocks changed the title ~~Supports value indices for contiguousSplitGroupsAndGenUniqKeys~~ [Java] Supports value indices for contiguousSplitGroupsAndGenUniqKeys Oct 30, 2025

mythrocks reviewed Nov 4, 2025

mythrocks requested changes Nov 4, 2025

Chong Gao added 2 commits November 6, 2025 09:34

Merge branch 'b1' into partition

59d541e

Fix comments

398668a

Signed-off-by: Chong Gao <[email protected]>

Fix bug

45b4ef2

Signed-off-by: Chong Gao <[email protected]>

Refactor

cd929e6

Signed-off-by: Chong Gao <[email protected]>
