feat: rpad support column for second arg instead of just literal #2099

coderfender · 2025-08-08T16:54:51Z

Which issue does this PR close?

Closes #2096

Implement Comet native logic to support the rpad(column, column) API in Spark. Currently Comet only supports rpad(column, int).

What changes are included in this PR?

This PR implements native code to support rpad(col, col).

How are these changes tested?

Unit tests in CometExpressionSuite.

coderfender · 2025-08-09T01:06:50Z

@andygrove, the issue is that the rpad implementation only supports the (col, int) signature. Rather than falling back to Spark, I went ahead and implemented native code for (col, col) input (and added a test case in CometExpressionSuite). Please take a look and let me know your thoughts on the changes.

codecov-commenter · 2025-08-09T01:38:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.62%. Comparing base (f09f8af) to head (0fc9f93).
⚠️ Report is 483 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2099      +/-   ##
============================================
+ Coverage     56.12%   57.62%   +1.49%     
- Complexity      976     1297     +321     
============================================
  Files           119      147      +28     
  Lines         11743    13497    +1754     
  Branches       2251     2390     +139     
============================================
+ Hits           6591     7777    +1186     
- Misses         4012     4451     +439     
- Partials       1140     1269     +129


comphead

Thanks @coderfender. Wondering whether we should port the code to https://github.com/apache/datafusion/tree/main/datafusion/spark/src/function and then reuse the Spark function from the DF Spark crate?

coderfender · 2025-08-11T00:19:01Z

Thank you for the review, @comphead. Moving expressions to the datafusion-spark crate is indeed the goal once this change is merged into main.

coderfender · 2025-08-11T16:06:24Z

@andygrove, @comphead, could you please review the code whenever you get a chance? Thank you very much.

mbutrovich

Thanks @coderfender! First round of feedback.

native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs

coderfender · 2025-08-13T17:01:13Z

@mbutrovich, it seems a test failed due to what may be a transient Spark environment issue. Could you rerun the failed check whenever you get a chance, please?

mbutrovich · 2025-08-19T14:45:44Z

native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs

+                DataType::Utf8 => {
+                    spark_read_side_padding_internal::<i32>(array, truncate, rpad_arg)
+                }
+                DataType::LargeUtf8 => {


When we bring this to DataFusion we will need to support Utf8View. We can't really test that in Comet without a unit test in the file, but something to prepare for.

Thank you for sharing this, @mbutrovich. I will add this to the GitHub issue I plan to create for porting these changes to the DataFusion crate.

mbutrovich · 2025-08-19T17:43:40Z

native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs

@@ -71,44 +100,78 @@ fn spark_read_side_padding2(
    }
 }

+enum RPadArgument {


Why do we need a new enum type instead of relying on ColumnarValue, which can already represent a scalar or an array?

Thank you. This is a great suggestion, and I went ahead and leveraged ColumnarValue to branch to the right logic.
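
For readers following the thread, here is a minimal sketch of the scalar-versus-array dispatch being discussed. It is not the PR's code: `PadLength` is a hypothetical stand-in for DataFusion's `ColumnarValue` so the example compiles with the standard library alone, and `pad_one` and `pad_column` are illustrative names. Truncation is left out to keep the sketch short.

```rust
/// Hypothetical stand-in for DataFusion's `ColumnarValue`: the pad length is
/// either one value shared by every row or one value per row.
enum PadLength {
    Scalar(usize),
    Array(Vec<usize>),
}

/// Right-pad `s` with spaces to `length` characters (illustrative helper).
fn pad_one(s: &str, length: usize) -> String {
    let char_len = s.chars().count();
    let mut out = String::from(s);
    // `saturating_sub` yields 0 when the string is already long enough.
    out.push_str(&" ".repeat(length.saturating_sub(char_len)));
    out
}

/// Pad every non-null row, taking the length from the scalar or from the
/// matching array element. Null rows stay null.
fn pad_column(
    rows: &[Option<&str>],
    length: &PadLength,
) -> Result<Vec<Option<String>>, String> {
    match length {
        PadLength::Scalar(n) => Ok(rows
            .iter()
            .map(|r| r.map(|s| pad_one(s, *n)))
            .collect()),
        PadLength::Array(lens) => {
            if lens.len() != rows.len() {
                return Err(format!(
                    "length mismatch: {} rows, {} lengths",
                    rows.len(),
                    lens.len()
                ));
            }
            Ok(rows
                .iter()
                .zip(lens)
                .map(|(r, n)| r.map(|s| pad_one(s, *n)))
                .collect())
        }
    }
}
```

Both branches share the same per-row helper, so the only difference between the literal and column cases is where the length comes from.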

mbutrovich · 2025-08-29T18:59:08Z

native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs

+                    truncate,
+                    ColumnarValue::Scalar(ScalarValue::Int32(Some(*length))),
+                ),
+                // Dictionary support required for SPARK-48498


apache/spark#46832

This seems related to padding. How does this affect dictionary encoded columns?

mbutrovich · 2025-08-29T18:59:38Z

native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs

+                    array,
+                    truncate,
+                    ColumnarValue::Array(Arc::<dyn arrow::array::Array>::clone(array_int)),
+                ),
                // Dictionary support required for SPARK-48498


Same question.

Great catch! My understanding is that dictionary support ensures SQL-compliant CHAR type literals, which always have a fixed length (This change already existed by the time I picked up this issue). Therefore, my support for the array argument is obsolete.

mbutrovich

This is looking very close, just questions about the comments at this point. Thanks for your patience @coderfender!

comphead · 2025-08-29T20:19:18Z

spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

@@ -322,6 +322,16 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
      checkSparkAnswer("SELECT try_add(_1, _2) FROM tbl")
    }
  }
+  test("fix_rpad") {


Can we get a meaningful test name? What exactly is the fix being tested?

Sure. Thank you for the review. I will update the test name to add more context.

comphead · 2025-08-29T20:24:51Z

native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs

+    }
+}
+
+fn add_padding_string(string: String, length: usize, truncate: bool) -> String {


perhaps we can think of an impl like

fn add_padding_string(input: String, length: usize, truncate: bool) -> String {
    let char_len = input.chars().count();
    if char_len >= length {
        if truncate {
            // Take the first `length` chars safely
            input.chars().take(length).collect()
        } else {
            input
        }
    } else {
        // Pad with only the needed spaces
        let padding = " ".repeat(length - char_len);
        input + &padding
    }
}

so we don't allocate spaces if it's not needed
no unwrap

referring to a string by index, is it unicode safe? 🤔

This is a great suggestion. My goal for now was to keep the original implementation intact and not introduce changes that don't directly address the issue.
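
To make the Unicode question concrete: Rust's `str::len` counts bytes rather than characters, and slicing a `&str` by byte index panics when the index falls inside a multi-byte character. The sketch below uses only the standard library; the function names are illustrative and not taken from the PR.

```rust
/// Truncate to at most `n` characters by walking `char`s, which never splits
/// a multi-byte UTF-8 sequence.
fn truncate_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Byte-index truncation that returns `None` instead of panicking when `n`
/// is out of range or does not fall on a character boundary.
fn truncate_bytes_checked(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

For example, `"h\u{e9}llo"` is 5 characters but 6 bytes, so `truncate_bytes_checked(s, 2)` returns `None` while `truncate_chars(s, 2)` returns the first two characters. Note that `char` is a Unicode scalar value, so this still does not account for grapheme clusters such as combining sequences.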

comphead

Thanks @coderfender, LGTM.
Please add a test for a Unicode string to see if there is an issue. If there is, we need to comment the test as something to be fixed in the future, and we probably also need to document these limitations.

comphead · 2025-09-10T22:36:14Z

native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs

+            for string in string_array.iter() {
+                match string {
+                    Some(string) => builder.append_value(add_padding_string(
+                        string.parse().unwrap(),


It's good to avoid unwraps and return Err instead.
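
A minimal sketch of what returning `Err` instead of unwrapping could look like for the per-row loop, using only the standard library. The name `rpad_rows`, the `String` error type, and the choice to reject negative lengths are assumptions made for illustration; they are not the PR's code and not a statement about Spark's semantics for negative lengths.

```rust
use std::convert::TryFrom;

/// Right-pad each non-null row to its own length, returning an error rather
/// than panicking on bad input.
fn rpad_rows(
    rows: &[Option<&str>],
    lengths: &[Option<i32>],
) -> Result<Vec<Option<String>>, String> {
    if rows.len() != lengths.len() {
        return Err(format!(
            "length mismatch: {} rows, {} lengths",
            rows.len(),
            lengths.len()
        ));
    }
    let mut out = Vec::with_capacity(rows.len());
    for (row, length) in rows.iter().zip(lengths) {
        match (row, length) {
            (Some(s), Some(len)) => {
                // Fallible conversion: a negative length becomes an `Err`
                // that `?` propagates to the caller (illustrative choice).
                let len = usize::try_from(*len)
                    .map_err(|e| format!("invalid pad length {len}: {e}"))?;
                let char_len = s.chars().count();
                let mut padded = String::from(*s);
                padded.push_str(&" ".repeat(len.saturating_sub(char_len)));
                out.push(Some(padded));
            }
            // Assumption of this sketch: a null in either input yields null.
            _ => out.push(None),
        }
    }
    Ok(out)
}
```

The caller then decides how to surface the error, for instance by mapping it into the engine's own error type.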

comphead · 2025-09-10T22:36:29Z

native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs

+            for (string, length) in string_array.iter().zip(int_pad_array) {
+                match string {
+                    Some(string) => builder.append_value(add_padding_string(
+                        string.parse().unwrap(),


coderfender marked this pull request as draft August 8, 2025 16:54

coderfender mentioned this pull request Aug 8, 2025

rpad expression panics if length input is not a literal value #2096

Open

coderfender force-pushed the fix_rpad_panic branch from 8c5e0aa to bb0c2f3 Compare August 9, 2025 01:03

coderfender marked this pull request as ready for review August 9, 2025 01:03

comphead reviewed Aug 10, 2025

mbutrovich requested changes Aug 11, 2025

coderfender force-pushed the fix_rpad_panic branch from bb0c2f3 to 3d68aa9 Compare August 12, 2025 20:41

coderfender requested a review from mbutrovich August 12, 2025 20:44

coderfender mentioned this pull request Aug 13, 2025

feat: implement_comet_native_lpad_expr #2102

Open

mbutrovich changed the title ~~fix: rpad_bug_fix~~ feat: rpad support column for second arg instead of just literal Aug 19, 2025

mbutrovich reviewed Aug 19, 2025

coderfender requested a review from mbutrovich August 27, 2025 02:09

mbutrovich reviewed Aug 29, 2025

mbutrovich requested review from mbutrovich and comphead August 29, 2025 19:01

comphead reviewed Aug 29, 2025

coderfender and others added 5 commits September 9, 2025 14:21

rpad_bug_fix

32d66c7

rpad_bug_fix

fb19f95

address_review_comments_rpad

1a4b082

check_upstream_json_enrichments

76ab555

address_review_comments

7e8bf61

formatting

0fc9f93

coderfender force-pushed the fix_rpad_panic branch from e507c16 to 0fc9f93 Compare September 9, 2025 21:21

mbutrovich requested a review from comphead September 10, 2025 22:26

comphead approved these changes Sep 10, 2025

comphead reviewed Sep 10, 2025
