feat: rand expression support by akupchinskiy · Pull Request #1199 · apache/datafusion-comet

akupchinskiy · 2024-12-24T11:56:16Z

Which issue does this PR close?

Closes #1198

Rationale for this change

Add support for the Spark rand() expression.

What changes are included in this PR?

rand() expression implementation
partition-awareness of the planner

How are these changes tested?

Spark compatibility tests and an expression correctness test are included in the PR.

codecov-commenter · 2024-12-28T06:52:05Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.98%. Comparing base (f09f8af) to head (b95f4b7).
Report is 282 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1199      +/-   ##
============================================
+ Coverage     56.12%   58.98%   +2.85%     
- Complexity      976     1141     +165     
============================================
  Files           119      130      +11     
  Lines         11743    12872    +1129     
  Branches       2251     2421     +170     
============================================
+ Hits           6591     7592    +1001     
- Misses         4012     4059      +47     
- Partials       1140     1221      +81

andygrove · 2024-12-30T03:30:49Z

Thanks @akupchinskiy. I plan on reviewing this after the holidays.

mbutrovich · 2025-01-02T15:48:47Z

Are the partition related changes necessary for this PR? Otherwise, it might be better to reduce the scope to just the rand() expression.

dharanad · 2025-01-02T15:58:54Z

+const DOUBLE_UNIT: f64 = 1.1102230246251565e-16;
+const SPARK_MURMUR_ARRAY_SEED: u32 = 0x3c074a61;


It would be really helpful if you could add documentation / references for these constants.

akupchinskiy · 2025-01-05

Added doc comments with all the references.
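For readers wondering what the two constants are for, here is a minimal sketch, assuming the native code mirrors Spark's XORShiftRandom (a Murmur3-scrambled seed feeding a xorshift generator). `DOUBLE_UNIT` is 2^-53, the factor that scales a 53-bit integer into [0, 1), and `0x3c074a61` is the array seed of Scala's MurmurHash3. The names `murmur3_bytes`, `hash_seed`, and `XorShiftSketch` are illustrative and are not taken from the PR.

```rust
/// 2^-53: scales a 53-bit integer into the half-open interval [0, 1).
const DOUBLE_UNIT: f64 = 1.1102230246251565e-16;
/// Array seed used by Scala's MurmurHash3.
const SPARK_MURMUR_ARRAY_SEED: u32 = 0x3c074a61;

/// Murmur3 (x86, 32-bit) for inputs whose length is a multiple of 4.
/// Tail handling is omitted because an 8-byte seed never needs it.
fn murmur3_bytes(data: &[u8], seed: u32) -> u32 {
    let mut h = seed;
    for chunk in data.chunks_exact(4) {
        let mut k = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        k = k.wrapping_mul(0xcc9e2d51).rotate_left(15).wrapping_mul(0x1b873593);
        h = (h ^ k).rotate_left(13).wrapping_mul(5).wrapping_add(0xe6546b64);
    }
    // Finalization: mix in the length, then avalanche.
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85ebca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2ae35);
    h ^ (h >> 16)
}

/// Scrambles a user seed (assumption: same scheme as Spark's `hashSeed`).
/// The seed is serialized big-endian, hashed once for the low word and
/// hashed again, seeded with the low word, for the high word.
fn hash_seed(seed: i64) -> i64 {
    let bytes = seed.to_be_bytes();
    let low = murmur3_bytes(&bytes, SPARK_MURMUR_ARRAY_SEED);
    let high = murmur3_bytes(&bytes, low);
    (((high as u64) << 32) | low as u64) as i64
}

/// Hypothetical generator type used only for this illustration.
struct XorShiftSketch {
    seed: i64,
}

impl XorShiftSketch {
    fn new(user_seed: i64) -> Self {
        Self { seed: hash_seed(user_seed) }
    }

    /// One xorshift step (shifts 21, 35, 4), returning the low `bits` bits.
    fn next(&mut self, bits: u8) -> i32 {
        let mut s = self.seed ^ (self.seed << 21);
        s ^= ((s as u64) >> 35) as i64; // logical (unsigned) right shift
        s ^= s << 4;
        self.seed = s;
        (s & ((1i64 << bits) - 1)) as i32
    }

    /// Combines 26 + 27 random bits into a 53-bit integer and scales it.
    fn next_f64(&mut self) -> f64 {
        let hi = (self.next(26) as i64) << 27;
        let lo = self.next(27) as i64;
        (hi + lo) as f64 * DOUBLE_UNIT
    }
}
```

Two generators built from the same seed yield the same sequence, which is the property a Spark compatibility test would rely on. Whether the values match Spark bit for bit depends on the assumptions above holding.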

comphead · 2025-01-02T20:58:17Z

        if exec_context.root_op.is_none() {
            let start = Instant::now();
-            let planner = PhysicalPlanner::new(Arc::clone(&exec_context.session_ctx))
+            let planner = PhysicalPlanner::new(Arc::clone(&exec_context.session_ctx), partition)


This is interesting. Is there any reason the partition is not used in the Comet native physical planner? It is definitely used in the DataFusion physical plan during plan node execution: https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/execution_plan.rs#L371

akupchinskiy · 2025-01-05

The Spark partition index is erased, for some reason, when a native DataFusion plan is sent for execution: https://github.com/apache/datafusion-comet/blob/main/native/core/src/execution/jni_api.rs#L496

andygrove · 2025-01-10

This is something that I would like to see improved. We currently use partition 0 for each native plan rather than the real partition id.

akupchinskiy · 2025-01-17

@andygrove Can I do it as part of this PR, or would it be better to create a separate one?
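The effect of passing the partition into the planner can be pictured roughly as follows. This is a sketch, not the Comet planner: `PlannerSketch` and `rand_seed` are hypothetical names. The only fact borrowed from Spark is that rand() seeds its generator with the user seed plus the partition index, so a planner that always sees partition 0 would give every partition the same random stream.

```rust
/// Hypothetical stand-in for a planner that remembers which Spark
/// partition it is building a native plan for.
struct PlannerSketch {
    partition: i32,
}

impl PlannerSketch {
    fn new(partition: i32) -> Self {
        Self { partition }
    }

    /// Derives the per-partition seed for a rand() expression.
    /// Wrapping addition avoids a panic on overflow in debug builds.
    fn rand_seed(&self, user_seed: i64) -> i64 {
        user_seed.wrapping_add(i64::from(self.partition))
    }
}
```

With this shape, `PlannerSketch::new(0).rand_seed(42)` and `PlannerSketch::new(3).rand_seed(42)` differ, whereas hard-coding partition 0 would make them identical.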

akupchinskiy · 2025-01-05T08:58:35Z

> Are the partition related changes necessary for this PR? Otherwise, it might be better to reduce the scope to just the rand() expression.

There are a handful of expressions besides rand() that rely on the partition index. All of them implement the Nondeterministic trait, which provides a hook method to initialize state before a partition is evaluated in the Spark runtime.

Encapsulation-wise, I agree that the scope of the partition exposure should be limited. However, I could not find a way to extract it other than making it part of the planner struct.
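For comparison, the Spark-side hook described above could be mirrored in Rust along these lines. This is only a sketch of the idea: `PartitionInit` and `RandState` are hypothetical names, and the PR itself threads the partition through the planner rather than using a trait like this.

```rust
/// Hypothetical hook: called once per partition before any row is evaluated.
trait PartitionInit {
    fn initialize(&mut self, partition_index: i32);
}

/// Hypothetical state for a partition-dependent expression such as rand().
struct RandState {
    base_seed: i64,
    effective_seed: Option<i64>,
}

impl RandState {
    fn new(base_seed: i64) -> Self {
        Self { base_seed, effective_seed: None }
    }

    /// Returns the per-partition seed, or an error if the hook never ran.
    fn seed(&self) -> Result<i64, String> {
        self.effective_seed
            .ok_or_else(|| "expression was not initialized for a partition".to_string())
    }
}

impl PartitionInit for RandState {
    fn initialize(&mut self, partition_index: i32) {
        // The seed is offset by the partition index, as Spark's rand() does.
        self.effective_seed = Some(self.base_seed.wrapping_add(i64::from(partition_index)));
    }
}
```

The trade-off is that a hook keeps the partition index out of the planner's public surface, but it needs a point in the execution path where the index is known.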

kazuyukitanimura · 2025-03-06T01:51:31Z

@akupchinskiy do you plan to resolve the conflicts?

akupchinskiy · 2025-03-06T18:14:03Z

> @akupchinskiy do you plan to resolve the conflicts?

Yeah, thanks for the reminder. Will do it tomorrow.

akupchinskiy · 2025-03-07T16:52:48Z

@kazuyukitanimura could you trigger the workflow?

andygrove · 2025-06-24T17:02:52Z

I upmerged this PR and re-triggered the workflows. Sorry for the delay @akupchinskiy

andygrove

LGTM. Thanks @akupchinskiy

akupchinskiy added 2 commits December 24, 2024 15:32

feat: rand expression support

c5c80e2

fix: support for spark-compatible null seed

7e4ca2c

akupchinskiy force-pushed the rand-expr-support branch from 2c1c0c4 to 7e4ca2c Compare December 24, 2024 16:09

fix: unnecessary borrowing removal

41c917b

dharanad reviewed Jan 2, 2025

Comment thread native/spark-expr/src/rand.rs Outdated

comphead reviewed Jan 2, 2025

akupchinskiy added 3 commits January 5, 2025 11:07

Merge branch 'main' into rand-expr-support

fdb8949

added references to the constants and typo fix

cc2b20f

Merge remote-tracking branch 'forked/rand-expr-support' into rand-exp…

783c381

…r-support

rluvaton reviewed Jan 5, 2025

Comment thread native/spark-expr/src/rand.rs

akupchinskiy added 2 commits January 5, 2025 22:05

added permalinks for the reference links

10f310d

fixed compile errors after master merge

e7e629c

resolving conflicts with main

ef416d5

akupchinskiy and others added 5 commits March 7, 2025 21:49

fmt fix

a6559f8

Merge remote-tracking branch 'upstream/main' into rand-expr-support

41fb568

# Conflicts: # native/Cargo.lock

Merge branch 'main' into rand-expr-support

82beec9

# Conflicts: # native/core/src/execution/planner.rs # native/proto/src/proto/expr.proto # spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala # spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

refactor to align with the latest master changes

a919c75

upmerge

09cf581

andygrove added 2 commits June 24, 2025 11:12

fix

79a9b96

format

a44366f

andygrove reviewed Jun 24, 2025

Comment thread spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

andygrove reviewed Jun 24, 2025

Comment thread spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala Outdated

andygrove added 5 commits June 24, 2025 12:22

fix

796d317

fix

c811cae

fmt

b95f4b7

upmerge

43e2864

revert accidental change

e0b26c7

andygrove approved these changes Jun 25, 2025

andygrove merged commit d72e54c into apache:main Jun 25, 2025
124 of 126 checks passed

coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025

feat: rand expression support (apache#1199)

02e83af

Review comment threads (author, date):

dharanad Jan 2, 2025
akupchinskiy Jan 5, 2025

comphead Jan 2, 2025
akupchinskiy Jan 5, 2025
andygrove Jan 10, 2025
akupchinskiy Jan 17, 2025

8 participants
