feat: rand expression support #1199
Branch updated from 2c1c0c4 to 7e4ca2c.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##              main    #1199      +/-   ##
============================================
+ Coverage    56.12%   58.98%   +2.85%
- Complexity     976     1141     +165
============================================
  Files          119      130      +11
  Lines        11743    12872    +1129
  Branches      2251     2421     +170
============================================
+ Hits          6591     7592    +1001
- Misses        4012     4059      +47
- Partials      1140     1221      +81
Thanks @akupchinskiy. I plan on reviewing this after the holidays.
Are the partition-related changes necessary for this PR? Otherwise, it might be better to reduce the scope to just the …
const DOUBLE_UNIT: f64 = 1.1102230246251565e-16;
const SPARK_MURMUR_ARRAY_SEED: u32 = 0x3c074a61;
It would be really helpful if you could add documentation / references for these constants.
Added doc comments with all the references.
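For readers unfamiliar with the two constants, here is a minimal Rust sketch of how they are typically used, assuming the native rand() mirrors Spark's XORShiftRandom. The seed is hashed with Scala's MurmurHash3.bytesHash, starting from MurmurHash3.arraySeed (0x3c074a61). The state then advances with an xorshift step, and nextDouble combines 26 + 27 random bits scaled by DOUBLE_UNIT (2^-53), exactly as java.util.Random.nextDouble does. The names XorShiftRandom, hash_seed and bytes_hash are illustrative and not necessarily the identifiers used in the PR.

```rust
/// 2^-53: java.util.Random.nextDouble scales a 53-bit integer by this factor.
const DOUBLE_UNIT: f64 = 1.1102230246251565e-16;
/// Scala's MurmurHash3.arraySeed, the seed Spark's XORShiftRandom.hashSeed starts from.
const SPARK_MURMUR_ARRAY_SEED: u32 = 0x3c074a61;

/// Murmur3 "mixLast" step (as in scala.util.hashing.MurmurHash3).
fn mix_last(h: u32, k: u32) -> u32 {
    let k = k.wrapping_mul(0xcc9e2d51).rotate_left(15).wrapping_mul(0x1b873593);
    h ^ k
}

/// Murmur3 "mix" step: mixLast followed by a rotate/multiply/add.
fn mix(h: u32, k: u32) -> u32 {
    mix_last(h, k)
        .rotate_left(13)
        .wrapping_mul(5)
        .wrapping_add(0xe6546b64)
}

/// Murmur3 finalization: xor in the input length, then avalanche.
fn finalize_hash(h: u32, len: u32) -> u32 {
    let mut h = h ^ len;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85ebca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2ae35);
    h ^ (h >> 16)
}

/// Sketch of Scala's MurmurHash3.bytesHash: little-endian 4-byte blocks plus a tail.
fn bytes_hash(data: &[u8], seed: u32) -> u32 {
    let mut h = seed;
    let mut blocks = data.chunks_exact(4);
    for b in &mut blocks {
        h = mix(h, u32::from_le_bytes([b[0], b[1], b[2], b[3]]));
    }
    let tail = blocks.remainder();
    if !tail.is_empty() {
        let mut k = 0u32;
        for (i, byte) in tail.iter().enumerate() {
            k ^= (*byte as u32) << (8 * i);
        }
        h = mix_last(h, k);
    }
    finalize_hash(h, data.len() as u32)
}

/// Mirrors XORShiftRandom.hashSeed: hash the big-endian bytes of the seed twice.
fn hash_seed(seed: i64) -> u64 {
    let bytes = seed.to_be_bytes();
    let low = bytes_hash(&bytes, SPARK_MURMUR_ARRAY_SEED);
    let high = bytes_hash(&bytes, low);
    ((high as u64) << 32) | low as u64
}

/// Hypothetical port of Spark's XORShiftRandom (only what rand() needs).
struct XorShiftRandom {
    seed: u64,
}

impl XorShiftRandom {
    fn new(init: i64) -> Self {
        Self { seed: hash_seed(init) }
    }

    /// xorshift step; returns the low `bits` bits, like java.util.Random.next(bits).
    fn next(&mut self, bits: u32) -> i32 {
        let mut s = self.seed ^ (self.seed << 21);
        s ^= s >> 35;
        s ^= s << 4;
        self.seed = s;
        (s & ((1u64 << bits) - 1)) as i32
    }

    /// Same formula as java.util.Random.nextDouble: 26 + 27 bits scaled by 2^-53.
    fn next_double(&mut self) -> f64 {
        let hi = self.next(26) as i64;
        let lo = self.next(27) as i64;
        ((hi << 27) + lo) as f64 * DOUBLE_UNIT
    }
}
```

With Spark's semantics, a partition's generator would be created as XorShiftRandom::new(seed + partition_index), and each row would take one next_double() value. The largest value that can be produced is (2^53 - 1) * 2^-53, so results always fall in [0, 1).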
  if exec_context.root_op.is_none() {
      let start = Instant::now();
-     let planner = PhysicalPlanner::new(Arc::clone(&exec_context.session_ctx))
+     let planner = PhysicalPlanner::new(Arc::clone(&exec_context.session_ctx), partition)
This is interesting. Is there any reason the partition is not used in the Comet native physical planner? It is definitely used in the DataFusion physical plan during plan node execution: https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/execution_plan.rs#L371
For some reason, the Spark partition index is erased when a native DataFusion plan is sent for execution: https://github.com/apache/datafusion-comet/blob/main/native/core/src/execution/jni_api.rs#L496
This is something that I would like to see improved. We currently use partition 0 for each native plan rather than the real partition id.
@andygrove Can I do it as part of this PR, or would it be better to create a separate one?
There are a handful of expressions besides rand() that rely on the partition index. All of them implement the Nondeterministic trait, which provides a hook method to initialize state before a partition is evaluated in the Spark runtime. Encapsulation-wise, I agree that the exposure of the partition index should be limited in scope, but I could not find a way to pass it other than making it part of the planner struct.
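To make the trade-off concrete, here is a minimal sketch, assuming the planner is the only place where the partition index is available on the native side. The planner stores the Spark partition index and passes it to the rand() expression when building it, so each partition seeds its generator with seed + partitionIndex, as Spark's Rand does in initializeInternal. If every native plan is planned with partition 0, as described above, all partitions would get the same seed and emit identical sequences. PlannerSketch, RandExprSketch and create_rand are hypothetical names, not Comet's actual API.

```rust
/// Hypothetical stand-in for the native planner: it only records which Spark
/// partition it is planning for.
struct PlannerSketch {
    partition: i32,
}

/// Hypothetical rand() expression state: the seed its per-partition generator
/// would be created from.
#[derive(Debug, PartialEq)]
struct RandExprSketch {
    init_seed: i64,
}

impl PlannerSketch {
    fn new(partition: i32) -> Self {
        Self { partition }
    }

    /// Mirrors Spark's Rand, which seeds its generator with seed + partitionIndex.
    /// wrapping_add matches JVM long overflow semantics.
    fn create_rand(&self, seed: i64) -> RandExprSketch {
        RandExprSketch {
            init_seed: seed.wrapping_add(self.partition as i64),
        }
    }
}
```

The only state the planner needs is the partition index itself, which keeps the exposure narrow. The cost is that the planner becomes partition-specific rather than shareable across partitions.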
@akupchinskiy do you plan to resolve the conflicts?
Yeah, thanks for the reminder. Will do it tomorrow.
@kazuyukitanimura could you trigger the workflow?
# Conflicts:
#   native/Cargo.lock

# Conflicts:
#   native/core/src/execution/planner.rs
#   native/proto/src/proto/expr.proto
#   spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
#   spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
I upmerged this PR and re-triggered the workflows. Sorry for the delay, @akupchinskiy.
Which issue does this PR close?
Closes #1198
Rationale for this change
Support for the Spark rand() expression.
What changes are included in this PR?
How are these changes tested?
Spark compatibility tests and expression correctness tests are included in this PR.