Add DATE_TRUNC
Optimizer
#14385
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #14385 +/- ##
=============================================
- Coverage 61.75% 33.99% -27.76%
- Complexity 207 675 +468
=============================================
Files 2436 2708 +272
Lines 133233 151642 +18409
Branches 20636 23413 +2777
=============================================
- Hits 82274 51555 -30719
- Misses 44911 95927 +51016
+ Partials 6048 4160 -1888
Flags with carried forward coverage won't be shown.
DATETRUNC
in PredicateDATE_TRUNC
Optimizer
@jadami10 Made that optimizer enhancement here. Let me know if anything looks off!
first, thank you for doing this!
took an initial look and left some nit comments, some testing ideas, and some ideas about stuff that looks like it might break.
I haven't reviewed the actual algorithm yet
...test/java/org/apache/pinot/core/query/optimizer/filter/TimePredicateFilterOptimizerTest.java
@Test
public void testDateTruncOptimizer() {
  testDateTrunc("datetrunc('DAY', col) < 1620777600000", new Range("0", true, "1620777600000", false));
- do we need a test with an INT instead of a long? I believe that's also a valid time type in Pinot
- can we have a test with week or month truncation?
- can we have a test case where it's not a supported function, as in IN, to make sure nothing is erroring?
- also a test case where the time granularity is unsupported? I'm not sure if Calcite will catch that before the optimizer, but we do use DAY as an input
Hey @jadami10. Thanks for this insight! I've been working on this for the past couple days (specifically on the time zone test). I've introduced several time zone tests and my implementation seems to work only for some time zone usages.
If you're willing, would you be able to write up a quick draft of what the algorithm should look like to convert the date_trunc function with time zones to a range query (essentially, the floor and ceiling inverse of date trunc)? I think it would be beneficial to hear it from another perspective to find what I'm missing.
i think your approach to convert the date trunc predicates to floor/ceiling sounds good. I left some comments where I'm confused about why the implementation drifts for similar functions
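The floor/ceiling conversion discussed here can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: it assumes UTC and fixed-width buckets (so it does not cover months or time zones), uses `Long.MIN_VALUE`/`Long.MAX_VALUE` as stand-ins for open ends, and `DateTruncRangeSketch`, `Op`, `floor`, `ceil`, and `toRange` are hypothetical names. Under those assumptions, only `EQUALS`, `LESS_THAN`, and `GREATER_THAN_OR_EQUAL` depend on whether the literal sits on a bucket boundary; `GREATER_THAN` and `LESS_THAN_OR_EQUAL` always move to the next bucket start.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not Pinot code): inverts a UTC, fixed-width DATETRUNC predicate. */
final class DateTruncRangeSketch {
  enum Op { EQUALS, LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL }

  /** Largest bucket start <= value; floorMod keeps pre-1970 values correct. */
  static long floor(long value, long bucket) {
    return value - Math.floorMod(value, bucket);
  }

  /** Smallest bucket start >= value. */
  static long ceil(long value, long bucket) {
    long f = floor(value, bucket);
    return f == value ? value : f + bucket;
  }

  /**
   * Returns {lowerInclusive, upperExclusive} on the raw column for
   * "datetrunc(col) <op> value", or null when no row can match.
   * Overflow near the extremes of long is ignored in this sketch.
   */
  static long[] toRange(Op op, long value, long bucket) {
    switch (op) {
      case EQUALS:
        // A truncated value can only equal a literal that is itself a bucket start.
        return floor(value, bucket) == value ? new long[] {value, value + bucket} : null;
      case LESS_THAN:
        // On a boundary: col < value. Otherwise the bucket containing value also matches.
        return new long[] {Long.MIN_VALUE, ceil(value, bucket)};
      case LESS_THAN_OR_EQUAL:
        // The whole bucket containing value matches, boundary or not.
        return new long[] {Long.MIN_VALUE, floor(value, bucket) + bucket};
      case GREATER_THAN:
        // The bucket containing value never matches, boundary or not.
        return new long[] {floor(value, bucket) + bucket, Long.MAX_VALUE};
      case GREATER_THAN_OR_EQUAL:
        // On a boundary: col >= value. Otherwise start at the next bucket.
        return new long[] {ceil(value, bucket), Long.MAX_VALUE};
      default:
        throw new IllegalArgumentException("Unsupported operator: " + op);
    }
  }

  public static void main(String[] args) {
    long day = TimeUnit.DAYS.toMillis(1);
    // Mirrors the literal used in testDateTruncOptimizer.
    System.out.println(Arrays.toString(toRange(Op.LESS_THAN, 1620777600000L, day)));
  }
}
```

Returning a half-open range keeps the five cases uniform; an inclusive upper bound, if needed, is the exclusive bound minus one unit of the column's time resolution.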
...src/main/java/org/apache/pinot/core/query/optimizer/filter/TimePredicateFilterOptimizer.java
upperMillis = dateTruncFloor(operands);
if (upperMillis != TimeUnit.MILLISECONDS.convert(getLongValue(filterOperands.get(1)), TimeUnit.valueOf(outputTimeUnit.toUpperCase()))) {
  upperInclusive = true;
  upperMillis = dateTruncCeil(operands);
why is this recomputed here?
lowerMillis = Long.MIN_VALUE;
upperInclusive = false;
upperMillis = dateTruncFloor(operands);
if (upperMillis != TimeUnit.MILLISECONDS.convert(getLongValue(filterOperands.get(1)), TimeUnit.valueOf(outputTimeUnit.toUpperCase()))) {
why are we checking this here but not in GREATER_THAN?
  break;
case GREATER_THAN_OR_EQUAL:
  operands.set(1, getExpression(getLongValue(filterOperands.get(1)), new DateTimeFormatSpec("TIMESTAMP")));
  lowerMillis = dateTruncFloor(operands);
should this be ceil?
// testDateTrunc("datetrunc('DAY', col, 'DAYS', 'CET', 'MILLISECONDS') = 39193714800000",
//     new Range("453631", true, "453631", true));
testDateTrunc("datetrunc('DAY', col, 'MILLISECONDS', 'UTC', 'DAYS') = 453630",
    new Range("39193632000000", true, "39193718399999", true));
this looks strange. 39193632000000 is like 1000 years from now?
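For reference, the numbers in that test case can be checked with plain JDK arithmetic. The sketch below only re-derives the range bounds from the epoch-day literal `453630`; `EpochDaysCheck` and its methods are hypothetical names, not part of the PR. The bounds are internally consistent (one UTC day expressed in milliseconds), and the day itself falls on 3211-12-31, so the literal is indeed more than a millennium in the future.

```java
import java.time.LocalDate;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not Pinot code): converts an epoch-day literal to a millisecond range. */
final class EpochDaysCheck {
  /** First millisecond of the given UTC epoch day. */
  static long startMillis(long epochDays) {
    return TimeUnit.DAYS.toMillis(epochDays);
  }

  /** Last millisecond of the given UTC epoch day (inclusive bound). */
  static long endMillisInclusive(long epochDays) {
    return TimeUnit.DAYS.toMillis(epochDays + 1) - 1;
  }

  /** Calendar date of the given UTC epoch day. */
  static LocalDate utcDate(long epochDays) {
    return LocalDate.ofEpochDay(epochDays);
  }

  public static void main(String[] args) {
    long days = 453630L; // literal from the test case under review
    System.out.println(startMillis(days));        // 39193632000000
    System.out.println(endMillisInclusive(days)); // 39193718399999
    System.out.println(utcDate(days));            // 3211-12-31
  }
}
```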
@jadami10 Following up on this, sorry it took a while! I think there are some implementation-level nuances that I'm running into. Specifically, this optimizer works 100% of the time without time zones but struggles when a different time zone is combined with an output unit of lower granularity than the bucketing unit. If you're free and willing, I'd love to schedule a quick call to help walk through some cases since this is my first time working on a feature like this. Let me know if that's a possibility!
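One way to reason about the time zone cases is to compute bucket boundaries in local time and only then convert back to epoch milliseconds. The sketch below does this for DAY buckets with `java.time`; it is an assumption-laden illustration rather than what the optimizer does, and `ZonedDayBucketSketch` and `dayBucket` are hypothetical names. Converting the resulting bounds into a coarser output unit is deliberately left out, since that rounding step is where the reported cases diverge.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Arrays;

/** Hypothetical helper (not Pinot code): DAY bucket boundaries in a given time zone. */
final class ZonedDayBucketSketch {
  /**
   * Returns {startInclusive, endExclusive} in epoch millis for the local day
   * that contains epochMillis in the given zone.
   */
  static long[] dayBucket(long epochMillis, ZoneId zone) {
    LocalDate localDay = Instant.ofEpochMilli(epochMillis).atZone(zone).toLocalDate();
    // atStartOfDay resolves the first valid instant of the day, even if midnight is skipped by DST.
    long start = localDay.atStartOfDay(zone).toInstant().toEpochMilli();
    long end = localDay.plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
    return new long[] {start, end};
  }

  public static void main(String[] args) {
    // Literal from the commented-out CET test case.
    System.out.println(Arrays.toString(dayBucket(39193714800000L, ZoneId.of("CET"))));
  }
}
```

Because the end of the bucket is derived from the next local midnight rather than by adding 24 hours, days that are 23 or 25 hours long around DST transitions still get correct bounds.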
dateTrunc
function