Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DATE_TRUNC Optimizer #14385

Draft
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

ashishjayamohan
Copy link
Contributor

@ashishjayamohan ashishjayamohan commented Nov 5, 2024

@codecov-commenter
Copy link

codecov-commenter commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 0% with 98 lines in your changes missing coverage. Please review.

Project coverage is 33.99%. Comparing base (59551e4) to head (f38e322).
Report is 1625 commits behind head on master.

Files with missing lines Patch % Lines
...optimizer/filter/TimePredicateFilterOptimizer.java 0.00% 97 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (59551e4) and HEAD (f38e322). Click for more details.

HEAD has 19 uploads less than BASE
Flag BASE (59551e4) HEAD (f38e322)
integration 7 5
integration2 3 2
temurin 12 8
java-21 7 5
skip-bytebuffers-true 3 2
skip-bytebuffers-false 7 6
unittests 5 3
unittests1 2 0
java-11 5 3
custom-integration1 2 0
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #14385       +/-   ##
=============================================
- Coverage     61.75%   33.99%   -27.76%     
- Complexity      207      675      +468     
=============================================
  Files          2436     2708      +272     
  Lines        133233   151642    +18409     
  Branches      20636    23413     +2777     
=============================================
- Hits          82274    51555    -30719     
- Misses        44911    95927    +51016     
+ Partials       6048     4160     -1888     
Flag Coverage Δ
custom-integration1 ?
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 33.98% <0.00%> (-27.73%) ⬇️
java-21 33.99% <0.00%> (-27.64%) ⬇️
skip-bytebuffers-false 33.99% <0.00%> (-27.76%) ⬇️
skip-bytebuffers-true 33.98% <0.00%> (+6.25%) ⬆️
temurin 33.99% <0.00%> (-27.76%) ⬇️
unittests 33.99% <0.00%> (-27.76%) ⬇️
unittests1 ?
unittests2 33.99% <0.00%> (+6.26%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ashishjayamohan ashishjayamohan changed the title [draft] Optimize DATETRUNC in Predicate Add DATE_TRUNC Optimizer Nov 6, 2024
@ashishjayamohan ashishjayamohan marked this pull request as ready for review November 6, 2024 00:59
@ashishjayamohan
Copy link
Contributor Author

@jadami10 Made that optimizer enhancement here. Let me know if anything looks off!

Copy link
Contributor

@jadami10 jadami10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first, thank you for doing this!

took an initial look and left some nit comments, some testing ideas, and some ideas about stuff that looks like it might break.

I haven't reviewed the actual algorithm yet


@Test
public void testDateTruncOptimizer() {
testDateTrunc("datetrunc('DAY', col) < 1620777600000", new Range("0", true, "1620777600000", false));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. do we need a test with an INT instead of long? I believe that's also a valid time type in pinot
  2. can we have test with week or month truncation
  3. can we have a test case where it's not a support function. as in IN to make sure nothing is erroring
  4. also a test case where the time granularity is unsupported? I'm not sure if calcite will catch that before the optimizer, but we do use DAY as an input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jadami10. Thanks for this insight! I've been working on this for the past couple days (specifically on the time zone test). I've introduced several time zone tests and my implementation seems to work only for some time zone usages.
If you're willing, would you be able to write up a quick draft of what the algorithm should look like to convert the date_trunc function with time zones to a range query (essentially, the floor and ceiling inverse of date trunc). I think it would be beneficial to hear it from another perspective to find what I'm missing.

@ashishjayamohan ashishjayamohan marked this pull request as draft November 12, 2024 00:43
Copy link
Contributor

@jadami10 jadami10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think your approach to convert the date trunc predicates to floor/ceiling sounds good. I left some comments where I'm confused about why the implementation drifts for similar functions

upperMillis = dateTruncFloor(operands);
if (upperMillis != TimeUnit.MILLISECONDS.convert(getLongValue(filterOperands.get(1)), TimeUnit.valueOf(outputTimeUnit.toUpperCase()))) {
upperInclusive = true;
upperMillis = dateTruncCeil(operands);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this recomputed here?

lowerMillis = Long.MIN_VALUE;
upperInclusive = false;
upperMillis = dateTruncFloor(operands);
if (upperMillis != TimeUnit.MILLISECONDS.convert(getLongValue(filterOperands.get(1)), TimeUnit.valueOf(outputTimeUnit.toUpperCase()))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we checking this here but not in in GREATER_THAN?

break;
case GREATER_THAN_OR_EQUAL:
operands.set(1, getExpression(getLongValue(filterOperands.get(1)), new DateTimeFormatSpec("TIMESTAMP")));
lowerMillis = dateTruncFloor(operands);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be ceil?

// testDateTrunc("datetrunc('DAY', col, 'DAYS', 'CET', 'MILLISECONDS') = 39193714800000",
// new Range("453631", true, "453631", true));
testDateTrunc("datetrunc('DAY', col, 'MILLISECONDS', 'UTC', 'DAYS') = 453630",
new Range("39193632000000", true, "39193718399999", true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks strange. 39193632000000 is like 1000 years from now?

@ashishjayamohan
Copy link
Contributor Author

@jadami10 Following up on this, sorry for it taking a while! I think there are some implementation-level nuances that I'm running into. Specifically, this optimizer works 100% of the time without timezones but struggles when moving into a different time zone, outputted in a lower granularity unit than the bucketing unit. If you're free and willing, I'd love to schedule a quick call to help walk through some cases since this is my first time working on a feature like this. Let me know if that's a possibility!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Time Pruner should support queries where we're operating on the time column itself
4 participants