-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6593] NPE when outer joining tables with many fields and unmatching rows #3977
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for the patch @rorueda , it looks good to me. |
50d7af7
to
3862a10
Compare
3862a10
to
3c4e41c
Compare
core/src/main/java/org/apache/calcite/adapter/enumerable/JavaRowFormat.java
Outdated
Show resolved
Hide resolved
compactCode.add(exp); | ||
} else { | ||
compactOutputVar = null; | ||
if (outputFieldCount >= 100) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked this patch with a lower value of outputFieldCount
?
What would you think about committing temporarily in the branch a lower threshold here (e.g. 1 or 2) and see how the CI checks behave? (and of course reverting this temp change before merging)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea.
I also don't like this hardcoded there. Is there a better, centralized, place to put this? Then we could also generate the test tables using it, instead of having to hardcode it again in the test class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could define a new CalciteSystemProperty of type Integer, perhaps with defaultValue 100, but if the users sets it to a negative value then the compactCode version is disable? (just to have a way to "disable" this feature and fallback to the original code in case a problem is found with the compactCodeGenerator afterwards). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea. Then it becomes easier also to run all the tests with a specific value, or to benchmark a real use case with different values. If nobody opposes, I can do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rorueda . The CI checks with the temporary (outputFieldCount > 0)
threshold have passed successfully, so the new compact code seems stable 👏
I guess the only remaining bit is reverting this temp change, setting the threshold via a new CalciteSystemProperty of type int, with defaultValue 100 and if <0 then codeCompact is disable. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will be able to do this at the end of the day. Any suggestion for the property name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd propose something along the lines:
/**
* Minimum numbers of fields in a Join result that will trigger the "compact code generation".
* This feature reduces the risk of running into a compilation error due to the code of a
* dynamically generated method growing beyond the 64KB limit.
*
* <p>The default value is 100, a negative value disables completely the "compact code" feature.
*
* @see org.apache.calcite.adapter.enumerable.EnumUtils
*/
public static final CalciteSystemProperty<Integer> JOIN_SELECTOR_COMPACT_CODE_THRESHOLD =
intProperty("calcite.join.selector.compact.code.threshold", 100);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please verify that setting the new CalciteSystemProperty to e.g. -1
causes all test to pass, except for LargeGeneratedJoinTest (which should fail since it should reach the 64KB error if compact code is not used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used your code to add the property, only adding a note on performance. Also had to modify the test to fail when the property is set to -1
.
To run tests disabling compact code generation:
./gradlew -Dcalcite.join.selector.compact.code.threshold=-1 test
To run tests enabling compact code generation for any join size:
./gradlew -Dcalcite.join.selector.compact.code.threshold=0 test
Summarizing the changes after more issues were discovered (check jira for details): EnumUtils:
JavaRowType:
|
Just to complete your summary, I think you missed in your comment (but you implemented in your patch) the fact that in EnumUtils the compact code was missing the early break in case of semi/anti joins (i.e. when the output type contains only the types of the first input) |
Patch looks good. Thanks @rorueda @julianhyde , why did you force push here? It seems the PR now contains several changes unrelated to the ticket. |
714a7ef
to
d67af52
Compare
I rebased the changes on top of the current master to make it easier to review. |
Thanks @rorueda , I just saw that Julian had to force-push on main to correct one commit. As I said before, I think the PR is in a good shape, it only remains to squash commits into a single one (you can do that whenever you want). I plan to merge this in the next 24/48h if no further objection appears. |
I force-pushed to change Mihai's commit message from 'RexBuilder can't handle NaN,Infinity double constants' to 'RexBuilder can't handle NaN, Infinity double constants'. |
…atching rows EnumUtils: - Split the code paths of compact code and normal code; - Keep the normal code path as is and change the compact code path: - Generate a null check if the row might be null; - Add the early break in case of semi/anti joins; - Remove the "optimization" of generating an array of a specific type if all output fields are of the same type; JavaRowType: - Make the copy method abstract and implement a specific copy for every JavaRowType; Also, add a new system property to determine the threshold that triggers compact code generation.
d67af52
to
85da1a3
Compare
Quality Gate passedIssues Measures |
The fix is quite simple. I added tests for left join and right join with empty tables.
The test file changed a bit more than expected because:
query.WithHook
because it was not being used, as this call doesn't mutate the query object;