
PPL Lookup #407

Draft: wants to merge 4 commits into base: main


salyh
Contributor

@salyh salyh commented Jul 2, 2024

Description

Implement PPL Lookup

Issues Resolved

opensearch-project/sql#2651

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@YANG-DB
Member

YANG-DB commented Jul 2, 2024

@salyh thanks for your contribution.
Can you also please take a look at the existing PPL correlation API?

@salyh
Contributor Author

salyh commented Jul 9, 2024

@YANG-DB any comments on the proposal so far?

@salyh
Contributor Author

salyh commented Jul 9, 2024

@rupal-bq @anasalkouz any comments on the proposal so far?

@YANG-DB YANG-DB added the Lang:PPL Pipe Processing Language support label Jul 11, 2024
Comment on lines +274 to +276
    while (!root.getChild().isEmpty()) {
        root = root.getChild().get(0);
    }
Member

This loop finds the root command as lookup's source table, but we may have multiple lookup commands, such as source=t1 | lookup t2 | ... | lookup t3

Contributor

The source=<table> is used to specify the "source" table name.
It's not required to specify (repeat) the name of the "source" table within the lookup command, so I guess finding the root Relation node is OK. Am I missing something?
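
To make the two positions above concrete, here is a minimal sketch of what the loop returns for a chain with more than one lookup. It assumes each piped command is a node whose first child is the preceding command; `SourceFinder` and `PlanNode` are hypothetical stand-ins, not the classes used in this PR. The sketch only shows which node the descent reaches. Whether that leaf is the right left-hand input for a second lookup is the question raised in the review.

```java
import java.util.ArrayList;
import java.util.List;

final class SourceFinder {

    // Hypothetical minimal AST node; the PR's real node classes are not shown here.
    static final class PlanNode {
        private final String name;
        private final List<PlanNode> children = new ArrayList<>();

        PlanNode(String name, PlanNode... kids) {
            this.name = name;
            for (PlanNode kid : kids) {
                children.add(kid);
            }
        }

        String getName() {
            return name;
        }

        List<PlanNode> getChild() {
            return children;
        }
    }

    // Same descent as the loop under review: follow the first child until a leaf is reached.
    static PlanNode findSource(PlanNode root) {
        while (!root.getChild().isEmpty()) {
            root = root.getChild().get(0);
        }
        return root;
    }

    public static void main(String[] args) {
        // source=t1 | lookup t2 | lookup t3, modelled as nested single-child nodes (assumption)
        PlanNode relation = new PlanNode("relation:t1");
        PlanNode firstLookup = new PlanNode("lookup:t2", relation);
        PlanNode secondLookup = new PlanNode("lookup:t3", firstLookup);

        // Both lookups resolve to the same leaf, the t1 relation.
        System.out.println(findSource(firstLookup).getName());  // prints relation:t1
        System.out.println(findSource(secondLookup).getName()); // prints relation:t1
    }
}
```

Under this model the second lookup also resolves to `t1` rather than to the output of the first lookup, which may or may not be the intended join input.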


node.getChild().get(0).accept(this, context);

//TODO: not sure how to implement appendonly
Member

To implement appendonly, the output Project (parent of Join) should be

Project [other fields in source, CASE WHEN isnull('source.copyField) THEN coalesce('source.copyField, 'lookup.copyField) AS copyField ELSE 'lookup.copyField END AS copyField]
  +- Join LeftOuter

In CASE WHEN expression1 ELSE expression2, expression1 handles appendonly (the copy is skipped when the source field already exists), and the ELSE branch overwrites the existing field.
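
As a rough illustration of the two behaviours described in this comment, the sketch below applies them to a single joined row, using `null` for a missing value. `LookupFieldMerge` and its method names are hypothetical, and treating appendonly as a plain coalesce is one reading of the comment, not the PR's implementation.

```java
final class LookupFieldMerge {

    // appendonly: keep the source value when it is present, otherwise take the lookup value.
    // This is the COALESCE(source.copyField, lookup.copyField) branch.
    static String appendOnly(String sourceValue, String lookupValue) {
        return sourceValue != null ? sourceValue : lookupValue;
    }

    // Overwrite: always take the lookup side, whatever the source holds.
    // After a left outer join the lookup value is null when no row matched.
    static String overwrite(String sourceValue, String lookupValue) {
        return lookupValue;
    }

    public static void main(String[] args) {
        System.out.println(appendOnly("fromSource", "fromLookup")); // source value kept
        System.out.println(appendOnly(null, "fromLookup"));         // gap filled from lookup
        System.out.println(overwrite("fromSource", "fromLookup"));  // lookup value wins
    }
}
```

Note that the overwrite branch as sketched replaces the source value with `null` when the join finds no match; whether that is desired is not settled in this thread.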

Contributor

Doesn't this mean we need to know the list of fields available in the "source" table in order to build a Project consisting of two lists:
the fields we want to overwrite with values from the lookup table, and the fields that should remain untouched?
I'm not sure whether it is possible to retrieve such a list of available fields.
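
The concern can be sketched as follows, assuming the source field list were available, which is exactly the open question here. `ProjectionBuilder`, its parameters, and the string form of the expressions are all hypothetical and only illustrate how the two lists would be derived.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectionBuilder {

    // Builds the projection list as plain strings, for illustration only.
    // sourceFields: fields known to exist in the source table (assumed to be retrievable).
    // copyFields:   fields to copy from the lookup table.
    static List<String> buildProjection(List<String> sourceFields,
                                        List<String> copyFields,
                                        boolean appendOnly) {
        List<String> projection = new ArrayList<>();

        // Source fields that are not copy targets pass through untouched.
        for (String field : sourceFields) {
            if (!copyFields.contains(field)) {
                projection.add("source." + field);
            }
        }

        // Copy targets: the expression depends on whether the source already has the field.
        for (String field : copyFields) {
            if (appendOnly && sourceFields.contains(field)) {
                projection.add("coalesce(source." + field + ", lookup." + field + ") AS " + field);
            } else {
                projection.add("lookup." + field + " AS " + field);
            }
        }
        return projection;
    }

    public static void main(String[] args) {
        List<String> projection =
            buildProjection(List.of("id", "name"), List.of("name", "dept"), true);
        projection.forEach(System.out::println);
    }
}
```

Both loops consult `sourceFields`, so without that list neither the pass-through fields nor the choice between `coalesce` and a plain lookup reference can be determined.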
