[kv] Support index lookup for primary key table #222

Open
wants to merge 3 commits into base: main

swuferhong
Collaborator

Purpose

Linked issue: #65

Index lookup exposes lookups over secondary indexes. A secondary index locates the required rows quickly, and this can be combined with Flink to implement delta joins.
This PR provides index lookup for kv tables. The approach is to define the primary key of the kv storage as "secondary keys + primary key" and to set the bucket key to the secondary keys. When data is looked up by the secondary keys, the corresponding bucket and server can then be identified directly, which gives efficient point queries.
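The layout described above can be sketched as a toy in-memory model. This is not the Fluss implementation: `IndexedKvTable`, `bucket_of`, `NUM_BUCKETS`, and the CRC32 hash are hypothetical names and choices made only for illustration. The sketch stores each row under the composite key "secondary keys + primary key", chooses the bucket from the secondary keys alone, and answers a lookup by scanning the matching key prefix inside a single bucket.

```python
import bisect
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for this sketch


def bucket_of(secondary_key):
    """Pick a bucket from the secondary keys only (hypothetical hash choice)."""
    return zlib.crc32(repr(tuple(secondary_key)).encode("utf-8")) % NUM_BUCKETS


class IndexedKvTable:
    """Toy model: each bucket is a list of (composite_key, row) sorted by key."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def put(self, secondary_key, primary_key, row):
        # Storage key = secondary keys followed by the original primary key.
        composite = tuple(secondary_key) + tuple(primary_key)
        entries = self.buckets[bucket_of(secondary_key)]
        keys = [key for key, _ in entries]
        pos = bisect.bisect_left(keys, composite)
        if pos < len(entries) and entries[pos][0] == composite:
            entries[pos] = (composite, row)  # upsert an existing key
        else:
            entries.insert(pos, (composite, row))

    def lookup(self, secondary_key):
        # Only one bucket has to be consulted, because the bucket is
        # derived from the secondary keys alone.
        prefix = tuple(secondary_key)
        entries = self.buckets[bucket_of(prefix)]
        keys = [key for key, _ in entries]
        pos = bisect.bisect_left(keys, prefix)
        rows = []
        # Sorted order keeps all keys sharing the prefix contiguous.
        while pos < len(entries) and entries[pos][0][:len(prefix)] == prefix:
            rows.append(entries[pos][1])
            pos += 1
        return rows


table = IndexedKvTable()
table.put(("u1", "i1"), (1002,), {"order_id": 1002})
table.put(("u1", "i1"), (1001,), {"order_id": 1001})
table.put(("u2", "i9"), (1003,), {"order_id": 1003})
matches = table.lookup(("u1", "i1"))
```

Because rows with the same secondary keys always hash to the same bucket and sort next to each other, the lookup touches one bucket and one contiguous key range.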

Tests

API and Format

Documentation

@wuchong wuchong linked an issue Dec 18, 2024 that may be closed by this pull request
2 tasks
@swuferhong swuferhong force-pushed the index-lookup-1216 branch 3 times, most recently from b95540f to 90f6295 Compare December 20, 2024 09:55
Member

@wuchong wuchong left a comment

I think our current index is not a general index; it is just an index on a prefix of the primary key. So it is really a prefix scan/lookup on a prefix of the primary key (the prefix must include the bucket key). I don't want to call this indexLookup, because that name would occupy the API for a possible future index on arbitrary columns.

How about renaming the API to prefixLookup? The key parameter should be a prefix of the primary key and must include the bucket key. For DDL, we don't need to introduce a new option table.index.keys; we can just continue to use bucket.key.

Since we don't enforce that the bucket key is a prefix of the primary key, we have to add some best practices for Delta Join cases to the future documentation. For tables used in DeltaJoin queries, the best practice is to put the bucket key columns before the other columns in the primary key definition. Otherwise, prefixLookup doesn't work when the key parameter contains only the bucket key. For example, take a primary key table orders with schema user_id, item_id, order_id, col1, col2, col3 (order_id alone could serve as the primary key because it is unique). If the join key is (user_id, item_id), the primary key of the table must be set to user_id, item_id, order_id and the bucket key to user_id, item_id. prefixLookup will not work if the primary key is set to order_id, user_id, item_id, because the join key is not a prefix of the primary key.
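The rule in this comment can be written down as a small check. This is only a sketch of the stated condition, not code from the PR: `supports_prefix_lookup` is a hypothetical helper, and it assumes columns are compared by name and order.

```python
def supports_prefix_lookup(primary_key, bucket_key, lookup_key):
    """Return True if lookup_key can be served by a prefix lookup.

    Hypothetical helper encoding the two conditions from the comment:
    1. lookup_key is a prefix of the primary key (same columns, same order);
    2. lookup_key includes every bucket key column.
    """
    lookup_key = list(lookup_key)
    is_prefix = list(primary_key)[:len(lookup_key)] == lookup_key
    includes_bucket_key = set(bucket_key) <= set(lookup_key)
    return bool(lookup_key) and is_prefix and includes_bucket_key


join_key = ["user_id", "item_id"]
bucket_key = ["user_id", "item_id"]

# Recommended layout: bucket key columns come first in the primary key.
good = supports_prefix_lookup(
    ["user_id", "item_id", "order_id"], bucket_key, join_key)

# Problematic layout: order_id comes first, so the join key is not a prefix.
bad = supports_prefix_lookup(
    ["order_id", "user_id", "item_id"], bucket_key, join_key)
```

With the orders example, `good` is true and `bad` is false, which matches the two primary key layouts discussed above.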

@swuferhong
Collaborator Author

@wuchong comments addressed. PR ready

Development

Successfully merging this pull request may close these issues.

[Feature] Fluss support index lookup for primary key table
2 participants