[kv] Support index lookup for primary key table #222
I think our current index is not a general index; it is just an index on a prefix of the primary key. So it is really a prefix scan/lookup on a prefix of the primary key (and the prefix must include the bucket key). I don't want to call this indexLookup, because that name would occupy the API for a possible future index on arbitrary columns.
How about changing the API to prefixLookup? The parameter key should be a prefix of the primary key and must include the bucket key. For DDL, we don't need to introduce the new option table.index.keys; we can just continue to use bucket.key.
Since we don't enforce that the bucket key is a prefix of the primary key, we have to add some best practices for Delta Join cases to the future documentation. For tables used in Delta Join queries, the best practice is to put the bucket key columns before the other columns in the primary key definition. Otherwise, prefixLookup doesn't work when the parameter key contains only the bucket key columns. For example, take a primary key table orders with schema (user_id, item_id, order_id, col1, col2, col3), where order_id alone could serve as the primary key because it is unique. If the join key is (user_id, item_id), the primary key of the table must be set to (user_id, item_id, order_id) and the bucket key to (user_id, item_id). prefixLookup will not work if the primary key is set to (order_id, user_id, item_id), because the join key is not a prefix of the primary key.
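The rule in this comment can be sketched as a small check. This is a minimal illustration only: the class PrefixLookupCheck and its methods are hypothetical names made up for this example and are not part of the Fluss API. It assumes keys are represented as ordered lists of column names.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the Fluss API. */
final class PrefixLookupCheck {

    /** True if lookupKey is non-empty and equals the leading columns of primaryKey, in order. */
    static boolean isPrefix(List<String> lookupKey, List<String> primaryKey) {
        return !lookupKey.isEmpty()
                && lookupKey.size() <= primaryKey.size()
                && primaryKey.subList(0, lookupKey.size()).equals(lookupKey);
    }

    /**
     * A prefix lookup is possible only if the lookup key is a prefix of the
     * primary key and contains every bucket key column.
     */
    static boolean supportsPrefixLookup(
            List<String> primaryKey, List<String> bucketKey, List<String> lookupKey) {
        return isPrefix(lookupKey, primaryKey) && lookupKey.containsAll(bucketKey);
    }

    public static void main(String[] args) {
        List<String> joinKey = List.of("user_id", "item_id");
        List<String> bucketKey = List.of("user_id", "item_id");

        // Bucket key columns come first in the primary key: the join key is a prefix.
        System.out.println(supportsPrefixLookup(
                List.of("user_id", "item_id", "order_id"), bucketKey, joinKey)); // true

        // order_id comes first: the join key is no longer a prefix.
        System.out.println(supportsPrefixLookup(
                List.of("order_id", "user_id", "item_id"), bucketKey, joinKey)); // false
    }
}
```

The two calls in main mirror the orders example: the same join key and bucket key pass with one primary key ordering and fail with the other.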
fluss-client/src/main/java/com/alibaba/fluss/client/lookup/AbstractLookup.java
fluss-client/src/main/java/com/alibaba/fluss/client/lookup/AbstractLookupBatch.java
fluss-server/src/main/java/com/alibaba/fluss/server/replica/ReplicaManager.java
fluss-server/src/test/java/com/alibaba/fluss/server/replica/ReplicaManagerTest.java
fluss-server/src/test/java/com/alibaba/fluss/server/testutils/KvTestUtils.java
fluss-server/src/test/java/com/alibaba/fluss/server/tablet/TabletServiceITCase.java
@wuchong Comments addressed. The PR is ready.
Purpose
Linked issue: #65
Index lookup exposes lookup capabilities built on top of secondary indexes. A secondary index lets the required rows be located quickly, which can be combined with Flink to implement delta joins.
The purpose of this PR is to provide index lookup for kv tables. The implementation approach is to define the primary key of the kv storage as "secondary keys + primary key" and to set the bucket key to the secondary keys. This way, a lookup by the secondary keys can quickly identify the corresponding bucket and server, which gives efficient point queries.
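The approach above can be sketched as follows. This is a rough illustration under stated assumptions: the class BucketRoutingSketch and its methods are hypothetical, and the hash used here (List.hashCode with a modulo) is a stand-in, since the description does not say which bucketing function Fluss actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch for illustration; not the Fluss implementation. */
final class BucketRoutingSketch {

    /** Storage key layout from the description: secondary key values first, then primary key values. */
    static List<Object> storageKey(List<?> secondaryKey, List<?> primaryKey) {
        List<Object> key = new ArrayList<>(secondaryKey);
        key.addAll(primaryKey);
        return key;
    }

    /**
     * The bucket depends only on the secondary (bucket) key values.
     * Assumption: a plain hash-modulo stands in for the real bucketing function.
     */
    static int bucketFor(List<?> secondaryKey, int numBuckets) {
        return Math.floorMod(secondaryKey.hashCode(), numBuckets);
    }

    public static void main(String[] args) {
        int numBuckets = 8;
        List<Object> secondary = List.of("user-1", "item-7");

        // Two rows share the secondary key but have different primary key values.
        List<Object> rowA = storageKey(secondary, List.of("order-100"));
        List<Object> rowB = storageKey(secondary, List.of("order-200"));

        // Both rows route to the same bucket, so a lookup by the secondary key
        // needs to contact only that bucket and scan keys with this prefix.
        System.out.println(rowA + " and " + rowB
                + " share bucket " + bucketFor(secondary, numBuckets));
    }
}
```

Because the bucket is computed from the secondary key values alone, every row that shares those values lands in one bucket, which is what lets a lookup skip the other buckets.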
Tests
API and Format
Documentation