[core] Support chain table in batch mode #6394
@@ -0,0 +1,142 @@

---
title: "Chain Table"
weight: 6
type: docs
aliases:
- /primary-key-table/chain-table.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Chain Table
Chain table is a new feature for primary key tables that changes how incremental data is processed.
Consider the following kind of scenario: a full set of data is stored periodically, such as every day,
but most of it is redundant and only a small amount of data changes each day. ODS binlog dump is a
typical example of such a scenario.

Taking a daily binlog dump job as an example, a daily task merges the previous day's full data with the
new incremental data to generate a new full dataset. This approach has two obvious drawbacks:
* Full computation: the merge operation covers all data and involves a shuffle, which results in poor performance.
* Full storage: a full set of data is stored every day, even though the changed data usually accounts for a very small proportion (e.g., 1%).

Paimon can solve this problem by consuming only the changed data and merging on read; in this way, the
full computation and full storage described above become incremental:
* Incremental computation: the daily offline ETL job only needs to consume the changed data of the current day and does not need to merge all the data.
* Incremental storage: only the changed data is stored each day, and it is compacted asynchronously on a regular basis (e.g., weekly) to build a global chain table within the lifecycle.

{{< img src="/img/chain-table.png">}}

Based on the regular Paimon table, snapshot and delta branches are introduced to describe the full and
incremental data respectively. When writing, you can specify a branch to write either full or incremental
data; when reading, the appropriate reading strategy is automatically selected based on the reading mode,
such as full, incremental, or hybrid reading.

To enable a chain table, set `chain-table.enabled` to `true` in the table options when creating the
table, and create the snapshot and delta branches as well. Consider an example via Spark SQL:

```sql
CREATE TABLE default.t (
  `t1` string,
  `t2` string,
  `t3` string
) PARTITIONED BY (`date` string)
TBLPROPERTIES (
  'chain-table.enabled' = 'true'
);

CALL sys.create_branch('default.t', 'snapshot');

CALL sys.create_branch('default.t', 'delta');

ALTER TABLE default.t SET tblproperties
('scan.fallback-snapshot-branch' = 'snapshot',
 'scan.fallback-delta-branch' = 'delta');

ALTER TABLE `default`.`t$branch_snapshot` SET tblproperties
('scan.fallback-snapshot-branch' = 'snapshot',
 'scan.fallback-delta-branch' = 'delta');

ALTER TABLE `default`.`t$branch_delta` SET tblproperties
('scan.fallback-snapshot-branch' = 'snapshot',
 'scan.fallback-delta-branch' = 'delta');
```

Notice that:
- Chain table is only supported for primary key tables, which means you should define `bucket` and `bucket-key` for the table (see the sketch after this list).
- A chain table should ensure that the schema of each branch is consistent.
- Only Spark is supported for now; Flink support will be added later.
- Chain compaction is not supported for now; it will be supported later.
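
For illustration only, here is a minimal DDL sketch of a chain table that also declares a primary key, bucket, and bucket-key; the table name `t_chain` and the specific key and bucket choices are assumptions for the example, not taken from this PR:

```sql
CREATE TABLE default.t_chain (
  `t1` string,
  `t2` string,
  `t3` string
) PARTITIONED BY (`date` string)
TBLPROPERTIES (
  'primary-key' = 'date,t1',   -- assumed key; the partition column is included in it
  'bucket' = '2',              -- assumed fixed bucket number
  'bucket-key' = 't1',         -- assumed bucket key
  'chain-table.enabled' = 'true'
);
```

The branch creation and the fallback-branch properties then follow the same steps as in the example above.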

After creating a chain table, you can read and write data in the following ways.

- Full Write: Write data to `t$branch_snapshot`.
```sql
insert overwrite `default`.`t$branch_snapshot` partition (date = '20250810')
values ('1', '1', '1');
```

- Incremental Write: Write data to `t$branch_delta`.
```sql
insert overwrite `default`.`t$branch_delta` partition (date = '20250811')
values ('2', '1', '1');
```

- Full Query: If the snapshot branch contains the full partition, it is read directly; otherwise, the data is read in chain merge mode.
```sql
select t1, t2, t3 from default.t where date = '20250811'
```
You will get the following result:
```text
+---+---+---+
| t1| t2| t3|
+---+---+---+
|  1|  1|  1|
|  2|  1|  1|
+---+---+---+
```

- Incremental Query: Read the incremental partition from `t$branch_delta`.
```sql
select t1, t2, t3 from `default`.`t$branch_delta` where date = '20250811'
```
You will get the following result:
```text
+---+---+---+
| t1| t2| t3|
+---+---+---+
|  2|  1|  1|
+---+---+---+
```

- Hybrid Query: Read both full and incremental data simultaneously.
```sql
select t1, t2, t3 from default.t where date = '20250811'
union all
select t1, t2, t3 from `default`.`t$branch_delta` where date = '20250811'
```
You will get the following result:
```text
+---+---+---+
| t1| t2| t3|
+---+---+---+
|  1|  1|  1|
|  2|  1|  1|
|  2|  1|  1|
+---+---+---+
```

@@ -28,6 +28,7 @@ | |
| import org.apache.paimon.types.RowType; | ||
| import org.apache.paimon.types.VarCharType; | ||
|
|
||
| import java.util.ArrayList; | ||
| import java.util.LinkedHashMap; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
|
|
@@ -142,4 +143,13 @@ public static String partToSimpleString( | |
| String result = builder.toString(); | ||
| return result.substring(0, Math.min(result.length(), maxLength)); | ||
| } | ||
|
|
||
| public List<String> generateOrderPartValues(InternalRow in) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why add this? What is difference? You can add a test for this.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is hoped to obtain the partition values in the order of the partition fields, and then calculate the dependent partitions according to the unit step size. For example, taking the hourly level as an example, the unit step size of the partition is hours. If starting from 0:00 on 20250811 and ending at 0:00 on 20250812, 24 incremental partitions in hours can be obtained, for the specific usage, please refer to org.apache.paimon.partition.InternalRowPartitionUtils#getDeltaPartitions. Unit tests have been added, please refer to org.apache.paimon.utils.InternalRowPartitionComputerTest#testGenerateOrderPartValues |
||
| LinkedHashMap<String, String> partSpec = generatePartValues(in); | ||
| List<String> partValues = new ArrayList<>(); | ||
| for (String columnName : partitionColumns) { | ||
| partValues.add(partSpec.get(columnName)); | ||
| } | ||
| return partValues; | ||
| } | ||
| } | ||
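
To make the author's example above concrete, here is a small standalone sketch, not part of the PR and not using the real `InternalRowPartitionUtils#getDeltaPartitions` API, that enumerates hourly delta partitions between two points in time once the ordered partition values are known:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class HourlyDeltaPartitionSketch {

    // Partition value format such as "2025081100" (yyyyMMddHH); assumed for illustration.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMddHH");

    // Enumerates hourly partition values in [start, end), stepping one hour at a time.
    static List<String> hourlyDeltaPartitions(LocalDateTime start, LocalDateTime end) {
        List<String> partitions = new ArrayList<>();
        for (LocalDateTime cur = start; cur.isBefore(end); cur = cur.plusHours(1)) {
            partitions.add(cur.format(FMT));
        }
        return partitions;
    }

    public static void main(String[] args) {
        // From 00:00 on 20250811 to 00:00 on 20250812 -> 24 hourly partitions.
        List<String> parts =
                hourlyDeltaPartitions(
                        LocalDateTime.of(2025, 8, 11, 0, 0), LocalDateTime.of(2025, 8, 12, 0, 0));
        System.out.println(parts.size());  // 24
        System.out.println(parts.get(0));  // 2025081100
        System.out.println(parts.get(23)); // 2025081123
    }
}
```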