feat(storage): support non_pk_prefix_watermark state cleaning #19889

Li0k · 2024-12-23T06:06:15Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

related to #18802

This PR supports non_pk_prefix_watermark state cleaning for Hummock.

Because non_pk_prefix_watermark relies on catalogs, supporting it introduces additional overhead. This PR therefore does not guarantee read filtering for non_pk_prefix_watermark; it only handles cleaning of expired data.

The changes are as follows:

Watermarks of type non_pk_prefix_watermark are not added to ReadWatermarkIndex.
The state table supports writing and serializing non_pk_prefix_watermark.
The compaction catalog agent supports retrieving the watermark serde.
The skip watermark iterator supports filtering by non_pk_prefix_watermark.
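
The first change in the list can be pictured with a minimal sketch. This is not the code from the PR: `TableWatermarks` and `read_index_table_ids` are hypothetical names, and the `PkPrefix` variant is assumed (only `WatermarkSerdeType::NonPkPrefix` appears in the diff excerpts on this page). It only illustrates that non-pk-prefix watermarks are skipped when the read-side index is built.

```rust
/// Watermark serde type. `NonPkPrefix` appears in the PR diff;
/// `PkPrefix` is an assumed counterpart for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WatermarkSerdeType {
    PkPrefix,
    NonPkPrefix,
}

/// Hypothetical stand-in for a table's watermark entry in a version.
pub struct TableWatermarks {
    pub table_id: u32,
    pub serde_type: WatermarkSerdeType,
}

/// Returns the ids of tables whose watermarks would be added to the
/// read index. Non-pk-prefix watermarks are skipped, so reads are not
/// filtered by them.
pub fn read_index_table_ids(watermarks: &[TableWatermarks]) -> Vec<u32> {
    watermarks
        .iter()
        .filter_map(|w| match w.serde_type {
            WatermarkSerdeType::PkPrefix => Some(w.table_id),
            // Do not fill the non-pk-prefix watermark into the index.
            WatermarkSerdeType::NonPkPrefix => None,
        })
        .collect()
}
```

With one pk-prefix table and one non-pk-prefix table as input, only the former would be returned.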

Checklist

I have written necessary rustdoc comments.
I have added necessary unit tests and integration tests.
I have added test labels as necessary.
I have added fuzzing tests or opened an issue to track them.
My PR contains breaking changes.
My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

My PR needs documentation updates.

Release note


hzxa21 · 2025-01-03T08:07:01Z

src/storage/src/hummock/store/version.rs

+                                .committed_epoch,
+                        )),
+
+                        WatermarkSerdeType::NonPkPrefix => None, /* do not fill the non-pk prefix watermark to index */


Given that the non-pk-prefix watermark will never be used in CN, should we exclude it from the meta-to-CN version delta notification? We can add a sanity check on the CN side to make sure this assumption holds.

hzxa21 · 2025-01-03T08:14:38Z

src/meta/src/hummock/manager/compaction/mod.rs

+                .filter_map(|table| {
+                    // pk prefix watermark.
+                    if table.clean_watermark_index_in_pk.is_none()
+                        || table.clean_watermark_index_in_pk.unwrap() == 0


Does this assume the pk prefix watermark can only be a single column? Do we have a sanity check somewhere to make sure this assumption holds?
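
For reference, the condition in the diff above can be written as a single predicate. This is a sketch only: `is_pk_prefix_watermark` is a hypothetical helper, and the `Option<i32>` type for `clean_watermark_index_in_pk` is an assumption. It mirrors the diff's logic, where a missing index or index 0 is treated as a pk-prefix watermark, and it makes the single-column assumption raised in the question explicit.

```rust
/// Hypothetical helper mirroring the check in the diff:
/// `is_none() || unwrap() == 0`.
///
/// Assumption: `clean_watermark_index_in_pk` is an `Option<i32>`.
/// Any index other than 0 is treated as a non-pk-prefix watermark,
/// which only holds if a pk-prefix watermark is a single column.
pub fn is_pk_prefix_watermark(clean_watermark_index_in_pk: Option<i32>) -> bool {
    matches!(clean_watermark_index_in_pk, None | Some(0))
}
```

Writing the check with `matches!` avoids the `unwrap()` while keeping the same behavior.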

hzxa21 · 2025-01-03T08:16:31Z

src/meta/src/hummock/manager/compaction/mod.rs

+                .table_watermarks
+                .iter()
+                .filter_map(|(table_id, table_watermarks)| {
+                    if table_id_with_pk_prefix_watermark.contains(table_id) {


We already have a WaterMarkType defined in the version, so why don't we just use that to filter out tables with a non-pk-prefix watermark?

Also, if we filter out the non-pk-prefix watermark here, how can the compactor retrieve it? Based on the logic here, it seems that we rely on the non-pk-prefix watermark being present in the compact task.

hzxa21 · 2025-01-03T08:19:41Z

src/storage/hummock_sdk/src/compact_task.rs

@@ -114,6 +115,48 @@ impl CompactTask {
    }
 }

+impl CompactTask {
+    // The compact task may need to reclaim key with TTL
+    pub fn is_contains_ttl(&self) -> bool {


rename: contains_ttl

hzxa21 · 2025-01-03T08:19:55Z

src/storage/hummock_sdk/src/compact_task.rs

+    }
+
+    // The compact task may need to reclaim key with range tombstone
+    pub fn is_contains_range_tombstone(&self) -> bool {


rename: contains_range_tombstone

hzxa21 · 2025-01-03T08:21:20Z

src/storage/hummock_sdk/src/table_watermark.rs

@@ -301,6 +308,26 @@ impl WatermarkDirection {
        }
    }

+    pub fn filter_by_watermark_datum(


minor: datum_filter_by_watermark

hzxa21 · 2025-01-03T08:24:16Z

src/storage/hummock_sdk/src/table_watermark.rs

@@ -19,11 +19,14 @@ use std::mem::size_of;
 use std::ops::Bound::{Excluded, Included, Unbounded};
 use std::sync::Arc;

+use bincode::{Decode, Encode};


We need this because we derive Decode and Encode on WatermarkSerdeType. Is this necessary?

I think we need it for hummock trace. As with other trace-related structs, let's not leak Encode/Decode outside of hummock trace. We can handle it within hummock trace instead, because a bool is enough to encode and decode the type there.
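
A minimal sketch of the bool-based approach suggested above, under stated assumptions: `to_trace_flag` and `from_trace_flag` are hypothetical names, the `PkPrefix` variant is assumed, and the choice of `true` for the non-pk-prefix case is arbitrary. The idea is that the trace layer stores a plain bool, so the SDK enum does not need to derive bincode's Encode/Decode.

```rust
/// Watermark serde type. `NonPkPrefix` appears in the PR diff;
/// `PkPrefix` is an assumed counterpart for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WatermarkSerdeType {
    PkPrefix,
    NonPkPrefix,
}

/// Hypothetical conversion used only at the trace boundary:
/// `true` means non-pk-prefix (an arbitrary convention).
pub fn to_trace_flag(serde_type: WatermarkSerdeType) -> bool {
    matches!(serde_type, WatermarkSerdeType::NonPkPrefix)
}

/// Hypothetical inverse of `to_trace_flag`.
pub fn from_trace_flag(is_non_pk_prefix: bool) -> WatermarkSerdeType {
    if is_non_pk_prefix {
        WatermarkSerdeType::NonPkPrefix
    } else {
        WatermarkSerdeType::PkPrefix
    }
}
```

Because the enum has exactly two variants in this sketch, the round trip through a bool is lossless. A third variant would require a different encoding.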

hzxa21 · 2025-01-03T08:24:27Z

src/storage/hummock_sdk/Cargo.toml

@@ -8,6 +8,7 @@ license = { workspace = true }
 repository = { workspace = true }

 [dependencies]
+bincode = { version = "=2.0.0-rc.3", features = ["serde"] }


ditto. See comments above.

hzxa21 · 2025-01-03T08:32:49Z

src/storage/src/hummock/iterator/skip_watermark.rs

@@ -42,10 +47,14 @@ pub struct SkipWatermarkIterator<I> {
 }

 impl<I: HummockIterator<Direction = Forward>> SkipWatermarkIterator<I> {


nit: since SkipWatermarkIterator is only used by the compactor, how about moving skip_watermark.rs into src/hummock/compactor?

hzxa21 · 2025-01-03T08:41:45Z

src/storage/src/hummock/iterator/skip_watermark.rs

+                                            });
+                                    let watermark_col_in_pk =
+                                        row.datum_at(*watermark_col_idx_in_pk);
+                                    cmp_datum(


IIUC, if cmp_datum returns Equal | Greater, based on the logic in L360, the watermark will be advanced. I think this is incorrect for a non-pk-prefix watermark, because the non-pk-prefix watermark column and the pk don't have the same ordering.
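
The ordering concern can be illustrated with a small, self-contained sketch. This is not the SkipWatermarkIterator logic. It models rows as hypothetical `(pk_first_col, watermark_col)` pairs sorted by the first pk column, and assumes an ascending watermark where values below it are expired. Because the watermark column is not the sort prefix, its values are not monotonic along the iteration order, so a filter that stops checking after the first row at or above the watermark keeps expired rows.

```rust
/// Rows are `(pk_first_col, watermark_col)`, sorted by `pk_first_col`.
/// Correct for a non-pk-prefix watermark: every row is checked.
pub fn retain_unexpired(rows: &[(u32, i64)], watermark: i64) -> Vec<(u32, i64)> {
    rows.iter()
        .copied()
        .filter(|&(_, ts)| ts >= watermark)
        .collect()
}

/// Only valid when the watermark column is the sort prefix: once a row
/// at or above the watermark is seen, later rows are no longer checked.
pub fn retain_unexpired_early_stop(rows: &[(u32, i64)], watermark: i64) -> Vec<(u32, i64)> {
    let mut passed = false;
    let mut kept = Vec::new();
    for &(key, ts) in rows {
        if !passed && ts < watermark {
            continue; // still before the watermark: skip the row
        }
        passed = true; // stop comparing from here on
        kept.push((key, ts));
    }
    kept
}
```

For rows `[(1, 5), (1, 20), (2, 3), (2, 30)]` and watermark 10, the first function keeps `(1, 20)` and `(2, 30)`, while the early-stop variant also keeps the expired row `(2, 3)`.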

feat(storage): basic of non_pk_watermark state clean

605f235

github-actions bot added type/feature ci/run-e2e-single-node-tests ci/run-e2e-test-other-backends labels Dec 23, 2024

Li0k added 2 commits December 23, 2024 15:27

feat(storage): ignore non_pk_prefix_watermark compaction

501d374

Merge branch 'main' of https://github.com/risingwavelabs/risingwave i…

3544c0e

…nto li0k/storage_non_pk_watermark_clean

Li0k changed the title ~~feat(storage): non_pk_watermark state clean~~ WIP: feat(storage): non_pk_watermark state clean Dec 23, 2024

Li0k marked this pull request as ready for review December 23, 2024 07:28

github-actions bot added the Invalid PR Title label Dec 23, 2024

fix ut

d1a39a8

graphite-app bot requested a review from a team December 23, 2024 08:18

Li0k added 4 commits December 23, 2024 17:01

fix panic

7c3f521

Merge branch 'main' of https://github.com/risingwavelabs/risingwave i…

e3dbc73

…nto li0k/storage_non_pk_watermark_clean

refactor(storage): refactor watermark type

b71eff9

Merge branch 'main' of https://github.com/risingwavelabs/risingwave i…

9e0af8e

…nto li0k/storage_non_pk_watermark_clean

Li0k requested a review from a team as a code owner December 25, 2024 12:20

Li0k requested a review from xxchan December 25, 2024 12:20

Li0k added 10 commits December 25, 2024 20:22

typo

74336d6

fix(storage): fix wateramrk_col_idx_in_pk

96de9ba

Merge branch 'main' of https://github.com/risingwavelabs/risingwave i…

3127678

…nto li0k/storage_non_pk_watermark_clean

fix check

49a48ad

Merge branch 'main' of https://github.com/risingwavelabs/risingwave i…

bb7a29b

…nto li0k/storage_non_pk_watermark_clean

refactor

6b0b295

typo

3c23aa3

Merge branch 'main' of https://github.com/risingwavelabs/risingwave i…

3113463

…nto li0k/storage_non_pk_watermark_clean

fix panic

b2e158e

Merge branch 'main' of https://github.com/risingwavelabs/risingwave i…

fd308de

…nto li0k/storage_non_pk_watermark_clean

Li0k changed the title ~~WIP: feat(storage): non_pk_watermark state clean~~ feat(storage): support non_pk_prefix_watermark state cleaning Dec 30, 2024

github-actions bot removed the Invalid PR Title label Dec 30, 2024

typo

bf28307

Li0k added 2 commits December 30, 2024 14:53

typo

369d718

Merge branch 'main' of https://github.com/risingwavelabs/risingwave i…

3500061

…nto li0k/storage_non_pk_watermark_clean

Li0k requested review from hzxa21, st1page and chenzl25 December 30, 2024 07:32

hzxa21 reviewed Jan 3, 2025
