[SPARK-48742][SS] Virtual Column Family for RocksDB #47107
base: master
@@ -126,6 +126,7 @@ private[sql] class HDFSBackedStateStoreProvider extends StateStoreProvider with
valueSchema: StructType,
keyStateEncoderSpec: KeyStateEncoderSpec,
useMultipleValuesPerKey: Boolean = false,
TODO: as discussed with Anish today, will make this an enum type and pass it into RocksDB.
@@ -2168,6 +2168,15 @@ object SQLConf {
.checkValue(v => Set(1, 2).contains(v), "Valid versions are 1 and 2")
.createWithDefault(2)

val STREAMING_ROCKSDB_VIRTUAL_COL_FAMILY_ENABLED =
TODO: remove this config and instead pass the virtual column family setting as a per-operator argument, similar to `useColumnFamilies`.
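One possible shape for the two review TODOs (an enum passed into RocksDB, and a per-operator argument instead of a global SQL config) is sketched below. This is a hypothetical illustration only: `ColumnFamilyMode`, `OperatorStateStoreOptions`, `usesKeyPrefix`, and `keyPrefixBytes` are invented names, not Spark APIs, and the three modes are an assumption about how the boolean `useColumnFamilies` might generalize once virtual column families exist.

```scala
// Hypothetical sketch: none of these names exist in Spark.
// An explicit mode replaces the global boolean SQL config, so each
// stateful operator can choose how its state is laid out in RocksDB.
sealed trait ColumnFamilyMode

object ColumnFamilyMode {
  // Single key space, no column families.
  case object NoColumnFamilies extends ColumnFamilyMode
  // One physical RocksDB column family per state variable.
  case object PhysicalColumnFamilies extends ColumnFamilyMode
  // One physical column family; state variables separated by an 8-byte key prefix.
  case object VirtualColumnFamilies extends ColumnFamilyMode
}

// Hypothetical per-operator options handed to the state store provider.
final case class OperatorStateStoreOptions(colFamilyMode: ColumnFamilyMode) {
  def usesKeyPrefix: Boolean =
    colFamilyMode == ColumnFamilyMode.VirtualColumnFamilies

  // Extra bytes the key encoder must reserve in front of every key.
  def keyPrefixBytes: Int = if (usesKeyPrefix) java.lang.Long.BYTES else 0
}
```

A sealed trait keeps the set of modes closed, so the compiler can warn when a pattern match on the mode misses a case, which a pair of independent booleans cannot do.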
What changes were proposed in this pull request?
This PR introduces virtual column families to RocksDB. For each key row put into RocksDB, we attach an 8-byte ID prefix that identifies its column family. The virtual column family prefix is encoded and decoded in the `RocksDBKeyEncoder` layer, because there we can pre-allocate the extra 8 bytes and avoid an additional memcpy.
Why are the changes needed?
Currently, within the scope of the arbitrary stateful API v2 (`transformWithState`) project, each state variable is stored in its own physical column family within the RocksDB state store instance. Column families are also used to implement secondary indexes for various features. Each physical column family has its own memtables, creates its own SST files, and compacts those SST files independently.
When the number of operations on RocksDB is relatively small and the number of column families is relatively large (what counts as “small” and “large” is discussed with the benchmark results in the following sections), the overhead of handling many small SST files becomes high, especially since all of them have to be uploaded to the snapshot directory and referenced in the metadata file of the uploaded RocksDB snapshot. Using a key prefix to separate key spaces (a virtual column family) within a single physical column family can reduce this overhead.
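The sketch below illustrates the two ideas described above under simplifying assumptions: an 8-byte column family ID prefix written into a buffer that is sized for the prefix up front, and many virtual column families sharing one physical, bytewise-ordered key space that is read back with a prefix seek. It is not Spark's `RocksDBKeyEncoder`: `VirtualColFamilyCodec`, `SharedKeySpace`, `encodeKey`, `decodeKey`, and `scan` are hypothetical names, key rows are simplified to `Seq[Int]` instead of `UnsafeRow`, and a `java.util.TreeMap` with unsigned byte ordering stands in for a RocksDB column family.

```scala
import java.nio.ByteBuffer
import java.util.{Arrays, Comparator, TreeMap}
import scala.jdk.CollectionConverters._

// Hypothetical codec: the column family id is an 8-byte big-endian Long prefix.
object VirtualColFamilyCodec {
  val PrefixSize: Int = java.lang.Long.BYTES

  def prefix(colFamilyId: Long): Array[Byte] =
    ByteBuffer.allocate(PrefixSize).putLong(colFamilyId).array()

  // Size the buffer for prefix + key up front and write the key fields
  // directly after the prefix, so the encoded key is never copied into a
  // second, larger array just to prepend the prefix.
  def encodeKey(colFamilyId: Long, keyFields: Seq[Int]): Array[Byte] = {
    val buf = ByteBuffer.allocate(PrefixSize + keyFields.length * Integer.BYTES)
    buf.putLong(colFamilyId)
    keyFields.foreach(f => buf.putInt(f))
    buf.array()
  }

  // Read the prefix back and return (column family id, key fields).
  def decodeKey(bytes: Array[Byte]): (Long, Seq[Int]) = {
    val buf = ByteBuffer.wrap(bytes)
    val colFamilyId = buf.getLong()
    val fields = Seq.fill(buf.remaining() / Integer.BYTES)(buf.getInt())
    (colFamilyId, fields)
  }
}

// Toy stand-in for one physical column family: a map sorted in unsigned
// lexicographic byte order, similar to RocksDB's default bytewise comparator.
final class SharedKeySpace {
  private val byteOrder: Comparator[Array[Byte]] =
    (a, b) => Arrays.compareUnsigned(a, b)
  private val store = new TreeMap[Array[Byte], String](byteOrder)

  def size: Int = store.size

  def put(colFamilyId: Long, keyFields: Seq[Int], value: String): Unit =
    store.put(VirtualColFamilyCodec.encodeKey(colFamilyId, keyFields), value)

  // All keys of one virtual column family share the same 8-byte prefix and
  // are therefore contiguous: seek to the prefix, stop at the first mismatch.
  def scan(colFamilyId: Long): List[(Seq[Int], String)] = {
    val p = VirtualColFamilyCodec.prefix(colFamilyId)
    store.tailMap(p, true).entrySet().asScala.iterator
      .takeWhile(e => e.getKey.startsWith(p))
      .map(e => (VirtualColFamilyCodec.decodeKey(e.getKey)._2, e.getValue))
      .toList
  }
}
```

For example, after `put(1L, Seq(1), "a")`, `put(2L, Seq(1), "b")`, and `put(1L, Seq(2), "c")` on one `SharedKeySpace`, `scan(1L)` returns only the two entries of column family 1 even though all three keys live in the same sorted map. Because the prefix has a fixed width, a single seek plus a prefix check is enough to isolate one virtual column family; no per-family memtables or SST files are involved.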
Does this PR introduce any user-facing change?
TODO: decide how to expose the virtual column family config to users.
How was this patch tested?
Unit tests in `RocksDBStateStoreSuite` and integration tests in `TransformWithStateSuite`.
Was this patch authored or co-authored using generative AI tooling?
No.