DSP-23657: Introduce sstable encryption#1669
Merged
szymon-miezal merged 30 commits into main on May 23, 2025
Conversation
blambov
reviewed
Apr 25, 2025
```java
compressionMetadata = CompressionMetadata.read(channelCopy.getFile(), sliceDescriptor, encryptionOnly);
if (!encryptionOnly)
    overrideLength = compressionMetadata.compressedFileLength;
```
If the caller passes a length override, it should have priority over the one in the metadata; this is likely to make `EarlyOpenCachingTest` fail.
This is still valid: if a user has passed in a length override, it should take priority over this.
Perhaps make it `if (overrideLength < 0) overrideLength = ...`?
Note: I believe this also means there should be no need for the `!encryptionOnly` check.
Author
Fixed, running CI to double-check.
Author
I had to keep the `!encryptionOnly` check, as otherwise sstables loaded from DSE were unreadable; I made it `!encryptionOnly && overrideLength < 0`.
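The precedence rule settled in this thread can be sketched as follows. The helper class and method are hypothetical illustrations, not code from the PR; only `overrideLength`, `encryptionOnly`, and the metadata's compressed length come from the excerpt above.

```java
// Hedged sketch of the agreed behavior: a caller-supplied length override
// (>= 0) always wins. Only when no override was given (< 0), and the file
// is not encryption-only, do we fall back to the compressed length
// recorded in the compression metadata.
final class LengthResolution
{
    static long resolve(long overrideLength, boolean encryptionOnly, long metadataLength)
    {
        if (!encryptionOnly && overrideLength < 0)
            return metadataLength; // no caller override: trust the metadata
        return overrideLength;     // caller override (or encryption-only) takes priority
    }
}
```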
This commit adds encryption for partition and row index data. It is achieved by implementing a special `EncryptedSequentialWriter` that encrypts each chunk before writing, and an `EncryptedChunkReader` that decrypts the chunks when reading. To enable this, the usable space in each chunk is reduced by the size of the encryption metadata plus 8 bytes used to store the CRC and the actual chunk length.

This design avoids the need to write and keep in memory a compressed-offsets map for the indices, since each chunk's offset remains the same before and after compression. This works well for trie data, but index chunks may also include other data such as keys, offsets, and deletion times, which may not fit entirely in a chunk. To handle this, the writer supports data that spans chunk boundaries, splitting it as needed.

The random access reader is updated to skip encryption metadata during reads using a new `RebuffererFactory.adjustPosition()` method. No additional configuration is required: the encryptor is retrieved from the data file's compression settings and applied to the indices.
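The chunk layout described above can be sketched with simple arithmetic; the class and field names here are hypothetical, and the only figures taken from the commit message are the per-chunk encryption metadata and the 8 bytes reserved for the CRC and actual chunk length.

```java
// Hedged sketch of the fixed-size chunk layout. Because every on-disk
// chunk keeps the same size before and after encryption, a logical
// (pre-encryption) position maps to a file position with plain
// arithmetic - no compressed-offsets map is needed, which is the point
// made in the commit message.
final class ChunkGeometry
{
    final int chunkSize;    // on-disk chunk size
    final int metadataSize; // per-chunk encryption metadata
    final int usableSize;   // payload bytes available per chunk

    ChunkGeometry(int chunkSize, int metadataSize)
    {
        this.chunkSize = chunkSize;
        this.metadataSize = metadataSize;
        this.usableSize = chunkSize - metadataSize - 8; // 8 bytes: CRC + actual length
    }

    // Map a logical position to its on-disk position, in the spirit of
    // the RebuffererFactory.adjustPosition() method mentioned above.
    long onDiskPosition(long logicalPosition)
    {
        long chunk = logicalPosition / usableSize;
        long offsetInChunk = logicalPosition % usableSize;
        return chunk * chunkSize + offsetInChunk;
    }
}
```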
- Add a test that verifies the encrypted sstables are queryable
- Add a test that verifies reading data without the key doesn't work
Adds a preliminary test that reads encrypted data written by DSE. Two tables were added:
- one with a simple PRIMARY KEY,
- one with clustering columns.
This patch is inspired by DB-3845; it encrypts metadata components to ensure sensitive data is not leaked.
Stop writing an empty compressed or encrypted chunk at the end when the file size is a multiple of the chunk size. This patch is inspired by DB-2931.
Fix the version check for metadata
…y implementations in subclasses
szymon-miezal
added a commit
that referenced
this pull request
May 23, 2025
### What is the issue

The sstable components like `*Data.db`, `*Rows.db`, `*Partitions.db`, `*Statistics.db` contain sensitive data that the customer may want to protect with encryption.

### What does this PR fix and why was it fixed

This patch adds support for encrypting SSTable data, indexes (partition and row), and metadata. Data encryption is integrated via the Cassandra compression framework. Index and metadata encryption is implemented by updating their respective readers and writers. It is achieved by implementing a special `EncryptedSequentialWriter` that encrypts each chunk before writing, and an `EncryptedChunkReader` that decrypts the chunks when reading. To enable this, the usable space in each chunk is reduced by the size of the encryption metadata plus 8 bytes used to store the CRC and the actual chunk length.

This design avoids the need to write and keep in memory a compressed-offsets map for the indices, since each chunk's offset remains the same before and after compression. This works well for trie data, but index chunks may also include other data such as keys, offsets, and deletion times, which may not fit entirely in a chunk. To handle this, the writer supports data that spans chunk boundaries, splitting it as needed. The random access reader is updated to skip encryption metadata during reads using a new `RebuffererFactory.adjustPosition()` method.

Encryption is enabled by altering the table to specify an encryptable compressor. Encryption keys are read from the local file system, under the directory specified by the JVM flag `cassandra.system_key_directory` (default: `/etc/cassandra/conf`).

Note: Support for generating encryption keys will be introduced in a future patch.

---------

Co-authored-by: Branimir Lambov <branimir.lambov@datastax.com>
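The key-lookup behavior described above can be sketched as follows. The helper class is hypothetical; the property name `cassandra.system_key_directory` and the default `/etc/cassandra/conf` are taken from the PR description.

```java
// Hedged sketch (hypothetical helper, not PR code): resolve the system
// key directory from the cassandra.system_key_directory JVM property,
// falling back to the documented default /etc/cassandra/conf.
final class SystemKeyDirectory
{
    static java.nio.file.Path resolve()
    {
        String dir = System.getProperty("cassandra.system_key_directory",
                                        "/etc/cassandra/conf");
        return java.nio.file.Paths.get(dir);
    }
}
```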
szymon-miezal
added a commit
that referenced
this pull request
May 23, 2025
This reverts commit 229854b.
szymon-miezal
added a commit
that referenced
this pull request
Jul 18, 2025
szymon-miezal
added a commit
that referenced
this pull request
Jul 23, 2025
### What is the issue

The sstable components like `*Data.db`, `*Rows.db`, `*Partitions.db`, `*Statistics.db` contain sensitive data that the customer may want to protect with encryption.

### What does this PR fix and why was it fixed

Previously reviewed and merged under #1669 (see the full description there). As a follow-up addressing an earlier CNDB regression, this patch also uses a different file handle depending on write/read time. This is required by CNDB, as the file writer has to access the file on local disk (the file wasn't uploaded yet) while the reader needs to access it via remote storage (it does not have the file at hand locally).

Note: This patch does not introduce encryption for SAI indexes.

---------

Co-authored-by: Branimir Lambov <branimir.lambov@datastax.com>
szymon-miezal
added a commit
that referenced
this pull request
Jul 24, 2025
driftx
pushed a commit
that referenced
this pull request
Jul 29, 2025
driftx
pushed a commit
that referenced
this pull request
Jul 30, 2025
emerkle826
pushed a commit
that referenced
this pull request
Oct 16, 2025
michaelsembwever
pushed a commit
that referenced
this pull request
Feb 6, 2026
michaelsembwever
pushed a commit
that referenced
this pull request
Feb 10, 2026
(Rebase of commit ab9fb4c)
michaelsembwever
pushed a commit
that referenced
this pull request
Feb 11, 2026
michaelsembwever
pushed a commit
that referenced
this pull request
Feb 12, 2026
michaelsembwever
pushed a commit
that referenced
this pull request
Feb 14, 2026
michaelsembwever
pushed a commit
that referenced
this pull request
Feb 16, 2026
michaelsembwever
pushed a commit
that referenced
this pull request
Feb 27, 2026
What is the issue

The sstable components like `*Data.db`, `*Rows.db`, `*Partitions.db`, `*Statistics.db` contain sensitive data that the customer may want to protect with encryption.

What does this PR fix and why was it fixed

This patch adds support for encrypting SSTable data, indexes (partition and row), and metadata. Data encryption is integrated via the Cassandra compression framework. Index and metadata encryption is implemented by updating their respective readers and writers.
It is achieved by implementing a special EncryptedSequentialWriter that
encrypts each chunk before writing, and an EncryptedChunkReader that
decrypts the chunks when reading. To enable this, the usable space in
each chunk is reduced by the size of the encryption metadata plus 8
bytes used to store the CRC and actual chunk length.
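The chunk-layout arithmetic described above can be sketched as follows. This is an illustrative model, not the actual Cassandra classes: the names `ChunkLayout`, `usableSpace`, and the 28-byte metadata figure in the example are assumptions, while the fixed 8-byte CRC-plus-length overhead comes from the description above.

```java
// Illustrative sketch of the on-disk chunk layout arithmetic (hypothetical names).
public class ChunkLayout
{
    // 4-byte CRC + 4-byte actual chunk length, as described in the patch.
    static final int CRC_AND_LENGTH_OVERHEAD = 8;

    /** Usable plaintext space per chunk once encryption overhead is reserved. */
    static int usableSpace(int chunkSize, int encryptionMetadataSize)
    {
        return chunkSize - encryptionMetadataSize - CRC_AND_LENGTH_OVERHEAD;
    }

    /** A chunk's file offset is simply its index times the full chunk size,
     *  which is why no compressed-offsets map is needed. */
    static long chunkOffset(long chunkIndex, int chunkSize)
    {
        return chunkIndex * chunkSize;
    }

    public static void main(String[] args)
    {
        int chunkSize = 4096;
        int metadata = 28; // e.g. a 12-byte IV plus a 16-byte auth tag (assumed)
        System.out.println(usableSpace(chunkSize, metadata)); // 4060
        System.out.println(chunkOffset(3, chunkSize));        // 12288
    }
}
```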
This design avoids the need to write and keep in memory a compressed
offsets map for the indices, since each chunk's offset remains the same
before and after compression.
This works well for trie data, but index chunks may also include other
data such as keys, offsets, and deletion times, which may not fit
entirely in a chunk. To handle this, the writer supports data that
spans chunk boundaries, splitting it as needed.
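The boundary-spanning write can be pictured as splitting a logical buffer into fixed-capacity pieces, one per chunk. This is a minimal sketch, not the EncryptedSequentialWriter API; in the real writer each piece would be encrypted and checksummed before being written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one logical write across fixed-capacity chunks.
public class ChunkSplitter
{
    static List<byte[]> split(byte[] data, int usablePerChunk)
    {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += usablePerChunk)
        {
            int len = Math.min(usablePerChunk, data.length - off);
            byte[] chunk = new byte[len];
            System.arraycopy(data, off, chunk, 0, len);
            chunks.add(chunk); // the real writer encrypts each chunk here
        }
        return chunks;
    }

    public static void main(String[] args)
    {
        // 10 bytes of payload, 4 usable bytes per chunk -> 3 chunks (4, 4, 2)
        List<byte[]> parts = split(new byte[10], 4);
        System.out.println(parts.size());        // 3
        System.out.println(parts.get(2).length); // 2
    }
}
```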
The random access reader is updated to skip encryption metadata during
reads using a new RebuffererFactory.adjustPosition() method.
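In the spirit of the new adjustPosition() hook, the mapping from a logical (plaintext) position to a physical file position can be sketched like this. The layout assumed here, per-chunk overhead at the start of each chunk, is an illustration only; the actual RebuffererFactory.adjustPosition() contract may differ.

```java
// Hypothetical position adjustment: skip each chunk's encryption overhead when
// translating a logical stream position into a physical file position.
public class PositionAdjuster
{
    static long adjustPosition(long logical, int chunkSize, int overheadPerChunk)
    {
        int usable = chunkSize - overheadPerChunk;
        long chunkIndex = logical / usable;
        long withinChunk = logical % usable;
        // Assumed layout: [overhead][payload] repeated per chunk.
        return chunkIndex * chunkSize + overheadPerChunk + withinChunk;
    }

    public static void main(String[] args)
    {
        // 4096-byte chunks with 36 bytes of per-chunk overhead -> 4060 usable
        System.out.println(adjustPosition(0, 4096, 36));    // 36
        System.out.println(adjustPosition(4060, 4096, 36)); // 4132
    }
}
```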
Encryption is enabled by altering the table to specify an encryptable compressor.
Encryption keys are read from the local file system, under the directory specified by the JVM flag `cassandra.system_key_directory` (default: `/etc/cassandra/conf`).

Note: Support for generating encryption keys will be introduced in a future patch.