
DSP-23657: Introduce sstable encryption #1669

Merged
szymon-miezal merged 30 commits into main from DSP-23657
May 23, 2025

Conversation


@szymon-miezal szymon-miezal commented Apr 2, 2025

### What is the issue

The sstable components like `*Data.db`, `*Rows.db`, `*Partitions.db`, `*Statistics.db` contain sensitive data that the customer may want to protect with encryption.

### What does this PR fix and why was it fixed

This patch adds support for encrypting SSTable data, indexes (partition and row), and metadata.

Data encryption is integrated via the Cassandra compression framework.

Index and metadata encryption is implemented by updating their respective readers and writers.

It is achieved by implementing a special EncryptedSequentialWriter that
encrypts each chunk before writing, and an EncryptedChunkReader that
decrypts the chunks when reading. To enable this, the usable space in
each chunk is reduced by the size of the encryption metadata plus 8
bytes used to store the CRC and actual chunk length.
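As a minimal sketch of the chunk layout described above (class and method names are illustrative, not the actual implementation):

```java
// Illustrative chunk-layout arithmetic; the names and the size of the
// encryption metadata are assumptions, not the actual implementation.
final class ChunkLayout
{
    // 4-byte CRC + 4-byte actual chunk length, as described above.
    static final int CRC_AND_LENGTH_BYTES = 8;

    /** Payload bytes available per chunk once encryption overhead is reserved. */
    static int usableChunkSize(int chunkSize, int encryptionMetadataSize)
    {
        return chunkSize - encryptionMetadataSize - CRC_AND_LENGTH_BYTES;
    }
}
```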

This design avoids the need to write and keep in memory a compressed
offsets map for the indices, since each chunk's offset remains the same
before and after compression.
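The offset invariance can be illustrated with a small sketch (assumed arithmetic, not the project's code): because every chunk occupies a fixed number of bytes on disk, the chunk holding a logical position is found by plain division, so no offsets map has to be written or cached.

```java
// Hypothetical sketch: fixed-size on-disk chunks make the mapping from
// a logical position to the on-disk offset of its chunk pure arithmetic.
final class ChunkAddressing
{
    static long chunkOnDiskOffset(long logicalPosition, int usableChunkSize, int onDiskChunkSize)
    {
        long chunkIndex = logicalPosition / usableChunkSize;
        return chunkIndex * onDiskChunkSize;
    }
}
```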

This works well for trie data, but index chunks may also include other
data such as keys, offsets, and deletion times, which may not fit
entirely in a chunk. To handle this, the writer supports data that
spans chunk boundaries, splitting it as needed.
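Boundary-spanning writes can be sketched as follows (a hypothetical helper, not the actual EncryptedSequentialWriter; in the real writer each slice would then be encrypted and flushed):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: slice a payload into chunk-sized pieces so a
// value larger than one chunk's usable space spans several chunks.
final class ChunkSplitter
{
    static List<byte[]> splitAcrossChunks(byte[] payload, int usableChunkSize)
    {
        List<byte[]> slices = new ArrayList<>();
        for (int off = 0; off < payload.length; off += usableChunkSize)
        {
            int len = Math.min(usableChunkSize, payload.length - off);
            byte[] slice = new byte[len];
            System.arraycopy(payload, off, slice, 0, len);
            slices.add(slice);
        }
        return slices;
    }
}
```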

The random access reader is updated to skip encryption metadata during
reads using a new RebuffererFactory.adjustPosition() method.
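The position adjustment can be sketched like this (the semantics of RebuffererFactory.adjustPosition() are assumed from the description above, not taken from the patch):

```java
// Hypothetical sketch: translate a logical stream position into the
// on-disk position, accounting for the per-chunk encryption overhead.
final class PositionAdjuster
{
    static long adjustPosition(long logicalPosition, int usableChunkSize, int overheadPerChunk)
    {
        long chunkIndex = logicalPosition / usableChunkSize;
        long withinChunk = logicalPosition % usableChunkSize;
        return chunkIndex * (usableChunkSize + (long) overheadPerChunk) + withinChunk;
    }
}
```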

Encryption is enabled by altering the table to specify an encryptable compressor.
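For illustration, enabling encryption might look like the ALTER TABLE below. The compressor class name comes from the Encryptor.java file touched by this PR; the option names are assumptions in the style of DSE TDE and are not confirmed by this patch.

```sql
ALTER TABLE ks.sensitive_table
WITH compression = {
  'class': 'Encryptor',                        -- the encryptable compressor
  'cipher_algorithm': 'AES/CBC/PKCS5Padding',  -- assumed option name
  'secret_key_strength': '128'                 -- assumed option name
};
```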

Encryption keys are read from the local file system, under the directory specified by the JVM flag cassandra.system_key_directory (default: /etc/cassandra/conf).
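For example, the key directory could be overridden at startup via the JVM options Cassandra is launched with (the path below is a placeholder):

```shell
# Append the flag to the JVM options Cassandra is started with;
# /secure/keys is a hypothetical path.
JVM_OPTS="$JVM_OPTS -Dcassandra.system_key_directory=/secure/keys"
```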

Note: Support for generating encryption keys will be introduced in a future patch.

@szymon-miezal szymon-miezal force-pushed the DSP-23657 branch 2 times, most recently from cc55e61 to a908d2d Compare April 7, 2025 14:57
@szymon-miezal szymon-miezal force-pushed the DSP-23657 branch 4 times, most recently from 489642a to 1f82b43 Compare April 22, 2025 11:43
@szymon-miezal szymon-miezal changed the title [WIP] DSP-23657: TDE port POC [WIP] DSP-23657: Introduce data, row and partition index encryption Apr 22, 2025
@szymon-miezal szymon-miezal changed the title [WIP] DSP-23657: Introduce data, row and partition index encryption [WIP] DSP-23657: Introduce sstable encryption Apr 22, 2025
@szymon-miezal szymon-miezal changed the title [WIP] DSP-23657: Introduce sstable encryption DSP-23657: Introduce sstable encryption Apr 22, 2025
@blambov blambov self-requested a review April 23, 2025 12:54
Comment thread src/java/org/apache/cassandra/crypto/EncryptionKeyBackup.java Outdated
Comment thread src/java/org/apache/cassandra/io/compress/Encryptor.java Outdated
Comment thread src/java/org/apache/cassandra/io/compress/StatefulDecryptor.java Outdated
Comment thread src/java/org/apache/cassandra/io/util/EncryptedChunkReader.java Outdated
{
    compressionMetadata = CompressionMetadata.read(channelCopy.getFile(), sliceDescriptor, encryptionOnly);
    if (!encryptionOnly)
        overrideLength = compressionMetadata.compressedFileLength;

If the caller passes a length override, it should have priority over the one in the metadata; this is likely to make EarlyOpenCachingTest fail.


@blambov blambov May 16, 2025


This is still valid, if a user has passed in a length override, it should take priority over this.
Perhaps make it if (overrideLength < 0) overrideLength = ...?

Note: I believe this also means that there should be no need for the !encryptionOnly check.

Author


Fixed, running CI to double-check.

Author


I had to keep the `!encryptionOnly` check because otherwise sstables loaded from DSE were unreadable - I made it `!encryptionOnly && overrideLength < 0`.
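The resolved guard from this thread can be sketched as a standalone helper (extracted for illustration only; the real code lives inside the file-handle setup quoted above):

```java
// Sketch of the agreed resolution: a caller-supplied override keeps
// priority, and the encryption-only path never takes the compressed
// length from the metadata.
final class OverrideLengthResolution
{
    static long resolve(boolean encryptionOnly, long overrideLength, long compressedFileLength)
    {
        if (!encryptionOnly && overrideLength < 0)
            return compressedFileLength;
        return overrideLength;
    }
}
```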

Comment thread src/java/org/apache/cassandra/io/util/FileHandle.java Outdated
Comment thread src/java/org/apache/cassandra/schema/CompressionParams.java
szymon-miezal and others added 17 commits May 16, 2025 13:13
This commit adds encryption for partition and row index data.

No additional configuration is required. The encryptor is retrieved
from the data file compression settings and applied to the indices.
- Add a test that verifies the encrypted sstables are queryable
- Add a test that verifies reading data without the key doesn't work
Adds a preliminary test that reads encrypted data written by DSE.
Two tables were added:
- one with a simple PRIMARY KEY,
- one with clustering columns.
This patch, inspired by DB-3845, encrypts metadata components to ensure
sensitive data is not leaked.
Stop writing an empty compressed or encrypted chunk at the end
when the file size is a multiple of the chunk size.
This patch is inspired by DB-2931.
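The fix can be sketched as ceiling division for the chunk count (an illustrative sketch of the behavior the commit message describes, not the patch itself):

```java
// With ceiling division, a file whose size is an exact multiple of the
// chunk size gets no extra empty trailing chunk.
final class ChunkCount
{
    static long chunkCount(long fileLength, int usableChunkSize)
    {
        return (fileLength + usableChunkSize - 1) / usableChunkSize;
    }
}
```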
@szymon-miezal szymon-miezal deleted the DSP-23657 branch May 23, 2025 07:36
szymon-miezal added a commit that referenced this pull request May 23, 2025

Co-authored-by: Branimir Lambov <branimir.lambov@datastax.com>
szymon-miezal added a commit that referenced this pull request May 23, 2025
szymon-miezal added a commit that referenced this pull request Jul 18, 2025
szymon-miezal added a commit that referenced this pull request Jul 23, 2025
Previously reviewed and merged under
#1669.

As a follow-up that addresses an earlier CNDB regression, this patch
also uses a different file handle depending on whether the file is
being written or read. CNDB requires this because the file writer has
to access the file on local disk (the file wasn't uploaded yet), while
the reader needs to access it via remote storage (it does not have the
file locally).

Note: This patch does not introduce encryption for SAI indexes.

szymon-miezal added a commit that referenced this pull request Jul 24, 2025
driftx pushed a commit that referenced this pull request Jul 29, 2025
driftx pushed a commit that referenced this pull request Jul 30, 2025
emerkle826 pushed a commit that referenced this pull request Oct 16, 2025
michaelsembwever pushed a commit that referenced this pull request Feb 6, 2026
michaelsembwever pushed a commit that referenced this pull request Feb 10, 2026

 (Rebase of commit ab9fb4c)
michaelsembwever pushed a commit that referenced this pull request Feb 11, 2026

 (Rebase of commit ab9fb4c)
michaelsembwever pushed a commit that referenced this pull request Feb 12, 2026

 (Rebase of commit ab9fb4c)
michaelsembwever pushed a commit that referenced this pull request Feb 14, 2026

 (Rebase of commit ab9fb4c)
michaelsembwever pushed a commit that referenced this pull request Feb 16, 2026
michaelsembwever pushed a commit that referenced this pull request Feb 27, 2026
michaelsembwever pushed a commit that referenced this pull request Mar 2, 2026
michaelsembwever pushed a commit that referenced this pull request Mar 4, 2026
michaelsembwever pushed a commit that referenced this pull request Mar 5, 2026
michaelsembwever pushed a commit that referenced this pull request Mar 25, 2026
michaelsembwever pushed a commit that referenced this pull request Mar 27, 2026
michaelsembwever pushed a commit that referenced this pull request Apr 14, 2026
michaelsembwever pushed a commit that referenced this pull request Apr 15, 2026