feat(core): Add ZSTD dictionary compression for finalized stream nodes#7108

Open
mkaruza wants to merge 2 commits into main from mkaruza/streamnode-compression

Conversation


@mkaruza mkaruza commented Apr 10, 2026

Introduce per-thread ZSTD dictionary compression for stream node listpacks,
reducing memory for compressible stream data.

  • Add ZstdCompressionCtx: accumulates listpack samples until the configured
    threshold, trains a ZSTD dictionary, and holds CCtx/DCtx/CDict/DDict state.
  • StreamNodeObj gains TryCompress() (compress on node finalization in
    StreamAppendItem) and MaterializeListpack() (copy out of the decompression
    buffer before in-place mutation in XDEL/XTRIM paths).
  • GetListpack() decompresses transparently into a thread-local reuse buffer.
  • Gated by --stream_node_zstd_dict_threshold (0 = disabled); only nodes
    bigger than 512 bytes that achieve >=30% size reduction are compressed.
  • Add StreamNodeCompressTest covering XRANGE round-trip, XDEL, and XTRIM.

Add stream benchmark suite that includes performance metrics for XADD, XREAD,
XRANGE, and consumer groups with throughput, latency, and memory tracking.


augmentcode Bot commented Apr 10, 2026

🤖 Augment PR Summary

Summary: Adds optional ZSTD compression for finalized Redis stream listpack nodes to reduce memory usage while preserving stream semantics.

Changes:

  • Introduce a new internal StreamNode encoding (kZstd) alongside raw listpacks.
  • Train a per-thread ZSTD dictionary from accumulated node samples once --stream_node_zstd_dict_threshold bytes are collected.
  • Compress a node when it becomes finalized during XADD node rollover, only keeping compression if it saves at least ~30%.
  • Store compressed nodes as [u32 compressed_len][zstd frame] and decode on demand in GetListpack() using a thread-local buffer.
  • Update XDEL/XTRIM mutation paths to call StreamNode::Reset() before re-attaching a mutated listpack with SetListpack().
  • Add unit tests covering XRANGE/XDEL/XTRIM behavior when compression is enabled via the new flag.

Technical Notes: Compression is disabled by default (threshold=0), uses trained dictionaries to improve ratios, and keeps raw nodes when data is small or incompressible.
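The [u32 compressed_len][zstd frame] layout can be sketched like this (Python struct as illustration; the little-endian u32 here is an assumption, the actual byte order is whatever the C++ code writes):

```python
import struct

def encode_node(frame: bytes) -> bytes:
    # [u32 compressed_len][zstd frame]
    return struct.pack("<I", len(frame)) + frame

def decode_node(buf: bytes) -> bytes:
    # Read the u32 length prefix, then slice out the frame.
    (n,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + n]
```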



@augmentcode augmentcode Bot left a comment


Review completed. 2 suggestions posted.


Comment thread src/core/stream_node.cc Outdated
Comment thread src/core/stream_node.cc

mkaruza commented Apr 10, 2026

Synthetic use-case with dfly_bench result on spot-c4a:

Running dfly_bench:

dfly_bench  -n 200000 -p 6379 -qps=0  -d 64 --key_maximum=10  --command="xadd __key__ * d foooooooooobbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrrr" --pipeline=5

Results:

  1. No compression --stream_node_zstd_dict_threshold=0
Total time: 20.511480008s. Overall number of requests: 16000000, QPS: 1122561, P99 lat: 1070us
used_memory_human:1.22GiB
type_used_memory_stream:1310815136
  2. Compression --stream_node_zstd_dict_threshold=10000
Total time: 20.974520275s. Overall number of requests: 16000000, QPS: 1079765, P99 lat: 1024.03us
used_memory_human:85.39MiB
type_used_memory_stream:88404768

@mkaruza mkaruza requested a review from romange April 10, 2026 08:57
@mkaruza mkaruza marked this pull request as draft April 10, 2026 12:26
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch 2 times, most recently from 85ce029 to cf4bae9 Compare April 11, 2026 11:36
@mkaruza mkaruza marked this pull request as ready for review April 11, 2026 11:44
@mkaruza mkaruza changed the title feat(core): Add ZSTD compression for finalized stream nodes feat(core): Add ZSTD dictionary compression for finalized stream nodes Apr 11, 2026

@augmentcode augmentcode Bot left a comment


Review completed. 4 suggestions posted.


Comment thread src/core/stream_node.cc Outdated
Comment thread src/core/stream_node.cc
Comment thread src/core/stream_node.h Outdated
Comment thread src/server/stream_family_test.cc
@mkaruza mkaruza force-pushed the mkaruza/streamnode-object branch 3 times, most recently from 0a4f05d to ea88316 Compare April 13, 2026 10:22
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 2e978ad to 0d3e65e Compare April 13, 2026 12:42
Comment thread src/core/stream_node.h Outdated
Comment thread src/core/stream_node.h Outdated
Comment thread src/server/stream_family.cc Outdated
Comment thread src/core/stream_node.cc Outdated
Comment thread src/core/stream_node.cc Outdated
@mkaruza mkaruza force-pushed the mkaruza/streamnode-object branch from ea88316 to 286bfb4 Compare April 17, 2026 08:32
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch 5 times, most recently from 442a1cb to 1c78db8 Compare April 22, 2026 14:05

mkaruza commented Apr 22, 2026

I ran this artificial benchmark (included in the PR) on spot-c4a. The server and the benchmark client were on different machines.

Ran stream_benchmark.py with and without compression on a single key with the following arguments:

--xadd --threads 16 --seed 42
  • --xadd-num-ops 1000000

Result without compression:

PRODUCER - XADD
  Operations: 1,000,000
  Duration: 79.504s
  Throughput: 12,578 ops/sec
  Latency - Min: 0.102ms, Avg: 1.267ms, P95: 2.435ms, P99: 3.486ms, Max: 559.410ms
  Memory - Before: 1.08MB, After: 93.77MB, Delta: +92.69MB

Result with compression --stream_node_zstd_dict_threshold 10000:

PRODUCER - XADD
  Operations: 1,000,000
  Duration: 80.710s
  Throughput: 12,390 ops/sec
  Latency - Min: 0.098ms, Avg: 1.287ms, P95: 2.508ms, P99: 3.629ms, Max: 545.719ms
  Memory - Before: 1.08MB, After: 59.50MB, Delta: +58.41MB

  • --xadd-num-ops 2000000

Result without compression:

PRODUCER - XADD
  Operations: 2,000,000
  Duration: 160.463s
  Throughput: 12,464 ops/sec
  Latency - Min: 0.097ms, Avg: 1.280ms, P95: 2.378ms, P99: 3.420ms, Max: 1043.355ms
  Memory - Before: 1.08MB, After: 186.51MB, Delta: +185.43MB

Result with compression --stream_node_zstd_dict_threshold 10000:

PRODUCER - XADD
  Operations: 2,000,000
  Duration: 157.834s
  Throughput: 12,672 ops/sec
  Latency - Min: 0.100ms, Avg: 1.258ms, P95: 2.415ms, P99: 3.448ms, Max: 1052.222ms
  Memory - Before: 1.08MB, After: 117.94MB, Delta: +116.87MB
$ python benchmark_analyzer.py no_compression.csv compression.csv --regression
Loaded 2 result file(s)

====================================================================================================
PERFORMANCE REGRESSION DETECTION
====================================================================================================

no_compression vs compression:

  producer:
    Throughput: 12,464 → 12,672 (+1.7%)
    P95 Latency: 2.378ms → 2.415ms (+1.6%)
    Memory Delta: +185.43MB → +116.87MB (-68.56MB)

@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch 2 times, most recently from 0c0507d to 7aa4fe5 Compare April 23, 2026 08:40

romange commented Apr 23, 2026

Interesting. I would expect to see a much higher compression ratio with a good dictionary. Did you check whether compressing without a dictionary makes compression worse, i.e. whether it moves the needle at all?

@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 7aa4fe5 to 9699696 Compare April 23, 2026 12:19

mkaruza commented Apr 23, 2026

Interesting. I would expect to see a much higher compression ratio with a good dictionary. Did you check whether compressing without a dictionary makes compression worse, i.e. whether it moves the needle at all?

It could be that the random part of the payload keeps it from being highly compressible. I asked for a quick analysis:

Category                     % of payload
Truly random:                       ~59%
Structured:                         ~41%
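One quick way to sanity-check such a split is to compare the compressibility of the structured and random portions separately (a hypothetical helper, not part of the PR, using zlib as a stand-in for ZSTD):

```python
import zlib

def compressibility(data: bytes) -> float:
    """Fraction of bytes removed by compression (<= 0 means incompressible)."""
    return 1 - len(zlib.compress(data, 9)) / len(data)
```

Truly random bytes score near (or below) zero, while structured, repetitive payloads score high, so a payload that is ~59% random caps the achievable ratio regardless of dictionary quality.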


mkaruza commented Apr 24, 2026

Interesting. I would expect to see a much higher compression ratio with a good dictionary. Did you check whether compressing without a dictionary makes compression worse, i.e. whether it moves the needle at all?

I ran with level 0 compression instead of a dictionary; results:


==== 5,000,000 ops ====

-- no compression

PRODUCER - XADD
  Operations: 5,000,000
  Duration: 369.966s
  Throughput: 13,515 ops/sec
  Latency - Min: 0.094ms, Avg: 1.180ms, P95: 2.274ms, P99: 3.270ms, Max: 2462.897ms
  Memory - Before: 1.08MB, After: 464.69MB, Delta: +463.61MB


-- compression

PRODUCER - XADD
  Operations: 5,000,000
  Duration: 369.703s
  Throughput: 13,524 ops/sec
  Latency - Min: 0.098ms, Avg: 1.179ms, P95: 2.251ms, P99: 3.226ms, Max: 2474.331ms
  Memory - Before: 1.08MB, After: 293.26MB, Delta: +292.18MB


-- diff

no_compression vs compression:

  producer:
    Throughput: 13,515 → 13,524 (+0.1%)
    P95 Latency: 2.274ms → 2.251ms (-1.0%)
    Memory Delta: +463.61MB → +292.18MB (-171.43MB)

==== 2,000,000 ops ====

-- no compression

PRODUCER - XADD
  Operations: 2,000,000
  Duration: 146.842s
  Throughput: 13,620 ops/sec
  Latency - Min: 0.099ms, Avg: 1.170ms, P95: 2.264ms, P99: 3.258ms, Max: 1029.009ms
  Memory - Before: 1.08MB, After: 186.49MB, Delta: +185.41MB


-- compression

PRODUCER - XADD
  Operations: 2,000,000
  Duration: 147.055s
  Throughput: 13,600 ops/sec
  Latency - Min: 0.098ms, Avg: 1.172ms, P95: 2.251ms, P99: 3.236ms, Max: 1047.143ms
  Memory - Before: 1.08MB, After: 117.94MB, Delta: +116.86MB

There is no visible difference in throughput or latency.


mkaruza commented Apr 24, 2026

Tested with Celery from https://github.com/romange/python_queue_benchmark, manually converting lists to streams with:

#!/bin/bash

for q in $(redis-cli --scan --pattern "queue_*"); do
  idx=${q#queue_}
  stream="stream_${idx}"

  redis-cli EVAL "
    local items = redis.call('LRANGE', KEYS[1], 0, -1)
    for i,v in ipairs(items) do
      redis.call('XADD', KEYS[2], '*', 'value', v)
    end
    return 1
  " 2 "$q" "$stream" > /dev/null 2>&1
done

Steps:

  1. Run python enqueue_jobs.py
  2. Run the shell script above

Results:

No compression:

object_used_memory:399335050
type_used_memory_list:193224330
type_used_memory_set:1280
type_used_memory_key:160
type_used_memory_stream:206109280

With dictionary compression (--stream_node_zstd_dict_threshold 10000):

object_used_memory:218476010
type_used_memory_list:193224330
type_used_memory_set:1280
type_used_memory_key:160
type_used_memory_stream:25250240

With direct level 0 compression:

object_used_memory:239372426
type_used_memory_list:193224330
type_used_memory_set:1280
type_used_memory_key:160
type_used_memory_stream:46146656
  • Dictionary compression is better than direct level 0 compression by a factor of ~2
  • Stream memory usage is ~8x lower with dictionary compression (~4x with level 0)
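For reference, the ratios from the type_used_memory_stream numbers above work out to:

```python
baseline = 206_109_280   # type_used_memory_stream, no compression
dict_zstd = 25_250_240   # with dictionary compression
level0 = 46_146_656      # with direct level 0 compression

assert round(baseline / dict_zstd, 1) == 8.2   # dictionary: ~8x smaller
assert round(baseline / level0, 1) == 4.5      # level 0: ~4.5x smaller
assert round(level0 / dict_zstd, 1) == 1.8     # dictionary vs level 0: ~2x
```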

@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from e3bf2c2 to eb3874f Compare April 24, 2026 09:53
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from eb3874f to 52ecde6 Compare April 24, 2026 10:00
@mkaruza mkaruza force-pushed the mkaruza/streamnode-object branch 2 times, most recently from 7f211a9 to b344487 Compare April 29, 2026 09:11
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 52ecde6 to 3b1e7e6 Compare April 29, 2026 09:12
Base automatically changed from mkaruza/streamnode-object to main April 29, 2026 11:13
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 3b1e7e6 to 3512d3a Compare May 4, 2026 06:47
Copilot AI review requested due to automatic review settings May 4, 2026 06:47

Copilot AI left a comment


Pull request overview

This PR introduces optional ZSTD dictionary compression for finalized Redis stream nodes to reduce memory usage for compressible stream data, and adds a Python-based stream benchmark suite plus an analyzer to compare benchmark runs.

Changes:

  • Add StreamNodeObj support for storing stream nodes as either raw listpacks or ZSTD-compressed buffers, with transparent decompression via a thread-local reuse buffer.
  • Compress finalized stream nodes during XADD node finalization (gated by --stream_node_zstd_dict_threshold) and materialize listpacks before mutating paths (XDEL/XTRIM).
  • Add StreamNodeCompressTest coverage and introduce tools/stream benchmark + analyzer scripts with documentation.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/core/stream_node.h Defines raw vs compressed stream node representation and new APIs (compress/materialize).
src/core/stream_node.cc Implements per-thread ZSTD dictionary training, compression, and transparent decompression.
src/server/stream_family.cc Integrates node compression on finalization and materialization on mutation paths.
src/server/stream_family_test.cc Adds tests validating XRANGE/XDEL/XTRIM behavior with compressed nodes enabled.
tools/stream/stream_benchmark.py Adds a stream benchmark runner measuring throughput/latency/memory across scenarios.
tools/stream/stream_benchmark_analyzer.py Adds a CLI tool to compare multiple benchmark result CSVs.
tools/stream/README.md Documents running benchmarks and analyzing results.

Comment on lines +136 to +147
def benchmark_xread(self, num_ops: int = 10000, num_entries: int = 5000) -> BenchmarkResult:
    """Benchmark XREAD command (reading entries)"""
    print(f"\nBenchmarking XREAD ({num_ops} ops on {num_entries} entries)...")

    # Pre-populate stream
    print(f"  Populating stream with {num_entries} entries...")
    for i in range(num_entries):
        self.r.xadd(
            self.stream_key,
            _make_payload(i),
        )

Comment thread tools/stream/stream_benchmark.py
Comment thread tools/stream/stream_benchmark.py
Comment thread tools/stream/stream_benchmark.py
Comment thread src/core/stream_node.cc
return lp;
}

void StreamNodeObj::Free() const {
Comment thread src/core/stream_node.cc
Comment on lines +175 to +179
static const uint32_t dict_threshold = absl::GetFlag(FLAGS_stream_node_zstd_dict_threshold);
DCHECK(dict_threshold > 0);

if (!tl_zstd_ctx) {
tl_zstd_ctx = std::make_unique<ZstdCompressionCtx>(dict_threshold);
mkaruza added 2 commits May 4, 2026 09:11
Introduce per-thread ZSTD dictionary compression for stream node listpacks,
reducing memory for compressible stream data.

- Add ZstdCompressionCtx: accumulates listpack samples until the configured
  threshold, trains a ZSTD dictionary, and holds CCtx/DCtx/CDict/DDict state.
- StreamNodeObj gains TryCompress() (compress on node finalization in
  StreamAppendItem) and MaterializeListpack() (copy out of the decompression
  buffer before in-place mutation in XDEL/XTRIM paths).
- GetListpack() decompresses transparently into a thread-local reuse buffer.
- Gated by --stream_node_zstd_dict_threshold (0 = disabled); only nodes
  >=512 bytes that achieve >=30% size reduction are compressed.
- Add StreamNodeCompressTest covering XRANGE round-trip, XDEL, and XTRIM.

Signed-off-by: mkaruza <mario@dragonflydb.io>
* Synthetic tools stream_benchmark.py and stream_benchmark_analyzer.py
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 3512d3a to c7272b3 Compare May 4, 2026 07:11