feat(core): Add ZSTD dictionary compression for finalized stream nodes#7108

Open
mkaruza wants to merge 2 commits into main from mkaruza/streamnode-compression

Conversation


@mkaruza mkaruza commented Apr 10, 2026

Introduce per-thread ZSTD dictionary compression for stream node listpacks,
reducing memory for compressible stream data.

  • Add ZstdCompressionCtx: accumulates listpack samples until the configured
    threshold, trains a ZSTD dictionary, and holds CCtx/DCtx/CDict/DDict state.
  • StreamNodeObj gains TryCompress() (compress on node finalization in
    StreamAppendItem) and MaterializeListpack() (copy out of the decompression
    buffer before in-place mutation in XDEL/XTRIM paths).
  • GetListpack() decompresses transparently into a thread-local reuse buffer.
  • Gated by --stream_node_zstd_dict_threshold (0 = disabled); only nodes
    bigger than 512 bytes that achieve >=30% size reduction are compressed.
  • Add StreamNodeCompressTest covering XRANGE round-trip, XDEL, and XTRIM.

Add stream benchmark suite that includes performance metrics for XADD, XREAD,
XRANGE, and consumer groups with throughput, latency, and memory tracking.


augmentcode Bot commented Apr 10, 2026

🤖 Augment PR Summary

Summary: Adds optional ZSTD compression for finalized Redis stream listpack nodes to reduce memory usage while preserving stream semantics.

Changes:

  • Introduce a new internal StreamNode encoding (kZstd) alongside raw listpacks.
  • Train a per-thread ZSTD dictionary from accumulated node samples once --stream_node_zstd_dict_threshold bytes are collected.
  • Compress a node when it becomes finalized during XADD node rollover, only keeping compression if it saves at least ~30%.
  • Store compressed nodes as [u32 compressed_len][zstd frame] and decode on demand in GetListpack() using a thread-local buffer.
  • Update XDEL/XTRIM mutation paths to call StreamNode::Reset() before re-attaching a mutated listpack with SetListpack().
  • Add unit tests covering XRANGE/XDEL/XTRIM behavior when compression is enabled via the new flag.

Technical Notes: Compression is disabled by default (threshold=0), uses trained dictionaries to improve ratios, and keeps raw nodes when data is small or incompressible.
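The [u32 compressed_len][zstd frame] layout can be sketched like this (Python struct as illustration; the little-endian u32 here is an assumption, the actual byte order is whatever the C++ code writes):

```python
import struct

def encode_node(frame: bytes) -> bytes:
    # [u32 compressed_len][zstd frame]
    return struct.pack("<I", len(frame)) + frame

def decode_node(buf: bytes) -> bytes:
    # Read the u32 length prefix, then slice out the frame.
    (n,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + n]
```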



@augmentcode augmentcode Bot left a comment


Review completed. 2 suggestions posted.


Comment thread src/core/stream_node.cc Outdated
Comment thread src/core/stream_node.cc

mkaruza commented Apr 10, 2026

Synthetic use-case with dfly_bench result on spot-c4a:

Running dfly_bench:

dfly_bench  -n 200000 -p 6379 -qps=0  -d 64 --key_maximum=10  --command="xadd __key__ * d foooooooooobbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrrr" --pipeline=5

Results:

  1. No compression --stream_node_zstd_dict_threshold=0
Total time: 20.511480008s. Overall number of requests: 16000000, QPS: 1122561, P99 lat: 1070us
used_memory_human:1.22GiB
type_used_memory_stream:1310815136
  2. Compression --stream_node_zstd_dict_threshold=10000
Total time: 20.974520275s. Overall number of requests: 16000000, QPS: 1079765, P99 lat: 1024.03us
used_memory_human:85.39MiB
type_used_memory_stream:88404768

@mkaruza mkaruza requested a review from romange April 10, 2026 08:57
@mkaruza mkaruza marked this pull request as draft April 10, 2026 12:26
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch 2 times, most recently from 85ce029 to cf4bae9 Compare April 11, 2026 11:36
@mkaruza mkaruza marked this pull request as ready for review April 11, 2026 11:44
@mkaruza mkaruza changed the title feat(core): Add ZSTD compression for finalized stream nodes feat(core): Add ZSTD dictionary compression for finalized stream nodes Apr 11, 2026

@augmentcode augmentcode Bot left a comment


Review completed. 4 suggestions posted.


Comment thread src/core/stream_node.cc Outdated
Comment thread src/core/stream_node.cc
Comment thread src/core/stream_node.h Outdated
Comment thread src/server/stream_family_test.cc
@mkaruza mkaruza force-pushed the mkaruza/streamnode-object branch 3 times, most recently from 0a4f05d to ea88316 Compare April 13, 2026 10:22
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 2e978ad to 0d3e65e Compare April 13, 2026 12:42
Comment thread src/core/stream_node.h Outdated
Comment thread src/core/stream_node.h Outdated
Comment thread src/server/stream_family.cc Outdated
Comment thread src/core/stream_node.cc Outdated
Comment thread src/core/stream_node.cc Outdated
@mkaruza mkaruza force-pushed the mkaruza/streamnode-object branch from ea88316 to 286bfb4 Compare April 17, 2026 08:32
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch 5 times, most recently from 442a1cb to 1c78db8 Compare April 22, 2026 14:05

mkaruza commented Apr 22, 2026

I ran this artificial benchmark (included in the PR) on spot-c4a. The server and the benchmark client were on different machines.

Ran stream_benchmark.py with and without compression on a single key with the following arguments:

--xadd --threads 16 --seed 42
  • --xadd-num-ops 1000000

Result without compression:

PRODUCER - XADD
  Operations: 1,000,000
  Duration: 79.504s
  Throughput: 12,578 ops/sec
  Latency - Min: 0.102ms, Avg: 1.267ms, P95: 2.435ms, P99: 3.486ms, Max: 559.410ms
  Memory - Before: 1.08MB, After: 93.77MB, Delta: +92.69MB

Result with compression --stream_node_zstd_dict_threshold 10000:

PRODUCER - XADD
  Operations: 1,000,000
  Duration: 80.710s
  Throughput: 12,390 ops/sec
  Latency - Min: 0.098ms, Avg: 1.287ms, P95: 2.508ms, P99: 3.629ms, Max: 545.719ms
  Memory - Before: 1.08MB, After: 59.50MB, Delta: +58.41MB

  • --xadd-num-ops 2000000

Result without compression:

PRODUCER - XADD
  Operations: 2,000,000
  Duration: 160.463s
  Throughput: 12,464 ops/sec
  Latency - Min: 0.097ms, Avg: 1.280ms, P95: 2.378ms, P99: 3.420ms, Max: 1043.355ms
  Memory - Before: 1.08MB, After: 186.51MB, Delta: +185.43MB

Result with compression --stream_node_zstd_dict_threshold 10000:

PRODUCER - XADD
  Operations: 2,000,000
  Duration: 157.834s
  Throughput: 12,672 ops/sec
  Latency - Min: 0.100ms, Avg: 1.258ms, P95: 2.415ms, P99: 3.448ms, Max: 1052.222ms
  Memory - Before: 1.08MB, After: 117.94MB, Delta: +116.87MB
$ python benchmark_analyzer.py no_compression.csv compression.csv --regression
Loaded 2 result file(s)

====================================================================================================
PERFORMANCE REGRESSION DETECTION
====================================================================================================

no_compression vs compression:

  producer:
    Throughput: 12,464 → 12,672 (+1.7%)
    P95 Latency: 2.378ms → 2.415ms (+1.6%)
    Memory Delta: +185.43MB → +116.87MB (-68.56MB)

@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch 2 times, most recently from 0c0507d to 7aa4fe5 Compare April 23, 2026 08:40

romange commented Apr 23, 2026

Interesting. I would expect to see a much higher compression ratio with a good dictionary. Did you check whether compressing without a dictionary makes compression worse, i.e. whether it moves the needle at all?

@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 7aa4fe5 to 9699696 Compare April 23, 2026 12:19

mkaruza commented Apr 23, 2026

Interesting. I would expect to see a much higher compression ratio with a good dictionary. Did you check whether compressing without a dictionary makes compression worse, i.e. whether it moves the needle at all?

It could be that the random part of the payload keeps it from being highly compressible. I asked for a quick analysis:

Category                     % of payload
Truly random:                       ~59%
Structured:                         ~41%
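One quick way to sanity-check such a split is to compare the compressibility of the structured and random portions separately (a hypothetical helper, not part of the PR, using zlib as a stand-in for ZSTD):

```python
import zlib

def compressibility(data: bytes) -> float:
    """Fraction of bytes removed by compression (<= 0 means incompressible)."""
    return 1 - len(zlib.compress(data, 9)) / len(data)
```

Truly random bytes score near (or below) zero, while structured, repetitive payloads score high, so a payload that is ~59% random caps the achievable ratio regardless of dictionary quality.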


mkaruza commented Apr 24, 2026

Interesting. I would expect to see a much higher compression ratio with a good dictionary. Did you check whether compressing without a dictionary makes compression worse, i.e. whether it moves the needle at all?

I ran with level 0 compression instead of a dictionary; results:


==== 5,000,000 ops ====

-- no compression

PRODUCER - XADD
  Operations: 5,000,000
  Duration: 369.966s
  Throughput: 13,515 ops/sec
  Latency - Min: 0.094ms, Avg: 1.180ms, P95: 2.274ms, P99: 3.270ms, Max: 2462.897ms
  Memory - Before: 1.08MB, After: 464.69MB, Delta: +463.61MB


-- compression

PRODUCER - XADD
  Operations: 5,000,000
  Duration: 369.703s
  Throughput: 13,524 ops/sec
  Latency - Min: 0.098ms, Avg: 1.179ms, P95: 2.251ms, P99: 3.226ms, Max: 2474.331ms
  Memory - Before: 1.08MB, After: 293.26MB, Delta: +292.18MB


-- diff

no_compression vs compression:

  producer:
    Throughput: 13,515 → 13,524 (+0.1%)
    P95 Latency: 2.274ms → 2.251ms (-1.0%)
    Memory Delta: +463.61MB → +292.18MB (-171.43MB)

==== 2,000,000 ops ====

-- no compression

PRODUCER - XADD
  Operations: 2,000,000
  Duration: 146.842s
  Throughput: 13,620 ops/sec
  Latency - Min: 0.099ms, Avg: 1.170ms, P95: 2.264ms, P99: 3.258ms, Max: 1029.009ms
  Memory - Before: 1.08MB, After: 186.49MB, Delta: +185.41MB


-- compression

PRODUCER - XADD
  Operations: 2,000,000
  Duration: 147.055s
  Throughput: 13,600 ops/sec
  Latency - Min: 0.098ms, Avg: 1.172ms, P95: 2.251ms, P99: 3.236ms, Max: 1047.143ms
  Memory - Before: 1.08MB, After: 117.94MB, Delta: +116.86MB

There is no visible difference in throughput or latency.


mkaruza commented Apr 24, 2026

Tested with Celery from https://github.com/romange/python_queue_benchmark, manually converting lists to streams with:

#!/bin/bash

for q in $(redis-cli --scan --pattern "queue_*"); do
  idx=${q#queue_}
  stream="stream_${idx}"

  redis-cli EVAL "
    local items = redis.call('LRANGE', KEYS[1], 0, -1)
    for i,v in ipairs(items) do
      redis.call('XADD', KEYS[2], '*', 'value', v)
    end
    return 1
  " 2 "$q" "$stream" > /dev/null 2>&1
done

Steps:

  1. Run python enqueue_jobs.py
  2. Run the shell script above

Results:

No compression:

object_used_memory:399335050
type_used_memory_list:193224330
type_used_memory_set:1280
type_used_memory_key:160
type_used_memory_stream:206109280

With dictionary compression (--stream_node_zstd_dict_threshold 10000):

object_used_memory:218476010
type_used_memory_list:193224330
type_used_memory_set:1280
type_used_memory_key:160
type_used_memory_stream:25250240

With direct level 0 compression:

object_used_memory:239372426
type_used_memory_list:193224330
type_used_memory_set:1280
type_used_memory_key:160
type_used_memory_stream:46146656
  • Dictionary compression is better than direct level 0 compression by a factor of ~2
  • Stream memory usage is ~8x lower with dictionary compression (~4x with level 0)
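For reference, the ratios from the type_used_memory_stream numbers above work out to:

```python
baseline = 206_109_280   # type_used_memory_stream, no compression
dict_zstd = 25_250_240   # with dictionary compression
level0 = 46_146_656      # with direct level 0 compression

assert round(baseline / dict_zstd, 1) == 8.2   # dictionary: ~8x smaller
assert round(baseline / level0, 1) == 4.5      # level 0: ~4.5x smaller
assert round(level0 / dict_zstd, 1) == 1.8     # dictionary vs level 0: ~2x
```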

@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from e3bf2c2 to eb3874f Compare April 24, 2026 09:53
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from eb3874f to 52ecde6 Compare April 24, 2026 10:00
@mkaruza mkaruza force-pushed the mkaruza/streamnode-object branch 2 times, most recently from 7f211a9 to b344487 Compare April 29, 2026 09:11
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 52ecde6 to 3b1e7e6 Compare April 29, 2026 09:12
Base automatically changed from mkaruza/streamnode-object to main April 29, 2026 11:13
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 3b1e7e6 to 3512d3a Compare May 4, 2026 06:47
Copilot AI review requested due to automatic review settings May 4, 2026 06:47

Copilot AI left a comment


Pull request overview

This PR introduces optional ZSTD dictionary compression for finalized Redis stream nodes to reduce memory usage for compressible stream data, and adds a Python-based stream benchmark suite plus an analyzer to compare benchmark runs.

Changes:

  • Add StreamNodeObj support for storing stream nodes as either raw listpacks or ZSTD-compressed buffers, with transparent decompression via a thread-local reuse buffer.
  • Compress finalized stream nodes during XADD node finalization (gated by --stream_node_zstd_dict_threshold) and materialize listpacks before mutating paths (XDEL/XTRIM).
  • Add StreamNodeCompressTest coverage and introduce tools/stream benchmark + analyzer scripts with documentation.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/core/stream_node.h Defines raw vs compressed stream node representation and new APIs (compress/materialize).
src/core/stream_node.cc Implements per-thread ZSTD dictionary training, compression, and transparent decompression.
src/server/stream_family.cc Integrates node compression on finalization and materialization on mutation paths.
src/server/stream_family_test.cc Adds tests validating XRANGE/XDEL/XTRIM behavior with compressed nodes enabled.
tools/stream/stream_benchmark.py Adds a stream benchmark runner measuring throughput/latency/memory across scenarios.
tools/stream/stream_benchmark_analyzer.py Adds a CLI tool to compare multiple benchmark result CSVs.
tools/stream/README.md Documents running benchmarks and analyzing results.

Comment on lines +136 to +147
def benchmark_xread(self, num_ops: int = 10000, num_entries: int = 5000) -> BenchmarkResult:
    """Benchmark XREAD command (reading entries)"""
    print(f"\nBenchmarking XREAD ({num_ops} ops on {num_entries} entries)...")

    # Pre-populate stream
    print(f"  Populating stream with {num_entries} entries...")
    for i in range(num_entries):
        self.r.xadd(
            self.stream_key,
            _make_payload(i),
        )

Comment thread tools/stream/stream_benchmark.py
Comment thread tools/stream/stream_benchmark.py
Comment thread tools/stream/stream_benchmark.py
Comment thread src/core/stream_node.cc
return lp;
}

void StreamNodeObj::Free() const {
Comment thread src/core/stream_node.cc
Comment on lines +175 to +179
static const uint32_t dict_threshold = absl::GetFlag(FLAGS_stream_node_zstd_dict_threshold);
DCHECK(dict_threshold > 0);

if (!tl_zstd_ctx) {
tl_zstd_ctx = std::make_unique<ZstdCompressionCtx>(dict_threshold);
mkaruza added 2 commits May 4, 2026 09:11
Introduce per-thread ZSTD dictionary compression for stream node listpacks,
reducing memory for compressible stream data.

- Add ZstdCompressionCtx: accumulates listpack samples until the configured
  threshold, trains a ZSTD dictionary, and holds CCtx/DCtx/CDict/DDict state.
- StreamNodeObj gains TryCompress() (compress on node finalization in
  StreamAppendItem) and MaterializeListpack() (copy out of the decompression
  buffer before in-place mutation in XDEL/XTRIM paths).
- GetListpack() decompresses transparently into a thread-local reuse buffer.
- Gated by --stream_node_zstd_dict_threshold (0 = disabled); only nodes
  >=512 bytes that achieve >=30% size reduction are compressed.
- Add StreamNodeCompressTest covering XRANGE round-trip, XDEL, and XTRIM.

Signed-off-by: mkaruza <mario@dragonflydb.io>
* Synthetic tools stream_benchmark.py and stream_benchmark_analyzer.py
@mkaruza mkaruza force-pushed the mkaruza/streamnode-compression branch from 3512d3a to c7272b3 Compare May 4, 2026 07:11