
Fix(forkchoice): add finalization-based pruning and storage retention limits#193

Merged
dimka90 merged 3 commits into main from fix/memory-pruning
Apr 2, 2026

Conversation

Collaborator

@dimka90 dimka90 commented Apr 1, 2026

Summary

  • Add DeleteBlock/DeleteSignedBlock/DeleteState and ForEachBlock to storage interface (memory + bolt backends)
  • Replace GetAllBlocks() full map copy in allKnownBlockSummaries() with ForEachBlock iteration — eliminates O(n) allocation per head update
  • Add pruneOnFinalization() triggered when finalization advances: prunes stale attestation data, signature caches, and non-canonical blocks/states below finalized slot (leanSpec
    prune_stale_attestation_data store.py:228-268)
  • Add storage retention limits: 21,600 blocks (~1 day) and 3,000 states (~3.3 hours) (ethlambda store.rs:83-92)
  • Add enforcePayloadCap (4,096 known payloads) and enforceAggregatedPayloadsCacheCap (8,192 keys) to bound memory when finalization stalls (ethlambda PayloadBuffer pattern)
  • Add periodic pruning safety net every 7,200 slots (~8 hours) when finalization lags >14,400 slots behind (zeam FORKCHOICE_PRUNING_INTERVAL_SLOTS pattern constants.zig:22)
  • Guard gossipSignatures behind isAggregator in processAttestationLocked — non-aggregator nodes were accumulating signatures they never use
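The ForEachBlock change in the first bullet can be sketched roughly as follows. This is an illustrative stand-in, not Gean's actual storage interface: the `Hash`, `BlockSummary`, and `BlockStore` types and the method names are assumptions made for the example. The point is that a callback-based iterator lets `allKnownBlockSummaries()` allocate only its result slice, instead of also materializing a full copy of the block map on every head update the way a `GetAllBlocks()`-style accessor would.

```go
package main

import "fmt"

// Hypothetical minimal shapes; Gean's real types differ.
type Hash [32]byte

type BlockSummary struct {
	Slot uint64
	Root Hash
}

type BlockStore struct {
	blocks map[Hash]BlockSummary
}

// ForEachBlock visits every stored block in place, without copying the
// whole map. Returning false from the callback stops iteration early.
func (s *BlockStore) ForEachBlock(fn func(root Hash, b BlockSummary) bool) {
	for root, b := range s.blocks {
		if !fn(root, b) {
			return
		}
	}
}

// allKnownBlockSummaries collects summaries via iteration; only the
// result slice is allocated, not an intermediate copy of the map.
func (s *BlockStore) allKnownBlockSummaries() []BlockSummary {
	out := make([]BlockSummary, 0, len(s.blocks))
	s.ForEachBlock(func(_ Hash, b BlockSummary) bool {
		out = append(out, b)
		return true
	})
	return out
}

func main() {
	s := &BlockStore{blocks: map[Hash]BlockSummary{
		{1}: {Slot: 10, Root: Hash{1}},
		{2}: {Slot: 11, Root: Hash{2}},
	}}
	fmt.Println(len(s.allKnownBlockSummaries()))
}
```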

Context

Gean had zero pruning. Every block, state, attestation payload, and signature cache entry was kept forever, causing steady memory growth during normal chain-following. On devnet-3,

Test plan

  • go build ./... compiles cleanly
  • go vet ./... passes
  • go test ./chain/forkchoice/... -count=1 — forkchoice tests pass
  • go test ./storage/... -count=1 — storage tests pass (both memory and bolt)
  • go test ./node/... -count=1 — node tests pass
  • go test -race ./chain/forkchoice/... ./storage/... — no races
  • Local 3-node devnet — nodes finalize and memory stabilizes (does not grow monotonically)
  • Verify pruned storage on finalization log appears after finalization advances
  • Restart a node after 100+ slots — verify old non-canonical blocks/states are cleaned up
  • Monitor heap with GODEBUG=memprofilerate=1 over 1000+ slots — confirm bounded growth
  • Closes #189: Sync floods peers with individual blocks_by_root requests instead of batching.

@dimka90 dimka90 changed the title Title: fix(forkchoice): add finalization-based pruning and storage retention limits Fix(forkchoice): add finalization-based pruning and storage retention limits Apr 1, 2026
@dimka90 dimka90 requested a review from devylongs April 1, 2026 18:56
@dimka90 dimka90 requested review from morelucks and shaaibu7 April 2, 2026 09:12
dimka90 added 3 commits April 2, 2026 10:28
…concurrency cap

Addresses review feedback on the sync batching PR:

- Keep blocks fetched during the backward walk instead of discarding
  and re-fetching in a separate phase. Eliminates redundant RPCs —
  total requests cut from N+N/10 to N.
- Mark each root as pending BEFORE requesting it (inside the walk loop)
  instead of after the walk completes. Matches leanSpec
  BackfillSync._pending pattern (backfill_sync.py:164).
- Add per-peer concurrency cap of 2 in-flight requests, matching
  leanSpec MAX_CONCURRENT_REQUESTS (sync/config.py:14). Peers at
  capacity are skipped with a debug log.
… limits

Gean had zero pruning — every block, state, attestation payload, and
signature cache entry was kept forever, causing steady memory growth
during normal chain-following operation.

Changes:
- Add DeleteBlock/DeleteSignedBlock/DeleteState to storage interface
  with implementations in both memory and bolt backends
- Add ForEachBlock iterator to avoid O(n) full block map copies
- Replace GetAllBlocks() copy in allKnownBlockSummaries() with
  ForEachBlock iteration (eliminates quadratic GC pressure)
- Implement pruneOnFinalization() triggered when finalization advances:
  prune stale attestation data, aggregated payload cache, gossip
  signatures, and non-canonical blocks/states below finalized slot
  (matches leanSpec prune_stale_attestation_data store.py:228-268)
- Add storage retention limits: 21,600 blocks (~1 day) and 3,000
  states (~3.3 hours), matching ethlambda's retention policy
- Add enforcePayloadCap (4096 known payloads) and
  enforceAggregatedPayloadsCacheCap (8192 keys) to bound memory
  even when finalization stalls (ethlambda FIFO buffer pattern)
- Guard gossipSignatures behind isAggregator in
  processAttestationLocked to prevent non-aggregator nodes from
  accumulating unused signatures
…tion

When finalization stalls, pruneOnFinalization() never runs and memory
grows unboundedly. This adds a periodic pruning pass every 7,200 slots
(~8 hours) as a safety net, triggered only when finalization is lagging
more than 14,400 slots behind the current slot.

Matches zeam's FORKCHOICE_PRUNING_INTERVAL_SLOTS pattern
(constants.zig:22, chain.zig:302-326).
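The trigger condition for this safety net can be sketched as below. The exact mechanism (a modulo check on the current slot versus an elapsed-slot counter since the last prune) is an assumption; only the two constants come from the commit message, and the function name is hypothetical.

```go
package main

import "fmt"

// Constants taken from the PR description.
const (
	pruningIntervalSlots = 7200  // ~8 hours between safety-net prunes
	finalizationLagSlots = 14400 // only prune when this far behind
)

// shouldRunPeriodicPrune reports whether the safety-net prune should
// fire at currentSlot, given the last finalized slot. It fires only on
// interval boundaries AND only when finalization is badly lagging, so
// a healthy chain relies on pruneOnFinalization alone.
func shouldRunPeriodicPrune(currentSlot, finalizedSlot uint64) bool {
	if currentSlot%pruningIntervalSlots != 0 {
		return false
	}
	return currentSlot-finalizedSlot > finalizationLagSlots
}

func main() {
	fmt.Println(shouldRunPeriodicPrune(21600, 100))   // interval slot, lagging
	fmt.Println(shouldRunPeriodicPrune(21600, 21500)) // interval slot, healthy
}
```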
@dimka90 dimka90 force-pushed the fix/memory-pruning branch from 3e9c331 to c9a2235 on April 2, 2026 09:28
@dimka90 dimka90 merged commit 77f83e4 into main Apr 2, 2026
5 checks passed
