Fix(forkchoice): add finalization-based pruning and storage retention limits#193
Merged
Conversation
devylongs
approved these changes
Apr 2, 2026
shaaibu7
approved these changes
Apr 2, 2026
…concurrency cap

Addresses review feedback on the sync batching PR:
- Keep blocks fetched during the backward walk instead of discarding and re-fetching them in a separate phase. Eliminates redundant RPCs: total requests cut from N+N/10 to N.
- Mark each root as pending BEFORE requesting it (inside the walk loop) instead of after the walk completes. Matches the leanSpec BackfillSync._pending pattern (backfill_sync.py:164).
- Add a per-peer concurrency cap of 2 in-flight requests, matching leanSpec MAX_CONCURRENT_REQUESTS (sync/config.py:14). Peers at capacity are skipped with a debug log.
… limits

Gean had zero pruning: every block, state, attestation payload, and signature cache entry was kept forever, causing steady memory growth during normal chain-following operation.

Changes:
- Add DeleteBlock/DeleteSignedBlock/DeleteState to the storage interface, with implementations in both the memory and bolt backends
- Add a ForEachBlock iterator to avoid O(n) full block map copies
- Replace the GetAllBlocks() copy in allKnownBlockSummaries() with ForEachBlock iteration (eliminates quadratic GC pressure)
- Implement pruneOnFinalization(), triggered when finalization advances: prune stale attestation data, the aggregated payload cache, gossip signatures, and non-canonical blocks/states below the finalized slot (matches leanSpec prune_stale_attestation_data, store.py:228-268)
- Add storage retention limits: 21,600 blocks (~1 day) and 3,000 states (~3.3 hours), matching ethlambda's retention policy
- Add enforcePayloadCap (4,096 known payloads) and enforceAggregatedPayloadsCacheCap (8,192 keys) to bound memory even when finalization stalls (ethlambda FIFO buffer pattern)
- Guard gossipSignatures behind isAggregator in processAttestationLocked to prevent non-aggregator nodes from accumulating unused signatures
…tion

When finalization stalls, pruneOnFinalization() never runs and memory grows without bound. This adds a periodic pruning pass every 7,200 slots (~8 hours) as a safety net, triggered only when finalization lags more than 14,400 slots behind the current slot. Matches zeam's FORKCHOICE_PRUNING_INTERVAL_SLOTS pattern (constants.zig:22, chain.zig:302-326).
Force-pushed 3e9c331 to c9a2235
Summary
- Add `DeleteBlock`/`DeleteSignedBlock`/`DeleteState` and `ForEachBlock` to the storage interface (memory + bolt backends)
- Replace the `GetAllBlocks()` full map copy in `allKnownBlockSummaries()` with `ForEachBlock` iteration, eliminating an O(n) allocation per head update
- Add `pruneOnFinalization()`, triggered when finalization advances: prunes stale attestation data, signature caches, and non-canonical blocks/states below the finalized slot (leanSpec `prune_stale_attestation_data`, store.py:228-268)
- Add `enforcePayloadCap` (4,096 known payloads) and `enforceAggregatedPayloadsCacheCap` (8,192 keys) to bound memory when finalization stalls (ethlambda `PayloadBuffer` pattern)
- Add a periodic pruning pass matching zeam's `FORKCHOICE_PRUNING_INTERVAL_SLOTS` pattern (constants.zig:22)
- Guard `gossipSignatures` behind `isAggregator` in `processAttestationLocked`: non-aggregator nodes were accumulating signatures they never use

Context
Gean had zero pruning. Every block, state, attestation payload, and signature cache entry was kept forever, causing steady memory growth during normal chain-following. On devnet-3,
Test plan
- `go build ./...` compiles cleanly
- `go vet ./...` passes
- `go test ./chain/forkchoice/... -count=1`: forkchoice tests pass
- `go test ./storage/... -count=1`: storage tests pass (both memory and bolt)
- `go test ./node/... -count=1`: node tests pass
- `go test -race ./chain/forkchoice/... ./storage/...`: no races
- "pruned storage on finalization" log appears after finalization advances
- `GODEBUG=memprofrate=1` over 1000+ slots: confirm bounded growth

Closes #189: Sync floods peers with individual blocks_by_root requests instead of batching.