feat: prune InnerForest #1635

Open
drahnr wants to merge 4 commits into main from bernhard-cleanup-inner-forest-for-main

Conversation

Contributor

@drahnr drahnr commented Feb 3, 2026

Important

Targeting main

What

Cleans up the InnerForest: both the lookup tables/maps and the actual SmtForest.

Required for bounding the in-memory size growth.

Context

Related #1175

How

There are a few requirements:

  1. Loading from the DB is rather expensive
    • limit the maximum number of storage map slot entries loaded from the DB, or return limit exceeded
    • use an LRU cache for the entries to avoid DB lookups for recent history
  2. Forest cleanup
    • to clean up effectively, we need to store additional meta info for each vault and storage map slot
      • per-block changes of accounts -> avoids a full iteration over all accounts
      • the forest deduplicates leaves but does not refcount roots -> we need to do this manually, using the *_refcount tables
    • retain the latest version of a storage map root, just like we do with the DB -> required for partial queries to create SMT proofs / witnesses
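
The refcounting and per-block tracking described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `RootTracker`, `track`, `prune_before`, and the `Root` / `BlockNum` aliases are hypothetical names, and the sketch leaves out the rule about retaining the latest storage map root.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for an SMT root digest.
type Root = [u8; 32];
/// Hypothetical stand-in for a block number.
type BlockNum = u32;

/// Tracks which roots each block referenced and how often each root is referenced.
#[derive(Default)]
struct RootTracker {
    /// Roots recorded per block, so pruning never has to scan all accounts.
    roots_by_block: BTreeMap<BlockNum, Vec<Root>>,
    /// Manual reference counts (assumption: the forest does not count root references).
    refcount: HashMap<Root, usize>,
}

impl RootTracker {
    /// Records that `root` is referenced by some account state at `block`.
    fn track(&mut self, block: BlockNum, root: Root) {
        self.roots_by_block.entry(block).or_default().push(root);
        *self.refcount.entry(root).or_insert(0) += 1;
    }

    /// Drops all blocks strictly below `cutoff` and returns the roots whose
    /// reference count reached zero, i.e. the ones that could be removed from the forest.
    fn prune_before(&mut self, cutoff: BlockNum) -> Vec<Root> {
        // `split_off` keeps keys < cutoff in place and returns keys >= cutoff.
        let kept = self.roots_by_block.split_off(&cutoff);
        let stale = std::mem::replace(&mut self.roots_by_block, kept);

        let mut removable = Vec::new();
        for root in stale.into_values().flatten() {
            if let Some(count) = self.refcount.get_mut(&root) {
                *count -= 1;
                if *count == 0 {
                    self.refcount.remove(&root);
                    removable.push(root);
                }
            }
        }
        removable
    }
}
```

A root referenced by both an old and a recent block survives pruning of the old block, because its count only drops to one; it is returned for removal once the last referencing block is pruned.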

Caveats

  • For proofs we only need the roots, not all entries of individual storage slots
  • We need the storage map slot entries only if we get a request for ::AllEntries; the SmtProof contains the relevant SmtLeaf itself
    • we don't have a way to access all entries from the current SmtForest for a specific key

Copilot AI left a comment

Pull request overview

This PR implements periodic pruning of the InnerForest in-memory data structures to bound memory growth, addressing issue #1175 about pruning account storage maps and vault tables.

Changes:

  • Added pruning mechanism that retains only the last 50 blocks (configurable via HISTORICAL_BLOCK_RETENTION constant) of account vault and storage map data
  • Introduced block-indexed tracking structures (vault_roots_by_block and storage_slots_by_block) to efficiently identify entries eligible for pruning
  • Refactored state/mod.rs to cache block.body() calls and improve code readability
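
As a rough illustration of the retention window in the bullets above, the prune cutoff could be computed as follows. Only the HISTORICAL_BLOCK_RETENTION name and the value 50 come from this review; the function name `prune_cutoff`, the use of `u32` for block numbers, and the exact boundary (here the chain tip counts toward the 50 retained blocks) are assumptions.

```rust
/// Number of most recent blocks whose data is kept (value taken from the review summary).
const HISTORICAL_BLOCK_RETENTION: u32 = 50;

/// Returns the first block number that is retained; blocks strictly below it
/// are eligible for pruning. Hypothetical helper, not the PR's function.
fn prune_cutoff(chain_tip: u32) -> u32 {
    // Saturating subtraction keeps young chains (tip < 49) from underflowing:
    // the cutoff is then 0 and nothing is pruned.
    chain_tip.saturating_sub(HISTORICAL_BLOCK_RETENTION - 1)
}
```

With this boundary, a chain tip of 100 retains blocks 51 through 100 inclusive.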

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File | Description
crates/store/src/inner_forest/mod.rs | Added pruning logic with new tracking data structures, HISTORICAL_BLOCK_RETENTION constant, and helper methods to prune old vault and storage map roots
crates/store/src/inner_forest/tests.rs | Added comprehensive test coverage for pruning functionality, including edge cases like empty forest, young chains, boundary conditions, and multiple accounts/slots
crates/store/src/state/mod.rs | Refactored to cache block.body() result and renamed block_data to block_bytes for clarity; split forest write lock acquisition into separate line
crates/store/src/db/tests.rs | Renamed test function to follow consistent naming convention using db_roundtrip_ prefix
CHANGELOG.md | Added enhancement entry for periodic cleanup feature

Copilot AI left a comment

Pull request overview

Copilot reviewed 60 out of 81 changed files in this pull request and generated 6 comments.

drahnr force-pushed the bernhard-cleanup-inner-forest-for-main branch 2 times, most recently from 72dad48 to a3be45b, on February 6, 2026 16:48
Contributor Author

drahnr commented Feb 6, 2026

Found one more bug: it occurs when using the RPC with an ::AllEntries response and the last update is older than the history depth kept.

The large changeset is mostly due to the added tests.

Collaborator

@igamigo igamigo left a comment

Overall LGTM, but left a couple of comments. Not sure if they are fully correct (or if they land within the scope of this PR exactly), but worth reviewing for the long term at least.

@Mirko-von-Leipzig Mirko-von-Leipzig self-requested a review February 9, 2026 18:30
Collaborator

Mirko-von-Leipzig commented Feb 10, 2026

@bobbinth it looks like the main branch still has the CI/test-related protections on it (unless this is just some sort of GitHub artifact)

drahnr force-pushed the bernhard-cleanup-inner-forest-for-main branch from 35b80c3 to 098c805 on February 10, 2026 10:54
@drahnr drahnr marked this pull request as draft February 10, 2026 15:36
@drahnr drahnr marked this pull request as ready for review February 11, 2026 14:12
    .map(|((cached_block, ..), snapshot)| (*cached_block, snapshot))
    .max_by_key(|(cached_block, _)| cached_block.as_u32())
    .map(|(_, snapshot)| snapshot.entries.clone())
    .unwrap_or_default();
Collaborator

Is this correct? Seems like the entry that you are trying to retrieve here could be evicted and then the new delta would be built from a default base.
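
One way to make the concern concrete: if the lookup returned an `Option` instead of calling `unwrap_or_default()`, a missing (possibly evicted) snapshot would be distinguishable from a genuinely empty one, and the caller could fall back to a reload. The sketch below is only an illustration under assumptions: `latest_snapshot`, `base_for_delta`, the plain `HashMap` cache, and the `Entries` alias are hypothetical and do not reflect the PR's actual types.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-ins for the PR's block number and snapshot entry types.
type BlockNum = u32;
type Entries = BTreeMap<u64, u64>;

/// Returns the entries of the most recent cached snapshot, or `None` when
/// nothing is cached (for example, after eviction).
fn latest_snapshot(cache: &HashMap<BlockNum, Entries>) -> Option<Entries> {
    cache
        .iter()
        .max_by_key(|(block, _)| **block)
        .map(|(_, entries)| entries.clone())
}

/// Picks the base for a new delta: the cached snapshot if present, otherwise
/// whatever `reload` produces (assumed here to be a DB read).
fn base_for_delta(
    cache: &HashMap<BlockNum, Entries>,
    reload: impl FnOnce() -> Entries,
) -> Entries {
    latest_snapshot(cache).unwrap_or_else(reload)
}
```

With this shape, an empty base is only used when the reload itself yields no entries, not merely because the cache missed.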

Comment on lines +645 to +654
let values = self
    .select_storage_map_sync_values(
        account_id,
        BlockNumber::GENESIS..=block_num,
        entries_limit,
    )
    .await?;
if values.last_block_included != block_num {
    return Ok(AccountStorageMapDetails::limit_exceeded(slot_name));
}
Collaborator

Should this loop until all values are retrieved instead?
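
For reference, a loop of the kind asked about might look like the sketch below. It is synchronous and uses a hypothetical `Page` type and `fetch` callback rather than the real `select_storage_map_sync_values` query, whose exact signature and return type are not shown in this excerpt. It also assumes block numbers are plain `u32` values starting at 0.

```rust
use std::ops::RangeInclusive;

/// Hypothetical stand-in for a block number.
type BlockNum = u32;

/// Hypothetical page of results: the values found and the last block the query covered.
struct Page {
    values: Vec<(u64, u64)>,
    last_block_included: BlockNum,
}

/// Calls `fetch` repeatedly, resuming after the last covered block, until the
/// whole range up to `end` has been covered.
fn fetch_all(
    end: BlockNum,
    mut fetch: impl FnMut(RangeInclusive<BlockNum>) -> Page,
) -> Result<Vec<(u64, u64)>, &'static str> {
    let mut start: BlockNum = 0;
    let mut all = Vec::new();
    loop {
        let page = fetch(start..=end);
        // Guard against a page that ends before the requested start, which
        // would otherwise make the loop spin forever.
        if page.last_block_included < start {
            return Err("query made no progress");
        }
        all.extend(page.values);
        if page.last_block_included >= end {
            return Ok(all);
        }
        start = page.last_block_included + 1;
    }
}
```

Note that an unbounded loop like this would work against the PR description's stated goal of limiting how many entries are loaded from the DB, so it would presumably need its own cap on total entries.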

    )
    .await?;
if values.last_block_included != block_num {
    return Ok(AccountStorageMapDetails::limit_exceeded(slot_name));
Collaborator

nit: the query does not filter per slot, right? So really the limit is exceeded for the account as a whole, not for this slot.
