Conversation

@alarso16
Contributor

Why this should be merged

This is about a 30% performance improvement for bootstrapping nodes.

How this works

See ava-labs/libevm#240 for comments from an initial review. All proposals created at hash time are stored in a map, and some shutdown recovery is added for block hashes and heights.

How this was tested

Unit tests and re-execution of state.

Need to be documented in RELEASES.md?

No

@alarso16 alarso16 self-assigned this Dec 15, 2025
@alarso16 alarso16 added the evm, coreth, and subnet-evm labels Dec 15, 2025
@alarso16 alarso16 linked an issue Dec 15, 2025 that may be closed by this pull request
@alarso16 alarso16 force-pushed the alarso16/shared-firewood branch from 6b31642 to 9ab772f Compare December 15, 2025 20:42
@alarso16 alarso16 force-pushed the alarso16/firewood-perf-propose branch 3 times, most recently from 201573a to e7a87c3 Compare December 16, 2025 19:59
@alarso16 alarso16 changed the base branch from alarso16/shared-firewood to alarso16/firewood-v0.0.17 December 16, 2025 20:00
@alarso16 alarso16 force-pushed the alarso16/firewood-perf-propose branch from e7a87c3 to 1395fce Compare December 16, 2025 20:05
@alarso16 alarso16 force-pushed the alarso16/firewood-v0.0.17 branch from e7ee927 to 51e8ad4 Compare December 16, 2025 20:07
@alarso16 alarso16 force-pushed the alarso16/firewood-perf-propose branch from 1395fce to de5a6aa Compare December 16, 2025 20:48
Base automatically changed from alarso16/firewood-v0.0.17 to master December 16, 2025 21:01
@alarso16 alarso16 force-pushed the alarso16/firewood-perf-propose branch from de5a6aa to 9961f98 Compare December 18, 2025 19:12
@alarso16 alarso16 marked this pull request as ready for review December 18, 2025 22:07
@alarso16 alarso16 requested a review from a team as a code owner December 18, 2025 22:07
Copilot AI review requested due to automatic review settings December 18, 2025 22:07
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the Firewood database integration to improve proposal management and enable a ~30% performance improvement for bootstrapping nodes. The key optimization is sharing proposals between hashing and commit operations, eliminating redundant proposal creation. Additionally, the refactoring includes better recovery handling through persistent storage of block hashes and heights.

Key changes:

  • Proposals are now created during the Hash() operation and reused during Commit(), rather than being created and dropped during hashing
  • Recovery information (committed block hashes and heights) is now persisted to disk
  • Configuration structure and naming conventions have been improved for clarity

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Per-file summary:
graft/evm/firewood/triedb.go Major refactoring to track proposals created during hashing for later commit, added recovery state persistence, renamed config fields
graft/evm/firewood/account_trie.go Updated to use new proposal creation API, improved documentation
graft/evm/firewood/storage_trie.go Enhanced documentation for storage trie behavior
graft/evm/firewood/recovery.go New file implementing recovery functions for persisting/reading committed block hashes and heights
graft/evm/firewood/hash_test.go Updated to use new config API
graft/subnet-evm/core/genesis.go Added Firewood-specific handling for empty genesis blocks, added helper function
graft/subnet-evm/core/genesis_test.go Updated to use new config API
graft/subnet-evm/core/blockchain.go Updated config field names to match new API
graft/subnet-evm/tests/state_test_util.go Updated to use new config API
graft/coreth/core/genesis.go Added Firewood-specific handling for empty genesis blocks, added helper function
graft/coreth/core/genesis_test.go Updated to use new config API
graft/coreth/core/blockchain.go Updated config field names to match new API
graft/coreth/tests/state_test_util.go Updated to use new config API


Member

@JonathanOppenheimer JonathanOppenheimer left a comment


Left a few comments - you definitely know this code better than I do

@Elvis339
Contributor

Elvis339 commented Jan 2, 2026

Hey Austin

Found a bug while running C-Chain re-execution benchmarks to measure the performance impact of your proposal-sharing changes.

The PR CI passes because it runs with config: "default" (hashdb) using a hashdb snapshot. It never exercises the Firewood code path.

When running with config: "firewood" starting from a restored Firewood snapshot at block 10M, it fails immediately on the first block (https://github.com/ava-labs/avalanchego/actions/runs/20661797719/job/59326295398#step:4:831):

failed to verify block at height 10000001: no proposal found for block 10000001,
root 0x5f7138331cc9dd9d47cd78fb5bf632b0961a28d29d394f19e6d32f2a9e1f4266,
hash 0x90649defd375c9784a95bb01e752c2a06341213c59e25dedfe8aee7a7e8b8d3d

The proposal-sharing logic expects Hash() to populate t.possible before Update() is called. When loading from a Firewood snapshot and executing the first block, something in the initialization path skips Hash() so t.possible is empty when Update() runs.

To reproduce:

gh workflow run "C-Chain Re-Execution Benchmark w/ Container" \
    --repo ava-labs/avalanchego \
    --ref es/candidate-firewood \
    -f runner="avalanche-avalanchego-runner-2ti" \
    -f block-dir-src="cchain-mainnet-blocks-10m-20m-ldb" \
    -f current-state-dir-src="cchain-current-state-firewood-10m" \
    -f config="firewood" \
    -f start-block="10000001" \
    -f end-block="10050000"

To save you time digging through git history: es/candidate-firewood is your PR merged on top of es/baseline-firewood, which contains master plus benchmarking infrastructure (pprof, FFI metrics, cross-stack profiling). The baseline branch with the same Firewood config and snapshot works fine. https://github.com/ava-labs/avalanchego/actions/runs/20661268516/job/59324164005

Let me know if you need help reproducing or testing a fix.

@alarso16
Contributor Author

alarso16 commented Jan 5, 2026

Found a bug while running C-Chain re-execution benchmarks to measure the performance impact of your proposal-sharing changes.

This isn't exactly a bug... I added specific recovery logic in this PR. On master, when the database opens, we set the height of the trie to 0, and use this as a special value, indicating either the genesis block is committed or the database was just opened.

This PR makes it safer by tracking the most recently committed height/block hash in leveldb, so upon recovery, we can properly populate these fields in the proposal trie. However, your snapshot does not have that, and I didn't realize this would be a breaking change.

Now that you bring this up, I do think this will be an issue for statesync as well. Let me rework a few things to see if I can find a less fragile way of handling crash recovery.
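
The recovery tracking described here can be sketched as a simple key/value roundtrip. This is a hypothetical illustration: the real scheme lives in graft/evm/firewood/recovery.go, and the key names, encoding, and kv type below are assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical keys; the actual scheme is in graft/evm/firewood/recovery.go.
var (
	committedHeightKey = []byte("firewood-committed-height")
	committedHashKey   = []byte("firewood-committed-hash")
)

// kv stands in for the leveldb handle.
type kv map[string][]byte

// writeCommitted persists the most recently committed block hash and height
// so that, after a restart, the proposal tree can be re-seeded instead of
// relying on height 0 as a magic "just opened" value.
func writeCommitted(db kv, height uint64, hash [32]byte) {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, height)
	db[string(committedHeightKey)] = buf
	db[string(committedHashKey)] = hash[:]
}

// readCommitted reports ok=false for an old snapshot that predates this
// change, which is exactly the breaking case hit by the benchmark above.
func readCommitted(db kv) (uint64, [32]byte, bool) {
	h, ok := db[string(committedHeightKey)]
	if !ok {
		return 0, [32]byte{}, false
	}
	var hash [32]byte
	copy(hash[:], db[string(committedHashKey)])
	return binary.BigEndian.Uint64(h), hash, true
}

func main() {
	db := kv{}
	writeCommitted(db, 10000001, [32]byte{0x90})
	h, hash, ok := readCommitted(db)
	fmt.Println(h, hash[0], ok)
}
```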

@alarso16
Contributor Author

alarso16 commented Jan 5, 2026

After some investigation, I've discovered a couple ideas:

  1. Leave it as it is. Really, this should have already been included in Firewood, but wasn't purely necessary. For this performance change, it practically is necessary. For statesync, we can just add another method that would be called in core.BlockChain.ResetToStateSyncedBlock. This would require re-generating the re-execution state
  2. Find some hacky way to use common.Hash{} and 0 as magic values for block hash and height, respectively. Although this would work, the code would be much harder to reason about and is pretty bug prone. Really, the only advantage is that it doesn't corrupt the state for the re-execution tests and doesn't expand the state.
  3. Use core.BlockChain state to populate these fields independently of the Firewood state. This would avoid your issue and remove a race if there's an unclean shutdown (Firewood committed, trackers not committed). This just requires adding another method that must be called in NewBlockChain, making Firewood difficult to use outside of this (like when we test TrieDB independently).

I'm leaning towards option 3.

@Elvis339
Contributor

Elvis339 commented Jan 6, 2026

After some investigation, I've discovered a couple ideas:

  1. Leave it as it is. Really, this should have already been included in Firewood, but wasn't purely necessary. For this performance change, it practically is necessary. For statesync, we can just add another method that would be called in core.BlockChain.ResetToStateSyncedBlock. This would require re-generating the re-execution state
  2. Find some hacky way to use common.Hash{} and 0 as magic values for block hash and height, respectively. Although this would work, the code would be much harder to reason about and is pretty bug prone. Really, the only advantage is that it doesn't corrupt the state for the re-execution tests and doesn't expand the state.
  3. Use core.BlockChain state to populate these fields independently of the Firewood state. This would avoid your issue and remove a race if there's an unclean shutdown (Firewood committed, trackers not committed). This just requires adding another method that must be called in NewBlockChain, making Firewood difficult to use outside of this (like when we test TrieDB independently).

I'm leaning towards option 3.

I see the issue. proposalTree.Block == 0 is a magic value that lets the first proposal skip the parent hash check. Your PR stores height/blockHashes to leveldb instead, but my snapshot doesn't have that data which breaks in initialization path.

It does seem like Option 3 is cleanest. BlockChain already knows the committed height/hash after loadLastState(), no need to duplicate it in leveldb (and risk a two-phase commit race).

Contributor

@Elvis339 Elvis339 left a comment


I profiled master on C-Chain re-execution (blocks 10M-10.05M):

  • Commit: 18.5% of CPU
  • (*Database).Propose: 1.76% of CPU

The proposed optimization removes one of the two proposal creations, so the maximum improvement is ~0.9%.
Where did the 30% improvement come from in your testing? Different workload? Or maybe the win is in memory, not CPU?

Just want to understand where to look.

To reproduce, download the pprof artifacts: https://github.com/ava-labs/avalanchego/actions/runs/20661268516/artifacts/5036934243
Then run: go tool pprof -top pprof-profiles/cpu.prof | grep -E "(firewood|ffi.*Propose)"

Edit: Re-ran blocks 1 to 10M and got a 28% improvement, nice!
On early blocks (*Database).Propose is ~6% of CPU, so cutting that in half gives real gains. On 10M+ blocks it's only ~1.7% since reads dominate at larger state sizes. So the optimization is solid for genesis sync only.
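
The ceiling math here can be made explicit, using the pprof shares quoted above (the numbers are the measured shares; the arithmetic assumes the change removes roughly half of Propose's cost):

```go
package main

import "fmt"

// CPU shares of (*Database).Propose taken from the pprof runs above.
const (
	proposeLate  = 0.0176 // blocks 10M+, where reads dominate
	proposeEarly = 0.06   // early (genesis) sync
)

// halved estimates the saving from removing one of the two proposal
// creations, i.e. about half of Propose's share, as a percentage.
func halved(share float64) float64 { return share / 2 * 100 }

func main() {
	fmt.Printf("ceiling at 10M+: ~%.2f%%\n", halved(proposeLate))
	fmt.Printf("early sync:      ~%.1f%%\n", halved(proposeEarly))
}
```

The direct CPU saving alone does not account for the full 28%, consistent with the suggestion that the remaining win comes from elsewhere (e.g. memory or allocation behavior) during genesis sync.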

@alarso16
Contributor Author

After some investigation, I've discovered a couple ideas

I made a draft showing how number 3 could be implemented: #4835. It's definitely messier, but wouldn't break state. However, I do think that we will be making a breaking change to firewood state before the v0.1.0 release.

@github-project-automation github-project-automation bot moved this to In Progress 🏗️ in avalanchego Jan 15, 2026
@JonathanOppenheimer
Member

I'm not sure about test coverage for this -- I would assume this refactor introduces new things to test. Some ideas: Hash() being called concurrently, or any of the functions (e.g. Update()) being called incorrectly.

@alarso16
Contributor Author

These tests are not complete, but I do have an outstanding issue for guaranteeing minimum behavior. This just ensures that the basics are laced together correctly, and covers the specific multiple-hashing case you suggested.

@JonathanOppenheimer
Member

These tests are not complete, but I do have an outstanding issue for guaranteeing minimum behavior. This just ensures that the basics are laced together correctly, and covers the specific multiple-hashing case you suggested.

fire - they can be built upon!!

// TrieDB is a triedb.DBOverride implementation backed by Firewood.
// It acts as a HashDB for backwards compatibility with most of the blockchain code.
type TrieDB struct {
proposals
Contributor


Suggested change:
-	proposals
+	proposalTree proposals

Reasoning: https://github.com/uber-go/guide/blob/master/style.md#avoid-embedding-types-in-public-structs

Contributor Author


It says to avoid embedded public fields specifically. Does it make any difference since it's private?
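
For illustration, the difference between the two forms, with hypothetical minimal types (the real proposals type is in graft/evm/firewood/triedb.go):

```go
package main

import "fmt"

type proposals struct{ count int }

// Embedded: proposals' fields and methods are promoted onto TrieDBEmbedded,
// so e.count works even though nothing in TrieDBEmbedded declares it.
type TrieDBEmbedded struct {
	proposals
}

// Named field (the suggested change): the relationship is explicit and
// nothing is promoted onto TrieDBNamed's surface.
type TrieDBNamed struct {
	proposalTree proposals
}

func main() {
	e := TrieDBEmbedded{proposals{1}}
	n := TrieDBNamed{proposalTree: proposals{2}}
	fmt.Println(e.count, n.proposalTree.count)
}
```

Since the embedded type here is unexported, promotion only affects code in the same package, which is the crux of the question above.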

Comment on lines +347 to +350
} else {
if t, ok := triedb.Backend().(*firewood.TrieDB); ok {
t.SetHashAndHeight(block.Hash(), 0)
}
Contributor


I'm confused, why do we call this here only if the root is empty? Shouldn't we call it regardless?


Labels

  • coreth: Related to the former coreth standalone repository
  • evm: Related to EVM functionality
  • subnet-evm: Related to the former subnet-evm standalone repository

Projects

Status: In Progress 🏗️

Development

Successfully merging this pull request may close these issues.

perf: Use persisted proposal in Firewood

7 participants