Conversation

@DracoLi
Contributor

@DracoLi DracoLi commented Nov 10, 2025

Why this should be merged

Adds an ethdb.Database implementation that moves EVM block data out of the shared KV store into dedicated height-indexed databases optimized for block storage, reducing KV store size and compaction cost.

How this works

The blockdb.Database wraps ethdb.Database and routes block data by key prefix, while leaving non-block data in the KV store.

  • Header/body/receipt keys for blocks with height ≥ the configured minimum are written to height-indexed DBs; below that threshold they use the KV store.
  • At most one block per height is stored. Writing at an occupied height overwrites the previous value and is not an intended use case. Deletes of block data are no-ops.
  • If a block isn’t yet in the height-indexed DBs and migration isn’t complete, reads fall back to the KV store.
  • Block-related keys bypass the KV batch and write directly to the height-indexed DBs; non-block keys are batched on the underlying KV store.
  • Deferred init: With allowDeferredInit, initialization can wait until the minimum block height is known (e.g., when state sync is enabled). Database min height is persisted and cannot be changed without recreating the block databases.
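
A minimal sketch of the routing and fallback rules listed above, not the code in this PR. It assumes the go-ethereum rawdb key layout for headers, bodies, and receipts (a one-byte prefix `h`, `b`, or `r`, then an 8-byte big-endian block number, then a 32-byte hash). The `router` type, its fields, and the in-memory maps are hypothetical stand-ins for the wrapper, the KV store, and the height-indexed DBs.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Assumed key layout: prefix (1 byte) + block number (8 bytes, big endian)
// + block hash (32 bytes).
const blockKeyLen = 1 + 8 + 32

// parseBlockKey reports whether key looks like a header, body, or receipts
// key and, if so, returns the block height encoded in it.
func parseBlockKey(key []byte) (uint64, bool) {
	if len(key) != blockKeyLen {
		return 0, false
	}
	switch key[0] {
	case 'h', 'b', 'r':
		return binary.BigEndian.Uint64(key[1:9]), true
	default:
		return 0, false
	}
}

// slot identifies one entry in a height-indexed DB: at most one value per
// kind and height. This sketch ignores the hash part of the key.
type slot struct {
	kind   byte
	height uint64
}

// router is a hypothetical stand-in for the wrapping database.
type router struct {
	minHeight     uint64
	migrationDone bool
	kv            map[string][]byte // stand-in for the shared KV store
	heights       map[slot][]byte   // stand-in for the height-indexed DBs
}

func newRouter(minHeight uint64) *router {
	return &router{minHeight: minHeight, kv: map[string][]byte{}, heights: map[slot][]byte{}}
}

// Put sends block keys at or above minHeight to the height-indexed store and
// everything else to the KV store.
func (r *router) Put(key, value []byte) {
	if h, ok := parseBlockKey(key); ok && h >= r.minHeight {
		r.heights[slot{key[0], h}] = value // overwrites any earlier value at this height
		return
	}
	r.kv[string(key)] = value
}

// Get reads from the height-indexed store first and falls back to the KV
// store only while migration is incomplete.
func (r *router) Get(key []byte) ([]byte, bool) {
	h, ok := parseBlockKey(key)
	if !ok || h < r.minHeight {
		v, found := r.kv[string(key)]
		return v, found
	}
	if v, found := r.heights[slot{key[0], h}]; found {
		return v, true
	}
	if !r.migrationDone {
		v, found := r.kv[string(key)]
		return v, found
	}
	return nil, false
}

// blockKey builds a block key with an all-zero hash, for illustration only.
func blockKey(kind byte, height uint64) []byte {
	key := make([]byte, blockKeyLen)
	key[0] = kind
	binary.BigEndian.PutUint64(key[1:9], height)
	return key
}

func main() {
	r := newRouter(5)
	r.Put(blockKey('b', 3), []byte("body-3"))   // below the minimum: KV store
	r.Put(blockKey('b', 10), []byte("body-10")) // at/above the minimum: height DB
	fmt.Println("entries in KV store:", len(r.kv))
	fmt.Println("entries in height DBs:", len(r.heights))
}
```

Keying the height-indexed store by kind and height is what makes a second write at the same height an overwrite rather than a second entry.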

Migration

Separated to #4771

How this was tested

  • Unit test for routing and migration behavior.
  • Two Mainnet nodes (with state sync enabled and disabled) with blockdb enabled running for >4 weeks.

Need to be documented in RELEASES.md?

No

@DracoLi DracoLi moved this to In Progress 🏗️ in avalanchego Nov 10, 2025
@DracoLi DracoLi self-assigned this Nov 10, 2025
@DracoLi DracoLi linked an issue Nov 12, 2025 that may be closed by this pull request
@DracoLi DracoLi force-pushed the dl/evm-blockdb branch 4 times, most recently from ef6921e to d8ee531 Compare November 23, 2025 23:22
@DracoLi DracoLi marked this pull request as ready for review November 24, 2025 14:30
Copilot AI review requested due to automatic review settings November 24, 2025 14:30
Copilot AI left a comment

Pull request overview

This PR introduces a new blockdb.Database implementation that separates EVM block data (headers, bodies, receipts) from the shared KV store into dedicated height-indexed databases. This optimization aims to reduce KV store size and compaction overhead.

Key changes:

  • Implements routing logic to store block data at/above a minimum height in height-indexed databases while maintaining backward compatibility via fallback reads
  • Adds background migration to move existing canonical block data from the KV store to height-indexed databases
  • Provides deferred initialization support for scenarios like state sync where the minimum block height isn't initially known

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
vms/evm/database/blockdb/database.go Core database implementation with key routing, read/write operations, and batch support
vms/evm/database/blockdb/migrator.go Background migration logic with progress tracking, compaction, and pause/resume capabilities
vms/evm/database/blockdb/database_test.go Comprehensive tests for database operations, initialization scenarios, and edge cases
vms/evm/database/blockdb/migration_test.go Tests for migration completion, resumption, and data integrity during migration
vms/evm/database/blockdb/helpers_test.go Test utilities for creating blocks, comparing data, and controlling migration

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

vms/evm/database/blockdb/database_test.go:1

  • [nitpick] The comment uses 'blocks 1-3' but the code implements i >= 1 && i <= 3 without parentheses. While operator precedence makes this correct, adding parentheses would improve clarity: migrated := (i >= 1) && (i <= 3).
// Copyright (C) 2019-2025, Ava Labs, Inc. All rights reserved.
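
For reference on the precedence point in the suppressed nitpick: in Go, comparison operators bind tighter than `&&`, so the parenthesized and unparenthesized forms are equivalent. A small sketch (the function names are illustrative and do not appear in the PR):

```go
package main

import "fmt"

// inRange relies on Go's operator precedence: comparisons bind tighter than &&.
func inRange(i int) bool { return i >= 1 && i <= 3 }

// inRangeParens spells the grouping out explicitly.
func inRangeParens(i int) bool { return (i >= 1) && (i <= 3) }

func main() {
	for i := 0; i <= 4; i++ {
		fmt.Println(i, inRange(i), inRangeParens(i))
	}
}
```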

@DracoLi DracoLi force-pushed the dl/evm-blockdb branch 2 times, most recently from 3cad123 to 2c6713d Compare November 25, 2025 19:14
stateDB database.Database,
evmDB ethdb.Database,
dbPath string,
allowDeferredInit bool,
Contributor

Why might someone want to defer? Why would the consumer know the minimum block height later, but not now? I'm thinking that this could instead be defaultMinimBlockHeight uint64, which you use instead of 1 when the current argument is false.

The InitBlockDBs method could then be unexported and the heightsDBReady bool field (which feels like a race-condition waiting to happen) removed.

Contributor Author

So for context, I added this to address the state sync use case.

Currently, the EVM database is created on VM initialization. At that point, if state sync is enabled, we don't know the height of the first block we will fetch until we get the state sync summary. While we could just set the min height to 1, deferring lets us initialize the block DBs once we know the height of the first block we need to store, by calling InitBlockDBs manually (see an example here).

I'm open to alternative solutions. The main goal here is to be able to set the min height of the DBs to the first non-genesis block we need when state sync is enabled, which reduces the size of the DB index file. Alternatively, we can skip that and just use min height = 1, but this will make the DB index files a few GB larger.
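
To make the deferred-init discussion concrete, here is a rough sketch of one way the readiness flag could be guarded against the race concern raised above. It is not the implementation in this PR: the `deferredDB` type, its fields, the `InitBlockDBs(minHeight uint64) error` signature, and the mismatch error are all assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errMinHeightMismatch is a hypothetical error for an attempt to change a
// min height that has already been fixed.
var errMinHeightMismatch = errors.New("block databases already initialized with a different min height")

// deferredDB is a hypothetical stand-in that tracks only initialization state.
type deferredDB struct {
	mu        sync.RWMutex
	ready     bool
	minHeight uint64
}

// InitBlockDBs fixes the min height once. Repeating the call with the same
// height is a no-op; a different height is rejected.
func (d *deferredDB) InitBlockDBs(minHeight uint64) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.ready {
		if d.minHeight != minHeight {
			return errMinHeightMismatch
		}
		return nil
	}
	d.minHeight = minHeight
	d.ready = true
	return nil
}

// usesHeightDB reports whether a block at the given height would be routed to
// the height-indexed DBs. Before initialization, nothing is.
func (d *deferredDB) usesHeightDB(height uint64) bool {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.ready && height >= d.minHeight
}

func main() {
	d := &deferredDB{}
	fmt.Println("before init:", d.usesHeightDB(100))
	// For example, once the state sync summary reveals the first needed block.
	if err := d.InitBlockDBs(50); err != nil {
		fmt.Println("init failed:", err)
	}
	fmt.Println("after init:", d.usesHeightDB(100))
	fmt.Println("re-init with another height:", d.InitBlockDBs(1))
}
```

Holding the read lock in `usesHeightDB` means a reader sees either the uninitialized state or the fully initialized one, never a half-written combination of the two fields.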

_ ethdb.Database = (*Database)(nil)
_ ethdb.Batch = (*batch)(nil)

migratorDBPrefix = []byte("migrator")
Contributor

Do we want to prefix our prefixes to guarantee that they never clash with a geth value? Like "ava-migrator"?

Is there a downside to doing that? FWIW they use b for block bodies and blt- for something else so we don't need to worry about a substring clash and starting with a is (at first glance) safe.
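
A quick way to reason about this kind of clash is to check whether either prefix is a byte-prefix of the other. The sketch below does that for the two geth prefixes mentioned in this comment (`b` and `blt-`) only; it is not a check against the full geth schema, and `block-migrator` is a made-up candidate included to show a clash.

```go
package main

import (
	"bytes"
	"fmt"
)

// clashes reports whether keys under one prefix could also match the other,
// that is, whether either prefix is a byte-prefix of the other.
func clashes(a, b []byte) bool {
	return bytes.HasPrefix(a, b) || bytes.HasPrefix(b, a)
}

// firstClash returns the first existing prefix that clashes with candidate.
func firstClash(candidate []byte, existing [][]byte) ([]byte, bool) {
	for _, p := range existing {
		if clashes(candidate, p) {
			return p, true
		}
	}
	return nil, false
}

func main() {
	// Only the prefixes named in the comment above, not the whole schema.
	existing := [][]byte{[]byte("b"), []byte("blt-")}
	for _, c := range []string{"migrator", "ava-migrator", "block-migrator"} {
		if p, ok := firstClash([]byte(c), existing); ok {
			fmt.Printf("%q clashes with %q\n", c, p)
		} else {
			fmt.Printf("%q is clear of the listed prefixes\n", c)
		}
	}
}
```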

Contributor Author

@DracoLi DracoLi Nov 27, 2025

This is the prefix on the stateDB, which is separate from the evmDB that contains the EVM data. The evmDB currently has the ethdb prefix, and clients should pass a differently prefixed DB when creating the blockdb.Database (see a draft usage of this database here).

The current convention seems to be to have the caller create the prefixdb. Alternatively, we can wrap the stateDB with a prefixdb in New if you think that would be clearer.

Comment on lines 47 to 48
// Since the prefixes should never be changed, we can avoid libevm changes by
// duplicating them here.
Contributor

Agreed.

Just noting in case any other reviewer asks me about libevm.

Comment on lines 28 to 29
customtypes.Register()
params.RegisterExtras()
Contributor

Why are these needed?

Contributor Author

@DracoLi DracoLi Nov 27, 2025

This is needed by the core.GenerateChainWithGenesis call in createBlocksToAddr in helpers_test.go.
Without them we get a nil pointer error on coreth/params.GetExtra and coreth/plugin/evm/customtypes.GetHeaderExtra

Contributor Author

I've updated block creation to not use core.GenerateChainWithGenesis, so we can remove this and also avoid importing from graft/coreth.
2561499

"github.com/ava-labs/avalanchego/database/prefixdb"
"github.com/ava-labs/avalanchego/utils/logging"

heightindexdb "github.com/ava-labs/avalanchego/x/blockdb"
Contributor Author

@DracoLi DracoLi Nov 30, 2025

Note for reviewers: x/blockdb is planned to be moved to database/heightindexdb (see #4520).
I renamed it here to avoid confusing this evm database package (blockdb) with the current x/blockdb package.

@DracoLi DracoLi requested a review from ARR4N December 1, 2025 21:53
Labels

None yet

Projects

Status: In Progress 🏗️

Development

Successfully merging this pull request may close these issues.

Add evm database that supports separate storage for block data

3 participants