Skip to content

Conversation

CPerezz
Copy link
Collaborator

@CPerezz CPerezz commented Sep 29, 2025

This PR should only be merged once #552 is.

Also, in order to review this PR now, it's easier to just look at a047ff3 wihch contains the diff of the only commit that isn't already in the previous PR.

This commit adds support for dumping state from Binary Trie (the new
storage format replacing Verkle at genesis) to enable multi-block test
execution in the t8n tool.

The t8n tool was failing on multi-block withdrawal tests with "nonce
too high" errors. Investigation revealed that state wasn't being
persisted between blocks when using Binary Trie mode because
DumpToCollector couldn't handle the Binary Trie format.

Binary Trie stores data differently than MPT:

  • MPT: RLP-encoded StateAccount structures
  • Binary Trie: Raw bytes with specific offsets (per EIP-7864)
    • Nonce at offset 8 (8 bytes, big-endian)
    • Balance at offset 16 (16 bytes, big-endian)
    • Code hash in separate leaf (suffix byte 1)

Implemented dumpBinaryTrieToCollector function that:

  1. Iterates through Binary Trie leaf nodes
  2. Extracts account data from raw byte format
  3. Maps hashed keys back to Ethereum addresses
  4. Reconstructs full account state for the collector
  • Added Binary Trie detection in DumpToCollector
  • Created specialized dumping logic for Binary Trie format
  • Documented each step with extensive comments
  • Added panic for unrecoverable addresses (temporary)

Currently uses stateObjects cache to map keys to addresses, which only
includes modified accounts. Future improvements needed:

  • Implement preimage database (like MPT)
  • Or iterate all known addresses
  • Or store addresses in leaf values

All 4 previously failing withdrawal tests now pass:

  • test_use_value_in_contract
  • test_many_withdrawals
  • test_self_destructing_account
  • test_balance_within_block

This enables proper multi-block test execution with Binary Trie

CPerezz and others added 30 commits September 20, 2025 15:53
- Introduced a new field `BT` in the `Prestate` struct to accommodate binary trie data.
- Updated the `MakePreState` function to replace the `verkle` flag with `isBintrie` for better clarity.
- Renamed functions and variables related to Verkle to their binary trie counterparts, including `BinKey`, `BinKeys`, and `BinTrieRoot`.
- Added TODO comments for further updates and clarifications regarding the transition from Verkle to binary trie structures.
- Introduced the `ChunkedCode` type to represent 32-byte chunks of EVM bytecode.
- Implemented the `ChunkifyCode` function to convert a byte array of EVM bytecode into chunked format.
- Updated the `UpdateContractCode` method to utilize the new `ChunkifyCode` function.
- Renamed `OutputVKTFlag` to `OutputBTFlag`.
- Renamed `InputVKTFlag` to `InputBTFlag` and updated its usage description for prestate input.
- Adjusted `Transition` function to utilize the new `BT` field for Binary Trie data instead of Verkle.
- Updated `dispatchOutput` function to handle Binary Trie output correctly.
- Replaced references to Verkle Trie with Binary Trie in various functions.
- Updated `genBinTrieFromAlloc` to return a Binary Trie instead of a Verkle Trie.
- Adjusted output functions to utilize Binary Trie methods and updated associated comments.
…de chunking

- Introduced `BinaryCodeChunkKey` to compute the tree key of a code-chunk for a given address.
- Added `BinaryCodeChunkCode` to return the chunkification of bytecode.
- Added a new `TransitionTrie` type to facilitate the integration of the Binary Trie with the existing MPT trie.
- Updated the `newTrieReader` function to utilize the new `TransitionTrie` implementation.
- The transition tree has been moved to its own package
 `core/state/transitiontrie`, resolving the import cycle issue we get into when including `BinaryTrie` within `database.go`/
- Replaced instances of `bintrie.BinaryTrie` with `transitiontrie.TransitionTrie` in the `execution.go`, `dump.go`, and `reader.go`.
- Adjusted the `newTrieReader` function with atemporary workaround for import cycles.
This fixes the "invalid serialized node length" error that occurred when
Binary Trie nodes were persisted and later reloaded from the database.

Issue Discovery:
The error manifested when running execution-spec-tests with Binary Trie mode.
After committing state changes, attempting to create a new StateDB with the
committed root would fail with "invalid serialized node length" when
deserializing the root node.

Root Cause Analysis:
Through debug logging, discovered that:
1. StemNodes were being serialized with 97 bytes of actual data
2. But the SerializeNode function was returning the entire 8224-byte buffer
3. When the node was later loaded and deserialized, it received 97 bytes
4. The deserializer expected at least 128 bytes for the bitmap and values
5. This mismatch caused the deserialization to fail

The Fix:
Changed SerializeNode to return only the actual data (serialized[:offset])
instead of the entire pre-allocated buffer (serialized[:]). This ensures
that only the meaningful bytes are persisted to the database.

This aligns the serialization with the deserialization logic, which
correctly handles variable-length StemNode data based on which values
are actually present in the node.
This fixes a critical off-by-one error in the StemNode serialization buffer
that was causing "invalid serialized node length" errors during execution-spec-tests.

The error was discovered when running execution-spec-tests with Binary Trie mode:
```
cd ../execution-spec-tests && uv run fill --fork Verkle -v -m blockchain_test \
  -k test_contract_creation -n 1 --evm-bin=PATH_TO/go-ethereum/evm
```

The tests were failing with:
```
ERROR: error getting prestate: invalid serialized node length
```

Through systematic debugging with hex dumps and bitmap analysis, we discovered
that the serialized array was incorrectly sized:

**Incorrect**: `var serialized [32 + 32 + 256*32]byte`
- This allocated: 32 + 32 + 8192 = 8256 bytes
- Missing 1 byte for the node type prefix

**Correct**: `var serialized [1 + 31 + 32 + 256*32]byte`
- This allocates: 1 + 31 + 32 + 8192 = 8256 bytes
- Properly accounts for all fields per EIP-7864

The layout should be:
- 1 byte: node type (nodeTypeStem)
- 31 bytes: stem (path prefix)
- 32 bytes: bitmap indicating which of 256 values are present
- Up to 256*32 bytes: the actual values (32 bytes each)

1. Corrected array size from `[32 + 32 + 256*32]` to `[1 + 31 + 32 + 256*32]`
2. Cleaned up type assertions to use `n` from the type switch instead of `node.(*StemNode)`
3. This ensures proper alignment of all fields during serialization

The misalignment was causing the deserializer to interpret value data as bitmap data,
leading to impossible bitmap patterns (e.g., 122 bits set requiring 3904 bytes when
only 97 bytes were available).

After this fix, the "invalid serialized node length" errors are completely resolved.
The execution-spec-tests now progress past the serialization stage, confirming that
Binary Trie nodes are being correctly serialized and deserialized.
1. move the transition trie package to trie, as this will be requested
by the geth team and it's where it belongs.
2. panic if state creation has failed in execution.go, to catch issues
early.
3. use a BinaryTrie and not a TransitionTrie, as evm t8n doesn't support
state trie transitions at this stage.
1. TransitionTrie doesn't work at this stage, we are only testing the
binary trees at genesis, so let's make sure TransitionTrie isn't
activated anywhere.
2. There was a superfluous check for Transitioned(), which actually made
the code take the wrong turn: if the verkle fork has occured, and if the
transition isn't in progress, it means that the conversion has ended*

* to be complete, it could be that this is in the period before the
start of the iterator sweep, but in that case the transition tree also
needs to be returned. So there is a missing granularity in this code,
but it's ok at this stage since the transition isn't supported yet.
The codebase uses IsVerkle flag to indicate Binary Trie mode (not renamed
to avoid large diff). However, OpenTrie was incorrectly creating a VerkleTrie
which expects Verkle node format, while we store Binary Trie nodes.

This mismatch caused "invalid serialized node length" errors when VerkleTrie
tried to deserialize Binary Trie nodes.

We left some instantiation of `VerkleTree` arround which we have found
and changed too here.
The Commit function incorrectly assumed that the root node was always
an InternalNode, causing a panic when the root was a StemNode:
"interface conversion: bintrie.BinaryNode is *bintrie.StemNode, not *bintrie.InternalNode"

Changes:
- Remove type assertion `root := t.root.(*InternalNode)`
- Call CollectNodes directly on t.root which works for any BinaryNode type
- Add comment explaining the root can be any BinaryNode type

This fix allows BinaryTrie to properly commit trees where the root is
a StemNode, which can happen in small tries or during initial setup.
Previously, state.New errors were silently ignored, leading to nil pointer
panics later when statedb was accessed. This made debugging difficult as
the actual error was hidden.

Changes:
- Add explicit error checking when creating initial statedb
- Add explicit error checking when re-opening statedb after commit
- Include meaningful error messages with context (e.g., root hash)
This commit fixes two in the BinaryTrie iterator:

1. Fixed Leaf() method to only return true when positioned at a specific
   non-nil value within a StemNode, not just at the StemNode itself.
   The iterator tracks position using an Index that points to the NEXT
   position after the current value.

2. Fixed stack underflow in Next() method by checking if we're at the
   root before popping from the stack. This prevents index out of range
   errors when iterating through the tree.

These fixes resolve panics in DumpBinTrieLeaves when generating VKT
output for Binary Tries in the t8n tool.
When inserting values into an InternalNode with nil children, the code
was attempting to dereference nil pointers, causing panics. This fix
ensures that nil children are initialized to Empty{} nodes before
attempting to call methods on them.

This resolves crashes when building the Binary Trie from an empty or
sparse initial state.
When in Binary Trie mode (isEIP4762), the state.New() call after
commit was creating a new StateDB with MPT trie, losing the Binary
Trie. This resulted in nil trie when attempting to dump Binary Trie
leaves.

Fix: Skip state.New() for Binary Trie mode since the trie is already
correct after commit. Only reopen state for MPT mode.
…d name

This commit fixes two issues that prevented VKT (Binary Trie) data from
being correctly generated and passed to execution-spec-tests:

1. Binary Trie Preservation (execution.go):
   After statedb.Commit(), the code was calling state.New() which
   recreated the StateDB with an MPT trie, losing the Binary Trie.
   For Binary Trie mode (EIP-4762), we now skip state.New() since
   the trie is already correct after commit.

2. JSON Field Name (transition.go):
   The dispatch function was using name="bt" which created a JSON
   output field called "bt", but execution-spec-tests expects the
   field to be called "vkt" for compatibility with the upstream
   implementation. Changed to use name="vkt" instead. So translating,
   I'm dumb and I shouldn't have changed them..

These fixes allow execution-spec-tests to successfully extract Binary
Trie leaf data from t8n output via stdout JSON parsing.

Fixes test failures in:
- tests/verkle/eip6800_genesis_verkle_tree/test_contract_creation.py
Prior, the code was working such that we were getting `nil` when re-opening. We weren't using `TransitionTree` thus reader and other interfaces were working hardcoded to MPT. This was fixed in prev. commits. Thus, this was no longer needed.
There was a leftover as we modified already in `transition.go` flagnames to be `vkt` again and prevent errors for now.
Replace magic numbers with named constants for better code clarity:
- NodeTypeBytes = 1 (size of node type prefix)
- HashSize = 32 (size of a hash)
- BitmapSize = 32 (size of the bitmap in stem nodes)

This makes the serialization format more self-documenting and easier
to maintain.
Replace hardcoded values (31, 32, 256) with named constants throughout
the Binary Trie implementation for improved code maintainability.

- Add NodeWidth (256), StemSize (31), HashSize (32), BitmapSize (32), NodeTypeBytes (1) constants
- Update all serialization, deserialization, and node operations to use constants
- Revert error wrapping to maintain test compatibility
Implement GetBinaryTreeKeyBasicData function similar to GetBinaryTreeKeyCodeHash
but with offset 0 (BasicDataLeafKey) instead of 1, as per EIP-7864 tree embedding spec.
The constant NodeWidth was renamed to StemNodeWidth to better reflect
its purpose as the number of children per stem node (256 values).
This change ensures consistency across the Binary Trie implementation
and aligns with the EIP-7864 specification for Binary Merkle Tree format.

Changes made:
- Updated constant definition in binary_node.go from NodeWidth to StemNodeWidth
- Replaced all references across the bintrie package files
- Ensures compilation succeeds and all tests pass

This fix was necessary after recent refactoring of the Binary Trie code
that replaced the Verkle tree implementation at genesis.
This commit adds support for dumping state from Binary Trie (the new
storage format replacing Verkle at genesis) to enable multi-block test
execution in the t8n tool.

The t8n tool was failing on multi-block withdrawal tests with "nonce
too high" errors. Investigation revealed that state wasn't being
persisted between blocks when using Binary Trie mode because
DumpToCollector couldn't handle the Binary Trie format.

Binary Trie stores data differently than MPT:
- MPT: RLP-encoded StateAccount structures
- Binary Trie: Raw bytes with specific offsets (per EIP-7864)
  - Nonce at offset 8 (8 bytes, big-endian)
  - Balance at offset 16 (16 bytes, big-endian)
  - Code hash in separate leaf (suffix byte 1)

Implemented dumpBinaryTrieToCollector function that:
1. Iterates through Binary Trie leaf nodes
2. Extracts account data from raw byte format
3. Maps hashed keys back to Ethereum addresses
4. Reconstructs full account state for the collector

- Added Binary Trie detection in DumpToCollector
- Created specialized dumping logic for Binary Trie format
- Documented each step with extensive comments
- Added panic for unrecoverable addresses (temporary)

Currently uses stateObjects cache to map keys to addresses, which only
includes modified accounts. Future improvements needed:
- Implement preimage database (like MPT)
- Or iterate all known addresses
- Or store addresses in leaf values

All 4 previously failing withdrawal tests now pass:
- test_use_value_in_contract
- test_many_withdrawals
- test_self_destructing_account
- test_balance_within_block

This enables proper multi-block test execution with Binary Trie.
@CPerezz CPerezz requested a review from gballet as a code owner September 29, 2025 22:19
Comment on lines 200 to 207
s.DumpToCollector(collector, nil)
isBinary := chainConfig.IsVerkle(big.NewInt(int64(prestate.Env.Number)), prestate.Env.Timestamp)
if !isBinary {
s.DumpToCollector(collector, nil)
} else {
if isBinary {
// For Binary Trie mode, we need to dump both:
// 1. The state to collector (alloc) for multi-block tests to maintain state between blocks
// 2. The Binary Trie leaves for witness data
// Without the collector dump, the test framework cannot track state changes (like nonce updates)
// between blocks, causing tests with multiple blocks (e.g., withdrawal tests) to fail.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was dumb. But it allowed a nil pointer to be fixed. Now, this fixes the core issue behind the scenes.

Comment on lines +285 to +289
// Step 4: Parse the key structure
// First 31 bytes: stem (encodes the account address)
// Last byte: suffix (indicates the type of data)
stem := string(leafKey[:31])
suffixByte := leafKey[31]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday's explanation came in handy!

Comment on lines +318 to +334
// Step 9: Map the Binary Trie key back to an Ethereum address
// This is the challenging part: Binary Trie keys are hashed versions of addresses
// We need to find which address maps to this particular key
//
// Current approach: (Made up by Claude) ->
// Iterate through known addresses in stateObjects
// and check if their Binary Trie key matches our leaf key
var foundAddr *common.Address
for addr := range s.stateObjects {
// Generate the Binary Trie key for this address
testKey := bintrie.GetBinaryTreeKeyBasicData(addr)
if bytes.Equal(testKey, leafKey) {
a := addr // Create a copy to avoid reference issues
foundAddr = &a
break
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Claude's idea. I did not know how to circumvent not implementing a preimage storage or something similar. But I wanted to avoid adding more complexity until I get an ACK that indeed something like this is needed. And there isn't a much simpler way (which I assume there has to).

Comment on lines +336 to +341
// Step 10: Error if we couldn't find the corresponding address
// This might happen for accounts not in the current state cache
if foundAddr == nil {
// TODO(@CPerezz): Figure out how to proceed.
panic("Binary Trie dump error: Cannot recover address from hash.")
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we do if this happens? Maybe we should not even be checking it?

Comment on lines +363 to +374
// Step 14: Fetch storage values if needed
if !conf.SkipStorage {
dumpAccount.Storage = make(map[common.Hash]string)
// TODO(CPerezz): Properly iterate through Binary Trie storage slots
// Storage slots are at suffix bytes 64+ in the Binary Trie
// Idea from Claude:
// Use the cached dirty storage from state objects
if obj := s.getStateObject(addr); obj != nil {
for key, value := range obj.dirtyStorage {
dumpAccount.Storage[key] = common.Bytes2Hex(value[:])
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugliest trick ever.
This was all Claude's idea. As I wasn't sure which was the way to do this. It makes the tests pass so it works. But damn I hope there's a better way. I just wasn't sure how could we proceeed.

@CPerezz
Copy link
Collaborator Author

CPerezz commented Sep 30, 2025

Deprecated in favour of #555

@CPerezz CPerezz closed this Sep 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants