Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine domain payloads and provide on-the-fly migration #3664

Conversation

fxamacker
Copy link
Member

@fxamacker fxamacker commented Oct 31, 2024

Closes #3584

Description

This PR:

  1. Combines all domain (non-atree) payloads into one account (non-atree) payload per account.
  2. Combines all domain (atree) storage maps into one account (atree) storage map per account.
  3. Uses on-the-fly (OTF) migration when accounts are modified (write ops), so this won't change idle or read-only accounts.

NOTE: This requires HCU to deploy. Full impact (e.g. eliminating 425+ million mtrie nodes, etc.) won't be seen until all accounts (including idle accounts with no write ops) are eventually migrated.

Context

Currently, accounts store data on-chain under pre-defined domains, such as "storage". Each domain requires domain payload (8-byte non-atree payload) and domain storage map payload (atree payload). Also, each payload requires ~2 mtrie nodes (~2x96 byte overhead).

New domains were added in Cadence 1.0 and domain payloads count increased to 150 million on Sept. 4 (was 80 million pre-spork). Nearly 25% of total payloads on-chain are 8-byte domain payloads. Each account on mainnet has an average of ~4 domain payloads and ~4 domain storage maps.

Solution

This PR creates 1 account (non-atree) payload and 1 account (atree) storage map per account, eliminating all domain (non-atree) payloads and domain storage maps for that given account.

Based on preliminary estimates using Sept. 17, 2024 mainnet state, this approach can:

  • eliminate mtrie nodes: -425 million (-28.5%)
  • reduce payload count: -174 million (also -28.5%)

This commit also includes on-the-fly migration so we can see improvements to accounts that have write activity without requiring downtime. Given this, we won't see the full benefits/impact until all accounts (including idle accounts) are eventually migrated (e.g. using full migration or other means).

Full Migration Test Results (Nov 4, 2024)

Full migration (on test vm) using this PR produced expected results:

  • input: mainnet26 state from Nov 1, 2024
  • duration: 7m31s to migrate all accounts after loading state (m1-ultramem-160, nw=60)
  • mtrie nodes reduction: ~439 million (-28.7%)
  • payload count reduction: ~179 million (also -28.7%)

Storage health check and diff-states on the migrated state did not detect any problems. 🎉

NOTE: On-the-fly migration will not produce full impact until all accounts (including idle and read-only accounts) are eventually migrated.

Initial Code Review Session (Nov 5, 2024)

We agreed on-the-fly migration is feasible because migrating each account avoids unbounded cost: accounts average 4.3 domains each and we can only migrate a max of 9 domains.

No bugs or showstoppers were found in the PR (yet) but the code can be refactored (e.g. more decoupling to reduce complexity). I used up ~12 out of 15 devdays to open PR and run full migration test.

Great suggestion by Bastian: it would be OK to replace domain name string with integer.

Bastian also suggested looking into possibility of avoiding HCU by using feature flag.

We agreed to create feature branch for this PR and open additional PRs to feature branch.

TODO

  • @fxamacker: design, implement, open PR (Oct 31, 2024)
  • @fxamacker: test full migration using mainnet state (Nov 4, 2024)
    • Storage health check and diff-states did not find any problems. 🎉
  • @turbolent and Cadence team: initial design & code review meeting (Nov 5, 2024)
  • @turbolent and Cadence team: PR review and feedback

Next:


  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work
  • Code follows the standards mentioned here
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

This commit:
1. Combines all domain (non-atree) payloads into
   one account (non-atree) payload per account.
2. Combines all domain (atree) storage maps into
   one account (atree) storage map per account.
3. Uses on-the-fly (OTF) migration when an
   account is modified (write ops).

Currently, accounts store data on-chain under pre-defined domains,
such as "storage".  Each domain requires domain payload
(8-byte non-atree payload) and domain storage map payload (atree payload).
Also, each payload requires ~2 mtrie nodes (~2x96 byte overhead).

New domains were added in Cadence 1.0 and domain payloads count
increased to 150 million on Sept. 4 (was 80 million pre-spork).
Nearly 25% of total payloads on-chain are 8-byte domain payloads.
Each account on mainnet has an average of ~4 domain payloads and
~4 domain storage maps.

This commit creates 1 account (non-atree) payload and 1 account
(atree) storage map per account, eliminating all domain (non-atree)
payloads and domain storage maps for that given account.

Based on preliminary estimates using Sept. 17, 2024 mainnet state,
this approach can:
- eliminate mtrie nodes: -425 million (-28.5%)
- reduce payload count: -174 million (also -28.5%)

This commit also includes on-the-fly migration so we can see
improvements to accounts that have write activity without requiring
downtime.  Given this, we won't see the full benefits/impact until
all accounts (including idle accounts) are eventually migrated
(e.g. using full migration or other means).
Copy link

github-actions bot commented Oct 31, 2024

Cadence Benchstat comparison

This branch with compared with the base branch onflow:feature/combine-domain-payloads-and-domain-storage-maps commit a56e521
The command for i in {1..N}; do go test ./... -run=XXX -bench=. -benchmem -shuffle=on; done was used.
Bench tests were run a total of 7 times on each branch.

Collapsed results for better readability

@fxamacker
Copy link
Member Author

fxamacker commented Nov 5, 2024

To test the impact of this PR, I created a full migration program yesterday and ran it on recent mainnet data (Nov 1, 2024 state).

The full migration program on a test vm produced expected results for reducing number of payloads and mtrie nodes by 28.7% each. 🎉

The plan is to deploy this PR as HCU and use its on-the-fly migration, so we won't see full impact until all accounts are eventually migrated (including idle and read-only accounts).

Also ran storage health check and diff-states after migration on Nov 4, 2024 with no problems detected.

@fxamacker
Copy link
Member Author

@turbolent and Cadence team, many thanks for code review session today!

Next steps (as discussed):

  • Create a feature branch and merge this PR into the feature branch (after approval)
  • Open new PRs to:
    • replace domain key string to integer in account storage map (great idea @turbolent!)
    • refactor to simplify code in storage (no problems spotted but code is complex)
    • look into maybe using a feature flag (to eliminate the need for HCU)

I will also cleanup and share the private gist I presented in the meeting.

@fxamacker fxamacker changed the base branch from master to feature/combine-domain-payloads-and-domain-storage-maps November 5, 2024 22:25
runtime/migrate_domain_registers.go Outdated Show resolved Hide resolved
Copy link
Member

@turbolent turbolent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I think we can maybe merge this into the feature branch to start with, and then we can open follow-up PRs as discussed, and finally have a review of the final feature branch when merging it into master

@fxamacker fxamacker merged commit b13fdba into feature/combine-domain-payloads-and-domain-storage-maps Nov 12, 2024
7 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Combine non-atree domain payloads into atree payloads
3 participants