Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add possibility to save trie nodes while re-applying blocks #11811

Merged
merged 6 commits into from
Jul 30, 2024

Conversation

Trisfald
Copy link
Contributor

Adds a new argument to apply and apply range commands to save computed trie nodes into hot or cold storage. Normally, this operation should be idempotent, but it can also be used to reconstruct missing trie nodes.

@Trisfald
Copy link
Contributor Author

Tested with the following procedure:

  1. Verify that

    curl -X POST http://35.204.169.85:3030 \
        -H "Content-Type: application/json" \
        -d '
        { "id": "dontcare", "jsonrpc": "2.0", "method": "query", "params": { "account_id": "b001b461c65aca5968a0afab3302a5387d128178c99ff5b2592796963407560a", "block_id": 109913260, "request_type": "view_account" } }'
    

    returns an error

  2. Stop neard

  3. Verify that

    ./neard view-state -t cold view-trie --shard-id 2 --shard-version 1 --max-depth 1000 --hash 36SkUU8tgetUtVL2a5JPwKB6F29yKBFjF5PFukZ8HRFH --from b001aea591ef68681e59a4149b1ab8bc56d8f22e34be24 --to b001c0de4c6929c5289b65044249830466ffea27680bc1 --format pretty --record-type account
    

    Gives MissingTrieValue error

  4. Run

    ./neard view-state -t cold --readwrite apply-range --start-index 109913255 --end-index 109913260 --shard-id 2 --storage trie-free --save-trie cold sequential
    
  5. Now step 3 doesn't fail anymore

  6. Start neard

  7. Step 1 returns the same result as read RPC

@Trisfald Trisfald marked this pull request as ready for review July 18, 2024 12:57
@Trisfald Trisfald requested a review from a team as a code owner July 18, 2024 12:57
Copy link

codecov bot commented Jul 18, 2024

Codecov Report

Attention: Patch coverage is 49.62963% with 136 lines in your changes missing coverage. Please review.

Project coverage is 71.68%. Comparing base (55d0df0) to head (4ca19f4).

Files Patch % Lines
core/store/src/db/recoverydb.rs 67.36% 47 Missing ⚠️
tools/state-viewer/src/commands.rs 18.86% 42 Missing and 1 partial ⚠️
tools/state-viewer/src/cli.rs 0.00% 30 Missing ⚠️
tools/state-viewer/src/apply_chain_range.rs 72.97% 8 Missing and 2 partials ⚠️
core/store/src/lib.rs 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11811      +/-   ##
==========================================
- Coverage   71.72%   71.68%   -0.05%     
==========================================
  Files         803      804       +1     
  Lines      163763   163996     +233     
  Branches   163763   163996     +233     
==========================================
+ Hits       117466   117564      +98     
- Misses      41231    41362     +131     
- Partials     5066     5070       +4     
Flag Coverage Δ
backward-compatibility 0.23% <0.00%> (-0.01%) ⬇️
db-migration 0.23% <0.00%> (-0.01%) ⬇️
genesis-check 1.34% <0.00%> (-0.01%) ⬇️
integration-tests 37.75% <0.00%> (-0.10%) ⬇️
linux 71.45% <49.62%> (-0.04%) ⬇️
linux-nightly 71.27% <49.62%> (-0.04%) ⬇️
macos 54.31% <49.62%> (-0.01%) ⬇️
pytests 1.61% <0.00%> (-0.01%) ⬇️
sanity-checks 1.41% <0.00%> (-0.01%) ⬇️
unittests 66.12% <49.62%> (-0.03%) ⬇️
upgradability 0.28% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very confused :)

You're storing the trie changes (DBCol::TrieChanges) but as far as I know this column is only used for garbage collection and cold migration. In particular I don't think it's used for answering rpc or viewing state with neard subcommands. I thought you'll be storing the trie nodes directly into the DBCol::State column which is where I believe the data is missing. That being said it seems like it worked somehow :) Can you have a look into how this works and explain it briefly for me?

@Trisfald
Copy link
Contributor Author

I'm very confused :)

You're storing the trie changes (DBCol::TrieChanges) but as far as I know this column is only used for garbage collection and cold migration. In particular I don't think it's used for answering rpc or viewing state with neard subcommands. I thought you'll be storing the trie nodes directly into the DBCol::State column which is where I believe the data is missing. That being said it seems like it worked somehow :) Can you have a look into how this works and explain it briefly for me?

Yes, I'm saving the trie changes explicitly although I believe it isn't strictly required.
From my understanding trie changes isn't the only column touched by this section of code https://github.com/near/nearcore/blob/master/chain/chain/src/store/mod.rs#L2533-L2550, for example insertions into DBCol::State and changes to memtries.

@Trisfald
Copy link
Contributor Author

Trisfald commented Jul 18, 2024

Updated PR:

  • disabled save-trie for hot DB: too dangerous
  • checked that cold DB operations are applied in the correct way
  • deletions shouldn't happen because their state apply isn't committed
  • hardcoded save_trie_changes to false, no need to write into this column

TODO:

  • check node state after GC
  • test save-trie on different heights / shards combinations

@Trisfald
Copy link
Contributor Author

On the positive side: recovered data is still present after leaving the node running over the weekend. Regenerating trie nodes is idempotent.

Not so positive: apply-range does not work at all for many ranges of blocks after 1st and 2nd resharding. The reason is that "missing trie values" break the apply stage of transactions. We'll need to proceed step by step and redo resharding before replaying blocks.

Overall I haven't found any inherent issue with this PR


/// Returns whether the operation should be kept or dropped.
fn keep_db_op(&self, op: &mut DBOp) -> bool {
let overwrites_same_data = |col: &mut DBCol, key: &mut Vec<u8>, value: &mut Vec<u8>| {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this pattern of read-before-write to avoid re-writing the entire DB. It should help in theory, but I don't see any speed up in practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at least it helps preventing the so-common write stalls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may not help now but it would be useful to have later. For example you can use this to write only the missing data to a separate db that can later be shared with node operators for recovery.

@saketh-are saketh-are removed their request for review July 25, 2024 15:43
Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks, this is great effort and a big leap!

I left a few nits but otherwise it's ready to go.

core/store/src/db/recoverydb.rs Show resolved Hide resolved
self.cold.iter_range(col, lower_bound, upper_bound)
}

/// Atomically applies operations in given transaction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also mention that it filters / adapts operations?


/// Returns whether the operation should be kept or dropped.
fn keep_db_op(&self, op: &mut DBOp) -> bool {
let overwrites_same_data = |col: &mut DBCol, key: &mut Vec<u8>, value: &mut Vec<u8>| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you refactor it to a helper?


/// Returns whether the operation should be kept or dropped.
fn keep_db_op(&self, op: &mut DBOp) -> bool {
let overwrites_same_data = |col: &mut DBCol, key: &mut Vec<u8>, value: &mut Vec<u8>| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may not help now but it would be useful to have later. For example you can use this to write only the missing data to a separate db that can later be shared with node operators for recovery.

Comment on lines 101 to 109
if col.is_rc() {
if let Ok(Some(old_value)) = self.get_with_rc_stripped(*col, &key) {
let value = DBSlice::from_vec(value.clone()).strip_refcount();
if let Some(value) = value {
if value == old_value {
return true;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, since you are only interested in State, you can

  • Assert that is_rc() is true and not worry about the other case
  • Only check that the key exists, without loading the value into memory to speed up the process. Other than the reference count the value at a key cannot be changed because the key is a hash of the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I've made the method a bit smarter with this new knowledge

Comment on lines 124 to 126
if !matches!(col, DBCol::State) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can move that check to before the match to make it a bit slimmer here. You can use the col method on DBOp to do that.

}

/// Returns whether the operation should be kept or dropped.
fn keep_db_op(&self, op: &mut DBOp) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you need mutable here? It seems like it shouldn't be necessary.

@@ -213,17 +223,27 @@ pub struct ApplyCmd {
shard_id: ShardId,
#[clap(long, default_value = "trie")]
storage: StorageSource,
/// Save the trie nodes generated by applying the block into the selected store (hot or cold).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also mention that it will only write to the state column?
Also maybe worth renaming to save_state or similar?

tools/state-viewer/src/commands.rs Show resolved Hide resolved
@Trisfald Trisfald added this pull request to the merge queue Jul 30, 2024
Merged via the queue into near:master with commit f3c219f Jul 30, 2024
28 of 30 checks passed
@Trisfald Trisfald deleted the add-save-trie-replay branch July 30, 2024 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants