
feat(state-sync): sync to the current epoch instead of the previous #12102

Open · wants to merge 25 commits into base: master

Conversation

marcelo-gonzalez (Contributor):

When a node processes a block that’s the first block of epoch T, and it realizes that it will need to track shards that it doesn’t currently track in epoch T+1, it syncs state to the end of epoch T-1 and then applies chunks until it’s caught up. We want to change this so that it syncs state to epoch T instead, so that the integration of state sync/catchup and resharding will be simpler.

In this PR, this is done by keeping most of the state sync logic unchanged, but changing the “sync_hash” that’s used to identify what point in the chain we want to sync to. Before, “sync_hash” was set to the first block of an epoch, and the existing state sync logic would have us sync the state as of two chunks before this hash. So here we change the sync hash to be the hash of the first block for which at least two new chunks have been seen for each shard in its epoch. This allows us to sync state to epoch T with minimal modifications, because the old logic is still valid.
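As a rough sketch of the idea (using simplified stand-in types rather than nearcore's actual Chain/EpochManager APIs), finding this sync hash amounts to walking the blocks of epoch T and counting new chunks per shard:

use std::collections::HashMap;

// Simplified stand-in for a block header: its hash plus, for each shard,
// whether this block includes a new chunk for that shard.
struct HeaderInfo {
    hash: [u8; 32],
    new_chunk_in_shard: HashMap<u64, bool>,
}

/// Returns the hash of the first block in the epoch for which every shard has
/// seen at least two new chunks, or None if no such block exists yet.
fn find_current_epoch_sync_hash(epoch_blocks: &[HeaderInfo]) -> Option<[u8; 32]> {
    let mut new_chunks_seen: HashMap<u64, u32> = HashMap::new();
    for header in epoch_blocks {
        for (&shard_id, &has_new_chunk) in &header.new_chunk_in_shard {
            let count = new_chunks_seen.entry(shard_id).or_insert(0);
            if has_new_chunk {
                *count += 1;
            }
        }
        // This block is the sync hash once all shards have at least two new chunks.
        if !new_chunks_seen.is_empty() && new_chunks_seen.values().all(|&c| c >= 2) {
            return Some(header.hash);
        }
    }
    None
}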

Note that this PR does not implement this new way of syncing for nodes that have fallen several epochs behind the chain, as opposed to nodes that need to catch up for an upcoming epoch. That can be done in a future PR.

@marcelo-gonzalez (Contributor, Author):

Btw, for reviewers, this PR is not quite ready to be submitted because it is only minimally tested on localnet, where I just checked that it syncs properly. You can try it with this:

diff --git a/core/primitives/src/epoch_manager.rs b/core/primitives/src/epoch_manager.rs
index 281be4399..b73b9348a 100644
--- a/core/primitives/src/epoch_manager.rs
+++ b/core/primitives/src/epoch_manager.rs
@@ -166,6 +166,8 @@ impl AllEpochConfig {
     pub fn generate_epoch_config(&self, protocol_version: ProtocolVersion) -> EpochConfig {
         let mut config = self.genesis_epoch_config.clone();
 
+        config.validator_selection_config.shuffle_shard_assignment_for_chunk_producers = true;
+
         Self::config_mocknet(&mut config, &self.chain_id);
 
         if !self.use_production_config {

Then if you run the transactions.py pytest, it should be able to finish after nodes sync state in the new way (you might have to comment out some asserts in that test that fail intermittently; it looks unrelated, but I will check it).

So before submitting, I need to try this with more meaningful state and traffic/receipts, probably on forknet. It would also be good to add some integration tests, and to fix whichever integration tests or pytests might have been broken by this. The FIXME comment in this PR also needs to be addressed before I can submit it. But in any case, it should mostly be ready for review.

One thing to be decided in this PR review is whether the gating via the current protocol version that I put in there looks okay. It feels kind of ugly to me, but it might be the easiest way to go.

/// is the first block of the epoch, these two meanings are the same. But if the sync_hash is moved forward
/// in order to sync the current epoch's state instead of last epoch's, this field being false no longer implies
/// that we want to apply this block during catchup, so some care is needed to ensure we start catchup at the right
/// point in Client::run_catchup()
pub(crate) is_caught_up: bool,
Contributor:

Is it possible to split this field into multiple fields (or an enum) to differentiate these meanings? It feels like the field being false can indicate either that we want to apply the chunks or that we don't, depending on other state such as sync_hash.
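For illustration only (not code from the PR), a hedged sketch of what such an enum might look like; the variant names and exact semantics are hypothetical:

// Hypothetical sketch of splitting the boolean into explicit states. The variant
// names and semantics are illustrative, not taken from nearcore.
enum BlockCatchupStatus {
    /// The node already tracked all relevant shards when this block was processed.
    CaughtUp,
    /// The block must be applied for newly tracked shards during catchup.
    NeedsCatchup,
    /// The sync hash for the new epoch is not yet known, so catchup bookkeeping is deferred.
    AwaitingSyncHash,
}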

Contributor (Author):

I think actually in both cases if this field is false, we don't want to apply the chunks for shards we don't currently track, and this logic should be the same:

fn get_should_apply_chunk(

I think we probably could split it, but it's a little bit tricky. Let me think about it, actually... For now, in this PR it is kept as-is so as not to touch too many things and possibly break something. The tricky part is that right now we add the first block of the epoch to the BlocksToCatchup column based on this field, which is then read to see if we'll need to catch up the next block after this one as well:

Ok((self.prev_block_is_caught_up(&prev_prev_hash, &prev_hash)?, None))

I guess where that is called, maybe we can just call get_state_sync_info() again and also check whether catchup is already done, but it requires some care.

@marcelo-gonzalez (Contributor, Author):

I actually just removed the test_mock_node_basic() test, since I think it's kind of outdated anyway. Hopefully nobody objects to that... I have some plans to delete the hacky part of the mock-node code that generates home dirs in favor of just providing the home dirs from mainnet or testnet up front anyway.

wacban (Contributor) left a comment:

I didn't manage to get through everything but I left some comments to keep you busy for now ;) Please have a look and let's reconvene.

It may be easier to review and merge this if you can split it down into smaller PRs. Not sure if it makes sense but something to consider.

I still need to catch up on the reading to understand why we need to have two new chunks in every shard. It would be much easier with one, so I want to make sure. I'll have another look at your proposal doc and come back.

Once this is done, can you run nayduck and make sure that all the tests are still passing?

I think it would be best to add this to nightly first and stabilize it only after DSS is released.

chain/chain/src/chain.rs (resolved review comment)
Comment on lines 787 to 790
let protocol_version =
self.epoch_manager.get_epoch_protocol_version(block.header().epoch_id())?;
let sync_hash =
if protocol_version < 72 { *block.header().hash() } else { CryptoHash::default() };
Contributor:

You should still use a proper protocol feature here. There is no harm in being more conservative and treating some features as protocol features.

@saketh-are How are you planning to roll out decentralized state sync? Will it also be gated by a protocol feature? I think that would be nice, even though it's not strictly necessary. Even with the fallback to cloud state sync, it would make for a cleaner transition during the period when nodes are upgrading from the old binary to the new one.

Contributor (Author):

made it a nightly feature

Comment on lines 787 to 790
let protocol_version =
self.epoch_manager.get_epoch_protocol_version(block.header().epoch_id())?;
let sync_hash =
if protocol_version < 72 { *block.header().hash() } else { CryptoHash::default() };
Contributor:

What is CryptoHash::default? Is it meant to be a magic value to indicate that state sync should be delayed until there is a chunk in every shard? We should not rely on magic values; instead you can use an enum to fully capture the meaning, e.g.

enum StateSyncHandle { // or id, anchor, reference, ..., whatever name feels right
  None,
  Waiting,
  Hash(CryptoHash), // Or perhaps even OldStateSyncHash and NewStateSyncHash to further differentiate between the old implementation and the new one. The names should be improved :) 
}

Contributor (Author):

I agree that magic values are ugly in general, but here it’s done to avoid the need for a database migration.

So before this change, this field was always set to the hash of the first block of the epoch (which is the sync_hash we use for state sync), and that hash is also the key we use to look up this value. But the sync_hash field in the StateSyncInfo struct is actually only ever checked once, where we just verify that it's equal to the key that was used to look up the struct.

So this sync_hash field is really just redundant data that is not used meaningfully, which is why I took the opportunity to change the way we set it: it can be done with logic that's consistent before and after the binary upgrade. We just set it to zero after the protocol change, and when we read it back, if the field is set then that's the sync hash we want to use; otherwise we're still looking for the sync hash.

So all of this is just to avoid a DB migration. I felt that if we could do so, it's not so bad to accept slightly ugly code with a magic value of zeros, because a DB migration is a nonzero amount of development/UX churn. But wdyt? I could of course just do this the right way and add a DB migration if we don't think it's such a big deal.
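A minimal sketch of the read-side interpretation described above, under the assumption that an all-zero hash means the sync hash hasn't been chosen yet (the types here are stand-ins for nearcore's):

// Sketch only: an all-zero sync_hash is treated as "still waiting for the sync
// hash to be determined", matching the magic-value scheme described above.
#[derive(Clone, Copy, PartialEq, Eq, Default)]
struct CryptoHash([u8; 32]);

fn sync_hash_if_known(stored_sync_hash: CryptoHash) -> Option<CryptoHash> {
    if stored_sync_hash == CryptoHash::default() {
        // Written after the change: the sync hash hasn't been found yet.
        None
    } else {
        // Written by the old logic: this is the first block of the epoch.
        Some(stored_sync_hash)
    }
}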

Contributor:

Thanks for sharing the context; I didn't realize it would necessitate a DB migration.

Still, with state sync being in a pretty bad place as it is, I wouldn't want to add more tech debt to it. I don't think a DB migration is a big deal even if it takes a bit of extra effort. I would expect the state sync info to be minimal in size, so the migration should be really quick too. Can you try it and see how bad it is?

Comment on lines 1806 to 1809
// 2) Start creating snapshot if needed.
if let Err(err) = self.process_snapshot() {
if let Err(err) = self.process_snapshot(block.header()) {
tracing::error!(target: "state_snapshot", ?err, "Failed to make a state snapshot");
}
Contributor:

Unrelated to this PR, I'm curious why process_snapshot happens in start_process_block. It's not important, but if you happen to know, please tell :)

Contributor (Author):

This is a very good question; I am not sure… I think it should be quite possible to just decouple this, since it's not needed in any block/chunk processing. My guess is that it was simply easy to express as a callback that gets called every time we update the HEAD, in the way you could imagine the indexer being rewritten to work (instead of polling every so often for a new block).

Contributor:

I'm fine with having this check happen on every block; it's a common pattern. It just seems strange that it happens before the block is applied. If the block turns out to be invalid, the node may take an "invalid" snapshot, for some definition of invalid.

Contributor (Author):

Wait, yeah, that is a good point... and I feel like not only should we wait until it's applied, but maybe we should wait until it's final too.

@@ -2407,6 +2414,84 @@ impl Chain {
)
}

fn get_epoch_start_sync_hash_impl(
Contributor:

Without further context, this method seems very inefficient: in the worst case it would need to iterate over all the blocks in an epoch to get to the first one. Looking at what we have in the DB, the EpochStart column and get_epoch_start_from_epoch_id may already have what you need.
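A hedged sketch of the suggestion, with a toy in-memory store standing in for the actual EpochStart column and chain APIs:

use std::collections::HashMap;

// Toy stand-in for the relevant lookups; not nearcore's real store API.
struct EpochStartIndex {
    // epoch_id -> height of the first block of that epoch
    epoch_start_height: HashMap<String, u64>,
    // height -> hash of the canonical block at that height
    block_hash_by_height: HashMap<u64, [u8; 32]>,
}

impl EpochStartIndex {
    /// Constant-time lookup of the first block hash of an epoch, instead of
    /// walking backwards block by block until the epoch boundary.
    fn epoch_start_hash(&self, epoch_id: &str) -> Option<[u8; 32]> {
        let height = self.epoch_start_height.get(epoch_id)?;
        self.block_hash_by_height.get(height).copied()
    }
}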

Contributor (Author):

Yeah I agree, although here I’m actually just moving the existing code from here. For now I just left it as-is without trying to optimize it any further, but maybe in another PR we should do something like that

Contributor:

Cool, it's fine if you want to do it in a follow-up; can you just add a TODO in there?

@@ -249,13 +249,6 @@ impl StoreValidator {
DBCol::StateDlInfos => {
let block_hash = CryptoHash::try_from(key_ref)?;
let state_sync_info = StateSyncInfo::try_from_slice(value_ref)?;
// StateSyncInfo is valid
self.check(
Contributor:

Why did you remove this check?

Contributor (Author):

Also see the comment above about avoiding the DB migration

Comment on lines 114 to 119
pub struct CatchupState {
pub state_sync: StateSync,
pub state_downloads: HashMap<u64, ShardSyncDownload>,
pub catchup: BlocksCatchUpState,
}

Contributor:

add comments please

Contributor (Author):

done

Comment on lines 156 to 158
/// A mapping from the first block of an epoch that we know needs to be state synced for some shards
/// to a tracker that will find an appropriate sync_hash for state sync
catchup_tracker: HashMap<CryptoHash, NewChunkTracker>,
Contributor:

mini nit: Intuitively it would make sense to track state sync per epoch (epoch_id). Would it make sense in practice?

Contributor (Author):

yup, agreed

Contributor:

Perhaps we could even change the sync hash to a state sync ID that is the epoch ID. I'm not sure whether it makes sense, how much work it would take, or whether it's worth it, though.

chain/client/src/client.rs (outdated review comment, resolved)
chain/client/src/client.rs (outdated review comment, resolved)
@marcelo-gonzalez (Contributor, Author):

> It may be easier to review and merge this if you can split it down into smaller PRs. Not sure if it makes sense but something to consider.

Actually, here I tried to make the individual commits inside this PR reasonably self-contained, so hopefully those are easier to look at?

@marcelo-gonzalez (Contributor, Author):

So I added a DB migration to store a new, cleaner version of the StateSyncInfo struct. I added a state_sync_version field to it, currently just set to 0, but I think it might be helpful if we want to change the meaning of the sync hash in the future. We really should only need to wait for one new chunk per shard; the reason we wait for two in this PR is that it's easier to implement a first version that way. But if we want to make that change later, we'll need some way to indicate what we mean by the sync hash.
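As a rough illustration of how that version field could be used (the constant names, and any value other than the initial 0, are hypothetical rather than taken from the PR):

// Hypothetical sketch: dispatching on state_sync_version so the sync-hash
// semantics can change later without another DB migration.
const TWO_NEW_CHUNKS_PER_SHARD: u32 = 0; // what this PR writes
const ONE_NEW_CHUNK_PER_SHARD: u32 = 1; // possible future meaning

/// How many new chunks per shard the stored sync hash waits for,
/// or None if the version was written by a newer, unknown binary.
fn required_new_chunks(state_sync_version: u32) -> Option<u32> {
    match state_sync_version {
        TWO_NEW_CHUNKS_PER_SHARD => Some(2),
        ONE_NEW_CHUNK_PER_SHARD => Some(1),
        _ => None,
    }
}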

Also see this commit, heheh... 78d6a75. I removed the part that says "before submitting", because I think it's probably okay to submit it as-is for now; it should mostly be fine as long as we make sure to fix it before this makes it into a release. Fixing that seems like a pretty well-scoped individual PR that would be a little easier to review on its own.

let me know what ppl think

@marcelo-gonzalez (Contributor, Author):

Also, there's a test here to make sure the migration from the previous StateSyncInfo to the new one works okay: https://github.com/marcelo-gonzalez/nearcore/blob/catchup-migration-test/pytest/tests/sanity/state_sync_info_migration.py

I don't think that's really worth adding here, but it's good to sanity-check it once.

@marcelo-gonzalez (Contributor, Author):

aghh, just noticed the nightly test is failing... will fix it tmr

wacban (Contributor) left a comment:

some partial review, to be continued

Comment on lines 786 to 788
// This is not the way protocol features are normally gated, but this is not actually
// a protocol feature/change. The protocol version is just an easy way to coordinate
// among nodes for now
Contributor:

nit: please update the comment; now it is exactly the way protocol features are gated ;)

// among nodes for now
let protocol_version =
self.epoch_manager.get_epoch_protocol_version(block.header().epoch_id())?;
let sync_hash = if checked_feature!("stable", StateSyncHashUpdate, protocol_version) {
Contributor:

nit: ProtocolFeature now has an enabled method that is a nice alternative to checked_feature. Up to you; both are fine.

Comment on lines 3909 to +3914
/// Function to check whether we need to create a new snapshot while processing the current block
/// Note that this functions is called as a part of block preprocesing, so the head is not updated to current block
fn should_make_or_delete_snapshot(&mut self) -> Result<(bool, bool), Error> {
fn should_make_or_delete_snapshot(
&mut self,
next_block: &BlockHeader,
) -> Result<(bool, bool), Error> {
Contributor:

Actually, can we just make this a bit simpler and only rely on the head? It should be fine if we take the snapshot one block later.

/// is that when syncing to the current epoch's state, we currently wait for two new chunks in each shard, but
/// with some changes to the meaning of the "sync_hash", we should only need to wait for one. So this is included
/// in order to allow for this change in the future without needing another database migration.
pub state_sync_version: u32,
Contributor:

The more idiomatic approach to versioning would be to make StateSyncInfo an enum and have each version be a variant of that enum, e.g. how we do it for blocks, chunks, etc.

Would that make sense here? Any pros or cons to each approach?
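For reference, a hedged sketch of that enum-variant versioning pattern; the field layouts here are placeholders, not the PR's actual definitions:

// Illustrative only: each version is a variant, so the variant tag itself
// encodes the version and no separate state_sync_version field is needed.
struct StateSyncInfoV0 {
    sync_hash: [u8; 32],
    shards: Vec<u64>,
}

struct StateSyncInfoV1 {
    // Zeroed until the "new chunks per shard" sync hash is known.
    sync_hash: [u8; 32],
    shards: Vec<u64>,
}

enum VersionedStateSyncInfo {
    V0(StateSyncInfoV0),
    V1(StateSyncInfoV1),
}

impl VersionedStateSyncInfo {
    fn shards(&self) -> &[u64] {
        match self {
            VersionedStateSyncInfo::V0(info) => &info.shards,
            VersionedStateSyncInfo::V1(info) => &info.shards,
        }
    }
}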

Comment on lines +2475 to +2477
header = if header.hash() == next_header.prev_hash() {
next_header.clone()
} else {
Contributor:

Why do you need to special-case the next header? Is it because it's not yet stored in the chain? Can you add a comment about this logic, please?

@@ -237,7 +237,7 @@ impl<'a> ChainUpdate<'a> {
);
}
if let Some(state_sync_info) = state_sync_info {
self.chain_store_update.add_state_sync_info(state_sync_info);
self.chain_store_update.add_state_sync_info(*block.hash(), state_sync_info);
Contributor:

Can this be done now?

/// Manages downloading the state.
pub state_sync: StateSync,
/// Keeps track of state downloads, and gets passed to `state_sync`.
pub state_downloads: HashMap<u64, ShardSyncDownload>,
Contributor:

nit: please use ShardId

@@ -111,6 +111,17 @@ pub enum AdvProduceBlocksMode {
OnlyValid,
}

/// The state associated with downloading state for a shard this node will track in the
/// future but does not currently.
pub struct CatchupState {
Contributor:

❤️

Comment on lines +245 to +249
/// This keeps track of the number of new chunks seen in each shard since the block that was passed to new().
/// This whole thing could be replaced with a much simpler function that just computes the number of new chunks
/// in each shard from scratch every time we call it, but in the unlikely and unfortunate case where a shard
/// hasn't had any chunks for a very long time, it would end up being a nontrivial inefficiency to do that
/// every time run_catchup() is called.
Contributor:

How about the approach of using height_included?

}
}

fn record_new_chunks(&mut self, header: &BlockHeader) -> Result<bool, Error> {
Contributor:

This looks like a duplicate of get_current_epoch_sync_hash; is it feasible to refactor it?
