Skip to content

Commit

Permalink
test: testloop stub for resharding v3 (#12156)
Browse files Browse the repository at this point in the history
### Goal

Write stub for test for resharding v3 switch. For this, I want the chain
to switch between shard layouts.
And for that, I switch to `EpochConfigStore` as much as I can, which
implies skipping `use_production_config`, overrides like
`AllEpochConfig::config_max_kickout_stake`, `EpochConfig` generations
from `GenesisConfig`. This is a big step towards #11265.

The most visible changes are: 
Now TestLoop generates `genesis_and_epoch_config_store` instead of just
`genesis`. Later we should have a separate `EpochConfigStoreBuilder`
which may accept some data shared between genesis and epoch configs,
e.g. validators set. This is done to minimise changes.

`EpochManager::new_arc_handle` is the way how epoch manager is
constructed on production nodes. Its logic is changed as follows:
* if chain = mainnet/testnet, only `EpochConfigStore::for_chain_id` is
used for getting epoch configs.
* if `chain_id.starts_with("test-chain-")`, we use only
`EpochConfig::from(genesis_config)` (see below!)
* otherwise, we use only `Genesis::test_epoch_config`. **It doesn't use
any genesis data**, just stays in this crate for now for convenience.
This is for simple tests in single module.

### Achievements

* `test_fix_min_stake_ratio` tests exactly what we want - we take
`EpochConfigStore::for_chain_id("mainnet")` and see that it allows to
include small validator after protocol upgrade.
* In `test_resharding_v3` we define old and new shard layouts, and test
the switch explicitly without hidden overrides.
* Usage of hacky overrides is reduced. For example,
`EpochManager::new_from_genesis_config_with_test_overrides` is removed.
* If we want to launch forknet with custom epoch config, the behaviour
will be more straightforward. For example, one can copy latest epoch
config from mainnet to mocknet/ folder and add new condition to
`for_epoch_id` for custom mocknet chain name.

### Failures

Nayduck often configures epoch config through genesis, e.g. by setting
`block_producer_kickout_threshold` to 0. It is much more work to change
this configuration, so I add a hack: if chain_id starts with
`test-chain-` - name which nayduck uses - epoch config is derived from
genesis. Many old integration tests use this chain id as well.
However, the improvement here is that we generate only **one** epoch
config, without any overrides.

epoch_length is sometimes taken from `ChainGenesis`, not from
`EpochConfig`. To be safe, I set epoch length in both genesis and epoch
configs.

This still lacks testing on live node. Using this on canary or forknet
could be insightful.
  • Loading branch information
Longarithm authored Oct 1, 2024
1 parent 359564c commit 810e820
Show file tree
Hide file tree
Showing 32 changed files with 574 additions and 271 deletions.
6 changes: 5 additions & 1 deletion chain/chain/src/runtime/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -101,7 +101,11 @@ impl NightshadeRuntime {
let runtime = Runtime::new();
let trie_viewer = TrieViewer::new(trie_viewer_state_size_limit, max_gas_burnt_view);
let flat_storage_manager = FlatStorageManager::new(store.flat_store());
let shard_uids: Vec<_> = genesis_config.shard_layout.shard_uids().collect();
let epoch_config = epoch_manager
.read()
.get_config_for_protocol_version(genesis_config.protocol_version)
.unwrap();
let shard_uids: Vec<_> = epoch_config.shard_layout.shard_uids().collect();
let tries = ShardTries::new(
store.trie_store(),
trie_config,
Expand Down
1 change: 0 additions & 1 deletion chain/chunks/src/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,6 @@ impl ShardedTransactionPool {
"resharding the transaction pool"
);
debug_assert!(old_shard_layout != new_shard_layout);
debug_assert!(old_shard_layout.version() + 1 == new_shard_layout.version());

let mut transactions = vec![];

Expand Down
6 changes: 5 additions & 1 deletion chain/client/src/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2467,7 +2467,10 @@ impl Client {
assert_eq!(sync_hash, state_sync_info.epoch_tail_hash);
let network_adapter = self.network_adapter.clone();

let shards_to_split = self.get_shards_to_split(sync_hash, &state_sync_info, &me)?;
// I *think* this is not relevant anymore, since we download
// already the next epoch's state.
// let shards_to_split = self.get_shards_to_split(sync_hash, &state_sync_info, &me)?;
let shards_to_split = HashMap::new();
let state_sync_timeout = self.config.state_sync_timeout;
let block_header = self.chain.get_block(&sync_hash)?.header().clone();
let epoch_id = block_header.epoch_id();
Expand Down Expand Up @@ -2574,6 +2577,7 @@ impl Client {
///
/// Returns a map from the shard_id to ShardSyncDownload only for those
/// shards that need to be split.
#[allow(unused)]
fn get_shards_to_split(
&mut self,
sync_hash: CryptoHash,
Expand Down
26 changes: 14 additions & 12 deletions chain/client/src/sync/state.rs
Original file line number Diff line number Diff line change
Expand Up @@ -291,7 +291,7 @@ impl StateSync {
panic!("Resharding V2 scheduling is no longer supported")
}
ShardSyncStatus::ReshardingApplying => {
panic!("Resharding V2 scheduling is no longer supported")
panic!("Resharding V2 applying is no longer supported")
}
ShardSyncStatus::StateSyncDone => {
shard_sync_done = true;
Expand Down Expand Up @@ -1013,7 +1013,7 @@ impl StateSync {
shard_uid: ShardUId,
chain: &mut Chain,
sync_hash: CryptoHash,
need_to_reshard: bool,
#[allow(unused)] need_to_reshard: bool,
shard_sync_download: &mut ShardSyncDownload,
) -> Result<bool, near_chain::Error> {
// Keep waiting until our shard is on the list of results
Expand All @@ -1026,16 +1026,18 @@ impl StateSync {
result?;

chain.set_state_finalize(shard_uid.shard_id(), sync_hash)?;
if need_to_reshard {
// If the shard layout is changing in this epoch - we have to apply it right now.
let status = ShardSyncStatus::ReshardingScheduling;
*shard_sync_download = ShardSyncDownload { downloads: vec![], status };
} else {
// If there is no layout change - we're done.
let status = ShardSyncStatus::StateSyncDone;
*shard_sync_download = ShardSyncDownload { downloads: vec![], status };
shard_sync_done = true;
}
// I *think* this is not relevant anymore, since we download
// already the next epoch's state.
// if need_to_reshard {
// // If the shard layout is changing in this epoch - we have to apply it right now.
// let status = ShardSyncStatus::ReshardingScheduling;
// *shard_sync_download = ShardSyncDownload { downloads: vec![], status };
// }

// If there is no layout change - we're done.
let status = ShardSyncStatus::StateSyncDone;
*shard_sync_download = ShardSyncDownload { downloads: vec![], status };
shard_sync_done = true;

Ok(shard_sync_done)
}
Expand Down
48 changes: 44 additions & 4 deletions chain/client/src/test_utils/test_env_builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,14 +17,14 @@ use near_epoch_manager::{EpochManager, EpochManagerAdapter, EpochManagerHandle};
use near_network::test_utils::MockPeerManagerAdapter;
use near_parameters::RuntimeConfigStore;
use near_primitives::epoch_info::RngSeed;
use near_primitives::epoch_manager::AllEpochConfigTestOverrides;
use near_primitives::epoch_manager::{AllEpochConfigTestOverrides, EpochConfig, EpochConfigStore};
use near_primitives::test_utils::create_test_signer;
use near_primitives::types::{AccountId, NumShards};
use near_store::config::StateSnapshotType;
use near_store::test_utils::create_test_store;
use near_store::{NodeStorage, ShardUId, Store, StoreConfig, TrieConfig};
use near_vm_runner::{ContractRuntimeCache, FilesystemContractRuntimeCache};
use std::collections::HashMap;
use std::collections::{BTreeMap, HashMap};
use std::path::PathBuf;
use std::sync::Arc;

Expand All @@ -47,6 +47,7 @@ impl EpochManagerKind {
pub struct TestEnvBuilder {
clock: Option<Clock>,
genesis_config: GenesisConfig,
epoch_config_store: Option<EpochConfigStore>,
clients: Vec<AccountId>,
validators: Vec<AccountId>,
home_dirs: Option<Vec<PathBuf>>,
Expand Down Expand Up @@ -78,6 +79,7 @@ impl TestEnvBuilder {
Self {
clock: None,
genesis_config,
epoch_config_store: None,
clients,
validators,
home_dirs: None,
Expand Down Expand Up @@ -114,6 +116,12 @@ impl TestEnvBuilder {
self
}

pub fn epoch_config_store(mut self, epoch_config_store: EpochConfigStore) -> Self {
assert!(self.epoch_config_store.is_none(), "Cannot set epoch_config_store twice");
self.epoch_config_store = Some(epoch_config_store);
self
}

/// Sets random seed for each client according to the provided HashMap.
pub fn clients_random_seeds(mut self, seeds: HashMap<AccountId, RngSeed>) -> Self {
self.seeds = seeds;
Expand Down Expand Up @@ -256,12 +264,32 @@ impl TestEnvBuilder {
"Cannot set both num_shards and epoch_managers at the same time"
);
let ret = self.ensure_stores();

// TODO(#11265): consider initialising epoch config separately as it
// should be decoupled from the genesis config.
// However, there are a lot of tests which only initialise genesis.
let mut base_epoch_config: EpochConfig = (&ret.genesis_config).into();
if let Some(block_producer_kickout_threshold) =
test_overrides.block_producer_kickout_threshold
{
base_epoch_config.block_producer_kickout_threshold = block_producer_kickout_threshold;
}
if let Some(chunk_producer_kickout_threshold) =
test_overrides.chunk_producer_kickout_threshold
{
base_epoch_config.chunk_producer_kickout_threshold = chunk_producer_kickout_threshold;
}
let epoch_config_store = EpochConfigStore::test(BTreeMap::from_iter(vec![(
ret.genesis_config.protocol_version,
Arc::new(base_epoch_config),
)]));

let epoch_managers = (0..ret.clients.len())
.map(|i| {
EpochManager::new_arc_handle_with_test_overrides(
EpochManager::new_arc_handle_from_epoch_config_store(
ret.stores.as_ref().unwrap()[i].clone(),
&ret.genesis_config,
Some(test_overrides.clone()),
epoch_config_store.clone(),
)
})
.collect();
Expand All @@ -274,6 +302,18 @@ impl TestEnvBuilder {
if ret.epoch_managers.is_some() {
return ret;
}
if let Some(epoch_config_store) = &ret.epoch_config_store {
let epoch_managers = (0..ret.clients.len())
.map(|i| {
EpochManager::new_arc_handle_from_epoch_config_store(
ret.stores.as_ref().unwrap()[i].clone(),
&ret.genesis_config,
epoch_config_store.clone(),
)
})
.collect();
return ret.epoch_managers(epoch_managers);
}
ret.epoch_managers_with_test_overrides(AllEpochConfigTestOverrides::default())
}

Expand Down
99 changes: 74 additions & 25 deletions chain/epoch-manager/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,12 @@

use crate::metrics::{PROTOCOL_VERSION_NEXT, PROTOCOL_VERSION_VOTES};
use near_cache::SyncLruCache;
use near_chain_configs::GenesisConfig;
use near_chain_configs::{Genesis, GenesisConfig};
use near_primitives::block::{BlockHeader, Tip};
use near_primitives::epoch_block_info::{BlockInfo, SlashState};
use near_primitives::epoch_info::EpochInfo;
use near_primitives::epoch_manager::{
AllEpochConfig, AllEpochConfigTestOverrides, EpochConfig, EpochSummary, ShardConfig,
AGGREGATOR_KEY,
AllEpochConfig, EpochConfig, EpochConfigStore, EpochSummary, ShardConfig, AGGREGATOR_KEY,
};
use near_primitives::errors::EpochError;
use near_primitives::hash::CryptoHash;
Expand Down Expand Up @@ -179,17 +178,8 @@ impl EpochManager {
store: Store,
genesis_config: &GenesisConfig,
) -> Result<Self, EpochError> {
Self::new_from_genesis_config_with_test_overrides(store, genesis_config, None)
}

pub fn new_from_genesis_config_with_test_overrides(
store: Store,
genesis_config: &GenesisConfig,
test_overrides: Option<AllEpochConfigTestOverrides>,
) -> Result<Self, EpochError> {
let reward_calculator = RewardCalculator::new(genesis_config);
let all_epoch_config =
Self::new_all_epoch_config_with_test_overrides(genesis_config, test_overrides);
let reward_calculator = RewardCalculator::new(genesis_config, genesis_config.epoch_length);
let all_epoch_config = Self::new_all_epoch_config(genesis_config);
Self::new(
store,
all_epoch_config,
Expand All @@ -200,36 +190,95 @@ impl EpochManager {
}

pub fn new_arc_handle(store: Store, genesis_config: &GenesisConfig) -> Arc<EpochManagerHandle> {
Self::new_arc_handle_with_test_overrides(store, genesis_config, None)
let chain_id = genesis_config.chain_id.as_str();
if chain_id == near_primitives::chains::MAINNET
|| chain_id == near_primitives::chains::TESTNET
{
let epoch_config_store = EpochConfigStore::for_chain_id(chain_id).unwrap();
return Self::new_arc_handle_from_epoch_config_store(
store,
genesis_config,
epoch_config_store,
);
}

let epoch_config = if chain_id.starts_with("test-chain-") {
// We still do this for localnet as nayduck depends on it.
// TODO(#11265): remove this dependency for tests using
// `random_chain_id`.
EpochConfig::from(genesis_config)
} else {
Genesis::test_epoch_config(
genesis_config.num_block_producer_seats,
genesis_config.shard_layout.clone(),
genesis_config.epoch_length,
)
};

let epoch_config_store = EpochConfigStore::test(BTreeMap::from_iter(vec![(
genesis_config.protocol_version,
Arc::new(epoch_config),
)]));
Self::new_arc_handle_from_epoch_config_store(store, genesis_config, epoch_config_store)
}

pub fn new_arc_handle_with_test_overrides(
/// DEPRECATED.
/// Old version of deriving epoch config from genesis config.
/// Can be used for testing.
/// Keep it until #11265 is closed and the new code is released.
#[allow(unused)]
pub fn new_arc_handle_deprecated(
store: Store,
genesis_config: &GenesisConfig,
test_overrides: Option<AllEpochConfigTestOverrides>,
) -> Arc<EpochManagerHandle> {
let reward_calculator = RewardCalculator::new(genesis_config, genesis_config.epoch_length);
let all_epoch_config = Self::new_all_epoch_config(genesis_config);
Arc::new(
Self::new_from_genesis_config_with_test_overrides(
Self::new(
store,
genesis_config,
test_overrides,
all_epoch_config,
genesis_config.protocol_version,
reward_calculator,
genesis_config.validators(),
)
.unwrap()
.into_handle(),
)
}

fn new_all_epoch_config_with_test_overrides(
pub fn new_arc_handle_from_epoch_config_store(
store: Store,
genesis_config: &GenesisConfig,
test_overrides: Option<AllEpochConfigTestOverrides>,
) -> AllEpochConfig {
epoch_config_store: EpochConfigStore,
) -> Arc<EpochManagerHandle> {
let genesis_protocol_version = genesis_config.protocol_version;
let epoch_length = genesis_config.epoch_length;
let reward_calculator = RewardCalculator::new(genesis_config, epoch_length);
let all_epoch_config = AllEpochConfig::from_epoch_config_store(
genesis_config.chain_id.as_str(),
epoch_length,
epoch_config_store,
);
Arc::new(
Self::new(
store,
all_epoch_config,
genesis_protocol_version,
reward_calculator,
genesis_config.validators(),
)
.unwrap()
.into_handle(),
)
}

fn new_all_epoch_config(genesis_config: &GenesisConfig) -> AllEpochConfig {
let initial_epoch_config = EpochConfig::from(genesis_config);
let epoch_config = AllEpochConfig::new_with_test_overrides(
let epoch_config = AllEpochConfig::new(
genesis_config.use_production_config(),
genesis_config.protocol_version,
initial_epoch_config,
&genesis_config.chain_id,
test_overrides,
);
epoch_config
}
Expand Down
5 changes: 3 additions & 2 deletions chain/epoch-manager/src/reward_calculator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -38,16 +38,17 @@ pub struct RewardCalculator {
}

impl RewardCalculator {
pub fn new(config: &GenesisConfig) -> Self {
pub fn new(config: &GenesisConfig, epoch_length: u64) -> Self {
RewardCalculator {
max_inflation_rate: config.max_inflation_rate,
num_blocks_per_year: config.num_blocks_per_year,
epoch_length: config.epoch_length,
epoch_length,
protocol_reward_rate: config.protocol_reward_rate,
protocol_treasury_account: config.protocol_treasury_account.clone(),
num_seconds_per_year: NUM_SECONDS_IN_A_YEAR,
}
}

/// Calculate validator reward for an epoch based on their block and chunk production stats.
/// Returns map of validators with their rewards and amount of newly minted tokens including to protocol's treasury.
/// See spec <https://nomicon.io/Economics/Economic#validator-rewards-calculation>.
Expand Down
Loading

0 comments on commit 810e820

Please sign in to comment.