Add dedicated IBD rules for Nakamoto #5655
base: develop
…ures is mutated over the course of the state machine's lifetime
…s tenure, so check this when inferring IBD
LGTM!
Left some style comments but looks fine to me
Just some minor comments for now, but there are conflicts and failing tests.
if TEST_BLOCK_ANNOUNCE_STALL.get() {
if relay::fault_injection::stacks_announce_is_blocked() {
These seem like they could be controlled independently. TEST_BLOCK_ANNOUNCE_STALL is used in the integration tests to prevent the miner itself from announcing a new block (this tests that the signer set can announce blocks on its own), but if the relayer stall is also active, it seems like it would also prevent the chains coordinator thread from waking up?
At the time this PR was written, this was an attempt to make some CI tests less flaky. It appears that this is no longer necessary, and that my code here has also led to CI breakage.
config.tenure_last_block_proposal_timeout = Duration::from_secs(0);
How is the forked_tenure_testing test related to this changeset? Why did this test need to be altered, and can you describe the intended behavior changes to the test?
I've reverted it; it was part of an earlier attempt of mine to un-flake this test.
// if the highest available tenure is known, then is it the same as the ongoing stacks
// tenure? If so, then we're not IBD. If not, then we're IBD.
// If it is not known, then we're not in IBD.
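The rule in the code comment above can be read as a small pure function of two inputs. This is a minimal sketch, not the PR's code: `ConsensusHash` is modeled here as a hypothetical 20-byte newtype and `infer_ibd` is a hypothetical name; the real stacks-core types and signatures may differ.

```rust
/// Hypothetical stand-in for a tenure's consensus hash (assumed 20 bytes).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ConsensusHash(pub [u8; 20]);

/// Hypothetical sketch of the IBD rule from the code comment:
/// - highest available tenure unknown              => not in IBD
/// - known and equal to the ongoing tenure         => not in IBD
/// - known and different from the ongoing tenure   => in IBD
pub fn infer_ibd(highest_available: Option<ConsensusHash>, ongoing: ConsensusHash) -> bool {
    match highest_available {
        None => false,
        Some(highest) => highest != ongoing,
    }
}
```

The `None` arm treats an unknown highest tenure as "synced", matching the last line of the comment.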
Doesn't this mean that a running node operating at chain tip would switch into IBD = true at every tenure boundary?
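To make the tenure-boundary concern concrete, the sketch below applies the comment's rule to a sequence of observations from a node that is fully caught up. Assumptions: tenures are modeled as hypothetical sequence numbers rather than consensus hashes, and `ibd_trace` is an illustrative helper, not part of the PR.

```rust
/// Hypothetical model: each observation pairs the highest tenure reported by
/// the network (if any) with the tenure the local node considers ongoing.
/// Returns the IBD value the comment's rule would produce for each observation.
pub fn ibd_trace(observations: &[(Option<u64>, u64)]) -> Vec<bool> {
    observations
        .iter()
        // IBD only if a highest tenure is known and differs from the ongoing one.
        .map(|&(highest, ongoing)| highest.map_or(false, |h| h != ongoing))
        .collect()
}
```

For the trace `[(Some(5), 5), (Some(6), 5), (Some(6), 6)]` this yields `[false, true, false]`: the flag flips to `true` in the window between peers reporting tenure 6 and the local node processing it, which is the transient switch the question describes.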
@@ -400,6 +400,67 @@ impl RelayerThread {
|| !self.config.miner.wait_for_block_download
}

/// Compute and set the global IBD flag from a NetworkResult
Suggested change:
- /// Compute and set the global IBD flag from a NetworkResult
+ /// Compute and set the global initial block download (IBD) flag using data from the given NetworkResult
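One way to read the suggested doc comment is a function that derives the flag from a network-result snapshot and stores it in a process-wide atomic. This is a hypothetical sketch: `NetworkResultView`, its fields, `GLOBAL_IBD`, `set_ibd_from_network_result`, and `is_ibd` are illustrative names, not the actual `NetworkResult` API or relayer code in stacks-core.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical global IBD flag shared across threads (e.g. relayer and miner).
pub static GLOBAL_IBD: AtomicBool = AtomicBool::new(false);

/// Hypothetical subset of what a network result might expose for this decision.
/// Consensus hashes are modeled as raw 20-byte arrays.
pub struct NetworkResultView {
    /// Highest tenure reported by peers, if any peer reported one.
    pub highest_available_tenure: Option<[u8; 20]>,
    /// Tenure the local node considers ongoing.
    pub ongoing_tenure: [u8; 20],
}

/// Compute and set the global initial block download (IBD) flag using data
/// from the given network result; returns the value that was stored.
pub fn set_ibd_from_network_result(result: &NetworkResultView) -> bool {
    let ibd = match result.highest_available_tenure {
        // No peer reported a highest tenure: treat the local view as the highest.
        None => false,
        Some(highest) => highest != result.ongoing_tenure,
    };
    // SeqCst keeps the sketch simple; a real implementation might use a weaker ordering.
    GLOBAL_IBD.store(ibd, Ordering::SeqCst);
    ibd
}

/// Read the current value of the global IBD flag.
pub fn is_ibd() -> bool {
    GLOBAL_IBD.load(Ordering::SeqCst)
}
```

Returning the stored value as well as setting it makes the function easy to assert on in a unit test without reading the global.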
This changeset appears to have broken many of the integration tests.
This should also have some kind of unit test (or tight assertions in an integration test) about what the expected value of IBD should be given different values of the NetworkResult: it's not clear to me what the intended behavior is around the arrival of a new tenure, or what the downstream impact on miner commitments would be.
I also have some questions about the necessity of this -- it seems like #5735 actually resolves the testnet genesis sync issues (and the mainnet ones as well), so what is this PR solving? Is it speeding up genesis sync? Restarts? I can't tell from looking at this PR.
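The unit test the reviewer asks for could be table-driven: enumerate network-result shapes and the IBD value each should produce. A minimal sketch, assuming the hypothetical `infer_ibd(highest_available, ongoing)` rule from the code comment earlier in the thread; a real test would exercise the PR's actual function and `NetworkResult` type, whose shapes are not shown here.

```rust
/// Hypothetical rule under test; a tenure is modeled as a 20-byte consensus hash.
pub fn infer_ibd(highest_available: Option<[u8; 20]>, ongoing: [u8; 20]) -> bool {
    highest_available.map_or(false, |highest| highest != ongoing)
}

/// One row of the table: inputs plus the expected IBD value.
pub struct IbdCase {
    pub name: &'static str,
    pub highest_available: Option<[u8; 20]>,
    pub ongoing: [u8; 20],
    pub expected_ibd: bool,
}

/// Cases covering each branch of the rule.
pub fn ibd_cases() -> Vec<IbdCase> {
    let tip = [1u8; 20];
    let ahead = [2u8; 20];
    vec![
        IbdCase { name: "no highest tenure reported", highest_available: None, ongoing: tip, expected_ibd: false },
        IbdCase { name: "highest tenure equals ongoing tenure", highest_available: Some(tip), ongoing: tip, expected_ibd: false },
        IbdCase { name: "peers report a different tenure", highest_available: Some(ahead), ongoing: tip, expected_ibd: true },
    ]
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn ibd_matches_expected_for_each_case() {
        for case in ibd_cases() {
            assert_eq!(
                infer_ibd(case.highest_available, case.ongoing),
                case.expected_ibd,
                "case failed: {}",
                case.name
            );
        }
    }
}
```

Adding a row for the tenure-boundary scenario raised earlier in the thread would pin down whether a transient `true` there is intended.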
… (which is what the relayer would do anyway since the absence of a highest-known tenure on another node implies that the local view is the highest)
This fixes #5642 by adding a dedicated IBD inference rule for Nakamoto. The Stacks node is in IBD mode when either of the following conditions is true:
This (hopefully) fixes some edge cases we've seen on testnet whereby a node can erroneously believe it is not synced when it really is. This impaired the affected node's ability to participate in StackerDB replication. I intend to test this on naka3.sh, on testnet, and on a single mainnet signer.
Writing the code for that second criterion led to the discovery of #5649 and #5650, since I had been using rc_consensus_hash as the ongoing Stacks tenure when in fact it was not.
EDIT: this also contains #5667, so let's merge that first.