The consensus specification describes the mechanism used by the Aptos validators to agree on both ordering and transaction execution output under byzantine fault-tolerant (BFT) conditions - at most f validators (where f < (all validators)/3) are faulty or malicious. Currently, the consensus specification is an implementation of AptosBFT, a BFT consensus protocol that ensures liveness and safety under partial synchrony. The AptosBFT whitepaper describes a high level overview of the protocol, the liveness and safety proofs, and a rationale on why AptosBFT was adopted for the LPN. This document specifies how to implement the AptosBFT protocol in order to participate as a validating node in the LPN.
This document is organized as follows:
- Architecture - the components of the specification and how they interact.
- Data structures - common data structures that are part of this specification.
- Network messages - consensus messages that are sent across the wire to other validators.
- Abstracted modules - The components this specification depends on.
- Consensus modules - The components built upon common data structures that are described as a part of this specification.
All network communication occurs over AptosNet and any serialization, deserialization and hashing is determined by BCS.
TODO: Add picture
In this section, we specify data structures specific to consensus and how to verify the validity of these data structures - both for instantiation and acceptance from other validators. Any dependent data structures not described here are specified in the common data structures document. Additionally, there is context specific verification of these data structures that is described in the consensus modules. For example, a proposed block is invalid and not accepted if its round is less than the current round of the receiving validator.
The payload inside a block.
type Payload = Vec<SignedTransaction>;
Block extends block_data
with a unique identifier and a signature for proposed blocks.
struct Block {
id: HashValue,
block_data: BlockData,
signature: Option<Ed25519Signature>,
}
Fields:
id
is the unique identifier of this block. It is calculated with the BCS serialized hash ofblock_data
. This field is not serialized but calculated whenever deserialized.block_data
is the container of the block with zero or more transactionssignature
is populated if this block was proposed by a validator. If this block is a genesis or NIL block, it isNone
.
Verification
- if
signature
is set, ensureblock_data.block_type == BlockType::Proposal {author, ..}
ed25519_verify(signature, block_data.hash(), OnChainConfig.validators().public_key(author))
Block data is all the information necessary to vote on a block of transactions. Block
wraps this structure as a convenient way storage of its BCS serialized hash and a signature (if this is proposed by a validator).
struct BlockData {
epoch: u64,
round: u64,
timestamp_usecs: u64,
quorum_cert: QuorumCert,
block_type: BlockType,
}
Fields
-
epoch
is an u64 that identifies the number of reconfiguration operations that have occurred in the past before this block -
round
is a round this block data is proposed in -
timestamp_usecs
the approximate physical time a block is proposed by a proposer. This timestamp is used for- Time-dependent logic in smart contracts (the current time of execution)
- Clients determining if they are relatively up-to-date with respect to the block chain.
It makes the following guarantees:
- Time Monotonicity: Time is monotonically increasing in the block chain. (i.e. If H1 < H2, H1.Time < H2.Time).
- If a block of transactions B is agreed on with timestamp T, then at least f+1 honest validators think that T is in the past. An honest validator will only vote on a block when its own clock >= timestamp T.
- If a block of transactions B has a QC with timestamp T, an honest validator will not serve such a block to other validators until its own clock >= timestamp T.
- Current: an honest validator is not issuing blocks with a timestamp in the future. Currently we consider a block is malicious if it was issued more that 5 minutes in the future.
-
quorum_cert
is the quorum certified ancestor and whether the quorum certified ancestor was voted on successfully -
block_type
is the type of the block which is specified below.
Verification
-
ensure that if
BlockData
is received from the network thatblock_type
, is not genesis. Genesis is only constructed from end-epoch LedgerInfo locally and is never sent to other validators. -
Assume that
parent_block
is theBlockInfo
ofquorum_cert.vote_data.proposed
.- ensure
round
is greater thanparent_block.round
- ensure
epoch
is equal toparent_block.epoch
- if
parent_block.next_epoch_state
is set, ensureblock_type == BlockType::NIL
or the payload in theBlockType::Proposal
is empty - if a block is a nil block, or if
parent_block.has_reconfiguration
, ensure the block and the parent block have the same timestamp, otherwise ensuretimestamp_usecs
>parent_block.timestamp_usecs
. In any case ensure thattimestamp_usecs
is not too far (5 minutes or more) in the future compared to the current time.
- ensure
-
ensure that if
quorum_cert.commit_info().next_epoch_state()
is set, a proposal should not be generated if the epoch ends, it should transition to next epoch. -
verify
quorum_cert
There are 3 different types of consensus blocks and BlockType
enumerates each one.
pub enum BlockType {
Proposal {
payload: Vec<SignedTransaction>,
author: AccountAddress,
},
NilBlock,
Genesis,
}
Fields
-
Proposal
blocks are proposed by a single validator according to the round and committed history.payload
is zero or more transactions proposed.author
of the block that can be validated by the author's public key and the signature
-
NilBlock
blocks don't have authors or signatures. They are generated when a validator reaches a timeout for a round without having already voted for a block. If >2f validators vote for aNilBlock
, it can be added to aQuorumCert
chain and reduce round gaps and improve commit latency. -
Genesis
blocks are the first committed block in any epoch that is identically constructed on all validators by any (potentially different) LedgerInfo that justifies the epoch change from the previous epoch. The genesis block is used as the first root block of the BlockTree for all epochs.
Verification
- See
BlockData
This structure maintains information about a block and its parent block as a helper for QuorumCert
. Its BCS-generated hash is used for the consensus_data_hash
field in LedgerInfo
.
pub struct VoteData {
proposed: BlockInfo,
parent: BlockInfo,
}
Fields:
proposed
contains the block info being voted on.parent
contains the parent block info of theproposed
.
Verification
- ensure parent.epoch == proposed.epoch
- ensure parent.round < proposed.round
- ensure parent.timestamp_usecs <= proposed.timestamp_usecs
- ensure parent.version <= proposed.version
A vote is sent across the network to be aggregated by another validator such that a QuorumCert
or a TimeoutCertificate
can be created from >2f votes.
struct Vote {
vote_data: VoteData,
author: AccountAddress,
ledger_info: LedgerInfo,
signature: Ed25519Signature,
timeout_signature: Option<Ed25519Signature>,
}
Fields:
vote_data
is the proposed and previous block and also contains the round information being voted on. The BCS-serialized hash ofvote_data
must be the same asledger_info.consensus_data_hash
author
is the identify of the voter with respect to its account addressledger_info
contains the ledger state of the block that will be committed if >2f uniquesignatures
are gatheredsignature
is the signature from theauthor
onledger_info
that attests to the vote of thevote_data.proposed
block, its parent and theledger_info
timeout_signature
is optional and can contain a signature on aTimeout
formed byvote_data.proposed.epoch
andvote_data.proposed.round
Verification
- verify
vote_data
- ensure
vota_data.hash() == ledger_info.consensus_data_hash
ed25519_verify(signature, ledger_info.hash(), OnChainConfig.validators().public_key(author))
- if
timeout_signature
is set,ed25519_verify(timeout_signature, (epoch, round).hash(), OnChainConfig.validators().public_key(author))
A quorum certificate is the aggregation of >2f votes on the same block info. It can be used as a proof to advance validators to the next round.
struct QuorumCert {
vote_data: VoteData,
signed_ledger_info: LedgerInfoWithSignatures,
}
Fields:
vote_data
is a helper structure that includes block information about a proposed block and its parent block. The BCS hash ofvote_data
must be equal to thesigned_ledger_info
.ledger_info.consensus_data_hash as a part of verificationsigned_ledger_info
contains the ledger state of the committed block. Since a signature agrees on the hash ofvote_data
, it certifies the proposed block and its parent fromvote_data
as well as the highest committed state on this blockchain branch.
Derived Fields
certified_block
equalsvote_data.proposed
parent_block
equalsvote_data.parent
ends_epoch
is a bool refers to ifsigned_leger_info.ledger_info.commit_info.next_epoch_state
is set
Verification
-
verify
vote_data
-
ensure
vote_data.hash() == signed_ledger_info.ledger_info.consensus_data_hash
-
if
certified_block.round()
== 0 (it certifies genesis block and is generated via end-epoch LedgerInfo)- ensure certified_block == parent_block
- ensure certified_block == signed_ledger_info.ledger_info.commit_info
- ensure signed_leger_info.signatures is empty
-
else
ed25519_verify(signature, signed_ledger_info.ledger_info.hash(), OnChainConfig.validators().public_key(author)) for (author, signature) in signed_ledger_info.signatures
-
ensure
len(signed_ledger_info.signatures >= len(OnChainConfig.validators()) * 2 / 3 + 1
This structure represents an epoch and a round that a validator did not receive a proposal for and indicates that a validator would like to move to the next round. The timeout interval for a round depends on the timeout formula determined by the RoundState
. If >2f validators signatures on this structure are collected into a TimeoutCertificate
, it can be used as a proof to advance all validators to the next round.
struct Timeout {
epoch: u64,
round: Round,
}
Fields
epoch
is the epoch associated with theround
that the validator thinks has passed without a proposal for the roundround
is the consensus round that has timed out from a validator's point of view
Verification
- See timeout_signature in
Vote
A TimeoutCertificate is the aggregation of >2f signatures on the same Timeout
. It can be used as a proof to advance validators to the next round too. When there's both QuorumCert and TimeoutCert present for the same round, QuorumCert is preferred.
pub struct TimeoutCertificate {
timeout: Timeout,
signatures: BTreeMap<Author, Ed25519Signature>,
}
Fields
timeout
is theTimeout
being signed.signatures
contains >2f signatures signed on the BCS-serializedtimeout
.
Verification
ed25519_verify(signature, signed_ledger_info.ledger_info.hash(), OnChainConfig.validators().public_key(author)) for (author, signature) in signatures
- ensure
len(signed_ledger_info.signatures >= len(OnChainConfig.validators()) * 2 / 3 + 1
This structure provides verifiable certificates that can allow a validator to catch up to another validator. The max round r of the highest_quorum_cert
and highest_timeout_cert
have already completed according to > 2f validators and convince any validator join round r+1 immediately if they are currently on a round < r+1. A correct implementation ensures that highest_commit_cert
and highest_quorum_cert
are on the same chain id. If a validator learns about more recent state upon receiving SyncInfo
, then it should catch up through BlockRetrievalRequest, EpochRetrievalRequest or state synchronization.
pub struct SyncInfo {
highest_quorum_cert: QuorumCert,
highest_commit_cert: Option<QuorumCert>,
highest_timeout_cert: Option<TimeoutCertificate>,
}
Fields:
highest_quorum_cert
is the highest roundQuorumCert
known to the creator meaning thathighest_quorum_cert.vote_data.proposed
has the highest round of anyQuorumCert
knownhighest_commit_cert
has the highestLedgerInfo
committed to the blockchain, it's set to None if it's the same ashighest_quorum_cert
.highest_timeout_cert
is the highest timeout certificate known whichhighest_timeout_cert.timeout.round
is higher thanhighest_quorum_cert.vote_data.proposed.round
(if available).
Derived Fields
highest_round
is themax(highest_quorum_cert.certified_block().round(), highest_timeout_cert.map_or(0, |tc| tc.round))
highest_commit_round
is thehighest_commit_cert.signed_ledger_info.ledger_info.commit_info.round
Verification
- verify all fields independently
- ensure all certificates are in the same epoch, highest_quorum_cert.certified_block().epoch == highest_commit_cert.certified_block().epoch == highest_timeout_cert.epoch if set
- ensure highest_quorum_cert has higher certified round than highest_commit_cert if set
- ensure highest_commit_cert has commit, highest_commit_cert.signed_ledger_info.ledger_info.commit_info != BlockInfo::empty
In this section we specify all the consensus message types that validators send across network and the verification requirements upon deserialization of the data.
This is the wrapper around all types that are specified below and is the only message type sent by validators with respect to consensus.
enum ConsensusMsg {
BlockRetrievalRequest(BlockRetrievalRequest),
BlockRetrievalResponse(BlockRetrievalResponse),
EpochRetrievalRequest(EpochRetrievalRequest),
ProposalMsg(ProposalMsg),
SyncInfo(SyncInfo),
EpochChangeProof(EpochChangeProof),
VoteMsg(VoteMsg),
}
This is the networking message that is sent by a proposer to all validators for a particular round and epoch. It contains a sync_info
in order to justify that the that proposed block should be voted on.
struct ProposalMsg {
proposal: Block,
sync_info: SyncInfo,
}
Fields
proposal
is the proposed block for execution and can be voted on by a receiving validatorsync_info
is the consensus state that justifies theproposal
Verification
- verify all fields independently
- ensure
proposal.block_data.quorum_cert == sync_info.highest_quorum_cert
- ensure
proposal.block_data.round == sync_info.highest_round()+ 1
- ensure
proposal.block_data.block_type == BlockType::Proposal
This is the highest-level vote network message that is sent after either of the following events occur.
- A proposal was received, executed, and then is ultimately voted on by this validator.
- A local timeout occurred on this validator for a given round
r
. To represent this timeout,vote.timeout_signature
is set. If the validator already voted on a proposal block forr
, theVoteMsg
will contain the samevote_data
andledger_info
as the previous vote. Otherwise, it will include a NIL block inVoteData
.
struct VoteMsg {
vote: Vote,
sync_info: SyncInfo,
}
Fields
vote
contains the proposed block voting state of this validator and if it has locally timed out for a roundsync_info
justifies the vote is on the correct round, specificallyvote.vote_data.proposed.round == sync_info.highest_round + 1
Verification
- verify the
vote
according to the Vote section - verify the
sync_info
according to the SyncInfo section - ensure all
epoch
fields invote
andsync_info
are equal
A validator will send this request for missing blocks from peers since it needs all block ancestors in order to execute and vote on a proposal block. It starts with by returning a particular block, followed by its ancestors up to a preferred chunk of blocks (i.e num_blocks
).
pub struct BlockRetrievalRequest {
block_id: HashValue,
num_blocks: u64,
}
Fields
block_id
is the first block id requested in a tree of blocksnum_blocks
specifies how many blocks are to be returned starting fromblock_id
and moving backward through the tree. It is an upper bound of the number of blocks retrieved as a server may return fewer or no blocks.
Verification
- None
Response for BlockRetrievalRequest
.
pub struct BlockRetrievalResponse {
status: BlockRetrievalStatus,
blocks: Vec<Block>,
}
Fields
status
indicates whether the requestSucceeded
orIdNotFound
when theblock_id
not found locally orNotEnoughBlocks
when theblock_id
is found but fewer than (num_blocks
) are returnedblocks
contains the requestednum_blocks
if thestatus
isSucceeded
. Blocks will be returned starting the withblock_id
followed by their respective parent block.
Verification
-
ensure
status
== Succeeded- TODO: Consider whether
status
== NotEnoughBlocks is also correct behavior. If the number of blocks to catch up exceeds a limit, a validator should rely on state synchronization to catch up.
- TODO: Consider whether
-
verify each blocks independently
-
verify the blocks form a chain where the first one's id == the
block_id
in BlockRetrievalRequest.[blocks[i].block_data.quorum_cert.certified_block.id == blocks[i+1].id for i in 0..blocks.len()-1]
Request for proof that a validator should advance its epoch. A validator requests this after receiving a claim that a newer epoch is available.
pub struct EpochRetrievalRequest {
pub start_epoch: u64,
pub end_epoch: u64,
}
Fields
start_epoch
is epoch where the proof should startend_epoch
is epoch where the proof should end
Verification
- None
A vector of LedgerInfoWithSignatures
' with contiguous increasing epoch numbers to prove a sequence of epoch changes from the first LedgerInfoWithSignatures
's epoch.
pub struct EpochChangeProof {
pub ledger_info_with_sigs: Vec<LedgerInfoWithSignatures>,
pub more: bool,
}
Fields
-
ledger_info_with_sigs
is a vector ofLedgerInfoWithSignatures
with contiguous increasing epoch numbers to prove a sequence of epoch changes from the epoch of the firstLedgerInfoWithSignatures
. -
more
indicates the sender is on a higher epoch than is returned fromledger_info_with_sigs
.- TODO: Consider returning the actual higher epoch so EpochRetrievalRequest can be invoked again
Derived Fields
start_epoch
is the start epoch of the proof,ledger_info_with_sigs.first().ledger_info.commit_info.epoch
target
is the last one of the proof,ledger_info_with_sigs.last()
Verification
-
ensure
ledger_info_with_sigs
is not empty -
ensure
start_epoch
matches the current epoch -
ensure every ledger info has a commit info with
next_epoch_state
set -
for li in ledger_info_with_sigs, follow a chain verification starting with epoch = current epoch, validators = OnChainConfig.validators(),
- ensure
epoch == li.ledger_info.commit_info.epoch
ed25519_verify(signature, li.ledger_info.hash(), validators.public_key(author)) for (author, signature) in li.signatures
- ensure
len(li.signatures >= len(OnChainConfig.validators()) * 2 / 3 + 1
- assign
(epoch, validators) = li.ledger_info.next_epoch_state
- ensure
SyncInfo
is piggybacked alongside ProposalMsg
and VoteMsg
. A standalone SyncInfo
message is sent from a validator that has newer information than its recipient.
Verification
- Same as specified in SyncInfo
In this section, we specify the abstracted external components consensus relies on. We only specify the interface and the expected behavior without implementation details.
Consensus has the following abstracted modules:
- Network - sends network messages to peer validators
- StateComputer - persist storage of ledger history and state as well as execute transactions
- PersistentStorage - persist consensus internal data
- OnChainConfig - configuration data from the ledger state e.g. validator set, vm versions.
- BlockStore - in-memory storage for blocks
fn direct_send(peer: AccountAddress, msg: ConsensusMsg);
- send
msg
to thepeer
validator
fn rpc(peer: AccountAddress, msg: BlockRetrievalRequest) -> Result<BlockRetrievalResponse>;
- remote procedure call(rpc) to peer and expect a result or timeout error
- the only request/response type is for block retrieval now but may be extended in the future
fn broadcast(msg: ConsensusMsg);
direct_send
the msg to all peer validators in the current epoch i.e.OnChainConfig.validators()
fn compute(block: Block) -> BlockInfo;
-
if the parent block has reconfiguration i.e.
block.quorum_cert.certified_block().next_epoch_state
is set, return the same executed result as the parent block.BlockInfo { id: block.id, round: block.block_data.round, ..block.quorum_cert.certified_block(), }
-
convert the block into a block metadata txn (TODO reference) and append at the front of all user transactions for execution
-
execute all transactions on top of the parent's block state (may be pending to commit)
-
return the executed result i.e. BlockInfo which captures the root hash of the new merkle tree and the
next_epoch_state
fn commit(blocks: Vec<Block>, finality_proof: LedgerInfoWithSignatures, callback: StateComputerCommitCallBackType);
- ensure
finality_proof.commit_info()
match the last block inblocks
and its executed result - ensure the
blocks
form a chain and the parent of the first is the current committed block - commit and persist
blocks
and the prooffinality_proof
- finally, call
callback()
fn sync_to(target: LedgerInfoWithSignatures);
- fetch all the missing transactions from peers until
target.commit_info().version
- execute all the fetched transactions
- ensure the root hash of the merkle tree matches the
target.commit_info().executed_state_id
- commit and persist all the transactions and the proof
target
- if
target
is across multipleepoch
s, fetch and persist all the end-epoch LedgerInfoWithSignatures
fn get_epoch_change_ledger_info(start_epoch: u64, end_epoch: u64) -> EpochChangeProof;
- construct the proof with end-epoch LedgerInfoWithSignatures from
start_epoch
tomin(end_epoch, start_epoch + MAX_NUM_EPOCH_CHANGE_LEDGER_INFO) - 1
- set
more
to true if the proof's target epoch is lower than the current epoch
Persistent layer for block_store and safety_rules.
fn validators() -> ValidatorSet;
- return the current validator set which includes the account addresses and consensus public keys, it also includes network endpoint and keys but is irrelevant to consensus.
fn hash<T>(value: T) -> HashValue;
- hash function (TODO: reference)
fn ed25519_sign(hash: HashValue) -> Result<Signature>;
- sign the hash with the consensus private key corresponding to the
OnChainConfig.validators().public_key(self.account_address)
- if the expected key not found (e.g. key manager unreachable, account address not in the set, etc.), return an error.
fn ed25519_verify(signature: Signature, hash: HashValue, public_key: PublicKey) -> bool;
- verify if the
signature
is signed on thehash
corresponding to thepublic_key
In this section, we specify BlockStore
which stores the pending blocks and quorum/timeout certificates. Blocks are organized as a tree structure where the root is the latest committed block, and follows the parent relationship in the Block
data structure.
BlockStore should implement the following read functions:
fn root(&self) -> Block;
- the current root block of the block tree.
fn sync_info(&self) -> SyncInfo;
- the current highest certificates in the block tree.
fn get_block(&self, id: HashValue) -> Option<Block>;
- if a block with
id
is present, return it otherwise None.
fn path_from_root(&self, id: HashValue) -> Option<Vec<Block>>;
- if a block with
id
is present, returns all the blocks in (root, block] otherwise None.
BlockStore should implement the following write functions:
fn build (root: (ExecutedBlock, QuorumCert, QuorumCert), blocks: Vec<Block>, quorum_cert: Vec<QuorumCert>, timeout_cert: Option<TimeoutCertificate>) -> BlockStore;
- build block tree with the root block and the quorum cert/commit cert on the root block and timeout cert if available.
execute_and_insert
all theblocks
insert_single_quorum_cert
all thequorum_cert
fn execute_and_insert(&self, block: Block);
- ensure the
block.round
is ahead of root block's round - ensure a block with id
block.parent_id
is present - compute the block via state_computer
- persist the
block
via PersistentStorage - ensure local timestamp is higher than
block.timestamp
- insert the
block
and its compute result in block_tree
fn insert_single_quorum_cert(&self, quorum_cert: QuorumCert);
- ensure a block with
quorum.certified_block().id()
is present - ensure the executed result matches
quorum.certified_block()
- persist the
quorum_cert
via PersistentStorage - insert the
quorum_cert
in block_tree
fn insert_timeout_certificate(&self, timeout_cert: TimeoutCertificate);
- ensure
timeout_cert.round
is higher than the currenttimeout_cert.round
in block tree - persist the
timeout_cert
via PersistentStorage - replace the
timeout_cert
in block_tree with the new one
fn commit(&self, finality_proof: LedgerInfoWithSignatures);
- ensure a block with
finality_proof.commit_info().id()
is present (refer as committed block below) - ensure the committed block has higher round than the current root block
- commit all the blocks from
path_from_root(committed_block.id())
via StateComputer - prune the block tree to committed block
fn prune(&self, block_id: HashValue);
- ensure a block with
block_id
is present (refer as new root block) - calculate the blocks that's not in the subtree of the new root block (refer as to-be-pruned blocks)
- delete to-be-pruned blocks and associated quorum certs via PersistentStorage
- delete to-be-pruned blocks from block_tree
fn insert_quorum_cert(&self, quorum_cert: QuorumCert);
- ensure a block with
quorum_cert.certified_block().id()
is present in block_tree otherwise initiate rpc calls withBlockRetrievalRequest
to peers until one of the block's parent id is present in block_tree (fetch the chain of missing blocks) - insert the fetched blocks in block_tree via
insert_single_quorum_cert
andexecute_and_insert_block
repeatively. - if quorum_cert is a valid commit cert (commit info is not empty),
commit(quorum_cert.ledger_info())
fn sync_to_highest_commit_cert(&self, quorum_cert: QuorumCert);
- ensure the quorum_cert is a valid commit cert (commit info is not empty)
- ensure the
quorum_cert.commit_info().id()
is not present in block_tree (otherwise it can go throughinsert_quorum_cert
) - ensure the
quorum_cert.commit_info().round()
is higher than the root - initiate rpc calls with
BlockRetrievalRequest {block_id, num_blocks: 3}
(refer the rpc result as blocks[0..3], quorum_cert[0..3] where quorum_cert[i] certifies block[i]) - persist the rpc result via PersistentStorage
sync_to(quorum_cert.ledger_info)
via StateComputer.- replace self with a new block_store constructed by
build(root: (block[0], quorum_cert[0], quorum_cert[2]), blocks[1..3], quorum_cert[1..3], timeout_cert: None, ..)
- prune all blocks in the old block store via PersistentStorage
fn add_certs(&self, sync_info: SyncInfo);
sync_to_highest_commit_cert(sync_info.highest_commit_cert)
insert_quorum_cert(sync_info.highest_commit_cert)
insert_quorum_cert(sync_info.highest_quorum_cert)
insert_timeout_certificate(sync_info.highest_timeout_cert)
The mempool is a component that stores pending transactions. Consensus pulls transactions from there in order to create blocks, when generating proposals. It also notifies this component when transactions have been committed.
In this section, we define a number of modules relevant to the consensus protocol:
- safety rules, used to maintain safety in the protocol
- proposer election, used to figure out who the leader of an {epoch, round} tuple is
- proposal generator, used to generate proposals when a validator enters a round when they are the leader
In this section, we specify the voting rules that are essential to the safety of the consensus. The reference implementation involves Trust Computing Base and is specified independently but is optional.
SafetyState is a per-epoch state which describes two values relates to the voting rules and should be persisted.
struct SafetyState {
preferred_round: Round,
last_vote_round: Round,
}
Fields
preferred_round
: for any quorum cert the validator received, the highest value ofquorum_cert.parent_block().round()
last_voted_round
: for any vote the validator send out, the highest value ofvote.vote_data().proposed_block().round()
It's initialized with 0.
InitialSafetyRules = SafetyState {
preferred_round: 0,
last_vote_round: 0,
};
There's two voting rules in the consensus protocol:
- A validator can only vote if the round of the proposal block >
last_vote_round
. - A validator can only vote if the round of the quorum cert that the proposal block extends from >=
preferred_round
.
- Vote for a proposal
block
is also a vote for commiting the grand parent (block.quorum_cert().parent_block()
) if we have 3 contiguous rounds certified i.e.block.round()
==block.quorum_cert().proposed_block().round()+1
==block.quorum_cert().parent_block().round()+2
Generate a VoteData
if we decide to vote for a proposal block
(pass voting rule).
let vote_data = VoteData {
proposed: block.block_info(), // executed result,
parent: block.quorum_cert().certified_block(),
}
We check if this vote also triggers a commit (pass commit rule).
let commit_info = if commit_rule() {
block.quorum_cert().parent_block()
} else {
BlockInfo::empty()
}
Generate a LedgerInfo with bcs serialize hash of vote_data and sign the bcs serialize hash to generate a vote signature which captures both vote and commit.
let ledger_info = LedgerInfo {
commit_info,
consensus_data_hash: vote_data.hash(),
};
let signature = ed25519_sign(ledger_info.hash());
Finally we construct a Vote
as following
Vote {
vote_data,
author: self.account_addres,
ledger_info,
signature,
timeout_signature: None,
}
In this section, we specify ProposerElection
interface and the LeaderReputation
algorithm we choose.
fn get_valid_proposer(round: Round) -> AccountAddress;
- gieven a round, return the valid proposer for this round.
We use a committed history based ProposerElection
implementation which provides bias towards successful proposers to improve transaction throughput and reduce commit latency.
Proposer candidates are all validators in the current epoch, represented as vector and ordered by account address.
For a given round r
, we define target_round = min(max(r - 4, 0), latest_committed_round)
and fetch up to window_size
blocks since target_round
(included).
For the fetched blocks, if a proposer candidate is a proposer of any block or a voter of the quorum cert in any block, it's given active_weight
otherwise inactive_weight
.
Use the first 8 bytes of sha3-256(round) as a random u64 (little endian encoding), the chosen weight is the random number % the sum of all proposers' weights.
The first proposer in the proposer candidates such that sum of the weights from 0 to its index is greater or equal to the chosen weight is the proposer in this round.
In this section, we specify the ProposalGenerator
module which is responsible to generate a proposal block if the validator is the proposer and a nil block if the round is timed out.
fn generate_proposal(round: Round) -> Block;
-
ensure the round is higher than the round of
block_store.highest_quorum_cert
(refer as hqc) -
ensure hqc is not the end of an epoch
-
if
hqc.certified_block().has_reconfiguration()
- use an empty
payload
- use
hqc.certified_block().timestamp()
astimestamp
- use an empty
-
otherwise
- pull transactions from mempool to form a
payload
, excluding pending transactions already used (by callingblock_store.path_from_root(hqc.certified_block().id())
) - use the current UNIX time in microseconds as a
timestamp
- pull transactions from mempool to form a
-
construct a BlockData as following, and sign the bcs serialized hash with consensus key to construct a Block
BlockData { epoch: hqc.certified_block().epoch(), round, timestamp, quorum_cert: hqc, block_type: BlockType { payload: txns, author: self.account_address, } }
fn generate_nil(round: Round) -> Block;
-
ensure the round is higher than the round of
block_store.highest_quorum_cert
(refer as hqc) -
ensure hqc is not the end of an epoch
-
construct a BlockData as following and construct a Block without signature.
BlockData { epoch: hqc.certified_block().epoch(), round, timestamp: hqc.certified_block().timestamp(), quoruom_cert: hqc, block_type: BlockType::NilBLock, }
Consensus operates in units of epoch
s. Each epoch has fixed, agreed on validator configuration (validator sets, vm version etc.), and progresses with increasing round
s within an epoch.
RoundState describes all information about a specific round. A validator enters round r
by receiving valid a QuorumCert or TimeoutCertificate at round r-1
.
struct RoundState {
round: Round,
committed_round: Round,
pending_votes: HashMap<Author, Vote>,
vote_sent: Option<Vote>,
timeout: u64,
}
Fields
round
: the current round number.committed_round
: the latest committed round number.pending_votes
: all the votes received for this round.vote_sent
: the vote we sent on this round.timeout
: the round interval - how long this round lasts before a validator unilaterally decides to join a movement to the next round.
Transition to a new round.
fn new(sync_info: SyncInfo) -> RoundState {
let round = sync_info.highest_round() + 1;
let committed_round = sync_info.highest_committed_round();
let timeout = BASE_INTERVAL * EXPONENT_BASE ^ min(MAX_EXPONENT, max(0, round - committed_round - 3));
timer.expire(timeout).send(round);
RoundState {
round,
committed_round,
pending_votes: HashMap::new(),
vote_sent: None,
timeout,
}
}
- enter new round with quorum/timeout certificates
- calculate timeout value as exponential of unhappy round gap
- start timer to signal a local timeout of
round
aftertimeout
timeout
should be reasonably large considering network latency and execution time, suggested value- BASE_INTERVAL = 1 sec
- EXPONENT_BASE = 1.2
- MAX_EXPONENT = 6
EpochState describes all information about a specific epoch. Each validator enters epoch e
by receiving a valid LedgerInfoWithSignatures at epoch e-1
carrying non-empty next_epoch_state
.
struct EpochState {
epoch: u64,
round_state: RoundState,
block_store: BlockStore,
proposer_election: ProposerElection,
proposal_generator: ProposalGenerator,
safety_rules: SafetyRules,
}
Fields
epoch
: the current epoch number.round_state
: the current round state in this epoch.block_store
: the current block store we have, specified in block store.proposer_election
: decides the proposer of each round, specified in proposer election.proposal_generator
: decide how to propose a block of transactions, specified in proposal generator.safety_rules
: the voting and commit rule of AptosBFT, specified in safety rules.
The initial state is constructed with an initial LedgerInfo
on a genesis transaction following the reconfiguration state transition.
TODO: Add detail
In this section we specify how the state transitions with events. There are 3 types of events in consensus:
- Reconfiguration from StateComputer
- ConsensusMsg from network
- Timeout of a round from timer
Upon receiving a reconfiguration notification (end-epoch LedgerInfo
) from StateComputer
, a validator transitions to a new epoch
with a generated genesis block at round 0 and round state at round 1. This is the only way to enter a new epoch.
fn reconfig(ledger_info: LedgerInfoWithSignatures, config: OnChainConfig) -> EpochState {
let epoch = ledger_info.next_epoch_state().unwrap().epoch;
EpochState {
epoch: epoch,
round_state: InitialRoundState,
block_store: BlockStore::build(
vec![make_genesis_block(ledger_info)],
vec![certificate_for_genesis(ledger_info)],
None, // timeout cert
), // generated genesis and the quorum cert (which is also commit cert) of it
proposer_election: ProposerElection::new(OnChainConfig.validators()), // proposer candidates's account address
safety_rules: InitialSafetyRules,
}
}
fn make_genesis_block(ledger_info: LedgerInfoWithSignatures) -> Block {
// a placeholder parent of the genesis
let ancestor = BlockInfo::empty();
let parent_qc = QuorumCert {
vote_data: VoteData {
proposed: ancestor,
parent: ancestor,
},
signed_ledger_info: LedgerInfoWithSignatures {
commit_info: ancestor,
signatures: BTreeMap::empty(),
},
};
let block_data = BlockData {
epoch: ledger_info.commit_info().epoch + 1,
round: 0,
timestamp_usecs: ledger_info.commit_info().timestamp,
parent_qc,
block_type: BlockType::Genesis,
};
Block {
id: block_data.hash(),
block_data,
signature: None,
}
}
fn certificate_for_genesis(ledger_info: LedgerInfoWithSignatures) -> QuorumCert {
let genesis_block_info = BlockInfo {
epoch: ledger_info.commit_info().epoch + 1,
round: 0,
next_epoch_state: None,
..ledger_info.commit_info(),
};
let vote_data = VoteData {
proposed: genesis_block_info,
parent: genesis_block_info,
};
QuorumCert {
vote_data,
signed_ledger_info: LedgerInfoWithSignatures {
LedgerInfo {
commit_info: genesis_block_info,
consensus_data_hash: vote_data.hash(),
}
signatures: BTreeMap::empty(),
},
}
}
- a validator is free to prune any data from the previous epoch in PersistentStorage after the transition to a new epoch.
- genesis block and its quorum cert/commit cert are locally generated from previous end-epoch LedgerInfoWithSignatures's deterministic fields
- the Aptos network is bootstrapped with a genesis transaction (note the different from genesis block) and a ledger_info commits it with next_epoch_state set.
State transitions upon receiving a message from network (sender
of the message is also referred below), assuming all messages are verified. State transitions are event driven from either network messages or local timeouts.
fn update(&mut self) -> Result<()> {
let local_sync_info = self.block_store.sync_info();
self.safety_rules.update(local_sync_info.highest_quorum_cert());
if local_sync_info.highest_round() + 1 > self.round_state.round {
self.round_state = RoundState::new(
local_sync_info.highest_round() + 1,
local_sync_info.highest_commit_round()
);
if self.account_address == self.proposer_election.get_valid_proposer(self.round_state.round) {
network.broadcast(ProposalMsg {
proposal: self.proposal_generator.generate_proposal(self.round_state.round),
sync_info: self.block_store.sync_info(),
)
}
}
}
- update safety rules with new quorum cert.
- transition to a new RoundState if there's a QuorumCert or TimeoutCert with higher round.
- broadcast a new proposal with local sync info if the validator is the proposer for the new round.
fn ensure_epoch(&self, msg_epoch: u64) -> Result<()>{
if self.epoch == msg_epoch {
return Ok(());
}
if msg_epoch > self.epoch {
network.direct_send(`sender`, EpochRetrievalRequest {
start_epoch: self.epoch,
end_epoch: msg_epoch,
});
}
if msg_epoch < self.epoch {
// sent back EpochChangeProof from msg_epoch to self.epoch
network.direct_send(
`sender`,
StateComputer.get_epoch_change_ledger_info(msg_epoch, self.epoch));
}
bail!("Epoch doesn't match");
}
- this doesn't change the state but only sends network messages
- no-op if the msg epoch is the same as the current epoch
- try to retrieve the proof if the current epoch is behind to join the message epoch
- try to send the proof if the current epoch is ahead to convince the sender to join the current epoch
- if epoch is different, return an error and ignore the message.
fn ensure_round(&mut self, msg_round: u64, sync_info: SyncInfo) -> Result<()> {
if self.round_state.round == msg_round {
return Ok(());
}
let local_sync_info = self.block_store.sync_info();
if local_sync_info.has_newer_certificates(sync_info) {
network.direct_send(`sender`, local_sync_info);
}
if sync_info.has_newer_certificates(self.block_store.sync_info()) {
self.block_store.add_certs(sync_info)?;
self.update()?;
}
ensure!(msg_round == self.round_state.round, "Round doesn't match");
Ok()
}
- this function ensures the round state is at the
msg_round
ifself.round < msg_round
- no-op if it's already the same
- if the sync_info has newer certificates, try to sync the block_store to the state, and update safety rules and transition to new RoundState
- if the validator has newer certificates, try to send to
sender
to help it move forward. - if the round doesn't match e.g. msg_round is behind, sync failed, return errors and ignore the message.
fn process_proposal_msg(&mut self, proposal_msg: ProposalMsg) -> Result<()> {
let ProposalMsg {proposal, sync_info} = proposal_msg;
self.ensure_epoch(proposal.epoch())?;
self.ensure_round(proposal.round(), sync_info)?;
let executed_block = self.block_store.execute_and_insert(proposal)?;
if let Some(vote) = self.safety_rules.vote(executed_block) {
self.round_state.save(vote);
network.direct_send(
self.proposer_election.get_valid_proposer(self.round_state.round + 1),
VoteMsg {
vote,
sync_info: self.block_store.sync_info(),
});
}
}
- ensure and sync the epoch and round
- execute the block
- save and persist the vote in round state
- send the vote to next proposer if decide to vote
This section documents the steps to follow after receiving a VoteMsg.
-
verify the validity of the VoteMsg according to the VoteMsg section
-
ensure that the message is at the right epoch by calling
ensure_epoch(vote.vote_data.proposed.epoch)
-
ensure that the message is at the right round by calling
ensure_round(vote.vote_data.proposed.round, sync_info)
-
ensure that either:
- this validator is the proposer of the next round (TODO: not specified)
- or the vote is a timeout vote (a timeout signature is set)
-
if the author has voted for this round already (
pending_votes[sender]
is not empty):- ensure that the votes are equivalent (both
vote.ledger_info.hash()
are equal) - only process the new vote if it is a new timeout vote (if it contains a timeout signature and the previously seen one doesn't)
- ensure that the votes are equivalent (both
-
store the vote under
pending_votes[sender]
-
if there's enough votes (2f+1) for the same
VoteData
andLedgerInfo
, store the following quorum certificate in the block store viablock_store.insert_quorum_cert()
:QuorumCert { vote_data, signed_ledger_info: LedgerInfoWithSignatures::V0(LedgerInfoWithV0{ ledger_info, signatures, }), }
where
vote_data
andledger_info
are the common values that the votes signed, and wheresignatures
is a map of validator account addresses to their vote signatures. -
if there was no QC created and the new vote is a timeout vote (contains a timeout signature), check if there are now enough (2f+1) timeout signatures and store the following timeout certificate in the todofixblock store via
block_store.insert_timeout_certificate()
:TimeoutCertificate { timeout: Timeout{ epoch, round, }, signatures, }
where
epoch
andround
are the current epoch and round, and wheresignatures
is a map of validator account addresses to their timeout vote signatures. -
if a QC or TC was created, transition to a new round if possible by calling
update()
see pseudo-code equivalent
fn process_vote_msg(&mut self, vote_msg: VoteMsg) -> Result<()> {
let VoteMsg{ vote, sync_info } = vote_msg;
self.ensure_epoch(vote.vote_data().proposed().epoch())?;
self.ensure_round(vote.vote_data().proposed().round(), sync_info)?;
if vote.timeout_signature.is_none() {
let next_proposer = get_proposer_for(vote.vote_data.proposed.epoch, vote.vote_data.proposed.round + 1);
ensure!(next_proposer, self.address);
}
match self.round_state.pending_votes.insert(`sender`, vote) {
QuorumCert(qc) => {
self.block_store.insert_quorum_cert(qc),
self.update();
}
TimeoutCert(tc) => {
self.block_store.insert_timeout_certificate(tc),
self.update();
}
}
}
fn process_sync_info(&mut self, sync_info: SyncInfo) -> Result<()> {
self.ensure_epoch(sync_info.highest_quorum_cert().certified_block().epoch())?;
self.ensure_round(sync_info.highest_round() + 1, sync_info)?;
}
- ensure and sync the epoch and round
- this message is received because a peer thinks it has newer certificates than this validator.
fn process_epoch_change_proof(&mut self, proof: EpochChangeProof) -> Result<()> {
ensure!(self.epoch == proof.start_epoch, "Proof doesn't start at the current epoch");
proof.verify()?;
StateComputer.sync_to(proof.target());
}
- ensure the proof starts at the current epoch so that we're able to verify the chain of signatures.
- sync to the target ledger info (the last one in the proof)
- a reconfiguration notification will be sent via StateComputer to transition to a new epoch
fn process_block_retrieval(&self, request: BlockRetreivalRequest);
Read only request.
- find if request.block_id is present and
num_blocks
before it (included) exist in self.block_store - construct BlockRetrievalResposne correspondingly and send back the rpc response to the
sender
fn process_epoch_retrieval(&self, request: EpochRetrievalRequest);
Read only request.
- ensure
request.end_epoch <= self.epoch
- construct EpochChangeProof via
StateComputer.get_epoch_change_ledger_info(request.start_epoch, request.end_epoch)
and send it back to thesender
Not expected, this message type is only expected as a rpc response corresponding to BlockRetrievalRequest
.
A round
number is received when the timer setup in the constructor of RoundState
expired.
-
process_local_timeout(round: Round)
-
ensure that
round_state.round
matchesround
-
reset a timer (a validator broadcasts its vote message every
timeout
interval until it moves to the next roundtimer.expire(round_state.timeout).send(round)
-
if no vote has been cached (
round_state.vote_sent
is empty), create one:- create a NIL block by calling
nil_block = proposal_generator.generate_nil_block(round)
- execute and store the block by calling
executed_block = block_store.execute_and_insert_block(nil_block)
- obtain a signed vote by calling
safety_rules.vote(executed_block)
- cache the vote in
round_state.vote_sent
- create a NIL block by calling
-
if the cached vote
round_state.vote
has an emptytimeout_signature
, set one:-
create a
timeout
payload with the current epoch and round:Timeout { epoch: round_state.epoch, round: round_state.round, }
-
obtain a
digest
of thetimeout
by callingSHA-3-256(bcs.serialize(timeout))
-
obtain a
signature
over the digest by callinged25519_sign(private_key, digest)
-
set the
round_state.vote.timeout_signature
to thesignature
. -
cache the vote in
round_state.vote_sent
-
-
construct a
vote_message
with the cached vote and the latestsync_info
:VoteMsg { vote: round_state.vote_sent, sync_info: block_store.sync_info(), }
-
broadcast the vote message by calling
network.broadcast(vote_message)
-
see pseudo-code equivalent
fn process_local_timeout(&mut self, round: Round) -> Result<()> {
if self.round_state.round != round {
return Ok(())
}
timer.expire(self.round_state.timeout).send(round);
let vote = match self.round_state.vote_sent {
Some(vote) => vote,
None => {
let nil_block = self.proposal_generator.generate_nil_block(round)?;
let executed_block = self.block_store.execute_and_insert_block(nil_block)?;
self.safety_rules.vote(executed_block)?
}
};
if vote.timeout_signature.is_none() {
let signature = ed25519_sign(Timeout {
epoch: vote.vote_data().proposed().epoch(),
round: vote.vote_data().proposed().round(),
}.hash());
vote.timeout_signature = Some(signature);
}
self.round_state.save(vote);
network.broadcast(VoteMsg {
vote,
sync_info: self.block_store.sync_info(),
})
}