A TLA+ formal specification of the Zcash peer-to-peer network protocol, following ZIP-0204.
messages.tla— Message constructors for all protocol messages (version,verack,ping,pong,inv,getheaders,headers,getdata,block,reject).protocol.tla— Protocol actions, connection state machine, and liveness property.protocol.cfg— TLC model checker configuration.sync_scheduler.tla— Download-scheduler / inventory-routing model that reproduces the Zebra sync stall (see below).
The spec covers the connection lifecycle between peers using a message consumption model.
In a real network, peers communicate only through messages — no node can inspect another node's internal state. The spec enforces this same constraint: each peer has an inbox (FIFO queue) per connection. Sending appends to the remote peer's inbox; receiving dequeues from the local inbox. No action reads remote node variables directly — all decisions are based solely on message payloads. This means that any property the model checker verifies holds under the same information constraints that real implementations face.
- Handshake —
version/verackexchange. Version validation is deterministic: peers whose advertised version is belowMinPeerProtoVersionare rejected. - Keepalive —
ping/pongwith nonce echo, triggered when a connection is idle. - Block sync —
inv→getheaders→headers→getdata→block, looping until the lagging peer catches up. Both peers exchange invs before either processes the other's. - Disconnection — unilateral (TCP FIN): if no message is received within
DisconnectTimeoutticks, the detecting side resets toinitand the TCP pipe is torn down. The remote peer discovers the disconnection independently via its own timeout. Stale messages from the old connection are discarded during the new handshake, and unexpected version messages on a post-handshake connection trigger a reset — matching Zebra'sDuplicateHandshakebehavior.
Each connection is modeled as an explicit state machine from the perspective of peer n tracking its relationship with peer m. Ping/pong can fire from any post-handshake state when the connection is idle.
stateDiagram-v2
direction LR
[*] --> init
init --> version_sent : SendVersion
state "Handshake" as hs {
version_sent --> established : RecvVersion (valid)
version_sent --> init : RecvVersion (invalid)
version_sent --> version_sent : DiscardStaleMessage
}
state "Block sync (n lags m)" as sync {
established --> inv_sent : SendInv
inv_sent --> synced : RecvInv (already caught up)
inv_sent --> getheaders_sent : RecvInv (lagging)
getheaders_sent --> getdata_sent : RecvHeaders
getdata_sent --> block_received : RecvBlock (still behind)
block_received --> getdata_sent : SendGetData
getdata_sent --> synced : RecvBlock (caught up)
}
note right of sync
ping/pong can fire from any
post-handshake state when idle
end note
note left of sync
RecvVersionReset transitions any
post-handshake state back to init
end note
The block sync states are only entered when n has fewer blocks than m (determined from the inv payload). Once n catches up (synced), the session for that direction is complete — m independently goes through the same states from its own perspective if it also lags n.
The spec checks:
- Liveness —
EventualConsensus: eventually all peers reach the same block height. - Safety invariants from ZIP-0204:
InvCountBounded/GetDataCountBounded— inventory vectors carry ≤ 50,000 entries.HeadersCountBounded— headers messages carry ≤ 160 headers.VersionBounded— peers advertise a version ≥MinPeerProtoVersion(170002).PingOnEstablished— ping nonces are only active after the handshake completes.SyncDirection— a peer only enters sync states when it has ≤ blocks than its partner.
- Peer discovery — the spec assumes a fixed set of peers (
InitialPeers) that already know about each other. DNS seed lookups,addr/getaddrmessage exchange, and dynamic peer set changes are not included. Peer discovery is orthogonal to the connection-level protocol — it determines who you connect to, not how the connection behaves once established. Excluding it keeps the state space focused on the properties we want to verify (handshake correctness, sync convergence, keepalive bounds). See #2 for discussion.
sync_scheduler.tla models the layer above the per-connection
protocol: the inventory-routing registry and block-download scheduler that pick which
peer to ask for each block. This is where Zebra's genesis-to-tip sync stall lived
(ZcashFoundation/zebra#10679,
symptom #5709). Two boolean
constants select Zebra's pre- and post-fix behaviour, and the model reproduces the stall
as a TLC counterexample under the buggy behaviour while satisfying both safety and
liveness under the fix.
| Config | Behaviour | Checks | Expected |
|---|---|---|---|
sync_scheduler_fixed.cfg |
fixed | invariants + liveness | passes |
sync_scheduler_buggy.cfg |
buggy | EventuallyAllVerified |
violated (the stall) |
sync_scheduler_poison.cfg |
buggy | RegistryHonest |
violated (timeout ≠ notfound) |
java -jar tla2tools.jar -config sync_scheduler_fixed.cfg sync_scheduler.tla # passes
java -jar tla2tools.jar -config sync_scheduler_buggy.cfg sync_scheduler.tla # liveness stall
java -jar tla2tools.jar -config sync_scheduler_poison.cfg sync_scheduler.tla # registry poisoningFull write-up: documents/sync-stall-modeling.md.
Requires Java and tla2tools.jar.
Two configurations are provided:
| Config | Peers | Symmetry | Verification |
|---|---|---|---|
protocol.cfg |
2 | No | Complete — fully explores the state space |
protocol_3peers.cfg |
3 | Yes | Bounded — runs until timeout, no errors found |
# Complete liveness proof (2 peers, finishes in ~10 seconds)
java -jar tla2tools.jar -config protocol.cfg protocol.tla
# Bounded stress test (3 peers, run until timeout)
java -jar tla2tools.jar -config protocol_3peers.cfg protocol.tlaPermuting peer names produces structurally identical states — peer1={1 block}, peer2={3 blocks} is the same scenario as peer1={3 blocks}, peer2={1 block}. Declaring SYMMETRY Permutations(InitialPeers) tells TLC to collapse those equivalence classes, reducing the state space by up to N! (6x for 3 peers).
For symmetry to work, peers must be declared as model values (abstract atoms) rather than strings in the config:
CONSTANT peer1 = peer1 \* model value
Important caveat: symmetry reduction is sound for safety properties but can theoretically miss liveness counterexamples (a known TLC limitation). The 2-peer config intentionally omits symmetry to give a complete, trustworthy proof of the EventualConsensus liveness property.
Typeset versions of the spec are available in documents/:
PDFs are automatically regenerated by CI on every push to main that modifies .tla files.
| Constant | Default | Description |
|---|---|---|
InitialPeers |
{"peer1", "peer2"} |
Set of peers in the model. |
MaxBlock |
3 |
Maximum initial block height per peer. |
MaxClock |
5 |
Upper bound on the clock (limits ping/pong interleaving). |
DisconnectTimeout |
4 |
Ticks of silence before a peer disconnects. |
MinPeerProtoVersion |
170002 |
Minimum acceptable protocol version (ZIP-0204 §3). |