[Bug]: Node loses consensus and doesn't restart syncing #3846

@dylanschultzie

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Nodes sometimes fall out of sync with the rest of the network but still report catching_up: false on the /status RPC endpoint, so they never restart syncing.

This has been observed on basically every Tendermint/CometBFT network.
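
For anyone trying to detect this state from outside the node, here is a minimal sketch, not a confirmed diagnostic. It assumes the usual CometBFT /status JSON shape (result.sync_info.latest_block_time and result.sync_info.catching_up). The function names, the 60-second threshold, and the http://localhost:26657 URL are illustrative assumptions, not something taken from this report.

```python
import json
import re
import urllib.request
from datetime import datetime, timedelta, timezone


def parse_block_time(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp, truncating nanoseconds to microseconds."""
    m = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", ts)
    if m is None:
        raise ValueError(f"unexpected timestamp format: {ts!r}")
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    micros = int((m.group(2) or "0")[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def is_silently_stalled(status: dict, now: datetime,
                        max_lag: timedelta = timedelta(seconds=60)) -> bool:
    """True if the node claims catching_up == false but its latest block is old.

    max_lag is an assumed threshold; tune it to the chain's block time.
    """
    sync = status.get("result", status)["sync_info"]
    lag = now - parse_block_time(sync["latest_block_time"])
    return (not sync["catching_up"]) and lag > max_lag


def fetch_status(url: str = "http://localhost:26657/status") -> dict:
    """Fetch /status from a node; the default URL is an assumption."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Calling is_silently_stalled(fetch_status(), datetime.now(timezone.utc)) from a cron job or exporter would flag the case described above, where the node has stopped advancing but does not report that it is catching up.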

Logs from previous occurrences:

1:43AM ERR Stopping peer for error err=EOF module=p2p peer="Peer{MConn{195.14.6.15:60942} 7f3a25adf2bb049c3a9ad17a29bc5c59ce5dd239 in}"
1:44AM ERR prevote step: consensus deems this block invalid; prevoting nil err="invalid proof for encrypted random. Height: 21039134, Random: ba46a8dceaf1639fc62452fb6328d80bdba5888fb596d72fbbe6ce6ab1680a80ce36b51b4f9532244a7527063ffa1e62, Proof: 26c4c3f7d3e2d56ef6e101a07f3e155354654e1f59c50b32fae417ad3812b4fc, DataHash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" height=21039134 module=consensus round=0
1:44AM ERR CONSENSUS FAILURE!!! err="precommit step; +2/3 prevoted for an invalid block: invalid proof for encrypted random. Height: 21039134, Random: ba46a8dceaf1639fc62452fb6328d80bdba5888fb596d72fbbe6ce6ab1680a80ce36b51b4f9532244a7527063ffa1e62, Proof: 26c4c3f7d3e2d56ef6e101a07f3e155354654e1f59c50b32fae417ad3812b4fc, DataHash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" module=consensus stack="goroutine 1000 [running]:\nruntime/debug.Stack()\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:801 +0x46\npanic({0x4468c00?, 0xc180400870?})\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:787 +0x132\ngithub.com/cometbft/cometbft/consensus.(*State).enterPrecommit(0xc001406008, 0x141081e, 0x0)\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:1526 +0x159f\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0xc001406008, 0xc38f8c1e10, {0xc1e93a7c20, 0x28})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:2315 +0x19c5\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc001406008, 0xc38f8c1e10, {0xc1e93a7c20?, 0x0?})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:2067 +0x26\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc001406008, {{0x549a320, 0xc0a54b12b8}, {0xc1e93a7c20, 0x28}})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:929 +0x3d0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc001406008, 0x0)\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:836 +0x3f1\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 389\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:398 +0x107\n"
7:16PM ERR prevote step: consensus deems this block invalid; prevoting nil err="invalid proof for encrypted random. Height: 21050390, Random: 269f9c4830706f6f3f662218e018aedd9c0750273e40b0a8e7cd95f21180b5e686bca270037c19c6882cbfa43ad4696a, Proof: f5732b85cb7ee78c5f2abf2e7f54a90e4578b51fd4139092dfaa735894a7907a, DataHash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" height=21050390 module=consensus round=0
7:16PM ERR CONSENSUS FAILURE!!! err="precommit step; +2/3 prevoted for an invalid block: invalid proof for encrypted random. Height: 21050390, Random: 269f9c4830706f6f3f662218e018aedd9c0750273e40b0a8e7cd95f21180b5e686bca270037c19c6882cbfa43ad4696a, Proof: f5732b85cb7ee78c5f2abf2e7f54a90e4578b51fd4139092dfaa735894a7907a, DataHash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" module=consensus stack="goroutine 444 [running]:\nruntime/debug.Stack()\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:801 +0x46\npanic({0x4469c00?, 0xc1663e0900?})\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:787 +0x132\ngithub.com/cometbft/cometbft/consensus.(*State).enterPrecommit(0xc001804008, 0x1413416, 0x0)\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:1526 +0x159f\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0xc001804008, 0xc163e864e0, {0xc267ad28d0, 0x28})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:2315 +0x19c5\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc001804008, 0xc163e864e0, {0xc267ad28d0?, 0x0?})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:2067 +0x26\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc001804008, {{0x549b320, 0xc163b4ce88}, {0xc267ad28d0, 0x28}})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:929 +0x3d0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc001804008, 0x0)\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:836 +0x3f1\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 46\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:398 +0x107\n"
7:16PM ERR Stopping peer for error err=EOF module=p2p peer="Peer{MConn{135.234.184.248:10560} ab6394e953e0b570bb1deeb5a8b387aa0dc6188a in}"
7:16PM ERR Stopping peer for error err=EOF module=p2p peer="Peer{MConn{50.85.102.57:5168} 6fb7169f7630da9468bf7cc0bcbbed1eb9ed0d7b in}"
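
The failure lines above share a recognizable shape, so they can be picked out of a log stream for alerting. The following is a rough sketch that relies only on the "CONSENSUS FAILURE!!!" marker and the "Height: N" field visible in these logs. The function name is hypothetical, and other node versions may format the line differently.

```python
import re
from typing import Iterable, List

# Matches the panic line shown in the logs above and captures the block height.
FAILURE_RE = re.compile(r"CONSENSUS FAILURE!!!.*?Height: (\d+)")


def failed_heights(lines: Iterable[str]) -> List[int]:
    """Return the block heights named in CONSENSUS FAILURE log lines."""
    heights = []
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            heights.append(int(m.group(1)))
    return heights
```

Feeding it the node's log lines (for example from a file or a journal reader) would yield the heights at which the node panicked, which could then be compared against the height reported by /status.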

Gaia Version

v25.0.1

How to reproduce?

Unclear. It's a "known" issue among node runners, but there is no single agreed-upon theory for why it happens. It doesn't seem to be (necessarily?) load-related or machine-spec-related. Here are Grafana charts captured after it happened: CPU and memory were fine, but you can see the network traffic drop out.

[Image: Grafana charts of CPU, memory, and network traffic]

    Labels

    status: waiting-triage (this issue/PR has not yet been triaged by the team)
    type: bug (issues that need priority attention; something isn't working)
