Open
Labels
status: waiting-triage (This issue/PR has not yet been triaged by the team.)
type: bug (Issues that need priority attention -- something isn't working)
Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
Nodes sometimes fall out of sync with the rest of the network but still report catching_up: false if you check the /status RPC endpoint.
This can be seen across basically every Tendermint/CometBFT network.
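For anyone who wants to catch this state automatically, here is a minimal sketch of a check that does not trust catching_up on its own. It reads sync_info from the /status reply and flags the node when catching_up is false but latest_block_time is older than a threshold. The function name silentlyStalled, the 60-second threshold, and the command-line handling are my own assumptions for illustration, not part of Gaia or CometBFT. Note that a chain-wide halt would trigger the same result, so this is only a heuristic.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// statusResponse mirrors only the /status fields this check needs.
type statusResponse struct {
	Result struct {
		SyncInfo struct {
			LatestBlockHeight string    `json:"latest_block_height"`
			LatestBlockTime   time.Time `json:"latest_block_time"`
			CatchingUp        bool      `json:"catching_up"`
		} `json:"sync_info"`
	} `json:"result"`
}

// silentlyStalled (hypothetical helper) reports whether the node claims to be
// in sync (catching_up: false) while its latest block is more than
// maxAgeSec seconds older than nowUnix.
func silentlyStalled(body []byte, nowUnix, maxAgeSec int64) (bool, error) {
	var s statusResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return false, err
	}
	si := s.Result.SyncInfo
	age := nowUnix - si.LatestBlockTime.Unix()
	return !si.CatchingUp && age > maxAgeSec, nil
}

// check fetches the status document from url and evaluates it.
func check(url string) (bool, error) {
	resp, err := http.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	// 60 seconds is an assumed threshold; tune it to the chain's block time.
	return silentlyStalled(body, time.Now().Unix(), 60)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: stallcheck <status-url>, e.g. http://localhost:26657/status")
		return
	}
	stalled, err := check(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "status check failed:", err)
		os.Exit(1)
	}
	fmt.Println("silently stalled:", stalled)
}
```

Run it against the node's own RPC address (26657 is the default RPC port) from cron or a monitoring agent. Comparing latest_block_height against a second, independent node would be a stronger signal, but that needs a trusted reference endpoint.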
Previous logs where it happened:
1:43AM ERR Stopping peer for error err=EOF module=p2p peer="Peer{MConn{195.14.6.15:60942} 7f3a25adf2bb049c3a9ad17a29bc5c59ce5dd239 in}"
1:44AM ERR prevote step: consensus deems this block invalid; prevoting nil err="invalid proof for encrypted random. Height: 21039134, Random: ba46a8dceaf1639fc62452fb6328d80bdba5888fb596d72fbbe6ce6ab1680a80ce36b51b4f9532244a7527063ffa1e62, Proof: 26c4c3f7d3e2d56ef6e101a07f3e155354654e1f59c50b32fae417ad3812b4fc, DataHash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" height=21039134 module=consensus round=0
1:44AM ERR CONSENSUS FAILURE!!! err="precommit step; +2/3 prevoted for an invalid block: invalid proof for encrypted random. Height: 21039134, Random: ba46a8dceaf1639fc62452fb6328d80bdba5888fb596d72fbbe6ce6ab1680a80ce36b51b4f9532244a7527063ffa1e62, Proof: 26c4c3f7d3e2d56ef6e101a07f3e155354654e1f59c50b32fae417ad3812b4fc, DataHash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" module=consensus stack="goroutine 1000 [running]:\nruntime/debug.Stack()\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:801 +0x46\npanic({0x4468c00?, 0xc180400870?})\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:787 +0x132\ngithub.com/cometbft/cometbft/consensus.(*State).enterPrecommit(0xc001406008, 0x141081e, 0x0)\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:1526 +0x159f\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0xc001406008, 0xc38f8c1e10, {0xc1e93a7c20, 0x28})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:2315 +0x19c5\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc001406008, 0xc38f8c1e10, {0xc1e93a7c20?, 0x0?})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:2067 +0x26\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc001406008, {{0x549a320, 0xc0a54b12b8}, {0xc1e93a7c20, 0x28}})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:929 +0x3d0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc001406008, 0x0)\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:836 +0x3f1\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 389\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:398 +0x107\n"
7:16PM ERR prevote step: consensus deems this block invalid; prevoting nil err="invalid proof for encrypted random. Height: 21050390, Random: 269f9c4830706f6f3f662218e018aedd9c0750273e40b0a8e7cd95f21180b5e686bca270037c19c6882cbfa43ad4696a, Proof: f5732b85cb7ee78c5f2abf2e7f54a90e4578b51fd4139092dfaa735894a7907a, DataHash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" height=21050390 module=consensus round=0
7:16PM ERR CONSENSUS FAILURE!!! err="precommit step; +2/3 prevoted for an invalid block: invalid proof for encrypted random. Height: 21050390, Random: 269f9c4830706f6f3f662218e018aedd9c0750273e40b0a8e7cd95f21180b5e686bca270037c19c6882cbfa43ad4696a, Proof: f5732b85cb7ee78c5f2abf2e7f54a90e4578b51fd4139092dfaa735894a7907a, DataHash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" module=consensus stack="goroutine 444 [running]:\nruntime/debug.Stack()\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:801 +0x46\npanic({0x4469c00?, 0xc1663e0900?})\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:787 +0x132\ngithub.com/cometbft/cometbft/consensus.(*State).enterPrecommit(0xc001804008, 0x1413416, 0x0)\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:1526 +0x159f\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0xc001804008, 0xc163e864e0, {0xc267ad28d0, 0x28})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:2315 +0x19c5\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc001804008, 0xc163e864e0, {0xc267ad28d0?, 0x0?})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:2067 +0x26\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc001804008, {{0x549b320, 0xc163b4ce88}, {0xc267ad28d0, 0x28}})\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:929 +0x3d0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc001804008, 0x0)\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:836 +0x3f1\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 46\n\t/go/pkg/mod/github.com/scrtlabs/[email protected]/consensus/state.go:398 +0x107\n"
7:16PM ERR Stopping peer for error err=EOF module=p2p peer="Peer{MConn{135.234.184.248:10560} ab6394e953e0b570bb1deeb5a8b387aa0dc6188a in}"
7:16PM ERR Stopping peer for error err=EOF module=p2p peer="Peer{MConn{50.85.102.57:5168} 6fb7169f7630da9468bf7cc0bcbbed1eb9ed0d7b in}"
Gaia Version
v25.0.1
How to reproduce?
Unclear. It's a "known" issue that node runners experience, but there is no single agreed-upon theory for why it happens. It's not (necessarily?) load-related, nor machine-spec-related. Grafana charts captured after one occurrence show CPU and memory staying normal while network traffic drops off.