re-joining after simulated node crash panics on trying to re-add an already existing node. #314
Comments
/cc @darach |
If a node has already joined a cluster, you should just restart it without sending a re-join msg. |
There is no rejoin message sent, that's the odd issue about it. And node 1 restarts, but it claims that node 2 (one node) is re-added. |
Node 2 and node 3 both panicked immediately when I tried to reproduce the scenario in step 8:
Maybe there is something wrong in the state machine 🤔️. |
I am baffled; I've been dancing around this for a day and tried both the 0.6.0 crate and git. The error changes slightly if I don't persist the ConfState, but that feels wrong and causes other issues. |
After some digging I noticed that logs are not replicated to joining nodes. So after joining, nodes 2 and 3 do share the same snapshot but they don't share the same logs - could that be where the error originates? |
@Hoverbear PTAL |
Hmm... This is typically caused by one of these: raft-rs/src/progress/progress_set.rs Lines 301 to 341 in d34f693.
You can see where a node is removed from the progress set: raft-rs/src/progress/progress_set.rs Lines 347 to 355 in d34f693.
That's called by Lines 2104 to 2123 in d34f693, which is called via Lines 324 to 340 in d34f693.
I see you're updating it: … I need to play with this some :) |
Thanks for taking the time to provide all the context! The assumption we made is that adding and removing a node via a conf change event is for orderly growing or shrinking the raft group, not for intermittent failures. So currently there is no code path in which we send a remove-node conf change event. Basically, in the code we call add_node only when a new node joins the cluster (using the simplest possible algorithm of pre-defined/shared node ids and whether they're known or not); we haven't gotten to removing nodes yet since we're still battling the restart case. |
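A minimal sketch of that join path under the assumptions described above; the `known_voters` bookkeeping and the `maybe_propose_add_node` helper are illustrative and not uring or raft-rs API:

```rust
use std::collections::HashSet;

use raft::eraftpb::{ConfChange, ConfChangeType};
use raft::{storage::MemStorage, RawNode};

// Hypothetical join handler: only propose an AddNode conf change when the id is
// genuinely new. A node re-joining after a crash is already in the configuration
// and should simply catch up via normal append/snapshot traffic.
fn maybe_propose_add_node(
    node: &mut RawNode<MemStorage>,
    known_voters: &HashSet<u64>, // application-side bookkeeping, not raft-rs API
    joining_id: u64,
) -> raft::Result<()> {
    if known_voters.contains(&joining_id) {
        return Ok(());
    }
    let mut cc = ConfChange::default();
    cc.set_change_type(ConfChangeType::AddNode);
    cc.set_node_id(joining_id);
    node.propose_conf_change(vec![], cc)
}
```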
So some preliminary tests:
trace: Nov 06 10:53:55.571 DEBG Sending from 2 to 1, msg: msg_type: MsgHeartbeatResponse to: 1, to: 1, from: 2, raft_id: 2
[WS Onramp] Connection terminated.
system stopped
Error: Failed to connect to host: Connection refused (os error 111)
Nov 06 10:53:55.745 WARN down(local), id: 1
Nov 06 10:53:57.321 INFO starting a new election, term: 2, raft_id: 2
Nov 06 10:53:57.321 INFO became pre-candidate at term 2, term: 2, raft_id: 2
send raft message to 1 fail, let Raft retry it
Nov 06 10:53:57.322 INFO 2 received message from 2, term: 2, msg: MsgRequestPreVote, from: 2, id: 2, raft_id: 2
Nov 06 10:53:57.323 INFO [logterm: 2, index: 5] sent request to 1, msg: MsgRequestPreVote, term: 2, id: 1, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.324 DEBG Sending from 2 to 1, msg: msg_type: MsgRequestPreVote to: 1 term: 3 log_term: 2 index: 5, to: 1, from: 2, raft_id: 2
Nov 06 10:53:57.325 INFO [logterm: 2, index: 5] sent request to 3, msg: MsgRequestPreVote, term: 2, id: 3, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.326 DEBG Sending from 2 to 3, msg: msg_type: MsgRequestPreVote to: 3 term: 3 log_term: 2 index: 5, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.327 DEBG State transition, next-state: PreCandidate, last-state: Follower
Nov 06 10:53:57.332 INFO received from 3, term: 2, msg type: MsgRequestPreVoteResponse, from: 3, raft_id: 2
Nov 06 10:53:57.333 DEBG reset election timeout 18 -> 16 at 0, election_elapsed: 0, timeout: 16, prev_timeout: 18, raft_id: 2
Nov 06 10:53:57.333 INFO became candidate at term 3, term: 3, raft_id: 2
send raft message to 1 fail, let Raft retry it
Nov 06 10:53:57.334 INFO 2 received message from 2, term: 3, msg: MsgRequestVote, from: 2, id: 2, raft_id: 2
Nov 06 10:53:57.335 INFO [logterm: 2, index: 5] sent request to 1, msg: MsgRequestVote, term: 3, id: 1, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.336 DEBG Sending from 2 to 1, msg: msg_type: MsgRequestVote to: 1 term: 3 log_term: 2 index: 5, to: 1, from: 2, raft_id: 2
Nov 06 10:53:57.337 INFO [logterm: 2, index: 5] sent request to 3, msg: MsgRequestVote, term: 3, id: 3, log_index: 5, log_term: 2, raft_id: 2
Nov 06 10:53:57.338 DEBG Sending from 2 to 3, msg: msg_type: MsgRequestVote to: 3 term: 3 log_term: 2 index: 5, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.339 DEBG State transition, next-state: Candidate, last-state: PreCandidate
Nov 06 10:53:57.344 INFO received from 3, term: 3, msg type: MsgRequestVoteResponse, from: 3, raft_id: 2
Nov 06 10:53:57.345 DEBG reset election timeout 16 -> 19 at 0, election_elapsed: 0, timeout: 19, prev_timeout: 16, raft_id: 2
Nov 06 10:53:57.345 INFO became leader at term 3, term: 3, raft_id: 2
[src/raft_node.rs:434] raft_group.raft.raft_log.last_index() + 1 = 7
Nov 06 10:53:57.345 DEBG Sending from 2 to 1, msg: msg_type: MsgAppend to: 1 log_term: 2 index: 5 entries {term: 3 index: 6} commit: 5, to: 1, from: 2, raft_id: 2
Nov 06 10:53:57.346 DEBG Sending from 2 to 3, msg: msg_type: MsgAppend to: 3 log_term: 2 index: 5 entries {term: 3 index: 6} commit: 5, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.347 DEBG State transition, next-state: Leader, last-state: Candidate
send raft message to 1 fail, let Raft retry it
Nov 06 10:53:57.409 DEBG committing index 6, index: 6, raft_id: 2
Nov 06 10:53:57.409 DEBG Sending from 2 to 3, msg: msg_type: MsgAppend to: 3 log_term: 3 index: 6 entries {entry_type: EntryConfChange term: 3 index: 7 data: "\030\003"} commit: 6, to: 3, from: 2, raft_id: 2
Nov 06 10:53:57.458 DEBG committing index 7, index: 7, raft_id: 2
Nov 06 10:53:57.458 DEBG Sending from 2 to 3, msg: msg_type: MsgAppend to: 3 log_term: 3 index: 7 commit: 7, to: 3, from: 2, raft_id: 2
[src/raft_node.rs:343] &cc = node_id: 3
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Exists(3, "voters")', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Nov 06 10:53:57.459 DEBG adding node (learner: false) with ID 3 to peers., id: 3, learner: false, raft_id: 2
Nov 06 10:53:57.460 DEBG Inserting voter with id 3, id: 3, raft_id: 2
Nov 06 10:53:57.461 ERRO e: The node 3 already exists in the voters set., raft_id: 2
thread 'actix-rt:worker:1' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', src/libcore/result.rs:1165:5
thread 'actix-rt:worker:1' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', src/libcore/result.rs:1165:5
stack backtrace:
0: 0x56294b85ced4 - backtrace::backtrace::libunwind::trace::hf9ad636648efb226
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
1: 0x56294b85ced4 - backtrace::backtrace::trace_unsynchronized::hed853f204024d151
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
2: 0x56294b85ced4 - std::sys_common::backtrace::_print_fmt::h540be03650997d7f
at src/libstd/sys_common/backtrace.rs:77
3: 0x56294b85ced4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h859895b5f65feccf
at src/libstd/sys_common/backtrace.rs:61
4: 0x56294b8805ac - core::fmt::write::h55a2a37a9ae04dad
at src/libcore/fmt/mod.rs:1028
5: 0x56294b857d87 - std::io::Write::write_fmt::h55b418b8331f602e
at src/libstd/io/mod.rs:1412
6: 0x56294b85f44e - std::sys_common::backtrace::_print::h01cf1aa97e3bff6e
at src/libstd/sys_common/backtrace.rs:65
7: 0x56294b85f44e - std::sys_common::backtrace::print::h992feaba39b9b6be
at src/libstd/sys_common/backtrace.rs:50
8: 0x56294b85f44e - std::panicking::default_hook::{{closure}}::hf519e5920434d96d
at src/libstd/panicking.rs:188
9: 0x56294b85f141 - std::panicking::default_hook::ha5560763526d8f2a
at src/libstd/panicking.rs:205
10: 0x56294b85fb4b - std::panicking::rust_panic_with_hook::h23e3b045936d3e0a
at src/libstd/panicking.rs:464
11: 0x56294b85f6ee - std::panicking::continue_panic_fmt::h22cad8a6b4a9a888
at src/libstd/panicking.rs:373
12: 0x56294b85f5d6 - rust_begin_unwind
at src/libstd/panicking.rs:302
13: 0x56294b87c4ce - core::panicking::panic_fmt::h9f376b306ccc647c
at src/libcore/panicking.rs:139
14: 0x56294b87c5c7 - core::result::unwrap_failed::ha857e2e239846901
at src/libcore/result.rs:1165
15: 0x56294a38d6d4 - core::result::Result<T,E>::unwrap::h2b62ba69aa6e5d84
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/result.rs:933
16: 0x56294a3e2a1a - <uring::UrSocket as actix::actor::Actor>::stopped::h39bc887b77fa9f3d
at src/main.rs:129
17: 0x56294a31dec8 - <actix::contextimpl::ContextFut<A,C> as futures::future::Future>::poll::h20f0330cfe0a782f
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-0.8.3/src/contextimpl.rs:438
18: 0x56294a31c705 - <actix::contextimpl::ContextFut<A,C> as core::ops::drop::Drop>::drop::h7575ff147dd421c0
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-0.8.3/src/contextimpl.rs:220
19: 0x56294a36ad01 - core::ptr::real_drop_in_place::h0f7834c8d060f1e5
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
20: 0x56294a36b521 - core::ptr::real_drop_in_place::h15157f2273174a4a
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
21: 0x56294a382eb1 - core::ptr::real_drop_in_place::hfd3912dcfbeb53e5
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
22: 0x56294a379348 - core::ptr::real_drop_in_place::h9bdb1342952ae961
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
23: 0x56294a3715bb - core::ptr::real_drop_in_place::h4cd96d3c8d8904ab
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
24: 0x56294a377cd5 - core::ptr::real_drop_in_place::h89d944f4395f2954
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
25: 0x56294a380383 - core::ptr::real_drop_in_place::he465ebf8b5d81fc4
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
26: 0x56294a36e565 - core::ptr::real_drop_in_place::h303c9666062b1852
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
27: 0x56294a373439 - core::ptr::real_drop_in_place::h639d6c80397d3994
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
28: 0x56294a37c476 - core::ptr::real_drop_in_place::hbd286707c9379e36
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
29: 0x56294a369842 - core::ptr::real_drop_in_place::h050601c6fa9adbb3
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
30: 0x56294a382e91 - core::ptr::real_drop_in_place::hfd1b973e3f3405b7
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
31: 0x56294a37d47f - core::ptr::real_drop_in_place::hc6257ba62dd3ee4e
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
32: 0x56294a3800a1 - core::ptr::real_drop_in_place::he21e27ce352065f6
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
33: 0x56294a37ca48 - core::ptr::real_drop_in_place::hbfdd5e57c1ea6a8a
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
34: 0x56294a36fd81 - core::ptr::real_drop_in_place::h3e1b516237dc7899
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
35: 0x56294a36f1f4 - core::ptr::real_drop_in_place::h384c3f60fbd1cfd3
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
36: 0x56294aeb48b8 - core::ptr::real_drop_in_place::h25dd0baed7f0f375
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
37: 0x56294aeb4fe1 - core::ptr::real_drop_in_place::hdd6aac9bbf1b25da
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
38: 0x56294aeb4851 - core::ptr::real_drop_in_place::h1bc710e61ac51dd8
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
39: 0x56294aeb488f - core::ptr::real_drop_in_place::h20559ea5933991c4
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
40: 0x56294aeb6d47 - core::mem::drop::h7804235b62a558b2
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/mem/mod.rs:704
41: 0x56294ae10035 - tokio_current_thread::scheduler::release_node::h72a3b1970158c2e8
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:382
42: 0x56294ae0ff4f - <tokio_current_thread::scheduler::Scheduler<U>::tick::Bomb<U> as core::ops::drop::Drop>::drop::{{closure}}::hf313f7599952abe0
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:271
43: 0x56294ae0cbe4 - tokio_current_thread::Borrow<U>::enter::{{closure}}::{{closure}}::h93152cf040b0cfc6
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:788
44: 0x56294ae0c593 - tokio_current_thread::CurrentRunner::set_spawn::hb299958eebe1449b
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:825
45: 0x56294ae0ca85 - tokio_current_thread::Borrow<U>::enter::{{closure}}::h8bb87fc431b3369e
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:788
46: 0x56294adf4b1a - std::thread::local::LocalKey<T>::try_with::h4b4da45cb0953780
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
47: 0x56294adf3568 - std::thread::local::LocalKey<T>::with::hbe6f12a20964b981
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
48: 0x56294ae0c61b - tokio_current_thread::Borrow<U>::enter::h0d6f514f2853732f
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:786
49: 0x56294ae0feb7 - <tokio_current_thread::scheduler::Scheduler<U>::tick::Bomb<U> as core::ops::drop::Drop>::drop::h60e904c13e6346e3
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:271
50: 0x56294ae066a5 - core::ptr::real_drop_in_place::h87d3db269008cc8c
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ptr/mod.rs:175
51: 0x56294ae11879 - tokio_current_thread::scheduler::Scheduler<U>::tick::h8ba9f0e91fe9beb0
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/scheduler.rs:342
52: 0x56294ae0ce2d - tokio_current_thread::Entered<P>::tick::h015233dbf859aa87
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:612
53: 0x56294ae0cff0 - tokio_current_thread::Entered<P>::block_on::h5c966374bfc009d6
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-current-thread-0.1.6/src/lib.rs:502
54: 0x56294ae0168e - actix_rt::runtime::Runtime::block_on::{{closure}}::ha255279c9e9e0678
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:128
55: 0x56294ae01d61 - actix_rt::runtime::Runtime::enter::{{closure}}::{{closure}}::{{closure}}::{{closure}}::h71c41cadaf16c741
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:168
56: 0x56294ae2851d - tokio_executor::global::with_default::{{closure}}::hf2f1bdd7887fd578
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-executor-0.1.8/src/global.rs:209
57: 0x56294adf6448 - std::thread::local::LocalKey<T>::try_with::heda6599f2047da06
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
58: 0x56294adf348c - std::thread::local::LocalKey<T>::with::hb8098ea981ebb349
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
59: 0x56294ae280a2 - tokio_executor::global::with_default::hc1252736b4e426c1
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-executor-0.1.8/src/global.rs:178
60: 0x56294ae01f7e - actix_rt::runtime::Runtime::enter::{{closure}}::{{closure}}::{{closure}}::h9de40e2003eada88
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:166
61: 0x56294ae19dc2 - tokio_timer::timer::handle::with_default::{{closure}}::h6e3dd5bd04bbe4cc
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/timer/handle.rs:101
62: 0x56294adf3a28 - std::thread::local::LocalKey<T>::try_with::h04334be1dd5a6501
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
63: 0x56294adf2c1c - std::thread::local::LocalKey<T>::with::h1110cb2dd7a5e41a
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
64: 0x56294ae19955 - tokio_timer::timer::handle::with_default::h6e7191e9227a3395
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/timer/handle.rs:84
65: 0x56294ae020a9 - actix_rt::runtime::Runtime::enter::{{closure}}::{{closure}}::hcb4ffcbdedde3879
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:159
66: 0x56294adf0df8 - tokio_timer::clock::clock::with_default::{{closure}}::h4d2baf6a8640417e
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/clock/clock.rs:137
67: 0x56294adf4844 - std::thread::local::LocalKey<T>::try_with::h44e90ccbce8ba7a0
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
68: 0x56294adf38ad - std::thread::local::LocalKey<T>::with::hf8efabeded619c12
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
69: 0x56294adf0a7e - tokio_timer::clock::clock::with_default::h744d246f9ee1c3c7
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/clock/clock.rs:117
70: 0x56294ae020fd - actix_rt::runtime::Runtime::enter::{{closure}}::h25beeae16b0aab2b
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:158
71: 0x56294ae1a53f - tokio_reactor::with_default::{{closure}}::h047eea81de149d26
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-reactor-0.1.10/src/lib.rs:237
72: 0x56294adf5544 - std::thread::local::LocalKey<T>::try_with::h983b47c211a53258
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:262
73: 0x56294adf2b9d - std::thread::local::LocalKey<T>::with::h0ddb474724c356fd
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/local.rs:239
74: 0x56294ae1a318 - tokio_reactor::with_default::hedeceb1f1204f41e
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-reactor-0.1.10/src/lib.rs:217
75: 0x56294ae0196c - actix_rt::runtime::Runtime::enter::he0a8a92e295c18e4
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:157
76: 0x56294ae01478 - actix_rt::runtime::Runtime::block_on::h5a965072a66504f6
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/runtime.rs:126
77: 0x56294ae26303 - actix_rt::arbiter::Arbiter::new::{{closure}}::h03e49a8780bdb49d
at /home/hoverbear/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-rt-0.2.5/src/arbiter.rs:113
78: 0x56294ae1f0f5 - std::sys_common::backtrace::__rust_begin_short_backtrace::h11b303417e3a8564
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/sys_common/backtrace.rs:129
79: 0x56294ae040c1 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::he7652596c7722712
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/mod.rs:469
80: 0x56294adf6664 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h8613466756385bec
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panic.rs:317
81: 0x56294adf1911 - std::panicking::try::do_call::h711ea6f48f6c4718
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panicking.rs:287
82: 0x56294b86414a - __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:78
83: 0x56294adf1768 - std::panicking::try::h7cbe878e6ead05a2
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panicking.rs:265
84: 0x56294adf69d6 - std::panic::catch_unwind::hf74332f300029515
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/panic.rs:396
85: 0x56294ae03eb2 - std::thread::Builder::spawn_unchecked::{{closure}}::h7970385913689ac2
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libstd/thread/mod.rs:468
86: 0x56294ae04434 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hf90741fee7c7944d
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/libcore/ops/function.rs:227
87: 0x56294b85100f - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::hab0345739862a0e3
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/liballoc/boxed.rs:942
88: 0x56294b863210 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h97269b0dcf5f4700
at /rustc/b520af6fd515b186caed436d75162a42aa183d95/src/liballoc/boxed.rs:942
89: 0x56294b863210 - std::sys_common::thread::start_thread::h6d1cb01c00583451
at src/libstd/sys_common/thread.rs:13
90: 0x56294b863210 - std::sys::unix::thread::Thread::new::thread_start::h2c73c5e5c4fa31f9
at src/libstd/sys/unix/thread.rs:79
91: 0x7f4cd68f4669 - start_thread
92: 0x7f4cd6800323 - clone
93: 0x0 - <unknown>
thread panicked while panicking. aborting.
Illegal instruction |
Seems this bit is what's important: |
Ja, which is odd - the only time we fire an AddNode event is if the raft state says it's an unknown node: https://github.com/wayfair-incubator/uring/blob/master/src/main.rs#L557. Since node 1 did see 2 and 3 before and persists the state to disk, it should never get there - I don't see any add-node messages fired either. |
Hmm, so if you want to add a node, you can try it in a way like this: raft-rs/examples/five_mem_node/main.rs Lines 405 to 419 in d34f693
This is the correct way to add something to the log. But I'm not sure you want to try to do that in this situation. :) Just commenting out that code seems to remove the panic... It does seem like the remaining followers fail to correctly hold a new election though. I'm going to let you take a look and let me know if this helps. :) |
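For context, the other half of that flow - applying the conf change once its entry commits - looks roughly like the sketch below, assuming the raft-rs 0.6-alpha API with the protobuf codec; the function name and error handling are illustrative:

```rust
use protobuf::Message;

use raft::eraftpb::{ConfChange, ConfChangeType, Entry, EntryType};
use raft::{storage::MemStorage, RawNode};

// Sketch: when a committed entry is an EntryConfChange, decode it and apply it to
// the raft state machine. The resulting membership should then be persisted so
// that a restarted node sees the same voters.
fn apply_committed_conf_change(node: &mut RawNode<MemStorage>, entry: &Entry) -> raft::Result<()> {
    if entry.get_entry_type() != EntryType::EntryConfChange || entry.get_data().is_empty() {
        return Ok(());
    }
    let mut cc = ConfChange::default();
    cc.merge_from_bytes(entry.get_data()).expect("committed conf change should decode");
    match cc.get_change_type() {
        ConfChangeType::AddNode => node.raft.add_node(cc.get_node_id())?,
        ConfChangeType::RemoveNode => node.raft.remove_node(cc.get_node_id())?,
        _ => {} // learners and joint changes are out of scope for this sketch
    }
    // Persisting the new ConfState (e.g. alongside the snapshot metadata) is left
    // to the application's Storage implementation.
    Ok(())
}
```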
Also: I'm sorry the docs aren't so clear. :( We are trying to improve them and would definitely love any concrete issues/PRs with improvements. |
No reason to be sorry - I think everyone who has ever worked on a project of any complexity can understand; it's hard to document what you know, since everything is obvious at that point ;) On the code: we started with a direct copy but had to add the second change event or the other nodes didn't learn about the first node. Even after that, the method is still the same - it gets published as a proposal and agreed on, not injected into the logs. But I'll try to remove/replace the code later; perhaps the issue it was originally added for was a different one. That said, it's a bit different from the memory nodes, since those all exist in their own world while ours are networked, so we can't just join 5 nodes at boot time - they have to grow as they connect. |
You should definitely be able to grow/shrink your Raft cluster online. :) |
So I tried the suggestion of commenting out the code, but then it just doesn't crash because it never creates a group. Node 1 knows itself as a voter, but neither node 2 nor node 3 knows of any other node.
Node 1:
Node 2:
Node 3:
I'll try to make an explicit 'add node' action that isn't connected to the join on the cluster; perhaps that works better. |
So I changed how conf-change events are sent and made it explicit (tremor-rs/uring@a791b33). That shifts the problem slightly; it now goes to the following (after restarting node 1 (the initial leader) and node 3 (the new leader) - on node 1):
which is a bit odd since none of the nodes has 6 as an index.
Node 1 (before it crashed):
Node 2:
Node 3:
|
Hmmm, that's a bit of a puzzle, isn't it! I see that index 5 is done: Nov 07 12:17:02.582 INFO [commit: 5, term: 3] restored snapshot [index: 5, term: 3], snapshot_term: 3, snapshot_index: 5, commit: 5, term: 3, raft_id: 1 |
Yup, but only on nodes 2 & 3; node 1 is still at 4 - I suspect 5 was the election of the new leader when 1 was killed, so it makes sense that it doesn't have it yet. I'm still unsure where it gets 6 from. |
Yup I misread and deleted the comment :-P |
:D ah sorry, I just saw the mail. |
@Licenser If you don't call |
Thanks! That's an interesting observation. I didn't try that yet - it's a bit late here (close to midnight) but I'll give that a try tomorrow and keep digging. The failure to send messages might be because the node was booted without knowing its peers; that's an easy thing to fix (famous last words), or at least easy to work around for the sake of getting the raft-rs use case working enough to have a minimal functioning kv store in it, by just passing all peers to all nodes with more |
Yeah, let's get this working! :) Would love to see more raft-rs derivatives and this is informing how we can better document things. (Un)Fortunately we're updating our membership change mechanisms (hello Joint Consensus) hopefully soon once we resolve some lingering safety issues with the etcd folks. If you need help learning to use Raft @LucioFranco made a simple KV here: https://github.com/LucioFranco/kv, and the TiKV Raftstore is a maximal example: https://github.com/tikv/tikv/tree/master/src/raftstore |
Learning raft isn't really my problem - I'm not unfamiliar with it. Working through the assumptions in raft-rs is where I am stuck :(. I looked at @LucioFranco's kv and as far as I can tell it suffers from the same problems: bootstrap 1, join 2 1, join 3 1, kill 1, join 1 2 ends up in "The node 1 already exists in the voters set." I gave not adding node 1 a try and it ends with the same issue of trying to access index 6, it just takes one more kill to get there (start1, start2, start3, join2, join3, kill1, start1, kill2 (was leader), join2 => tries to access index 6). |
So I've come full circle with this - the original 'node already known' problem is back where I started. The solution to that was to not load the … More generally, it seems that nodes only know about nodes added after them: node 1 after a restart knows about 2/3, node 2 after a restart knows about 3, node 3 after a restart knows none. |
At the … Here's how the process should go:
So you'd do remove/add when you were decommissioning a host, and if you were just taking a host down for maintenance or a reboot, you can just leave it in the membership pool. :) |
kv needs to be called with either
This is where it gets problematic: after recovery the nodes either run into a "node already known" problem when the ConfState is persisted to disk and re-loaded, or they forget about the nodes. That's where I'm stuck at the moment.
I know, there are no calls to add outside of the PUT node/X and no calls to remove at all. That's the mystery. |
So when 1 recovers and loads from the hard state, it should already know about 1, 2, 3, no? Why is it adding nodes? |
That's where I'm stuck. There is no code path in uring that would add the nodes again after a reboot; the call to … Where it explodes is here: Line 74 in d34f693 |
Hmm.... I wrote a test at one point that simulated power cycles with persisted state: raft-rs/harness/tests/integration_cases/test_membership_changes.rs Lines 1534 to 1566 in faa4384
That test suite was for Joint Consensus, but the poweroff/on behavior should be similar. Can you help me find where you're loading the storage from your save state? When node 1 reboots, the …
I see https://github.com/wayfair-incubator/uring/search?q=RawNode%3A%3Anew&unscoped_q=RawNode%3A%3Anew shows loading via …
I see in https://github.com/wayfair-incubator/uring/blob/bd7dac7dd986bf087e6c3216c2277ffc06e6d253/src/storage.rs#L223-L244, perhaps this could be the problem?
According to https://docs.rs/raft/0.6.0-alpha/raft/storage/trait.Storage.html#tymethod.initial_state, if the initial state returned in …
Let's look at TiKV for an example: https://github.com/tikv/tikv/blob/dcde0919aec6ea6b2acf4ed4e73c1634fe5e7412/src/raftstore/store/peer_storage.rs#L451-L454
Here is the … |
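A sketch of what the reboot path could hand back from `Storage::initial_state`, assuming the 0.6-alpha API and the protobuf codec; the byte slices and the `load_initial_state` helper stand in for whatever persistence uring actually uses:

```rust
use protobuf::Message;

use raft::eraftpb::{ConfState, HardState};
use raft::storage::RaftState;

// Sketch: rebuild the RaftState that Storage::initial_state should return after a
// reboot. Returning the persisted ConfState here is what lets a restarted node
// remember its peers without any AddNode proposals being re-sent.
fn load_initial_state(hard_state_bytes: &[u8], conf_state_bytes: &[u8]) -> raft::Result<RaftState> {
    let mut hs = HardState::default();
    hs.merge_from_bytes(hard_state_bytes).expect("persisted HardState should decode");
    let mut cs = ConfState::default();
    cs.merge_from_bytes(conf_state_bytes).expect("persisted ConfState should decode");
    Ok(RaftState { hard_state: hs, conf_state: cs })
}
```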
So the 'load from disk' is currently commented out: https://github.com/wayfair-incubator/uring/blob/master/src/storage.rs#L230, since it caused crashes for "node already exists". There are 3 options I tried in there:
The TiKV code and #2/#3 look the same to me, just that instead of storing voters and learners in a separate structure it persists the ConfState to disk (I tried only taking voters and learners out; there is no change) - it still causes the "Voter already known..." crash.
If we look at where AddNode is called in the codebase, there is only one place (617-633 is a comment: https://github.com/wayfair-incubator/uring/blob/master/src/main.rs#L612-L640). The place it's called, the (badly named) PutNode message, is https://github.com/wayfair-incubator/uring/blob/master/src/main.rs#L488-L504, which is called from the (also badly named, sorry) post_node handler (https://github.com/wayfair-incubator/uring/blob/master/src/main.rs#L128-L141) for …
So I think the request to re-add a node that is already known has to come from somewhere within raft-rs, unless I'm missing something. |
Sorry I forgot to add context. |
So I could see … I see a … |
I'd be a bit worried to use restore() directly. It looks like an internal method that should be triggered by the FSM, or am I wrong there? And node 1, for example, never receives a snapshot, so there is nothing to initialise it with - unless I'd store the proposed snapshot somewhere, and that feels just wrong as it would bypass the consensus on it. |
restore() also isn't called anywhere in tikv that I can find :( |
I think I've solved the mystery. tremor-rs/uring@347c3a1 is the relevant commit. The issue was caused by raft-rs replaying logs that were already applied to the raft state machine. I would suggest the following:
The reason for 1. is that persisting hard state and truncating logs can't be done in an atomic operation (at least not in any storage I can think of). So the 'safe' way of persisting hard state would be: first persist the hard state, then truncate the log. That way, if the system crashes between the two steps, there are logs prior to the hard state, but they do no harm as they'll be ignored and truncated on the next hard-state update.
The reason for 2. is that it is not obvious from any of the examples that this needs to happen. I only found this by going side by side through the code and the raft paper to check all the logic, and the sentence "The leader decides when it is safe to apply a log entry to the state machine; such an entry is called committed." tipped me off that logs are not treated as idempotent (which they don't seem to be) and have to be manually truncated.
The reason for 3. is that this is complicated enough that someone using raft-rs doesn't easily come to the conclusion that it is required, even with prior knowledge of raft. So having it in the examples, even though the memory nodes can't be restarted, will help adoption.
The reason for 4. is that the snapshot in the examples doesn't seem to include the 'application' data. With logs being truncated, application state can't be recovered via log replay, so it needs to be transferred via snapshot (which is sensible!), but the example relies on log replay to achieve this, which obscures that it doesn't work in a real environment.
The reason for 5. is that without idempotent logs, log truncation is an essential part of the raft logic, so it should be handled by the library and not forced onto the user; the implementation of truncation will differ by use case, so callbacks seem sensible.
I'm happy to open separate tickets (and PRs for some of the issues) but wanted to discuss it first. |
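A sketch of the ordering described under 1. and 2. above, using MemStorage purely for illustration (the `persist_then_truncate` helper is hypothetical; a real on-disk storage would do the equivalent truncation in its own format):

```rust
use raft::{storage::MemStorage, Ready};

// Sketch: during ready handling, persist the hard state first, then compact the
// log up to the applied index. A crash between the two steps only leaves
// already-applied entries behind, which the next hard-state update truncates again.
fn persist_then_truncate(storage: &MemStorage, ready: &Ready, applied: u64) -> raft::Result<()> {
    if let Some(hs) = ready.hs() {
        storage.wl().set_hardstate(hs.clone());
    }
    // Discards entries up to `applied`, so they can never be replayed after a reboot.
    storage.wl().compact(applied)?;
    Ok(())
}
```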
On second thought, it feels more correct to treat all raft-related logs as idempotent - see section 5.5 of the raft paper. At the point of a restart a node is a follower, and replaying the raft log is effectively replaying the RPC calls to it. Making the assumption (I suspect raft-rs enforces this) that hard state is only ever committed to disk after the logs are written, replaying all logs up to the hard state should result in the same config state as before (i.e. re-adding a node does no harm during replay, since if it was deleted later that deletion will be replayed as well). The only issue would be if the hard-state commit is not larger than the last log (at which point it feels like a good time to hard crash and call that node bricked). |
Sleeping on it, I get the feeling that truncating on commit might be wasteful. It forces a complete snapshot every time a leader needs to catch up a follower; while that will work for small working sets, it will probably break down quickly. So I think it is back to raft-rs ignoring logs it has already committed (or treating them as idempotent) and not replaying them, and moving the truncation logic into raft-rs, since the leader should eventually learn the last log committed to all followers. |
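A minimal sketch of what treating replayed conf changes as idempotent could look like on the application side; `add_node` and the `Exists` error are raft-rs API (as seen in the panic above), while the helper itself and the decision to swallow the error are assumptions:

```rust
use raft::{storage::MemStorage, Error, Raft};

// Sketch: tolerate re-applying an AddNode during log replay after a restart.
// If the node is already a voter, raft-rs reports Error::Exists - the same error
// that was unwrapped into the panic earlier in this thread.
fn add_node_idempotent(raft: &mut Raft<MemStorage>, id: u64) {
    match raft.add_node(id) {
        Ok(()) => {}
        Err(Error::Exists(..)) => {} // already applied before the crash; replay is a no-op
        Err(e) => panic!("conf change failed: {:?}", e),
    }
}
```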
Hi @Licenser , sorry for the radio silence, holiday weekend. :( I'm glad you seem to have it resolved! I wonder if there is still some lingering problem though. @hicqu do you think you could have a look? Perhaps we missed something? Specific issue tickets are always welcome, @Licenser, I think that's an easier way to have a discussion about multiple issues. :) Also welcome on tikv.org/chat! |
Describe the bug
Re-joining a leader to a cluster crashes with an attempt to re-add an already known node to the progress state.
To Reproduce
I set up a mini demo to replicate:
cargo run -- -e 127.0.0.1:8081 -i 1
cargo run -- -e 127.0.0.1:8082 -i 2 -p 127.0.0.1:8081
cargo run -- -e 127.0.0.1:8083 -i 3 -p 127.0.0.1:8081
CTRL+C
cargo run -- -e 127.0.0.1:8081 -i 01 -p 127.0.0.1:8082
Expected behavior
Stopping and starting nodes in a cluster should be handled gracefully
System information
(probably not relevant)
Additional context
The shared repo is a minimal demo app trying to put raft-rs into a restartable state for a raft cluster.