RFC: Follower replication #98
Conversation
Signed-off-by: LintianShi <lintian.shi@pingcap.com>
> ### Preparation
> Every peer must have a field indicating which AZ it belongs to. This field does not have to be persistent. As mentioned in the comment, TiKV knows which AZ each store is located in. We need to design the interface so that `RawNode` in raft-rs can get the AZ information from TiKV. Perhaps we can initialize the `Peer` with AZ information, and then the `Peer` initializes the `RawNode`.
> We need to design the interface so that `RawNode` in raft-rs can get the AZ information from TiKV. Maybe we can initialize `Peer` with AZ information, and then `Peer` initializes the `RawNode`.

This should be settled in the design.
I will take it into consideration.
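To make the discussion concrete, here is a minimal Rust sketch of the "initialize `Peer` with AZ information, then `Peer` initializes the `RawNode`" option. All names here (`ZoneInfo`, and these simplified `Peer`/`RawNode` shapes) are hypothetical stand-ins, not the actual TiKV or raft-rs types:

```rust
use std::collections::HashMap;

/// Hypothetical container for zone information. TiKV learns each store's AZ
/// from its labels; one option is to hand this map to the peer at
/// construction time, which then passes it down to `RawNode`.
#[derive(Clone, Debug, PartialEq)]
pub struct ZoneInfo {
    /// store_id -> availability-zone label
    pub store_az: HashMap<u64, String>,
}

/// Simplified stand-in for raft-rs's `RawNode`.
pub struct RawNode {
    pub id: u64,
    pub zone_info: ZoneInfo,
}

/// Simplified stand-in for TiKV's `Peer`.
pub struct Peer {
    pub store_id: u64,
    pub raw_node: RawNode,
}

impl Peer {
    /// TiKV initializes the Peer with zone information, and the Peer in
    /// turn initializes the RawNode with it.
    pub fn new(store_id: u64, peer_id: u64, zone_info: ZoneInfo) -> Peer {
        Peer {
            store_id,
            raw_node: RawNode { id: peer_id, zone_info },
        }
    }

    /// The AZ this peer's store belongs to, if known.
    pub fn my_az(&self) -> Option<&String> {
        self.raw_node.zone_info.store_az.get(&self.store_id)
    }
}
```

The trade-off discussed later in the thread (keeping the map on `PollContext` instead of copying it into every peer) would remove the per-peer `ZoneInfo` field from this sketch.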
> @@ -0,0 +1,248 @@
> # Follower Replication for Write-Flow
>
> <!-- - RFC PR: https://github.com/tikv/rfcs/pull/0000
>
> We design a group broadcast scheme, which was proposed by this [rfc](https://github.com/tikv/rfcs/blob/362865abe36b5771d525310b449c07a57da7ef85/text/2019-11-13-follower-replication.md).
>
> The main idea is that the leader only sends `MsgBroadcast` to the agent of a certain AZ. `MsgBroadcast` contains the log entries that the leader replicates to the agent, and several ranges of log that the leader replicates to other followers and learners.
>
> Once the agent receives `MsgBroadcast`, it appends the log entries in the message and assembles `MsgAppend` messages from its own log according to the ranges in `MsgBroadcast`. Then the agent sends `MsgAppend` to the other followers/learners in the same AZ. Thus, the leader avoids sending `MsgAppend` across AZs.
Paragraphs should be separated by one blank line.
This may conflict with async log fetch.
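The agent's handling of `MsgBroadcast` described above can be sketched as follows. This is a hedged illustration of the control flow only: the entry/message/range shapes (`Entry`, `BroadcastRange`, `MsgBroadcast`, `MsgAppend`, `Agent`) are simplified stand-ins for the real raft-rs types, and a real implementation would also carry term, commit index, and the usual `MsgAppend` consistency-check fields:

```rust
/// Simplified raft log entry.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub index: u64,
    pub data: Vec<u8>,
}

/// One forwarding target: replicate log entries in [low, high) to peer `to`.
pub struct BroadcastRange {
    pub to: u64,
    pub low: u64,
    pub high: u64,
}

pub struct MsgBroadcast {
    pub entries: Vec<Entry>,         // entries for the agent itself
    pub ranges: Vec<BroadcastRange>, // ranges to forward within the AZ
}

pub struct MsgAppend {
    pub to: u64,
    pub entries: Vec<Entry>,
}

pub struct Agent {
    pub log: Vec<Entry>, // the agent's local raft log
}

impl Agent {
    /// 1. Append the leader's entries to the agent's own log.
    /// 2. Cut a `MsgAppend` payload out of the local log for each range the
    ///    leader specified, addressed to the in-AZ follower/learner.
    pub fn handle_broadcast(&mut self, msg: MsgBroadcast) -> Vec<MsgAppend> {
        self.log.extend(msg.entries);
        msg.ranges
            .iter()
            .map(|r| MsgAppend {
                to: r.to,
                entries: self
                    .log
                    .iter()
                    .filter(|e| e.index >= r.low && e.index < r.high)
                    .cloned()
                    .collect(),
            })
            .collect()
    }
}
```

Note that step 2 reads from the agent's *own* log, which is exactly where this proposal can interact with async log fetch: if the requested range has been evicted from memory, assembling the `MsgAppend` would block on (or have to await) a fetch from storage.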
> In the TiKV architecture, a TiKV server contains multiple peers that belong to different Raft groups, so TiKV needs to propagate zone information to each peer.
>
> We add an extra field `peer_zone` to the peer: a hashmap that records store_id -> AZ. Every time the zone information stored in TiKV is updated, a peer message `UpdataZoneInfo` is generated and broadcast to all peers in this TiKV server. When a peer receives `UpdataZoneInfo`, it updates its `peer_zone`.
I think the zone information only flows from the TiKV server to peers, and peers never modify it, so it is read-only for peers. If the zone information is updated, the Node can use a `PeerMsg` to update the zone information stored in each `Peer`. `PollContext` is more like a layer where peers interact with the TiKV server; if the zone information is stored in `PollContext` and shared by multiple peers, we need to consider concurrency conflicts.

However, putting the zone information in `PollContext` uses less memory.
I thought about it again: the zone information should be put in `PollContext`. Otherwise, we would have to handle zone-information migration when splitting or merging a region.
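A minimal sketch of the shared-context option settled on above, assuming a simplified `PollContext` (the name matches the discussion, but the shape is illustrative, not TiKV's actual type). The `RwLock` models the concurrency concern raised earlier: the TiKV server writes, peers only read, and because the map lives on the shared context rather than on each peer, region splits and merges need no zone-information migration:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical shared context holding the store_id -> AZ map.
#[derive(Default)]
pub struct PollContext {
    zone_info: RwLock<HashMap<u64, String>>,
}

impl PollContext {
    /// Called by the TiKV server when zone information changes
    /// (the write side; peers never call this).
    pub fn update_zone_info(&self, store_id: u64, az: String) {
        self.zone_info.write().unwrap().insert(store_id, az);
    }

    /// Read-only access for peers: which AZ does a store belong to?
    pub fn az_of(&self, store_id: u64) -> Option<String> {
        self.zone_info.read().unwrap().get(&store_id).cloned()
    }

    /// Two stores are in the same AZ iff both labels are known and equal --
    /// the check an agent-selection routine would use.
    pub fn same_az(&self, a: u64, b: u64) -> bool {
        match (self.az_of(a), self.az_of(b)) {
            (Some(x), Some(y)) => x == y,
            _ => false,
        }
    }
}
```

With this layout the per-peer `peer_zone` field and the `UpdataZoneInfo` broadcast become unnecessary; whether the extra lock traffic on the read path matters in practice would need benchmarking.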
/cc @Fullstop000
A design of follower replication