RFC: Follower replication #98
Signed-off-by: LintianShi <[email protected]>
2c6dddd to c10113d
text/0098-follower-replication.md
Outdated
### Preparation

Every peer must have a field indicating which AZ it belongs to. This field does not have to be persistent. As mentioned in the comment, TiKV knows which AZ each store is located in. We need to design the interface so that `RawNode` in raft-rs can get the AZ information in TiKV. Maybe we can initialize Peer with AZ information, and then Peer initializes the `RawNode`.
We need to design the interface so that `RawNode` in raft-rs can get the AZ information in TiKV. Maybe we can initialize Peer with AZ information, and then Peer initializes the `RawNode`.
This should be settled in the design.
I will take it into consideration.
text/0098-follower-replication.md
Outdated
@@ -0,0 +1,248 @@
# Follower Replication for Write-Flow

<!-- - RFC PR: https://github.com/tikv/rfcs/pull/0000
Should be updated.
We design a group broadcast scheme, as proposed in this [rfc](https://github.com/tikv/rfcs/blob/362865abe36b5771d525310b449c07a57da7ef85/text/2019-11-13-follower-replication.md).

The main idea is that the leader only sends `MsgBroadcast` to the agent of a certain AZ. `MsgBroadcast` contains log entries that the leader replicates to the agent, and several ranges of log that the leader replicates to other followers and learners.
Once the agent receives `MsgBroadcast`, it appends log entries in the message and assembles `MsgAppend` according to the ranges in `MsgBroadcast` with its own log. Then the agent sends `MsgAppend` to other followers/learners in the same AZ. Thus, the leader can avoid sending `MsgAppend` across AZs.
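
For discussion, the agent-side flow described above might be sketched as follows. This is a minimal sketch, not the RFC's implementation: `Entry`, `ForwardRange`, `MsgBroadcast`, `MsgAppend`, and `Agent` here are hypothetical, simplified types (not the raft-rs ones), log indices are assumed to start at 1 with no compaction, and term checks, conflict handling, and commit-index propagation are left out.

```rust
/// Hypothetical, simplified log entry; the real raft-rs entry carries more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub index: u64,
    pub term: u64,
}

/// Hypothetical: an inclusive range of log the agent should forward to one peer.
pub struct ForwardRange {
    pub to: u64,
    pub first_index: u64,
    pub last_index: u64,
}

/// Hypothetical shape of `MsgBroadcast`: entries for the agent itself plus
/// the ranges it must forward to other followers/learners in its AZ.
pub struct MsgBroadcast {
    pub entries: Vec<Entry>,
    pub forwards: Vec<ForwardRange>,
}

/// Hypothetical shape of the `MsgAppend` the agent assembles.
#[derive(Debug)]
pub struct MsgAppend {
    pub to: u64,
    pub entries: Vec<Entry>,
}

/// Agent state: an in-memory log where `log[i].index == i + 1` (assumption).
#[derive(Default)]
pub struct Agent {
    log: Vec<Entry>,
}

impl Agent {
    pub fn new() -> Self {
        Agent::default()
    }

    fn last_index(&self) -> u64 {
        self.log.len() as u64
    }

    /// Copy entries in `[first, last]` from the agent's own log,
    /// clamped to what the agent actually has.
    fn slice(&self, first: u64, last: u64) -> Vec<Entry> {
        let last = last.min(self.last_index());
        if first == 0 || first > last {
            return Vec::new();
        }
        self.log[(first - 1) as usize..last as usize].to_vec()
    }

    /// Append the entries carried by the broadcast, then build one
    /// `MsgAppend` per requested range from the agent's own log.
    pub fn handle_broadcast(&mut self, msg: MsgBroadcast) -> Vec<MsgAppend> {
        for entry in msg.entries {
            // Simplification: only accept entries that extend the log contiguously.
            if entry.index == self.last_index() + 1 {
                self.log.push(entry);
            }
        }
        msg.forwards
            .iter()
            .map(|r| MsgAppend {
                to: r.to,
                entries: self.slice(r.first_index, r.last_index),
            })
            .collect()
    }
}
```

In this sketch the leader ships each entry across the AZ boundary once (to the agent), and the per-follower payloads are rebuilt locally from the agent's log.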
Paragraphs should be separated by one blank line.
This may conflict with log async fetch.
text/0098-follower-replication.md
Outdated
In TiKV architecture, TiKV contains multiple peers that belong to different raft groups. So TiKV needs to update zone information to each peer.

We add an extra field `peer_zone` in the peer, a hashmap that records store_id -> AZ. Every time the zone information stored in TiKV is updated, a peer message `UpdataZoneInfo` is generated. Then it will be broadcast to all peers in this TiKV server. When a peer receives `UpdataZoneInfo`, it will update its `peer_zone`.
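
To make the proposal concrete, here is a minimal sketch of the per-peer approach. Everything except the names `peer_zone` and `UpdataZoneInfo` is an assumption for illustration: the `PeerMsg` enum, the `Peer` struct, `same_zone`, and `broadcast_zone_update` below are hypothetical stand-ins, not TiKV's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical peer message; only the variant named in the RFC is modeled.
#[derive(Clone, Debug)]
pub enum PeerMsg {
    /// Carries the full store_id -> AZ mapping (assumed payload).
    UpdataZoneInfo(HashMap<u64, String>),
}

/// Hypothetical peer holding its own copy of the zone mapping.
#[derive(Default)]
pub struct Peer {
    /// store_id -> AZ, as described in the RFC text.
    pub peer_zone: HashMap<u64, String>,
}

impl Peer {
    /// Replace the local mapping when an update arrives.
    pub fn on_msg(&mut self, msg: PeerMsg) {
        match msg {
            PeerMsg::UpdataZoneInfo(zones) => self.peer_zone = zones,
        }
    }

    /// True only when both stores are known and share an AZ.
    pub fn same_zone(&self, a: u64, b: u64) -> bool {
        match (self.peer_zone.get(&a), self.peer_zone.get(&b)) {
            (Some(x), Some(y)) => x == y,
            _ => false,
        }
    }
}

/// Server side: send the new mapping to every peer on this TiKV server.
pub fn broadcast_zone_update(peers: &mut [Peer], zones: &HashMap<u64, String>) {
    for peer in peers.iter_mut() {
        // Each peer gets its own clone, which is the memory cost of this design.
        peer.on_msg(PeerMsg::UpdataZoneInfo(zones.clone()));
    }
}
```

Note that each peer keeps a full copy of the mapping, so memory grows with the number of peers on the store.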
Why not put it in `PollContext`?
I think the zone information only flows from the TiKV server to peers, and peers never modify it, so it is read-only for peers. If the zone information is updated, Node can use a `PeerMsg` to update the zone information stored in each `Peer`. `PollContext` is more like a layer where peers interact with the TiKV server. If the zone information is stored in `PollContext` and shared by multiple peers, we need to consider concurrency conflicts.
However, putting zone information in `PollContext` uses less memory.
I thought about it again. I think zone information should be put in `PollContext`. Otherwise, we have to handle zone information migration when a region splits or merges.
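
A rough sketch of that alternative, under stated assumptions: `ZoneCtx` is a hypothetical stand-in for whatever part of `PollContext` would hold the mapping, and `Peer`, `zone`, and `split` are illustrative only, not TiKV's API. The point is that a peer created by a split carries no zone state of its own.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the zone-related part of `PollContext`.
pub struct ZoneCtx {
    /// store_id -> AZ, kept once per context instead of once per peer.
    pub store_zone: HashMap<u64, String>,
}

/// Hypothetical peer that stores no zone information itself.
pub struct Peer {
    pub region_id: u64,
    pub store_id: u64,
}

impl Peer {
    /// Look up this peer's AZ through the context passed in by the caller.
    pub fn zone<'a>(&self, ctx: &'a ZoneCtx) -> Option<&'a str> {
        ctx.store_zone.get(&self.store_id).map(String::as_str)
    }

    /// A split produces a new peer on the same store; nothing zone-related
    /// has to be copied or migrated.
    pub fn split(&self, new_region_id: u64) -> Peer {
        Peer {
            region_id: new_region_id,
            store_id: self.store_id,
        }
    }
}
```

An update to the mapping in the context is visible to every peer on the next lookup, so no per-peer update message is needed in this variant.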
/cc @Fullstop000
A design of follower replication