
raftstore: reduce message flush #5475

Merged

merged 5 commits into tikv:master on Oct 10, 2019

Conversation

BusyJay
Member

@BusyJay BusyJay commented Sep 17, 2019

What have you changed?

Technically, a thread only needs to flush messages when it's going to
block. This PR adds hooks to every block point instead of flushing at
the end of every ready.

What is the type of the changes?

  • Improvement

How is the PR tested?

integration tests

Does this PR affect documentation (docs) or should it be mentioned in the release notes?

No.

Does this PR affect tidb-ansible?

No.

Benchmark result if necessary (optional)

Benchmark shows that when there are about 25k idle regions, raftstore CPU is reduced from 55% to 40%, and gRPC CPU is reduced from 60% to 40%.

Technically, a thread only needs to flush messages when it's going to
sleep or block. This PR adds hooks to every block point instead of
flushing at the end of every ready.

Signed-off-by: Jay Lee <[email protected]>
@BusyJay BusyJay added sig/raft Component: Raft, RaftStore, etc. component/performance Component: Performance labels Sep 17, 2019
@BusyJay BusyJay added needs-cherry-pick-release-3.0 Type: Need cherry pick to release 3.0 needs-cherry-pick-release-3.1 Type: Need cherry pick to release 3.1 labels Sep 17, 2019
@BusyJay
Member Author

BusyJay commented Sep 17, 2019

/bench

@siddontang
Contributor

amazing @BusyJay

Benchmark shows that when there are about 25k idle regions, raftstore CPU is reduced from 55% to 40%, and gRPC CPU is reduced from 60% to 40%.

Is it tested without Hibernate region?

@BusyJay
Member Author

BusyJay commented Sep 17, 2019

Yes.

@BusyJay
Member Author

BusyJay commented Sep 17, 2019

/bench

@overvenus
Member

This PR adds hooks to every block point instead of flushing at the end of every ready.

Sounds like a large batch may increase commit duration? 🤔

@overvenus overvenus requested a review from NingLin-P September 17, 2019 12:51
@siddontang
Contributor

Maybe we can use the commit duration metric to verify this.

@@ -252,6 +252,9 @@ pub trait PollHandler<N, C> {

     /// This function is called at the end of every round.
     fn end(&mut self, batch: &mut [Box<N>]);
+
+    /// This function is called when batch system is going to sleep.
+    fn pause(&mut self) {}
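
For context, a minimal sketch of how such a hook could be driven, assuming a simplified channel-based poll loop rather than the actual TiKV batch system; only the `end` and `pause` methods from the trait above are used, everything else is illustrative.

use std::sync::mpsc::{Receiver, TryRecvError};

// Simplified poll loop: `pause()` is only called when the thread is about to
// block on its channel, so the handler can defer flushing network messages to
// that point instead of flushing after every round.
fn poll_loop<N, C, H: PollHandler<N, C>>(receiver: &Receiver<Box<N>>, handler: &mut H) {
    let mut batch: Vec<Box<N>> = Vec::new();
    loop {
        // ... drive the FSMs collected in `batch` here ...
        handler.end(&mut batch);
        batch.clear();

        match receiver.try_recv() {
            // More work is already queued: keep polling without flushing.
            Ok(fsm) => batch.push(fsm),
            // Nothing left to do: flush buffered messages, then block.
            Err(TryRecvError::Empty) => {
                handler.pause();
                match receiver.recv() {
                    Ok(fsm) => batch.push(fsm),
                    Err(_) => return, // all senders are gone, shut down
                }
            }
            Err(TryRecvError::Disconnected) => return,
        }
    }
}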
Member

I prefer naming it yield.

Member Author

Why? PollHandler can't do the yield job, and in my opinion, yield usually means a short switch.

@siddontang
Contributor

how about if you enable hibernate region?

@sre-bot
Contributor

sre-bot commented Sep 17, 2019

@@                               Benchmark Diff                               @@
================================================================================
tidb: 5f6b22cf90b19a5f6659a03aa34dcb49e4758a8e
--- tikv: f718999e17b7fe6fa6438a5fd1e5efbcef5490c9
+++ tikv: eecd6c398c515b347207c9e3b4a669ad1895c852
pd: cfa5706aec6466da3d2a2c4d0c6e713779fde67e
================================================================================
test-1: < oltp_read_write >
    * QPS : 36755.81 ± 0.2145% (std=61.54) delta: -0.19%
    * AvgMs : 139.84 ± 0.2074% (std=0.23) delta: 0.19%
    * PercentileMs99 : 262.64 ± 0.0000% (std=0.00) delta: 0.72%
            
test-2: < oltp_point_select >
    * QPS : 82898.92 ± 0.5791% (std=285.27) delta: -0.21%
    * AvgMs : 3.09 ± 0.5829% (std=0.01) delta: 0.32%
    * PercentileMs99 : 5.92 ± 1.1141% (std=0.05) delta: 0.41%
            
test-3: < oltp_insert >
    * QPS : 21244.06 ± 0.3751% (std=50.63) delta: 0.09%
    * AvgMs : 12.05 ± 0.3819% (std=0.03) delta: -0.07%
    * PercentileMs99 : 43.55 ± 2.1495% (std=0.58) delta: -0.36%
            
test-4: < oltp_update_index >
    * QPS : 16904.13 ± 0.1551% (std=18.65) delta: 0.09%
    * AvgMs : 15.12 ± 0.3804% (std=0.04) delta: -0.25%
    * PercentileMs99 : 48.69 ± 1.0721% (std=0.43) delta: 0.00%
            
test-5: < oltp_update_non_index >
    * QPS : 29063.10 ± 0.0541% (std=11.12) delta: 0.08%
    * AvgMs : 8.81 ± 0.0757% (std=0.00) delta: 0.00%
    * PercentileMs99 : 30.81 ± 0.0000% (std=0.00) delta: -1.79%
            

https://perf.pingcap.com

@BusyJay
Member Author

BusyJay commented Sep 17, 2019

This PR adds hooks to every block point instead of flushing at the end of every ready.

Sounds like a large batch may increase commit duration? 🤔

No, the raft client will flush internally every time it receives 8 messages.
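
For illustration, a rough sketch of that internal batching, with hypothetical names (`BufferedSender`, `MSG_BATCH_LIMIT`) standing in for TiKV's actual raft client, which hands batches to gRPC:

// Hypothetical sketch: outgoing messages are buffered and flushed
// automatically once the buffer holds 8 of them, so deferring the explicit
// flush to block points cannot leave a large batch sitting unsent for long.
const MSG_BATCH_LIMIT: usize = 8;

struct BufferedSender<M> {
    buf: Vec<M>,
}

impl<M> BufferedSender<M> {
    fn send(&mut self, msg: M) {
        self.buf.push(msg);
        if self.buf.len() >= MSG_BATCH_LIMIT {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if self.buf.is_empty() {
            return;
        }
        // In TiKV this would push the batch onto the gRPC stream; here the
        // buffer is simply cleared to show the control flow.
        self.buf.clear();
    }
}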

@siddontang
Contributor

it receives 8 messages

why 8 messages? can we configure it?

@siddontang
Contributor

Seems the performance has not increased, interesting.

@zhangjinpeng87
Member

zhangjinpeng87 commented Sep 20, 2019

Seems the performance has not increased, interesting.

Maybe the dataset is not large enough.

@BusyJay
Member Author

BusyJay commented Sep 20, 2019

why 8 messages? can we configure it?

It's hard-coded by batch raft. A further plan is to refactor the whole batch mechanism.

@BusyJay
Member Author

BusyJay commented Sep 26, 2019

PTAL

-        if self.poll_ctx.need_flush_trans {
+        if self.poll_ctx.need_flush_trans
+            && (!self.poll_ctx.kv_wb.is_empty() || !self.poll_ctx.raft_wb.is_empty())
+        {
             self.poll_ctx.trans.flush();
Contributor

Seems messages are flushed before the disk write. If the messages contain vote messages, could the peer vote twice in a term if the machine fails after here and before the disk write?

Member Author

If there are pending messages, it means they are from the leader, so it's OK to flush them before writing to disk. Also related to tikv/raft-rs#292.
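
As a side note on where such a flush can happen, a hypothetical sketch (not the actual RaftPoller code) of a pause implementation that only flushes the transport when earlier rounds actually queued outgoing messages:

// Hypothetical poller fragment: `need_flush_trans` is set whenever a round
// buffers messages into the transport, and `pause()` flushes them right
// before the thread blocks.
trait Transport {
    fn flush(&mut self);
}

struct Poller<T: Transport> {
    need_flush_trans: bool,
    trans: T,
}

impl<T: Transport> Poller<T> {
    fn pause(&mut self) {
        if self.need_flush_trans {
            self.trans.flush();
            self.need_flush_trans = false;
        }
    }
}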

Signed-off-by: Jay Lee <[email protected]>
@hicqu
Contributor

hicqu commented Oct 8, 2019

LGTM.

@BusyJay BusyJay added status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. labels Oct 10, 2019
@sre-bot
Contributor

sre-bot commented Oct 10, 2019

/run-all-tests

@gengliqi gengliqi merged commit 5d6686b into tikv:master Oct 10, 2019
@BusyJay BusyJay deleted the reduce-flush-master branch October 10, 2019 05:41
@sre-bot
Contributor

sre-bot commented Oct 10, 2019

cherry pick to release-3.0 failed

@sre-bot
Contributor

sre-bot commented Oct 10, 2019

cherry pick to release-3.1 failed

BusyJay added a commit to BusyJay/tikv that referenced this pull request Oct 10, 2019
* raftstore: reduce message flush

Signed-off-by: Jay Lee <[email protected]>
sre-bot pushed a commit that referenced this pull request Oct 10, 2019
sticnarf pushed a commit to sticnarf/tikv that referenced this pull request Oct 27, 2019
* raftstore: reduce message flush

Signed-off-by: Jay Lee <[email protected]>