
raftstore: reduce message flush #5475

Merged

merged 5 commits into tikv:master on Oct 10, 2019

Conversation

BusyJay
Member

@BusyJay BusyJay commented Sep 17, 2019

What have you changed?

Technically, a thread only needs to flush messages when it's going to
block. This PR adds hooks to every block point instead of flushing at
the end of every ready.

What is the type of the changes?

  • Improvement

How is the PR tested?

integration tests

Does this PR affect documentation (docs) or should it be mentioned in the release notes?

No.

Does this PR affect tidb-ansible?

No.

Benchmark result if necessary (optional)

Benchmark shows that when there are about 25k idle regions, raftstore CPU is reduced from 55% to 40%, and gRPC CPU is reduced from 60% to 40%.

Technically, a thread only needs to flush messages when it's going to
sleep or block. This PR adds hooks to every block point instead of
flushing at the end of every ready.

Signed-off-by: Jay Lee <[email protected]>
@BusyJay BusyJay added sig/raft Component: Raft, RaftStore, etc. component/performance Component: Performance labels Sep 17, 2019
@BusyJay BusyJay added needs-cherry-pick-release-3.0 Type: Need cherry pick to release 3.0 needs-cherry-pick-release-3.1 Type: Need cherry pick to release 3.1 labels Sep 17, 2019
@BusyJay
Member Author

BusyJay commented Sep 17, 2019

/bench

@siddontang
Contributor

amazing @BusyJay

Benchmark shows that when there are about 25k idle regions, raftstore CPU is reduced from 55% to 40%, and gRPC CPU is reduced from 60% to 40%.

Is it tested without Hibernate region?

@BusyJay
Member Author

BusyJay commented Sep 17, 2019

Yes.

@BusyJay
Member Author

BusyJay commented Sep 17, 2019

/bench

@overvenus
Member

This PR adds hooks to every block point instead of flushing at the end of every ready.

Sounds like a large batch may increase commit duration? 🤔

@overvenus overvenus requested a review from NingLin-P September 17, 2019 12:51
@siddontang
Contributor

Maybe we can use the commit duration metric to verify this.

@@ -252,6 +252,9 @@ pub trait PollHandler<N, C> {

     /// This function is called at the end of every round.
     fn end(&mut self, batch: &mut [Box<N>]);
+
+    /// This function is called when batch system is going to sleep.
+    fn pause(&mut self) {}
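
For context, a minimal sketch of how such a hook could be driven, assuming a simplified channel-based poll loop rather than the actual TiKV batch system; only the `end` and `pause` methods from the trait above are used, everything else is illustrative.

use std::sync::mpsc::{Receiver, TryRecvError};

// Simplified poll loop: `pause()` is only called when the thread is about to
// block on its channel, so the handler can defer flushing network messages to
// that point instead of flushing after every round.
fn poll_loop<N, C, H: PollHandler<N, C>>(receiver: &Receiver<Box<N>>, handler: &mut H) {
    let mut batch: Vec<Box<N>> = Vec::new();
    loop {
        // ... drive the FSMs collected in `batch` here ...
        handler.end(&mut batch);
        batch.clear();

        match receiver.try_recv() {
            // More work is already queued: keep polling without flushing.
            Ok(fsm) => batch.push(fsm),
            // Nothing left to do: flush buffered messages, then block.
            Err(TryRecvError::Empty) => {
                handler.pause();
                match receiver.recv() {
                    Ok(fsm) => batch.push(fsm),
                    Err(_) => return, // all senders are gone, shut down
                }
            }
            Err(TryRecvError::Disconnected) => return,
        }
    }
}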
Member

I prefer naming it yield.

Member Author

Why? PollHandler can't do the yield job, and in my opinion, yield usually means a short switch.

@siddontang
Contributor

how about if you enable hibernate region?

@sre-bot
Contributor

sre-bot commented Sep 17, 2019

@@                               Benchmark Diff                               @@
================================================================================
tidb: 5f6b22cf90b19a5f6659a03aa34dcb49e4758a8e
--- tikv: f718999e17b7fe6fa6438a5fd1e5efbcef5490c9
+++ tikv: eecd6c398c515b347207c9e3b4a669ad1895c852
pd: cfa5706aec6466da3d2a2c4d0c6e713779fde67e
================================================================================
test-1: < oltp_read_write >
    * QPS : 36755.81 ± 0.2145% (std=61.54) delta: -0.19%
    * AvgMs : 139.84 ± 0.2074% (std=0.23) delta: 0.19%
    * PercentileMs99 : 262.64 ± 0.0000% (std=0.00) delta: 0.72%
            
test-2: < oltp_point_select >
    * QPS : 82898.92 ± 0.5791% (std=285.27) delta: -0.21%
    * AvgMs : 3.09 ± 0.5829% (std=0.01) delta: 0.32%
    * PercentileMs99 : 5.92 ± 1.1141% (std=0.05) delta: 0.41%
            
test-3: < oltp_insert >
    * QPS : 21244.06 ± 0.3751% (std=50.63) delta: 0.09%
    * AvgMs : 12.05 ± 0.3819% (std=0.03) delta: -0.07%
    * PercentileMs99 : 43.55 ± 2.1495% (std=0.58) delta: -0.36%
            
test-4: < oltp_update_index >
    * QPS : 16904.13 ± 0.1551% (std=18.65) delta: 0.09%
    * AvgMs : 15.12 ± 0.3804% (std=0.04) delta: -0.25%
    * PercentileMs99 : 48.69 ± 1.0721% (std=0.43) delta: 0.00%
            
test-5: < oltp_update_non_index >
    * QPS : 29063.10 ± 0.0541% (std=11.12) delta: 0.08%
    * AvgMs : 8.81 ± 0.0757% (std=0.00) delta: 0.00%
    * PercentileMs99 : 30.81 ± 0.0000% (std=0.00) delta: -1.79%
            

https://perf.pingcap.com

@BusyJay
Member Author

BusyJay commented Sep 17, 2019

This PR adds hooks to every block point instead of flushing at the end of every ready.

Sounds like a large batch may increase commit duration? 🤔

No, the raft client will flush internally every time it receives 8 messages.
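
For illustration, a rough sketch of that internal batching, with hypothetical names (`BufferedSender`, `MSG_BATCH_LIMIT`) standing in for TiKV's actual raft client, which hands batches to gRPC:

// Hypothetical sketch: outgoing messages are buffered and flushed
// automatically once the buffer holds 8 of them, so deferring the explicit
// flush to block points cannot leave a large batch sitting unsent for long.
const MSG_BATCH_LIMIT: usize = 8;

struct BufferedSender<M> {
    buf: Vec<M>,
}

impl<M> BufferedSender<M> {
    fn send(&mut self, msg: M) {
        self.buf.push(msg);
        if self.buf.len() >= MSG_BATCH_LIMIT {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if self.buf.is_empty() {
            return;
        }
        // In TiKV this would push the batch onto the gRPC stream; here the
        // buffer is simply cleared to show the control flow.
        self.buf.clear();
    }
}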

@siddontang
Contributor

it receives 8 messages

why 8 messages? can we configure it?

@siddontang
Contributor

Seems the performance has not increased, interesting.

@zhangjinpeng87
Member

zhangjinpeng87 commented Sep 20, 2019

Seems the performance has not increased, interesting.

Maybe the dataset is not large enough.

@BusyJay
Member Author

BusyJay commented Sep 20, 2019

why 8 messages? can we configure it?

It's hard-coded by batch raft. A further plan is to refactor the whole batch mechanism.

@BusyJay
Member Author

BusyJay commented Sep 26, 2019

PTAL

-        if self.poll_ctx.need_flush_trans {
+        if self.poll_ctx.need_flush_trans
+            && (!self.poll_ctx.kv_wb.is_empty() || !self.poll_ctx.raft_wb.is_empty())
+        {
             self.poll_ctx.trans.flush();
Contributor

Seems messages are flushed before the disk write. If the messages contain vote messages, could the peer vote twice in a term if the machine fails after here and before the disk write?

Member Author

If there are pending messages, it means they are from the leader, so it's OK to flush them before writing to disk. Also related to tikv/raft-rs#292.
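
As a side note on where such a flush can happen, a hypothetical sketch (not the actual RaftPoller code) of a pause implementation that only flushes the transport when earlier rounds actually queued outgoing messages:

// Hypothetical poller fragment: `need_flush_trans` is set whenever a round
// buffers messages into the transport, and `pause()` flushes them right
// before the thread blocks.
trait Transport {
    fn flush(&mut self);
}

struct Poller<T: Transport> {
    need_flush_trans: bool,
    trans: T,
}

impl<T: Transport> Poller<T> {
    fn pause(&mut self) {
        if self.need_flush_trans {
            self.trans.flush();
            self.need_flush_trans = false;
        }
    }
}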

Signed-off-by: Jay Lee <[email protected]>
@hicqu
Contributor

hicqu commented Oct 8, 2019

LGTM.

@BusyJay BusyJay added status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. labels Oct 10, 2019
@sre-bot
Contributor

sre-bot commented Oct 10, 2019

/run-all-tests

@gengliqi gengliqi merged commit 5d6686b into tikv:master Oct 10, 2019
@BusyJay BusyJay deleted the reduce-flush-master branch October 10, 2019 05:41
@sre-bot
Contributor

sre-bot commented Oct 10, 2019

cherry pick to release-3.0 failed

@sre-bot
Contributor

sre-bot commented Oct 10, 2019

cherry pick to release-3.1 failed

BusyJay added a commit to BusyJay/tikv that referenced this pull request Oct 10, 2019
* raftstore: reduce message flush

Signed-off-by: Jay Lee <[email protected]>
sre-bot pushed a commit that referenced this pull request Oct 10, 2019
sticnarf pushed a commit to sticnarf/tikv that referenced this pull request Oct 27, 2019
* raftstore: reduce message flush

Signed-off-by: Jay Lee <[email protected]>