Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: Optimize executor runtime stats performance #1532

Merged
merged 12 commits into from
Dec 25, 2024
4 changes: 2 additions & 2 deletions .github/workflows/compatibility.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,8 @@ jobs:
- name: Checkout TiDB
uses: actions/checkout@v2
with:
repository: JmPotato/tidb
ref: update_pd_client
repository: crazycs520/tidb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why our own repo here? not offical pingcap/tidb? @cfzjywxk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to pass the CI test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After pingcap/tidb#58420 merged, #1540 will recover ci reference to original tidb.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddontang

The TiDB compatibility tests in client-go can be removed in the future. These tests make modifying client-go very cumbersome after the repository split. In practice, updating the client-go dependency in TiDB already serves as a compatibility test.

ref: opt-stats
path: tidb

- name: Check build
Expand Down
4 changes: 2 additions & 2 deletions integration_tests/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ require (
github.com/pkg/errors v0.9.1
github.com/stretchr/testify v1.10.0
github.com/tidwall/gjson v1.14.1
github.com/tikv/client-go/v2 v2.0.8-0.20241220061251-c5d92baf4928
github.com/tikv/client-go/v2 v2.0.8-0.20241223070848-fd950fcf9fcc
github.com/tikv/pd/client v0.0.0-20241220053006-461b86adc78d
go.uber.org/goleak v1.3.0
)
Expand Down Expand Up @@ -114,7 +114,7 @@ require (

replace (
github.com/go-ldap/ldap/v3 => github.com/YangKeao/ldap/v3 v3.4.5-0.20230421065457-369a3bab1117
github.com/pingcap/tidb => github.com/JmPotato/tidb v1.1.0-beta.0.20241223063748-1032fe27d1d9
github.com/pingcap/tidb => github.com/crazycs520/tidb v1.1.0-beta.0.20241224021142-32ed5a21be20

github.com/tikv/client-go/v2 => ../
)
4 changes: 2 additions & 2 deletions integration_tests/go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -840,8 +840,6 @@ github.com/DataDog/zstd v1.5.5 h1:oWf5W7GtOLgp6bciQYDmhHHjdhYkALu6S/5Ni9ZgSvQ=
github.com/DataDog/zstd v1.5.5/go.mod h1:g4AWEaM3yOg3HYfnJ3YIawPnVdXJh9QME85blwSAmyw=
github.com/HdrHistogram/hdrhistogram-go v1.1.2 h1:5IcZpTvzydCQeHzK4Ef/D5rrSqwxob0t8PQPMybUNFM=
github.com/HdrHistogram/hdrhistogram-go v1.1.2/go.mod h1:yDgFjdqOqDEKOvasDdhWNXYg9BVp4O+o5f6V/ehm6Oo=
github.com/JmPotato/tidb v1.1.0-beta.0.20241223063748-1032fe27d1d9 h1:AjyLRfAHeNSwmYwgT+PgffG2VGxbRIVXgvh4Uqssr8I=
github.com/JmPotato/tidb v1.1.0-beta.0.20241223063748-1032fe27d1d9/go.mod h1:4yjTiTiYTvEYe8lI1p2JwYz6UayXzUw5S4IdtOyTAJc=
github.com/JohnCGriffin/overflow v0.0.0-20211019200055-46fa312c352c/go.mod h1:X0CRv0ky0k6m906ixxpzmDRLvX58TFUKS2eePweuyxk=
github.com/Masterminds/semver v1.5.0 h1:H65muMkzWKEuNDnfl9d70GUjFniHKHRbFPGBuZ3QEww=
github.com/Masterminds/semver v1.5.0/go.mod h1:MB6lktGJrhw8PrUyiEoblNEGEQ+RzHPF078ddwwvV3Y=
Expand Down Expand Up @@ -948,6 +946,8 @@ github.com/coreos/go-semver v0.3.1/go.mod h1:irMmmIw/7yzSRPWryHsK7EYSg09caPQL03V
github.com/coreos/go-systemd/v22 v22.5.0 h1:RrqgGjYQKalulkV8NGVIfkXQf6YYmOyiJKk8iXXhfZs=
github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
github.com/cpuguy83/go-md2man v1.0.10/go.mod h1:SmD6nW6nTyfqj6ABTjUi3V3JVMnlJmwcJI5acqYI6dE=
github.com/crazycs520/tidb v1.1.0-beta.0.20241224021142-32ed5a21be20 h1:Jv0hCL3tFUvta2FpunDbeDdrKogVd+MtUFtHnhOd5UY=
github.com/crazycs520/tidb v1.1.0-beta.0.20241224021142-32ed5a21be20/go.mod h1:UajA3Myc0XcvU5dVBO/f0lUG4x8MgMjYgnYF3th4x10=
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
github.com/cznic/mathutil v0.0.0-20181122101859-297441e03548 h1:iwZdTE0PVqJCos1vaoKsclOGD3ADKpshg3SRtYBbwso=
github.com/cznic/mathutil v0.0.0-20181122101859-297441e03548/go.mod h1:e6NPNENfs9mPDVNRekM7lKScauxd5kXTr1Mfyig6TDM=
Expand Down
7 changes: 7 additions & 0 deletions integration_tests/snapshot_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -347,6 +347,13 @@ func (s *testSnapshotSuite) TestSnapshotRuntimeStats() {
"scan_detail: {total_process_keys: 20, total_process_keys_size: 20, total_keys: 30, get_snapshot_time: 1µs, " +
"rocksdb: {delete_skipped_count: 10, key_skipped_count: 2, block: {cache_hit_count: 20, read_count: 40, read_byte: 30 Bytes}}}"
s.Equal(expect, snapshot.FormatStats())
snapshot.GetResolveLockDetail().ResolveLockTime = int64(time.Second)
expect = "Get:{num_rpc:4, total_time:2s},txnLockFast_backoff:{num:2, total_time:10ms}, " +
"time_detail: {total_process_time: 200ms, total_wait_time: 200ms}, " +
"resolve_lock_time:1s, " +
"scan_detail: {total_process_keys: 20, total_process_keys_size: 20, total_keys: 30, get_snapshot_time: 1µs, " +
"rocksdb: {delete_skipped_count: 10, key_skipped_count: 2, block: {cache_hit_count: 20, read_count: 40, read_byte: 30 Bytes}}}"
s.Equal(expect, snapshot.FormatStats())
}

func (s *testSnapshotSuite) TestRCRead() {
Expand Down
120 changes: 91 additions & 29 deletions internal/locate/region_request.go
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,11 @@ func (s *RegionRequestSender) String() string {

// RegionRequestRuntimeStats records the runtime stats of send region requests.
type RegionRequestRuntimeStats struct {
RPCStats map[tikvrpc.CmdType]*RPCRuntimeStats
// FirstRPCStats is the stats of first kinds of rpc request, since in most cases, only one kind of rpc request is sent at a time,
// this is to avoid allocating map memory.
FirstRPCStats RPCRuntimeStats
// OtherRPCStatsMap uses to record another types of RPC requests.
OtherRPCStatsMap map[tikvrpc.CmdType]*RPCRuntimeStats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion, since one RegionRequestSender typically only send one (or a few) kind(s) of RPC, I guess we can just use a slice here to make the code clean and simple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, done.

RequestErrorStats
}

Expand All @@ -141,30 +145,60 @@ type RequestErrorStats struct {

// NewRegionRequestRuntimeStats returns a new RegionRequestRuntimeStats.
func NewRegionRequestRuntimeStats() *RegionRequestRuntimeStats {
return &RegionRequestRuntimeStats{
RPCStats: make(map[tikvrpc.CmdType]*RPCRuntimeStats),
}
return &RegionRequestRuntimeStats{}
}

// RPCRuntimeStats indicates the RPC request count and consume time.
type RPCRuntimeStats struct {
Count int64
Cmd tikvrpc.CmdType
Count uint32
// Send region request consume time.
Consume int64
Consume time.Duration
}

// RecordRPCRuntimeStats uses to record the rpc count and duration stats.
func (r *RegionRequestRuntimeStats) RecordRPCRuntimeStats(cmd tikvrpc.CmdType, d time.Duration) {
stat, ok := r.RPCStats[cmd]
if r.FirstRPCStats.Count == 0 || r.FirstRPCStats.Cmd == cmd {
r.FirstRPCStats.Cmd = cmd
r.FirstRPCStats.Count++
r.FirstRPCStats.Consume += d
return
}
if r.OtherRPCStatsMap == nil {
r.OtherRPCStatsMap = make(map[tikvrpc.CmdType]*RPCRuntimeStats)
}
stat, ok := r.OtherRPCStatsMap[cmd]
if !ok {
r.RPCStats[cmd] = &RPCRuntimeStats{
r.OtherRPCStatsMap[cmd] = &RPCRuntimeStats{
Cmd: cmd,
Count: 1,
Consume: int64(d),
Consume: d,
}
return
}
stat.Count++
stat.Consume += int64(d)
stat.Consume += d
}

// GetRPCCount returns the total rpc types count.
func (r *RegionRequestRuntimeStats) GetRPCCount() int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to use GetRPCStatsCount (or sth else)? This name make me think it returns the total number of RPCs of all cmd types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, done.

if r.FirstRPCStats.Count > 0 {
return len(r.OtherRPCStatsMap) + 1
}
return len(r.OtherRPCStatsMap)
}

// GetCmdRPCCount returns the rpc count of the specified cmd type.
func (r *RegionRequestRuntimeStats) GetCmdRPCCount(cmd tikvrpc.CmdType) uint32 {
if r.FirstRPCStats.Cmd == cmd {
return r.FirstRPCStats.Count
}
if r.OtherRPCStatsMap != nil {
if stats := r.OtherRPCStatsMap[cmd]; stats != nil {
return stats.Count
}
}
return 0
}

// RecordRPCErrorStats uses to record the request error(region error label and rpc error) info and count.
Expand Down Expand Up @@ -198,16 +232,14 @@ func (r *RegionRequestRuntimeStats) String() string {
return ""
}
var builder strings.Builder
for k, v := range r.RPCStats {
if r.FirstRPCStats.Count > 0 {
r.FirstRPCStats.buildString(&builder)
}
for _, v := range r.OtherRPCStatsMap {
if builder.Len() > 0 {
builder.WriteByte(',')
}
builder.WriteString(k.String())
builder.WriteString(":{num_rpc:")
builder.WriteString(strconv.FormatInt(v.Count, 10))
builder.WriteString(", total_time:")
builder.WriteString(util.FormatDuration(time.Duration(v.Consume)))
builder.WriteString("}")
v.buildString(&builder)
}
if errStatsStr := r.RequestErrorStats.String(); errStatsStr != "" {
builder.WriteString(", rpc_errors:")
Expand All @@ -216,6 +248,15 @@ func (r *RegionRequestRuntimeStats) String() string {
return builder.String()
}

func (s *RPCRuntimeStats) buildString(builder *strings.Builder) {
builder.WriteString(s.Cmd.String())
builder.WriteString(":{num_rpc:")
builder.WriteString(strconv.FormatUint(uint64(s.Count), 10))
builder.WriteString(", total_time:")
builder.WriteString(util.FormatDuration(s.Consume))
builder.WriteString("}")
}

// String implements fmt.Stringer interface.
func (r *RequestErrorStats) String() string {
if len(r.ErrStats) == 0 {
Expand All @@ -242,7 +283,11 @@ func (r *RequestErrorStats) String() string {
// Clone returns a copy of itself.
func (r *RegionRequestRuntimeStats) Clone() *RegionRequestRuntimeStats {
newRs := NewRegionRequestRuntimeStats()
maps.Copy(newRs.RPCStats, r.RPCStats)
newRs.FirstRPCStats = r.FirstRPCStats
if r.OtherRPCStatsMap != nil {
newRs.OtherRPCStatsMap = make(map[tikvrpc.CmdType]*RPCRuntimeStats)
maps.Copy(newRs.OtherRPCStatsMap, r.OtherRPCStatsMap)
}
if len(r.ErrStats) > 0 {
newRs.ErrStats = make(map[string]int)
maps.Copy(newRs.ErrStats, r.ErrStats)
Expand All @@ -256,17 +301,11 @@ func (r *RegionRequestRuntimeStats) Merge(rs *RegionRequestRuntimeStats) {
if rs == nil {
return
}
for cmd, v := range rs.RPCStats {
stat, ok := r.RPCStats[cmd]
if !ok {
r.RPCStats[cmd] = &RPCRuntimeStats{
Count: v.Count,
Consume: v.Consume,
}
continue
}
stat.Count += v.Count
stat.Consume += v.Consume
if rs.FirstRPCStats.Count > 0 {
r.mergeRPCRuntimeStats(&rs.FirstRPCStats)
}
for _, v := range rs.OtherRPCStatsMap {
r.mergeRPCRuntimeStats(v)
}
if len(rs.ErrStats) > 0 {
if r.ErrStats == nil {
Expand All @@ -279,6 +318,29 @@ func (r *RegionRequestRuntimeStats) Merge(rs *RegionRequestRuntimeStats) {
}
}

func (r *RegionRequestRuntimeStats) mergeRPCRuntimeStats(rs *RPCRuntimeStats) {
if r.FirstRPCStats.Count == 0 || r.FirstRPCStats.Cmd == rs.Cmd {
r.FirstRPCStats.Cmd = rs.Cmd
r.FirstRPCStats.Count += rs.Count
r.FirstRPCStats.Consume += rs.Consume
return
}
if r.OtherRPCStatsMap == nil {
r.OtherRPCStatsMap = make(map[tikvrpc.CmdType]*RPCRuntimeStats)
}
stat, ok := r.OtherRPCStatsMap[rs.Cmd]
if !ok {
r.OtherRPCStatsMap[rs.Cmd] = &RPCRuntimeStats{
Cmd: rs.Cmd,
Count: rs.Count,
Consume: rs.Consume,
}
return
}
stat.Count += rs.Count
stat.Consume += rs.Consume
}

// ReplicaAccessStats records the replica access info.
type ReplicaAccessStats struct {
// AccessInfos records the access info
Expand Down
20 changes: 10 additions & 10 deletions internal/locate/region_request3_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1168,10 +1168,10 @@ func (s *testRegionRequestToThreeStoresSuite) TestSendReqFirstTimeout() {
regionErr, err := resp.GetRegionError()
s.Nil(err)
s.True(IsFakeRegionError(regionErr))
s.Equal(1, len(s.regionRequestSender.Stats.RPCStats))
s.Equal(int64(3), s.regionRequestSender.Stats.RPCStats[tikvrpc.CmdGet].Count) // 3 rpc
s.Equal(3, len(reqTargetAddrs)) // each rpc to a different store.
s.Equal(0, bo.GetTotalBackoffTimes()) // no backoff since fast retry.
s.Equal(1, s.regionRequestSender.Stats.GetRPCCount())
s.Equal(uint32(3), s.regionRequestSender.Stats.GetCmdRPCCount(tikvrpc.CmdGet)) // 3 rpc
s.Equal(3, len(reqTargetAddrs)) // each rpc to a different store.
s.Equal(0, bo.GetTotalBackoffTimes()) // no backoff since fast retry.
// warn: must rest MaxExecutionDurationMs before retry.
resetStats()
if staleRead {
Expand All @@ -1187,9 +1187,9 @@ func (s *testRegionRequestToThreeStoresSuite) TestSendReqFirstTimeout() {
s.Nil(err)
s.Nil(regionErr)
s.Equal([]byte("value"), resp.Resp.(*kvrpcpb.GetResponse).Value)
s.Equal(1, len(s.regionRequestSender.Stats.RPCStats))
s.Equal(int64(1), s.regionRequestSender.Stats.RPCStats[tikvrpc.CmdGet].Count) // 1 rpc
s.Equal(0, bo.GetTotalBackoffTimes()) // no backoff since fast retry.
s.Equal(1, s.regionRequestSender.Stats.GetRPCCount())
s.Equal(uint32(1), s.regionRequestSender.Stats.GetCmdRPCCount(tikvrpc.CmdGet)) // 1 rpc
s.Equal(0, bo.GetTotalBackoffTimes()) // no backoff since fast retry.
}
}

Expand Down Expand Up @@ -1398,9 +1398,9 @@ func (s *testRegionRequestToThreeStoresSuite) TestStaleReadTryFollowerAfterTimeo
s.Nil(err)
s.Nil(regionErr)
s.Equal([]byte("value"), resp.Resp.(*kvrpcpb.GetResponse).Value)
s.Equal(1, len(s.regionRequestSender.Stats.RPCStats))
s.Equal(int64(2), s.regionRequestSender.Stats.RPCStats[tikvrpc.CmdGet].Count) // 2 rpc
s.Equal(0, bo.GetTotalBackoffTimes()) // no backoff since fast retry.
s.Equal(1, s.regionRequestSender.Stats.GetRPCCount())
s.Equal(uint32(2), s.regionRequestSender.Stats.GetCmdRPCCount(tikvrpc.CmdGet)) // 2 rpc
s.Equal(0, bo.GetTotalBackoffTimes()) // no backoff since fast retry.
}

func (s *testRegionRequestToThreeStoresSuite) TestDoNotTryUnreachableLeader() {
Expand Down
Loading
Loading