Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate circuitbreaker for region calls #1543

Merged
merged 8 commits into from
Jan 15, 2025

Conversation

Tema
Copy link
Contributor

@Tema Tema commented Jan 7, 2025

What problem does this PR solve?

Issue Number: ref tikv/pd#8678

What is changed and how does it work?

Add a global circuit breaker to the context of each call to get region data from PD.
The default circuit breaker is in the disabled state and has no effect.
This PR also exposes a method to tweak circuit breaker settings which can be used from TiDB layer to enable and configure it.

@ti-chi-bot ti-chi-bot bot added the dco-signoff: yes Indicates the PR's author has signed the dco. label Jan 7, 2025
@Tema
Copy link
Contributor Author

Tema commented Jan 7, 2025

PTAL @rleungx
TiDB PR and tests are in progress

@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 7, 2025
@rleungx
Copy link
Member

rleungx commented Jan 7, 2025

/cc @okJiang

Copy link

ti-chi-bot bot commented Jan 7, 2025

@rleungx: GitHub didn't allow me to request PR reviews from the following users: okJiang.

Note that only tikv members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @okJiang

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rleungx
Copy link
Member

rleungx commented Jan 7, 2025

FYI, I have updated the pd client version through #1542

@MyonKeminta
Copy link
Contributor

Please fix CI

@Tema Tema force-pushed the pd-circuit-breaker branch from 7dde728 to 01fadf5 Compare January 7, 2025 21:53
@ti-chi-bot ti-chi-bot bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 7, 2025
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 8, 2025
Signed-off-by: artem_danilov <[email protected]>
@Tema
Copy link
Contributor Author

Tema commented Jan 8, 2025

/retest

Copy link

ti-chi-bot bot commented Jan 8, 2025

@Tema: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Tema
Copy link
Contributor Author

Tema commented Jan 8, 2025

/ok-to-test

Copy link

ti-chi-bot bot commented Jan 8, 2025

@Tema: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Tema
Copy link
Contributor Author

Tema commented Jan 8, 2025

@rleungx I've rebased branch to bring your #1542 and added test through enforcing the CB in the context in mock PDClient. Not sure if we need more tests as it is challenging given all tests are using pdClient mock which does not have an interceptor.

@@ -133,6 +134,20 @@ func nextTTL(ts int64) int64 {
return ts + regionCacheTTLSec + jitter
}

var pdRegionMetaCircuitBreaker = circuitbreaker.NewCircuitBreaker(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we move it to config? @MyonKeminta

Copy link
Contributor

@MyonKeminta MyonKeminta Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine to me.
But if you are not going to add any fields to the config file, I think maybe it would be better to add another single file in the internal/locate directory for holding this variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted it into TiKVClient config. Tests and integration for TiKVClient toml files are in pingcap/tidb#58737.

I think maybe it would be better to add another single file in the internal/locate directory for holding this variable.

Can you elaborate a bit more on that? Would it be a go file with single like like:
var pdRegionMetaCircuitBreaker = circuitbreaker.NewCircuitBreaker("region-meta", circuitbreaker.AlwaysClosedSettings) or what do you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. I just feel it still makes me kind of incomfortable if it's put in the config package while nothing of it is configurable by config file. Now it looks fine to me.

@rleungx
Copy link
Member

rleungx commented Jan 8, 2025

@rleungx I've rebased branch to bring your #1542 and added test through enforcing the CB in the context in mock PDClient. Not sure if we need more tests as it is challenging given all tests are using pdClient mock which does not have an interceptor.

IMO, we can add more integration tests later.

artem_danilov added 2 commits January 8, 2025 13:23
Signed-off-by: artem_danilov <[email protected]>
config/client.go Outdated
@@ -149,6 +151,23 @@ type CoprocessorCache struct {
AdmissionMinProcessMs uint64 `toml:"admission-min-process-ms" json:"-"`
}

// CircuitBreakerSettings is the config for default circuit breaker settings excluding the error rate
type CircuitBreakerSettings struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about referring to config/retry/backoff.go? We can use NewCircuitBreakerWithVars or something like that and only add a var for the error rate threshold. The other configs can be hard coded and we don't need to support changing them through the config file. @Tema @MyonKeminta @okJiang WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is problematic. Backoff is a request scope object so we can create it from vars all the time, however CircuitBreaker is an instance scope, which is created on startup and aggregates stats across all requests. Any concerns from propagating error rate directly from system variables like pingcap/tidb#58737 proposes?

As for all other configs, I think it is still make sense to keep them configurable as e.g. MinQPSToOpen could depend on the cluster size and might need tuning per workload. But I don't feel too strong about that and can rollback the last change. Otherwise, I will move them to PDClient section instead of TiKVClient as they belong there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about propagating the error rate instead of the whole Settings? As for the other configs, as we discussed before, we can make them hard-coded temporarily. If we do need to change them, then we can make them configurable. Right now, it's better to hide them to reduce the complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't propagate the full settings. Circuit breaker has a callback to modify any part of the settings:

func (cb *CircuitBreaker) ChangeSettings(apply func(config *Settings)) {
	apply(cb.config)
}

so in pingcap/tidb#58737 I propagate only error rate through system variables:

func (do *Domain) changePDMetadataCircuitBreakerErrorRateThresholdPct(errorRatePct uint32) {
	tikv.ChangePdRegionMetaCircuitBreakerSettings(func(config *circuitbreaker.Settings) {
		config.ErrorRateThresholdPct = errorRatePct
	})

I would like to keep this API as it allows easily to propagate any other part of the setting if needed, but not required so.

I will remove everything from TiKVClient config back to hardcoded values and keep propagating only error rate system variable as it was in the original version of this PR. Let me know if there is still any concern with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, /cc @okJiang

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed all configurations except sysvar from this PR and pingcap/tidb#58737

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems you forgot to update L2218 and L2283?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I thought it is not needed as they low qps and didn't want to affect GC unnecessary, but it is probably better to throttle them as well. Just added them.

Copy link
Member

@rleungx rleungx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest LGTM

}

// ChangePdRegionMetaCircuitBreakerSettings changes circuit breaker changes for region metadata calls
func ChangePdRegionMetaCircuitBreakerSettings(apply func(config *circuitbreaker.Settings)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func ChangePdRegionMetaCircuitBreakerSettings(apply func(config *circuitbreaker.Settings)) {
func ChangePDRegionMetaCircuitBreakerSettings(apply func(config *circuitbreaker.Settings)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

tikv/region.go Outdated
@@ -197,6 +198,11 @@ func SetRegionCacheTTLSec(t int64) {
locate.SetRegionCacheTTLSec(t)
}

// ChangePdRegionMetaCircuitBreakerSettings changes circuit breaker settings for region metadata calls
func ChangePdRegionMetaCircuitBreakerSettings(apply func(config *circuitbreaker.Settings)) {
locate.ChangePdRegionMetaCircuitBreakerSettings(apply)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

artem_danilov added 2 commits January 13, 2025 19:18
Signed-off-by: artem_danilov <[email protected]>
Copy link

ti-chi-bot bot commented Jan 15, 2025

@okJiang: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 15, 2025
@ti-chi-bot ti-chi-bot bot added the lgtm label Jan 15, 2025
Copy link

ti-chi-bot bot commented Jan 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MyonKeminta, okJiang, rleungx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 15, 2025
Copy link

ti-chi-bot bot commented Jan 15, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-01-15 03:21:59.596734926 +0000 UTC m=+88791.051781072: ☑️ agreed by rleungx.
  • 2025-01-15 04:02:03.236910213 +0000 UTC m=+91194.691956362: ☑️ agreed by MyonKeminta.

@ti-chi-bot ti-chi-bot bot merged commit be4b478 into tikv:master Jan 15, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants