
Unnecessary periodic safepoint checker in TiKV+PD (without TiDB) deployments #1506

Open
ArthurChiao opened this issue Nov 20, 2024 · 0 comments

Comments

@ArthurChiao

Hi,

I noticed that a safepoint checker is started when a tikv/pd client is initialized:
https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L209

It periodically checks the /tidb/store/gcworker/saved_safe_point key in PD's embedded etcd:
https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L58
https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L209
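
For context, here is a minimal sketch of that polling pattern. This is an illustration only, not the exact client-go code; the function name, interval handling, and error handling are my assumptions, and only the etcd key path comes from the links above.

```go
package example

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

const gcSavedSafePointPath = "/tidb/store/gcworker/saved_safe_point"

// pollSafePoint illustrates the pattern: on every tick, read the safepoint
// key from PD's embedded etcd. Simplified sketch, not client-go's actual code.
func pollSafePoint(ctx context.Context, cli *clientv3.Client, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Every client issues this read on each tick, so N clients
			// generate N periodic range requests against the same key.
			resp, err := cli.Get(ctx, gcSavedSafePointPath)
			if err != nil {
				log.Printf("safepoint check failed: %v", err)
				continue
			}
			if len(resp.Kvs) == 0 {
				// In TiKV+PD-only deployments nothing ever writes this key,
				// so this lookup always comes back empty.
				continue
			}
			// ...parse and cache the safepoint value here...
		}
	}
}
```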

However, in TiKV+PD deployments (without TiDB), such as the JuiceFS + TiKV/PD case, this check is unnecessary, because no component ever sets that key (GC uses xxx/gc/safe_point instead).

What's more, when one or more PD nodes have problems, all clients reconnect to the PD cluster, resulting in massive concurrent requests to /tidb/store/gcworker/saved_safe_point, which may freeze the embedded etcd, with PD logs like the following:

[WARN] [util.go:144] ["apply request took too long"] [took=4.96746s] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/tidb/store/gcworker/saved_safe_point\" "] [response="range_response_count:0 size:7"] []
...

This is a stability concern for large clusters (e.g. with thousands of clients).

Would it make sense to make this checker optional when initializing a client? Thanks!
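
To sketch the idea: a hypothetical opt-out switch at client construction time. The option name below does not exist in client-go today, and whether the constructor accepts such options is also an assumption; this only illustrates the kind of knob being proposed.

```go
// Hypothetical API sketch: WithSafePointCheckerDisabled is invented here for
// illustration and is not part of client-go today.
store, err := tikv.NewTxnClient(pdAddrs, tikv.WithSafePointCheckerDisabled())
if err != nil {
	log.Fatal(err)
}
defer store.Close()
```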
