
Unnecessary periodic safepoint checker in TiKV+PD (without TiDB) deployments #1506

Open
ArthurChiao opened this issue Nov 20, 2024 · 0 comments

Comments

@ArthurChiao

Hi,

I noticed that a safepoint checker is started when a tikv/pd client is initialized:
https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L209

It periodically checks the /tidb/store/gcworker/saved_safe_point key in PD's embedded etcd:
https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L58
https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L209
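
For context, here is a minimal sketch of that polling pattern. This is an illustration only, not the exact client-go code; the function name, interval handling, and error handling are my assumptions, and only the etcd key path comes from the links above.

```go
package example

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

const gcSavedSafePointPath = "/tidb/store/gcworker/saved_safe_point"

// pollSafePoint illustrates the pattern: on every tick, read the safepoint
// key from PD's embedded etcd. Simplified sketch, not client-go's actual code.
func pollSafePoint(ctx context.Context, cli *clientv3.Client, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Every client issues this read on each tick, so N clients
			// generate N periodic range requests against the same key.
			resp, err := cli.Get(ctx, gcSavedSafePointPath)
			if err != nil {
				log.Printf("safepoint check failed: %v", err)
				continue
			}
			if len(resp.Kvs) == 0 {
				// In TiKV+PD-only deployments nothing ever writes this key,
				// so this lookup always comes back empty.
				continue
			}
			// ...parse and cache the safepoint value here...
		}
	}
}
```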

However, in TiKV+PD deployments (without TiDB), such as the JuiceFS + TiKV/PD case, this check is unnecessary, because no component ever sets that key (GC uses xxx/gc/safe_point instead).

What's more, when one or more PD nodes have problems, all clients reconnect to the PD cluster, resulting in massive concurrent requests to /tidb/store/gcworker/saved_safe_point, which may freeze the embedded etcd, with PD logs like the following:

[WARN] [util.go:144] ["apply request took too long"] [took=4.96746s] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/tidb/store/gcworker/saved_safe_point\" "] [response="range_response_count:0 size:7"] []
...

This is a stability concern for large clusters (e.g. with thousands of clients).

Would it make sense to make this checker optional when initializing a client? Thanks!
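
To sketch the idea: a hypothetical opt-out switch at client construction time. The option name below does not exist in client-go today, and whether the constructor accepts such options is also an assumption; this only illustrates the kind of knob being proposed.

```go
// Hypothetical API sketch: WithSafePointCheckerDisabled is invented here for
// illustration and is not part of client-go today.
store, err := tikv.NewTxnClient(pdAddrs, tikv.WithSafePointCheckerDisabled())
if err != nil {
	log.Fatal(err)
}
defer store.Close()
```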
