Hello @luohaixiannz, sorry for the late reply.
Could you put together a client-go test case that reproduces the issue, so the problem is more concrete? Specific suggestions for improvement would also help.
Versions:
Both pd-server and tikv-server are version 7.1.5.
The client uses client-go v2.0.7.
Node topology:
PD nodes: 3
xx.xxx.111.76:10002 xx.xxx.113.11:10002 xx.xxx.112.202:10002
tikv-server nodes: 3, co-located with PD. Each node runs 6 tikv-server instances on ports 10010-10021.
Client nodes: 2
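Injected fault:
On one of the client nodes, we injected packet loss for all traffic sent to ports 10013-10017 on xx.xxx.112.202. The fault lasted 30 minutes.
Judging by the ports, this fault only makes some of the tikv-server instances on the 202 node unreachable. Access to PD is not affected.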
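To make the scope of the fault explicit, here is a minimal sketch that classifies a target address as inside or outside the injected packet-loss rule. The host (kept masked exactly as in the report) and the 10013-10017 port range come from the description above. The helper `isFaulted` is hypothetical and only models the rule; it is not part of client-go and does not reproduce the packet loss itself.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Parameters of the injected fault as described in the report.
// The host string is kept masked, exactly as given.
const (
	faultedHost    = "xx.xxx.112.202"
	faultedPortMin = 10013
	faultedPortMax = 10017
)

// isFaulted (hypothetical helper) reports whether traffic from the faulted
// client node to addr matches the packet-loss rule: same host and a port
// inside the inclusive range.
func isFaulted(addr string) bool {
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return false
	}
	return host == faultedHost && port >= faultedPortMin && port <= faultedPortMax
}

func main() {
	targets := []string{
		"xx.xxx.112.202:10002", // PD on the same host: outside the rule
		"xx.xxx.112.202:10012", // TiKV port just below the range
		"xx.xxx.112.202:10013", // lower bound of the range
		"xx.xxx.112.202:10017", // upper bound of the range
		"xx.xxx.111.76:10013",  // same port on another host: outside the rule
	}
	for _, t := range targets {
		fmt.Printf("%s faulted=%v\n", t, isFaulted(t))
	}
}
```

The check confirms that the PD endpoint (port 10002) and all instances on the other two hosts fall outside the rule. Only part of the TiKV ports on the 202 node are affected.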
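Expected result:
After the fault is removed, request latency should return to normal.
Actual result:
The client still sees many timeouts. Sometimes latency recovers after ten-odd minutes; sometimes it has still not recovered after several hours.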
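Monitoring and logs at the time:
Monitoring after the fault was removed at the 30-minute mark:
Corresponding client-go log output:
The logs show the affected store's state repeatedly switching between reachable and unknown. They also say the region state needs to be updated, even though, in theory, the client's interaction with PD should have stayed stable the whole time.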