A customer finds multiple fullsync coordinator workers running simultaneously on each of two clusters. This causes multiple fullsync schedules to run concurrently; the actual fullsync operations may or may not overlap, but each coordinator is active and has its own timer.
This state is reproducible as follows:
1. Set up two clusters, A and B.
2. Set up replication and connect the clusters (cluster manager on 0.0.0.0:9080).
3. Set `fullsync_on_connect` to `true` (unclear whether this step is required).
4. Push continuous load onto cluster A.
5. Start fullsync with A as the source and B as the sink.
6. While fullsync is running, join one or more new nodes to A.
7. On every node, `riak attach` and run `supervisor:count_children(whereis(riak_repl2_fscoordinator_sup)).`
8. Observe that the worker count is greater than 0 on more than one node (see the sketch after this list). In my test, workers were present on both the original coordinator node and the newly joined node.
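Since the stray coordinator can land on any member, it is convenient to run the check from step 7 across the whole cluster from a single `riak attach` session. The following is a minimal sketch using only standard OTP calls (`rpc:call/4`, `supervisor:count_children/1`) and the supervisor name from the steps above; the output is whatever `count_children/1` returns on each node.

```erlang
%% Run inside `riak attach` on any one node: repeats the
%% supervisor:count_children/1 check on every cluster member so stray
%% coordinator supervisors show up in a single listing.
Nodes = [node() | nodes()],
[{N, rpc:call(N, supervisor, count_children, [riak_repl2_fscoordinator_sup])}
 || N <- Nodes].
%% In a healthy cluster at most one node reports {workers, W} with W > 0.
%% This issue shows non-zero worker counts on two or more nodes; nodes where
%% the supervisor is not registered return {badrpc, {'EXIT', {noproc, _}}}.
```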
The workaround for this issue is to manually kill the `riak_repl2_fscoordinator_sup` process on every node, as follows:

1. Stop and disable fullsync.
2. Wait a few minutes.
3. On each node, `riak attach` and run `Pid = whereis(riak_repl2_fscoordinator_sup).` followed by `erlang:exit(Pid, kill).` (see the sketch after this list).
4. Wait a few minutes.
5. Re-enable and start fullsync.
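Step 3 can be made slightly more defensive so it does not crash on nodes where the supervisor is not registered. Below is a minimal sketch of that shell sequence, run from `riak attach` on each node and using nothing beyond the `whereis/1` and `erlang:exit/2` calls from the workaround above.

```erlang
%% Run from `riak attach` on every node after fullsync has been stopped and
%% disabled. exit(Pid, kill) cannot be trapped, so the supervisor and any
%% lingering coordinator children terminate immediately.
case whereis(riak_repl2_fscoordinator_sup) of
    undefined -> no_fscoordinator_sup_on_this_node;
    Pid       -> erlang:exit(Pid, kill)
end.
```

The parent replication supervisor should then restart a fresh, empty `riak_repl2_fscoordinator_sup`, which is why fullsync can be re-enabled and started afterwards.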
The symptoms of this issue are extremely slow fullsync operations, cluster overload and general slowness, and fullsync activity in the logs when no fullsync ought to be running.