
Multiple riak_repl2_fscoordinator_sup workers per cluster when adding nodes [JIRA: RIAK-2675] #748

Open
nerophon opened this issue Jul 7, 2016 · 1 comment


nerophon commented Jul 7, 2016

A customer finds multiple fullsync coordinator workers running simultaneously on each of two clusters. This causes multiple fullsync schedules to run concurrently; the actual fullsync operations may or may not overlap, but each coordinator is active and has its own timer.

This state is reproducible as follows:

  1. Set up two clusters, A & B.
  2. Set up REPL and connect them (cluster manager 0.0.0.0:9080).
  3. Set fullsync_on_connect to true (unclear whether this step is required).
  4. Push continuous load onto cluster A.
  5. Start fullsync with A as source and B as sink.
  6. While fullsync is running, join one or more new nodes to A.
  7. On all nodes, run riak attach and then supervisor:count_children(whereis(riak_repl2_fscoordinator_sup)). (a cluster-wide variant of this check is sketched after this list).
  8. Observe that the worker count is > 0 on more than one node. In my test this was the original coordinator node and also the newly joined node.
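
For convenience, the per-node check in step 7 can also be run cluster-wide from a single riak attach console using rpc:call/4. This is only a sketch of the same check, not part of the original reproduction steps; it queries the locally registered riak_repl2_fscoordinator_sup on every node and collects its child counts:

    %% Run from a riak attach console on any node in the cluster.
    %% Each entry is {Node, [{specs,_},{active,_},{supervisors,_},{workers,_}]};
    %% only one node should report a non-zero workers count.
    [{Node, rpc:call(Node, supervisor, count_children,
                     [riak_repl2_fscoordinator_sup])}
     || Node <- [node() | nodes()]].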

The workaround for this issue is to manually kill all riak_repl2_fscoordinator_sup processes as follows:

  1. stop & disable fullsync
  2. wait a few minutes
  3. on each node, run riak attach and then: Pid = whereis(riak_repl2_fscoordinator_sup). followed by erlang:exit(Pid, kill). (see the sketch after this list)
  4. wait a few minutes
  5. enable & start fullsync
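
A minimal Erlang sketch of step 3, assuming it is run from a riak attach console on each node; the case clause is an addition so the expression is safe on nodes where the supervisor is not registered:

    case whereis(riak_repl2_fscoordinator_sup) of
        undefined -> ok;                      %% supervisor not running on this node
        Pid       -> erlang:exit(Pid, kill)   %% forcibly kill the coordinator supervisor
    end.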

The symptoms of this issue are extremely slow fullsync operations, general cluster overload and slowness, and fullsync activity appearing in the logs when no fullsync ought to be running.

@Basho-JIRA Basho-JIRA changed the title Multiple riak_repl2_fscoordinator_sup workers per cluster when adding nodes Multiple riak_repl2_fscoordinator_sup workers per cluster when adding nodes [JIRA: RIAK-2675] Jul 7, 2016

nerophon commented Jul 7, 2016

Public-facing duplicate: basho/riak_ee-issues#30
