Number of processes started by IPyParallel clusters #693

Open
sahil1105 opened this issue Apr 19, 2022 · 8 comments
@sahil1105
Collaborator

A lot of users seem to be surprised by the number of processes that are started as part of an IPyParallel cluster. In my testing, we seem to start 2 processes per engine (the engine itself and the nanny), an mpiexec process in the case of the MPI launcher, and 9 processes for the controller. The number of controller processes seems higher than expected (9 observed vs. the 5 implied by https://ipyparallel.readthedocs.io/en/latest/reference/connections.html#all-connections).

  1. Is this in line with what you expect (9 processes)? If so, is this a documentation issue that we could fix? If not, is this a bug that needs to be fixed?
  2. Is there a way to reduce the number of processes required in general across the board, in particular for the controller?
  3. Can we improve the documentation on this in general?
@minrk
Member

minrk commented Apr 21, 2022

The connection/process diagram hasn't been updated to include the broadcast scheduler, which is itself multi-process for the tree, and accounts for $2^{depth+1}-1$ processes. The default is a depth of 1, which makes 3 processes. So that's a documentation issue.
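As a quick sanity check on that formula (plain Python arithmetic, not an ipyparallel API):

# Processes used by the broadcast scheduler tree: 2**(depth + 1) - 1
for depth in range(4):
    print(f"depth={depth}: {2 ** (depth + 1) - 1} scheduler processes")
# depth=0: 1
# depth=1: 3  (the default)
# depth=2: 7
# depth=3: 15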

You can run engines without the nanny, but that also loses all the benefits of the nanny (remote engine signalling, prompt notification of engine crashes, etc.). With MPI you get a reduced-functionality version of this even without the nanny, so it may make sense for you.

The controller can be started with --usethreads, which runs the schedulers in threads instead of processes. This saves memory, but at the expense of parallelism, because of Python's GIL: the Hub will slow way down if you have lots of executions going on, and the broadcast scheduler loses all parallelism, so it probably only makes sense to use a broadcast depth of 0 if you do this.
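For a one-off test, those same options can also be passed directly on the command line rather than via a config file (assuming the standard ipcontroller entry point):

ipcontroller --usethreads --IPController.broadcast_scheduler_depth=0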

This config should minimize processes:

# ipengine_config.py
c.IPEngine.use_nanny = False

# ipcontroller_config.py
c.IPController.broadcast_scheduler_depth = 0
c.IPController.use_threads = True

with just one process for the controller and each engine.

It's worth profiling the performance with threads to understand when the memory trade-off may be worth it.

@sahil1105
Collaborator Author

Thanks for the detailed explanation @minrk! That makes complete sense.
I agree, for small setups, e.g. on laptops, where you may only need a few engines, the minimized configuration could make sense. I think just documenting these details explicitly might be sufficient.

@sahil1105
Collaborator Author

# ipengine_config.py
c.IPEngine.use_nanny = False

Quick correction: it should be enable_nanny instead of use_nanny.
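So the minimized engine config becomes:

# ipengine_config.py
c.IPEngine.enable_nanny = False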

@sahil1105
Collaborator Author

sahil1105 commented May 1, 2022

@minrk I was able to get this working. Thanks again.
Is there a way to specify these options in ipcluster_config.py or set them in user-code using cluster.config?

@minrk
Member

minrk commented May 2, 2022

# ipcluster_config.py (or cluster.config)
c.EngineLauncher.engine_args = ["--IPEngine.enable_nanny=False"]
c.ControllerLauncher.controller_args = ["--usethreads"]

@sahil1105
Collaborator Author

Great, tysm @minrk!
This worked for me:

import ipyparallel as ipp

c = ipp.Cluster(engines='mpi', n=4)
c.config.EngineLauncher.engine_args = ["--IPEngine.enable_nanny=False"]
c.config.ControllerLauncher.controller_args = ["--IPController.broadcast_scheduler_depth=0", "--IPController.use_threads=True"]
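For anyone copying this, a minimal sketch of actually starting and stopping the cluster with this config, using the synchronous Cluster methods (start_and_connect_sync / stop_cluster_sync) and assuming a working mpiexec:

rc = c.start_and_connect_sync()  # start controller + 4 engines, return a connected Client
print(rc.ids)  # sanity check: should list 4 engine ids
c.stop_cluster_sync()  # shut everything down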

It would be great if we could update the docs with the updated diagram and this example.

@sahil1105
Collaborator Author

Hi @minrk, any update on when the documentation can be updated with these details?
We're planning to add it to our docs as well, but it would be good to be able to reference the official docs.

@minrk
Member

minrk commented Jun 8, 2022

I don't have time to work on this right now, but if you wanted to have a stab, I'm happy to review.
