-
-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use multiprocess if available #368
base: master
Are you sure you want to change the base?
Conversation
Thanks.
|
could you please elaborate |
It will also be useful on macOS, where |
It seems |
67c5c4e
to
bd82a09
Compare
bd82a09
to
909db9b
Compare
Hi @schettino72, since I was touching the doit code for #377 I updated this one also to take your feedback into account.
Done, I implemented the check in
Yes, that makes sense. I instead made it an optional dependency that you can install with
Done, the imports are done in
Done!
I didn't add anything to CI, but you could either modify one of the existing builds in your build matrix to install with multiprocess, by changing
to
for that build (might need quotes, like If you like, I can try to add a test that would fail with |
from multiprocess import Process, Queue as MQueue | ||
HAS_MULTIPROCESS = True | ||
except ImportError: | ||
from multiprocessing import Process, Queue as MQueue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this will break for BSD users without any of them installed.
except ImportError: | ||
from multiprocessing import Process, Queue as MQueue | ||
HAS_MULTIPROCESS = False | ||
Process # pyflakes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this should be top level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I would suggest that instead of having two copies of the import try/except we have only one version. get_multiprocess_lib()
.
If None
that just means "not available".
And it should take an optional parameter start_method
, if set you call:
multiprocess.set_start_method('spawn')
If the problem could be solved just by changing the "start_method" we could add a configuration parameter for that (not required on this PR).
That would be great. |
Windows doesn't have a true
fork
implementation, so multiprocessing and pickling are limited compared to other platforms. There are also some situations in Linux / Mac OS X where we would like to pickle something, but it does not work without some code changes. This PR makes pickling for mulitprocessing work in more situations by using themultiprocess
library, if it's available.multiprocess
is a fork ofmultiprocessing
that usesdill
instead ofpickle
.This PR is set up such that we do not require the user to have
multiprocess
installed, but if it is installed we'll use it -- except in Windows, where pickling is so limited that it will make things easier to require it. There are a few other routes we could go:Don't require it in Windows: This makes things simpler for Windows users who don't want to use multiprocessing. Installing the
multiprocess
package should work on all platforms, but if there's a platform where there are likely to be issues, it's Windows.Require it for all platforms: This will likely be a net benefit to everyone, but it introduces a new dependency. Also,
dill
will always be a little bit slower thanpickle
since it's able to pickle more things. However, if we're not pickling anything too large anyway, performance shouldn't be an issue, and it would be possible to simplify some parts of the codebase by usingdill
instead ofcloudpickle
and removing some of the hacks that we currently use to get Windows to pickle things that it normally can't.I'm happy to modify this PR to implement either option if you want! I've been using
doit
for a relatively large data science project and it's been a huge help.