How to parallelise efficiently? #3725
Comments
Is your data stored on an SSD or an HDD? If the preprocessing isn't using as many resources as you allow it to, your bottleneck might be your disk's read speed. The same goes for manually parallelising with `joblib`.
The files are staged to an SSD. The reason I am asking is that joblib does maximise CPU usage, so there is untapped potential there. My problem with wrapping spikeinterface functions in joblib is that the child processes die, due to thread-management issues and memory leaks/overflows.
Hi,

For pre/post-processing, our internal mechanism of parallelisation is quite flexible and should be efficient. Also be aware that:
@samuelgarcia I agree with you on avoiding joblib. But in principle, using the built-in `n_jobs` parallelisation should make full use of the CPUs. If that is not the case, we might have a bug. Can you try, instead of setting the total memory, setting the `chunk_duration` (e.g. to `"10s"`)? That should use 10 times the default RAM, since the default duration is `"1s"`.
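Something like this, as a minimal sketch (the recording path and the filter step are just placeholders):

```python
import spikeinterface.full as si

recording = si.read_maxwell("/path/to/recording.h5")   # placeholder path
recording_f = si.bandpass_filter(recording, freq_min=300, freq_max=6000)

# fix the chunk size in time instead of fixing the total memory:
# each worker processes chunk_duration seconds of traces at a time,
# so per-worker RAM scales with the chunk duration
job_kwargs = dict(n_jobs=20, chunk_duration="10s", progress_bar=True)

# the job kwargs are consumed when the lazy preprocessing is actually
# computed, e.g. when saving to a binary folder
recording_f.save(folder="/path/to/preprocessed", **job_kwargs)
```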
At the moment I have decoupled preprocessing, sorting, and postprocessing from one another, to run them independently instead of all in one go. Let's focus on just preprocessing. `run_sorter_jobs()` works well in terms of CPU usage on its own anyway, so I am not inclined to wrap it in joblib. And just to be clear, this is a single Linux workstation; all processing and storage are local to the machine.

`pool_engine="process"` is not a recognised kwarg in the version I have installed (0.101.2).

Filling memory capacity does not seem to be a problem. In fact the opposite is true: it exceeds the expected usage by a lot. I was monitoring a run with 20 wells preprocessing in parallel, wrapped in joblib with 1 core and 12 GB each. So I expected 50% CPU usage and 240 GB (20 × 12) of RAM. But the actual RAM usage was peaking repeatedly at 320 GB (100%). I also tried a run without the joblib workaround:

Something there is not right with the memory usage. It might be the reason I experience so many crashes every time I try to scale up my processing, as it consumes a lot more memory than I allocate to it.
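For concreteness, a minimal sketch of the per-recording joblib wrapping I describe (the paths, the preprocessing chain, and the exact memory figure are placeholders):

```python
from joblib import Parallel, delayed
import spikeinterface.full as si

# hypothetical lists of input recordings and output folders, one per well
rec_paths = [f"/path/to/raw/well_{i:03d}.raw.h5" for i in range(20)]
out_folders = [f"/path/to/preprocessed/well_{i:03d}" for i in range(20)]

def preprocess_one(rec_path, out_folder):
    # each joblib worker preprocesses one recording with a single
    # spikeinterface worker and a fixed per-chunk memory budget
    rec = si.read_maxwell(rec_path)
    rec = si.bandpass_filter(rec, freq_min=300, freq_max=6000)
    rec = si.common_reference(rec, operator="median")
    rec.save(folder=out_folder, n_jobs=1, chunk_memory="12G")

# 20 wells in parallel, one core each (this is where the crashes happen)
Parallel(n_jobs=20)(
    delayed(preprocess_one)(p, f) for p, f in zip(rec_paths, out_folders)
)
```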
Oh, I just discovered that using less `chunk_memory` results in more cores being used. If I assign `10G` per chunk, maybe 5 of the 30 assigned cores are active. If I assign `500M` instead, all 30 cores come into play properly. Interesting. Maybe I really need to rethink how much memory is worth allocating. (The RAM footprint is still much larger than the parameters would suggest.)
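In code, the only difference between the two runs is the chunk size in the job kwargs (shown here with the global job kwargs; passing them per call behaves the same way):

```python
import spikeinterface.full as si

# 10G per chunk: only about 5 of the 30 assigned workers ever look busy
# si.set_global_job_kwargs(n_jobs=30, chunk_memory="10G")

# 500M per chunk: all 30 workers stay active,
# although total RAM use is still far above 30 x 500M
si.set_global_job_kwargs(n_jobs=30, chunk_memory="500M", progress_bar=True)
```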
Hello!
I have a large backlog of data (many thousands of Maxwell recordings), so I want to take maximum advantage of the computer's capacity. I have a workstation with 20 physical Intel cores (i.e. 40 logical cores) and about 320 GB of RAM.
Using the built-in parallelization option (`n_jobs` in the job kwargs) does not make use of the provided capacity. Preprocessing, for example, uses at most about 5% of the CPU capacity and 20% of the RAM capacity. Clearly there are bottlenecks in parallelizing the processing of a single electrode array.
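A minimal sketch of the kind of call I mean (the preprocessing chain and paths are simplified placeholders):

```python
import spikeinterface.full as si

# one Maxwell recording, preprocessed with the built-in parallelization
recording = si.read_maxwell("/path/to/well_000.raw.h5")   # placeholder path
recording = si.bandpass_filter(recording, freq_min=300, freq_max=6000)
recording = si.common_reference(recording, operator="median")

# asking for all 40 logical cores, yet CPU usage stays around 5%
recording.save(
    folder="/path/to/preprocessed/well_000",
    n_jobs=40,
    chunk_duration="1s",
    progress_bar=True,
)
```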
So instead of preprocessing one recording at a time with many cores, I tried processing many recordings at once with one core each, by setting `n_jobs=1` and using `Parallel()` from the `joblib` package. This works very well in terms of making use of the CPU capacity, but inevitably causes a crash sooner or later.

For the actual sorting step I just use the provided `run_sorter_jobs(engine_kwargs={'n_jobs': 36})`, with each joblist item having `'job_kwargs': {'n_jobs': 1}`, and this at least does use the available capacity well.

I am not a Python expert nor a computer scientist. I also don't know the internal designs and limitations of spikeinterface. So I am not getting anywhere trying to troubleshoot this by myself.
My hope is that you might have some insight on how I could get through preprocessing and postprocessing a very large number of recordings more quickly. Thank you in advance!
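For reference, roughly what that sorting step looks like on my side (the sorter name, paths, and well count are placeholders, and I am assuming the joblib engine and a sorter that accepts a `job_kwargs` parameter):

```python
import spikeinterface.full as si
from spikeinterface.sorters import run_sorter_jobs

# hypothetical job list: one preprocessed recording per well
job_list = [
    dict(
        sorter_name="spykingcircus2",   # placeholder sorter
        recording=si.load_extractor(f"/path/to/preprocessed/well_{i:03d}"),
        folder=f"/path/to/sorting/well_{i:03d}",
        # keep each sorter single-threaded so the parallel jobs
        # do not oversubscribe the CPU
        job_kwargs={"n_jobs": 1},
    )
    for i in range(20)
]

# run 36 sorter jobs at a time with the joblib engine
run_sorter_jobs(job_list, engine="joblib", engine_kwargs={"n_jobs": 36})
```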