Parallelization support #3

Open
the-x-at opened this issue Oct 6, 2016 · 5 comments

@the-x-at (Contributor) commented Oct 6, 2016

Support for parallel simulation computation on a multiprocessor/multicore machine would be great. Limiting the number of cores used should be an optional parameter when running simulations, ideally defaulting to a single job, as many queuing systems have their own load balancing and discourage use of multiple cores for a single job.
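For illustration, such an optional parameter could follow the common Bioconductor convention of a BPPARAM argument defaulting to serial execution. The function below is a hypothetical sketch, not FamAgg's actual API:

library(BiocParallel)

## Hypothetical simulation entry point: defaults to a single job
## (SerialParam); callers can opt in to more cores explicitly.
runSimulations <- function(nsim = 1000, BPPARAM = SerialParam()) {
    bplapply(seq_len(nsim), function(i) {
        mean(rnorm(100))  ## placeholder for one simulation replicate
    }, BPPARAM = BPPARAM)
}

## Single core by default; use e.g.
## runSimulations(100, BPPARAM = MulticoreParam(4)) to parallelize.
res <- runSimulations(100)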

@jorainer (Member) commented Oct 6, 2016

Parallel random sampling might be tricky, but eventually there might be something in BiocParallel for it.
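(Indeed, BiocParallel's backend constructors accept an RNGseed argument that, at least in recent versions, gives the workers reproducible, independent random number streams; a minimal sketch, assuming a current BiocParallel:)

library(BiocParallel)

## RNGseed seeds independent random number streams for the workers, so
## parallel replicates do not reuse the same random numbers.
p <- MulticoreParam(workers = 2, RNGseed = 123)
res <- bplapply(1:4, function(i) rnorm(3), BPPARAM = p)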

@the-x-at (Contributor, Author) commented

Five years have gone by and not much has happened. In the meantime, fixing issue #22 splits the whole simulation into small chunks of short simulations. In theory, this chunking opens a path to parallelization (see the sketch below). On the other hand, this way of running threads in parallel is not appreciated by queuing systems like SLURM, as you gain an advantage over others by using multiple cores. Unless SLURM is informed about this, it will kill the job for excessive CPU usage.
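A hypothetical sketch of that chunked approach (the chunk sizes and the per-replicate body are made up for illustration, not the actual code from #22):

library(BiocParallel)

## Split n.sim simulations into chunks; each chunk is one unit of work.
n.sim <- 10000
chunk.size <- 500
chunks <- split(seq_len(n.sim), ceiling(seq_len(n.sim) / chunk.size))
res <- bplapply(chunks, function(idx) {
    ## run length(idx) short simulations; placeholder replicate body
    vapply(idx, function(i) mean(rnorm(100)), numeric(1))
}, BPPARAM = MulticoreParam(2))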

@jorainer (Member) commented

Parallel processing with SLURM works like a marvel with:

library(BiocParallel)

## Use the number of CPUs SLURM assigned to the job, keeping one core free.
ncores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE", 7)) - 1L
register(MulticoreParam(ncores))

Any subsequent call to bplapply will then by default use this parallel setup with the number of cores assigned by SLURM. The main issue I see is with the random numbers: we would have to ensure that the parallel jobs do not pick up the same random numbers. Anyway, since we're running FamAgg on multiple traits in one analysis, parallelizing by trait is at present my favorite approach.
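A sketch of that by-trait approach (the trait list and the per-trait function are placeholders, not FamAgg's actual API); with the backend registered as above, bplapply picks it up automatically:

library(BiocParallel)

## Placeholder per-trait analysis; in practice this would run the FamAgg
## test for a single trait.
analyzeTrait <- function(trait) sprintf("result for %s", trait)

traits <- c("trait_a", "trait_b", "trait_c")
## bplapply uses the registered backend (the MulticoreParam above) by default.
res <- bplapply(traits, analyzeTrait)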

@the-x-at (Contributor, Author) commented

Looks very simple. It also looks like one has to supply the number of CPUs (cores/threads) used by a single process when submitting a job to SLURM via sbatch -c N, where N is the number of threads to use. This value then ends up in the environment variable SLURM_JOB_CPUS_PER_NODE.
Anyway, at the moment we don't see much need for this, so the issue will remain open, but there are no plans to tackle it.

@jorainer (Member) commented

Yep, exactly: the value passed to sbatch -c N is assigned to the environment variable SLURM_JOB_CPUS_PER_NODE (I guess the other SLURM variables will also be available). And yes, I agree, there's no need to implement anything at present.
