Problem when running on HPC computing cluster #326
On an HPC it's normal for Nextflow to submit many smaller jobs when you use a cluster executor. When a process exits with code 137, it means the process was killed because it exceeded its requested resources. Here's a configuration profile I use for UK Biobank: #328 (comment). It works fine for ~150 scores. This configuration does a few things:
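In outline, such a profile might look like the sketch below. This is an illustration only: the queue name, memory values, and retry limits are assumptions, not the exact settings from the #328 config.

```groovy
// nextflow.lsf.config -- hedged sketch of an HPC profile
profiles {
    lsf_cluster {
        process {
            executor = 'lsf'
            queue    = 'premium'   // assumed queue name

            // Retry with more memory when a job is killed by the
            // scheduler (exit 137 = out of memory / resource limit)
            errorStrategy = { task.exitStatus in [130, 137, 140] ? 'retry' : 'finish' }
            maxRetries    = 2
            memory        = { 16.GB * task.attempt }
        }
        executor {
            queueSize = 100        // cap how many jobs Nextflow keeps queued at once
        }
    }
}
```

The retry-with-scaled-memory pattern means a process that dies with exit 137 on its first attempt is resubmitted asking for twice the memory, instead of failing the whole run.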
Thank you.
Then I submitted the job with the new config file as follows (I asked for 4 cores and 64 GB for each), since the largest process in your config file required that much.
Here is my nextflow log file. I am going to try deleting the entire …
Can I download all PGS traits beforehand and ask Nextflow to use the downloaded files from a directory, rather than trying to download files live while the pipeline is running? Something similar to the reference files?
You could use the --scorefile parameter to point the pipeline at scoring files you've already downloaded. You can install the PGS Catalog download tool separately.
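For example, something along these lines. This is a sketch: the pgscatalog.core package does ship a pgscatalog-download command, but double-check the flags with pgscatalog-download --help, and the paths and PGS IDs here are placeholders.

```shell
# Install the PGS Catalog tools outside the pipeline (assumed package name)
pip install pgscatalog.core

# Fetch scoring files ahead of time, on a node with internet access
# (flags from memory; verify with pgscatalog-download --help)
pgscatalog-download --pgs PGS000001 PGS000002 --build GRCh37 -o scorefiles/

# Then point the pipeline at the local copies instead of the live API
nextflow run pgscatalog/pgsc_calc -profile singularity \
    --input samplesheet.csv --target_build GRCh37 \
    --scorefile "scorefiles/*.txt.gz"
```

Quoting the glob is important so Nextflow, not the shell, expands it.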
Thanks. I downloaded some PRS score files and then ran a job testing 1 PRS trait (using --scorefile), and it runs to completion. So I've bypassed the issue of the job not being able to execute the "DOWNLOAD_SCORE" step. I will now begin benchmarking the resource requirements for multiple traits, since we intend to run all ~4800 traits from the PGS Catalog for both our CAU and AFR subsets of target data, which each have >4000 samples. I did have to reduce the memory requirements for subtasks (in the config file) to 8 GB for anything above 8 GB; otherwise my jobs were pending in the cluster queues without ever entering the "run" stage. If you ever figure out the original "pgscatalog.core.lib.pgsexceptions.QueryError: Can't query PGS Catalog API" error, or if our HPC informaticians help me figure it out, I will update here. Thank you.
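Capping per-process memory like this can be done with process selectors in a custom config. A sketch, assuming the nf-core-style process_medium / process_high labels that pgsc_calc processes typically carry (check the pipeline's base config for the real label names):

```groovy
process {
    // Cap anything that would otherwise request more than 8 GB,
    // so jobs fit the queue limits instead of pending forever
    withLabel: process_medium {
        memory = 8.GB
    }
    withLabel: process_high {
        memory = 8.GB
    }
}
```

Overrides in a `-c` custom config take precedence over the pipeline's defaults, so only the selected labels change.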
Hi @samreenzafer, you probably won't be able to run all scores at once with only 8 GB of RAM. Running all scores at once increases the RAM used in matching (the number of variants read into memory) and in scoring (it has to create a scores × samples matrix; plink will likely complain with less than 16 GB for that amount of data).
Yes, I did in fact get that error, even when I tried 150 traits at one time. I might just fire one job per trait and then merge the individual result CSVs at the end, although I would like to see one HTML report with multiple traits for easier comparison, which is why I thought it would be better to run in batches of 100 or 150 together. Thank you.
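The one-job-per-trait merge can be sketched with a plain awk concatenation that keeps a single header. The toy column names below are made up; the real pgsc_calc output columns differ.

```shell
# Two toy per-trait result files, one CSV per PGS ID (illustrative columns)
mkdir -p per_trait
printf 'IID,PGS,SUM\nsample1,PGS000001,0.51\n' > per_trait/PGS000001.csv
printf 'IID,PGS,SUM\nsample1,PGS000002,0.73\n' > per_trait/PGS000002.csv

# Concatenate all per-trait CSVs, keeping the header of the first file only:
# FNR==1 is the first line of each file; NR!=1 means "not the very first line overall"
awk 'FNR==1 && NR!=1 { next } { print }' per_trait/*.csv > all_scores.csv
cat all_scores.csv
```

This scales to thousands of files without loading anything into memory, at the cost of losing the combined HTML report.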
Hi.
I've been able to run a few traits with my data on the command line (on our department's computing cluster), and I'm now trying to scale it to thousands of PGS IDs using the LSF queue system. I've finally been able to get one job running, as I'll show below, but it fails at a different point every time I submit it. The log files are quite large, so I'll upload them here instead of pasting them.
My main job has exited with an error, but I still see one of the sub-jobs that the workflow creates and submits to the cluster sitting in PENDING state, which is strange.
I submitted the main job as below.
The shell script looks like this:
And this is what the nextflow.lsf.config file looks like. I still see the following sub-job pending execution on the cluster queue, even though the main job "CNICS.CAU" had exited with an error.
```
[zafers02@li03c02 test_nextflow_CNICSonly]$ bjobs
JOBID     USER     JOB_NAME     STAT QUEUE   FROM_HOST  EXEC_HOST SUBMIT_TIME  START_TIME TIME_LEFT
131073368 zafers02 *3900ddf6662 PEND premium lc02c03.ch -         Jun 27 14:13 -          -
```
```
-rw-rw-rw- 1 zafers02 nicolp01a  52K Jun 27 14:15 .nextflow.log
-rw-rw-rw- 1 zafers02 nicolp01a    0 Jun 27 14:15 job.CNICS.lsf.sh.CAU.e
-rw-rw-rw- 1 zafers02 nicolp01a 8.0K Jun 27 14:15 job.CNICS.lsf.sh.CAU.o
```
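A sub-job left pending after the head job dies can be cleaned up by hand with standard LSF commands, using the job ID shown by bjobs:

```shell
# Kill the orphaned pending sub-job by its ID
bkill 131073368

# Or, in LSF, job ID 0 means "all jobs you own"
bkill 0
```

Clearing orphaned jobs before resubmitting avoids them consuming queue slots against your user limits.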
I am uploading the job.CNICS.lsf.sh.CAU.o file as job.CNICS.lsf.sh.CAU.o.txt and the .nextflow.log file as job1.nextflow.log.txt here. I'm wondering what I'm doing wrong here.
Thank you for your time.
job.CNICS.lsf.sh.CAU.o.txt
job1.nextflow.log.txt