Error paired_sample_wgs:reheader_interval_bams #50

Open
tgebo opened this issue Jan 5, 2022 · 2 comments

tgebo commented Jan 5, 2022

  • Pipeline release version v7.2.0
  • Cluster you are using (SGE/Slurm-Dev/Slurm-Test) Slurm-Dev
  • Node type (F2s (lowmem) / F72s (midmem) / M64s (execute)) F72
  • Submission method (interactive/submission script) python
  • Actual submission script (python submission script, "nextflow run ...", etc.) .py script
  • Sbatch or qsub command and logs if applicable
  • Config files /hot/users/tgebo/WCDT/scripts/call-gSNP/DTB-005.config
  • Path to the working directory /hot/users/tgebo/pipelines/pipeline-call-gSNP
  • Any logs produced by the pipeline /hot/users/tgebo/pipelines/pipeline-call-gSNP/DTB-005.log

*** Changed parameter from previous run in #49 back to default value: scatter_count = 50

Error executing process > 'paired_sample_wgs:reheader_interval_bams:run_BuildBamIndex_Picard_normal (24)'

Caused by:
  Process `paired_sample_wgs:reheader_interval_bams:run_BuildBamIndex_Picard_normal (24)` terminated with an error exit status (134)

Command executed:

  set -euo pipefail
  java -Xmx1024m -Djava.io.tmpdir=/scratch -jar /usr/local/share/picard-slim-2.26.8-0/picard.jar BuildBamIndex -VALIDATION_STRINGENCY LENIENT -INPUT DTB-005_DNA_N_recalibrated_reheadered_24.bam -OUTPUT DTB-005_DNA_N_recalibrated_reheadered_24.bam.bai

Command exit status:
  134

Command output:
  (empty)

Command error:
  03:32:33.080 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/picard-slim-2.26.8-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
  [Wed Jan 05 03:32:33 GMT 2022] BuildBamIndex --INPUT DTB-005_DNA_N_recalibrated_reheadered_24.bam --OUTPUT DTB-005_DNA_N_recalibrated_reheadered_24.bam.bai --VALIDATION_STRINGENCY LENIENT --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
  [Wed Jan 05 03:32:33 GMT 2022] Executing as ?@9ba14d8de151 on Linux 3.10.0-1127.19.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.26.8
  runtime/cgo: pthread_create failed: Resource temporarily unavailable
  .command.run: line 273: 70764 Aborted                 docker run -i --cpus 1.0 --memory 1024m -e "NXF_DEBUG=${NXF_DEBUG:=0}" -v /scratch:/scratch -v "$PWD":"$PWD" -w "$PWD" --entrypoint /bin/bash -u $(id -u):$(id -g) $(for i in `id --real --groups`; do echo -n "--group-add=$i "; done) --volume /scratch:/scratch --name $NXF_BOXID blcdsdockerregistry/picard:2.26.8 -c "/bin/bash .command.run nxf_trace"
@yashpatel6 yashpatel6 mentioned this issue Jan 21, 2022

tyamaguchi-ucla commented Jan 29, 2022

@yashpatel6 we can add some comments about the fix for the record.

I think the root cause may be related to the max user processes limit, which defaults to 4096. I had a similar issue before with hatchet (OpenBLAS; I had to set some extra environment variables to limit the number of threads). If we see this issue with different tools and want to increase the ulimit, we'll have to ask OHIA, or we may need to reduce the number of intervals/jobs running at the same time.
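
The thread-count workaround mentioned for OpenBLAS could look roughly like the sketch below. It is an illustration only, not the fix applied to this pipeline: `OPENBLAS_NUM_THREADS` and `OMP_NUM_THREADS` are the standard variables read by OpenBLAS and OpenMP runtimes, while the helper name `capped_thread_env` and the cap of 1 are assumptions.

```python
import os
import subprocess
import sys


def capped_thread_env(max_threads=1, base_env=None):
    """Return a copy of the environment with BLAS/OpenMP thread counts capped.

    `capped_thread_env` is a hypothetical helper name, not part of the pipeline.
    """
    env = dict(os.environ if base_env is None else base_env)
    for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS"):
        env[var] = str(max_threads)
    return env


if __name__ == "__main__":
    # Launch a child process that reports the cap it inherited.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"]
    subprocess.run(child, env=capped_thread_env(1), check=True)
```

Each capped process then starts fewer threads, so fewer of them count against the per-user limit.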

See max user processes below.

(base) [tyamaguchi@ip-0A12521D CN_20]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15068
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 3145728
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
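
The same limit can be read from a Python submission script before launching jobs. The sketch below assumes a Linux host, where threads count against `RLIMIT_NPROC` for the submitting user. The function names and the per-task thread count in the usage note are hypothetical, not measured values from this run.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) max-user-processes limits; None means unlimited."""
    def norm(value):
        return None if value == resource.RLIM_INFINITY else value

    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return norm(soft), norm(hard)


def max_concurrent_tasks(limit, threads_per_task, reserved=0):
    """Estimate how many tasks fit under `limit` if each starts `threads_per_task` threads.

    `reserved` is the number of processes/threads already used by the same user.
    """
    if threads_per_task <= 0:
        raise ValueError("threads_per_task must be positive")
    return max(0, (limit - reserved) // threads_per_task)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"max user processes: soft={soft}, hard={hard}")
```

As a purely illustrative calculation, `max_concurrent_tasks(4096, 40, reserved=96)` gives 100. The real thread count per JVM or container would have to be measured on the node.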

@yashpatel6
Collaborator

Got it. I think part of the reason is also scratch space running out: ApplyBQSR is parallelized, and the pipeline has to wait for both Indel Realignment and BQSR to complete before deleting files. I've tried lowering the number of split intervals, but the disk space issue still causes the pipeline to fail. Once I add the fix for processing the normal and tumour BQSR together, I'll test again and see whether the same issue reappears.
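
A rough pre-flight check for the scratch space concern might look like the sketch below. It assumes intermediate files land under a single mount such as `/scratch` (the path bound into the container in the log above). The per-task size and the function names are hypothetical and would need to be replaced with measured values.

```python
import shutil


def scratch_headroom(path, bytes_per_task, concurrent_tasks):
    """Return free bytes left on `path` after the projected intermediate output.

    A negative result means the projected output would not fit.
    """
    free = shutil.disk_usage(path).free
    return free - bytes_per_task * concurrent_tasks


def fits_on_scratch(path, bytes_per_task, concurrent_tasks):
    """True if the projected intermediate output fits in the free space on `path`."""
    return scratch_headroom(path, bytes_per_task, concurrent_tasks) >= 0


if __name__ == "__main__":
    # Hypothetical estimate: 50 intervals, 20 GiB of intermediates each.
    print(fits_on_scratch("/", 20 * 1024**3, 50))
```

This only projects space from a fixed per-task estimate. It does not account for files that persist while the pipeline waits on both Indel Realignment and BQSR.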

@yashpatel6 yashpatel6 self-assigned this Jan 31, 2022