Update resource handling #266

yashpatel6 · 2023-05-04T19:04:48Z

Description

Closes #264

Updating some resource allocations:

- Add a retry mechanism that reruns the alignment process with fewer CPUs
- Add a retry mechanism for Picard MarkDuplicates
- Add a memory difference for Picard/GATK, since MarkDuplicates with alt-aware mode seems to need a few GB more than the memory specified in the Java options
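
The two retry items above could look roughly like the following Nextflow config fragment. This is a minimal sketch, not the configuration changed in this PR: the process names (`align_DNA_BWA_MEM2`, `run_MarkDuplicate_Picard`), the CPU counts, and the retry limits are all assumptions chosen for illustration. It relies only on the standard `errorStrategy`, `maxRetries`, and `task.attempt` features of Nextflow.

```groovy
// Sketch only: process names, CPU counts, and retry limits are assumptions,
// not values taken from this pull request.
process {
    withName: 'align_DNA_BWA_MEM2' {
        // task.attempt is 1 on the first run and increases with each retry,
        // so a failed first attempt is rerun with half the CPUs.
        cpus = { task.attempt == 1 ? 16 : 8 }
        errorStrategy = 'retry'
        maxRetries = 1
    }

    withName: 'run_MarkDuplicate_Picard' {
        // Rerun Picard MarkDuplicates once if it fails.
        errorStrategy = 'retry'
        maxRetries = 1
    }
}
```

For the memory difference, one option is to derive the Java heap size inside the process script from the allocation minus some headroom, for example `(task.memory - 4.GB).toGiga()`, where the 4 GB figure is a made-up value rather than the one used here.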

Testing Results

BWA-MEM2 - DTB-002T
- Alignment fails for one pair of FASTQs due to memory; the retry with fewer CPUs succeeds
- Config: /hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/yashpatel-update-resource-handling/DTB-002T.config
- Input: /hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/yashpatel-update-resource-handling/DTB-002T.csv
- Output: /hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/yashpatel-update-resource-handling
Tested Picard memory difference - /hot/project/disease/HeadNeckTumor/HNSC-000084-LNMEvolution/pipelines/align-DNA/yashpatel_test

Checklist

I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up the branch protection rule following the github standards before opening this pull request, or the branch protection rule has already been set up.
I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
I have tested the pipeline on at least one A-mini sample with aligner setting to BWA-MEM2, HISAT2, and both. The paths to the test config files and output directories were attached in the Testing Results section.

tyamaguchi-ucla · 2023-05-04T20:14:38Z

Looks good to me. @yashpatel6, do we want additional tests from @graceooh or @rhughwhite (or somebody else)?

yashpatel6 · 2023-05-04T21:44:28Z

> Looks good to me. @yashpatel6 do we want additional tests from @graceooh or @rhughwhite (or somebody else) ?

I tested with the samples that Jieun and Rupert had errors with, and they ran fine. Nick also had errors related to this branch, so I've asked him to test the fix as well.

yashpatel6 · 2023-05-08T21:15:48Z

Nick ran some tests and they were successful, so we should be good with these allocations.

tyamaguchi-ucla

Looks good. Anything else to add guys? @rhughwhite @nkwang24

tyamaguchi-ucla · 2023-05-09T19:41:03Z

config/F72.config

        cpus = 1
-        memory = 10.GB
+        memory = 60.GB


This is a huge increase, but I guess some multi-library samples require that much memory. Picard is single-threaded and slow, which is another reason we want to implement #234, although Picard is library-aware.

rhughwhite · 2023-05-09

Yeah, I had ~6 multi-lane samples from the head and neck project (HNSC0000016) that each took ~50 GB for this step. I initially tried with Spark but ran out of scratch space.

yashpatel6 · 2023-05-10

Yeah, Picard can actually run with very little memory, but I increased the value here since this process is generally a bottleneck in each of the tool workflows currently. I kept the allocation at a little under half of the total memory to use as much as possible while leaving some for other miscellaneous processes like validation and checksum generation.

yashpatel6 added 4 commits May 1, 2023 10:10

Add retry to reduce CPU num for aligners

fb1fae0

Enable retry for alignment processes

7330e30

Add retry for Picard MarkDuplicates

b06eaee

Update CHANGELOG

cf5738d

yashpatel6 requested review from graceooh, rhughwhite and tyamaguchi-ucla May 4, 2023 19:04

yashpatel6 requested a review from a team as a code owner May 4, 2023 19:04

tyamaguchi-ucla self-assigned this May 4, 2023

tyamaguchi-ucla approved these changes May 9, 2023

yashpatel6 merged commit 5d2f9a7 into main May 12, 2023

yashpatel6 deleted the yashpatel-update-resource-handling branch May 12, 2023 21:18

tyamaguchi-ucla left a comment

tyamaguchi-ucla May 9, 2023

rhughwhite May 9, 2023

yashpatel6 May 10, 2023
