-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add retry method for run_MarkDuplicatesSpark_GATK and run_sort_SAMtools #226
Conversation
@yashpatel6: I added the retry module, but it failed when testing
So |
This warning is because the process in align-DNA is called
Right, I didn't realize align-DNA doesn't have that function. You should only need to add that and update the process name in |
Are you re-running the |
Yeah it's been running for 17 hours so it should be done soon. However, @tyamaguchi-ucla is fairly certain the retry won't work with The only reason I didn't kill the job is because it is almost finished and I want to confirm whether Taka is right. This PR is still ready for review though. I will release 8.1.0 after it is approved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. For CPCG0196-F1, I commented here #229 but most likely scratch space issue. @yashpatel6 anything else to add?
@@ -5,7 +5,12 @@ | |||
process { | |||
cpus = { methods.check_max( 1 * task.attempt, 'cpus' ) } | |||
|
|||
errorStrategy = { task.exitStatus in [143, 137, 104, 134, 139] ? 'retry' : 'finish' } | |||
commonRetryCodes = [104, 134, 137, 139, 143, 247] // Common out-of-memory error codes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was 247 a GATK specific (Java) or Nextflow exit code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We added 247 in align-RNA; it's a Nextflow-recommended out of memory exit code (from the table here). I just opened a PR to add it to the NF template repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to add this to F16.config
as well but looks good otherwise!
Description
Add retry method to
run_MarkDuplicatesSpark_GATK
andrun_sort_SAMtools
. If pipeline runs out of memory for either process, then it will retry with more memory.Closes #225
Testing Results
a_mini_n1
Although retry is not needed for
a_mini_n1
, I want to test to make sure the newly added code doesn't break anythingTesting bash script:
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/testing-a_mini_n1.sh
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/input/csv/a_mini_n1.csv
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/BWA-MEM2-a_mini_n1.config
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/BWA-MEM2-a_mini_n1.log
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/HISAT2-a_mini_n1.config
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/HISAT2-a_mini_n1.log
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/BWA-MEM2-HISAT2-a_mini_n1.config
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/BWA-MEM2-HISAT2-a_mini_n1.log
BE-1-RP3-ffpe_input
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/input/csv/BE-1-RP3-ffpe_input.csv
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/BE-1-RP3.config
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/BWA-MEM2-BE-1-RP3.log
/hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-mark-dup-spark-mem-retry/align-DNA-8.0.0/BE-1-RP3-ffpe_input/log-align-DNA-8.0.0-20220726T175629Z/nextflow-log/report.html
Checklist
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up the branch protection rule following the github standards before opening this pull request, or the branch protection rule has already been set up.
I have added my name to the contributors listings in the
manifest
block in thenextflow.config
as part of this pull request, am listedalready, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the
CHANGELOG.md
under the next release version or unreleased, and updated the date.I have updated the version number in the
metadata.yaml
andmanifest
block of thenextflow.config
file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)I have tested the pipeline on at least one A-mini sample with aligner setting to
BWA-MEM2
,HISAT2
, and both. The paths to the test config files and output directories were attached in the Testing Results section.