Sfitz add F8 and adjust F16 and F32 configs #256

Merged
sorelfitzgibbon merged 12 commits into main from sfitz-add-F8 on Jan 12, 2024

Conversation

Contributor

@sorelfitzgibbon sorelfitzgibbon commented Dec 12, 2023

Description - Configuration and testing for lower resource nodes.

Added F8.config and adjusted F16 and F32. Several tests were run and general conclusions were added to the README. In a future PR, I plan to also update the Performance Validation section with recent performance data for higher resource nodes.

Closes #249

Testing Results

output directory: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-add-F8

  • Runs on F8:
    • Exome (3.8/4.9G) failed on call_sSNV_MuSE, even after a retry with all available memory (16G)
      • log: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-add-F8/SACC-exome-all-tools-exome-F8-failed-slurm-81445.out
    • Exome (3.8/4.9G), no MuSE, succeeded, runtime = 2 hr 5 min
      • log: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-add-F8/SACC-exome-all-tools-except-muse-F8-success-slurm-81827.out
    • A-partial (37/20G) failed on call_sSNV_MuSE, even after a retry with all available memory (16G)
      • log: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-add-F8/a_partial-all-tools-F8-failed-slurm-81803.out
    • CPCG0000000196 (397/212G), no MuSE, succeeded, runtime = 2 days 4 hr 20 min.
      • log: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-add-F8/CPCG.noContam-all-tools-except-muse-F8-slurm-81834.out
  • Runs on F16:
    • Exome (3.8/4.9G), succeeded on 2nd try (call_sSNV_MuSE: 24G + 8G = 32G), runtime = 1 hr 21 min
      • log: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-add-F8/SACC-exome-all-tools-F16-succeed2ndtry-slurm-81846.out
    • A-partial (37/20G) failed on call_sSNV_MuSE, even after a retry with all available memory (32G)
      • log: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-add-F8/a_partial-all-tools-F16-failed-slurm-81873.out

Re-tested after merging in main (sfitz-external-parse-bam branch)

  • F8:
    • Fails when MuSE is included, as expected
    • Succeeds when MuSE is excluded
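
The F16 exome result above notes that MuSE succeeded on its second try with 24G + 8G = 32G. A minimal Python sketch of that retry arithmetic follows. It assumes each retry adds a fixed increment and that the request is capped at the node's total memory; the function name and the cap behaviour are illustrative assumptions, not the pipeline's actual retry code.

```python
def retry_memory_gb(base_gb, increment_gb, attempt, node_max_gb):
    """Memory requested on a given attempt (1-based), capped at the node maximum.

    Assumption: a fixed increment is added per retry. The real pipeline's
    retry strategy may differ.
    """
    requested = base_gb + increment_gb * (attempt - 1)
    return min(requested, node_max_gb)


# F16 exome run: 24G on the first try, 24G + 8G = 32G on the second.
# 32G is the "all mem" figure reported for F16 in the results above.
f16_attempts = [retry_memory_gb(24, 8, attempt, node_max_gb=32) for attempt in (1, 2, 3)]
print(f16_attempts)
```

Under these assumptions a third attempt could not request more than the second, which is consistent with the larger A-partial sample failing on F16 even with all 32G.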

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request; I am listed already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline on at least one A-mini sample.

README.md Outdated
   - [Testing and Validation](#testing-and-validation)
   - [Test Data Set](#test-data-set)
-  - [Performance Validation](#performance-validation)
+  - [Performance Validation and Resource Requirements](#performance-validation)
Contributor Author

My concern here was that almost all users want to know what resources are required, but it was somewhat hidden under Performance Validation.

Yeah, I think this is a good change; it more clearly outlines what users can expect to find.

README.md Outdated
Comment on lines 292 to 293
### Performance Validation
Testing was performed in the Boutros Lab SLURM Development cluster. Metrics below will be updated where relevant with additional testing and tuning outputs. Pipeline version used here is v4.0.0-rc.1
Contributor Author

I will update the metrics using a more recent pipeline version in another PR.

config/F2.config Outdated
@@ -6,19 +6,19 @@
 process {
     withName: run_validate_PipeVal {
         cpus = 1
-        memory = 1.GB
+        memory = 2.GB
Contributor Author

For these processes that require 1 CPU, we may as well give each half of the memory. Two of these processes can run at a time, and no others.

Contributor

The F-series nodes don't have exactly 2 GB of memory per CPU available, so this will likely result in only one PipeVal process running at a time.

Contributor Author

Ah, right! I changed them to 1500.MB.
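
The packing concern in this thread can be sketched in Python as follows. The node figures are assumptions for illustration only: an F2-like node with 2 CPUs and nominally 4 GB, of which roughly 3500 MB is taken to be schedulable. The function name `max_concurrent` is hypothetical and is not part of Nextflow or the pipeline.

```python
def max_concurrent(node_cpus, available_mb, cpus_per_process, mb_per_process):
    """Number of identical processes that fit on one node at the same time."""
    by_cpu = node_cpus // cpus_per_process
    by_memory = available_mb // mb_per_process
    return min(by_cpu, by_memory)


# Assumption: slightly less than the nominal 4 GB is actually available.
ASSUMED_AVAILABLE_MB = 3500

# 2.GB is 2048 MB; 1500.MB is the value adopted after review.
for label, mb in [("2.GB", 2048), ("1500.MB", 1500)]:
    n = max_concurrent(2, ASSUMED_AVAILABLE_MB, 1, mb)
    print(f"memory = {label}: {n} concurrent process(es)")
```

With these assumed numbers, a 2.GB request lets only one PipeVal process run at a time, while 1500.MB lets two run, which matches the reasoning behind the change.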

kiarod commented Dec 12, 2023

Sorry, I am new to this pipeline. @sorelfitzgibbon, is there a known reason why MuSE causes failures for this pipeline? Is it just a memory issue, so that running with MuSE requires an F32 or higher?

EDIT: Disregard; I see the notes in the README.

main.nf Outdated
@@ -54,7 +54,18 @@ if (params.max_cpus < 16 || params.max_memory < 30) {
------------------------------------
ERROR: Insufficient resources: ${params.max_cpus} CPUs and ${params.max_memory} of memory.

It seems you cited 'To run Mutect2 the pipeline requires at least 8 CPUs and 16 GB of memory', so I think the nested if block needs to be changed or updated depending on the intended use.

if (params.max_cpus < 16 || params.max_memory < 30) {
    if (params.algorithm.contains('muse') || params.algorithm.contains('mutect2'))

Currently, I think that if you specify mutect2 and provide the cited resources, it will error.

I might suggest inverting these if blocks to check first for algorithm, and then check the resources meet the minimum requirement for the given algorithm.

if (params.algorithm.contains('muse') && (params.max_cpus < 16 || params.max_memory < 32)){...}
else if (params.algorithm.contains('mutect2') && (params.max_cpus < 8 || params.max_memory < 16)){...}

Contributor Author

Oh, good catch, @kiarod. The first pair of cutoffs was supposed to be 8 and 16. It was accidentally reverted when I merged in the main branch using --no-ff. I need to learn a better way to do this.

Do you think it's worth inverting the blocks? I think it would still require the same number of nested if statements and (trivially) would require evaluation of both every time.

No worries! You're 100% right: operationally they are the same thing, so the if-blocks are your call. My motivation for inverting them was that their purpose is to check that resources are sufficient for specific algorithms. With the layout I suggested there is one check per algorithm, which might be more straightforward to read, update, and maintain. There is no real performance difference, especially since these blocks are encountered once in a program that basically runs for a minimum of 2 hours, haha.

Contributor Author

I think there may be an advantage to the current approach: the user is given the information relevant to their resources. If the check is done by algorithm, they may have to fail twice before getting all the information.

Contributor Author

I noticed only one line of the error message was being printed, so I've updated them.
See new error messages:
/hot/code/sfitzgibbon/gitHub/uclahs-cds/pipeline-call-sSNV/slurm-82745.out
/hot/code/sfitzgibbon/gitHub/uclahs-cds/pipeline-call-sSNV/slurm-82746.out
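
One way to reconcile the two positions in this thread is to keep one entry per algorithm and still report every unmet requirement in a single pass, so the user does not have to fail twice. The Python sketch below illustrates the idea only; it is not the check that was merged into main.nf. The minimums are the ones from the suggested snippet above, and `unmet_requirements` and the `strelka2` entry (standing in for an algorithm with no stated minimum) are hypothetical names.

```python
# Minimums from the review thread: (CPUs, memory in GB).
MINIMUMS = {"muse": (16, 32), "mutect2": (8, 16)}


def unmet_requirements(algorithms, max_cpus, max_memory_gb):
    """Return one message per selected algorithm whose minimum is not met."""
    messages = []
    for name in algorithms:
        if name not in MINIMUMS:
            continue  # no stated minimum for this algorithm
        need_cpus, need_mem = MINIMUMS[name]
        if max_cpus < need_cpus or max_memory_gb < need_mem:
            messages.append(
                f"{name} requires at least {need_cpus} CPUs and {need_mem} GB of memory"
            )
    return messages


# With 8 CPUs and 16 GB, only MuSE is reported; Mutect2 meets its minimum.
for line in unmet_requirements(["muse", "mutect2", "strelka2"], 8, 16):
    print("ERROR:", line)
```

Collecting the messages before exiting keeps the per-algorithm table easy to maintain while still giving the user the full picture for their node.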

kiarod commented Dec 12, 2023

Generally looks good, @sorelfitzgibbon! I added one note about a potentially misbehaving if block.

kiarod left a comment

Looks good @sorelfitzgibbon!

tyamaguchi-ucla changed the title from "Sfitz add f8" to "Sfitz add F8 and adjust F16 and F32 configs" on Dec 13, 2023
Contributor

yashpatel6 left a comment

A couple of comments:

config/F2.config Outdated
@@ -6,19 +6,19 @@
 process {
     withName: run_validate_PipeVal {
         cpus = 1
-        memory = 1.GB
+        memory = 2.GB
Contributor

The F-series nodes don't have exactly 2 GB of memory per CPU available, so this will likely result in only one PipeVal process running at a time.

@tyamaguchi-ucla tyamaguchi-ucla mentioned this pull request Jan 9, 2024
8 tasks
Contributor

yashpatel6 left a comment

Looks good!

@sorelfitzgibbon sorelfitzgibbon merged commit 73d6530 into main Jan 12, 2024
2 checks passed
@sorelfitzgibbon sorelfitzgibbon deleted the sfitz-add-F8 branch January 12, 2024 22:02