Add merging workflow #15

Merged
merged 10 commits into from
Jul 3, 2023

Conversation

yashpatel6
Collaborator

Description

Add workflow to merge split BAMs.

If the BAMs were split by chromosome, deduplication is skipped. If they were split more generally (scattered intervals), the deduplication process is run. Interestingly, the runtime seems to be higher with deduplication included, even though the scattered split had 50 intervals versus only 26 from the chromosome split.
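
For reviewers, the branching described above can be summarized in a minimal Python sketch. This is not the Nextflow code in this PR: `plan_merge`, `MergePlan`, and the step labels are hypothetical names used only for illustration, and the flag is assumed to mirror `params.parallelize_by_chromosome`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MergePlan:
    """Ordered list of steps to run when recombining split BAMs (illustrative)."""
    steps: List[str]


def plan_merge(parallelize_by_chromosome: bool) -> MergePlan:
    """Return the steps the merging workflow is expected to run.

    Assumption: the flag corresponds to params.parallelize_by_chromosome.
    """
    # Merging always runs, since the split BAMs must be recombined.
    steps = ["merge"]
    # Deduplication runs only for the general (scattered-interval) split;
    # the per-chromosome split skips it.
    if not parallelize_by_chromosome:
        steps.append("deduplicate")
    return MergePlan(steps)
```

Calling `plan_merge(True)` gives a merge-only plan, while `plan_merge(False)` adds the deduplication step, which is the case where the higher runtime was observed.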

Testing Results

  • Case 1
    • sample: A-mini S2.T-n1
    • input yaml: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/input/yaml/single_test_input.yaml
    • config: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-add-merging-workflow/single_wgs/single_wgs.config
    • output: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-add-merging-workflow/single_wgs
  • Case 2
    • sample: TCEB1-RCC
    • input yaml: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/input/yaml/paired_test_input.yaml
    • config: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-add-merging-workflow/multi_wgs/multi_wgs.config
    • output: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-add-merging-workflow/multi_wgs

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline on at least one A-mini sample.

@tyamaguchi-ucla tyamaguchi-ucla left a comment

Interestingly, the runtime seems to be higher with the deduplication included even though the scattered split had 50 intervals vs only 26 from the chromosomes.

This makes sense to me and your benchmarking really proved it. I think we can do #12 next?

params.gatk_command_mem_diff: float(memory)
params.parallelize_by_chromosome: bool.
*/
process run_MergeSamFiles_Picard {

As you are aware, we can implement #12 in a separate PR and that should help reduce the run time for large BAMs.

Collaborator Author

That should reduce the runtime, yes, but before doing that I'm planning to add the remaining processes (intermediate file deletion and the coverage processes) so that this pipeline is an exact copy of what's currently in call-gSNP before modifying the processes.

@tyamaguchi-ucla

Also, let's add @Faizal-Eeman as a reviewer from now on. I want Faizal to familiarize himself with this new pipeline and call-gSNP. @Faizal-Eeman let us know if you have any questions.

@yashpatel6 yashpatel6 merged commit a877df0 into main Jul 3, 2023
1 check passed
@yashpatel6 yashpatel6 deleted the yashpatel-add-merging-workflow branch July 3, 2023 16:59
@Faizal-Eeman

Is the recalibration step being removed from call-gSNP and becoming this stand-alone pipeline?

@yashpatel6
Collaborator Author

Correct, all BAM processing steps will be removed from call-gSNP and put into this pipeline.
