
Sfitz vcf sample orders #237

Merged
merged 21 commits into from
Oct 4, 2023

sorelfitzgibbon
Contributor

@sorelfitzgibbon sorelfitzgibbon commented Sep 29, 2023

Description

The sample order in the VCFs needs to be consistent for the Intersect steps to work. It was previously assumed that Mutect2 would always output the tumor and normal samples in a consistent order, but in fact it orders them alphanumerically. To guard against future changes, the sample-reordering step is now applied to all four output VCFs before intersection.
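To illustrate what "reordering samples" means at the file level, here is a minimal sketch for an uncompressed, tab-delimited VCF using plain awk. It is not the pipeline's reorder_samples_BCFtools process, which works on compressed VCFs with BCFtools; the function name reorder_vcf_samples and the sample names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of the pipeline): rewrite the sample columns
# of an uncompressed VCF so they appear in the requested order.
# Usage: reorder_vcf_samples IN.vcf "SAMPLE_A,SAMPLE_B" > OUT.vcf
reorder_vcf_samples() {
    local vcf="$1" order="$2"
    awk -v order="$order" '
        BEGIN { FS = OFS = "\t"; n = split(order, want, ",") }
        /^##/ { print; next }                 # meta-information lines pass through unchanged
        /^#CHROM/ {
            # record the current column index of every sample (columns 10..NF)
            for (i = 10; i <= NF; i++) col[$i] = i
            for (k = 1; k <= n; k++) {
                if (!(want[k] in col)) {
                    print "sample not found in header: " want[k] > "/dev/stderr"
                    exit 1
                }
                src[k] = col[want[k]]
            }
        }
        {
            # keep the 9 fixed columns (CHROM..FORMAT), then emit samples in the requested order
            line = $1
            for (i = 2; i <= 9; i++) line = line OFS $i
            for (k = 1; k <= n; k++) line = line OFS $(src[k])
            print line
        }' "$vcf"
}
```

Applying the same explicit order to every tool's VCF means downstream intersection no longer depends on whatever order each caller happens to emit.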

Additionally,

  • the three identical filter_VCF_BCFtools processes were moved to common.nf
  • an NFTest case for a two-tool run was added

Closes #178

Testing Results

nftest run

(a_mini-all-tools-std-input, a_mini-mutect2-multiple-samples, a_mini-mutect2-tumor-only, a_mini-two-tools)

output

/hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-vcf-sample-orders/

log

/hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-vcf-sample-orders/log-nftest-20231002T201534Z.log

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request; I am listed already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline on at least one A-mini sample.

@sorelfitzgibbon sorelfitzgibbon changed the base branch from main to sfitz-nftest-assertions September 30, 2023 15:36
@sorelfitzgibbon sorelfitzgibbon changed the base branch from sfitz-nftest-assertions to main October 2, 2023 18:08
Channel.empty().set { strelka2_vcf_ch }
Channel.empty().set { mutect2_vcf_ch }
Channel.empty().set { muse_vcf_ch }
Channel.empty().set { somaticsniper_gzvcf_ch }
Contributor Author

Renaming these to improve readability: the previous names did not reflect whether the files are compressed, or that these are channels rather than lists.

| while read a b c d; do
mv \$a \$d
mv \$a.tbi \$d.tbi
done
# intersect, keeping all variants, to create presence/absence list of variants in each VCF
- bcftools isec \
+ bcftools isec --nfiles +1\
Contributor Author

Testing with just two tools revealed that the default behavior of bcftools isec depends on the number of VCFs being intersected. Adding --nfiles +1 makes explicit that we want every variant present in at least one file. I will also add a new assertion for testing two-tool runs.
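To make the "keep all variants" intent concrete, here is a simplified model of an at-least-one-file intersection that produces a presence/absence flag per input. It is not bcftools: it skips VCF parsing, normalization, and allele-matching rules, works on plain CHROM POS REF ALT lists, and the name presence_mask is hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Simplified, hypothetical model of an "--nfiles +1" intersection: keep every
# variant seen in at least one input and record which inputs contain it.
# Each input file holds one variant per line as tab-separated CHROM POS REF ALT.
# Output: CHROM POS REF ALT MASK, where MASK has one 0/1 flag per input file
# (in command-line order); variants are printed in first-seen order.
presence_mask() {
    awk '
        BEGIN {
            FS = OFS = "\t"
            # map each file name to its 1-based position on the command line
            for (i = 1; i < ARGC; i++) pos[ARGV[i]] = i
            nfiles = ARGC - 1
        }
        {
            key = $1 OFS $2 OFS $3 OFS $4
            if (!(key in seen)) { seen[key] = 1; order[++n] = key }
            hit[key, pos[FILENAME]] = 1
        }
        END {
            for (v = 1; v <= n; v++) {
                mask = ""
                for (f = 1; f <= nfiles; f++) {
                    flag = ((order[v], f) in hit) ? "1" : "0"
                    mask = mask flag
                }
                print order[v], mask
            }
        }' "$@"
}
```

Because every variant that appears in any file is kept, the output does not change character when only two tools are run, which is the ambiguity the explicit option removes.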

Contributor

@yashpatel6 yashpatel6 left a comment

Generally looks good! One suggestion:

Anything to add @maotian06 ?

Comment on lines 50 to 52
compress_index_VCF_reordered(reorder_samples_BCFtools.out.gzvcf
.map{ it -> ["${file(it).getName().split('-')[0]}-SNV", it]}
)
Contributor

Consider converting this extraction of the tool name from the filename into a function, since it's duplicated several times.
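The suggestion amounts to giving one parsing rule a name. Purely as an illustration, and assuming filenames begin with the tool name followed by "-", the same rule is sketched here as a shell function; in the pipeline it would be a Groovy/Nextflow function, and tool_snv_id and the example filenames are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical shell counterpart of the Groovy expression
#   "${file(it).getName().split('-')[0]}-SNV"
# For ordinary filenames: drop the directory, keep the text before the
# first '-', and append "-SNV".
tool_snv_id() {
    local name
    name=${1##*/}
    printf '%s-SNV\n' "${name%%-*}"
}
```

For example, tool_snv_id /data/Mutect2-sample.vcf.gz prints Mutect2-SNV. Defining the rule once means a change to the naming convention only needs to be made in one place.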

Contributor Author

Done! Ready for your review, @maotian06!

Contributor

@maotian06 maotian06 left a comment

Looks good to me!

@sorelfitzgibbon sorelfitzgibbon merged commit 96aceef into main Oct 4, 2023
1 check passed
@sorelfitzgibbon sorelfitzgibbon deleted the sfitz-vcf-sample-orders branch October 4, 2023 17:03
@nwiltsie nwiltsie mentioned this pull request Mar 6, 2024
Development

Successfully merging this pull request may close these issues.

move filter_VCF_BCFtools to common.nf
3 participants