Consider switching default runs to mosdepth - faster with similar resources #56

Open
sorelfitzgibbon opened this issue May 22, 2024 · 0 comments · May be fixed by #70


mosdepth (Brent Pedersen, Aaron Quinlan) calculates depth of coverage. It supports multithreading, but the threads are used for decompression only, so speed-ups are limited to about 4 CPUs. It is faster than CollectWgsMetrics and samtools depth, has a variety of useful options and outputs for WGS and targeted sequencing, and uses only around 1-2 GB of memory.
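As a rough illustration of the threading note above, here is a minimal Python sketch that assembles a mosdepth command line and caps the thread count at 4. The helper name `build_mosdepth_cmd`, the file name `sample.bam`, and the output prefix are hypothetical; the options used (`--threads`, `--no-per-base`, `--by`) are standard mosdepth options, but the exact settings for this pipeline are an assumption, not what was actually run.

```python
import shlex

# Per the note above, decompression threads stop helping beyond about 4 CPUs.
MAX_USEFUL_THREADS = 4


def build_mosdepth_cmd(bam, prefix, threads=4, window=None, per_base=False):
    """Return a mosdepth argument list (hypothetical helper, not pipeline code)."""
    cmd = ["mosdepth", "--threads", str(min(threads, MAX_USEFUL_THREADS))]
    if not per_base:
        # Skip the per-base output file, which is large for WGS data.
        cmd.append("--no-per-base")
    if window is not None:
        # Report mean depth in fixed-size windows of this many bases.
        cmd += ["--by", str(window)]
    # mosdepth expects the output prefix first, then the BAM/CRAM path.
    cmd += [prefix, bam]
    return cmd


# Hypothetical file names; requesting 8 threads is capped to 4.
cmd = build_mosdepth_cmd("sample.bam", "sample.window500", threads=8, window=500)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where mosdepth is installed.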

Comparison:

The differences in mean coverage between the two tools are likely due to different default cutoffs, which can be changed.

| Tool | Sample | BAM size | Run time | CPU | Memory | Mean coverage |
| --- | --- | --- | --- | --- | --- | --- |
| CollectWgsMetrics | PCAWG-63_SA65868.bam | 177 GB | 4h 24m | 54% | 1.3 GB | 29.4x |
| CollectWgsMetrics | PCAWG-63_SA65784.bam | 295 GB | 6h 42m | 62% | 1.3 GB | 50.2x |
| mosdepth | PCAWG-63_SA65868.bam | 177 GB | 1h 23m | <300% | < 2 GB | 34.5x |
| mosdepth | PCAWG-63_SA65784.bam | 295 GB | 2h 38m | <300% | < 2 GB | 56x |
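To put the run times above in terms of speed-up, the following sketch converts them to minutes and divides. The run times are copied from the comparison; the helper `to_minutes` and the dictionary layout are illustrative only.

```python
import re


def to_minutes(text):
    """Convert a run time written like '4h 24m' to minutes."""
    hours, minutes = map(int, re.fullmatch(r"(\d+)h (\d+)m", text).groups())
    return 60 * hours + minutes


# Run times copied from the comparison above.
run_times = {
    "PCAWG-63_SA65868.bam": {"CollectWgsMetrics": "4h 24m", "mosdepth": "1h 23m"},
    "PCAWG-63_SA65784.bam": {"CollectWgsMetrics": "6h 42m", "mosdepth": "2h 38m"},
}

# Speed-up = CollectWgsMetrics wall time / mosdepth wall time.
speedups = {
    sample: to_minutes(t["CollectWgsMetrics"]) / to_minutes(t["mosdepth"])
    for sample, t in run_times.items()
}

for sample, factor in speedups.items():
    print(f"{sample}: {factor:.2f}x faster with mosdepth")
```

On these two samples mosdepth is roughly 2.5 to 3.2 times faster in wall time, while using more CPU (<300% versus 54-62%).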

It also has an option for the d4 output format, described as "better than bigwig".

CollectWgsMetrics trace files:

/hot/user/sfitzgibbon/PCAWG-63/data/pipeline-runs/generate-SQC-BAM/generate-SQC-BAM-1.0.0/DO2629/log-generate-SQC-BAM-1.0.0-20240428T022515Z/nextflow-log/trace.txt
/hot/user/sfitzgibbon/PCAWG-63/data/pipeline-runs/generate-SQC-BAM/generate-SQC-BAM-1.0.0/DO2629/log-generate-SQC-BAM-1.0.0-20240430T211711Z/nextflow-log/trace.txt

CollectWgsMetrics output:

/hot/user/sfitzgibbon/PCAWG-63/data/pipeline-runs/generate-SQC-BAM/generate-SQC-BAM-1.0.0/DO2629/Picard-3.1.0/output/Picard-3.1.0_PCAWG-63_SA65784_wgs-metrics.txt
/hot/user/sfitzgibbon/PCAWG-63/data/pipeline-runs/generate-SQC-BAM/generate-SQC-BAM-1.0.0/DO2629/Picard-3.1.0/output/Picard-3.1.0_PCAWG-63_SA65868_wgs-metrics.txt

mosdepth output:

Windows of 500bp seem useful for WGS data. Overall mean coverage is at the bottom of the .summary.txt files.
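A minimal sketch for pulling the overall mean coverage out of a `.summary.txt` file follows. It assumes the file is tab-separated with a header containing `chrom` and `mean` columns and a final row named `total`, which matches mosdepth's documented summary layout but should be checked against the actual files. The function name and the example values are made up for illustration and are not taken from the runs above.

```python
import csv
import io


def overall_mean_coverage(summary_text):
    """Return the 'mean' value of the 'total' row of a mosdepth summary."""
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    # Assumed layout: one row per chromosome, with overall totals at the bottom.
    means = {row["chrom"]: float(row["mean"]) for row in reader}
    return means["total"]


# Toy summary with invented numbers, only to show the assumed layout.
example = (
    "chrom\tlength\tbases\tmean\tmin\tmax\n"
    "chr1\t1000\t30000\t30.00\t0\t60\n"
    "chr2\t500\t10000\t20.00\t0\t45\n"
    "total\t1500\t40000\t26.67\t0\t60\n"
)

print(overall_mean_coverage(example))
```

For a real file, read its contents with `open(path).read()` and pass the text to the function.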

/hot/data/unregistered/PCAWG-63/pipeline-runs/mosdepth/BWA-MEM2-2.2.1_GATK-4.2.4.1_PCAWG-63_SA65784.window500*
/hot/data/unregistered/PCAWG-63/pipeline-runs/mosdepth/BWA-MEM2-2.2.1_GATK-4.2.4.1_PCAWG-63_SA65784.window500*
@sorelfitzgibbon sorelfitzgibbon linked a pull request Jun 20, 2024 that will close this issue