How to reduce the runtime? #1

spcspin · 2023-08-23T12:00:40Z

The most time-consuming part of the workflow is variant calling with HaplotypeCaller, so we focused on the HaplotypeCaller step.

Here are a few approaches to try:

HaplotypeCallerSpark:
HaplotypeCallerSpark is the tool the GATK team designed to replace the multithreading functionality of GATK3. However, it is still in beta, and
many attempts to run it on the bee data caused problems. After a discussion with the GATK team, it was confirmed that the problem lies in
HaplotypeCallerSpark itself.

The discussion link
Break the reference down into smaller chunks for HaplotypeCaller:
Scatter the intervals based on the N-masked regions of the reference genome, run HaplotypeCaller on each interval, and collect the per-interval calls at the end using the
GatherVcfs tool.
Optimize Java settings:
We tried adjusting the JVM parameters related to garbage collection and heap size:
- -XX:ParallelGCThreads
- Heap space: -Xmx
  There was not much difference in the results.
CPU utilization:
Using the --native-pair-hmm-threads option in HaplotypeCaller did not make much difference in the results.
For the Java settings and CPU utilization items above, refer to this website
Try different variant calling tools:
- VarScan
- DeepVariant
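
To illustrate the N-based scattering mentioned above, here is a minimal Python sketch that splits one contig sequence into intervals separated by long runs of N. The function names and the `min_n_run` threshold are hypothetical and not part of the workflow; the output strings use the `contig:start-end` form (1-based, inclusive) that GATK accepts for `-L`.

```python
import re


def non_n_intervals(contig, sequence, min_n_run=100):
    """Return 1-based inclusive (contig, start, end) blocks of `sequence`
    that are separated by runs of at least `min_n_run` N bases.

    `min_n_run` is an illustrative threshold (an assumption); shorter N runs
    stay inside a block.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_n_run)
    intervals = []
    block_start = 0  # 0-based start of the current non-N block
    for match in pattern.finditer(sequence):
        if match.start() > block_start:
            # Block ends just before the N run; convert to 1-based inclusive.
            intervals.append((contig, block_start + 1, match.start()))
        block_start = match.end()
    if block_start < len(sequence):
        # Trailing block after the last long N run.
        intervals.append((contig, block_start + 1, len(sequence)))
    return intervals


def to_interval_strings(intervals):
    """Format intervals as contig:start-end strings."""
    return ["%s:%d-%d" % interval for interval in intervals]
```

Each resulting string could be passed to a separate HaplotypeCaller run, so no interval boundary falls inside called sequence. In practice the sequences would be read from the reference FASTA one contig at a time.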

spcspin · 2023-08-31T08:43:16Z

Regarding the low integrity problem that arises when breaking the reference genome down into smaller chunks for HaplotypeCaller, the GATK team replied as follows:
Depending on how you scatter your intervals it should still hold true. Worst case scenario you may have to run HaplotypeCaller per contig/chromosome which is probably the safest way but if your reference is split by long repeats of N then you may want to split your intervals based on the positions of N repeats.
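
As a rough sketch of the per-contig option from the reply, the Python below builds one HaplotypeCaller command per contig listed in a `.fai` index, plus a GatherVcfs command to combine the outputs. The file names, default values, and helper names are hypothetical; the GATK flags used (`--java-options`, `-R`, `-I`, `-L`, `-O`, `--native-pair-hmm-threads`) are the ones discussed in this thread or standard GATK4 arguments. The commands are only constructed here, not executed.

```python
def read_contigs_from_fai(fai_text):
    """Return contig names from the text of a .fai index (first tab-separated column)."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def haplotypecaller_cmd(reference, bam, contig, out_vcf,
                        xmx="4g", gc_threads=2, hmm_threads=4):
    """Build a HaplotypeCaller command restricted to one contig.

    The defaults for heap size and thread counts are placeholders (assumptions),
    not tuned recommendations.
    """
    java_opts = f"-Xmx{xmx} -XX:ParallelGCThreads={gc_threads}"
    return [
        "gatk", "--java-options", java_opts, "HaplotypeCaller",
        "-R", reference, "-I", bam, "-L", contig, "-O", out_vcf,
        "--native-pair-hmm-threads", str(hmm_threads),
    ]


def gather_cmd(vcfs, out_vcf):
    """Build a GatherVcfs command; `vcfs` should follow the reference contig order."""
    cmd = ["gatk", "GatherVcfs"]
    for vcf in vcfs:
        cmd += ["-I", vcf]
    return cmd + ["-O", out_vcf]
```

Each command list can be handed to `subprocess.run` or to a job scheduler so the contigs run in parallel. GatherVcfs expects its inputs in genomic order, which is why the contig order from the index is kept.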
