Skip to content

Problem when running Longshot with too few reads/too small coverage using -A #72

@Coppini

Description

@Coppini

I'm having problems with Longshot when running the Artic pipeline (even after updating Longshot to v0.4.3, since Artic still uses v0.4.1), in this command:

longshot \
    -P 0 \
    -F -A \
    --no_haps \
    --bam barcode91.primertrimmed.rg.sorted.bam \
    --ref primer_schemes/SARS-CoV-2/V1200/SARS-CoV-2.reference.fasta \
    --out barcode91.merged.vcf \
    --potential_variants barcode91.merged.vcf.gz

The BAM file I'm using there has only one read, with few bases, resulting in a very small coverage. This was expected, since these were negative samples, where we shouldn't be finding (Sars-Cov-2) reads.
Longshot ends up erroring with the following messages:

2021-08-23 10:03:32 Automatically determining max read coverage.
2021-08-23 10:03:32 Estimating mean read coverage...
2021-08-23 10:03:32 WARNING: Max coverage calculation is highly likely to be incorrect. The number of reference bases covered by the bam file (857) differs significantly from the expected number of positions in the reference (29903). If you are using a bam file that only covers part of the genome, please specify this region exactly with the --region argument so the number of reference bases is known. Alternatively, disable maximum coverage filtering by setting -C to a large number.
2021-08-23 10:03:32 Total reference positions: 29903
2021-08-23 10:03:32 Total bases in bam: 857
2021-08-23 10:03:32 Mean read coverage: 0.03
error: {} ERROR: Max read coverage set to 0. printing empty VCF file

It seems the problem comes from the -A flag:

-A, --auto_max_cov        Automatically calculate mean coverage for region and set max coverage to mean_coverage +
                              5*sqrt(mean_coverage). (SLOWER)

Since Mean read coverage = 0.03, it calculates Max coverage = 0.03+5*sqrt(0.03) = 0.89602540378, and rounds down to 0.

Would it be possible to have -A round the maximum coverage up, instead of down?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions