Variants long table CSV - clarifications #23

tavareshugo · 2022-07-08T10:41:25Z

Need to clarify what some of the columns in the variants_long_table.csv file indicate.
There are some inconsistencies. For example, REF_DP + ALT_DP doesn't always add up, and for indels REF_DP often equals ALT_DP even though AF = 1.

It's also quite hard to know which of those mutations are actually part of the final consensus sequence.

Would be good to clarify these things.
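
One way to see how often the depth columns disagree is to flag the affected rows directly. The following is only a sketch: it assumes the table is plain comma-separated with no quoted commas, that the header contains columns named DP, REF_DP and ALT_DP, and that "adding up" means REF_DP + ALT_DP should equal DP. The function name flag_depth_mismatches is made up for illustration.

```shell
# Sketch: print the header plus every row where REF_DP + ALT_DP differs from DP.
# Columns are looked up by header name, so their order does not matter.
# Assumption: comparing against the DP column is the intended check.
flag_depth_mismatches() {
  awk -F',' '
    NR == 1 {
      # Map each header name to its column index
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("DP" in col) || !("REF_DP" in col) || !("ALT_DP" in col)) {
        print "DP, REF_DP or ALT_DP column not found" > "/dev/stderr"
        exit 1
      }
      print
      next
    }
    # Keep only rows where the two allele depths do not sum to the total depth
    ($(col["REF_DP"]) + $(col["ALT_DP"])) != ($(col["DP"]) + 0)
  ' "$1"
}
```

Usage would be something like: flag_depth_mismatches variants_long_table.csv > depth_mismatches.csv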


tavareshugo · 2022-07-08T10:56:08Z

For example, the column FILTER includes PASS, even if a variant doesn't make it to the final consensus.
For Illumina, ivar consensus removes variants below a certain threshold of AF, but this is not indicated in that variants table.

We can see which variants are in the final consensus from the VCF files in

Illumina: variants/ivar/consensus/bcftools/*.filtered.vcf.gz
Nanopore: variants/medaka/*.pass.unique.vcf.gz (?? not sure - need to double-check)

tavareshugo · 2022-08-09T15:40:50Z

Possibly use bcftools merge | bcftools query to do this.

The bcftools query command used in the pipeline is here (notice it's slightly different depending on the caller).

tavareshugo · 2022-10-26T14:39:27Z

The strategy of piping bcftools merge into bcftools query doesn't really work, because:

1. After merging, samples without a variant at a given position are given the genotype ./. (missing). This can be set to output 0/0 instead, but that's not accurate either, because the samples may have had missing data at those positions.
2. The output of the query is not in "long" format, which is also not ideal for downstream analysis.

An alternative (which is a bit more involved) is to do the following:

# Create a shell variable with the sample names from our clean FASTA file
SAMPLES=$(grep ">" report/consensus.fa | sed 's/>//')

# Create a CSV file containing the column names of our new table
echo "sample,chrom,pos,ref,alt" > report/variants.csv

# Use a for loop to run bcftools query on each sample
# adding the result of each iteration to the CSV file we created above
for SAMPLE in $SAMPLES
do
  bcftools query -f "${SAMPLE},%CHROM,%POS,%REF,%ALT\n" results/viralrecon/medaka/${SAMPLE}.pass.unique.vcf.gz >> report/variants.csv
done

This results in a CSV file in long format, which is probably enough for reporting, etc.
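
As a quick sanity check on the resulting table, the number of variants per sample can be tallied. This is just a sketch, assuming the sample name is the first comma-separated column and the first line is a header; the function name count_variants_per_sample is hypothetical.

```shell
# Sketch: count rows per sample in a long-format CSV.
# Assumes column 1 holds the sample name and line 1 is the header.
count_variants_per_sample() {
  awk -F',' '
    NR > 1 { n[$1]++ }                          # skip the header, tally by sample
    END    { for (s in n) print s "," n[s] }    # emit sample,count pairs
  ' "$1" | sort
}
```

For example: count_variants_per_sample report/variants.csv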

tavareshugo · 2022-10-28T14:46:15Z

This is my current understanding about variants results:

--platform nanopore --> only uses filtered variants for all analysis (MultiQC, SnpEff, long table).
--platform illumina --> the "# SNPs" and "# INDELs" reported in the MultiQC table are for filtered variants. However, the long table and SnpEff analysis include all variants (including those that do not make it through the 0.75 threshold).
So, to list the variants that actually make it into the consensus, filter this table for AF > 0.75 and DP > 10.

I have also empirically checked this on a set of 48 samples each on Illumina and Nanopore pipelines.
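
The AF > 0.75 and DP > 10 filter described above could be applied along these lines. This is a minimal sketch, not something taken from the pipeline: it assumes the long table is plain comma-separated with no quoted commas and has header columns named AF and DP. The function name filter_consensus_variants is made up for illustration.

```shell
# Sketch: keep the header plus rows with AF > 0.75 and DP > 10.
# Columns are found by header name, so their position does not matter.
# The thresholds are the ones quoted in this thread; adjust if yours differ.
filter_consensus_variants() {
  awk -F',' -v min_af=0.75 -v min_dp=10 '
    NR == 1 {
      # Map each header name to its column index
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("AF" in col) || !("DP" in col)) {
        print "AF or DP column not found" > "/dev/stderr"
        exit 1
      }
      print
      next
    }
    # Strictly greater than both thresholds, compared numerically
    ($(col["AF"]) + 0) > min_af && ($(col["DP"]) + 0) > min_dp
  ' "$1"
}
```

Usage would be something like: filter_consensus_variants variants_long_table.csv > consensus_variants.csv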

tavareshugo · 2022-10-28T15:00:01Z

See #19 for details of where this is found in the pipeline

tavareshugo added the enhancement and invalid labels Jul 8, 2022

tavareshugo mentioned this issue Oct 28, 2022

Which variants are used in consensus assembly? #19

Open

tavareshugo changed the title ~~Variant calling CSV~~ Variants long table CSV - clarifications Oct 28, 2022
