Variants long table CSV - clarifications #23
Comments
For example, the column We can see which variants are in the final consensus from the VCF files in
|
Possibly use The |
The strategy to
An alternative (which is a bit more involved) is to do the following:

# Create a shell variable with the sample names from our clean FASTA file
SAMPLES=$(grep ">" report/consensus.fa | sed 's/>//')
# Create a CSV file containing the column names of our new table
# (the header must list pos, because the query below writes %POS)
echo "sample,chrom,pos,ref,alt" > report/variants.csv
# Use a for loop to run bcftools query on each sample,
# appending the result of each iteration to the CSV file we created above
for SAMPLE in $SAMPLES
do
  bcftools query -f "${SAMPLE},%CHROM,%POS,%REF,%ALT\n" "results/viralrecon/medaka/${SAMPLE}.pass.unique.vcf.gz" >> report/variants.csv
done

This results in a CSV file in long format, which is probably enough for reporting, etc. |
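As a possible follow-up, the CSV built by the loop above could be used to mark which rows of a long variants table made it into the final consensus. The sketch below is not part of the pipeline: `mark_in_consensus` is a hypothetical helper, and it assumes the long table has a header containing columns named SAMPLE, CHROM, POS, REF and ALT, that sample names match between the two files, and that no field contains a quoted comma.

```shell
# Hypothetical helper (not part of viralrecon or bcftools):
#   mark_in_consensus CONSENSUS_CSV LONG_TABLE_CSV
# CONSENSUS_CSV: header line, then the five fields written by the bcftools
#                query loop (sample, chrom, pos, ref, alt).
# LONG_TABLE_CSV: assumed to have a header naming SAMPLE, CHROM, POS, REF, ALT.
# Prints the long table with an extra IN_CONSENSUS column (yes/no).
mark_in_consensus() {
  awk -F',' -v OFS=',' '
    # First file: remember every consensus variant, skipping the header line
    NR == FNR {
      if (FNR > 1) seen[$1 FS $2 FS $3 FS $4 FS $5] = 1
      next
    }
    # Second file, header row: locate the key columns by name
    FNR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      print $0, "IN_CONSENSUS"
      next
    }
    # Second file, data rows: look the variant up in the consensus set
    {
      key = $(col["SAMPLE"]) FS $(col["CHROM"]) FS $(col["POS"]) FS $(col["REF"]) FS $(col["ALT"])
      print $0, ((key in seen) ? "yes" : "no")
    }
  ' "$1" "$2"
}
```

It could be called as `mark_in_consensus report/variants.csv variants_long_table.csv > annotated.csv`, where the output file name is arbitrary. The first file must contain at least its header line, otherwise awk cannot tell the two inputs apart.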
This is my current understanding about variants results:
I have also empirically checked this on a set of 48 samples each on Illumina and Nanopore pipelines. |
See #19 for details of where this is found in the pipeline |
Need to clarify what some of the columns in the variants_long_table.csv file indicate.
There are some inconsistencies, for example: sometimes REF_DP + ALT_DP doesn't add up; for indels, often REF_DP = ALT_DP with AF = 1.
It's also quite hard to know which of those mutations are actually part of the final consensus sequence.
Would be good to clarify these things.
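One way to find the affected rows is to scan the table for them. The sketch below is only an illustration: `flag_depth_mismatch` is a hypothetical helper, and it assumes that "doesn't add up" means REF_DP + ALT_DP differs from a total-depth column named DP, which is a guess about the table layout rather than something stated in this issue. It also assumes no field contains a quoted comma.

```shell
# Hypothetical helper: flag_depth_mismatch LONG_TABLE_CSV
# Prints the header plus every row where REF_DP + ALT_DP != DP.
# ASSUMPTION: the table has columns named DP, REF_DP and ALT_DP.
flag_depth_mismatch() {
  awk -F',' '
    # Header row: map column names to positions and check the ones we need
    FNR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("DP" in col) || !("REF_DP" in col) || !("ALT_DP" in col)) {
        print "expected DP, REF_DP and ALT_DP columns" > "/dev/stderr"
        exit 1
      }
      print
      next
    }
    # Data rows: print only those where the two depths do not sum to DP
    ($(col["REF_DP"]) + $(col["ALT_DP"])) != $(col["DP"])
  ' "$1"
}
```

Running `flag_depth_mismatch variants_long_table.csv` would then list the candidate rows, which makes it easier to see whether the mismatch is specific to indels.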