I would like to create a subset of a large .vcf.gz file so that I can read it in R with read.vcfR from the vcfR package (I get memory issues if I try to read the non-subsetted .vcf.gz file). I only need certain variants given in a list. What I have tried:
~/bcftools-1.12/bcftools view -T snplist.txt hbcs_sisu_b38.vcf.gz -o hbcs_sisu_b38_subset.vcf.gz
The 'snplist.txt' is tab-delimited and includes columns '#CHROM' and 'POS' (not sure if they were required).
I have also tried option '-R' instead of '-T' for the 'view' command, and the 'filter' command instead of 'view' with both options '-T' and '-R'. But depending on which variants are included in snplist.txt, the subsetted file always contains either just one variant or no variants at all, even though
less -S hbcs_sisu_b38.vcf.gz | grep -f snplist.txt
prints lines for more variants.
I am not sure if .csi file was required here, but I have created hbcs_sisu_b38.vcf.gz.csi like this:
~/bcftools-1.12/bcftools index hbcs_sisu_b38.vcf.gz
The command looks correct. This is very basic functionality, so it's strange that it wouldn't work. Can you try upgrading to the latest version of bcftools? We are at 1.21 now. If there is something wrong with the input data, the newer version might give some informative error messages.
The -T option does not require an index, so it's unlikely that it is the problem.
If upgrading does not help, can you provide a small test case for us to reproduce the problem?
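One possible way to cut a small test case out of a large file is sketched below. It keeps every header line plus the first few records, using only gzip and awk. The function name `make_test_case` and the output file name are hypothetical, and the result is plain gzip rather than BGZF, so it would need to be recompressed with bgzip before it can be indexed.

```shell
# Hypothetical helper: keep all header lines plus the first N records
# of a compressed VCF, and write the result as a new gzip file.
make_test_case() {
  in_vcf=$1
  out_vcf=$2
  n_records=${3:-5}   # number of data records to keep (default 5)

  gzip -dc "$in_vcf" |
    awk -v max="$n_records" '
      /^#/       { print; next }   # header lines are always kept
      ++n <= max { print }         # keep the first max records
      n >= max   { exit }          # stop reading once enough records are seen
    ' |
    gzip -c > "$out_vcf"
}

# Example call (output name is an assumption):
#   make_test_case hbcs_sisu_b38.vcf.gz small_test.vcf.gz 5
```

The first records of the file may not include any of the positions in snplist.txt, so the list used for the test case would have to be adjusted to match.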
Thanks for the fast reply. I downloaded and installed version 1.21 but now I get an error message saying 'Could not parse 2-th line of file snplist.txt, using the columns 1,2[,3] Failed to read the targets: snplist.txt'
Here is a head of snplist.txt:
#CHROM POS
1 19831748
1 30185237
1 30187395
The head of hbcs_sisu_b38.vcf.gz would be quite massive, so I copy-pasted here only the first seven columns of the output I get when I run less -S hbcs_sisu_b38.vcf.gz | grep -f snplist.txt:
#CHROM POS ID REF ALT QUAL FILTER
chr1 19831748 rs4509550 T C . PASS
chr1 30185237 rs7536179 T C . PASS
chr1 30187395 rs11371593 T TG . PASS
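In the two excerpts above, snplist.txt names the chromosome `1` while the VCF names it `chr1`. bcftools matches chromosome names exactly, whereas `grep -f` also matches `1 19831748` as a substring of `chr1 19831748`, which could explain why grep finds lines that `-T` does not. The sketch below rewrites the list under that assumption: it drops the header line, strips any carriage returns, forces tab separation, and adds the `chr` prefix. Whether any of these is the actual cause of the parse error is not confirmed in this thread, and the function name `normalize_targets` and the file name `targets.tsv` are hypothetical.

```shell
# Hypothetical helper: turn a CHROM/POS list into a tab-delimited
# targets file whose chromosome names carry the "chr" prefix.
normalize_targets() {
  awk '
    BEGIN { OFS = "\t" }
    { sub(/\r$/, "") }            # strip Windows line endings, if any
    /^#/ || NF < 2 { next }       # skip the header line and incomplete lines
    {
      chrom = $1
      if (chrom !~ /^chr/) chrom = "chr" chrom   # assumed naming of the VCF
      print chrom, $2
    }
  ' "$1"
}

# Example use (assumes the VCF uses chr-prefixed names throughout):
#   normalize_targets snplist.txt > targets.tsv
#   bcftools view -T targets.tsv -Oz -o hbcs_sisu_b38_subset.vcf.gz hbcs_sisu_b38.vcf.gz
```

The `-Oz` option asks for compressed VCF output explicitly, rather than relying on the `.vcf.gz` suffix of the output name.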