
VCF chromosome contig names

Dave Lawrence edited this page Nov 16, 2023 · 2 revisions

When handling VCFs, it's crucial that chromosome names match what we expect. I think it's best to use contig IDs (e.g. "NC_000001.10"), as they are explicit and you can't get genome builds mixed up.

If you want to convert chromosome names, you can do so via bcftools annotate:

bcftools annotate --rename-chrs chrom_contig.map file.vcf -o converted_file.vcf
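
Before converting, it can be worth checking that every chromosome name in the VCF actually appears in the mapping file, since a name with no entry presumably has nothing to be renamed to. The following is a minimal sketch, not part of VG: the helper names `parse_chrom_mapping` and `unmapped_chroms` are hypothetical, and it assumes an uncompressed VCF and a mapping file of whitespace-separated "old_name new_name" pairs.

```python
def parse_chrom_mapping(lines):
    """Parse 'old_name new_name' pairs (whitespace separated) into a dict."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            mapping[fields[0]] = fields[1]
    return mapping


def unmapped_chroms(vcf_lines, mapping):
    """Return the sorted chromosome names used by VCF records but absent from the mapping."""
    missing = set()
    for line in vcf_lines:
        # Header lines start with '#'; only data records carry a CHROM value
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in mapping:
            missing.add(chrom)
    return sorted(missing)
```

Both helpers accept any iterable of lines, so open file handles for `chrom_contig.map` and `file.vcf` can be passed in directly; an empty list back means every record's chromosome has an entry in the mapping.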

You can generate the mapping files (converting to contig accessions) by running the following in the VG Django shell:

from snpdb.models import GenomeBuild

def write_chrom_mapping_file(genome_build):
    # One "old_name<TAB>new_name" line per contig, the format --rename-chrs reads
    with open(f"chrom_mapping_{genome_build}.map", "w") as f:
        for contig in genome_build.contigs.filter(role='AM'):
            f.write("\t".join([contig.name, contig.refseq_accession]) + "\n")

# Should produce chrom_mapping_GRCh37.map and chrom_mapping_GRCh38.map
write_chrom_mapping_file(GenomeBuild.grch37())
write_chrom_mapping_file(GenomeBuild.grch38())

I have also added these files at:

snpdb/genome/chrom_mapping_GRCh37.map
snpdb/genome/chrom_mapping_GRCh38.map