Liftover
Liftover is the conversion of a variant from one genome build's coordinates to another.
Variants can be linked to an Allele (a change independent of genome build). Variants representing the same change in different builds link to the same Allele. See Variants and Alleles.
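To make the relationship concrete, here is a simplified, hypothetical model using plain Python dataclasses. It is not VariantGrid's actual Django models, and the field names are assumptions. Each build-specific Variant links through a VariantAllele to one Allele ID, so grouping by Allele gathers the same change across builds.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A variant in one genome build's coordinates (simplified, hypothetical)."""
    genome_build: str
    chrom: str
    position: int
    ref: str
    alt: str


@dataclass(frozen=True)
class VariantAllele:
    """Links a build-specific Variant to a build-independent Allele ID."""
    variant: Variant
    allele_id: int


def variants_by_allele(links: list[VariantAllele]) -> dict[int, list[Variant]]:
    """Group variants so each Allele ID maps to its representation in every build."""
    grouped: dict[int, list[Variant]] = defaultdict(list)
    for link in links:
        grouped[link.allele_id].append(link.variant)
    return dict(grouped)
```

For example, a GRCh37 variant and a GRCh38 variant linked to the same Allele ID (made-up coordinates) end up in the same group.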
The conversion is performed by calling web services: first the ClinGen Allele Registry, then, if that fails, NCBI Remap.
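The fallback order can be sketched as below. The method functions are hypothetical stand-ins for the ClinGen Allele Registry and NCBI Remap calls; only the ordering (ClinGen first, NCBI Remap if it fails) comes from this page. This sketch treats both "no result" and a raised exception as failure.

```python
from typing import Callable, Optional

# (chrom, position, ref, alt)
VariantTuple = tuple[str, int, str, str]
# A liftover method takes a source variant and a destination build name and
# returns the lifted-over variant, or None if it could not convert it.
LiftoverMethod = Callable[[VariantTuple, str], Optional[VariantTuple]]


def liftover_with_fallback(
    variant: VariantTuple,
    dest_build: str,
    methods: list[tuple[str, LiftoverMethod]],
) -> Optional[tuple[str, VariantTuple]]:
    """Try each method in order; return (method_name, result) for the first success."""
    for name, method in methods:
        try:
            result = method(variant, dest_build)
        except Exception:
            # Treat a failed web service call like "no result" and try the next method
            result = None
        if result is not None:
            return name, result
    return None  # every method failed
```

Keeping the methods as an ordered list makes the preference explicit and lets another service be appended without changing the loop.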
The ClinGen Allele Registry is a web API that provides unique identifiers (e.g. "CA113827") for variants across genome builds.
We send the g.HGVS (as the c.HGVS only handles GRCh38) and, if successful, the Allele stores the ClinGen Allele ID and the API response (which contains coordinates for different genome builds).
Sometimes a variant can't be given an identifier, or coordinates are not available for the desired build; in these cases we fall back to NCBI Remap.
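Because the stored response may lack the desired build, extracting coordinates should return nothing rather than fail. Below is a minimal sketch assuming a simplified JSON shape loosely based on the registry's `genomicAlleles` list. The key names and the 0-based `start` are assumptions to check against the registry documentation, and the function is a hypothetical stand-in, not the actual ClinGenAllele.get_variant_tuple.

```python
from typing import Optional


def get_build_coordinates(api_response: dict, genome_build: str) -> Optional[tuple[str, int, str, str]]:
    """Return (chrom, vcf_pos, ref, alt) for genome_build, or None if unavailable.

    Assumed (simplified) response shape:
    {"genomicAlleles": [{"referenceGenome": "GRCh38", "chromosome": "1",
                         "coordinates": [{"start": <0-based int>,
                                          "referenceAllele": "A", "allele": "G"}]}]}
    Only single-nucleotide substitutions are handled: indels would need a
    padding base from the reference genome to form a valid VCF record.
    """
    for genomic_allele in api_response.get("genomicAlleles", []):
        if genomic_allele.get("referenceGenome") != genome_build:
            continue
        coordinates = genomic_allele.get("coordinates") or []
        if len(coordinates) != 1:
            return None  # missing or ambiguous coordinates
        coord = coordinates[0]
        ref, alt = coord["referenceAllele"], coord["allele"]
        if len(ref) != 1 or len(alt) != 1:
            return None  # not an SNV; out of scope for this sketch
        # Assumed 0-based start -> 1-based VCF POS
        return genomic_allele["chromosome"], coord["start"] + 1, ref, alt
    return None  # no coordinates for this build
```

A `None` here is the signal to fall back to NCBI Remap.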
The VariantGrid liftover process is handled via an UploadPipeline of type LIFTOVER. The steps are:
1. Create an Allele for the source variant
2. Write a VCF record with the new coordinates and the Allele ID (in the VCF ID field)
3. Normalize and create the new variant if it doesn't already exist
4. Link the new variant to the Allele ID (from the VCF ID field)
The final job of ClassificationImportProcessVariantsTask is to call create_liftover_pipelines, which creates an UploadPipeline of type LIFTOVER for each genome build other than the import's own (variants in the import's own build already exist and have linked Alleles from previous steps).
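The per-build fan-out can be sketched as follows. This is a hypothetical, simplified stand-in for create_liftover_pipelines: the real function creates UploadPipeline records, and the class and field names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LiftoverPipeline:
    """Hypothetical stand-in for an UploadPipeline of type LIFTOVER."""
    source_build: str
    dest_build: str
    pipeline_type: str = "LIFTOVER"


def create_liftover_pipelines_sketch(source_build: str, all_builds: list[str]) -> list[LiftoverPipeline]:
    """Create one LIFTOVER pipeline per build other than the source build.

    Variants in the source build already exist and are linked to Alleles,
    so no pipeline is needed for that build.
    """
    return [LiftoverPipeline(source_build, build) for build in all_builds if build != source_build]
```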
Step 2 uses one of:
- ClinGen: call ClinGenAllele.get_variant_tuple(genome_build) and write the VCF record in the desired genome build directly.
- NCBI Remap: write a VCF in the source genome build, then call a Perl script that creates a VCF in the desired genome build (leaving the ID column intact).
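In both paths the Allele ID travels in the VCF ID column, which is how the later linking step knows which Allele each record belongs to. Below is a minimal sketch of such a VCF writer. The function name, the VCF version and the empty QUAL/FILTER/INFO columns are assumptions; VariantGrid's own writer may differ. For NCBI Remap the same format would be written in the source build, and the remapped output keeps the ID column.

```python
from collections.abc import Iterable

# (chrom, pos, ref, alt, allele_id)
Record = tuple[str, int, str, str, int]


def format_liftover_vcf(records: Iterable[Record]) -> str:
    """Return a minimal VCF whose ID column carries the Allele ID of each record."""
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, ref, alt, allele_id in records:
        # QUAL, FILTER and INFO are left as missing (".")
        lines.append(f"{chrom}\t{pos}\t{allele_id}\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"
```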
Steps 3 and 4 are performed by BulkAlleleLinkingVCFProcessor, which does the same normalization, insertion, Redis updates etc. as other VCF_Import pipelines, but also creates a VariantAllele linking each created Variant to the Allele given in the VCF ID field.
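The linking behaviour can be sketched in memory: read each record, find or create its variant, and link it to the Allele named in the ID column. This is a simplified, hypothetical illustration, not BulkAlleleLinkingVCFProcessor itself. A dict stands in for the variants table, and normalization and Redis updates are omitted.

```python
def link_vcf_to_alleles(
    vcf_text: str,
    variant_ids: dict[tuple[str, int, str, str], int],
) -> dict[int, int]:
    """Return {variant_id: allele_id} for every record in vcf_text.

    variant_ids maps (chrom, pos, ref, alt) -> variant_id and stands in for the
    variants table. Records not already present get a new ID ("create if it
    doesn't exist"). Records are assumed to be normalized already.
    """
    links: dict[int, int] = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header lines
        chrom, pos, record_id, ref, alt = line.split("\t")[:5]
        key = (chrom, int(pos), ref, alt)
        if key not in variant_ids:
            # Naive ID allocation for the sketch: one more than the current maximum
            variant_ids[key] = max(variant_ids.values(), default=0) + 1
        # The VCF ID column holds the Allele ID written in step 2
        links[variant_ids[key]] = int(record_id)
    return links
```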