
Liftover

Dave Lawrence edited this page Feb 27, 2023 · 3 revisions

Liftover is the conversion of a variant from one genome build's coordinates to another.

Variants can be linked to an Allele (a change independent of genome build); variants representing the same change in different builds link to the same Allele. See Variants and Alleles.

The conversion is handled by calling web services: first the ClinGen Allele Registry, then, if that fails, NCBI Remap.
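The fallback order can be sketched as a loop over liftover methods that stops at the first success. This is a minimal illustration, not VariantGrid's code. `liftover_with_fallback`, the `(chrom, position, ref, alt)` tuple and the stub methods are hypothetical names, and a real method would wrap a web service call.

```python
from typing import Callable, Optional, Tuple

# Hypothetical variant representation: (chrom, position, ref, alt)
VariantTuple = Tuple[str, int, str, str]
# A liftover method takes a source variant and a destination build name
LiftoverMethod = Callable[[VariantTuple, str], Optional[VariantTuple]]


def liftover_with_fallback(
    variant: VariantTuple,
    dest_build: str,
    methods: list[tuple[str, LiftoverMethod]],
) -> tuple[Optional[str], Optional[VariantTuple]]:
    """Try each (name, method) in order and return the first successful result."""
    for name, method in methods:
        try:
            result = method(variant, dest_build)
        except Exception:
            # Deliberately broad in this sketch: treat a service error as "no result"
            result = None
        if result is not None:
            return name, result
    return None, None  # every method failed
```

For example, with a ClinGen stub that returns `None` and an NCBI stub that returns coordinates, the function reports that NCBI Remap produced the result.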

ClinGen Allele Registry

The ClinGen Allele Registry is a web API that provides unique identifiers (e.g. "CA113827") for variants across genome builds.

We send the g.HGVS (as c.HGVS lookups only handle GRCh38) and, if the lookup succeeds, the Allele stores the ClinGen Allele ID and the API response (which contains coordinates for the different genome builds).
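As a rough illustration of what gets sent, the sketch below builds a g.HGVS string for a single-nucleotide substitution and URL-encodes it into a registry query. It is a sketch under assumptions: `snv_g_hgvs`, `clingen_query_url` and the two-entry `GRCH37_ACCESSIONS` table are hypothetical, indels are not handled, and the `/allele?hgvs=` endpoint is assumed rather than taken from VariantGrid.

```python
from urllib.parse import quote

# Hypothetical, partial contig -> RefSeq accession table for GRCh37
GRCH37_ACCESSIONS = {"1": "NC_000001.10", "17": "NC_000017.10"}


def snv_g_hgvs(chrom: str, pos: int, ref: str, alt: str,
               accessions: dict[str, str]) -> str:
    """Build a g.HGVS string for a single-nucleotide substitution."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles single-nucleotide variants")
    # HGVS g. positions are 1-based, like VCF POS, so no offset is needed
    return f"{accessions[chrom]}:g.{pos}{ref}>{alt}"


def clingen_query_url(g_hgvs: str) -> str:
    # Assumed endpoint: GET /allele?hgvs=... on the ClinGen Allele Registry
    return "https://reg.genome.network/allele?hgvs=" + quote(g_hgvs, safe="")
```

For example, `snv_g_hgvs("17", 41245466, "G", "A", GRCH37_ACCESSIONS)` returns `"NC_000017.10:g.41245466G>A"`.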

Sometimes a variant can't be given an identifier, or coordinates are not available for our desired build.
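The "coordinates not available for our desired build" case can be pictured as a lookup over the stored API response that returns `None` when the build is missing. This is a hypothetical sketch, not `ClinGenAllele.get_variant_tuple` itself. The response shape (`genomicAlleles`, `referenceGenome`, `chromosome`, and `coordinates` with a 0-based `start`) is an assumption, and indels are skipped because a VCF record would need an anchor base from the reference.

```python
from typing import Optional, Tuple

VariantTuple = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def variant_tuple_for_build(response: dict, genome_build: str) -> Optional[VariantTuple]:
    """Return (chrom, pos, ref, alt) for genome_build, or None if unavailable."""
    for genomic_allele in response.get("genomicAlleles", []):
        if genomic_allele.get("referenceGenome") != genome_build:
            continue
        coord = genomic_allele["coordinates"][0]
        ref, alt = coord["referenceAllele"], coord["allele"]
        if len(ref) != 1 or len(alt) != 1:
            # Indels need a VCF anchor base from the reference sequence: out of scope
            return None
        # Assumes "start" is 0-based (interbase), so VCF POS = start + 1
        return genomic_allele["chromosome"], coord["start"] + 1, ref, alt
    return None  # no coordinates for this build in the response
```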

NCBI Remap

See Install NCBI liftover

Liftover process

The VariantGrid liftover process is handled via an UploadPipeline of type LIFTOVER. The steps are:

  1. Create an Allele for a source variant
  2. Write a VCF record with the new coordinates + Allele ID (as the VCF ID field)
  3. Normalize and create this new variant if it doesn't exist
  4. Link the new variant to the Allele ID (from VCF ID field)

The final job of ClassificationImportProcessVariantsTask is to call create_liftover_pipelines, which creates an UploadPipeline of type LIFTOVER for each genome build other than the source build. (Variants in the source build already exist and have Alleles linked from previous steps.)
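The "one pipeline per other build" rule can be sketched as a filter over the known builds. `LiftoverPipeline` and `create_liftover_pipelines_sketch` are hypothetical stand-ins for illustration, not the real UploadPipeline model or the real create_liftover_pipelines.

```python
from dataclasses import dataclass, field


@dataclass
class LiftoverPipeline:
    """Hypothetical stand-in for an UploadPipeline of type LIFTOVER."""
    source_build: str
    dest_build: str
    allele_ids: list[int] = field(default_factory=list)


def create_liftover_pipelines_sketch(source_build: str, builds: list[str],
                                     allele_ids: list[int]) -> list[LiftoverPipeline]:
    # One pipeline per build other than the source; source-build variants
    # already exist and are already linked to their Alleles.
    return [LiftoverPipeline(source_build, build, list(allele_ids))
            for build in builds if build != source_build]
```

With builds `["GRCh37", "GRCh38"]` and a GRCh37 source, this yields a single pipeline targeting GRCh38.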

Step 2 involves:

ClinGen: call ClinGenAllele.get_variant_tuple(genome_build) and write the VCF record for the desired genome build directly.

NCBI Remap: write a VCF in the source genome build, then call the Perl script, which creates a VCF in the desired genome build (leaving the ID column intact).
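Because the Allele ID travels in the VCF ID column, a liftover VCF can be written from `(allele_id, variant)` pairs, and comparing IDs before and after remapping shows which Alleles did not convert. This is a hypothetical sketch: the function names are illustrative, the Perl script invocation is not shown, and it assumes records that could not be remapped are simply absent from the output VCF.

```python
from typing import Iterable, Tuple

VariantTuple = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
HEADER = ["##fileformat=VCFv4.1",
          "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]


def write_vcf(records: Iterable[Tuple[int, VariantTuple]]) -> str:
    """Write (allele_id, (chrom, pos, ref, alt)) records with the Allele ID as VCF ID."""
    lines = list(HEADER)
    # Simple lexical chrom sort, then position (not karyotypic order)
    for allele_id, (chrom, pos, ref, alt) in sorted(records, key=lambda r: (r[1][0], r[1][1])):
        lines.append("\t".join([chrom, str(pos), str(allele_id), ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"


def allele_ids(vcf_text: str) -> set[int]:
    """Collect the Allele IDs stored in the ID column of data lines."""
    return {int(line.split("\t")[2])
            for line in vcf_text.splitlines()
            if line and not line.startswith("#")}


def failed_liftovers(source_vcf: str, remapped_vcf: str) -> set[int]:
    # Assumes unconvertible records are missing from the remapped output
    return allele_ids(source_vcf) - allele_ids(remapped_vcf)
```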

Steps 3 and 4 are performed by BulkAlleleLinkingVCFProcessor, which runs the same standard normalization, insertion and Redis steps as other VCF imports (VCF_Import), but also creates a VariantAllele linking each Variant created to the Allele from the VCF ID column.
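The linking step can be pictured with in-memory dictionaries in place of database tables: each data line is a get-or-create of a Variant, followed by a VariantAllele link to the Allele ID read from the ID column. `InMemoryStore` and `link_vcf_records` are hypothetical, and normalization is assumed to have already happened; this is not BulkAlleleLinkingVCFProcessor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the Variant and VariantAllele tables."""
    variants: dict = field(default_factory=dict)         # (build, chrom, pos, ref, alt) -> variant_id
    variant_alleles: dict = field(default_factory=dict)  # (build, variant_id) -> allele_id


def link_vcf_records(store: InMemoryStore, genome_build: str, vcf_text: str) -> int:
    """Get-or-create a Variant per record and link it to the Allele from the ID column."""
    created = 0
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, record_id, ref, alt = line.split("\t")[:5]
        key = (genome_build, chrom, int(pos), ref, alt)
        # Assumes records are already normalized; new variants get the next integer ID
        variant_id = store.variants.setdefault(key, len(store.variants) + 1)
        link = (genome_build, variant_id)
        if link not in store.variant_alleles:
            store.variant_alleles[link] = int(record_id)
            created += 1
    return created
```

Running the same VCF twice creates the links once; the second run returns 0, which mirrors the "create this new variant if it doesn't exist" behaviour of step 3.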
