
Variant Annotation adding new columns

Dave Lawrence edited this page Oct 14, 2020 · 2 revisions

VEP

VEP has a number of Plugins and it's quite easy to add a column from VCF, BED, GFF, GTF, BigWig - see VEP Custom.

I suggest running VEP manually from the command line first. There are some test VCFs for GRCh37/GRCh38 in annotation/tests/test_data/

This is also useful to check how the output is formatted, and whether it varies per transcript.
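As a rough illustration of checking whether a field varies per transcript, the sketch below splits a VEP CSQ INFO value (comma-separated entries, pipe-separated fields) and reports which fields differ between entries. The field list and CSQ string are made-up sample data, and the helper names are hypothetical; they are not part of VariantGrid.

```python
from collections import defaultdict


def split_csq(csq_value, field_names):
    """Split a VEP CSQ INFO value into one dict per transcript entry."""
    records = []
    for entry in csq_value.split(","):
        values = entry.split("|")
        records.append(dict(zip(field_names, values)))
    return records


def fields_varying_per_transcript(records):
    """Return sorted names of fields that take more than one value across entries."""
    seen = defaultdict(set)
    for record in records:
        for name, value in record.items():
            seen[name].add(value)
    return sorted(name for name, values in seen.items() if len(values) > 1)


# Made-up sample data: in a real VEP VCF the field order comes from the
# "Format:" part of the CSQ header line.
FIELD_NAMES = ["Allele", "Consequence", "Feature", "EXON", "gnomAD_AF"]
CSQ = (
    "T|missense_variant|ENST00000000001|3/12|0.001,"
    "T|intron_variant|ENST00000000002||0.001"
)

records = split_csq(CSQ, FIELD_NAMES)
print(fields_varying_per_transcript(records))
```

In this sample, EXON differs between the two entries while gnomAD_AF does not, which suggests the former is per transcript and the latter is per variant.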

VariantAnnotation Model

Determine whether the field varies per transcript (such as amino_acids/exon) or not (e.g. population frequency).

If the field varies per transcript, add the column to AbstractVariantAnnotation (the base class of both VariantAnnotation and VariantTranscriptAnnotation) so that each transcript gets its own copy; otherwise add it to VariantAnnotation.

To save space, use a choices field if it uses a limited number of text values.
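A minimal sketch of the idea behind a choices field, in plain Python so it runs without Django: each text value is stored as a short code. The single-letter codes and the helper name are assumptions for illustration; in a Django model the list of pairs would be passed as choices=... on a short CharField.

```python
# Hypothetical (stored_code, label) pairs for VEP's IMPACT values.
# The single-letter codes are invented for this example.
IMPACT_CHOICES = [
    ("H", "HIGH"),
    ("M", "MODERATE"),
    ("L", "LOW"),
    ("O", "MODIFIER"),
]

# Reverse lookup used when loading annotation text into the database.
LABEL_TO_CODE = {label: code for code, label in IMPACT_CHOICES}


def to_code(text):
    """Convert the text from the VEP output into the stored code (None if empty)."""
    if not text:
        return None
    return LABEL_TO_CODE[text]


print(to_code("MODERATE"))
```

An unknown value raises KeyError here, which is one way to notice when a new VEP release introduces a value the choices list does not cover.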

You need to run python3 manage.py makemigrations annotation to generate the schema migration script after changing the model.

VariantGridColumn

If you want to display the annotation field on an analysis grid, you need to first create a new snpdb.VariantGridColumn record via data migration, eg:

python3 manage.py makemigrations snpdb --empty --name "variant_grid_column_new_exon_field"

Then add the new record (see snpdb/migrations/0002_initial_data.create_columns to see how initial default columns are created) eg:

# grid_column_name is the column's ID; variant_column is the query path to the field
VariantGridColumn.objects.create(grid_column_name='exon',
                                 variant_column='variantannotation__exon',
                                 annotation_level='T',
                                 label='Exon',
                                 description='Number(s) of affected exon(s)',
                                 model_field=True,
                                 queryset_field=True)

Users can add this column to their custom columns via the settings page, or if you'd like it added to default columns, create CustomColumn records in the data migration.

VariantGrid VEP pipeline

The command line for VEP is generated in annotation.vep_annotation.get_vep_command and is driven by the database records annotation.ColumnVEPField

If you're adding a Plugin or new custom VCF for the first time, you'll need to modify VEPPlugin or VEPCustom and then run an annotation model migration. If the plugin requires annotation data, you'll also need to add an entry to settings (e.g. under settings.ANNOTATION["GRCh37"]["vep_config"]) and then reference it via the PLUGINS mapping in get_vep_command().

Add a new data migration:

python3 manage.py makemigrations annotation --empty --name "annotation_vep_new_exon_field"

See annotation/migrations/0003_initial_data.populate_column_vep_fields for examples.

If you want the new column to be exported via VCF (save a grid as VCF in an analysis) then you can also add to the annotation migration a ColumnVCFInfo record (see annotation/migrations/0003_initial_data.populate_column_vcf_info for examples)

Loading code

The VEP VCF file is processed by annotation.vcf_files.bulk_vep_vcf_annotation_inserter.BulkVEPVCFAnnotationInserter

If you need to format/modify a field from VCF into data, add a formatter in _add_vep_field_handlers
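The kind of formatter meant here could look like the sketch below, which converts raw VEP strings into values suitable for the database. The function names, the field names and the way formatters are registered are all hypothetical; check _add_vep_field_handlers for how BulkVEPVCFAnnotationInserter actually wires them up.

```python
def format_max_float(raw):
    """Turn a VEP value such as '0.1&0.25' into its largest float, or None if empty."""
    if not raw:
        return None
    return max(float(value) for value in raw.split("&"))


# Hypothetical mapping of destination column name -> formatter.
FIELD_FORMATTERS = {
    "gnomad_af": format_max_float,
}


def apply_formatters(raw_record, formatters):
    """Apply a formatter where one is registered; empty strings become None."""
    formatted = {}
    for name, raw in raw_record.items():
        formatter = formatters.get(name)
        formatted[name] = formatter(raw) if formatter else (raw or None)
    return formatted


# Made-up raw values as they might appear after splitting a CSQ entry.
raw = {"gnomad_af": "0.1&0.25", "exon": "3/12", "sift": ""}
print(apply_formatters(raw, FIELD_FORMATTERS))
```

VEP joins multiple values within one field using "&", so a formatter has to decide how to collapse them; taking the maximum is only one possible choice.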

Variant details

To display this new field on the variant details page, you need to add it to variant_details.html

Testing

There's a management command to run the currently configured VEP pipeline, and "--test" runs it over the test VCFs:

python3.8 manage.py vep_run --test --genome-build=GRCh37
python3.8 manage.py vep_run --test --genome-build=GRCh38

This will re-generate the annotated test VCFs with the new fields, and you can add a new test for the column to annotation.tests.test_annotation_vcf
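A test for a new column might start by checking that the regenerated VCF declares the field in its CSQ header. The sketch below is a guess at that shape rather than the structure of the existing test module: the helper, the test class and the inline header are hypothetical, and a real test would read the annotated test VCF instead.

```python
import re
import unittest

# Captures the pipe-separated field list from VEP's CSQ header line.
CSQ_FORMAT_PATTERN = re.compile(r'##INFO=<ID=CSQ,.*Format: ([^">]+)')


def csq_columns(vcf_lines):
    """Return the CSQ field names declared in the VCF header, or [] if absent."""
    for line in vcf_lines:
        match = CSQ_FORMAT_PATTERN.match(line)
        if match:
            return match.group(1).strip().split("|")
    return []


# Made-up header standing in for the regenerated test VCF.
SAMPLE_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|Feature|EXON">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


class TestNewColumn(unittest.TestCase):
    def test_exon_column_present(self):
        self.assertIn("EXON", csq_columns(SAMPLE_HEADER))
```

Saved as a module, this can be run with python3 -m unittest; inside VariantGrid the test would go through manage.py test instead.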
