vkgl

Variant annotator for vkgl variant sharing. Consists of two parts:

HGVS translator

The active HGVS translator service code lives in another repository. The service uses the biocommons/hgvs library to convert HGVS to VCF. To start the translator service:

> cd docker
> docker-compose up

To see it in action, post an array of HGVS strings to the h2v endpoint:

curl -H 'Content-Type: application/json' -d '["NM_000088.3:c.589G>T", "NC_000017.10:g.48275363C>A"]' 'http://localhost:1234/h2v?keep_left_anchor=False'
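
The same call can be made from a script. The following is a minimal Python sketch of the curl request above, using only the standard library. The host, port, path and the `keep_left_anchor` query parameter are copied from the curl example; the function names `build_h2v_request` and `translate` are hypothetical helpers, not part of the service.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Host and port are taken from the curl example; adjust them to your setup.
BASE_URL = "http://localhost:1234/h2v"


def build_h2v_request(hgvs_strings, keep_left_anchor=False):
    """Build (but do not send) a POST request for the h2v endpoint."""
    # The flag is passed as a query parameter, as in the curl example.
    query = urlencode({"keep_left_anchor": str(keep_left_anchor)})
    # The request body is a JSON array of HGVS strings.
    body = json.dumps(list(hgvs_strings)).encode("utf-8")
    return Request(
        f"{BASE_URL}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(hgvs_strings, keep_left_anchor=False):
    """Send the request and decode the JSON response.

    This needs the translator service to be running.
    """
    with urlopen(build_h2v_request(hgvs_strings, keep_left_anchor)) as response:
        return json.load(response)
```

Building the request separately from sending it makes it possible to inspect the URL and body without a running service.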

These strings can be either genomic HGVS (NC_000017.10:g.48275363C>A) or transcript HGVS (NM_000088.3:c.589G>T). Two flags can be set for this service:

| Flag | Usage | Example |
|------|-------|---------|
| `keep_left_anchor` | The service returns variants either with the last unchanged nucleotide (the left anchor) or without it. Note that this applies to all variants, including insertions, for which an anchor is preferred. | `True`: `{"ref": "CT", "alt": "CC", "chrom":"19", "pos":"49473113", "type":"sub"}` |
| | | `False`: `{"ref": "T", "alt": "C", "chrom":"19", "pos":"49473114", "type":"sub"}` |
| `strict` | When `strict` is set to `True`, the strict mode of the HGVS library is applied. Validation warnings then result in errors rather than just warnings. | |
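
To illustrate how the two `keep_left_anchor` example outputs relate, here is a small Python sketch that converts the anchored form into the unanchored one on the client side. The function `strip_left_anchor` is a hypothetical helper and only mirrors the substitution example from the table; it is not the service's implementation, and how the service represents insertions and deletions without an anchor is not covered here.

```python
def strip_left_anchor(variant):
    """Drop a shared leading nucleotide from ref and alt and shift pos by one.

    Variants whose ref or alt is a single base, or whose first bases differ,
    are returned unchanged (as a copy).
    """
    ref, alt = variant["ref"], variant["alt"]
    if len(ref) < 2 or len(alt) < 2 or ref[0] != alt[0]:
        return dict(variant)
    trimmed = dict(variant)
    trimmed["ref"] = ref[1:]
    trimmed["alt"] = alt[1:]
    # pos is a string in the service output, so convert, shift and convert back.
    trimmed["pos"] = str(int(variant["pos"]) + 1)
    return trimmed


anchored = {"ref": "CT", "alt": "CC", "chrom": "19", "pos": "49473113", "type": "sub"}
print(strip_left_anchor(anchored))
```

The sketch deliberately leaves a variant untouched when stripping would empty `ref` or `alt`, since an anchor is preferred for insertions.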

file processing pipeline

A Spring Boot application that uses Apache Camel to read input files from src/test/inbox and annotate each line with VCF information retrieved from the HGVS annotator.

Results are stored in the results directory once processing has finished. Results with an error message are routed to the error file.

Requires JDK 11. To run the pipeline:

> mvn spring-boot:run -Dspring-boot.run.arguments=--hgnc.genes="location/of/your/hgnc/genes/file"

Add --exitOnError=true if you want the service to exit when an exception occurs.

Scroll down for more information about the HGNC genes file.

Gene validation (HGNC genes file)

Validation and correction of provided HGNC symbols is done using an export downloaded from the biomart website. The selected attributes are:

HGNC ID
Status
Approved symbol
Approved name
Alias symbol
Previous symbol
Chromosome
Chromosome location
Locus group
NCBI gene ID
Ensembl gene ID
UCSC gene ID

The downloaded file is named hgnc_genes.tsv and is stored in src/main/resources. For the most accurate validation, update this file before running the pipeline.
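
As an illustration of how such an export can be used to correct gene symbols, the Python sketch below builds a lookup from alias and previous symbols to the approved symbol. It assumes a tab-separated file with a header row whose column names match the attributes listed above, and one alias or previous symbol per row; the actual export layout and the pipeline's own validation logic may differ. The function `load_symbol_lookup` and the sample rows are hypothetical.

```python
import csv
import io


def load_symbol_lookup(tsv_file):
    """Map approved, previous and alias symbols to the approved symbol."""
    rows = list(csv.DictReader(tsv_file, delimiter="\t"))
    lookup = {}
    # Register approved symbols first so an alias never overrides one.
    for row in rows:
        approved = (row.get("Approved symbol") or "").strip()
        if approved:
            lookup[approved.upper()] = approved
    for row in rows:
        approved = (row.get("Approved symbol") or "").strip()
        for column in ("Previous symbol", "Alias symbol"):
            symbol = (row.get(column) or "").strip()
            if approved and symbol:
                # Keep the first mapping when a symbol is ambiguous.
                lookup.setdefault(symbol.upper(), approved)
    return lookup


# Made-up rows for illustration only; these are not real HGNC entries.
sample = io.StringIO(
    "Approved symbol\tAlias symbol\tPrevious symbol\n"
    "GENE1\tG1\tOLD1\n"
    "GENE2\t\t\n"
)
lookup = load_symbol_lookup(sample)
print(lookup.get("OLD1"))
```

With the real file, the same function could be called as `load_symbol_lookup(open("src/main/resources/hgnc_genes.tsv", newline=""))`, provided the header assumption holds.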

Release

Releasing this repository is somewhat unusual because it is a collection of scripts. First create a regular GitHub release. Then check out master, increase the version number in run.sh, and commit the change. Tag the new version using the data-release-v prefix and push it to master. Finalise the release by releasing the tag you just pushed.
