Wrong HGVS consequence for clinvar reported variants in gnomAD v4 #1453

Open
thedrakesng opened this issue Mar 21, 2024 · 3 comments
thedrakesng commented Mar 21, 2024

What you did:

I was checking a variant in the ClinVar variants section of gnomAD v4.

The HGVS consequence shown in the ClinVar variants section of gnomAD v4 differs from the one shown in ClinVar and in gnomAD v2.1.

For example, the consequence of ClinVar Variation ID 914550 is displayed as p.Gln334Ter in gnomAD v4 (https://gnomad.broadinstitute.org/gene/ENSG00000156466?dataset=gnomad_r4), but it is shown as NM_001001557.4(GDF6):c.1251C>T (p.Pro417=) in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/variation/914550/) and gnomAD v2.1 (https://gnomad.broadinstitute.org/gene/ENSG00000156466?dataset=gnomad_r2_1).

The same problem appears for other types of variants in different genes.

image

What you expected to see after you did that:

image

What you actually saw after you did that:

- gnomAD v4.0, GDF6, 8-96144680-G-A, ClinVar Variation ID 914550, HGVS consequence: p.Gln334Ter, VEP annotation: stop gained
https://gnomad.broadinstitute.org/gene/ENSG00000156466?dataset=gnomad_r4

image


As I was writing this bug report, I found a few more variants whose HGVS consequence differs from the one in ClinVar, regardless of gnomAD version.

For example:
gnomAD version: gnomAD v2.1.1, Gene: P4HB, variant: 17-79817175-C-G, ClinVar Variation ID: 1507457, HGVS consequence: c.233+1G>C, VEP annotation: splice donor
image

gnomAD version: gnomAD v4.0, Gene: P4HB, variant: 17-81859299-C-G, ClinVar Variation ID: 1507457, HGVS consequence: c.233+1G>C, VEP annotation: splice donor
image

ClinVar reports this variant (Variation ID 1507457) as NM_000918.4(P4HB):c.234G>C (p.Arg78Ser).

@rileyhgrant
Contributor

Hiya @thedrakesng

Thanks for bringing this to our attention. I did some digging into this this afternoon, and it appears to be a discrepancy between which of the transcript consequences are displayed.

I'll ask our product owner about what we want to display here, as I don't think I'm in a position to unilaterally make this decision. The discrepancy is that we're showing the most severe consequence of all the possible transcript consequences, which is the stop gained p.Gln334Ter you see, instead of the synonymous p.Pro417Pro, which is the consequence for the mane select transcript, and my hunch on what we actually want to display.


Information from my digging session for future reference

For v4, when we annotate the transcript consequences with vep105, we get four different consequences, including the synonymous one reported in v2 (which uses a different version of vep) and on ClinVar's website.

Here are a few portions of the table for the transcript consequences:

+---------------------------------------+---------------------------------------------+-----------------------------------------+
| transcript_consequences.transcript_id | transcript_consequences.polyphen_prediction | transcript_consequences.sift_prediction |
+---------------------------------------+---------------------------------------------+-----------------------------------------+
| str                                   | str                                         | str                                     |
+---------------------------------------+---------------------------------------------+-----------------------------------------+
| "ENST00000621429"                     | NA                                          | NA                                      |
| "ENST00000287020"                     | NA                                          | NA                                      |
| "NM_001001557.4"                      | NA                                          | NA                                      |
| "ENST00000620978"                     | NA                                          | NA                                      |
+---------------------------------------+---------------------------------------------+-----------------------------------------+
+-------------------------------------+-------------------------------+-------------------------------+--------------------------------------+
| transcript_consequences.gene_symbol | transcript_consequences.hgvsc | transcript_consequences.hgvsp | transcript_consequences.is_canonical |
+-------------------------------------+-------------------------------+-------------------------------+--------------------------------------+
| str                                 | str                           | str                           |                                 bool |
+-------------------------------------+-------------------------------+-------------------------------+--------------------------------------+
| "GDF6"                              | "c.1000C>T"                   | "p.Gln334Ter"                 |                                   NA |
| "GDF6"                              | "c.1251C>T"                   | "p.Pro417Pro"                 |                                 True |
| "GDF6"                              | "c.1251C>T"                   | "p.Pro417Pro"                 |                                 True |
| "GDF6"                              | "c.794-3C>T"                  | NA                            |                                   NA |
+-------------------------------------+-------------------------------+-------------------------------+--------------------------------------+

In our logic, we sort this list so that the most severe transcript consequence comes before the MANE Select transcript's consequence. As a result, on a page such as the gene page above, even though the page references the MANE Select transcript, we display the consequence for what may be a different transcript. Here, that produces the discrepancy between v4's ClinVar plot (which shows a stop gained for this variant) and v2's ClinVar plot and ClinVar's website (which show a synonymous variant).

If we want to prioritize the mane select consequence, it should be as straightforward as modifying lines 130 - 138 in annotate_transcript_consequences.py. However, as this function is also referenced in our variant pipelines, it might better serve to add another parameter you pass to determine the sort order of the transcripts.
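To make the sort-order idea concrete, here is a minimal Python sketch of a sort with a switchable priority. It is not the code in annotate_transcript_consequences.py: the function name, the `prioritize_mane_select` parameter, the `is_mane_select` and `major_consequence` fields, and the severity ranks are all hypothetical. The rows reuse the GDF6 table above, assuming both table fragments list the transcripts in the same order, that the `c.794-3C>T` row is a splice region variant, and that the rows marked `is_canonical` are the MANE Select ones.

```python
# Illustrative severity ranks (lower = more severe); a tiny subset, not VEP's full list.
SEVERITY_RANK = {
    "stop_gained": 0,
    "splice_region_variant": 1,
    "synonymous_variant": 2,
}


def sort_transcript_consequences(consequences, prioritize_mane_select=False):
    """Return consequences sorted by severity, or MANE Select first if requested.

    Hypothetical helper: the real pipeline function and its fields may differ.
    """

    def sort_key(csq):
        severity = SEVERITY_RANK[csq["major_consequence"]]
        mane = 0 if csq["is_mane_select"] else 1
        # The order of the tuple elements decides which criterion wins.
        return (mane, severity) if prioritize_mane_select else (severity, mane)

    return sorted(consequences, key=sort_key)


# Rows from the GDF6 table; major_consequence and is_mane_select are assumptions.
GDF6_CONSEQUENCES = [
    {"transcript_id": "ENST00000621429", "hgvsc": "c.1000C>T", "hgvsp": "p.Gln334Ter",
     "major_consequence": "stop_gained", "is_mane_select": False},
    {"transcript_id": "ENST00000287020", "hgvsc": "c.1251C>T", "hgvsp": "p.Pro417Pro",
     "major_consequence": "synonymous_variant", "is_mane_select": True},
    {"transcript_id": "NM_001001557.4", "hgvsc": "c.1251C>T", "hgvsp": "p.Pro417Pro",
     "major_consequence": "synonymous_variant", "is_mane_select": True},
    {"transcript_id": "ENST00000620978", "hgvsc": "c.794-3C>T", "hgvsp": None,
     "major_consequence": "splice_region_variant", "is_mane_select": False},
]

severity_first = sort_transcript_consequences(GDF6_CONSEQUENCES)
mane_first = sort_transcript_consequences(GDF6_CONSEQUENCES, prioritize_mane_select=True)
print(severity_first[0]["hgvsp"])  # consequence shown when severity wins
print(mane_first[0]["hgvsp"])      # consequence shown when MANE Select wins
```

Defaulting the flag to the severity-first order would leave the variant pipelines unchanged, so only the ClinVar path would need to opt in.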

We could also modify transcriptConsequence.ts to determine which consequence it pulls in the context of a gene. Currently it just takes the first one, which for clinvar is the most severe consequence.
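The selection-side alternative could look roughly like the sketch below. It is written in Python for illustration only and does not reflect the actual contents of transcriptConsequence.ts. The function name, the `gene_id` field, and the `preferred_transcript_id` argument are hypothetical, and the example assumes ENST00000287020 is the gene's preferred transcript because the table above marks it `is_canonical`.

```python
def pick_consequence_for_gene(consequences, gene_id, preferred_transcript_id=None):
    """Pick the consequence to display for a variant in the context of one gene.

    Hypothetical helper: prefers the gene's preferred transcript when it is
    present, and otherwise falls back to the first consequence in the list.
    """
    in_gene = [csq for csq in consequences if csq["gene_id"] == gene_id]
    if not in_gene:
        return None
    if preferred_transcript_id is not None:
        for csq in in_gene:
            if csq["transcript_id"] == preferred_transcript_id:
                return csq
    # Fallback: take the first consequence (the most severe one if the list is severity-sorted).
    return in_gene[0]


# A severity-sorted list, as described for ClinVar variants; gene_id comes from the gene page URL.
SORTED_CONSEQUENCES = [
    {"gene_id": "ENSG00000156466", "transcript_id": "ENST00000621429", "hgvsp": "p.Gln334Ter"},
    {"gene_id": "ENSG00000156466", "transcript_id": "ENST00000620978", "hgvsp": None},
    {"gene_id": "ENSG00000156466", "transcript_id": "ENST00000287020", "hgvsp": "p.Pro417Pro"},
]

first_only = pick_consequence_for_gene(SORTED_CONSEQUENCES, "ENSG00000156466")
preferred = pick_consequence_for_gene(
    SORTED_CONSEQUENCES, "ENSG00000156466", preferred_transcript_id="ENST00000287020"
)
print(first_only["hgvsp"], preferred["hgvsp"])
```

Choosing at display time would leave the stored sort order untouched, at the cost of each page needing to know which transcript it prefers.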

@thedrakesng
Author

thedrakesng commented Apr 2, 2024

Thank you, @rileyhgrant, for such a thorough explanation.

@rileyhgrant rileyhgrant self-assigned this Apr 2, 2024
@rileyhgrant
Contributor

From some discussion: there's a separate issue here, in that we don't actually display some of the transcripts that VEP is giving us consequences for.
