T2T-CHM13v2.0 gnomAD liftover VCF should contain gnomAD ID or a way to map back to 38 coordinates #1817
Comments
Hi @davmlaw, it is good to see you using the gnomAD frequency files for T2T. The issue you pointed out is interesting and correct. Many of the gnomAD v4.1 variants are not yet accessioned in dbSNP, which makes it hard to trace them back to their GRCh38 source. I have created a ticket to update the T2T gnomAD file with a source identifier (likely to be
Best regards,
Sweet, thanks. I am only just trying out T2T. While I'm here, I'll note a few annotation-related things I've missed (I don't expect you to fix all of these, but FYI)
For the lifted-over ClinVar file from here: the header contigs use "chr1" but the records use "1"
Thanks for reporting the issue in the ClinVar file. I need to see how the files were generated to find what caused the mismatch, but a simple reheader would probably do. I have added a note to the existing ticket to fix this.
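For anyone hitting the same contig mismatch before the file is fixed, here is a minimal plain-Python sketch of how to detect it and work around it locally. It is only an illustration, not how the Ensembl files are generated: the function names and the example record are made up, and it rewrites the record CHROM column rather than the header (a reheader, as suggested above, would go the other way).

```python
def contig_mismatches(vcf_lines):
    """Return CHROM names used in records but not declared in ##contig lines."""
    declared, used = set(), set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##contig=<"):
            # e.g. ##contig=<ID=chr1,length=1000>  ->  "chr1"
            for field in line[len("##contig=<"):].rstrip(">").split(","):
                if field.startswith("ID="):
                    declared.add(field[3:])
        elif line and not line.startswith("#"):
            used.add(line.split("\t", 1)[0])
    return sorted(used - declared)


def add_chr_prefix(vcf_lines):
    """Yield lines with 'chr' prepended to record CHROM values that lack it.

    Assumption: the only difference is the missing 'chr' prefix. Names such
    as 'MT' vs 'chrM' would need an explicit mapping instead.
    """
    for line in vcf_lines:
        if line.startswith("#") or line.startswith("chr") or not line.strip():
            yield line
        else:
            yield "chr" + line


# Illustrative data only (the contig length and the record are invented).
example = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr1,length=1000>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t123\t.\tA\tG\t.\t.\t.\n",
]
fixed = list(add_chr_prefix(example))
```

Running `contig_mismatches` on the original lines lists the undeclared names, and running it again on `fixed` should come back empty.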
Thanks so much for creating the gnomAD liftover files for T2T located here:
https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/variation/2022_10/vcf/2024_07/
I am not sure if this is the right place, though I have seen @nakib103 (who I think did this work) on this repo. Happy to move this wherever you think is more appropriate.
One of the things you want to do with gnomAD is look at the values on the site. If there is no dbSNP rs ID then this is very difficult with the T2T liftover, as you don't know the GRCh38 coordinates to use.
If possible, could you please add some kind of identifier? (Perhaps your liftover tool allows you to add the source coordinates as an INFO field.)
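To make the request concrete, here is a rough sketch of what carrying the source coordinates in INFO could look like. Everything named here is an assumption for illustration: the `GRCh38_SRC` key, the header line, and the helper function are hypothetical, and the coordinates in the usage are invented. The value uses the chrom-pos-ref-alt style of gnomAD variant IDs.

```python
# Hypothetical header line declaring the (made-up) INFO key.
SRC_HEADER = (
    '##INFO=<ID=GRCh38_SRC,Number=1,Type=String,'
    'Description="Source GRCh38 variant as chrom-pos-ref-alt">'
)


def tag_with_source(lifted_record, src_chrom, src_pos, src_ref, src_alt,
                    key="GRCh38_SRC"):
    """Append the source variant to the INFO column of a lifted VCF record."""
    cols = lifted_record.rstrip("\n").split("\t")
    tag = f"{key}={src_chrom}-{src_pos}-{src_ref}-{src_alt}"
    info = cols[7]  # INFO is the 8th VCF column
    # "." means "no INFO"; otherwise append with the ";" separator.
    cols[7] = tag if info in (".", "") else f"{info};{tag}"
    return "\t".join(cols)


# Invented coordinates, purely to show the shape of the result.
lifted = "chr1\t100\t.\tA\tG\t.\t.\tAF=0.01"
tagged = tag_with_source(lifted, "chr1", 200, "A", "G")
```

With something like this in the file, a record without an rsID could still be looked up on the gnomAD site from its source coordinates.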
Another reason I wanted the ID is to investigate the two GRCh38 variants that mapped to the same T2T coordinate. It is not unexpected that this happens (it's one of the reasons we want to upgrade to T2T!) but it's difficult to work out where the first variant (without a dbSNP rsID) came from:
Output:
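For anyone wanting to look for such collisions themselves, a minimal sketch (not the command used in the original post) is to group the lifted records by chromosome, position, REF and ALT and report any key seen more than once. The function name and the example records, including the rsID, are hypothetical.

```python
from collections import defaultdict


def find_collisions(vcf_lines):
    """Map (chrom, pos, ref, alt) -> list of IDs for keys seen more than once."""
    groups = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        groups[(chrom, int(pos), ref, alt)].append(vid)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Invented records: two collide at chr1:100 A>G, one of them without an rsID.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\t.\tAF=0.01\n",
    "chr1\t100\trs0000\tA\tG\t.\t.\tAF=0.02\n",
    "chr1\t250\t.\tC\tT\t.\t.\tAF=0.10\n",
]
collisions = find_collisions(records)
```

Without a source identifier, the entry whose ID is "." cannot be traced back to GRCh38, which is the problem described above.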