T2T-CHM13v2.0 gnomAD liftover VCF should contain gnomAD ID or a way to map back to 38 coordinates #1817

Open
davmlaw opened this issue Dec 18, 2024 · 4 comments

davmlaw commented Dec 18, 2024

Thanks so much for creating the gnomAD liftover files for T2T located here:

https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/variation/2022_10/vcf/2024_07/

I am not sure this is the right place to raise it, but I have seen @nakib103 (who I think did this work) on this repo. Happy to move the issue wherever you think is more appropriate.

One of the things you want to do with gnomAD is look at the values on the site. If there is no dbSNP rs ID then this is very difficult with the T2T liftover, as you don't know the GRCh38 coordinates to use.

If possible - could you please add some kind of identifier (perhaps your liftover tool allows you to add the source coordinates as an INFO field?)

Another reason I wanted the ID is to investigate the 2 GRCh38 variants that mapped to the same T2T coordinate. It is not unexpected that this happens (it's one of the reasons we want to upgrade to T2T!) but it's difficult to work out where the 1st variant (without a dbSNP rsID) came from:

tabix gnomad.exomes.v4.1.sites.GCA_009914755.4.trimmed_liftover.vcf.gz chr4:148869903-148869903

Output:

chr4	148869903	.	GC	G	.	PASS	AC=3;AN=1221986;AF=2.45502e-06;grpmax=nfe;fafmax_faf95_max=8.2e-07;fafmax_faf95_max_gen_anc=nfe;AC_XX=1;AF_XX=1.60411e-06;AN_XX=623398;nhomalt_XX=0;AC_XY=2;AF_XY=3.3412e-06;AN_XY=598588;nhomalt_XY=0;nhomalt=0;AC_afr=0;AF_afr=0;AN_afr=26218;nhomalt_afr=0;AC_amr=0;AF_amr=0;AN_amr=25976;nhomalt_amr=0;AC_asj=0;AF_asj=0;AN_asj=19166;nhomalt_asj=0;AC_eas=0;AF_eas=0;AN_eas=34596;nhomalt_eas=0;AC_fin=0;AF_fin=0;AN_fin=37502;nhomalt_fin=0;AC_mid=0;AF_mid=0;AN_mid=4130;nhomalt_mid=0;AC_nfe=3;AF_nfe=3.08507e-06;AN_nfe=972424;nhomalt_nfe=0;AC_raw=79;AF_raw=5.45803e-05;AN_raw=1447410;nhomalt_raw=0;AC_remaining=0;AF_remaining=0;AN_remaining=50660;nhomalt_remaining=0;AC_sas=0;AF_sas=0;AN_sas=51314;nhomalt_sas=0;AC_grpmax=3;AF_grpmax=3.08507e-06;AN_grpmax=972424;nhomalt_grpmax=0;fafmax_faf99_max=2.3e-07;fafmax_faf99_max_gen_anc=nfe;age_hist_het_bin_freq=0|0|0|0|0|0|1|0|0|0;age_hist_het_n_smaller=0;age_hist_het_n_larger=0;age_hist_hom_bin_freq=0|0|0|0|0|0|0|0|0|0;age_hist_hom_n_smaller=0;age_hist_hom_n_larger=0;AS_VQSLOD=3.1304;allele_type=del;n_alt_alleles=20;variant_type=mixed;was_mixed;lcr
chr4	148869903	rs1408958407	GC	G	.	PASS	AC=128;AN=1221882;AF=0.000104756;grpmax=sas;fafmax_faf95_max=0.00042005;fafmax_faf95_max_gen_anc=sas;AC_XX=60;AF_XX=9.62541e-05;AN_XX=623350;nhomalt_XX=0;AC_XY=68;AF_XY=0.000113611;AN_XY=598532;nhomalt_XY=0;nhomalt=0;AC_afr=0;AF_afr=0;AN_afr=26218;nhomalt_afr=0;AC_amr=7;AF_amr=0.000269521;AN_amr=25972;nhomalt_amr=0;AC_asj=5;AF_asj=0.000260879;AN_asj=19166;nhomalt_asj=0;AC_eas=6;AF_eas=0.000173451;AN_eas=34592;nhomalt_eas=0;AC_fin=1;AF_fin=2.66667e-05;AN_fin=37500;nhomalt_fin=0;AC_mid=0;AF_mid=0;AN_mid=4130;nhomalt_mid=0;AC_nfe=69;AF_nfe=7.09631e-05;AN_nfe=972336;nhomalt_nfe=0;AC_raw=590;AF_raw=0.000407625;AN_raw=1447410;nhomalt_raw=0;AC_remaining=10;AF_remaining=0.000197394;AN_remaining=50660;nhomalt_remaining=0;AC_sas=30;AF_sas=0.000584704;AN_sas=51308;nhomalt_sas=0;AC_grpmax=30;AF_grpmax=0.000584704;AN_grpmax=51308;nhomalt_grpmax=0;fafmax_faf99_max=0.00036491;fafmax_faf99_max_gen_anc=sas;age_hist_het_bin_freq=0|0|2|0|4|4|2|2|0|0;age_hist_het_n_smaller=3;age_hist_het_n_larger=0;age_hist_hom_bin_freq=0|0|0|0|0|0|0|0|0|0;age_hist_hom_n_smaller=0;age_hist_hom_n_larger=0;AS_VQSLOD=3.0278;allele_type=del;n_alt_alleles=20;variant_type=mixed;was_mixed;lcr
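
A minimal sketch of how collisions like the one above could be listed across a whole lifted-over file, assuming the file is read as plain tab-separated VCF text. The function name `find_colliding_records` is hypothetical, and the third example record is made up for illustration; only the first two are abbreviated from the tabix output above.

```python
from collections import defaultdict


def find_colliding_records(vcf_lines):
    """Group VCF data lines by (CHROM, POS, REF, ALT).

    Returns only the keys seen more than once, mapped to the ID column
    of each record that shares that key.
    """
    groups = defaultdict(list)
    for line in vcf_lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        groups[(chrom, int(pos), ref, alt)].append(vid)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


example = [
    "##fileformat=VCFv4.2",
    # Abbreviated from the two records shown above (INFO truncated).
    "chr4\t148869903\t.\tGC\tG\t.\tPASS\tAC=3",
    "chr4\t148869903\trs1408958407\tGC\tG\t.\tPASS\tAC=128",
    # Hypothetical record, present only to show a non-colliding site.
    "chr4\t148869910\t.\tA\tT\t.\tPASS\tAC=1",
]

collisions = find_colliding_records(example)
print(collisions)
```

This only reports that two source variants landed on the same T2T site. Without a source identifier in the file it still cannot say where on GRCh38 each one came from, which is the point of the request.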
@nakib103 nakib103 self-assigned this Dec 18, 2024
@nakib103
Contributor

Hi @davmlaw,

It is good to see you using the gnomAD frequency files for T2T. The issue you pointed out is interesting and correct. Many of the gnomAD v4.1 variants are not yet accessioned in dbSNP, which makes it hard to trace them back to their GRCh38 source.

I have created a ticket to update the T2T gnomAD file with a source identifier (likely CHROM:POS:REF:ALT) and will hopefully get to it in the new year. I will update here once the file is on the FTP site.

Best regards,
Nakib
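
A minimal sketch of how a CHROM:POS:REF:ALT source identifier could be attached to each GRCh38 record before liftover, so that it travels with the record into the T2T file. This is an assumption about one possible approach, not the method Ensembl uses. The INFO key `SRC_GRCH38` and the function name `tag_source_id` are hypothetical, and the example record is illustrative.

```python
# Hypothetical header line declaring the new INFO key.
SRC_HEADER = (
    '##INFO=<ID=SRC_GRCH38,Number=1,Type=String,'
    'Description="Source GRCh38 variant as CHROM:POS:REF:ALT">'
)


def tag_source_id(line, key="SRC_GRCH38"):
    """Append key=CHROM:POS:REF:ALT to the INFO column of one VCF data line.

    Header lines are returned unchanged. Assumes a biallelic record with
    at least 8 tab-separated columns (a comma in ALT would need splitting).
    """
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    chrom, pos, _vid, ref, alt = fields[:5]
    source = f"{chrom}:{pos}:{ref}:{alt}"
    info = fields[7]
    # "." means an empty INFO column, so replace it instead of appending.
    fields[7] = f"{key}={source}" if info in (".", "") else f"{info};{key}={source}"
    return "\t".join(fields)


# Illustrative GRCh38 record (values are not taken from gnomAD).
record = "chr1\t925946\t.\tC\tG\t.\tPASS\tAC=1"
print(tag_source_id(record))
```

Storing the identifier in INFO rather than in the ID column leaves existing rsIDs untouched, and most liftover tools carry INFO through unchanged, although that should be checked for the tool actually used.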

@davmlaw
Author

davmlaw commented Dec 19, 2024

Sweet, thanks. I am only just trying out T2T. While I'm here, I'll note a few annotation sources I've missed (I don't expect you to fix all of these, but FYI):

  • gnomAD Structural Variants (lifted over myself)
  • Repeats - UCSC provide repeat masker in BigBed (bb) which VEP doesn't support, I converted to bed.gz easily enough
  • Conservation - UCSC provide Cactus - https://hgdownload.soe.ucsc.edu/gbdb/hs1/ but it is in HAL format - though I assume I can probably convert that to bigWig somehow

@davmlaw
Author

davmlaw commented Dec 19, 2024

For the lifted over ClinVar from here:

https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/variation/2022_10/vcf/2024_07/

the contigs have "chr1" but the records have "1"

##contig=<ID=chr1,length=248387328,assembly=Homo_sapiens_gca009914755v4.T2T_CHM13_v2.dna.primary_assembly.fa.gz>
1       355579  1924157 C       G       .       .       ALLELEID=1983057;CLNDISDB=MedGen:C3661900;CLNDN=not_provided;CLNHGVS=NC_000001.11:g.925946C>G;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Uncertain_significance;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=SAMD11:148398;MC=SO:0001583|missense_variant;ORIGIN=1

@nakib103
Contributor

Thanks for reporting the issue in the ClinVar file. I need to see how the files were generated to find what caused the mismatch, but a simple reheader would probably fix it. I have added a note to the existing ticket.
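
A minimal sketch of how the contig mismatch could be detected, assuming the VCF is read as plain text. It compares the `##contig` IDs declared in the header with the CHROM values used in the records and suggests a rename where adding a `chr` prefix would reconcile them. The function name `suggest_renames` is hypothetical; the example lines are abbreviated from the ClinVar header and record quoted above.

```python
import re

# Captures the ID from lines such as ##contig=<ID=chr1,length=...>
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def suggest_renames(vcf_lines, prefix="chr"):
    """Map each record CHROM missing from the header to prefix + CHROM.

    A name is only included when the prefixed form is declared in the
    header, so unrelated contigs are never guessed at.
    """
    declared, used = set(), set()
    for line in vcf_lines:
        match = CONTIG_ID.match(line)
        if match:
            declared.add(match.group(1))
        elif line.strip() and not line.startswith("#"):
            used.add(line.split("\t", 1)[0])
    return {
        name: prefix + name
        for name in sorted(used - declared)
        if prefix + name in declared
    }


example = [
    "##contig=<ID=chr1,length=248387328>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t355579\t1924157\tC\tG\t.\t.\tALLELEID=1983057",
]
print(suggest_renames(example))
```

A mapping like this could then drive whichever fix is chosen, for example renaming the record chromosomes with `bcftools annotate --rename-chrs`, or rewriting the header contigs instead. Which side to change depends on the naming the other T2T files in the directory use.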
