Filter a gene list including intergenic regions #2319

Open
ccbruels opened this issue Nov 12, 2024 · 3 comments

@ccbruels

Hi,

I see how to filter a gene list for most snv/indels in issue Filter a gene list #1964.

However, I want to look at intergenic variants as well. Annovar includes other info in the Gene.refGene field like
Gene.refGene=FAM138A\x3bOR4F5

If my genes.txt file only contains FAM138A, the intergenic variants are not included.

I'm using bcftools v1.21. My command is in the format bcftools view -i '[email protected] ' file.vcf

Including wildcards in the command or in the genes.txt file didn't work.

Do you have any suggestions?
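
For illustration only, here is a minimal Python sketch of why a list-based match could miss such a record, assuming each `Gene.refGene` value is compared as a whole string against the entries of the gene list. The `wanted` set and the `gene_names` helper are hypothetical and are not part of bcftools or Annovar.

```python
# Annovar writes the literal four-character text \x3b in place of ';',
# so the field holds one string, not two separate gene names.
SEP = r"\x3b"


def gene_names(field_value):
    """Split a Gene.refGene value into its individual gene names."""
    return field_value.split(SEP)


wanted = {"FAM138A"}         # hypothetical contents of the gene list
value = r"FAM138A\x3bOR4F5"  # Gene.refGene value quoted in this issue

# Whole-string comparison: the combined value is not an entry of the list.
print(value in wanted)

# Per-gene comparison: split first, then test each name.
print(any(g in wanted for g in gene_names(value)))
```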

@pd3
Member

pd3 commented Nov 18, 2024

The problem is somewhat confusing as it is stated: you say you want to filter in intergenic regions but the example you gave seems unrelated. Instead, it seems the variant is in two overlapping genes (here FAM138A and OR4F5) and the problem is that matching by gene name does not work for these records.
So I am unsure what it is you want.

@ccbruels
Author

Perhaps I picked a bad example. That variant was tagged as intergenic by annovar, but I did not look at it in a genome browser.

Looking at another clearly intergenic variant, here is the annovar vcf output
chr1 3439841 . A C 31.76 PASS P;ANNOVAR_DATE=2020-06-08;Func.refGene=intergenic;Gene.refGene=PRDM16\x3bARHGEF16;GeneDetail.refGene=dist\x3d1220\x3bdist\x3d14824;ExonicFunc.refGene=.;AAChange.refGene=.;Xref.refGene=.;avsnp151=rs2483250;gnomad41_genome_AF=0.8166;gnomad41_genome_AF_raw=0.8160;CLNSIG=.

My question is: how would I filter for this variant if I am looking for variants flagged as intergenic, but specifically variants that might affect ARHGEF16? I have a very large list of genes, and it would be difficult to correctly list all of the possible variations if I want to find intergenic variants near it.
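
To make the desired behaviour concrete, here is a rough Python sketch of the filter described above, written outside bcftools: keep a record if it is flagged intergenic and either flanking gene is in the gene set. The function names are hypothetical, the sample record is a shortened copy of the one above (INFO trimmed to a few keys), and the split on the literal text `\x3b` assumes that is how Annovar separates the two flanking genes.

```python
SEP = r"\x3b"  # literal text Annovar writes in place of ';'


def parse_info(info):
    """Turn a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, eq, val = item.partition("=")
        fields[key] = val if eq else True
    return fields


def intergenic_near(vcf_line, genes):
    """Return True if the record is intergenic and a flanking gene is in `genes`."""
    info = parse_info(vcf_line.rstrip("\n").split("\t")[7])
    if info.get("Func.refGene") != "intergenic":
        return False
    flanking = str(info.get("Gene.refGene", "")).split(SEP)
    return any(g in genes for g in flanking)


# Shortened version of the record from this comment, tab-separated as in a VCF.
record = "\t".join([
    "chr1", "3439841", ".", "A", "C", "31.76", "PASS",
    r"Func.refGene=intergenic;Gene.refGene=PRDM16\x3bARHGEF16;"
    r"GeneDetail.refGene=dist\x3d1220\x3bdist\x3d14824",
])

print(intergenic_near(record, {"ARHGEF16"}))
```

With a large gene list, the set could be loaded from the list file once and the function applied to every non-header line of the VCF.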

@pd3
Member

pd3 commented Dec 1, 2024

The question seems focused on the variant being intergenic. I am sorry but I still don't understand what is not working for you exactly. Can you provide a small test case, a VCF with full header, the gene list you are using, the command which is not working for you, and the output you expect?

I see the VCF has the Func.refGene=intergenic field, and also the gene name. I would expect it to be possible to combine the two in the filtering expression as -i 'Func.refGene="intergenic" && Gene.refGene="PRDM16\x3bARHGEF16"'. I am wondering if the \x3b part is maybe the cause of the problems? It would help to provide more information, as indicated above.
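
As a sketch of how that expression could be assembled without shell-quoting surprises, the snippet below builds the argument list in Python. The `build_command` helper and the `file.vcf` path are hypothetical. Whether bcftools matches the literal `\x3b` in the value is the open question here, so this only shows how to pass the expression through unchanged, not that it will match.

```python
import shlex


def build_command(vcf_path, func, gene_value):
    """Build a `bcftools view -i EXPR FILE` argument list (hypothetical helper)."""
    expr = f'Func.refGene="{func}" && Gene.refGene="{gene_value}"'
    return ["bcftools", "view", "-i", expr, vcf_path]


# The raw string keeps \x3b as literal text, exactly as it appears in the VCF.
cmd = build_command("file.vcf", "intergenic", r"PRDM16\x3bARHGEF16")

# Show a shell-quoted form for copy and paste; running it requires bcftools.
print(shlex.join(cmd))
```

Passing `cmd` as a list to `subprocess.run` would hand the expression to bcftools without any interpretation by a shell.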
