macarthur-lab/gene_lists

List of gene lists

Often in bioinformatics we want a list of genes so that we can ask, "are genes in this list more X than other genes?" or "are genes in this list enriched in this other list?" and so on. There are many useful lists out there, but many of them are in an Excel file supplement to a paper, or an XML format with loads of other info you don't need, or use outdated gene symbols. For one reason or another, it often takes a lot of work to wrestle them into a format you can use. This repository is the MacArthur Lab's effort to collect all the lists we find useful into one place, with each formatted as just a single-column text file listing the current gene symbols.
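As a rough illustration of the "are genes in this list enriched in this other list?" question, here is a minimal Python sketch that reads single-column gene list files and runs a one-sided hypergeometric test against a background set. The function names `read_gene_list` and `enrichment_p_value` are hypothetical and not part of this repository, and the choice of test is an assumption; other enrichment statistics may suit your question better.

```python
from math import comb
from pathlib import Path


def read_gene_list(path):
    """Read a single-column text file into a set of gene symbols."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def enrichment_p_value(universe, list_a, list_b):
    """Return (overlap, p) where p = P(overlap >= observed) under a
    hypergeometric null: list_b drawn at random from the universe."""
    a = list_a & universe  # restrict both lists to the background set
    b = list_b & universe
    n_universe, n_a, n_b = len(universe), len(a), len(b)
    observed = len(a & b)
    total = comb(n_universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(observed, min(n_a, n_b) + 1)
    )
    return observed, tail / total


# Toy demonstration with made-up symbols (not real gene lists).
toy_universe = {f"G{i}" for i in range(10)}
overlap, p = enrichment_p_value(toy_universe, {"G0", "G1", "G2"}, {"G0", "G1", "G5"})
print(overlap, round(p, 4))
```

With real data you would pass the sets returned by `read_gene_list` for the universe file and for the two lists you want to compare.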

Here is a guide to the lists we currently have in this repo:

| List | Count | Description | Please cite |
| --- | --- | --- | --- |
| Universe | 19,194 | Approved symbols for 18,991 protein-coding genes according to HGNC as of Feb 9, 2015. For details see src/create_universe.bash. This list is the "universe" of which all subsequent lists are subsets. | See genenames.org/about/overview. Users are asked to web reference "HUGO Gene Nomenclature Committee at the European Bioinformatics Institute" (http://www.genenames.org/) if possible. |
| FDA-approved drug targets | 385 | Genes whose protein products are known to be the mechanistic targets of FDA-approved drugs (updated 2018-09-13). For details on the exact criteria we used for inclusion in this list, see src/drug_targets.py. | See drugbank.ca/about. Please cite [Law 2014, Knox 2011, Wishart 2008, Wishart 2006, and/or Wishart 2018]. |
| Drug targets by Nelson et al 2012 | 201 | Drug targets according to Nelson et al 2012, with reference to Russ & Lampel 2005. | [Nelson 2012, Russ & Lampel 2005] |
| Autosomal dominant genes by Blekhman et al 2008 | 307 | OMIM disease genes deemed to follow autosomal dominant inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
| Autosomal dominant genes by Berg et al 2013 | 631 | OMIM disease genes (as of June 2011) deemed to follow autosomal dominant inheritance according to Berg et al, 2013. | [Berg 2013] |
| Autosomal recessive genes by Blekhman et al 2008 | 527 | OMIM disease genes deemed to follow autosomal recessive inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
| Autosomal recessive genes by Berg et al 2013 | 1,073 | OMIM disease genes (as of June 2011) deemed to follow autosomal recessive inheritance according to Berg et al, 2013. | [Berg 2013] |
| X-linked genes by Blekhman et al 2008 | 66 | OMIM disease genes deemed to follow X-linked inheritance (dominant/recessive not specified) according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
| X-linked recessive genes by Berg et al 2013 | 102 | OMIM disease genes (as of June 2011) deemed to follow X-linked recessive inheritance according to Berg et al, 2013. | [Berg 2013] |
| X-linked dominant genes by Berg et al 2013 | 34 | OMIM disease genes (as of June 2011) deemed to follow X-linked dominant inheritance according to Berg et al, 2013. | [Berg 2013] |
| X-linked ClinVar genes | 61 | X chromosome genes in the August 6, 2015 ClinVar release that have at least 3 reportedly pathogenic, non-conflicted variants in ClinVar with at least one submitter other than OMIM or GeneReviews. Code here. | Cite the ClinVar paper [Landrum 2014] |
| All dominant genes | 709 | Currently the union of the Berg and Blekhman dominant lists; we may add more lists later. | [Blekhman 2008, Berg 2013] |
| All recessive genes | 1,183 | Currently the union of the Berg and Blekhman recessive lists; we may add more lists later. | [Blekhman 2008, Berg 2013] |
| Homozygous LoF tolerant | 330 | Genes with at least two different high-confidence LoF variants found in a homozygous state in at least one individual in ExAC. By Konrad Karczewski. | Just cite the ExAC paper [Lek 2016] |
| Essential in culture | 283 | Genes deemed essential in multiple cultured cell lines based on shRNA screen data. | [Hart 2014] |
| Essential in culture (CRISPR screening) | 683 | Genes deemed essential in multiple cultured cell lines based on CRISPR/Cas screen data. | [Hart 2017] |
| Non-essential in culture (CRISPR screening) | 913 | Genes deemed non-essential in multiple cultured cell lines based on CRISPR/Cas screen data. | [Hart 2017] |
| Essential in mice | 2,454 | Genes where homozygous knockout in mice results in pre-, peri- or post-natal lethality. The mouse phenotypes were reported by Jackson Labs [Blake 2011], then the essential gene list was extracted via manual review of phenotypes by [Georgi 2013], and the essential/non-essential flag was put into dbNSFP [Liu 2013]. We extracted the genes from dbNSFP. | [Blake 2011, Georgi 2013, and Liu 2013] |
| Genes nearest to GWAS peaks | 6,336 | Closest gene to GWAS hits with P < 5e-8 in the NHGRI GWAS catalog (MAPPED_GENE column) as of Sep 13, 2018. | [MacArthur 2017] |
| DNA Repair Genes, WoodRD | 178 | An updated inventory of human DNA repair genes (last modified on Tuesday 15th April 2014). For details see src/DRG_WoodRD.R. | Cite [Wood 2005] and include a web reference to this URL. |
| DNA Repair Genes, KangJ | 151 | Supplementary Table 1: 151 DNA repair genes from DNA repair pathways: ATM, BER, FA/HR, MMR, NHEJ, NER, TLS, XLR, RECQ, and other. | Cite [Kang 2012] |
| ClinGen haploinsufficient genes | 294 | Genes with sufficient evidence for dosage pathogenicity (level 3) as determined by the ClinGen Dosage Sensitivity Map as of Sep 13, 2018. | Cite [Rehm 2015]. See also ClinGen's TOU. |
| Olfactory receptors | 371 | Olfactory receptors from the Mainland 2015 data release. | [Mainland 2015] |
| Genes with any disease association reported in ClinVar | 3,078 | Using this simple script, we downloaded the ClinVar tab-delimited summary as of May 12, 2015, and took all gene symbols for which there is at least one variant with an assertion of pathogenic or likely pathogenic in ClinVar. | Cite the ClinVar paper [Landrum 2014] |
| Kinases | 347 | From UniProt's pkinfam list. | [UniProt Consortium 2018]; according to UniProt, this list is also based on 3 publications: [Hunter 2000, Manning 2002, Miranda-Saavedra & Barton 2007] |
| GPCRs from guidetopharmacology | 391 | GPCR list from guidetopharmacology.org. | Citing instructions here; for GPCRs, cite [Alexander 2017 & Harding 2018]. |
| GPCRs from UniProt | 756 | This query of the UniProt database. | [UniProt Consortium 2018] |
| GPCRs all | 759 | Union of the above two lists. | See previous two entries |
| Natural product targets | 37 | List of hand-curated targets of natural products from the supplement of [Dancik 2010]. | [Dancik 2010] |
| BROCA - Cancer Risk Panel | 66 | BROCA is useful for the evaluation of patients with a suspected hereditary cancer predisposition, with a focus on syndromes that include breast or ovarian cancer as one of the cancer types. Depending on the causative gene involved, these cancers may co-occur with other cancer types (such as colorectal, endometrial, pancreatic, endocrine, or melanoma). | University of Washington |
| ACMG V2.0 | 59 | The minimum list of genes to be reported as incidental or secondary findings as published by the American College of Medical Genetics and Genomics (ACMG). | [Kalia 2017] |
| GPI-anchored proteins | 135 | Gene symbols encoding proteins annotated by UniProt as being GPI-anchored. | Cite the latest UniProt paper: [UniProt Consortium 2017] |
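Several entries in the guide are described as unions of other lists (for example "All dominant genes" and "GPCRs all"), and every list is meant to be a subset of the universe. The sketch below shows one way such a combined list could be built and checked in Python. It is an illustration under assumptions, not the code used in this repository: the helper names and the file name in the demo are hypothetical, and reporting out-of-universe symbols rather than dropping them is a design choice made here for transparency.

```python
import tempfile
from pathlib import Path


def read_gene_list(path):
    """Read a single-column text file into a set of gene symbols."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def write_gene_list(genes, path):
    """Write gene symbols one per line, sorted for stable diffs."""
    Path(path).write_text("\n".join(sorted(genes)) + "\n")


def union_with_universe_check(universe, *gene_lists):
    """Return (union of all lists, symbols in the union missing from universe)."""
    combined = set().union(*gene_lists)
    outside = combined - universe
    return combined, outside


# Toy demonstration with made-up symbols and a temporary file.
toy_universe = {"A", "B", "C", "D"}
combined, outside = union_with_universe_check(
    toy_universe, {"A", "B"}, {"B", "C", "X"}
)
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "combined_example.tsv"  # hypothetical file name
    write_gene_list(combined, out_path)
    round_trip = read_gene_list(out_path)
print(sorted(combined), sorted(outside), round_trip == combined)
```

A non-empty second return value usually points to outdated or non-protein-coding symbols that need updating before the list can be treated as a subset of the universe.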

We welcome pull requests for adding additional lists, provided they are licensed for redistribution. If possible, please provide the source code used to extract the list from its original source, and an appropriate description for this readme.