Often in bioinformatics we want a list of genes so that we can ask, "are genes in this list more X than other genes?" or "are genes in this list enriched in this other list?" and so on. There are many useful lists out there, but many of them are buried in an Excel supplement to a paper, stored in an XML format with loads of other info you don't need, or based on outdated gene symbols. For one reason or another, it often takes a lot of work to wrestle them into a format you can use. This repository is the MacArthur Lab's effort to collect all the lists we find useful into one place, with each formatted as just a single-column text file listing the current gene symbols.
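As a rough illustration of the "enriched in this other list" question, here is a minimal sketch of a one-sided hypergeometric test using only the Python standard library. It assumes each list is a plain text file with one gene symbol per line, as described above. The function names (`read_gene_list`, `enrichment_p_value`) and the toy gene names are hypothetical and are not part of this repository.

```python
from math import comb
from pathlib import Path


def read_gene_list(path):
    """Read a single-column gene list file into a set of symbols."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def enrichment_p_value(query, gene_list, universe):
    """Return (overlap, p) where p = P(overlap >= observed) under a
    hypergeometric null: the query is a random draw from the universe."""
    # Restrict both sets to the universe so the counts are comparable.
    query = query & universe
    gene_list = gene_list & universe
    n_universe = len(universe)
    n_list = len(gene_list)
    n_query = len(query)
    overlap = len(query & gene_list)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_list, i) * comb(n_universe - n_list, n_query - i)
        for i in range(overlap, min(n_list, n_query) + 1)
    )
    return overlap, tail / total


# Toy in-memory example (made-up symbols, not real data).
universe = {f"G{i}" for i in range(1, 11)}
gene_list = {"G1", "G2", "G3"}
query = {"G1", "G2", "G4"}
overlap, p = enrichment_p_value(query, gene_list, universe)
print(overlap, round(p, 4))
```

With real data you would call `read_gene_list` on the universe file and on the list of interest, substituting whatever file paths they have in your checkout. The test is one-sided (enrichment only); a two-sided Fisher's exact test from a statistics package is a reasonable alternative.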
Here is a guide to the lists we currently have in this repo:
List | Count | Description | Please cite |
---|---|---|---|
Universe | 18,991 | Approved symbols for 18,991 protein-coding genes according to HGNC as of Feb 9, 2015. For details see src/create_universe.bash. This list is the "universe" of which all subsequent lists are subsets. | See genenames.org/about/overview. Users are asked to web reference "HUGO Gene Nomenclature Committee at the European Bioinformatics Institute" (http://www.genenames.org/) if possible. |
FDA-approved drug targets | 286 | Genes whose protein products are known to be the mechanistic targets of FDA-approved drugs. For details on the exact criteria we used for inclusion in this list, see src/drug_targets.py | See drugbank.ca/about. Please cite [Law 2014, Knox 2011, Wishart 2008 and/or Wishart 2006]. |
Drug targets by Nelson et al 2012 | 201 | Drug targets according to Nelson et al 2012, with reference to Russ & Lampel 2005. | [Nelson 2012, Russ & Lampel 2005] |
Autosomal dominant genes by Blekhman et al 2008 | 307 | OMIM disease genes deemed to follow autosomal dominant inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
Autosomal dominant genes by Berg et al 2013 | 631 | OMIM disease genes (as of June 2011) deemed to follow autosomal dominant inheritance according to Berg et al 2013. | [Berg 2013] |
Autosomal recessive genes by Blekhman et al 2008 | 529 | OMIM disease genes deemed to follow autosomal recessive inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
Autosomal recessive genes by Berg et al 2013 | 1,073 | OMIM disease genes (as of June 2011) deemed to follow autosomal recessive inheritance according to Berg et al 2013. | [Berg 2013] |
X-linked genes by Blekhman et al 2008 | 66 | OMIM disease genes deemed to follow X-linked inheritance (dominant/recessive not specified) according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
X-linked recessive genes by Berg et al 2013 | 102 | OMIM disease genes (as of June 2011) deemed to follow X-linked recessive inheritance according to Berg et al 2013. | [Berg 2013] |
X-linked dominant genes by Berg et al 2013 | 34 | OMIM disease genes (as of June 2011) deemed to follow X-linked dominant inheritance according to Berg et al 2013. | [Berg 2013] |
X-linked ClinVar genes | 61 | X chromosome genes in the August 6, 2015 ClinVar release that have at least 3 reportedly pathogenic, non-conflicted variants in ClinVar with at least one submitter other than OMIM or GeneReviews. Code here. | Cite the ClinVar paper [Landrum 2014] |
All dominant genes | 709 | Currently the union of the Berg and Blekhman dominant lists; we may add more lists later. | [Blekhman 2008, Berg 2013] |
All recessive genes | 1,183 | Currently the union of the Berg and Blekhman recessive lists; we may add more lists later. | [Blekhman 2008, Berg 2013] |
Essential in culture | 285 | Genes deemed essential in multiple cultured cell lines based on shRNA screen data | [Hart 2014] |
Essential in mice | 2,454 | Genes where homozygous knockout in mice results in pre-, peri- or post-natal lethality. The mouse phenotypes were reported by Jackson Labs [Blake 2011], then the essential gene list was extracted via manual review of phenotypes by [Georgi 2013], and the essential/non-essential flag was put into dbNSFP [Liu 2013]. We extracted the genes from dbNSFP. | [Blake 2011, Georgi 2013, and Liu 2013] |
Genes nearest to GWAS peaks | 3,762 | Closest gene 3' and 5' of GWAS hits in the NHGRI GWAS catalog as of Feb 9, 2015 | See instructions here. Cite [Welter 2014] and include a web reference to genome.gov/gwastudies/. |
DNA Repair Genes, WoodRD | 178 | An updated inventory of human DNA repair genes. (Last modified on Tuesday 15th April 2014). For details see src/DRG_WoodRD.R | Cite [Wood 2005] and include a web reference to this URL. |
DNA Repair Genes, KangJ | 151 | The 151 DNA repair genes listed in Supplementary Table 1 of [Kang 2012], drawn from the following DNA repair pathways: ATM, BER, FA/HR, MMR, NHEJ, NER, TLS, XLR, RECQ, and other. | Cite [Kang 2012] |
ClinGen haploinsufficient genes | 221 | Genes with sufficient evidence for dosage pathogenicity (level 3) as determined by the ClinGen Dosage Sensitivity Map as of Feb 27, 2015 | See ClinGen |
Olfactory receptors | 371 | Olfactory receptors from the Mainland 2015 data release | [Mainland 2015] |
Genes with any disease association reported in ClinVar | 3,078 | Using this simple script, we downloaded the ClinVar tab-delimited summary as of May 12, 2015, and took all gene symbols for which there is at least one variant with an assertion of pathogenic or likely pathogenic in ClinVar. | Cite the ClinVar paper [Landrum 2014] |
Kinases | 351 | From UniProt's pkinfam list | According to UniProt this list is based on 3 publications: [Hunter 2000, Manning 2002, Miranda-Saavedra & Barton 2007] |
GPCRs | 1,705 | GPCR list from guidetopharmacology.org | Please read citing instructions here and at a minimum, cite [Pawson 2014]. |
Natural product targets | 37 | List of hand-curated targets of natural products from supplement of [Dancik 2010] | [Dancik 2010] |
We welcome pull requests for adding additional lists, provided they are licensed for redistribution. If possible, please provide the source code used to extract the list from its original source, and an appropriate description for this readme.
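
Before opening a pull request, it can help to check a candidate list against the universe, since every list here is meant to be a subset of it. The sketch below is one possible way to do that and is not the procedure used by the scripts in `src/`. The function name `check_candidate_list` and the example symbols are illustrative, and sorting the output is only an assumption made to keep diffs stable.

```python
def check_candidate_list(symbols, universe):
    """Split candidate symbols into (accepted, rejected) lists.

    accepted: unique symbols found in the universe.
    rejected: unique symbols not found, e.g. outdated or non-coding
              symbols that may need manual mapping to current HGNC names.
    """
    seen = set()
    accepted, rejected = [], []
    for raw in symbols:
        symbol = raw.strip()
        if not symbol or symbol in seen:
            continue  # skip blank lines and duplicates
        seen.add(symbol)
        if symbol in universe:
            accepted.append(symbol)
        else:
            rejected.append(symbol)
    return sorted(accepted), sorted(rejected)


# Toy example with a tiny stand-in universe.
toy_universe = {"BRCA1", "TP53", "PRNP"}
candidates = ["TP53", " BRCA1 ", "TP53", "", "NOTAGENE"]
accepted, rejected = check_candidate_list(candidates, toy_universe)
print(accepted)
print(rejected)
```

Anything that lands in `rejected` is worth a manual look rather than silent removal, because it may simply be an older alias of a gene that is in the universe under its current symbol.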