Ewoud Ewing edited this page Oct 15, 2020 · 27 revisions

R package GeneSetCluster

Overview

This package is meant for clustering together Gene-Sets from pathway tools such as Ingenuity Pathway Analysis (IPA), GREAT, GSEA and others. When such tools are run, many Gene-Sets often appear significant, each with a label representing its biological function.

However, it can become difficult to interpret the output of these tools when data from several experiments are compared. Moreover, the output data has several limitations:

  1. Low ratio: where there are only a few genes enriched.
  2. High similarity: where different gene sets have the same genes enriched despite their different labels.
  3. Low overlap: where the same gene set labels appear in different experimental settings but different genes are enriched.
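The three situations above can be made concrete with basic set operations. The Python sketch below is only an illustration (GeneSetCluster itself is an R package, and none of its functions are used here): it flags a low ratio when few of a pathway's genes are enriched, high similarity when differently labelled sets share most of their genes (Jaccard index), and low overlap when the same label from different experiments shares few genes. The toy gene sets, pathway sizes and all three thresholds are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared genes divided by all genes in either set (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_limitations(results, pathway_size, low_ratio=0.1, high_sim=0.8, low_overlap=0.2):
    """Return (kind, key1, key2) tuples; key2 is None for per-set flags.

    results maps (experiment, label) -> set of enriched genes.
    All thresholds are hypothetical defaults chosen for illustration.
    """
    flags = []
    # Low ratio: only a small fraction of the pathway's genes are enriched.
    for key, genes in results.items():
        if len(genes) / pathway_size[key[1]] < low_ratio:
            flags.append(("low ratio", key, None))
    for (k1, g1), (k2, g2) in combinations(results.items(), 2):
        score = jaccard(g1, g2)
        # High similarity: different labels, nearly the same enriched genes.
        if k1[1] != k2[1] and score >= high_sim:
            flags.append(("high similarity", k1, k2))
        # Low overlap: same label in different experiments, different genes.
        elif k1[1] == k2[1] and k1[0] != k2[0] and score <= low_overlap:
            flags.append(("low overlap", k1, k2))
    return flags


# Hypothetical toy input: two experiments, two pathway labels.
results = {
    ("WT", "Pathway A"): {"g1", "g2", "g3"},
    ("WT", "Pathway B"): {"g1", "g2", "g3"},
    ("KO", "Pathway A"): {"g7", "g8"},
}
pathway_size = {"Pathway A": 40, "Pathway B": 10}

for flag in flag_limitations(results, pathway_size):
    print(flag)
```

Here "Pathway A" is flagged for a low ratio in both experiments, "Pathway A" and "Pathway B" in WT are flagged as highly similar, and "Pathway A" in WT versus KO is flagged for low overlap.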

So it's better to review these sets of genes together instead of investigating the many different Gene-Sets individually. This package does this by taking the genes of every Gene-Set and calculating a pairwise distance score between them: the greater the overlap between two Gene-Sets, the higher the score. This score is then used to cluster the Gene-Sets together. By default, the package uses the relative risk (RR) as the distance.
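The idea of scoring gene overlap with a relative risk and then clustering can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the package's R implementation: the relative risk here is the standard 2x2-table definition (the chance that a gene is in set B given it is in set A, divided by the chance given it is not in A, over a universe of N genes), the two directions are averaged to get a symmetric score, and sets are grouped by single linkage above a hypothetical threshold rather than by whichever clustering method the package applies. Gene names, N and the threshold are made up.

```python
from itertools import combinations


def relative_risk(a_set, b_set, n_universe):
    """Standard 2x2 relative risk: P(gene in B | in A) / P(gene in B | not in A).

    n_universe is the total number of genes considered (an assumed background size).
    """
    shared = len(a_set & b_set)
    if shared == 0:
        return 0.0
    only_b = len(b_set - a_set)
    if only_b == 0:
        return float("inf")  # every gene in B is also in A
    return (shared / len(a_set)) / (only_b / (n_universe - len(a_set)))


def symmetric_rr(a_set, b_set, n_universe):
    """Average both directions, since the 2x2 relative risk is not symmetric."""
    return (relative_risk(a_set, b_set, n_universe)
            + relative_risk(b_set, a_set, n_universe)) / 2


def cluster_gene_sets(gene_sets, n_universe, threshold):
    """Single-linkage grouping: join any two sets whose score reaches threshold."""
    names = list(gene_sets)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if symmetric_rr(gene_sets[a], gene_sets[b], n_universe) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Hypothetical gene sets, background size and threshold.
gene_sets = {
    "set1": {"g1", "g2", "g3", "g4"},
    "set2": {"g2", "g3", "g4", "g5"},
    "set3": {"g10", "g11", "g12"},
    "set4": {"g11", "g12", "g13"},
}
clusters = cluster_gene_sets(gene_sets, n_universe=1000, threshold=10)
print(clusters)  # [['set1', 'set2'], ['set3', 'set4']]
```

Because this relative risk grows with the size of the gene universe, a threshold chosen for one N does not transfer directly to another background.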

This tool has been used in several scientific publications: Citation list

Example data

Example data is taken from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111385. The data are DESeq2-analysed transcriptome sequencing of WT and conditional Tgfbr2-knockout microglia and CNS-repopulating monocyte-derived macrophages from C57BL/6 mice, in triplicate. Genes were selected by comparing microglia (uG) vs macrophages (Mac) in both wild type (WT) and Tgfbr2 knockout (KO), with a p-value cutoff of 1E-06.
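Selecting genes with a p-value cutoff, as done for the example data, might look like the Python sketch below. It assumes a table written to CSV from DESeq2's `results()` output (columns baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, with gene IDs in the first column); the rows are invented, and the column to filter on is a parameter because raw and adjusted p-values can give different gene lists.

```python
import csv
import io


def significant_genes(deseq2_csv, cutoff=1e-6, column="pvalue"):
    """Return gene IDs whose value in `column` is below `cutoff`.

    Assumes the gene ID is in the first column; rows with an NA value
    (which DESeq2 can report) are skipped.
    """
    reader = csv.DictReader(io.StringIO(deseq2_csv))
    id_col = reader.fieldnames[0]
    genes = []
    for row in reader:
        value = row[column]
        if value in ("", "NA"):
            continue
        if float(value) < cutoff:
            genes.append(row[id_col])
    return genes


# Invented rows in the shape of a DESeq2 results table.
example = """,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
geneA,500,4.1,0.3,13.6,2e-40,1e-36
geneB,80,0.2,0.4,0.5,0.61,0.9
geneC,120,-2.5,0.5,-5.2,3e-07,1e-05
geneD,0,NA,NA,NA,NA,NA
"""

print(significant_genes(example))                 # ['geneA', 'geneC']
print(significant_genes(example, column="padj"))  # ['geneA']
```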

Significant genes were analysed in both IPA and GREAT. For IPA, canonical pathways and functional annotations were exported into an Excel file with default settings. For GREAT, the analysis was run both with and without a background derived from the sequencing samples, and the results were exported in TSV format.
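Before clustering, each exported table has to be turned into a mapping from Gene-Set label to its genes. A minimal Python sketch for a tab-separated export is below; the column names `Term` and `Genes` and the comma separator are hypothetical placeholders, not the actual IPA or GREAT headers, so they would need to be adjusted to the real files.

```python
import csv
import io


def read_gene_sets(tsv_text, label_col="Term", genes_col="Genes", sep=","):
    """Map each gene-set label to the set of genes listed for it.

    label_col, genes_col and sep are hypothetical defaults; change them to
    match the headers and gene separator of the actual export.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    gene_sets = {}
    for row in reader:
        # Split the gene list and drop surrounding whitespace and empty entries.
        genes = {g.strip() for g in row[genes_col].split(sep) if g.strip()}
        gene_sets[row[label_col]] = genes
    return gene_sets


# Invented two-row export.
example = "Term\tPValue\tGenes\nterm one\t1e-5\tg1,g2,g3\nterm two\t0.002\tg2, g4\n"
sets = read_gene_sets(example)
print(sorted(sets["term two"]))  # ['g2', 'g4']
```

Reading one such export per experiment (for example WT and KO) gives the per-experiment gene sets that are compared and clustered.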

For a basic script to analyse the data, see: Example Script

For a video user guide, check this link: Youtube

Tutorial

Pipeline


Pages

Step 1A Loading the data

Step 1B Creating an Object

Step 2 Combine and Cluster

Step 2B User supplied distance function

Step 2C Highlighting-Genes

Step 3 Exporting Data

Step 4 Functional Investigation

Contact

e-mail: [email protected]

Citation

Ewing, E., Planell-Picola, N., Jagodic, M. et al. GeneSetCluster: a tool for summarizing and integrating gene-set analysis results. BMC Bioinformatics 21, 443 (2020). https://doi.org/10.1186/s12859-020-03784-z