Workflows: panoply_ssgsea_workflow
This workflow performs single sample Gene Set Enrichment Analysis (ssGSEA) or PTM-Signature Enrichment Analysis (PTM-SEA) on each column of the input data matrix (panoply_ssgsea). The module also generates an interactive report. Optionally, data can be preprocessed with panoply_preprocess_gct to collapse from protein/peptide-level to gene-centric or site-centric. This workflow executes the following modules:
module | description |
---|---|
panoply_preprocess_gct | creates a gene-centric or site-centric GCT file suitable for ssGSEA or PTM-SEA |
panoply_ssgsea | performs single sample Gene Set Enrichment Analysis (ssGSEA) or PTM-Signature Enrichment Analysis (PTM-SEA) |
panoply_ssgsea_report | creates an interactive R Markdown report of the ssGSEA results |
The panoply_preprocess_gct module collapses a feature-level GCT to a gene-centric (for ssGSEA) or site-centric (for PTM-SEA) GCT. It can optionally be run before the panoply_ssgsea module by toggling the `preprocess_gct` parameter, to collapse a GCT with redundant features into an appropriately formatted input file.
The panoply_ssgsea module performs single sample Gene Set Enrichment Analysis (ssGSEA) or PTM-Signature Enrichment Analysis (PTM-SEA) [1] on each column of the input data matrix. It is based on the implementation available at the ssGSEA2.0 GitHub repository.
This is an updated version of the original ssGSEA [2,3] R implementation. Depending on the input dataset and chosen database (gene sets or PTM signatures), the software performs either ssGSEA or PTM-SEA, respectively. The Molecular Signatures Database (MSigDB) [4] provides a large collection of curated gene sets. Gene sets are stored as plain text in GMT format. A current version of the MSigDB gene set collections can be found in the `db/msigdb` subfolder. MSigDB gene sets are released under the Creative Commons Attribution 4.0 International License. The license terms can be found in the `db/msigdb` folder.
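Since gene sets are stored as plain-text GMT, a short sketch may help show what such a file contains. It assumes the usual GMT layout (one gene set per line, tab-separated: set name, description, then member genes). The function name `read_gmt` and the example set names are hypothetical, and this is not the parser used by panoply_ssgsea.

```python
import io


def read_gmt(handle):
    """Parse GMT text into {set_name: [genes]}.

    Assumes one gene set per line with tab-separated fields:
    name, description, gene_1, gene_2, ...
    """
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


# Hypothetical two-set database, used only for illustration.
example = io.StringIO(
    "HYPOTHETICAL_SET_A\tna\tTP53\tEGFR\tMYC\n"
    "HYPOTHETICAL_SET_B\tna\tBRCA1\tBRCA2\n"
)
gene_sets = read_gmt(example)
print({name: len(genes) for name, genes in gene_sets.items()})
```

To read a real database, the same function could be called on an opened `.gmt` file instead of the in-memory example.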
File formats supported by ssGSEA2.0/PTM-SEA are Gene Cluster Text GCT v1.2 or GCT v1.3 files. Morpheus provides a convenient way to convert your data tables into GCT format.
For more information about the GSEA method and MSigDB please visit http://software.broadinstitute.org/gsea/.
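For readers who prefer to script the conversion to GCT, the following is a minimal sketch of writing a GCT v1.2 file. It assumes the standard v1.2 layout: a `#1.2` version line, a line with the number of rows and columns, a header starting with `Name` and `Description`, then one row per feature. The function name `write_gct_v12` and the sample values are hypothetical.

```python
import io


def write_gct_v12(handle, row_ids, descriptions, sample_ids, values):
    """Write a GCT v1.2 table to an open text handle."""
    handle.write("#1.2\n")                                   # version line
    handle.write(f"{len(row_ids)}\t{len(sample_ids)}\n")     # rows, columns
    handle.write("\t".join(["Name", "Description", *sample_ids]) + "\n")
    for rid, desc, row in zip(row_ids, descriptions, values):
        handle.write("\t".join([rid, desc, *(str(v) for v in row)]) + "\n")


# Hypothetical 2 x 2 data matrix, used only for illustration.
buf = io.StringIO()
write_gct_v12(
    buf,
    row_ids=["TP53", "EGFR"],
    descriptions=["na", "na"],
    sample_ids=["sample_1", "sample_2"],
    values=[[0.5, -1.2], [2.0, 0.1]],
)
gct_text = buf.getvalue()
print(gct_text)
```

GCT v1.3 additionally carries row and column annotations, which this sketch does not cover.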
The panoply_ssgsea_report module creates an R Markdown report to provide a high-level summary of the panoply_ssgsea module results.
The report provides:
- A heatmap depicting the normalized enrichment scores (NES) of gene sets significant in at least one data column. Significance is defined by the parameter `fdr` in section `panoply_ssgsea_report` of the `cfg_yaml` file.
- A list of the parameters used in the panoply_ssgsea module.
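The heatmap selection rule described above (keep a gene set if it is significant in at least one data column) can be sketched as follows. This is an illustration only: the function name, the gene set names and the FDR values are hypothetical, and treating "significant" as strictly below the cutoff is an assumption rather than the report's documented behavior.

```python
def significant_gene_sets(fdr_by_set, fdr_cutoff):
    """Return names of gene sets whose FDR is below the cutoff in any column.

    fdr_by_set maps gene set name -> list of FDR values, one per data column.
    Assumption: significance means FDR < fdr_cutoff (strict inequality).
    """
    return [
        name
        for name, fdrs in fdr_by_set.items()
        if any(f < fdr_cutoff for f in fdrs)
    ]


# Hypothetical FDR values for three gene sets across three samples.
fdr_table = {
    "SET_A": [0.20, 0.01, 0.60],
    "SET_B": [0.30, 0.45, 0.90],
    "SET_C": [0.04, 0.03, 0.02],
}
selected = significant_gene_sets(fdr_table, fdr_cutoff=0.05)
print(selected)
```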
panoply_ssgsea_workflow takes the following inputs:
- `input_ds`: (`.gct` file) input GCT file
- `gene_set_database`: (`.gmt` file) gene set database
- `yaml_file`: (`.yaml` file) master-parameters.yaml
- `output_prefix`: (String) File prefix for output files.
- `preprocess_gct`: (Boolean) If FALSE, panoply_preprocess_gct is skipped and the GCT file is used as is (default: FALSE).
Parameters for panoply_preprocess_gct:
- `acc_type`: (String) Type of accession number in the 'rid' object of the GCT file: "uniprot", "refseq" (default), "symbol".
- `id_type`: (String) Notation of site IDs: 'sm' - Spectrum Mill (default); 'wg' - Web Gestalt; 'ph' - Philosopher. Only relevant for PTM-SEA.
- `id_type_out`: (String) Type of site ID for output: 'uniprot' (default), 'refseq', 'seqwin'. Only relevant for PTM-SEA.
- `level`: (String) Mode of report:
  - 'ssc' - single-site-centric
  - 'gc' - gene-centric (default)
  - 'gcr' - gene-centric-redundant
- `loc`: (Boolean) If TRUE, only fully localized sites are considered (default: TRUE). Localization information is expected to be encoded in the site identifier. The respective parsing rules are determined by '--id_type'.
- `gene_col`: (String) Name of the column listing gene names; used for gene-centric reports (default: "geneSymbol").
- `seqwin_col`: (String) Column containing flanking sequences, separated by '|'. Only relevant for PTM-SEA and if '--id_type_out' = 'seqwin' (default: 'VMsiteFlanks').
- `SGT_col`: (String) Column used to collapse subgroup-top (SGT) reports (default: "subgroupNum"). Only relevant for Spectrum Mill protein reports.
- `mod_res`: (String) Modified residues, e.g. "S|T|Y" or "K" (default: "S|T|Y").
- `mod_type`: (String) Type of post-translational modification, e.g. "p" for phospho (default) or "ac" for acetylation.
- `mode`: (String) Determines how multiple features (e.g. proteins, PTM sites) mapping to the same gene symbol are aggregated:
  - "mean" - mean
  - "median" - median
  - "sd" - most variable (standard deviation) across sample columns
  - "SGT" - subgroup top: first subgroup in protein group (Spectrum Mill)
  - "abs.max" - for log-transformed, signed p-values
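To make the aggregation options of `mode` more concrete, here is a minimal sketch of collapsing several features that map to the same gene symbol. It is not the panoply_preprocess_gct code. The function name `collapse_by_gene` and the data are hypothetical, missing values are not handled, "SGT" is omitted because it needs Spectrum Mill subgroup annotations, and reading "abs.max" as "per sample, keep the value with the largest absolute value" is an assumption.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def collapse_by_gene(rows, mode):
    """Collapse feature rows to one row per gene.

    rows: list of (gene_symbol, [value per sample]) with complete data.
    mode: "mean", "median", "sd" or "abs.max".
    """
    groups = defaultdict(list)
    for gene, values in rows:
        groups[gene].append(values)

    collapsed = {}
    for gene, members in groups.items():
        columns = list(zip(*members))  # values grouped per sample
        if mode == "mean":
            collapsed[gene] = [mean(col) for col in columns]
        elif mode == "median":
            collapsed[gene] = [median(col) for col in columns]
        elif mode == "sd":
            # keep the single feature that varies most across samples
            collapsed[gene] = list(max(members, key=pstdev))
        elif mode == "abs.max":
            # assumption: per sample, keep the value of largest magnitude
            collapsed[gene] = [max(col, key=abs) for col in columns]
        else:
            raise ValueError(f"unsupported mode: {mode}")
    return collapsed


# Hypothetical features: two map to GENE_A, one to GENE_B.
rows = [
    ("GENE_A", [1.0, 3.0]),
    ("GENE_A", [3.0, -5.0]),
    ("GENE_B", [2.0, 2.0]),
]
for m in ("mean", "median", "sd", "abs.max"):
    print(m, collapse_by_gene(rows, m))
```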
Parameters for panoply_ssgsea:
- `correl_type`: (String) Correlation type: "z.score" (default), "rank", "symm.rank".
- `global_fdr`: (Boolean) If TRUE, a global FDR across all data columns is calculated (default: FALSE).
- `min_overlap`: (Integer) Minimal overlap between signature and data set (default: 10).
- `tolerate_min_overlap_err`: (String) true/false toggle for tolerating "not-enough-overlap" errors. Recommended value is FALSE.
- `nperm`: (Integer) Number of permutations (default: 1000).
- `output_score_type`: (String) Score type: "ES" - enrichment score, "NES" - normalized ES (default).
- `sample_norm_type`: (String) Sample normalization: "rank" (default), "log", "log.rank".
- `statistic`: (String) Test statistic: "area.under.RES" (default), "Kolmogorov-Smirnov".
- `weight`: (Float) When weight=0, all genes have the same weight; if weight>0, actual values matter and can change the resulting score (default: 0.75).
- `output_prefix`: (String, default="results-ssgsea") File prefix for output files.
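The interplay of `weight` and `statistic` can be illustrated with a simplified running-sum enrichment score. This sketch is not the ssGSEA2.0 implementation: it skips sample normalization, `correl_type`, permutations and NES, and works on raw values. The function name and the data are hypothetical.

```python
def enrichment_score(values, gene_set, weight=0.75, statistic="area.under.RES"):
    """Simplified running enrichment score (RES) for one sample.

    values: {gene: value}; genes are ranked by decreasing value.
    Hits step up by |value|**weight (normalized); misses step down equally.
    """
    ranked = sorted(values, key=values.get, reverse=True)
    in_set = [g in gene_set for g in ranked]
    n_miss = len(ranked) - sum(in_set)
    hit_weights = [abs(values[g]) ** weight if hit else 0.0
                   for g, hit in zip(ranked, in_set)]
    total = sum(hit_weights)
    if total == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, genes "
                         "with nonzero weight")

    running, res = 0.0, []
    for w, hit in zip(hit_weights, in_set):
        running += w / total if hit else -1.0 / n_miss
        res.append(running)

    if statistic == "area.under.RES":
        return sum(res)            # area under the running score
    return max(res, key=abs)       # Kolmogorov-Smirnov style maximum deviation


# Hypothetical sample with four genes, two of them in the gene set.
sample = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}
members = {"A", "B"}
es_unweighted = enrichment_score(sample, members, weight=0)
es_weighted = enrichment_score(sample, members, weight=1)
ks_unweighted = enrichment_score(sample, members, weight=0,
                                 statistic="Kolmogorov-Smirnov")
print(es_unweighted, es_weighted, ks_unweighted)
```

With `weight=0` every hit contributes the same step, so only the ranks matter; with `weight>0` larger values contribute larger steps, which is why the two calls above can return different scores for the same ranking.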
panoply_ssgsea_workflow produces the following outputs:
- `results`: (`.tar` file) An output tar file that contains the ssGSEA analysis results
- `report`: (`.html` file) Interactive R Markdown report.
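Because `results` is delivered as a tar file, a quick way to see what it contains is to list its members. The sketch below uses Python's standard `tarfile` module. The archive is built in memory with a placeholder member, since the actual file names inside the workflow output are not documented here; `hypothetical-results/scores.gct` is an invented name.

```python
import io
import tarfile


def list_tar_members(fileobj):
    """Return the sorted names of regular files in a tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


# Build a small in-memory archive as a stand-in for the workflow output.
demo = io.BytesIO()
with tarfile.open(fileobj=demo, mode="w") as archive:
    payload = b"placeholder"
    info = tarfile.TarInfo(name="hypothetical-results/scores.gct")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
demo.seek(0)

members = list_tar_members(demo)
print(members)
```

For a downloaded output, the same function could be called on `open(path, "rb")` for the tar file.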