Data Preparation Modules: panoply_cosmo
This is a PANOPLY implementation of the COSMO toolset. Full COSMO documentation here.
Mandatory inputs:
- STANDALONE (Boolean): Determines whether COSMO should run as part of the panoply_main pipeline with inputs from panoply_harmonize, or if COSMO should run independently with user-selected .gct inputs.
- yaml_file (File): Parameters yaml file from PANOPLY setup. Must contain gene.id.col and the cosmo default parameters in the cosmo.params section.
- label (String): Label to use for report filename.
Mandatory inputs if STANDALONE == "false":
- panoply_harmonize_tar (File): Tar output from the panoply_harmonize module. The inputs to the cosmo functions are taken from the harmonized-data folder (proteome matrix, RNA matrix, sample annotations).
- ome_type (String): The PANOPLY omics type of the input (e.g. "proteome").
Mandatory inputs if STANDALONE == "true":
- d1_file (File): A .gct file. Typically proteome or other protein-level data.
- d2_file (File): A .gct file. Typically an RNAseq file to be compared with d1_file for sample mislabeling.
- sample_file (File): A .csv file with sample annotations.
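The mode-dependent requirements above can be sketched as a small pre-flight check. This is only an illustration: the input names are taken from the lists above, but the helper `missing_inputs` and the idea of passing the inputs as a dictionary are assumptions, not part of PANOPLY.

```python
# Hypothetical helper, not part of PANOPLY: lists which mandatory
# inputs are still missing for the selected mode.
COMMON = ["STANDALONE", "yaml_file", "label"]
PIPELINE_ONLY = ["panoply_harmonize_tar", "ome_type"]
STANDALONE_ONLY = ["d1_file", "d2_file", "sample_file"]


def missing_inputs(inputs):
    """Return the mandatory input names that are absent or empty."""
    # Accept either a real boolean or the strings "true"/"false".
    standalone = str(inputs.get("STANDALONE", "")).lower() == "true"
    required = COMMON + (STANDALONE_ONLY if standalone else PIPELINE_ONLY)
    return [name for name in required if inputs.get(name) in (None, "")]


if __name__ == "__main__":
    example = {
        "STANDALONE": "true",
        "yaml_file": "params.yaml",
        "label": "demo",
        "d1_file": "proteome.gct",
        "d2_file": "rna.gct",
    }
    print(missing_inputs(example))
```

In this example the standalone run would be flagged as missing sample_file.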
Optional inputs:
- sample_label (String): Column header(s) in sample_file to use for prediction. This typically includes gender information. A column will be excluded if it is unbalanced (likely to cause an error). To supply multiple columns, separate them with a comma (e.g. gender,msi). If no input is given, columns are pulled from the default set in the yaml file.
  - Note: if you specifically use gender as a sample label, then COSMO will know that the column contains male/female information and will use genes from the sex chromosomes to better predict this attribute. COSMO does not properly identify other synonymous column titles (such as sex or Gender).
- run_cosmo (Boolean): Whether or not to actually run the cosmo functions. If false, the original tar file is saved as the final output and the cosmo html report is not generated. If no input is given, run_cosmo is pulled from the default set in the yaml file.
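As a rough illustration of the sample_label format, the sketch below splits a comma-separated value into column names and flags names that look like gender but are not spelled exactly gender. Both helper functions and the GENDER_LOOKALIKES set are hypothetical, and trimming whitespace around names is a convenience of this sketch rather than documented COSMO behavior.

```python
# Assumed list of near-miss spellings, for illustration only.
GENDER_LOOKALIKES = {"sex", "gender"}


def parse_sample_label(sample_label):
    """Split a comma-separated sample_label string into column names."""
    return [name.strip() for name in sample_label.split(",") if name.strip()]


def gender_warnings(columns):
    """Return columns that resemble 'gender' but do not match it exactly."""
    return [c for c in columns
            if c != "gender" and c.lower() in GENDER_LOOKALIKES]


if __name__ == "__main__":
    columns = parse_sample_label("Gender,msi")
    for name in gender_warnings(columns):
        print(f"'{name}' will not be treated as the gender column; "
              "rename it to 'gender' if it holds male/female values")
```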
Outputs:
- cosmo_tar (File): The main tar output. When STANDALONE == "false", this tar has the same data and format as the original tar input (plus cosmo results if cosmo was run).
- cosmo_report_html (File): The html report summarizing cosmo results.
The main source of computing errors in COSMO is improper selection of sample labels. PANOPLY-specific preprocessing should eliminate sample labels that are likely to cause errors. COSMO requires clinical attributes that are well-balanced, with only two levels (e.g. male/female, positive/negative) and no NA values. These are used to predict if there is mislabeling between the sample annotation file (sample_file) and any of the data files (d1_file, d2_file).
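The requirements on a sample label column (two levels, no NA values, reasonably balanced) can be checked before a run with a short script. The following is a minimal sketch, not the preprocessing PANOPLY itself performs: the function name, the set of strings treated as missing, and the min_per_class threshold of 10 are all assumptions chosen for illustration.

```python
import csv
import io
from collections import Counter

# Assumed spellings of missing values; adjust to match your annotations.
NA_TOKENS = {"", "na", "nan", "n/a"}


def check_label_column(csv_text, column, min_per_class=10):
    """Return a list of problems found in one sample annotation column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [(row.get(column) or "").strip() for row in rows]
    problems = []
    if any(v.lower() in NA_TOKENS for v in values):
        problems.append("contains missing values")
    # Count the levels among the non-missing values only.
    counts = Counter(v for v in values if v.lower() not in NA_TOKENS)
    if len(counts) != 2:
        problems.append(f"has {len(counts)} levels, expected 2")
    elif min(counts.values()) < min_per_class:
        problems.append("is unbalanced")
    return problems


if __name__ == "__main__":
    text = "id,gender\ns1,male\ns2,female\ns3,NA\n"
    print(check_label_column(text, "gender", min_per_class=1))
```

An empty list means the column passed all three checks under the assumed threshold.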
If you get the following error message, it likely means that one of the sample label columns is not well-balanced. This happens because COSMO divides the samples into 5 cross-validation sets, and if any of those sets contains a class with 1 or 0 observations then COSMO cannot build the GLM. Consider choosing an alternative sample label.
<simpleError in { which = foldid == i if (length(dim(y)) > 1) y_sub = y[!which, ] else y_sub = y[!which] if (is.offset) offset_sub = as.matrix(offset)[!which, ] else offset_sub = NULL glmnet(x[!which, , drop = FALSE], y_sub, lambda = lambda, offset = offset_sub, weights = weights[!which], ...)}: task 4 failed - "one multinomial or binomial class has 1 or 0 observations; not allowed">
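To get a feel for when this error is likely, the sketch below simulates the fold split in plain Python. It assumes samples are assigned to the 5 folds at random without stratification, and that a model is fitted on the samples outside each fold; the document does not state how COSMO assigns folds, so treat the numbers as indicative only. The function names are hypothetical.

```python
import random
from collections import Counter


def training_folds_ok(labels, n_folds=5, seed=0):
    """Check that every training split keeps at least 2 samples per class."""
    rng = random.Random(seed)
    # Evenly sized fold ids, shuffled to give a random assignment.
    fold_ids = [i % n_folds for i in range(len(labels))]
    rng.shuffle(fold_ids)
    classes = set(labels)
    for fold in range(n_folds):
        # Samples outside the held-out fold are used for fitting.
        kept = Counter(lab for lab, f in zip(labels, fold_ids) if f != fold)
        if any(kept[c] < 2 for c in classes):
            return False
    return True


def failure_rate(labels, trials=1000):
    """Fraction of random fold assignments that would trigger the error."""
    failures = sum(not training_folds_ok(labels, seed=s) for s in range(trials))
    return failures / trials


if __name__ == "__main__":
    balanced = ["male"] * 20 + ["female"] * 20
    skewed = ["male"] * 37 + ["female"] * 3
    print("balanced:", failure_rate(balanced))
    print("skewed:  ", failure_rate(skewed))
```

Under these assumptions, a label with only one or two samples in the minority class fails on every split, which is why such columns are best replaced with a better-balanced attribute.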