Data Preparation Modules: panoply_filter
wcorinne edited this page Aug 28, 2025 · 3 revisions
This module preprocesses and filters proteomics (protein/PTM site) data. It is typically run after the `panoply_normalize_ms_data` module.
Preprocessing (always applied):
- If the `geneIdCol` column with HUGO gene symbols is missing from the row annotations of the input data, it is created from the `proteinIdCol` column.
- If the sample annotation table contains a `QC.status` column, only samples marked `QC.pass` are retained in the main output files. If it does not, all samples are assumed to be `QC.pass`, and a `QC.status` column is created accordingly.
- If `separateQCTypes` is set to 'true', additional output files (e.g. `*-QC.fail.gct`) are created containing the non-`QC.pass` samples.
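The `QC.status` handling above can be sketched with pandas. This is a minimal illustration, not the module's implementation: the function name `split_by_qc`, the assumption that the sample annotation table is indexed by the same sample IDs used as data columns, and the dictionary-of-tables return value are all hypothetical. The `geneIdCol` derivation is not covered, since it depends on an ID-mapping resource that this page does not describe.

```python
import pandas as pd


def split_by_qc(samples: pd.DataFrame, data: pd.DataFrame, separate_qc_types: bool = False):
    """Hypothetical sketch of the QC.status preprocessing step.

    samples: sample annotation table, indexed by sample ID (assumption).
    data: matrix with rows = proteins/PTM sites and columns = sample IDs.
    Returns the (possibly updated) annotation table and a dict mapping each
    QC.status value to the corresponding column subset of `data`.
    """
    samples = samples.copy()
    # No QC.status column: treat every sample as QC.pass and record that.
    if "QC.status" not in samples.columns:
        samples["QC.status"] = "QC.pass"

    # Look up each data column's status (raises KeyError if a sample is unannotated).
    status = samples.loc[data.columns, "QC.status"]

    # The main output keeps only QC.pass samples.
    outputs = {"QC.pass": data.loc[:, (status == "QC.pass").to_numpy()]}

    # Optional extra tables, one per non-QC.pass status (cf. separateQCTypes).
    if separate_qc_types:
        for qc_type in status[status != "QC.pass"].unique():
            outputs[qc_type] = data.loc[:, (status == qc_type).to_numpy()]
    return samples, outputs
```

For example, with `separate_qc_types=True` a sample marked `QC.fail` appears only in `outputs["QC.fail"]`, never in the `QC.pass` table.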
The following filters are available:
- If `sdFilterThreshold` is specified, rows with a standard deviation less than `sdFilterThreshold` are excluded.
- If `combineReplicates` is specified and replicates are present in the data (identified by identical values in the `Participant`, `Type` (optional), and `Timepoint` (optional) columns of the sample annotation table), values are combined across replicates for each row, using the method specified by `combineReplicates`.
- If `naMax` is specified, rows with more than `naMax` missing values are excluded.
- If `noNA` is 'true', an additional table with no missing values is created.
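A minimal pandas sketch of the SD, `naMax` and `noNA` rules, applied to a matrix whose rows are proteins/PTM sites and whose columns are QC-pass samples. The function `filter_rows`, the order in which the rules are applied, the use of the sample standard deviation (pandas' default `ddof=1`), and the rule that `naMax` values below 1 are fractions while values of 1 or more are sample counts are assumptions made for illustration; the page does not specify them. Replicate combining is omitted here.

```python
import pandas as pd


def filter_rows(data: pd.DataFrame, sd_threshold=0.5, na_max=0.7):
    """Hypothetical sketch of SD, naMax and noNA filtering.

    data: rows = proteins/PTM sites, columns = QC.pass samples.
    sd_threshold / na_max: None disables the corresponding filter.
    Returns (filtered table, filtered table with no NA rows).
    """
    keep = pd.Series(True, index=data.index)

    if na_max is not None:
        # Assumption: values below 1 are fractions of the sample count,
        # values of 1 or more are absolute numbers of samples.
        limit = na_max * data.shape[1] if na_max < 1 else na_max
        keep &= data.isna().sum(axis=1) <= limit

    if sd_threshold is not None:
        # Sample SD (ddof=1) over non-missing values; rows whose SD is
        # undefined (fewer than two values) compare as False and are dropped.
        keep &= data.std(axis=1) >= sd_threshold

    filtered = data.loc[keep]
    # noNA-style table: drop any row that still contains a missing value.
    no_na = filtered.dropna(axis=0, how="any")
    return filtered, no_na
```

Building a single boolean mask keeps each rule independent, so passing `None` for a parameter simply skips that rule, in the same way that `null` disables a filter in the optional inputs.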
Required inputs:
- `inputData`: (`.tar` file) tarball from `panoply_normalize_ms_data`, or normalized input data in GCT format (when `standalone` is `true`)
- `type`: (String) proteomics data type
- `standalone`: (String) set to `true` to run as a self-contained module; if `true`, the `analysisDir` input is required
- `yaml`: (`.yaml` file) parameters in YAML format
- `analysisDir`: (String) name of analysis directory
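The dependency between `standalone`, `inputData` and `analysisDir` can be expressed as a small validation check. This is a hypothetical helper (`check_required_inputs` is not part of PANOPLY); it only encodes the rules listed above, treats `standalone` as the string 'true' or 'false', and recognizes input types by file extension alone.

```python
from typing import Optional


def check_required_inputs(input_data: str, data_type: str, standalone: str,
                          yaml_file: str, analysis_dir: Optional[str] = None) -> bool:
    """Hypothetical check of the required-input rules; returns True in standalone mode."""
    if not data_type:
        raise ValueError("type (proteomics data type) is required")
    if not yaml_file.endswith(".yaml"):
        raise ValueError("yaml must be a .yaml parameter file")

    is_standalone = standalone.strip().lower() == "true"
    if is_standalone:
        # Standalone runs take a normalized GCT file and need an analysis directory.
        if not analysis_dir:
            raise ValueError("analysisDir is required when standalone is 'true'")
        if not input_data.endswith(".gct"):
            raise ValueError("inputData must be a .gct file in standalone mode")
    elif not input_data.endswith(".tar"):
        # Otherwise inputData is the tarball produced by panoply_normalize_ms_data.
        raise ValueError("inputData must be a .tar file from panoply_normalize_ms_data")
    return is_standalone
```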
Optional inputs:
- `filterProteomics`: (String, default chosen in startup notebook) when 'true', filtering is applied; when 'false', filtering is skipped. Preprocessing is always applied, regardless of this toggle.
- `separateQCTypes`: (String, default = 'false') toggle for generating additional output files, each subset to non-`QC.pass` samples (e.g. `*-QC.fail.gct`). Filtering is not applied to these outputs.
- `geneIdCol`: (String, default = 'geneSymbol') name of the (row) annotation column containing gene IDs.
- `proteinIdCol`: (String, default = 'id') name of the (row) annotation column containing protein IDs.
- `proteinIdType`: (String, default chosen in startup notebook) keytype of the protein IDs in `proteinIdCol`.
- `combineReplicates`: (String, default = 'mean') method used to combine replicate samples, which are identified by identical values in the `Participant`, `Type` (optional), and `Timepoint` (optional) columns of the sample annotation table. If `null`, replicates are not combined.
- `naMax`: (Float, default = 0.7) maximum number of NA values allowed per row (protein/PTM site); either a fraction between 0 and 1 or an integer giving the actual number of samples. If `null`, rows are not filtered on missing values.
- `noNA`: (String, default = 'false') toggle for generating a GCT in which rows (proteins/PTM sites) containing any NA values are excluded.
- `sdFilterThreshold`: (Float, default = 0.5) standard deviation (SD) threshold for SD filtering; rows (proteins/PTM sites) with an SD less than `sdFilterThreshold` are excluded from the filtered output table. If `null`, SD filtering is not applied.
- `ndigits`: (Int, default = 5) number of decimal digits to use in output tables.
- `outTar`: (String, default = "panoply_filter-output.tar") output `.tar` file name.
- `outTable`: (String, default = "filtered_table-output.gct") output `.gct` filtered file name.
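Replicate combining with `combineReplicates` can be sketched as a group-by over the sample annotation columns. In this illustrative pandas version, the function `combine_replicates` and the way combined columns are labelled (annotation values joined with `_`) are hypothetical, `method` is passed to pandas as an aggregation name such as 'mean' or 'median', and the sample annotation table is assumed to be indexed by the sample IDs used as data columns. Updating the sample annotation table itself is left out.

```python
import pandas as pd


def combine_replicates(data: pd.DataFrame, samples: pd.DataFrame, method="mean"):
    """Hypothetical sketch of replicate combining.

    data: rows = proteins/PTM sites, columns = sample IDs.
    samples: sample annotation table indexed by sample ID, with a Participant
    column and optional Type / Timepoint columns.
    method: pandas aggregation name (e.g. "mean", "median"); None = no combining.
    """
    if method is None:
        return data

    # Replicates share identical values in whichever of these columns exist.
    keys = [c for c in ("Participant", "Type", "Timepoint") if c in samples.columns]
    labels = samples.loc[data.columns, keys].astype(str).apply("_".join, axis=1)

    # Group the samples (rows of the transposed matrix), aggregate, transpose back.
    # Missing values are skipped by the aggregation; sort=False keeps the
    # combined columns in order of first appearance.
    return data.T.groupby(labels.to_numpy(), sort=False).agg(method).T
```

Transposing before grouping avoids column-wise `groupby(axis=1)`, which recent pandas versions deprecate.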
Outputs:
- `output_tar`: tarball including the following files in the `filtered-data` subdirectory:
  - Filtered data files:
    - data table containing only QC-pass samples (`*-ratio-norm.gct`), with no other filters applied
    - filtered data table (`*-ratio-norm-filt.gct`)
  - Optional data files:
    - data table containing the non-`QC.pass` samples of a given {qc.type} (`*-ratio-norm-{qc.type}.gct`), with no other filters applied (when `separateQCTypes` is 'true')
    - filtered data table, with rows (proteins/PTM sites) containing any NA values excluded (`*-ratio-norm-filt-noNA.gct`) (when `noNA` is 'true')
- `outputs`: (`.gct` file) filtered data table (equivalent to `*-ratio-norm-filt.gct`)
- `output_yaml`: finalized parameter file
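To check which of the files above a run produced, the output tarball can be listed with Python's standard `tarfile` module. The helpers below are a sketch under stated assumptions: `list_filtered_outputs` matches any member path that contains a `filtered-data` directory (its exact position inside the tarball is not specified here), and the hypothetical `classify_output` infers each file's role purely from the filename suffixes listed above.

```python
import tarfile
from pathlib import PurePosixPath


def list_filtered_outputs(tar_path: str) -> list:
    """Return sorted paths of GCT files under a filtered-data directory in the tarball."""
    with tarfile.open(tar_path) as tar:
        return sorted(
            m.name for m in tar.getmembers()
            if m.isfile()
            and "filtered-data" in PurePosixPath(m.name).parts
            and m.name.endswith(".gct")
        )


def classify_output(name: str) -> str:
    """Guess a file's role from its suffix (checks are ordered most specific first)."""
    if name.endswith("-ratio-norm-filt-noNA.gct"):
        return "filtered, no NA rows"
    if name.endswith("-ratio-norm-filt.gct"):
        return "filtered"
    if name.endswith("-ratio-norm.gct"):
        return "QC-pass only, unfiltered"
    if "-ratio-norm-" in name:
        return "non-QC.pass subset"
    return "unrecognized"
```

For example, `classify_output("proteome-ratio-norm-QC.fail.gct")` returns `"non-QC.pass subset"` (the `proteome` prefix is only an illustrative name).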