Data Analysis Modules: panoply_clumps_ptm
This module runs Clumps-PTM, a top-down spatial-proteomics analysis tool that identifies proteins with clusters of spatially proximal, differentially regulated PTM sites (phosphorylation, acetylation, and/or ubiquitination). The algorithm is adapted from the CLUMPS method for detecting clusters of mutations in 3D protein structures: it calculates a weighted average proximity score across all pairs of differentially modified residues in a given protein, with weights based on logFC and significance. An empirical p-value is calculated by permuting across the possible PTM sites within the protein, and the p-values are then corrected for multiple testing. A full description of the algorithm can be found in the Method Details of Geffen et al. 2023.
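The scoring and permutation steps described above can be illustrated with a short sketch. This is not the Clumps-PTM implementation: the Gaussian distance kernel, the `scale` value, and the function names `wap_score` and `empirical_p` are assumptions made for illustration, and the derivation of weights from logFC and significance, as well as the multiple-testing correction, are left out. See Geffen et al. 2023 for the actual method.

```python
import math
import random
from itertools import combinations


def wap_score(coords, weights, scale=6.0):
    """Weighted average proximity over all pairs of modified residues.

    coords  : list of (x, y, z) residue coordinates, in angstroms
    weights : one non-negative weight per residue
    scale   : width of the Gaussian distance kernel (assumed, not from the source)
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(coords)), 2):
        distance = math.dist(coords[i], coords[j])
        pair_weight = weights[i] * weights[j]
        # Assumed kernel: pairs that are close in 3D contribute close to 1.
        numerator += pair_weight * math.exp(-distance ** 2 / (2 * scale ** 2))
        denominator += pair_weight
    return numerator / denominator if denominator > 0 else 0.0


def empirical_p(candidate_coords, observed_idx, weights, n_perm=1000, seed=0):
    """Empirical p-value from re-drawing the modified sites among all
    candidate PTM sites of the same protein."""
    rng = random.Random(seed)
    observed = wap_score([candidate_coords[i] for i in observed_idx], weights)
    hits = 0
    for _ in range(n_perm):
        # Place the same weights on a random set of candidate sites.
        idx = rng.sample(range(len(candidate_coords)), len(observed_idx))
        if wap_score([candidate_coords[i] for i in idx], weights) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy protein: three tightly clustered sites plus nine sites spaced 50 angstroms apart.
candidates = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
candidates += [(50.0 * k, 0.0, 0.0) for k in range(1, 10)]
p_value = empirical_p(candidates, observed_idx=[0, 1, 2], weights=[1.0, 1.0, 1.0])
```

In this toy case the three observed sites sit within a few angstroms of each other, so very few random placements score as high and the empirical p-value is small. In a real run, one such p-value per protein would then be corrected for multiple testing.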
- diff_exp_file: (.tsv file) results file from panoply_clumps_ptm_diffexp, containing differential expression results for all PTM-omes, for a given annotation
- var_sites_file: (.tsv file) filtered mapping file (filt_results) from panoply_clumps_ptm_mapping, containing all variable sites with valid PDB coordinates
- PDB_ref_bucket: (String) Google Cloud bucket containing a tarred copy of the PDB structural archive (i.e. https://files.wwpdb.org/pub/pdb/data/structures/divided/pdb/). A public bucket, pulled from a frozen 2025 snapshot, can be found at: "gs://fc-385e9b4e-43ff-44b3-8cf7-036a2a96d102/pdbs_2025_tars/"
- PDB_DIR: internal parameter listing the files to import from PDB_ref_bucket
- output_prefix: (String, default="results") prefix used to name the output tar file
- yaml_file: (.yaml file) master-parameters.yaml
- run_combined: (Boolean, default=true) if true, the analysis will be run on all PTM datasets combined, in addition to each -ome separately
- weight_col: (String, default="logFC") column from the differential-expression dataset to use as weights in Clumps-PTM
- accession_col: (String, default="description") GCT rdesc column with protein accession IDs; must use the same ID type as the provided FASTA_ref_file
- variable_sites_col: (String, default="variableSites") GCT rdesc column with PTM variable site(s) (e.g. 'T527t')
- DEBUG_MODE: (Boolean, default=false) debugging toggle; if true, only a small subset of proteins will be analyzed. Should be turned off for a real analysis.
- results: (.tar file)
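As a rough illustration of how the parameters listed above fit together, the sketch below collects them into an inputs dictionary that can be serialized to JSON. The helper name `build_inputs`, the `workflow.parameter` key prefix, and the `gs://my-bucket/...` paths are hypothetical; only the parameter names, the defaults, and the public PDB bucket come from the list above. The internal PDB_DIR parameter is left out on purpose.

```python
import json


def build_inputs(diff_exp_file, var_sites_file, pdb_ref_bucket, yaml_file,
                 workflow="panoply_clumps_ptm", **overrides):
    """Assemble module inputs, starting from the documented defaults."""
    params = {
        "diff_exp_file": diff_exp_file,
        "var_sites_file": var_sites_file,
        "PDB_ref_bucket": pdb_ref_bucket,
        "yaml_file": yaml_file,
        # Defaults as documented in the parameter list.
        "output_prefix": "results",
        "run_combined": True,
        "weight_col": "logFC",
        "accession_col": "description",
        "variable_sites_col": "variableSites",
        "DEBUG_MODE": False,
    }
    unknown = set(overrides) - set(params)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params.update(overrides)
    # Assumed key convention: "<workflow name>.<parameter name>".
    return {f"{workflow}.{name}": value for name, value in params.items()}


inputs = build_inputs(
    diff_exp_file="gs://my-bucket/diffexp_results.tsv",   # hypothetical path
    var_sites_file="gs://my-bucket/filt_results.tsv",     # hypothetical path
    pdb_ref_bucket="gs://fc-385e9b4e-43ff-44b3-8cf7-036a2a96d102/pdbs_2025_tars/",
    yaml_file="gs://my-bucket/master-parameters.yaml",    # hypothetical path
    output_prefix="clumps_ptm_run1",
)
inputs_json = json.dumps(inputs, indent=2)
```

Rejecting unknown override names catches misspelled parameters early rather than letting them pass through silently.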
- Geffen, Y. et al. Pan-cancer analysis of post-translational modifications reveals shared patterns of protein regulation. Cell 186, 3945-3967.e26 (2023).