
Workflows: panoply_clumps_ptm_workflow

wcorinne edited this page Sep 2, 2025 · 2 revisions

panoply_clumps_ptm_workflow

Description

Workflow adaptation of the Clumps-PTM tool, a top-down spatial-proteomics analysis tool that identifies proteins with nearby clusters of differentially regulated PTM sites. The workflow first generates differential-expression results for every feature and maps PTM variable sites to atomic coordinates, then performs Clumps-PTM analysis to identify proteins with clusters of differentially regulated sites. After the analysis completes, panoply_clumps_ptm_postprocess generates summary figures and PyMOL protein structures, and panoply_clumps_ptm_report generates an interactive report.
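To illustrate what "nearby clusters of differentially regulated sites" can mean quantitatively, the sketch below scores a protein by summing, over every pair of sites, the product of the two site weights damped by a Gaussian of their 3D distance. This is an assumption modeled on the WAP (weighted average proximity) score of the original CLUMPS method, not necessarily the statistic Clumps-PTM computes. The functions wap_score and permutation_pvalue, the non-negative site weights (for example absolute differential-expression statistics) and the 6 Å kernel width t are all illustrative choices.

```python
import math
import random
from itertools import combinations
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def wap_score(coords: Sequence[Coord], weights: Sequence[float], t: float = 6.0) -> float:
    """Sum of w_i * w_j * exp(-d_ij^2 / (2 t^2)) over all unordered pairs of sites.

    Hypothetical CLUMPS-style proximity score: pairs of heavily weighted
    sites that sit close together in 3D contribute the most.
    """
    score = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        score += weights[i] * weights[j] * math.exp(-d2 / (2.0 * t * t))
    return score


def permutation_pvalue(coords: Sequence[Coord], weights: Sequence[float],
                       n_perm: int = 1000, t: float = 6.0,
                       seed: int = 0) -> Tuple[float, float]:
    """Empirical p-value obtained by shuffling weights across the observed sites.

    The null keeps the set of site positions fixed and asks whether strongly
    regulated sites are closer together than a random assignment would be.
    """
    rng = random.Random(seed)
    observed = wap_score(coords, weights, t)
    shuffled = list(weights)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if wap_score(coords, shuffled, t) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy coordinates in angstroms: two pairs of sites roughly 40 Å apart.
    sites = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (40.0, 0.0, 0.0), (43.0, 0.0, 0.0)]
    clustered = [2.0, 1.8, 0.1, 0.0]  # strong changes sit next to each other
    dispersed = [2.0, 0.1, 1.8, 0.0]  # strong changes are far apart
    print(wap_score(sites, clustered), wap_score(sites, dispersed))
    print(permutation_pvalue(sites, clustered, n_perm=200))
```

Shuffling weights rather than positions keeps the null restricted to residues that actually carry a measured PTM, which seems a natural fit for site-level data; a real implementation may use a different null model.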

This workflow is time-intensive and can become prohibitively expensive if too many annotations are chosen, so it is recommended to select relatively few annotations when providing a groupsFile. Additionally, once panoply_mapping has been run for a given dataset, it can be skipped in subsequent runs by providing its output file directly in mapping_file (assuming the features' accession numbers and variable-site IDs have not changed).
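The module order described above, including the shortcut of reusing a previous mapping_file, can be summarized as a small planning function. This is a hypothetical Python illustration of the control flow only (the name plan_stages and the file name previous_mapping.tsv are invented), not the workflow's WDL. Differential expression and mapping presumably do not depend on each other, so the real workflow may run them in parallel; the list only shows which modules run and in what dependency order.

```python
from typing import List, Optional


def plan_stages(mapping_file: Optional[str] = None) -> List[str]:
    """Return the modules a run would execute, in dependency order.

    Hypothetical sketch: when a mapping_file from an earlier run is supplied,
    the mapping step is dropped and that file feeds panoply_clumps_ptm directly.
    """
    stages = ["panoply_clumps_ptm_diffexp"]
    if not mapping_file:
        # No prior mapping available: map PTM sites to atomic coordinates.
        stages.append("panoply_clumps_ptm_mapping")
    stages += [
        "panoply_clumps_ptm",              # Clumps-PTM clustering analysis
        "panoply_clumps_ptm_postprocess",  # summary figures and PyMOL structures
        "panoply_clumps_ptm_report",       # interactive R Markdown report
    ]
    return stages


if __name__ == "__main__":
    print(plan_stages())                        # full run, mapping included
    print(plan_stages("previous_mapping.tsv"))  # re-run that reuses a mapping
```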

Modules

  • panoply_clumps_ptm_diffexp: performs differential-expression analysis on the provided PTM datasets for the chosen annotations
  • panoply_clumps_ptm_mapping: maps PTM sites to atomic coordinates
  • panoply_clumps_ptm: runs the Clumps-PTM algorithm
  • panoply_clumps_ptm_postprocess: generates figures from the Clumps-PTM analysis results
  • panoply_clumps_ptm_report: creates an interactive R Markdown report of the Clumps-PTM results

Input

Required inputs:

  • pSTY_gct: (.gct file) phosphoproteome data matrix

  • acK_gct: (.gct file) acetylome data matrix

  • ubK_gct: (.gct file) ubiquitylome data matrix

  • groupsFile: (.csv file) annotation file, subset to the annotations of interest for this analysis

  • output_prefix: (String, default="results") prefix used to name the output tar file

  • yaml_file: (.yaml file) master-parameters.yaml
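The three .gct inputs are tab-separated GCT files. In GCT version 1.3 the second line gives the number of data rows, sample columns, row-descriptor columns and column-descriptor rows, and the row descriptors (rdesc) sit between the row ID and the sample columns; this is where accession_col and variable_sites_col are looked up. The stdlib-only reader below is a minimal sketch for inspecting an input before submitting it, not the parser PANOPLY uses. It assumes a 1.3 header, treats NA-style tokens as missing and discards the column descriptors; the helper names and the two-row example file are made up.

```python
import csv
import io
import math
from typing import Dict, List, TextIO, Tuple

MISSING = {"", "NA", "na", "NaN", "nan"}


def _to_float(value: str) -> float:
    """Convert a GCT cell to float, treating common missing-value tokens as NaN."""
    return math.nan if value.strip() in MISSING else float(value)


def read_gct13(handle: TextIO) -> Tuple[List[Dict[str, str]], List[str], List[List[float]]]:
    """Parse a GCT 1.3 file into row descriptors, sample IDs and a data matrix.

    Column-descriptor (cdesc) rows are read past but not returned.
    """
    reader = csv.reader(handle, delimiter="\t")
    version = next(reader)[0].strip()
    if version != "#1.3":
        raise ValueError(f"expected a GCT 1.3 file, got {version!r}")
    n_row, n_col, n_rdesc, n_cdesc = (int(x) for x in next(reader)[:4])
    header = next(reader)
    rdesc_names = header[1:1 + n_rdesc]
    samples = header[1 + n_rdesc:1 + n_rdesc + n_col]
    for _ in range(n_cdesc):
        next(reader)  # skip column-descriptor rows
    rdesc: List[Dict[str, str]] = []
    matrix: List[List[float]] = []
    for _ in range(n_row):
        fields = next(reader)
        row = {"id": fields[0]}
        row.update(zip(rdesc_names, fields[1:1 + n_rdesc]))
        rdesc.append(row)
        matrix.append([_to_float(v) for v in fields[1 + n_rdesc:1 + n_rdesc + n_col]])
    return rdesc, samples, matrix


# Hypothetical two-site file using the default rdesc column names.
EXAMPLE_GCT = "\n".join([
    "#1.3",
    "2\t2\t2\t1",
    "id\tdescription\tvariableSites\tS1\tS2",
    "Group\tna\tna\tcase\tcontrol",
    "site1\tACC_1\tS12s\t0.5\t-1.2",
    "site2\tACC_2\tT527t\tNA\t2.0",
]) + "\n"

if __name__ == "__main__":
    rows, samples, values = read_gct13(io.StringIO(EXAMPLE_GCT))
    for row, vals in zip(rows, values):
        print(row["description"], row["variableSites"], vals)
```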

Mapping Databases

  • PDB_ref_bucket: (String, default="gs://fc-385e9b4e-43ff-44b3-8cf7-036a2a96d102/pdbs_2025_tars/") Google Cloud Storage bucket containing a tarred copy of the PDB structural archive (i.e., https://files.wwpdb.org/pub/pdb/data/structures/divided/pdb/)
  • UNIPROT_SWISSPROT: (File, default="gs://fc-385e9b4e-43ff-44b3-8cf7-036a2a96d102/reference_files/uniprot_sprot.fasta") Reference FASTA file with all relevant UniProt sequences, against which your sequences are BLASTed.
  • SIFTS_DB: (File, default="gs://fc-385e9b4e-43ff-44b3-8cf7-036a2a96d102/reference_files/pdb_chain_uniprot.tsv") SIFTS database mapping UniProt IDs to PDB IDs.
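The SIFTS table is what links a UniProt accession to the PDB chains that cover it. Assuming the usual layout of pdb_chain_uniprot.tsv (a leading comment line, then the columns PDB, CHAIN, SP_PRIMARY, RES_BEG, RES_END, PDB_BEG, PDB_END, SP_BEG, SP_END), the sketch below finds the chains whose UniProt segment (SP_BEG to SP_END) spans a given site position. The function name and the toy rows are hypothetical, and this lookup is only a first filter: turning a UniProt position into atomic coordinates also needs a residue-level alignment, since PDB author numbering can contain gaps and insertion codes.

```python
import csv
import io
from typing import List, TextIO, Tuple


def chains_covering(handle: TextIO, accession: str, position: int) -> List[Tuple[str, str]]:
    """Return (PDB ID, chain) pairs whose UniProt segment spans `position`.

    Assumes the standard SIFTS pdb_chain_uniprot.tsv columns; lines starting
    with '#' (the file's comment header) are ignored.
    """
    lines = (line for line in handle if not line.startswith("#"))
    hits = []
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["SP_PRIMARY"] != accession:
            continue
        if int(row["SP_BEG"]) <= position <= int(row["SP_END"]):
            hits.append((row["PDB"], row["CHAIN"]))
    return hits


# Toy rows for illustration only; they are not real SIFTS records.
EXAMPLE_SIFTS = "\n".join([
    "# illustrative comment line",
    "\t".join(["PDB", "CHAIN", "SP_PRIMARY", "RES_BEG", "RES_END",
               "PDB_BEG", "PDB_END", "SP_BEG", "SP_END"]),
    "\t".join(["1abc", "A", "P00001", "1", "100", "1", "100", "401", "500"]),
    "\t".join(["2xyz", "B", "P00001", "1", "50", "10", "59", "510", "559"]),
    "\t".join(["3def", "A", "P00002", "1", "80", "1", "80", "1", "80"]),
]) + "\n"

if __name__ == "__main__":
    print(chains_covering(io.StringIO(EXAMPLE_SIFTS), "P00001", 450))
```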

Optional inputs:

  • accession_col: (String, default="description") GCT rdesc column with protein accession IDs; must use the same ID type as the reference FASTA file (UNIPROT_SWISSPROT).

  • variable_sites_col: (String, default="variableSites") GCT rdesc column with PTM variable site(s) (e.g. 'T527t')

  • mapping_file: (.tsv file) output file with mapping results from a previous panoply_mapping run. If provided, panoply_mapping is skipped; this is useful when re-analyzing a dataset that has already been mapped.

  • mapping_params: (.yaml file) parameters file from panoply_clumps_ptm_mapping containing the parameters used for PTM mapping; allows the mapping parameters to be listed in the report.
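Values in variable_sites_col such as 'T527t' appear to encode the unmodified residue, its position in the protein and a trailing lowercase letter marking the modified form. The sketch below assumes exactly that pattern (one uppercase letter, digits, one lowercase letter) and scans a cell for every match, so it does not depend on how multiple sites in one cell are separated. The Site type and the parse_variable_sites and site_keys functions are hypothetical; pairing each site with the accession from accession_col gives the kind of (protein, position) key a mapping step needs.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumed token format: uppercase residue, position, lowercase modified residue.
SITE_PATTERN = re.compile(r"([A-Z])(\d+)([a-z])")


class Site(NamedTuple):
    residue: str   # one-letter code of the unmodified residue, e.g. "T"
    position: int  # residue number in the protein sequence
    modified: str  # trailing lowercase letter marking the modified residue


def parse_variable_sites(value: str) -> List[Site]:
    """Extract every residue/position/modification token from a variableSites cell."""
    return [Site(r, int(p), m) for r, p, m in SITE_PATTERN.findall(value)]


def site_keys(accession: str, value: str) -> List[Tuple[str, int]]:
    """Pair each parsed site with its protein accession, e.g. for a structure lookup."""
    return [(accession, site.position) for site in parse_variable_sites(value)]


if __name__ == "__main__":
    print(parse_variable_sites("T527t"))
    print(site_keys("ACC_1", "S12s T15t"))
```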

Output

panoply_clumps_ptm_workflow produces the following outputs:

  • clumps_ptm_tar: (.tar) An output tar file that contains the clumps_ptm analysis results
  • clumps_ptm_report: (.html file) Interactive R Markdown report.