Workflows: panoply_process_SM_table

wcorinne edited this page Aug 29, 2025 · 1 revision

panoply_process_SM_table

Description

This workflow preprocesses the output of Spectrum Mill (SM) software: it parses and normalizes the data, then generates an interactive report of the normalization results. It executes the following modules:

| module | description |
| --- | --- |
| panoply_process_SM_table | parse and format SM output into GCT format |
| panoply_normalize_ms_data | performs data normalization |
| panoply_normalize_ms_data_report | creates an interactive R Markdown report of the data normalization results |

Input

Required inputs:

  • input_ssv: (.ssv file) semicolon-separated output table generated by Spectrum Mill
  • sample_annotation: (.csv file) experiment design file; must include Sample.ID (unique), Experiment (plex number) and Channel (TMT channel); additional optional columns that are used in the pipeline include QC.status (exclude samples from final table if QC.status is not QC.pass) and Participant (replicates are identified as having the same ID in this column)
  • job_identifier: (String) label to be associated with run
  • ome_type: (String) proteomics data type (e.g. proteome, phosphoproteome)
  • yaml: (.yaml file) master-parameters.yaml
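
The sample_annotation requirements above can be checked before launching a run. The following is a minimal sketch, not part of the workflow itself: the function name `check_sample_annotation` and the example rows are hypothetical, and only the column names and the QC.pass rule come from the description above.

```python
import csv
import io

# Column names taken from the sample_annotation description above.
REQUIRED_COLUMNS = ("Sample.ID", "Experiment", "Channel")


def check_sample_annotation(csv_text):
    """Return (kept_rows, problems) for an annotation table given as CSV text.

    Hypothetical helper for illustration; the pipeline does its own checks.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return [], problems

    rows = list(reader)
    seen = set()
    for row in rows:
        sample_id = row["Sample.ID"]
        if sample_id in seen:
            problems.append(f"duplicate Sample.ID: {sample_id}")
        seen.add(sample_id)

    # QC.status is optional; when present, keep only rows marked QC.pass.
    if "QC.status" in header:
        kept = [r for r in rows if r["QC.status"] == "QC.pass"]
    else:
        kept = rows
    return kept, problems


# Made-up example table: S2 fails QC, S1 and S3 share a Participant (replicates).
example = """Sample.ID,Experiment,Channel,QC.status,Participant
S1,1,126,QC.pass,P1
S2,1,127N,QC.fail,P2
S3,1,127C,QC.pass,P1
"""
kept, problems = check_sample_annotation(example)
print([r["Sample.ID"] for r in kept], problems)
```

Here S2 is dropped because its QC.status is not QC.pass, mirroring the exclusion rule described for the final table.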

Optional inputs:

panoply_process_SM_table

  • labelType: (String, default = 'TMT10') type of chemical labeling used for multiplexing in MS proteomics data generation; options are
    • TMT10 (TMT-10 with 131 channel as common reference)
    • TMT11 (TMT-11 with 131C channel as common reference)
    • TMT10.126 (TMT-10 with 126 channel as common reference)
    • iTRAQ4 (iTRAQ 4-plex with 117 as common reference)
  • applyNumratioFilter: (String, default = TRUE) flag for applying numRatio based filter
  • minNumratioProteome: (Int, default = 2 for proteome) minimum number of ratios that need to be observed for each protein/PTM site in order to retain in the filtered table (YAML only)
  • minNumratioPTMs: (Int, default = 1 for PTMs) minimum number of ratios that need to be observed for each protein/PTM site in order to retain in the filtered table (YAML only)
  • minNumratioFraction: (Float, default = 0.25) fraction of samples in which minNumratio should be present to retain protein/PTM site
  • speciesFilter: (String, default = TRUE) enable species filtering to retain only human proteins
  • ndigits: (Int, default = 5) number of decimal digits to use in output gct tables
  • outFile: (String, default = "panoply_parse_sm_table-output.tar") output .tar file name
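
One way to read the numRatio parameters together: a protein/PTM site is retained when at least minNumratioFraction of the samples have a numRatio of at least minNumratio. The sketch below illustrates that reading only; it is an assumption about how the thresholds combine, not the module's actual code, and the function name `passes_numratio_filter` is hypothetical.

```python
def passes_numratio_filter(num_ratios, min_numratio=2, min_fraction=0.25):
    """Decide whether one protein/PTM site would be retained.

    num_ratios: per-sample numRatio counts for the site (None = not observed).
    Assumed rule: keep the site if the fraction of samples with
    numRatio >= min_numratio is at least min_fraction.
    """
    if not num_ratios:
        return False
    n_ok = sum(1 for n in num_ratios if n is not None and n >= min_numratio)
    return n_ok / len(num_ratios) >= min_fraction


# Proteome-style defaults (min_numratio=2): one of four samples qualifies.
print(passes_numratio_filter([3, 1, None, 0]))
# Same site with no sample reaching two ratios.
print(passes_numratio_filter([1, 1, None, 0]))
# PTM-style default (min_numratio=1): two of four samples qualify.
print(passes_numratio_filter([1, 1, None, 0], min_numratio=1))
```

Under this reading, lowering minNumratio for PTMs (default 1) retains sites that the proteome default (2) would drop.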

panoply_normalize_ms_data

  • normMethod: (String, default = '2comp') normalization method; options are '2comp', 'median', 'mean'
  • altMethod: (String, default = 'median') alternate normalization method for comparison with normMethod; downstream modules typically do not generate analyses for the data normalized using altMethod
  • ndigits: (Int, default = 5) number of decimal digits to use in output tables
  • outTar: (String, default = "panoply_normalize_ms_data-output.tar") output .tar file name
  • outTable: (String, default = "normalized_table-output.gct") output .gct normalized file name
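
To give a feel for the 'median' and 'mean' options, here is a minimal sketch of per-sample centering with rounding to ndigits. It assumes the simplest form of these methods (subtracting each sample's median or mean of observed values); the actual module may also rescale the data, and the '2comp' method is not shown. The function name `center_samples` and the example values are hypothetical.

```python
import statistics


def center_samples(table, method="median", ndigits=5):
    """Center each sample's values on its own median or mean.

    table: dict mapping sample name -> list of log ratios (None = missing).
    Returns a new dict; missing values stay None.
    """
    if method not in ("median", "mean"):
        raise ValueError(f"unsupported method: {method}")
    center = statistics.median if method == "median" else statistics.fmean

    out = {}
    for sample, values in table.items():
        observed = [v for v in values if v is not None]
        # A sample with no observed values is left unshifted.
        shift = center(observed) if observed else 0.0
        out[sample] = [None if v is None else round(v - shift, ndigits) for v in values]
    return out


# Made-up log ratios for two samples.
data = {"S1": [1.0, 2.0, 4.0, None], "S2": [0.5, -0.5, 1.5, 0.5]}
print(center_samples(data, method="median"))
print(center_samples(data, method="mean"))
```

Running both methods on the same table is analogous to comparing normMethod against altMethod: the values differ only by a per-sample shift in this simplified form.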

Output

panoply_process_SM_table produces the following outputs:

  • output_tar: (.tar) An output tar file containing the normalized data results
  • output_report: (.html file) Interactive R Markdown report.