Data Preparation Modules: panoply_parse_sm_table

wcorinne edited this page Aug 25, 2025 · 3 revisions

panoply_parse_sm_table

Description

This module parses the output table generated by Spectrum Mill (SM) software and writes the extracted data to gct files. Sample annotations provided in the exptDesign file are included as column annotations in the output gct files, and protein/PTM site-specific information extracted from the input data table is included as row annotations.

Input

Required inputs:

  • SMtable: (.ssv file) semicolon-separated output table generated by Spectrum Mill
  • exptDesign: (.csv file) experiment design file; must include the columns Sample.ID (unique), Experiment (plex number) and Channel (TMT channel). Optional columns used in the pipeline are QC.status (samples whose QC.status is not QC.pass are excluded from the final table) and Participant (samples with the same ID in this column are treated as replicates)
  • analysisDir: (String) name of analysis directory
  • type: (String) proteomics data type (proteome, phosphoproteome, etc)
  • yaml: (.yaml file) master-parameters.yaml
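
The exptDesign requirements above can be checked before running the module. The following is a minimal sketch in Python, not part of the module itself: the function name `check_expt_design` and the example rows are hypothetical, and the QC.status handling simply mirrors the description above (keep only rows marked QC.pass when the column is present).

```python
import csv
import io
from collections import Counter

# Columns the description above lists as required.
REQUIRED_COLUMNS = ("Sample.ID", "Experiment", "Channel")

def check_expt_design(text):
    """Validate exptDesign CSV text and return the rows that pass QC.

    Hypothetical helper: raises ValueError if a required column is
    missing or if Sample.ID values are not unique.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows = list(reader)
    counts = Counter(r["Sample.ID"] for r in rows)
    dupes = sorted(i for i, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate Sample.ID values: {dupes}")

    # QC.status is optional; when present, keep only QC.pass samples.
    if "QC.status" in header:
        rows = [r for r in rows if r["QC.status"] == "QC.pass"]
    return rows

# Made-up example data for illustration only.
EXAMPLE = """Sample.ID,Experiment,Channel,Participant,QC.status
S1,1,126,P1,QC.pass
S2,1,127N,P1,QC.pass
S3,1,127C,P2,QC.fail
"""

kept = check_expt_design(EXAMPLE)
print([r["Sample.ID"] for r in kept])
```

In this made-up example, S1 and S2 share a Participant ID and would be treated as replicates, while S3 is dropped because its QC.status is not QC.pass.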

Optional inputs:

  • labelType: (String, default = 'TMT10') type of chemical labeling used for multiplexing in MS proteomics data generation; options are
    • TMT10 (TMT-10 with 131 channel as common reference)
    • TMT11 (TMT-11 with 131C channel as common reference)
    • TMT10.126 (TMT-10 with 126 channel as common reference)
    • iTRAQ4 (iTRAQ 4-plex with 117 as common reference)
  • applyNumratioFilter: (String, default = TRUE) flag for applying the numRatio-based filter
  • minNumratioProteome: (Int, default = 2 for proteome) minimum number of ratios that must be observed for each protein/PTM site for it to be retained in the filtered table (YAML only)
  • minNumratioPTMs: (Int, default = 1 for PTMs) minimum number of ratios that must be observed for each protein/PTM site for it to be retained in the filtered table (YAML only)
  • minNumratioFraction: (Float, default = 0.25) minimum fraction of samples that must meet minNumratio for a protein/PTM site to be retained
  • speciesFilter: (String, default = TRUE) enable species filtering to retain only human proteins
  • ndigits: (Int, default = 5) number of decimal digits to use in output gct tables
  • outFile: (String, default = "panoply_parse_sm_table-output.tar") output .tar file name
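
One way to read how minNumratio and minNumratioFraction combine is sketched below in Python. This is an interpretation of the parameter descriptions above, not the module's actual implementation: the function name `passes_numratio_filter` is hypothetical, and it assumes a site is retained when the share of samples with at least minNumratio observed ratios reaches minNumratioFraction, with missing values counted as not observed.

```python
def passes_numratio_filter(num_ratios, min_numratio=2, min_fraction=0.25):
    """Decide whether one protein/PTM site would be retained.

    num_ratios: per-sample ratio counts for the site (None = missing).
    Assumed rule: keep the site if the fraction of samples with at
    least `min_numratio` ratios is >= `min_fraction`.
    """
    if not num_ratios:
        return False
    meeting = sum(
        1 for n in num_ratios if n is not None and n >= min_numratio
    )
    return meeting / len(num_ratios) >= min_fraction

# Proteome-style defaults (minNumratio = 2): one of four samples qualifies.
print(passes_numratio_filter([2, 0, 0, 0]))
# No sample reaches two ratios, so the site would be dropped.
print(passes_numratio_filter([1, 1, 1, 1]))
# PTM-style default (minNumratio = 1) with missing values.
print(passes_numratio_filter([1, None, None, None], min_numratio=1))
```

Under this reading, the defaults of 2 (proteome) or 1 (PTMs) together with a fraction of 0.25 mean a site needs enough ratios in at least a quarter of the samples to survive filtering.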

Output

  • outputs: Tarball with the following .gct files in the parsed-data subdirectory:
    • *-intensity: reporter ion intensities (for channels with samples) for each protein/PTM site
    • *-num-ratio: number of PSM ratios observed for each protein/PTM site
    • *-num-spectra: number of spectra observed for each protein/PTM site
    • *-precursor-intensity: precursor ion intensity for each protein/PTM site
    • *-ratio: log2 ratio to common reference for each protein/PTM site
    • *-reference-intensity: reporter ion intensity of the common reference channel for each protein/PTM site
    • *-unique-peptides: unique peptide count for each protein/PTM site
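
To inspect the output, the .gct files can be listed directly from the tarball. The sketch below uses only Python's standard `tarfile` module; the helper names and the file names in the demo archive (for example `proteome-ratio.gct` under an `analysis` directory) are hypothetical stand-ins, since the actual prefix depends on the run.

```python
import io
import tarfile
from pathlib import PurePosixPath

def list_parsed_gct(tar_fileobj):
    """Return sorted names of .gct files under any parsed-data directory."""
    with tarfile.open(fileobj=tar_fileobj, mode="r:*") as tar:
        return sorted(
            m.name
            for m in tar.getmembers()
            if m.isfile()
            and m.name.endswith(".gct")
            and "parsed-data" in PurePosixPath(m.name).parts
        )

def make_demo_tarball():
    """Build a small in-memory tarball with made-up member names."""
    names = (
        "analysis/parsed-data/proteome-ratio.gct",
        "analysis/parsed-data/proteome-num-ratio.gct",
        "analysis/notes.txt",
    )
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = 0  # empty placeholder files
            tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    return buf

for name in list_parsed_gct(make_demo_tarball()):
    print(name)
```

For a real run, the same function can be given a file opened in binary mode, for example `open("panoply_parse_sm_table-output.tar", "rb")` when the default outFile name is used.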