Simulating a quantified phosphoproteome for software benchmarking and algorithm development #8

vtsiamis88 · 2019-09-20T14:08:58Z

Abstract

Signal transduction relies on a tightly time-controlled combination of phosphorylation/dephosphorylation events that are difficult to capture and integrate. Their large-scale characterization using bottom-up mass spectrometry necessitates phospho-peptide enrichment prior to analysis and presents specific analytical challenges such as an increased search space, the need for confident modification localization, and the extrapolation of proteoform quantitative behavior from a single peptide. Most of these studies provide low peptide coverage of proteins and thus require statistically sound methods to estimate quantitative changes at the proteoform level. This consists of translating quantitative changes of (phosphorylated) peptides into changes of both the protein and its phosphorylated isoforms, and calculating their relative stoichiometry when modified and unmodified versions of the same peptide are available. To our knowledge, there are no suitable data sets that simulate phospho-regulation at the proteoform level, which prevents benchmarking of available computational methods against a real ground truth. In this project, we will build an artificial quantitative phosphoproteomics data set that simulates the influence of digestion, sample enrichment, spectra quality, wrong identifications and localizations, as well as technical and biological variance, and that can be used for benchmarking phosphoproteomics (and other PTMomics) data analysis algorithms.
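
As an illustration of the stoichiometry mentioned above, the following minimal Python sketch estimates site occupancy from the intensities of the modified and unmodified versions of the same peptide. It assumes that both versions are detected with equal efficiency, which is a simplification that real data rarely satisfy exactly. The function name `phospho_occupancy` is hypothetical and not part of any existing tool.

```python
def phospho_occupancy(phospho_intensity, unmodified_intensity):
    """Fraction of a peptide that is phosphorylated.

    Assumption (not from the proposal): both peptide versions have the
    same response factor, so raw intensities are directly comparable.
    Intensities must be on a linear (not log) scale.
    """
    if phospho_intensity < 0 or unmodified_intensity < 0:
        raise ValueError("intensities must be non-negative")
    total = phospho_intensity + unmodified_intensity
    if total == 0:
        raise ValueError("at least one intensity must be positive")
    return phospho_intensity / total


# Usage: one part phosphorylated to three parts unmodified.
print(phospho_occupancy(1.0, 3.0))
```

In a simulated data set the true occupancy is known, so an estimate of this kind can be compared directly against the ground truth.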

Work plan

Main tasks

Implementation: Develop a computational tool for generating a “perfect” in silico data set from a FASTA file to a peptide-spectrum match (PSM) table with simulated MS intensities.
Parametrization: Determine the different parameters that will be implemented in the tool: list of “regulated” sites, peptidases, digestion efficiency, enrichment efficiency, technical/biological variance, detection threshold, … This includes defining their range and their error.
Community engagement: Develop a web interface that provides a simulated phosphoproteomics data set with parameters defined by the user.
Assessment: Collect several data sets that resemble common PTMomics experiments, to be used as a reference for defining the range of input parameters and for testing the quality of the simulated data.

These tasks will be discussed on the first day prior to their implementation. Depending on the skills and interest of the participants, we may define working groups for addressing them in the following days.
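
The parameters named under Parametrization could be collected in a single configuration object. The sketch below is only one possible layout: the class name `SimulationParameters`, all field names, and all default values are hypothetical placeholders chosen for illustration, since the actual ranges are to be defined during the project.

```python
from dataclasses import dataclass, field


@dataclass
class SimulationParameters:
    """Hypothetical parameter set; defaults are placeholders, not recommendations."""
    protease: str = "trypsin"
    digestion_efficiency: float = 0.9   # probability that a site is cleaved
    enrichment_efficiency: float = 0.9  # fraction of phospho-peptides recovered
    technical_sd: float = 0.1           # log2-scale technical variance (as SD)
    biological_sd: float = 0.3          # log2-scale biological variance (as SD)
    detection_threshold: float = 20.0   # log2 intensity below which values are missing
    # Maps (protein accession, residue position) to a log2 fold change.
    regulated_sites: dict = field(default_factory=dict)

    def validate(self):
        """Raise ValueError if a parameter lies outside its allowed range."""
        for name in ("digestion_efficiency", "enrichment_efficiency"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
        for name in ("technical_sd", "biological_sd"):
            value = getattr(self, name)
            if value < 0.0:
                raise ValueError(f"{name} must be non-negative, got {value}")


# Usage: one artificially regulated site on a made-up accession.
params = SimulationParameters(regulated_sites={("PROT1", 15): 1.0})
params.validate()
```

Keeping the range checks next to the parameter definitions means that a web interface can reuse the same validation for user-supplied values.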

Preliminary time plan

Tuesday afternoon
Presentation of the problem: Short presentation of the project.
Implementation scheme: Create a modular mock-up of the processes that will be used to create the simulated data.

Wednesday
Implementation of different modules: Depending on the number of participants, we will form subgroups that will work on implementing modules that simulate:

Digestion: digest a FASTA file to a representation of (modified) peptides.
Identification: identification scores and false positives.
Quantification: measured peptide intensities.
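
As a starting point for the digestion module, the following sketch parses FASTA text and performs an in silico tryptic digestion with a configurable number of missed cleavages. It uses the common simplified trypsin rule (cleave after K or R, but not before P), which is an assumption rather than a decision of the project. The function names `parse_fasta` and `tryptic_digest` are hypothetical.

```python
import re


def parse_fasta(text):
    """Return a dict mapping each header line (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def tryptic_digest(sequence, missed_cleavages=0):
    """Return (start, peptide) tuples, with start as a 0-based protein position.

    Simplified trypsin rule (an assumption): cleave C-terminal to K or R
    unless the next residue is P.
    """
    # Zero-width matches mark the positions where the chain is cut.
    sites = [m.start() for m in re.finditer(r"(?<=[KR])(?!P)", sequence)]
    bounds = sorted(set([0, *sites, len(sequence)]))
    peptides = []
    for i in range(len(bounds) - 1):
        # Joining up to `missed_cleavages` extra fragments models incomplete digestion.
        last = min(i + 1 + missed_cleavages, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptides.append((bounds[i], sequence[bounds[i]:bounds[j]]))
    return peptides


# Usage on a made-up sequence.
proteins = parse_fasta(">demo protein\nAKRPGK\n")
for name, seq in proteins.items():
    print(name, tryptic_digest(seq, missed_cleavages=1))
```

Keeping the start position with each peptide makes it possible to map phosphorylated residues back to protein coordinates, which the later modules would need when assigning regulated sites.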

Thursday

Integration of the different modules into one software.
Testing and comparison of the main features of the simulated data to experimental data.
If time permits, create a prototype for a web service that generates parametrized simulated data.

Expected results

At the end of the developer’s meeting, we expect to have a tool for generating a simulated PSM table with quantitative MS data containing modified and non-modified peptides corresponding to artificially regulated phospho-proteins. Depending on the number of participants and our progress, we can also expect to have a basic web interface, and to integrate simple parameters such as which protease(s) to use, digestion efficiency, …
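
To make the expected quantitative output more concrete, the sketch below simulates log2 intensities for one peptide across conditions and replicates. It assumes additive Gaussian noise on the log2 scale for both biological and technical variance and a hard detection threshold that produces missing values. These modelling choices, the function name `simulate_peptide`, and all parameter names are illustrative assumptions, not the final design.

```python
import random


def simulate_peptide(base_log2, log2_fc_per_condition, n_replicates=3,
                     bio_sd=0.3, tech_sd=0.1, detection_threshold=20.0,
                     rng=None):
    """Return one row per (condition, replicate) with a simulated log2 intensity.

    Assumptions: Gaussian noise on the log2 scale; values below the
    detection threshold are reported as None (missing).
    """
    if rng is None:
        rng = random.Random()
    rows = []
    for condition, log2_fc in enumerate(log2_fc_per_condition):
        for replicate in range(n_replicates):
            value = (base_log2 + log2_fc
                     + rng.gauss(0.0, bio_sd)    # biological variance
                     + rng.gauss(0.0, tech_sd))  # technical variance
            rows.append({
                "condition": condition,
                "replicate": replicate,
                "log2_intensity": value if value >= detection_threshold else None,
            })
    return rows


# Usage: a peptide that is two-fold up-regulated in the second condition.
for row in simulate_peptide(25.0, [0.0, 1.0], rng=random.Random(1)):
    print(row)
```

Passing a seeded `random.Random` instance keeps a simulated data set reproducible, which matters when different tools are benchmarked on the same ground truth.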

Follow up

After the developer’s meeting, we expect to use the simulated data in ongoing and future projects, and we hope that they will also be used for benchmarking by bioinformaticians working with PTMomics data.

Technical details

The programming language(s) that will be used: Not all the tasks of this project involve programming. For the ones that do, we recommend R or Python as this project requires operating on quantitative data that is not too big for these languages.
Existing software that will be featured: None
(Public) datasets that will be used and their availability
Here, we provide examples of publications and associated data that will be used for the project:

Phosphoproteomics data set with label-free quantification:
Sharma, K., D’Souza, R. C. J., Tyanova, S., Schaab, C., Wiśniewski, J. R., Cox, J., & Mann, M. (2014). Ultradeep Human Phosphoproteome Reveals a Distinct Regulatory Nature of Tyr and Ser/Thr-Based Signaling. Cell Reports, 8(5), 1583–1594. https://doi.org/10.1016/J.CELREP.2014.07.036 (PRIDE: PXD000612)

Phosphoproteomics data set with TMT quantification:
Brubaker, D. K., Paulo, J. A., Sheth, S., Poulin, E. J., Popow, O., Joughin, B. A., … Haigis, K. M. (2019). Proteogenomic Network Analysis of Context-Specific KRAS Signaling in Mouse-to-Human Cross-Species Translation. Cell Systems. https://doi.org/10.1016/J.CELS.2019.07.006 (PRIDE: PXD013922)

Example of AP-MS experiments with phospho- and non-phosphorylated peptides from co-immunoprecipitated proteins:
Reginald, K., Chaoui, K., Roncagalli, R., Beau, M., Goncalves Menoita, M., Monsarrat, B., … Malissen, B. (2015). Revisiting the Timing of Action of the PAG Adaptor Using Quantitative Proteomics Analysis of Primary T Cells. Journal of Immunology (Baltimore, Md. : 1950), 195(11), 5472–5481. https://doi.org/10.4049/jimmunol.1501300

Contact information

Marie Locard-Paulet
Novo Nordisk Foundation Center for Protein Research
Blegdamsvej 3
2200 København N / Denmark
[email protected]

Veit Schwämmle
Protein Research Group
Department for Biochemistry and Molecular Biology
University of Southern Denmark
Campusvej 55
5230 Odense M / Denmark
[email protected]

Vasileios Tsiamis
Protein Research Group
Department for Biochemistry and Molecular Biology
University of Southern Denmark
Campusvej 55
5230 Odense M / Denmark
[email protected]
