Abstract
Signal transduction relies on a tightly time-controlled combination of phosphorylation/dephosphorylation events that are difficult to capture and integrate. Their large-scale characterization using bottom-up mass spectrometry necessitates phospho-peptide enrichment prior to analysis and presents specific analytical challenges, such as an increased search space, the need for confident modification localization, and the extrapolation of proteoform quantitative behavior from a single peptide. Most of these studies provide low peptide coverage of proteins and thus require statistically sound methods to estimate quantitative changes at the proteoform level. This consists of translating quantitative changes of (phosphorylated) peptides into changes of both the protein and its phosphorylated isoforms, and calculating their relative stoichiometry when modified and unmodified versions of the same peptide are available. To our knowledge, there are no suitable data sets that simulate phospho-regulation at the proteoform level, which prevents benchmarking of available computational methods against a real ground truth. In this project, we will build an artificial quantitative phosphoproteomics data set that simulates the influence of digestion, sample enrichment, spectral quality, wrong identifications and localizations, as well as technical and biological variance, and that can be used for benchmarking phosphoproteomics (and other PTMomics) data analysis algorithms.
Work plan
Main tasks
Implementation: Develop a computational tool that takes a FASTA file as input and generates a “perfect” in silico data set in the form of a peptide-spectrum match (PSM) table with simulated MS intensities.
Parametrization: Determine the different parameters that will be implemented in the tool: list of “regulated” sites, peptidases, digestion efficiency, enrichment efficiency, technical/biological variance, detection threshold, … This includes defining their range and their error.
Community engagement: Develop a web interface that provides a simulated phosphoproteomics data set with parameters defined by the user.
Assessment: Collect several data sets that resemble common PTMomics experiments. They will serve as a reference for defining the range of input parameters and for testing the quality of the simulated data.
These tasks will be discussed on the first day prior to their implementation. Depending on the skills and interest of the participants, we may define working groups for addressing them in the following days.
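One possible way to keep the parameters listed above in a single place is a small Python dataclass. This is only a sketch: the class name, the field names, and all default values are placeholders chosen for illustration, not values proposed by the project.

```python
from dataclasses import dataclass, field


@dataclass
class SimulationParameters:
    """Hypothetical parameter set; every name and default is a placeholder."""

    protease: str = "trypsin"
    digestion_efficiency: float = 0.9   # probability that a cleavage site is cut
    enrichment_efficiency: float = 0.8  # fraction of phospho-peptides retained
    technical_sd: float = 0.1           # SD of technical noise (log2 scale)
    biological_sd: float = 0.3          # SD of biological noise (log2 scale)
    detection_threshold: float = 1e4    # intensities below this are not reported
    # keys: (protein accession, residue position); values: log2 fold change
    regulated_sites: dict = field(default_factory=dict)

    def __post_init__(self):
        # Efficiencies are treated as probabilities and must lie in [0, 1].
        for name in ("digestion_efficiency", "enrichment_efficiency"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        # Standard deviations and the threshold cannot be negative.
        for name in ("technical_sd", "biological_sd", "detection_threshold"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")


params = SimulationParameters(digestion_efficiency=0.95)
```

Validating the ranges at construction time is one way to encode the "range and error" of each parameter; the actual ranges would come from the experimental data sets collected in the assessment task.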
Preliminary time plan
Tuesday afternoon
Presentation of problem: Short presentation of the project.
Implementation scheme: Create modular mock-up of the processes that will be used to create the simulated data.
Wednesday
Implementation of different modules: Depending on the number of participants, we will form subgroups that will work on implementing modules that simulate:
Digestion: digest a FASTA file to a representation of (modified) peptides.
Identification: identification scores and false positives.
Quantification: measured peptide intensities.
Thursday
Integration of the different modules into a single software tool.
Testing and comparison of the main features of the simulated data to experimental data.
If time permits, create a prototype web service that generates parametrized simulated data.
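As an illustration of what the digestion module planned for Wednesday could look like, here is a minimal sketch of an in silico digestion of a single protein sequence. It assumes a simplified trypsin rule (cleavage C-terminal to K or R, except before P) and a fixed maximum number of missed cleavages; the function name and its arguments are hypothetical, and FASTA parsing and modification handling are left out.

```python
import re


def digest(sequence, missed_cleavages=0, min_length=1):
    """Return (start, peptide) tuples from a simplified tryptic digestion.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    `start` is the 1-based position of the peptide's first residue.
    """
    # Positions right after each K/R that is not followed by P.
    cut_points = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Fragment boundaries; a cut at the very end of the protein adds nothing.
    bounds = [0] + [c for c in cut_points if c < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join the fragment with up to `missed_cleavages` following fragments.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptide = sequence[bounds[i]:bounds[j]]
            if len(peptide) >= min_length:
                peptides.append((bounds[i] + 1, peptide))
    return peptides


# Toy sequence, not a real protein.
for start, peptide in digest("AAKPSRGGSKTT", missed_cleavages=1):
    print(start, peptide)
```

Keeping the start position of each peptide makes it possible to map a regulated site on the protein back to the peptides that cover it, which the quantification step would need.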
Expected results
At the end of the developer’s meeting, we expect to have a tool for generating a simulated PSM table with quantitative MS data containing modified and non-modified peptides corresponding to artificially regulated phospho-proteins. Depending on the number of participants and our progress, we can also expect to have a basic web interface, and to integrate simple parameters such as which protease(s) to use, digestion efficiency, …
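To make the idea of "artificially regulated" peptides more concrete, the following sketch simulates log2 intensities of one peptide in two conditions. It assumes additive Gaussian noise on the log2 scale for both biological and technical variance, which is a modelling choice made here for illustration only; the function name, arguments, and numerical values are hypothetical.

```python
import random


def simulate_log2_intensities(base_log2, log2_fc, n_replicates,
                              bio_sd, tech_sd, rng):
    """Simulate log2 intensities of one peptide in two conditions.

    Returns two lists of length `n_replicates`: the first for the control
    condition, the second for the condition shifted by `log2_fc`.
    Noise is additive and Gaussian on the log2 scale (an assumption).
    """
    conditions = []
    for shift in (0.0, log2_fc):
        conditions.append([
            base_log2 + shift + rng.gauss(0.0, bio_sd) + rng.gauss(0.0, tech_sd)
            for _ in range(n_replicates)
        ])
    return conditions


rng = random.Random(1)  # fixed seed so that the simulation is reproducible
# A phospho-peptide regulated 2-fold (log2 fold change of 1) ...
phospho = simulate_log2_intensities(20.0, 1.0, 3, bio_sd=0.3, tech_sd=0.1, rng=rng)
# ... and its unmodified counterpart, which stays constant.
unmodified = simulate_log2_intensities(22.0, 0.0, 3, bio_sd=0.3, tech_sd=0.1, rng=rng)
print(phospho)
print(unmodified)
```

Because the true fold changes are set by the user, any method that estimates regulation from such a table can be scored against a known ground truth.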
Follow up
After the developer’s meeting, we expect to use the simulated data in ongoing and future projects, and we hope that they will also be used for benchmarking by bioinformaticians working with PTMomics data.
Technical details
The programming language(s) that will be used: Not all the tasks of this project involve programming. For the ones that do, we recommend R or Python, as the quantitative data involved are small enough to be handled comfortably in these languages.
Existing software that will be featured: None
(Public) datasets that will be used and their availability
Here, we provide examples of publications and associated data that will be used for the project:
Phosphoproteomics data set with label-free quantification:
Sharma, K., D’Souza, R. C. J., Tyanova, S., Schaab, C., Wiśniewski, J. R., Cox, J., & Mann, M. (2014). Ultradeep Human Phosphoproteome Reveals a Distinct Regulatory Nature of Tyr and Ser/Thr-Based Signaling. Cell Reports, 8(5), 1583–1594. https://doi.org/10.1016/J.CELREP.2014.07.036 (PRIDE: PXD000612)
Phosphoproteomics data set with TMT quantification:
Brubaker, D. K., Paulo, J. A., Sheth, S., Poulin, E. J., Popow, O., Joughin, B. A., … Haigis, K. M. (2019). Proteogenomic Network Analysis of Context-Specific KRAS Signaling in Mouse-to-Human Cross-Species Translation. Cell Systems. https://doi.org/10.1016/J.CELS.2019.07.006 (PRIDE: PXD013922)
Example of AP-MS experiments with phospho- and non-phosphorylated peptides from co-immunoprecipitated proteins:
Reginald, K., Chaoui, K., Roncagalli, R., Beau, M., Goncalves Menoita, M., Monsarrat, B., … Malissen, B. (2015). Revisiting the Timing of Action of the PAG Adaptor Using Quantitative Proteomics Analysis of Primary T Cells. Journal of Immunology, 195(11), 5472–5481. https://doi.org/10.4049/jimmunol.1501300
Contact information
Marie Locard-Paulet
Novo Nordisk Foundation Center for Protein Research
Blegdamsvej 3
2200 København N / Denmark
[email protected]
Veit Schwämmle
Protein Research Group
Department for Biochemistry and Molecular Biology
University of Southern Denmark
Campusvej 55
5230 Odense M / Denmark
[email protected]
Vasileios Tsiamis
Protein Research Group
Department for Biochemistry and Molecular Biology
University of Southern Denmark
Campusvej 55
5230 Odense M / Denmark
[email protected]