quantms.io

quantms is a Nextflow pipeline for the analysis of quantitative proteomics data. The pipeline is based on the OpenMS framework and DIA-NN, and it is designed to analyze large-scale experiments. The main outputs of the quantms workflow are the following:

mzTab files with the identification and quantification information.
MSstats input file with the peptide quantification values needed for the MSstats analysis.
MSstats output file with the differential expression values for each protein.
The input SDRF of the pipeline if available.

While all of the previous formats are well-known and popular standards in the proteomics community, they are difficult to use in big data analysis projects. In addition, these file formats are difficult to extend and make it hard to provide multiple views of the underlying data. For example, for big datasets it is extremely hard to retrieve from mzTab the identified peptides and features and their corresponding intensities. It is equally difficult to get the protein quantification values for a given sample.

Here, we aim to formalize and develop a more standardized format that enables a better representation of the identification and quantification results and also enables new use cases for proteomics data analysis. The main use cases for the format are:

Fast and easy visualization of the identification and quantification results.
Easy integration with other omics data.
Easy integration with sample metadata.
AI/ML model development based on identification and quantification results.
Easy data retrieval for big datasets and large-scale collections of proteomics data.

Note: We are not trying to replace the mzTab format, but to provide a new format that enables AI-related use cases. Most of the features of the mzTab format will be included in the new format.

Data model

quantms.io can be seen as a multiple-view representation of proteomics data analysis results. Each view of the format can be serialized in different formats depending on the use case. The data model of quantms.io defines two main things: the view and how the view is serialized.

The data model view defines the structure, fields and properties that will be included in a view, for example for each PSM, peptide, feature or protein.
The data serialization defines the format in which the view will be serialized and what features of serialization will be supported, for example compression, indexing or slicing.

view	file class	serialization format	definition	example
psm	psm_file	parquet	psm	psm example
feature	feature_file	parquet	feature	feature example
absolute	absolute_file	tsv	absolute	absolute example
differential	differential_file	tsv	differential	differential example
sdrf	sdrf_file	tsv	metadata	sdrf example
project	-	json	project	--

Note: Views can be extended and new views can be added to the format.
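
The table above can be read as a simple lookup from view name to file class and serialization format. The following is a minimal sketch of that lookup, not part of the specification: the names `VIEWS`, `serialization_format` and `views_by_format` are hypothetical and were chosen only for this illustration.

```python
# Illustrative lookup built from the view table above.
# VIEWS and both helper functions are hypothetical names, not a quantms.io API.

# View name -> (file class, serialization format); the project view has no file class.
VIEWS = {
    "psm": ("psm_file", "parquet"),
    "feature": ("feature_file", "parquet"),
    "absolute": ("absolute_file", "tsv"),
    "differential": ("differential_file", "tsv"),
    "sdrf": ("sdrf_file", "tsv"),
    "project": (None, "json"),
}


def serialization_format(view):
    """Return the serialization format listed in the table for a view."""
    try:
        return VIEWS[view][1]
    except KeyError:
        raise ValueError(f"unknown view: {view!r}") from None


def views_by_format(fmt):
    """Return the view names serialized in the given format, in table order."""
    return [name for name, (_, view_fmt) in VIEWS.items() if view_fmt == fmt]


if __name__ == "__main__":
    print(serialization_format("psm"))
    print(views_by_format("tsv"))
```

Because views can be extended, a lookup like this would need a new entry whenever a view is added to the format.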

Introduction to quantms.io

A quantms.io file is a collection of views aggregated into a .qms folder; inside that folder, a file named project.json MUST be present. Please read about the project view for more information.
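
The requirement that project.json be present can be checked before any view is read. The sketch below assumes only what is stated above (a folder that contains project.json); the function name `load_project` is hypothetical, and the content written to project.json in the usage part is a placeholder, since the fields of the project view are defined elsewhere.

```python
import json
import tempfile
from pathlib import Path


def load_project(qms_dir):
    """Load project.json from a quantms.io folder, failing early if it is absent.

    Hypothetical helper for illustration; it does not validate the fields
    of the project view.
    """
    qms_dir = Path(qms_dir)
    if not qms_dir.is_dir():
        raise NotADirectoryError(f"{qms_dir} is not a folder")
    project_file = qms_dir / "project.json"
    if not project_file.is_file():
        raise FileNotFoundError(f"{project_file} is required but missing")
    with project_file.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "example.qms"  # placeholder folder name
        folder.mkdir()
        # Placeholder content; the real fields come from the project view definition.
        (folder / "project.json").write_text('{"placeholder": true}', encoding="utf-8")
        print(load_project(folder))
```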

The concepts behind the format and more detailed topics about serialization are covered in the introduction to the format here.

How to contribute

External contributors, researchers and the proteomics community are more than welcome to contribute to this project.

Contribute to the specification: you can contribute ideas or refinements to the specification by opening an issue in the issue tracker or submitting a pull request.

Core contributors and collaborators

The project is run by different groups:

Yasset Perez-Riverol (PRIDE Team, European Bioinformatics Institute - EMBL-EBI, U.K.)

IMPORTANT: If you contribute to this specification, please make sure to add your name to the list of contributors.

Code of Conduct

As part of our efforts toward delivering open and inclusive science, we follow the Contributor Covenant Code of Conduct for Open Source Projects.

How to cite

Copyright notice

This information is free; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This information is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this work; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
