Abstract
Metaproteomics is the analysis of proteins in samples composed of multiple organisms. One major use case is investigating the functional composition of a sample. Several tools can link identified sequences to functional information (e.g. Unipept, Prophane, MetaGOmics). Unfortunately, the performance of these tools is hard to assess because there is no data with known ground truth at the functional level. The target benchmark dataset would consist of a diverse range of peptides/proteins with high-quality, experimentally validated functional annotations. Two obstacles must be overcome to create such a dataset: (1) the protein inference problem, which is more severe in metaproteomics than in single-organism proteomics because peptides can match homologues within the same organism and across multiple organisms, and (2) the low annotation level of proteins in the metaproteomic context (many proteins have no assigned function, not even a putative one). We plan to develop a concept for how the ideal gold-standard dataset should be composed and to generate it accordingly. Based on this dataset, a functional benchmark of the aforementioned tools can be initiated.
Work plan
Compile a sequence database of proteins with validated functions
Generate simulated peptide identification lists from the database that closely resemble the result characteristics of metaproteomics experiments
Specify benchmarking criteria
(Potentially) benchmark existing tools against the generated data
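The peptide simulation step in the work plan could start from something like the following minimal sketch. It is not the planned implementation: the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), the length filter, the function names, and the toy accessions and sequences are all assumptions made for illustration. It also shows how a peptide shared between homologues ends up mapped to several proteins, which is the protein inference issue mentioned in the abstract.

```python
import random
from collections import defaultdict

# Hypothetical toy database: accession -> sequence. A real run would read
# sequences with validated functions from a reference database instead.
TOY_PROTEINS = {
    "PROT_A": "MASTLEKSHAVEDPEPTIDEKALPHAALPHAR",
    "PROT_B": "MGSTLEKSHAVEDPEPTIDEKDELTADELTAR",
}


def tryptic_digest(sequence, min_len=6, max_len=30):
    """Simplified in silico trypsin digest (assumption: no missed cleavages).

    Cleaves after K or R unless the next residue is P, then keeps only
    peptides whose length lies within [min_len, max_len].
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return [p for p in peptides if min_len <= len(p) <= max_len]


def build_peptide_index(proteins):
    """Map every peptide to the set of proteins it can originate from."""
    index = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            index[peptide].add(accession)
    return index


def simulate_identifications(index, n, seed=0):
    """Draw up to n distinct peptides as a mock identification list."""
    rng = random.Random(seed)
    peptides = sorted(index)  # sorted for reproducibility across runs
    return rng.sample(peptides, min(n, len(peptides)))


if __name__ == "__main__":
    index = build_peptide_index(TOY_PROTEINS)
    for peptide in simulate_identifications(index, n=3):
        print(peptide, sorted(index[peptide]))
```

Uniform sampling is only a placeholder here. Making the lists resemble real metaproteomics results would require modelling properties such as abundance and detectability, which this sketch leaves out.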
Technical details
Datasets are derived from reference databases such as SwissProt
Tools for benchmarking:
Unipept
Prophane
MetaGOmics
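Once tools such as those listed above have annotated the simulated peptides, their output has to be compared against the known functions. The benchmarking criteria are still to be specified in this proposal, so the following is only one possible candidate: micro-averaged precision, recall and F1 over per-peptide sets of functional terms. The function name, the input layout (peptide mapped to a set of term identifiers) and the placeholder terms are assumptions for illustration and do not reflect the output format of any of the tools.

```python
def score_annotations(predicted, truth):
    """Micro-averaged precision/recall/F1 over per-peptide term sets.

    predicted, truth: dict mapping peptide -> set of functional term IDs.
    Peptides missing from `predicted` count as having no predicted terms;
    peptides absent from `truth` are ignored (assumption of this sketch).
    """
    tp = fp = fn = 0
    for peptide, true_terms in truth.items():
        predicted_terms = predicted.get(peptide, set())
        tp += len(predicted_terms & true_terms)   # correctly assigned terms
        fp += len(predicted_terms - true_terms)   # assigned but not validated
        fn += len(true_terms - predicted_terms)   # validated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Placeholder peptides and term labels, not real annotations.
    truth = {"PEP1": {"func_a", "func_b"}, "PEP2": {"func_c"}}
    predicted = {"PEP1": {"func_a"}, "PEP2": {"func_c", "func_d"}}
    print(score_annotations(predicted, truth))
```

Exact set matching is a deliberately strict choice. Hierarchical vocabularies such as GO may call for partial credit when a tool reports a parent or child of the validated term, which this sketch does not attempt.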
Contact information
Henning Schiebenhoefer - Robert Koch-Institut (Germany) - [email protected]