Exploring De Novo Assembly of Protein Sequences from Simulated Reads Generated by a Single Molecule Protein Sequencing Technology #10
awassie22
announced in
Hackathon proposals
Replies: 1 comment
-
This looks like a very interesting project! To make the enzymatic digestion simulation more realistic, we could consider using one of the deep learning models trained on mass spectrometry peptide identifications (though of course, they may also be biased toward what mass spectrometers can detect). There will be time (1.5 hours, I think) for the project leads to present their topics before people decide which project(s) to work on.
-
Title
Exploring De Novo Assembly of Protein Sequences from Simulated Reads Generated by a Single Molecule Protein Sequencing Technology
Abstract
Several approaches are currently being explored for de novo single molecule protein sequencing (SMPS). While the proposed technologies differ in their sequencing mechanisms, at their core they all aim to sequence peptides or full-length proteins with single amino acid (AA) sensitivity and resolution. A high-throughput SMPS method would enable a wide range of proteomic applications, one of which is the assembly of full-length protein sequences from short, error-prone peptide reads generated by the SMPS method. In genomics, the early days saw the development of short-read assemblers for Solexa sequencers that enabled scalable whole genome sequencing. We foresee similar opportunities for protein sequencing that will be of great benefit to the proteomics community as SMPS technologies continue to mature over the next few years.
As part of this hackathon, we propose to explore and implement a strategy for a de novo assembler of full-length protein sequences. To implement this strategy, we will provide a simulated dataset from a hypothetical SMPS technology. This SMPS sequencer can generate short reads from peptides. For each AA position, the sequencer will output posterior probabilities across the 20 natural AAs, along with information on error rates.
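To make the read format concrete, here is a minimal sketch of how such a read might be represented and compared. It is not the proposers' simulator: the noise model, the `error_rate` parameter, and the function names `simulate_read`, `call_sequence`, and `overlap_score` are all assumptions made for illustration. Each read is a list of per-position posteriors over the 20 natural AAs, and the overlap score is one plausible way an assembler could weigh a suffix-prefix match using those posteriors.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


def simulate_read(peptide, error_rate=0.1, rng=None):
    """Return one posterior row (dict AA -> probability) per position.

    Hypothetical noise model (an assumption, not from the proposal):
    with probability `error_rate` the sequencer favours a wrong residue,
    and the favoured residue gets a confidence drawn from [0.5, 0.95].
    """
    rng = rng or random.Random()
    read = []
    for true_aa in peptide:
        top = true_aa
        if rng.random() < error_rate:
            top = rng.choice([aa for aa in AMINO_ACIDS if aa != true_aa])
        confidence = rng.uniform(0.5, 0.95)
        # Spread the remaining mass evenly so each row sums to 1.
        rest = (1.0 - confidence) / (len(AMINO_ACIDS) - 1)
        row = {aa: rest for aa in AMINO_ACIDS}
        row[top] = confidence
        read.append(row)
    return read


def call_sequence(read):
    """Maximum-posterior residue call at each position."""
    return "".join(max(row, key=row.get) for row in read)


def overlap_score(read_a, read_b, length):
    """Mean probability that the last `length` positions of read_a and the
    first `length` positions of read_b carry the same residue."""
    suffix = read_a[-length:]
    prefix = read_b[:length]
    per_position = [
        sum(p[aa] * q[aa] for aa in AMINO_ACIDS)
        for p, q in zip(suffix, prefix)
    ]
    return sum(per_position) / length
```

An assembler built on this representation could, for example, score every candidate suffix-prefix overlap between pairs of reads and keep the high-scoring ones as edges of an overlap graph. Whether that or another strategy works best is exactly what the hackathon would explore.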
Project Plan
After the hackathon, we aim to continue developing the proposed strategy and, ideally, to test the assembler with any SMPS dataset that becomes publicly available. We will make the code available via a public GitHub repository, and if a publication is produced, we aim to make all contributors co-authors.
Technical Details
Contact information
Asmamaw (Oz) Wassie
Glyphic Biotechnologies
[email protected]