Full implementation of Casanovo-DB #352
Nice work. It's starting to look quite nice.
Several smaller comments that will be easy to address.
One major comment I have is that we should try to abstract the database functionality a bit more. Currently, a list of (peptide, mass, protein) tuples is passed through several levels of the code. Instead, this can be encapsulated in a `ProteinDatabase` class with functions such as `cleave`, `get_candidates`, etc. (examples) that provides these functionalities in a much more coherent and understandable fashion.
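For illustration, here is a minimal Python sketch of such a class. It assumes trypsin-like cleavage (cut after K/R unless followed by P), unmodified monoisotopic residue masses, and a ppm precursor tolerance. The constructor parameters, the `mass` helper, and the exact signatures of `cleave` and `get_candidates` are hypothetical and do not describe Casanovo's actual API.

```python
import bisect
import re
from typing import Dict, List, Set

# Monoisotopic residue masses in Da (assumption: no fixed or variable mods).
AA_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
H2O = 18.010565


class ProteinDatabase:
    """Hypothetical sketch: owns digestion and mass-based candidate lookup."""

    def __init__(self, proteins: Dict[str, str], missed_cleavages: int = 1,
                 min_len: int = 6, max_len: int = 30):
        self.missed_cleavages = missed_cleavages
        self.min_len = min_len
        self.max_len = max_len
        peptides: Set[str] = set()
        for seq in proteins.values():
            for pep in self.cleave(seq):
                # Skip peptides containing residues without a known mass.
                if all(aa in AA_MASS for aa in pep):
                    peptides.add(pep)
        # Sorted (mass, peptide) pairs allow binary search by precursor mass.
        self._index = sorted((self.mass(p), p) for p in peptides)
        self._masses = [m for m, _ in self._index]

    def cleave(self, sequence: str) -> List[str]:
        """Trypsin-like digestion: cut after K/R unless the next residue is P."""
        fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
        peptides = []
        for i in range(len(fragments)):
            # Join up to `missed_cleavages` consecutive fragments.
            last = min(i + self.missed_cleavages + 1, len(fragments))
            for j in range(i, last):
                pep = "".join(fragments[i:j + 1])
                if self.min_len <= len(pep) <= self.max_len:
                    peptides.append(pep)
        return peptides

    @staticmethod
    def mass(peptide: str) -> float:
        """Neutral monoisotopic peptide mass: residue sum plus one water."""
        return sum(AA_MASS[aa] for aa in peptide) + H2O

    def get_candidates(self, precursor_mass: float,
                       tol_ppm: float = 20.0) -> List[str]:
        """Return all peptides within +/- tol_ppm of the precursor mass."""
        tol = precursor_mass * tol_ppm / 1e6
        lo = bisect.bisect_left(self._masses, precursor_mass - tol)
        hi = bisect.bisect_right(self._masses, precursor_mass + tol)
        return [pep for _, pep in self._index[lo:hi]]
```

Keeping the digest sorted by mass means each candidate lookup costs two binary searches instead of a scan over every peptide, and callers never see the underlying tuples.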
After that initial implementation, you could also simplify some parts of the database functionality. For example, passing the protein identifier through all inference steps is unnecessary. Instead, the `ProteinDatabase` can retain this mapping internally: only the peptides are used during inference, and upon export the mapping is used to retrieve the relevant protein information.
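A minimal sketch of keeping that mapping inside the database object, assuming digestion has already produced a list of peptides per protein accession. The method names `peptides` and `proteins_for` and the example accessions are hypothetical; inference code only ever handles peptide strings, and accessions are looked up when results are written out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ProteinDatabase:
    """Hypothetical sketch: the peptide -> protein mapping stays private."""

    def __init__(self, digested: Dict[str, Iterable[str]]):
        # digested: protein accession -> peptides produced by cleavage.
        self._pep_to_prot: Dict[str, Set[str]] = defaultdict(set)
        for accession, peptides in digested.items():
            for pep in peptides:
                self._pep_to_prot[pep].add(accession)

    def peptides(self) -> List[str]:
        """Only peptide sequences are handed to the inference steps."""
        return sorted(self._pep_to_prot)

    def proteins_for(self, peptide: str) -> List[str]:
        """Used at export time to recover the protein accessions."""
        # .get() avoids inserting unknown peptides into the defaultdict.
        return sorted(self._pep_to_prot.get(peptide, ()))


# Usage: inference yields peptides only; proteins are attached on export.
db = ProteinDatabase({"sp|P1": ["PEPTIDEK", "SHAREDR"], "sp|P2": ["SHAREDR"]})
best_psms = {"scan=1": "SHAREDR"}
for spectrum_id, pep in best_psms.items():
    print(spectrum_id, pep, ",".join(db.proteins_for(pep)))
# prints: scan=1 SHAREDR sp|P1,sp|P2
```

Because a shared peptide maps to several accessions, the lookup returns a list, and the decision of how to report it is deferred to the export step.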
I suggest starting by drawing a schematic of how this class would work: where it receives information from, where it will be called, and the data flow among those steps. Only once you have this clear mental overview should you start with the implementation.
…into db_search_full
@bittremieux All comments you left should be addressed in the most recent commit.
Addresses all previous comments in PR #325.
High-level changes:
- `.mztab` output for Casanovo-DB consistent with specifications

Things to Note:
- `unique`, `database`, and `database_version` columns of the output `.mztab` for db-search mode