Feature: Add Ability to accept multiple result files #137
Results loader accepts a list of result files
- Rename generic loader to ResultsLoader; do not make it an abstract class, it can have just result files
- Add methods for loading features to this class, e.g. loadTopTransitionGroupFeature(), which can take a list of result files
- Code cleanup: remove (comment out for now) feature tests from lower-level loaders
- Ensure tests still work
getIdentifiedPrecursors, getIdentifiedProteins, getIdentifiedPeptides, getNumIndenticiations (precursors, peptides, proteins), getCV; add appropriate tests for these new methods
add method which gets a df of # precursors/peptides/proteins; update tests accordingly
extend ResultsLoader class to have methods to query the entire result file, plus plotting methods.
also fix bugs found when adding tests
fix tests; update snapshots resulting from merging
Add back functionality of scoring distribution plotting
Also fix bugs associated with these functions
Refactor SearchResultAnalysisPlotter for plotting search results
this does not work with all versions of pandas
…sh into refactor/resultsPlotting
Also fix bugs associated with these changes
a91c35f to d278d4f
fix typing references for python 3.9
d278d4f to 510ec25
output consensusApexIM column in dataframe if present
minor bug fix: reverse order of @property and @abstractmethod
bug fixes from last commit
update test snapshots to have IM dataframe
update quickstart to show working with multiple chromatograms in a single loader. fix bugs found when editing the notebook
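For context on the decorator-order fix in the commits above: the Python documentation states that when abstractmethod is combined with other method descriptors such as property, it should be applied as the innermost decorator. The sketch below illustrates that ordering only. The class names AbstractLoader and ConcreteLoader are hypothetical, and the example does not claim to reproduce the code changed in this PR.

```python
from abc import ABC, abstractmethod


class AbstractLoader(ABC):
    # Hypothetical base class, used only to illustrate decorator order.

    @property          # outermost: wraps the function in a property
    @abstractmethod    # innermost: marks the underlying function as abstract
    def rsltsFile(self):
        """Subclasses must provide the list of result files."""


class ConcreteLoader(AbstractLoader):
    # Hypothetical subclass that satisfies the abstract property.

    @property
    def rsltsFile(self):
        return ["results_a.osw", "results_b.osw"]


# The abstract base cannot be instantiated; the concrete subclass can.
try:
    AbstractLoader()
    instantiated_abstract = True
except TypeError:
    instantiated_abstract = False

loader = ConcreteLoader()
```

With this ordering, ABC treats the property as abstract and refuses to instantiate any class that does not override it.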
Thanks for the updates and changes. Mostly looks good, I just made a few comments/suggestions. Should be good to merge after some of them are addressed/fixed.
def __init__(self, **kwargs):
    super().__init__(**kwargs)
Does the docstring need to be updated? Why are the args removed and replaced with kwargs? Does the ABCMeta take care of validating correct input kwargs? I'm not a huge fan of having **kwargs as the only input to a function. It makes it a little complicated to track which variables are used and where they're initialized.
I see, so GenericChromatogramLoader inherits from GenericRawDataLoader, which inherits from ResultsLoader, which is where the args are explicitly set. Is the reason for setting **kwargs in the GenericLoaders to allow for more flexible args for the actual implementations?
The idea was to not have to copy the arguments all the way up the chain (e.g. if ResultsLoader is edited, then we do not have to edit GenericChromatogramLoader).
It definitely is harder to read though, so I'm not sure.
Mm, that makes sense. I think it's good as is; it will make maintenance easier. I would add a docstring link to where the args are laid out though, so it's easier to track references/inheritance.
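A minimal sketch of the pattern discussed in this thread, assuming a simplified three-level hierarchy: the base class declares its arguments explicitly, the subclasses forward **kwargs, and each subclass docstring points back to where the arguments are documented. The class names and rsltsFile come from the discussion; the verbose argument and the docstring wording are hypothetical, and the real constructors may take different arguments.

```python
class ResultsLoader:
    """Base loader; arguments are declared and documented here.

    Args:
        rsltsFile: a single result file path or a list of paths.
        verbose: hypothetical flag, included only for illustration.
    """

    def __init__(self, rsltsFile, verbose=False):
        # Normalise a single path to a list so later code can always iterate.
        if isinstance(rsltsFile, str):
            rsltsFile = [rsltsFile]
        self.rsltsFile = list(rsltsFile)
        self.verbose = verbose


class GenericRawDataLoader(ResultsLoader):
    """Forwards all arguments unchanged; see ResultsLoader for the accepted args."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


class GenericChromatogramLoader(GenericRawDataLoader):
    """Forwards all arguments unchanged; see ResultsLoader for the accepted args."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


loader = GenericChromatogramLoader(rsltsFile="results.osw")

# An unknown keyword is still rejected, but only when it reaches the
# explicit signature of ResultsLoader.__init__, not by the metaclass.
try:
    GenericChromatogramLoader(rsltsFile="results.osw", notAnArg=1)
    rejected = False
except TypeError:
    rejected = True
```

The trade-off matches the one raised in the thread: editing ResultsLoader does not require touching the subclasses, at the cost of the accepted arguments being visible only through the docstring reference.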
INNER JOIN SCORE_PROTEIN ON SCORE_PROTEIN.PROTEIN_ID = PEPTIDE_PROTEIN_MAPPING.PROTEIN_ID
WHERE FEATURE.RUN_ID = {run_id} AND SCORE_MS2.QVALUE <= {qvalue} AND PRECURSOR.DECOY = 0 AND SCORE_PEPTIDE.QVALUE <= {qvalue} AND SCORE_PROTEIN.QVALUE <= {qvalue} and SCORE_MS2.RANK == 1"""
rslt = self.conn.execute(stmt)
return set([i[0] for i in rslt.fetchall()])
Is this faster than doing DISTINCT within the SQL query?
It was done this way because we want the end result to be a set.
DISTINCT is likely faster, though I'm not sure if I should add DISTINCT and then still just convert to a set.
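The two options raised in this thread can be compared side by side. The sketch below uses an in-memory SQLite database with a hypothetical single table, PRECURSOR_HITS, rather than the joined tables in the real query. It also binds run_id as a query parameter instead of formatting it into the string, which is a substitution made for this example and not what the PR code does. It shows only that both approaches return the same set; which one is faster on real result files would have to be measured.

```python
import sqlite3

# Hypothetical table standing in for the joined result tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRECURSOR_HITS (RUN_ID INTEGER, PRECURSOR_ID INTEGER)")
conn.executemany(
    "INSERT INTO PRECURSOR_HITS VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 12)],
)


def ids_via_python_set(conn, run_id):
    # Fetch every matching row and let the Python set drop duplicates.
    rslt = conn.execute(
        "SELECT PRECURSOR_ID FROM PRECURSOR_HITS WHERE RUN_ID = ?", (run_id,)
    )
    return {row[0] for row in rslt.fetchall()}


def ids_via_distinct(conn, run_id):
    # Let SQLite drop duplicates first, then convert to a set so the
    # return type stays the same for callers.
    rslt = conn.execute(
        "SELECT DISTINCT PRECURSOR_ID FROM PRECURSOR_HITS WHERE RUN_ID = ?",
        (run_id,),
    )
    return {row[0] for row in rslt.fetchall()}


same_result = ids_via_python_set(conn, 1) == ids_via_distinct(conn, 1)
```

Adding DISTINCT and still converting to a set keeps the return type unchanged while reducing the number of rows transferred from the database.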
@singjc I think this is ready for merging now?
Great, thanks for the changes. Will merge now.
Description
- rsltsFile: accepts a list of result files, which is useful for automatically plotting features from different software on the same plot.
- runName: an argument in loadTransitionGroup() and loadTransitionGroupFeatures() to only fetch information from that run. This is useful because the plotChromatogram() method can only plot a chromatogram from a single run.
How Has This Been Tested?
This is still a work in progress