Feature: Add Ability to accept multiple result files #137

jcharkow · 2024-09-30T21:00:54Z

Description

Loaders (MzMLLoader and SqMassLoader) can accept a list of rsltsFile. This is useful for automatically plotting features from different software on the same plot.
Auto-detect the results file type based on its contents (it does not have to be supplied by the user).
Can specify a runName in loadTransitionGroup() and loadTransitionGroupFeatures() to fetch information from that run only. This is useful because the plotChromatogram() method can only plot a chromatogram from a single run.
Update tests + docs to reflect these changes.
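
To make the auto-detection idea concrete, here is a minimal sketch of how a tab-separated results file could be classified from its header row alone. It is not the massdash implementation: the function name `detect_results_type` and the marker-column sets are assumptions chosen for illustration, and the real loader may inspect different columns or file types.

```python
import csv
import io

# Hypothetical marker columns (assumption); the real detection logic may differ.
DIANN_MARKERS = {"Run", "Precursor.Id"}
OPENSWATH_MARKERS = {"transition_group_id", "m_score"}


def detect_results_type(handle):
    """Guess which software wrote a tab-separated results file from its header."""
    header = next(csv.reader(handle, delimiter="\t"), [])
    columns = set(header)
    if DIANN_MARKERS <= columns:
        return "DIA-NN"
    if OPENSWATH_MARKERS <= columns:
        return "OpenSWATH"
    raise ValueError(f"Unrecognised results file; header columns: {sorted(columns)}")


# In-memory stand-ins for two result files from different software.
diann = io.StringIO("Run\tPrecursor.Id\tRT\nrun1\tPEPTIDEK2\t12.3\n")
osw = io.StringIO("transition_group_id\tm_score\tRT\n1\t0.001\t12.3\n")

for handle in (diann, osw):
    print(detect_results_type(handle))
```

Detecting from the header keeps the user-facing call simple: a list of paths can be passed without stating which software produced each one.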

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

This is still a work in progress

Contents (#137)

Other

extend ResultsLoader class
Add tests for new ResultsLoader methods
minor fixes
update tests snapshots
fix tests
add scoring distribution
add documentation to OSWAccess getScoreTable
add tests for new Score distribution functions
add back searchResultAnalysis plots
SearchResultsAnalysis server
get GUI for search plot distributions working
remove include_groups flag
bug in transitionLoading
Roestlab/massdash into refactor/resultsPlotting
docs: Update Loading Feature Information Docs
update docs to reflect new behaviour
bug when to pick to label by software
update plotting gallery for new code refactoring
snapshots not directory specific
add tests of multiple result files with mzML loader
fix: failing tests based on OS
remove unneeded dependency
add multi-result file tests for sqmassLoader
add test throw error if sqMass with DIA-NN
fix bugs with new changes
fix: streamlit interface with new refactoring
fix result file reading with streamlit
feature: add custom logger
select specific runs to get transitions/features from
bug with loading transition data
load DIA-NN when Precursor.Mz not found
fix: add tests for new "select runs" functionality + bug fixes
update snapshots
typing for python 3.9
updates docs for new codebase
control when to output IM column
bug fixes
update tests
fix typo
better check for if streamlit is running
fix: update quickstart to showcase multiple files
check if running in jupytere context
add loaders/access to docs
update ResultsLoader docs
loadPrecursorScoreDistribution --> loadPeakGroupScoreDistribution

Uncategorised!

Merge branch 'test/add_tests' into refactor/resultsPlotting
[FEATURE] autodetect .TSV results type
Rename GenericLoader -> ResultsLoader
[FEATURE][TEST] load multiple result files in loader
[REFACTOR][FIX] SqMassLoader+MzMLLoader new interface
Add access methods getIdentifications
Add getExperimentSummary
Merge branch 'dev' into refactor/resultsPlotting
Merge branch 'dev' into refactor/resultsPlotting
Merge branch 'dev' into refactor/resultsPlotting
Merge branch 'dev' into refactor/resultsPlotting
add pep, qvalue and p value as valid columns
Merge branch 'dev' into refactor/resultsPlotting
fix tests
Update tutorials for refactored changes
Merge branch 'dev' into refactor/resultsPlotting
Merge branch 'dev' into refactor/resultsPlotting
bug fig
Add runNames to repr
remove unneeded code
apply more justin suggestions
apply more suggestions
apply more comments
add checks that expected tables are there


singjc

Thanks for the updates and changes. Mostly looks good, I just made a few comments/suggestions. Should be good to merge after some of them are addressed/fixed.

singjc · 2024-10-11T14:17:55Z

massdash/loaders/GenericChromatogramLoader.py

+    def __init__(self, **kwargs): 
+        super().__init__(**kwargs)


singjc · Oct 11, 2024

Does the docstring need to be updated? Why are the args removed and replaced with kwargs? Does ABCMeta take care of validating the input kwargs? I'm not a huge fan of having **kwargs as the only input to a function. It makes it a little complicated to track which variables are used and where they're initialized.

singjc · Oct 11, 2024

I see, so GenericChromatogramLoader inherits from GenericRawDataLoader, which inherits from ResultsLoader, which is where the args are explicitly set. Is the reason for setting **kwargs in the GenericLoaders to allow more flexible args in the actual implementations?

jcharkow · Oct 13, 2024

The idea was to not have to copy the arguments all the way up the chain (e.g. if ResultsLoader is edited, we do not have to edit GenericChromatogramLoader). It definitely is harder to read though, so I'm not sure.

singjc · Oct 13, 2024

Mm, that makes sense. I think it's good as is; it will make maintenance easier. I would add a docstring link to where the args are laid out though, so it's easier to track references/inheritance.
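
The trade-off discussed in this thread can be sketched with a stripped-down version of the inheritance chain. The class names come from the PR, but the bodies below are simplified stand-ins, and the `verbose` parameter is a hypothetical addition for illustration. The sketch also shows where validation comes from in plain Python: an unknown keyword is rejected by the explicit signature at the top of the chain, not by the abstract-base-class machinery.

```python
class ResultsLoader:
    """Base loader; the only place the arguments are spelled out.

    Args:
        rsltsFile: one results file path or a list of them.
        verbose: whether to log progress (hypothetical parameter).
    """

    def __init__(self, rsltsFile, verbose=False):
        # Normalise a single path to a list so later code handles one case.
        self.rsltsFile = [rsltsFile] if isinstance(rsltsFile, str) else list(rsltsFile)
        self.verbose = verbose


class GenericRawDataLoader(ResultsLoader):
    """See ResultsLoader for the accepted keyword arguments."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


class GenericChromatogramLoader(GenericRawDataLoader):
    """See ResultsLoader for the accepted keyword arguments."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


loader = GenericChromatogramLoader(rsltsFile=["a.osw", "b.tsv"])
print(loader.rsltsFile)

# A misspelled keyword travels up the chain and fails at ResultsLoader.__init__.
try:
    GenericChromatogramLoader(rsltFile="a.osw")
except TypeError as err:
    print("rejected:", err)
```

Adding a parameter to ResultsLoader then requires no change in the intermediate classes, at the cost of having to follow the docstring pointer to find the accepted arguments.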

massdash/loaders/GenericChromatogramLoader.py

massdash/loaders/GenericRawDataLoader.py

singjc · 2024-10-11T17:40:26Z

massdash/loaders/access/OSWDataAccess.py

+                INNER JOIN SCORE_PROTEIN ON SCORE_PROTEIN.PROTEIN_ID = PEPTIDE_PROTEIN_MAPPING.PROTEIN_ID
+                WHERE FEATURE.RUN_ID = {run_id} AND SCORE_MS2.QVALUE <= {qvalue} AND PRECURSOR.DECOY = 0 AND SCORE_PEPTIDE.QVALUE <= {qvalue} AND SCORE_PROTEIN.QVALUE <= {qvalue} and SCORE_MS2.RANK == 1"""
+            rslt = self.conn.execute(stmt)
+            return set([i[0] for i in rslt.fetchall()])


singjc · Oct 11, 2024

Is this faster than doing DISTINCT within the SQL query?

jcharkow · Oct 14, 2024

It was done this way because we want the end result to be a set.

jcharkow · Oct 14, 2024

DISTINCT is likely faster, though I'm not sure whether I should add DISTINCT and then still convert to a set.
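
The two options from this thread can be compared side by side on a toy table. This is only a sketch: the table and column names are invented for the example and do not match the OSW schema, and it says nothing about which option is faster on a real file. It does show that both give the same set, and that DISTINCT can be combined with a set conversion when the caller expects a set.

```python
import sqlite3

# Toy in-memory table (hypothetical schema, not the OSW schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature (precursor_id INTEGER, run_id INTEGER)")
conn.executemany(
    "INSERT INTO feature VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10), (3, 11)],
)

run_id = 10

# Option 1: fetch every matching row and let Python's set drop duplicates.
rows = conn.execute("SELECT precursor_id FROM feature WHERE run_id = ?", (run_id,))
via_set = {row[0] for row in rows}

# Option 2: let SQLite deduplicate, then still hand the caller a set.
rows = conn.execute(
    "SELECT DISTINCT precursor_id FROM feature WHERE run_id = ?", (run_id,)
)
via_distinct = {row[0] for row in rows}

print(via_set == via_distinct)
conn.close()
```

The sketch binds `run_id` with a `?` placeholder rather than formatting it into the query string; that is a choice made for this example, not something the PR diff does.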

massdash/loaders/access/OSWDataAccess.py

massdash/loaders/access/ResultsTSVDataAccess.py

massdash/peakPickers/pyMRMTransitionGroupPicker.py

massdash/structs/TransitionGroup.py

jcharkow · 2024-10-15T19:23:27Z

@singjc I think this is ready for merging now?

singjc · 2024-10-15T21:10:58Z

> @singjc I think this is ready for merging now?

Great, thanks for the changes. Will merge now.

jcharkow added 30 commits March 5, 2024 08:21

Merge branch 'test/add_tests' into refactor/resultsPlotting

d0c2167

[FEATURE] autodetect .TSV results type

429cd68

Rename GenericLoader -> ResultsLoader

0196d77

Results loader accepts a list of result files

[FEATURE][TEST] load multiple result files in loader

2c2ec68

- Rename generic loader to ResultsLoader, do not make it an abstract class, can have just result files - Add methods for loading features to this class e.g. loadTopTransitionGroupFeature() which can take a list of result files

[REFACTOR][FIX] SqMassLoader+MzMLLoader new interface

daa34fb

- Code cleanup - remove (comment out for now) feature tests from lower level loaders - Ensure tests still work

Add access methods getIdentifications

69ca289

getIdentifiedPrecursors getIdentifiedProteins getIdentifiedPeptides getNumIndenticiations (precursors, peptides, proteins) getCV add appropriate tests for these new methods

Add getExperimentSummary

b89c94a

add method which gets a df of # precursors/peptides/proteins update tests accordingly

refactor: extend ResultsLoader class

9339de1

extend ResultsLoader class to have methods to query the entire result file and plotting methods.

test: Add tests for new ResultsLoader methods

df7a05b

also bug fixes when add tests

Merge branch 'dev' into refactor/resultsPlotting

f2bdb54

test: minor fixes

31f215c

Merge branch 'dev' into refactor/resultsPlotting

94bcdbc

Merge branch 'dev' into refactor/resultsPlotting

f996850

test: update tests snapshots

3038349

test: fix tests

0a2dabf

fix tests update snapshots resulting from merging

feature: add scoring distribution

3a83bc8

Add back functionality of scoring distribution plotting

Merge branch 'dev' into refactor/resultsPlotting

fbd849e

doc: add documentation to OSWAccess getScoreTable

3f362a5

test: add tests for new Score distribution functions

e1c88dc

Also update bugs associated with these functions

refactor: add back searchResultAnalysis plots

78165df

Refactor SearchResultAnalysisPlotter for plotting search results

refactor: SearchResultsAnalysis server

25807c7

refactor: get GUI for search plot distributions working

aab89fe

bug: remove include_groups flag

a1c38f0

this does not work with all versions of pandas

add pep, qvalue and p value as valid columns

2491394

fix: bug in transitionLoading

a800a85

Merge branch 'refactor/resultsPlotting' of github.com:Roestlab/massdash into refactor/resultsPlotting

0644b47

Merge branch 'dev' into refactor/resultsPlotting

e069d6c

fix tests

ace08cf

fix new tests snapshots everything should pass now. Results a bit strange because it is a test file

fix: docs: Update Loading Feature Information Docs

6f818b3

Also fix bug fixes associated with these changes

Update tutorials for refactored changes

a3c1ed1

jcharkow force-pushed the refactor/resultsPlotting branch 3 times, most recently from a91c35f to d278d4f Compare October 9, 2024 21:52

fix: typing for python 3.9

510ec25

fix typing references for python 3.9

jcharkow force-pushed the refactor/resultsPlotting branch from d278d4f to 510ec25 Compare October 9, 2024 22:13

jcharkow added 5 commits October 10, 2024 10:42

docs: updates docs for new codebase

4ecb9f6

feature: control when to output IM column

9b23368

output consensusApexIM column in dataframe if present

bug fig

6c7cbc7

minor bug fix: reverse order of @property and @abstractmethod

fix: bug fixes

5ac95cd

bug fixes from last commit

test: update tests

23803cb

update test snapshots to have IM dataframe
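
The decorator-order fix in commit 6c7cbc7 presumably concerns Python's `@property` and `@abstractmethod` decorators. The Python documentation says that when `abstractmethod` is combined with other method descriptors it should be the innermost decorator. A minimal sketch of that order follows; `Loader` and `MyLoader` are hypothetical classes, and `runNames` is borrowed from a later commit title purely as an example attribute.

```python
from abc import ABC, abstractmethod


class Loader(ABC):
    # @property on the outside, @abstractmethod directly above the function.
    @property
    @abstractmethod
    def runNames(self):
        """Names of the runs this loader knows about."""


class MyLoader(Loader):
    @property
    def runNames(self):
        return ["run1", "run2"]


print(MyLoader().runNames)

# The abstract base itself cannot be instantiated.
try:
    Loader()
except TypeError as err:
    print("cannot instantiate:", err)
```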

jcharkow marked this pull request as ready for review October 10, 2024 15:29

jcharkow added 3 commits October 10, 2024 11:31

docs: fix typo

cb4bb9f

fix: better check for if streamlit is running

be4ed77

docs: fix: update quickstart to showcase multiple files

8f1184e

update quickstart to show working with multiple chromatograms in a single loader. fix bugs that found when editing the notebook

jcharkow requested a review from singjc October 10, 2024 18:18

singjc requested changes Oct 11, 2024


jcharkow added 10 commits October 13, 2024 13:37

Add runNames to repr

b3e40a0

remove unneeded code

a514f0a

feat: check if running in jupytere context

30d75f8

apply more justin suggestions

5d3d1a9

apply more suggestions

56784cc

apply more comments

6122622

add checks that expected tables are there

005a611

docs: add loaders/access to docs

410ea3c

docs: update ResultsLoader docs

6205dfc

test: loadPrecursorScoreDistribution --> loadPeakGroupScoreDistribution

d1b5bd5
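
Commit 30d75f8 mentions checking whether the code is running in a Jupyter context, but the PR page does not show how that check is done. One common idiom, given here only as an assumption about what such a check could look like, is to ask IPython for the active shell and compare its class name. The function name `in_jupyter` is hypothetical.

```python
def in_jupyter():
    """Return True when running inside a Jupyter kernel, else False."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be a Jupyter kernel.
        return False
    shell = get_ipython()
    # Jupyter kernels use ZMQInteractiveShell; a plain interpreter gives None.
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"


print(in_jupyter())
```

A check like this lets a library pick a sensible default, for example inline plot output in a notebook, without requiring the user to pass a flag.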

singjc approved these changes Oct 15, 2024


singjc merged commit 4ee27ce into dev Oct 15, 2024
11 checks passed


jcharkow commented Sep 30, 2024 • edited by github-actions bot

singjc left a comment

singjc Oct 11, 2024

singjc Oct 11, 2024

jcharkow Oct 13, 2024

singjc Oct 13, 2024

singjc Oct 11, 2024

jcharkow Oct 14, 2024

jcharkow Oct 14, 2024

jcharkow commented Oct 15, 2024

singjc commented Oct 15, 2024
