Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Casanovo-DB Functionality #325

Open
wants to merge 67 commits into
base: dev
Choose a base branch
from
Open

Add Casanovo-DB Functionality #325

wants to merge 67 commits into from

Conversation

VarunAnanth2003
Copy link
Contributor

Added functionality of Casanovo-DB that builds on top of Tide (Phase 1).

This is accomplished through two new commands, annotate and db-search.

Most of the functionality is created through subclassing existing classes as only minor modifications are needed to get the data flow to work.

@VarunAnanth2003 VarunAnanth2003 added the enhancement New feature or request label Apr 27, 2024
@bittremieux
Copy link
Collaborator

Can you make sure the unit tests succeed?

On a quick glance, without looking at the content of the code, please also adhere to the maximum line length for strings (both within code and docstrings). Black doesn't do this automatically.

@VarunAnanth2003
Copy link
Contributor Author

Can you make sure the unit tests succeed?

On a quick glance, without looking at the content of the code, please also adhere to the maximum line length for strings (both within code and docstrings). Black doesn't do this automatically.

I will definitely take a look at the line lengths and fix them if they are over the maximum.

As for the unit tests, the only one failing is the test_initialize_model test, which seems to be the issue we are waiting on in PR #301 .

FAILED tests/unit_tests/test_runner.py::test_initialize_model - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 1024 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

@bittremieux
Copy link
Collaborator

No, I don't think that this is the same problem. #301 tries to do tests on the Mac M1 chips, which indeed doesn't work. The current tests on dev only use the old Mac architectures though, and should run successfully (as with the previous state of this branch).

@bittremieux
Copy link
Collaborator

@VarunAnanth2003 To avoid running on MPS-supported macOS versions, in the GitHub test action, change macos-latest to macos-13.

@bittremieux
Copy link
Collaborator

The macOS version has been fixed in #327, so just merge dev into your PR.

Move macOS test fixes from dev
Copy link

codecov bot commented May 7, 2024

Codecov Report

Attention: Patch coverage is 96.09756% with 8 lines in your changes missing coverage. Please review.

Project coverage is 94.68%. Comparing base (0d1df14) to head (092fa2a).

Files with missing lines Patch % Lines
casanovo/data/db_utils.py 96.87% 3 Missing ⚠️
casanovo/denovo/dataloaders.py 92.85% 2 Missing ⚠️
casanovo/denovo/model_runner.py 91.30% 2 Missing ⚠️
casanovo/denovo/model.py 97.56% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #325      +/-   ##
==========================================
+ Coverage   94.37%   94.68%   +0.31%     
==========================================
  Files          13       14       +1     
  Lines        1102     1298     +196     
==========================================
+ Hits         1040     1229     +189     
- Misses         62       69       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@VarunAnanth2003
Copy link
Contributor Author

This branch is currently fully up to date with dev and does not introduce any new failing tests (that aren't already linked to known problems).

@bittremieux bittremieux self-requested a review September 21, 2024 11:37
@justin-a-sanders
Copy link

By default, Casanovo-DB should only report the top scoring PSM for each spectrum, and there should be a paremater top_n that allows the user to ask for more (like for tide). Currently, reporting all PSMs requires a huge amount of memory, and the resulting output file for even a modest run is 100+ Gb, which is impractical for most users who just care about the top id for each spectrum.

@bittremieux
Copy link
Collaborator

Yes, definitely. Casanovo-DB should adhere to the value set for top_match in the config, which controls exactly that behavior.

casanovo/casanovo.py Outdated Show resolved Hide resolved
casanovo/config.yaml Show resolved Hide resolved
casanovo/config.yaml Outdated Show resolved Hide resolved
casanovo/config.yaml Outdated Show resolved Hide resolved
casanovo/data/datasets.py Outdated Show resolved Hide resolved
casanovo/denovo/model_runner.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
tests/conftest.py Show resolved Hide resolved
tests/conftest.py Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants