Add Casanovo-DB Functionality #325

VarunAnanth2003 · 2024-04-27T20:23:08Z

Adds Casanovo-DB functionality, which builds on top of Tide (Phase 1).

This is accomplished through two new commands, annotate and db-search.

Most of the functionality is implemented by subclassing existing classes, since only minor modifications are needed to get the data flow to work.

bittremieux · 2024-04-28T16:13:49Z

Can you make sure the unit tests succeed?

On a quick glance, without looking at the content of the code, please also adhere to the maximum line length for strings (both within code and docstrings). Black doesn't do this automatically.
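Because Black leaves string literals and docstrings untouched, over-long ones have to be found and wrapped by hand. A minimal sketch of one way to check for them, assuming a 79-character limit (the actual limit is whatever the project configures) and a hypothetical helper name `find_long_lines`:

```python
from typing import List, Tuple

# Assumed limit for illustration; use the project's configured value.
MAX_LINE_LENGTH = 79


def find_long_lines(
    source: str, limit: int = MAX_LINE_LENGTH
) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# A long string literal can be split with implicit concatenation inside
# parentheses; the runtime value is unchanged.
message = (
    "Database search requires a peptide file, "
    "but none was provided."
)
```

The example message is made up; the point is only that adjacent literals inside parentheses are joined at compile time, so wrapping does not change behavior.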

VarunAnanth2003 · 2024-04-29T00:42:40Z

> Can you make sure the unit tests succeed?
>
> On a quick glance, without looking at the content of the code, please also adhere to the maximum line length for strings (both within code and docstrings). Black doesn't do this automatically.

I will definitely take a look at the line lengths and fix them if they are over the maximum.

As for the unit tests, the only one failing is test_initialize_model, which seems to be the issue we are waiting on in PR #301.

FAILED tests/unit_tests/test_runner.py::test_initialize_model - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 1024 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

bittremieux · 2024-04-29T04:56:58Z

No, I don't think that this is the same problem. #301 tries to run the tests on Mac M1 chips, which indeed doesn't work. The current tests on dev only use the older Mac architecture, though, and should run successfully (as they did with the previous state of this branch).

bittremieux · 2024-05-07T16:28:47Z

@VarunAnanth2003 To avoid running on MPS-supported macOS versions, in the GitHub test action, change macos-latest to macos-13.

bittremieux · 2024-05-07T18:10:37Z

The macOS version has been fixed in #327, so just merge dev into your PR.

Move macOS test fixes from dev

codecov · 2024-05-07T18:39:38Z

Codecov Report

Attention: Patch coverage is 96.09756% with 8 lines in your changes missing coverage. Please review.

Project coverage is 94.68%. Comparing base (0d1df14) to head (092fa2a).

Files with missing lines	Patch %	Lines
casanovo/data/db_utils.py	96.87%	3 Missing ⚠️
casanovo/denovo/dataloaders.py	92.85%	2 Missing ⚠️
casanovo/denovo/model_runner.py	91.30%	2 Missing ⚠️
casanovo/denovo/model.py	97.56%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #325      +/-   ##
==========================================
+ Coverage   94.37%   94.68%   +0.31%     
==========================================
  Files          13       14       +1     
  Lines        1102     1298     +196     
==========================================
+ Hits         1040     1229     +189     
- Misses         62       69       +7
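The percentages in the report can be reproduced from the hit and line counts in the diff. A small sketch of that arithmetic; the 205-line patch size is not stated in the report and is inferred here from "96.09756% with 8 lines missing":

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as the percentage of lines that are hit."""
    return 100.0 * hits / lines


base = coverage_percent(1040, 1102)  # dev
head = coverage_percent(1229, 1298)  # this PR

# Inferred, not reported: 8 missing lines at 96.09756% implies a patch of
# 205 lines, 197 of which are covered.
patch = coverage_percent(205 - 8, 205)

print(f"base={base:.2f}% head={head:.2f}% delta={head - base:+.2f}%")
print(f"patch={patch:.5f}%")
```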

Full implementation of Casanovo-DB

VarunAnanth2003 · 2024-09-21T01:06:47Z

This branch is currently fully up to date with dev and does not introduce any new failing tests (that aren't already linked to known problems).

justin-a-sanders · 2024-10-01T23:17:48Z

By default, Casanovo-DB should only report the top-scoring PSM for each spectrum, and there should be a parameter top_n that allows the user to ask for more (as in Tide). Currently, reporting all PSMs requires a huge amount of memory, and the resulting output file for even a modest run is 100+ GB, which is impractical for most users, who only care about the top identification for each spectrum.

bittremieux · 2024-10-02T08:39:17Z

Yes, definitely. Casanovo-DB should adhere to the value set for top_match in the config, which controls exactly that behavior.
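One way to keep memory bounded is to retain only the best `top_match` candidates per spectrum while scoring, instead of collecting every PSM and filtering at the end. The following is a rough sketch of that idea, not the Casanovo-DB implementation; the `(spectrum_id, peptide, score)` record layout and the `keep_top_matches` helper are hypothetical:

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical PSM record: (spectrum_id, peptide, score).
Psm = Tuple[str, str, float]


def keep_top_matches(
    psms: Iterable[Psm], top_match: int = 1
) -> Dict[str, List[Tuple[float, str]]]:
    """Keep only the `top_match` best-scoring peptides per spectrum.

    A bounded min-heap per spectrum means memory grows with
    n_spectra * top_match rather than with the total number of PSMs.
    """
    if top_match < 1:
        raise ValueError("top_match must be at least 1")
    heaps: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for spectrum_id, peptide, score in psms:
        heap = heaps[spectrum_id]
        if len(heap) < top_match:
            heapq.heappush(heap, (score, peptide))
        elif score > heap[0][0]:
            # Evict the current worst candidate for this spectrum.
            heapq.heapreplace(heap, (score, peptide))
    # Report each spectrum's matches from best to worst.
    return {
        spectrum_id: sorted(heap, reverse=True)
        for spectrum_id, heap in heaps.items()
    }
```

Since `psms` can be any iterable, a generator that yields scored candidates batch by batch would work here, so the full candidate list never needs to be held in memory.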

VarunAnanth2003 added 7 commits March 27, 2024 16:44

begin adding tests for annotate mode

258edb4

add basic test for annotate mode

30f5984

added test case for annotate mode and modified method

186bc0f

very rough sketch of db upgrade (untested)

a8f50f4

small upgrades to documentation

dae9c8a

better output formatting

7f95ae5

all tests added

278436b

VarunAnanth2003 added the enhancement label Apr 27, 2024

VarunAnanth2003 requested a review from justin-a-sanders April 27, 2024 20:23

VarunAnanth2003 and others added 5 commits April 27, 2024 13:25

remove minor debugging print statement

949ea93

Generate new screengrabs with rich-codex

da5ef5e

remove excess info logs, add monkeypatch to tests

53f6bec

Merge branch 'dev_db_search' of https://github.com/Noble-Lab/casanovo …

e2cbce8

…into dev_db_search

mp fix

81aa073

VarunAnanth2003 and others added 2 commits May 6, 2024 23:26

fix line lengths and modify test

0ecbd80

Generate new screengrabs with rich-codex

ee6638e