Uncaught exception in DeepLCFeatureGenerator if not enough peptides for calibration set #130

Open
vrkosk opened this issue Mar 22, 2024 · 2 comments

vrkosk commented Mar 22, 2024

I'm getting an uncaught exception when trying to use ms2rescore.feature_generators.deeplc.DeepLCFeatureGenerator. The error happens when psm_list does not contain enough peptides to build the calibration set.

Here's how I create the environment:

    C:\python\python309\python.exe -m venv venv_309_ms2rescore
    venv_309_ms2rescore\Scripts\pip3 install ms2rescore==3.0.2

I'm calling the feature generator as instructed in the MS2Rescore docs:

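    from ms2rescore.feature_generators.deeplc import DeepLCFeatureGenerator

    # `psm_list` (a psm_utils PSMList) and `processes` are defined earlier in the script
    fgen = DeepLCFeatureGenerator(
        lower_score_is_better=True,  # because we use expect value as 'score'
        spectrum_path=None,  # not relevant
        processes=processes,
        deeplc_retrain=False,
        calibration_set_size=0.15,  # fraction of PSMs used for DeepLC calibration
    )

    fgen.add_features(psm_list)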

When psm_list contains only a few PSMs, add_features() raises an uncaught KeyError:

    2024-03-22 11:17:35,204 INFO Running DeepLC for PSMs from run (1/1): `F981141_1.tsv9ig132dw.mgf`...
    Traceback (most recent call last):
      File "C:\Users\villek\githead\mascot-proj\mascot\www\bin\ML_adapters\MS2RescoreAdapter.py", line 243, in <module>
        main()
      File "C:\Users\villek\githead\mascot-proj\mascot\www\bin\ML_adapters\MS2RescoreAdapter.py", line 218, in main
        _add_DeepLC_features(
      File "C:\Users\villek\githead\mascot-proj\mascot\www\bin\ML_adapters\MS2RescoreAdapter.py", line 126, in _add_DeepLC_features
        fgen.add_features(psm_list)
      File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\ms2rescore\feature_generators\deeplc.py", line 163, in add_features
        seq_df=self._psm_list_to_deeplc_peprec(psm_list_calibration)
      File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\ms2rescore\feature_generators\deeplc.py", line 211, in _psm_list_to_deeplc_peprec
        peprec = peprec.rename(
      File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\pandas\core\frame.py", line 3813, in __getitem__
        indexer = self.columns._get_indexer_strict(key, "columns")[1]
      File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\pandas\core\indexes\base.py", line 6070, in _get_indexer_strict
        self._raise_if_missing(keyarr, indexer, axis_name)
      File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\pandas\core\indexes\base.py", line 6130, in _raise_if_missing
        raise KeyError(f"None of [{key}] are in the [{axis_name}]")
    KeyError: "None of [Index(['tr', 'seq', 'modifications'], dtype='object')] are in the [columns]"

My workaround is to pass calibration_set_size=1.0 whenever round(calibration_set_size * len(psm_list[~psm_list['is_decoy']])) == 0. _psm_list_to_deeplc_peprec() then receives a non-empty calibration PSM list and the run completes. Quite likely I shouldn't be using DeepLC at all when there are so few peptide matches!
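As a rough sketch of the workaround above, the check can be isolated in a small pure-Python helper. The name `choose_calibration_set_size` is hypothetical, and the helper takes a plain list of target/decoy flags rather than a psm_utils `PSMList`; it only reproduces the rounding check from this comment, not ms2rescore's own calibration logic.

```python
def choose_calibration_set_size(is_decoy: list[bool], requested: float) -> float:
    """Return a calibration fraction that selects at least one target PSM.

    Hypothetical helper mirroring the workaround: if rounding
    ``requested * n_targets`` yields zero PSMs, fall back to 1.0 (all targets).
    """
    # Count target (non-decoy) PSMs, as in psm_list[~psm_list['is_decoy']]
    n_targets = sum(1 for decoy in is_decoy if not decoy)
    if round(requested * n_targets) == 0:
        return 1.0
    return requested
```

Under the same rounding rule, a run with no target PSMs at all still gives an empty calibration set even at `1.0`, so this fallback only helps when at least one target PSM is present.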

RalfG (member) commented Apr 8, 2024

Hi, @vrkosk,

Thanks for reporting! We will look into this.

Best,
Ralf

RalfG (member) commented Apr 8, 2024

For internal reference:

_psm_list_to_deeplc_peprec() has already been removed in the timsRescore branch in favor of sending the PSMList directly to DeepLC. However, we should still look into how this behaves when there are not enough PSMs (or none) for calibration.
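Whatever the eventual fix, the edge case can be made explicit by validating the calibration subset before any DataFrame is built. The sketch below is hypothetical (the function name, the plain-list inputs, and the choice to raise `ValueError` are assumptions, not ms2rescore's API): it ranks target PSMs by score, keeps the requested fraction, and fails with a descriptive message when that fraction rounds to zero PSMs, which also covers the case of no target PSMs at all.

```python
def select_calibration_indices(
    scores: list[float],
    is_decoy: list[bool],
    fraction: float,
    lower_score_is_better: bool,
) -> list[int]:
    """Return indices of the best-scoring target PSMs for calibration.

    Hypothetical sketch: raises a descriptive ValueError instead of letting an
    empty calibration set reach later steps (such as the KeyError above).
    """
    if not 0 < fraction <= 1:
        raise ValueError(f"fraction must be in (0, 1], got {fraction}")
    # Only target (non-decoy) PSMs are candidates for calibration
    targets = [i for i, decoy in enumerate(is_decoy) if not decoy]
    n_calibration = round(fraction * len(targets))
    if n_calibration == 0:
        raise ValueError(
            f"calibration set would be empty: {len(targets)} target PSMs "
            f"* fraction {fraction} rounds to 0"
        )
    # Sort target indices from best to worst score (stable for ties)
    ranked = sorted(targets, key=lambda i: scores[i], reverse=not lower_score_is_better)
    return ranked[:n_calibration]
```

Raising is only one option; a caller could instead catch the error and skip DeepLC features for that run, in line with the reporter's remark that DeepLC may not be worth using on so few PSMs.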
