Code for paper "Molecular Fingerprints Are Strong Models for Peptide Function Prediction" J. Adamczyk, P. Ludynia, W. Czech, from AGH ML & Chemoinformatics Group.
Preprint is available on Arxiv.
Dependencies are installed with Poetry. You can install exact versions used from poetry.lock
by running poety sync
in fresh Python 3.11 virtual environment.
To ensure you have everything set up, you can also run make setup
. All datasets used are
already included in the data
directory.
To obtain results on benchmarks in the main paper body, run:
- LRGB:
python src/main_lrgb.py
- LRGB binary fingerprints:
python src/main_lrgb_binary_fp.py_
- BERT AMPs benchmark of Gao et al.:
python src/main_bert_amp_benchmark.py
- AMPs benchmark of Xu et al.:
python src/main_xu_amp.py
- AMPs benchmark of Sidorczuk et al.:
- run
python src/main_ampbenchmark.py
- result file will be created in
results
directory - upload it to AMPBenchmark web server
- download result from "Impact of the sampling method" tab
- run
- AutoPeptideML benchmark of Fernandez-Diaz et al.:
python src/main_autopeptideml.py
- PeptideReactor benchmark of Späanig et al.:
- run
python src/main_peptidereactor.py
- run
python src/main_peptidereactor_fp_encoding.py
- result files will be created in
results
directory - parse the results by running
cd util_scripts && python parse_peptidereactor_results.py
- run
util_scripts
directory contains a few utility scripts:
create_bert_benchmark_datasets.py
creates train-test dataset splits for BERT AMPs benchmark; requires CD-HIT to be installedparse_autopeptideml_results.py
parses original results from AutoPeptideML, to create more readable CSV outputparse_peptidereactor_results.py
parses original results from PeptideReactor, which are a very complex JSON, and adds fingerprint results from training our models