AMPBenchmark

AMPBenchmark is part of our initiative to improve benchmarking standards in the field of antimicrobial peptide (AMP) prediction.

How to use the public data?

Download the benchmark sequence data:
- Dropbox link.
- GitHub link.
Download the training sequence data for all methods and replications:
- Dropbox link.
Train your model on each of the training data sets (the class of a sequence is denoted by AMP=1 for AMPs and AMP=0 for negative samples; see the Sequence data section for details).
Benchmark the trained models against our data. Make sure to use the subset of sequences for the appropriate replication (the replication number is denoted by, e.g., rep1; see the Sequence data section for details).
Submit the results in the format described below to the AMPBenchmark web server.
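
Selecting the benchmark subset for one replication, as the steps above require, can be sketched as follows. This is a minimal sketch, assuming that each FASTA record starts with a `>` header line and that every header ends in `_rep<number>`, as in the examples in the Sequence data section. The function names `read_fasta` and `group_by_replication` and the peptide sequences in the usage example are hypothetical placeholders, not part of AMPBenchmark.

```python
import re
from collections import defaultdict

# Replication suffix at the end of a FASTA header, e.g. "..._rep1".
REP_RE = re.compile(r"_rep(\d+)$")


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif line and records:
            records[-1][1] += line  # sequences may span several lines
    return [(header, seq) for header, seq in records]


def group_by_replication(text):
    """Return {replication number: [(sequence ID, sequence), ...]}."""
    groups = defaultdict(list)
    for header, seq in read_fasta(text):
        match = REP_RE.search(header)
        if match is None:
            raise ValueError(f"no replication suffix in header: {header!r}")
        groups[int(match.group(1))].append((header, seq))
    return dict(groups)


# Hypothetical two-record input; the sequences are made up for illustration.
example = """>DBAASP_10018_AMP=1_rep1
GLFDIVKK
>Seq1896_sampling_method=Gabere&Noble_AMP=0_rep4
MKTAYIAK
"""
by_rep = group_by_replication(example)
```

A model trained on replication 1 of a training set would then be evaluated only on `by_rep[1]`.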

Data submission format

ID	training_sampling	AMP_probability
DBAASP_10018_AMP=1_rep1	dbAMP	0.97
DBAASP_3217_AMP=1_rep1	dbAMP	0.61
…	…	…

ID: must contain the sequence ID, as provided in the FASTA headers of the input sequences.
training_sampling: must contain the negative sampling method used to train the model. Possible values are: AMAP, AmpGram, ampir-mature, AMPlify, AMPScannerV2, CS-AMPPred, dbAMP, Gabere&Noble, iAMP-2L, Wang-et-al, Witten&Witten. Remember that a proper benchmark requires you to train your model with every provided sampling method and to evaluate each trained model on all sampling methods, using the appropriate replication.
AMP_probability: has to be in the range between 0 and 1.
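
Before uploading, the three column rules above can be checked locally. The following is a minimal sketch, not the validation performed by the AMPBenchmark web server: the function name `check_submission` is hypothetical, and rows are assumed to be already loaded as dictionaries keyed by column name.

```python
# Values listed for the training_sampling column in this README.
ALLOWED_SAMPLING = {
    "AMAP", "AmpGram", "ampir-mature", "AMPlify", "AMPScannerV2",
    "CS-AMPPred", "dbAMP", "Gabere&Noble", "iAMP-2L", "Wang-et-al",
    "Witten&Witten",
}
REQUIRED_COLUMNS = ("ID", "training_sampling", "AMP_probability")


def check_submission(rows):
    """Return a list of problems found in the rows; empty means none."""
    problems = []
    for i, row in enumerate(rows, start=1):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing column(s) {missing}")
            continue
        if row["training_sampling"] not in ALLOWED_SAMPLING:
            problems.append(
                f"row {i}: unknown training_sampling {row['training_sampling']!r}"
            )
        try:
            prob = float(row["AMP_probability"])
        except (TypeError, ValueError):
            problems.append(f"row {i}: AMP_probability is not a number")
            continue
        if not 0.0 <= prob <= 1.0:  # also rejects NaN
            problems.append(f"row {i}: AMP_probability {prob} outside [0, 1]")
    return problems


# The two example rows from the table above.
rows = [
    {"ID": "DBAASP_10018_AMP=1_rep1", "training_sampling": "dbAMP",
     "AMP_probability": "0.97"},
    {"ID": "DBAASP_3217_AMP=1_rep1", "training_sampling": "dbAMP",
     "AMP_probability": "0.61"},
]
problems = check_submission(rows)
```

The sketch deliberately works on in-memory rows, because the file serialization expected by the server (for example, the delimiter) is not stated in this section.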

Example data for a random classifier can be downloaded from Dropbox.

Sequence data

The input data is hosted on Dropbox and GitHub. Note that this single file contains the data for all replications; use the sequences of each replication only with the corresponding replication of the training sets.

The training data sets are hosted on Dropbox and follow the same naming convention.

There are two types of input sequences:

positive sequences (e.g., DBAASP_10718_AMP=1_rep1): IDinDBAASP_class_replicateID.
negative sequences (e.g., Seq1896_sampling_method=Gabere&Noble_AMP=0_rep4): IDandSamplingMethod_class_replicateID.
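
The two naming patterns above can be split into their fields with a regular expression. This is a minimal sketch, assuming that all headers follow exactly the two example patterns; the function name `parse_header` and the keys of the returned dictionary are hypothetical.

```python
import re

# name, optional "_sampling_method=<method>", then "_AMP=<0|1>_rep<number>".
HEADER_RE = re.compile(
    r"^(?P<name>.+?)"
    r"(?:_sampling_method=(?P<sampling_method>.+))?"
    r"_AMP=(?P<label>[01])_rep(?P<rep>\d+)$"
)


def parse_header(header):
    """Split a sequence ID into name, sampling method, class and replicate."""
    seq_id = header.strip().lstrip(">")
    match = HEADER_RE.match(seq_id)
    if match is None:
        raise ValueError(f"unexpected header format: {header!r}")
    return {
        "id": seq_id,
        "name": match.group("name"),
        # None for positive sequences, which carry no sampling method.
        "sampling_method": match.group("sampling_method"),
        "label": int(match.group("label")),  # 1 = AMP, 0 = negative sample
        "rep": int(match.group("rep")),
    }


positive = parse_header("DBAASP_10718_AMP=1_rep1")
negative = parse_header("Seq1896_sampling_method=Gabere&Noble_AMP=0_rep4")
```

Here `positive["label"]` is 1 with no sampling method, while `negative` carries the sampling method `Gabere&Noble`, class 0 and replicate 4.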

AMP sequences are derived from the DBAASP database.

md5 sum of the AMPBenchmark_public.fasta: 58f1424c057aaeb64bc632cad6038cad.
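
The checksum above can be used to confirm that the download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function name `file_md5` is hypothetical, and the local file path in the usage comment is an assumption about where the file was saved.

```python
import hashlib

# Checksum stated in this README for AMPBenchmark_public.fasta.
EXPECTED_MD5 = "58f1424c057aaeb64bc632cad6038cad"


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (assumes the file sits in the current working directory):
# if file_md5("AMPBenchmark_public.fasta") != EXPECTED_MD5:
#     raise SystemExit("checksum mismatch: download the file again")
```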

Citation

Katarzyna Sidorczuk, Przemysław Gagat, Filip Pietluch, Jakub Kała, Dominik Rafacz, Laura Bąkała, Jadwiga Słowik, Rafał Kolenda, Stefan Rödiger, Legana C H W Fingerhut, Ira R Cooke, Paweł Mackiewicz, Michał Burdukiewicz, Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data, Briefings in Bioinformatics, 2022, bbac343, https://doi.org/10.1093/bib/bbac343.

Important links

https://github.com/BioGenies/NegativeDatasets: the repository containing the code necessary to reproduce results of our analysis.
https://github.com/BioGenies/NegativeDatasetsArchitectures: the repository containing all architectures considered in our analysis.
https://github.com/BioGenies/AMPBenchmark: the source code of AMPBenchmark.

Contact

If you have any questions, suggestions or comments, contact Michal Burdukiewicz.

Changelog

2023/01/11: fixed data processing.
