The development of MaCPepDB is moved to a new repository: https://github.com/medbioinf/macpepdb
Creates a peptide databases by digesting proteins stored in FASTA-/Uniprot-Text-files.
Some UniProt entries contain one letter codes which encode multiple amino acids. Usually the encoded amino acids have a similar or equal mass. Ambiguous one letter codes are:
B
encodesD
&N
J
encodesI
&L
Z
encodesE
&Q
Because the amino acids encoded by B
& Z
have a different mass and only a few hundreds entries contain these, MaCPepDB resolves the ambiguity by creating all possible combination of the peptide with the distinct amino acids, e.g.:
ambiguous peptide | distinct peptides |
---|---|
PE_B_TIDE_Z_K |
PE_D_TIDE_E_K |
PE_D_TIDE_Q_K |
|
PE_N_TIDE_E_K |
|
PE_N_TIDE_Q_K |
J
encodes Leucine and Isoleucine, both have the same mass. Resolving those would not make the peptides better distinguishable by mass.
In theory X
is also ambiguous encoding all amino acids. Practically a lot more entries containing X
sometimes with a high abundance of X
. Resolving this would increase the amount of peptides significantly and slow down MaCPepDB's search functionality. Because X
has no mass peptides, containing it, will be discarded entirely.
Only necessary for development and non-Docker installation
- GIT
- Build tools (Ubuntu:
build-essential
, Arch Linux:base-devel
) - C/C++-header for PostgreSQL (Ubuntu:
libpq-dev
, Arch Linux:postgresql-libs
) - C/C++-header for libev (Ubuntu:
libev-dev
, Arch Linux:libev
) - Rust Compiler
- Docker & Docker Compose
- Python 3.x
- pyenv
- pipenv
Make sure pipenv
finds pyenv
# Install correct python version and create environment
pipenv install -d
# Change to environment
pipenv shell
# Start the database
docker-compose up
# Run migrations
MACPEPDB_DB_URL=postgresql://postgres:[email protected]:5433/macpepdb_dev pipenv run db-migrate
Use pipenv
to install or uninstall Python modules
TEST_MACPEPDB_URL=postgresql://postgres:[email protected]:5433/macpepdb_dev pipenv run tests
Run python -m macpepdb --help
in the root-folder of the repository.
Than update pip with pip install --upgrade pip
and run pip install -e git+https://github.com/mpc-bioinformatics/macpepdb.git@<MACPEPDB_GIT_TAG>#egg=MaCPepDB
to install MaCPepDB.
Then you can use MacPepDB by running python -m macpepdb
.
Appending --help
shows the available command line parameter.
To create a Docker image use: docker build --tag macpepdb-py .
. You can use the image to start a container with
docker run -it --rm macpepdb-py --help
.
To access your files in the container mount your files to /usr/src/macpepdb/data
with -v YOUR_DATA_FOLDER:/usr/src/macpepdb/data
(add it before the macpepdb-py
). Keep in mind your working in a container, so all file paths are within the container.
If you intend to create a protein/peptide database and your Postgresql server is running in a Docker container too, make sure both, the Postgresql server and the MacPepDB container have access to the same Docker network by adding --network=YOUR_DOCKER_NETWORK
(before the ´macpepdb-py´).
- Follow the Citus documentation to setup a Citus cluster.
- Run
psql -h <CITUS_CONTROLLER> -U <DB_USER> -c "ALTER DATABASE <DB_NAME> SET citus.multi_shard_modify_mode = 'sequential';"
andpsql -h <CITUS_CONTROLLER> -U <DB_USER> -c "ALTER DATABASE <DB_NAME> SET citus.shard_count = 100;"
to configure the database - Run
MACPEPDB_DB_URL=postgresql://<USER>:<PASSWORD>@<HOST>:<PORT>/<DATABASE> alembic upgrade head
, if you use the docker container, run the command in a temporary container:docker run --rm -it macpepdb sh
First create a work folder with the following structure:
|_ work_dir
|__ protein_data
|__ taxonomy_data
|__ logs
Place your protein data files as .dat
- or .txt
-files, containing the proteins in UniProt-text-format, in the protein_data
-folder.
If you like to use the web interface as well, download the taxdump.zip
from NCBI and put the contained .dmp
-files in the taxonomy_data
-folder.
Than start the database maintenance job with python -m macpepdb database ...
. Run python -m macpepdb database --help
to see the required arguments. Remember to use the container internal paths when using a docker container.
Create a new config file with the default config
python -m macpepdb web write-config-file <PATH_TO_CONFIG_YAML>
Adjust the YAML file to your needs. Than start the WebAPI with
python -m macpepdb web serve -e production -c <PATH_TO_CONFIG_YAML>
For high availability in production use start multiple WebAPI and combine them with NginX (have a look in nginx.example.conf
)
Due to changes of the database schema and the database engine, version 2.x is not compatible with version 1.x. You have to recreate the database.
- MaCPepDB: A Database to Quickly Access All Tryptic Peptides of the UniProtKB
Julian Uszkoreit, Dirk Winkelhardt, Katalin Barkovits, Maximilian Wulf, Sascha Roocke, Katrin Marcus, and Martin Eisenacher
Journal of Proteome Research 2021 20 (4), 2145-2150
DOI: 10.1021/acs.jproteome.0c00967