This project is not yet complete.
Anu is a machine learning (ML) framework for predicting protein-protein interactions. It lets you test and benchmark ML models for protein-protein interaction prediction, and it automates data retrieval, feature engineering, and model evaluation.
- git
- python 3.7 or above
- python virtual environment
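The prerequisites above can be checked before cloning. The snippet below is a minimal sketch, not part of anu: the function name `check_prerequisites` is hypothetical, and it only assumes that `git` should be on `PATH`, that Python is 3.7 or above, and that a virtual environment created with `python -m venv` is active.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7)):
    """Return a dict mapping each prerequisite name to True/False.

    Hypothetical helper for illustration; anu does not ship this function.
    """
    return {
        # git must be on PATH to clone the repository
        "git": shutil.which("git") is not None,
        # the running interpreter must be at least the minimum version
        "python": sys.version_info[:2] >= tuple(min_python),
        # inside a venv, sys.prefix differs from sys.base_prefix
        "virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it with the interpreter you intend to use for anu; a `missing` entry for `virtualenv` only means no venv is currently activated.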
git clone https://github.com/ankitskvmdam/anu.git
python -m venv venv # create python environment
. ./venv/bin/activate # activate python environment
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
or
pipx install poetry
For more information about poetry, visit the poetry docs.
pip install nox
Run tests, lint check, type check, doc tests, and coverage:
nox
For more information, visit the nox tutorial.
To use this tool, the first few steps are the same as in the development setup.
git clone https://github.com/ankitskvmdam/anu.git
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
or
pipx install poetry
Now run the following commands:
# First move to the directory
cd anu
# Installing anu
poetry install
Anu uses two databases:
- Pickle - Interacting protein database
- Negatome - Non-interacting protein database
There is currently no way to tell anu to download only one of the databases. This feature will be implemented in a future release.
# Download both databases
anu data fetch databases
# For help/more information
anu data fetch databases --help
The next step prepares two dataframes:
- Pickle dataset dataframe (vaex dataframes)
- Negatome dataset dataframe (vaex dataframes)
There is currently no way to tell anu to prepare only one of the dataframes. This feature will be implemented in a future release.
# Prepare pickle and negatome dataframe
anu data prepare dataframes
# For help/more information
anu data prepare dataframes --help
Now we have to fetch the PDB files.
The Pickle database contains almost 30,000 proteins and the Negatome database around 10,000, so it is hard to fetch them all at once. The fetching process is resumable, and for testing, 300 to 400 files per dataset are enough. Once you have downloaded enough files, press Ctrl+C to exit.
# For help/more information
anu data fetch pdb --help
# Fetch PDB files for proteins present in the pickle dataset
anu data fetch pdb -p
# or
anu data fetch pdb --pickle
# Fetch PDB files for proteins present in the negatome dataset
anu data fetch pdb -n
# or
anu data fetch pdb --negatome
# Fetch PDB files for both datasets
anu data fetch pdb
If a PDB file has already been downloaded, it will not be downloaded again. PDB file downloads are synchronized between both datasets.
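The skip-if-present behavior is what makes an interrupted fetch safe to restart. The following is a minimal sketch of that idea, not anu's actual implementation: the function `fetch_missing`, the `<id>.pdb` file naming, and the injected `download` callable are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def fetch_missing(
    pdb_ids: Iterable[str],
    out_dir: Path,
    download: Callable[[str], bytes],
) -> List[str]:
    """Download only the PDB files that are not already in out_dir.

    Returns the IDs that were actually downloaded. Hypothetical helper;
    the file layout (<id>.pdb) is an assumption, not anu's real layout.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for pdb_id in pdb_ids:
        target = out_dir / f"{pdb_id.lower()}.pdb"
        if target.exists():
            continue  # already on disk from an earlier (possibly interrupted) run
        data = download(pdb_id)
        # Write to a temporary name first, then rename, so an interrupt never
        # leaves a half-written file that looks like a complete one.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(data)
        tmp.replace(target)
        fetched.append(pdb_id)
    return fetched


if __name__ == "__main__":
    def fake_download(pdb_id: str) -> bytes:
        # Stand-in for a real network request.
        return f"HEADER {pdb_id}\n".encode()

    with tempfile.TemporaryDirectory() as tmp_dir:
        print(fetch_missing(["1gzx", "4hh3"], Path(tmp_dir), fake_download))
        # Second run: only the ID that is not yet on disk is fetched.
        print(fetch_missing(["1gzx", "4hh3", "1abc"], Path(tmp_dir), fake_download))
```

Because the check is made per file in a shared directory, an ID that appears in both datasets is fetched only once, which is one plausible reading of downloads being shared between the two datasets.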
Preparing the input dataframes is also a time-consuming process.
# For help/more information
anu data prepare inputs --help
# Prepare interacting protein dataframe
anu data prepare inputs -i
# or
anu data prepare inputs --interacting
# Prepare non interacting protein dataframe
anu data prepare inputs -n
# or
anu data prepare inputs --non-interacting
# Prepare both input dataframes
anu data prepare inputs
Currently, only the CNN model is available.
anu train cnn
Before prediction, you have to train the model.
# For help/more information
anu predict protein --help
# Give PDB IDs as input
anu predict protein -p "1gzx" "4hh3"
# Give UniProt IDs as input
anu predict protein -u "F4JRB0" "Q8RX29"
# Give file paths as input
anu predict protein "path/to/protein/a.pdb" "path/to/protein/b.pdb"
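To run many predictions from a script, the three input forms above can be generated programmatically. This is a sketch under assumptions: `build_predict_command` is a hypothetical helper, and it relies only on the `-p` and `-u` flags and the flagless path form shown in the commands above.

```python
import shlex
from typing import List

# Flags taken from the commands above; "path" means no flag is passed.
_KIND_FLAGS = {"pdb": ["-p"], "uniprot": ["-u"], "path": []}


def build_predict_command(protein_a: str, protein_b: str, kind: str = "path") -> List[str]:
    """Build the argument list for one `anu predict protein` call.

    Hypothetical helper for illustration; it does not run anything.
    """
    if kind not in _KIND_FLAGS:
        raise ValueError(f"kind must be one of {sorted(_KIND_FLAGS)}, got {kind!r}")
    return ["anu", "predict", "protein", *_KIND_FLAGS[kind], protein_a, protein_b]


if __name__ == "__main__":
    pairs = [
        ("1gzx", "4hh3", "pdb"),
        ("F4JRB0", "Q8RX29", "uniprot"),
        ("path/to/protein/a.pdb", "path/to/protein/b.pdb", "path"),
    ]
    for a, b, kind in pairs:
        cmd = build_predict_command(a, b, kind)
        # Print the shell-quoted command; to execute it instead, pass the
        # list to subprocess.run(cmd, check=True) after training the model.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Building an argument list rather than a single string avoids shell-quoting problems when file paths contain spaces.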