This repository holds the code for the experiments and training scripts used in the PredKinKG framework.

udel-cbcb/ikg_v2_public

Getting started

Requirements

  1. Docker - 20.10.10
  2. Determined AI - 0.17.2
  3. Polyaxon CE - 1.11.2
  4. RAM - 64 GB
  5. CPU - 16 cores
  6. NVIDIA GPU supporting CUDA 10.0 or later, with at least 16 GB of memory (RTX A4000)
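
As a rough aid, the hardware and tooling requirements above can be checked with a short script before setting anything up. This is a minimal sketch, not part of the repository: it assumes the command-line tools are named `docker`, `det`, and `polyaxon` (the names used in the commands in this README), reads total memory through `os.sysconf` (available on Linux), and applies an arbitrary 10% tolerance because the operating system usually reports slightly less memory than the nominal amount. It does not check tool versions or the GPU.

```python
import os
import shutil

# Thresholds copied from the requirements list.
MIN_CPU_CORES = 16
MIN_RAM_GB = 64
# Assumption: accept 90% of nominal RAM, since reported memory is usually lower.
RAM_TOLERANCE = 0.9
# CLI names as they appear in the commands of this README.
REQUIRED_TOOLS = ("docker", "det", "polyaxon")


def total_ram_gb():
    """Return total physical memory in GiB, or None if it cannot be read."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def check_host(cpu_cores, ram_gb, found_tools):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if cpu_cores is None or cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU cores: found {cpu_cores}, need {MIN_CPU_CORES}")
    if ram_gb is None or ram_gb < MIN_RAM_GB * RAM_TOLERANCE:
        problems.append(f"RAM: found {ram_gb} GB, need about {MIN_RAM_GB} GB")
    for tool in REQUIRED_TOOLS:
        if tool not in found_tools:
            problems.append(f"{tool}: not found on PATH")
    return problems


if __name__ == "__main__":
    found = {tool for tool in REQUIRED_TOOLS if shutil.which(tool)}
    report = check_host(os.cpu_count(), total_ram_gb(), found)
    for line in report or ["All basic checks passed"]:
        print(line)
```

The measured values are passed into `check_host` as arguments so the thresholds can be exercised without touching the host.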

Folder Structure

src/data Source code of the scripts used to generate data for experiments.
src/models Source code of the unsupervised models described in the study.
src/experiments Source code of the supervised models described in the study.
configs Determined configuration files for running experiments on the Determined cluster.
polyaxon_configs Polyaxon configuration files for running experiments on the Kubernetes (k8s) cluster.
main.py Entry point script used to run scripts locally.

Getting started.

  1. Download and extract the data needed to run the experiments.
# download the data
wget https://research.bioinformatics.udel.edu/iptmnet_data/downloads/ikg_v2_data.tar.gz

# create the target directory, then extract to /data/ml_data/ikg_v2_data
mkdir -p /data/ml_data/ikg_v2_data
tar -xf ikg_v2_data.tar.gz -C /data/ml_data/ikg_v2_data
  2. Before starting the experiments, build the Docker containers.
# change to the docker directory
cd docker

# build the docker container
bash build.sh
  3. Start the Docker container and open an interactive shell in it.
# start the container
docker-compose up -d

# start an interactive shell in the container
docker exec -it ikg-dev /bin/bash
  4. Generate an embedding using the triple walk skip-gram algorithm.
det experiment create configs/triples_walk_embedder_const_mul_seeds.yaml .
  5. Once training is complete, retrieve the embeddings from the checkpoint folder. The embedding files are named protein_head_embeddings_{fold_number}.csv and protein_tail_embeddings_{fold_number}.csv.

  6. To generate predictions, use the Polyaxon configs to run the prediction tasks on the Kubernetes cluster.

# run the prediction task on the kubernetes cluster
polyaxon run -p ikg_v2 -f ./polyaxon_configs/make_predictions.yml -u

# run the prediction task locally
export POLYAXON_NO_OP=1
python src/main.py --c "make_predictions"

# After the task completes, check for a file named `predicted_edges.csv` in the root folder of this project.
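
Before running the prediction task, it can help to confirm that every fold produced both embedding files. The following is a minimal sketch, not code from this repository: it relies only on the file-name pattern given in the steps above, assumes `{fold_number}` is a non-negative integer, and the helper names `find_embedding_files` and `incomplete_folds` are hypothetical. It does not read the CSV contents, whose layout is not described here.

```python
import re
from pathlib import Path

# File-name pattern from the steps above; assumes the fold number is an integer.
EMBEDDING_FILE = re.compile(r"^protein_(head|tail)_embeddings_(\d+)\.csv$")


def find_embedding_files(checkpoint_dir):
    """Map each fold number to {"head": Path, "tail": Path} for files in checkpoint_dir."""
    folds = {}
    for path in sorted(Path(checkpoint_dir).iterdir()):
        match = EMBEDDING_FILE.match(path.name)
        if match:
            kind, fold = match.group(1), int(match.group(2))
            folds.setdefault(fold, {})[kind] = path
    return folds


def incomplete_folds(folds):
    """Return the fold numbers that lack either the head or the tail file."""
    return sorted(
        fold for fold, files in folds.items() if set(files) != {"head", "tail"}
    )
```

Calling `incomplete_folds(find_embedding_files(path_to_checkpoint))` on the retrieved checkpoint folder returns an empty list when every fold found there has both files; `path_to_checkpoint` is a placeholder for wherever the checkpoint was downloaded.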

Data and predictions.

  1. Data: https://research.bioinformatics.udel.edu/iptmnet_data/downloads/ikg_v2_data.tar.gz
  2. Predictions: https://github.com/udel-cbcb/ikg_v2_public/releases/tag/1.0.0
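
For setups where `wget` and `tar` are not available, the data archive listed above could be fetched and unpacked from Python instead. This is a minimal sketch using only the standard library; the function names `download` and `extract` are hypothetical, and the URL and target directory are the ones given in this README.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL and target directory as given in this README.
DATA_URL = "https://research.bioinformatics.udel.edu/iptmnet_data/downloads/ikg_v2_data.tar.gz"
DATA_DIR = "/data/ml_data/ikg_v2_data"


def download(url, archive_path):
    """Download url to archive_path and return the local path."""
    urllib.request.urlretrieve(url, archive_path)
    return Path(archive_path)


def extract(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the top-level entry names."""
    dest = Path(dest_dir)
    # Create the target directory first; extraction assumes it exists.
    dest.mkdir(parents=True, exist_ok=True)
    # extractall trusts the archive contents, so use it only on trusted archives.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.iterdir())
```

Calling `extract(download(DATA_URL, "ikg_v2_data.tar.gz"), DATA_DIR)` would mirror the download and extraction commands in the getting-started steps. Downloading and extracting are kept as separate functions so that an archive that is already on disk can be unpacked without fetching it again.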
