The addgene-bioinformatics repository demonstrates a two-step workflow for assembling plasmid next-generation sequencing data using Toil.
Download the Docker Desktop package and install it as usual. The Docker daemon should start by default.
Docker needs to be able to mount the directory Toil creates for each job. When using Docker Desktop on macOS, grant this access in "Preferences > Resources > File Sharing" by adding "/var/folders" to the list.
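On macOS the per-user temporary directories live under "/var/folders", which is why that path must be shared; you can confirm where your shell's temporary directory points:

$ echo $TMPDIR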
Additionally, Docker must be allocated enough resources to run a job: in "Preferences > Resources > Advanced", set CPUs to at least 2 and memory to at least 8.0 GB.
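As a quick sanity check, docker info can report the CPU and memory actually allocated to the Docker virtual machine:

$ docker info --format '{{.NCPU}} CPUs, {{.MemTotal}} bytes of memory'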
Clone the repository:

$ git clone git@github.com:addgene/addgene-bioinformatics.git
Create a virtual environment (using a Python 3 version), optionally update the pinned Python dependencies with pip-tools, and install the requirements:

$ mkvirtualenv addgene-bioinformatics
$ pip install pip-tools
$ pip-compile -U
$ pip install -r requirements.txt
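If python3 is not the default interpreter on your machine, mkvirtualenv passes virtualenv's --python option through, so you can select one explicitly:

$ mkvirtualenv --python=$(which python3) addgene-bioinformatics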
Build the Docker images used by the jobs:

$ cd containers
$ ./build.sh
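If the build succeeds, the new images appear in your local image list (the exact image names depend on the build script):

$ docker image ls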
From the repository root, run the unit tests:

$ cd src/python/jobs
$ python JobsTest.py
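Assuming JobsTest.py invokes unittest.main(), the standard unittest flags are available, for example verbose output:

$ python JobsTest.py -v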
From the repository root, run each of the jobs with its default inputs and a local file store (the trailing argument names the Toil job store):

$ python src/python/jobs/SpadesJob.py sjfs
$ python src/python/jobs/ApcJob.py ajfs
$ python src/python/jobs/WellAssemblyJob.py wajfs
$ python src/python/jobs/PlateAssemblyJob.py pajfs
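These scripts accept Toil's standard workflow options in addition to their own. For example, assuming the argument parsers expose Toil's options, --clean controls whether the job store is removed after a run:

$ python src/python/jobs/SpadesJob.py --clean always sjfs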
Run a well or sample plate assembly job with sequencing data imported from S3:

$ python src/python/jobs/WellAssemblyJob.py -s s3 -d addgene-sequencing-data/2018/FASTQ -l A11935X_sW0148 -w A01 wajfs
$ python src/python/jobs/PlateAssemblyJob.py -s s3 -d addgene-sequencing-data/2018/FASTQ -l A11935X_sW0148 pajfs
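Importing from S3 requires AWS credentials with read access to the bucket. Assuming the AWS CLI is installed and configured, you can confirm access before running a job:

$ aws s3 ls s3://addgene-sequencing-data/2018/FASTQ/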
The following assumes you have completed the instructions for preparing your AWS environment.
Launch the cluster leader:

$ toil launch-cluster --zone us-east-1a --keyPairName id_rsa --leaderNodeType t2.medium assembly-cluster
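The --keyPairName value must name an EC2 key pair registered in the target region; assuming the AWS CLI is installed and configured, you can verify it exists:

$ aws ec2 describe-key-pairs --key-names id_rsa --region us-east-1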
Create archives of the Python source and miscellaneous data, copy them to the cluster leader, then log in and extract them:

$ src/sh/make-archives-for-leader.sh
$ toil rsync-cluster --zone us-east-1a assembly-cluster python.tar.gz :/root
$ toil rsync-cluster --zone us-east-1a assembly-cluster miscellaneous.tar.gz :/root
$ toil ssh-cluster --zone us-east-1a assembly-cluster
# cd
# tar -zxvf python.tar.gz
# tar -zxvf miscellaneous.tar.gz
Log in to the cluster leader, then run the default plate assembly job on the leader only, with a local or an S3 file store:
$ toil ssh-cluster --zone us-east-1a assembly-cluster
# cd
# python PlateAssemblyJob.py --data-path miscellaneous --plate-spec A11967B_sW0154 pajfs
# python PlateAssemblyJob.py --data-path miscellaneous --plate-spec A11967B_sW0154 aws:us-east-1:pajfs
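If a run fails and leaves the AWS job store behind, a retry with the same locator will error until the job store is removed, which Toil's clean command handles:

# toil clean aws:us-east-1:pajfs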
Log in to the cluster leader, then run a well or sample plate assembly job on the leader only, with data imported from S3:
$ toil ssh-cluster --zone us-east-1a assembly-cluster
# cd
# python WellAssemblyJob.py -s s3 -d addgene-sequencing-data/2018/FASTQ -l A11935X_sW0148 -w A01 wajfs
# python PlateAssemblyJob.py -s s3 -d addgene-sequencing-data/2018/FASTQ -l A11935X_sW0148 pajfs
Log in to the cluster leader, then run the default or a larger plate assembly job using auto-scaling with an S3 file store:
$ toil ssh-cluster --zone us-east-1a assembly-cluster
# cd
# python PlateAssemblyJob.py --data-path miscellaneous --plate-spec A11967B_sW0154 --provisioner aws --nodeTypes c3.large --maxNodes 2 --batchSystem mesos aws:us-east-1:pajfs
# python PlateAssemblyJob.py --data-path miscellaneous --plate-spec A11967A_sW0154 --provisioner aws --nodeTypes c3.large --maxNodes 2 --batchSystem mesos aws:us-east-1:pajfs
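While an auto-scaling run is in progress, you can monitor it from a second session on the leader with Toil's status command:

$ toil ssh-cluster --zone us-east-1a assembly-cluster
# toil status aws:us-east-1:pajfs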
When finished, destroy the cluster:

$ toil destroy-cluster --zone us-east-1a assembly-cluster