Installation Instructions
Jovian is developed on a Red Hat Enterprise Linux (RHEL) grid, and new releases are also tested in CentOS and Ubuntu Docker containers. We expect Jovian to work on other Linux distros, but we cannot guarantee stability. It also works on Windows Subsystem for Linux, but this functionality is currently not guaranteed either.
- Download `jovian` using `git clone https://github.com/DennisSchmitz/Jovian.git name_of_target_folder`. Navigate to this newly made folder: `cd name_of_target_folder`.
`Jovian` requires the NCBI BLAST NT, TaxDB, New_Taxdump, Krona_DB and Virus-Host interaction DB. They can be installed with the command `bash jovian --install-databases` or `bash jovian -id`. You will be asked the following questions:
- If you want to install all databases in the same base location, type `single`. If you want to manually specify the locations of these databases (e.g. you've already downloaded some or all of them), type `individual`.
- If `single` was chosen, you will be asked for the base path of these databases, e.g. `/mnt/database/`. All databases will then be downloaded below the `/mnt/database/` directory.
- If `individual` was chosen, you will be asked for the path of each of the required databases. You can then enter locations of previously downloaded local databases or specify paths for the newly installed databases.
Jovian requires a human genome reference to remove patient (privacy-sensitive) data. As explained here, you can also choose another genome for filtering; by default, however, `Jovian` is intended for human clinical samples.
- Download the latest Human Genome version from https://support.illumina.com/sequencing/sequencing_software/igenome.html
  - Select the NCBI version of `GRCh38`. NB: do NOT download the `GRCh38Decoy` version! This version will filter out certain human viruses.
- The `GRCh38` version of the human genome still contains an Epstein Barr virus (EBV) contig; this needs to be removed as shown below:
  - Navigate to `NCBI/GRCh38/Sequence/Bowtie2Index/` in the newly downloaded Human Genome.
  - Remove the EBV contig via `awk '{print >out}; />chrEBV/{out="EBV.fa"}' out=temp.fa genome.fa; head -n -1 temp.fa > nonEBV.fa` (source).
  - Remove `EBV.fa` and replace `genome.fa` with `nonEBV.fa` via `rm EBV.fa; mv nonEBV.fa genome.fa`.
  - Activate the `Jovian_helper` environment via `source activate Jovian_helper` and index the updated `genome.fa` file via `bowtie2-build --threads 10 genome.fa genome.fa`.
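The contig-removal step above can also be sketched in a form that does not depend on where the contig sits in the file. This is a minimal illustration, not the command used by the Jovian authors: the function name `remove_contig` and the toy FASTA are hypothetical, and it assumes each header line starts with `>` followed directly by the contig name (here `chrEBV`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a FASTA file without one named contig.
# Usage: remove_contig <contig_name> <input.fa>
remove_contig() {
    local contig="$1" fasta="$2"
    # On each header line, decide whether the following sequence lines are kept.
    # Only the first whitespace-separated field of the header is compared.
    awk -v target=">${contig}" '
        /^>/ { keep = ($1 != target) }
        keep
    ' "${fasta}"
}

# Toy example (made-up sequences) to show the effect.
tmpdir="$(mktemp -d)"
printf '>chr1\nACGT\n>chrEBV\nTTTT\nGGGG\n>chrM\nCCCC\n' > "${tmpdir}/toy.fa"
remove_contig chrEBV "${tmpdir}/toy.fa" > "${tmpdir}/toy_nonEBV.fa"
cat "${tmpdir}/toy_nonEBV.fa"
```

If a filter like this were applied to the real `genome.fa`, the result would still need to be re-indexed with `bowtie2-build` as described in the steps above.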
For more information about (periodic automated) database updates see here.
- Install all `Jovian` dependencies via the command `bash jovian --install-dependencies` or `bash jovian -ic`. You will get the questions listed below. Update the paths as you specified above.
You will only have to answer these questions once. The answers will be stored in JSON format in `~/.jovian_installchoice_db` and `~/.jovian_installchoice_compmode`. Please note that the prompts allow tab-completion of paths.
- `Please specify the location where your Background Reference is installed:`
  This is the path you chose above, e.g. `/mnt/db/Reference_genomes/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index_without_EBV_virus_chr/genome.fa`
- `Please specify the location where the Krona Taxonomy database is installed:`
  This is the path you chose above, e.g. `/mnt/db/taxonomy_krona/`
- `Please specify the location where the MGKit Taxonomy database is installed:`
  This is the path you chose above, e.g. `/mnt/db/taxdb/`
- `Please specify the location of the 'virushostdb.tsv' file:`
  This is the path you chose above, e.g. `/mnt/db/Virus-Host_interaction_DB/virushostdb.tsv`
- `Please specify the location of the new_taxdump 'rankedlineage.dmp.delim' file:`
  This is the path you chose above, e.g. `/mnt/db/new_taxdump/rankedlineage.dmp.delim`
  N.B. it's the `.dmp.delim` file, NOT the `.dmp` file.
- `Please specify the location of the new_taxdump 'host.dmp.delim' file:`
  This is the path you chose above, e.g. `/mnt/db/new_taxdump/host.dmp.delim`
  N.B. it's the `.dmp.delim` file, NOT the `.dmp` file.
- You'll get an overview of your choices. Type `yes` if the paths are correct, or `no` if you see a mistake.
The next two questions are about the computing mode. If you are working on your own computer/laptop, you are working `standalone`. If you are working on a high-performance compute cluster (`HPC`) or grid computer (`Grid`), please ask the administrator of this HPC/Grid to provide you with the required queue name.
- `Jovian can run in two computing-modes, 'standalone' and 'HPC/Grid'. Please specify the computing-mode that you wish to use for Jovian. Do you wish to run Jovian in 'standalone' or 'grid' mode? [standalone/grid]`
  Example answer: `grid`
- If you selected `grid` above, you'll be asked for the queue name: `Please specify the name of the Queue that your local grid/HPC cluster uses. Please enter exclusively the name of the queue.`
  Example answer: `bio`
- All `Jovian` dependencies will now be automatically installed. Depending on your technical setup (i.e. file system, network connection speed and I/O speed), installation can take anywhere from 5 minutes (`lustre` filesystem on enterprise SSDs) up to 2 hours (`nfs` filesystem on HDDs).
`Jovian` uses `Jupyter notebook` to generate the interactive reports. You can automatically configure it via the `bash jovian --configure-jupyter` command. If it asks about overwriting default configurations, reply `y`.
In order for the Jupyter report to function, you must set up a `Jupyter notebook` server process. Open a separate terminal and start it via `bash jovian --start-jupyter`. Copy and paste the reported web address into your browser; you can then access the Jovian portal and, once an analysis is finished, the Jovian report.
N.B. If you are working on a remote or grid computer, you need to ask the system admins to configure the network so as to allow access to the `Jupyter notebook`.
Once the database installation is finished, you will be provided with a script called `database-updater.sh`, which you can find in your home directory (`~/`). This script will update all databases that were installed during the earlier database installation process. We strongly advise you to set up a cronjob that runs this script on a weekly basis in order to keep all databases up to date. Ask your system admin to do this for you, since this requires `sudo` privileges.
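A weekly schedule for the updater could look roughly like the sketch below. The timing (Sundays at 03:00), the helper name `weekly_cron_line`, and the choice to run the script with `bash` from the home directory are illustrative assumptions, not part of Jovian; your system admin may prefer a different mechanism.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a crontab entry that runs the updater weekly.
# Crontab fields: minute hour day-of-month month day-of-week command
weekly_cron_line() {
    local script_path="$1"
    # "0 3 * * 0" = every Sunday at 03:00 (an illustrative choice).
    printf '0 3 * * 0 bash %s\n' "${script_path}"
}

# Print the entry; whoever manages the schedule can then add it via `crontab -e`.
weekly_cron_line "${HOME}/database-updater.sh"
```

Printing the line instead of installing it keeps the sketch free of side effects, so the entry can be reviewed before it is added to any crontab.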
While we recommend using the provided script for updating the databases as described above, it is possible to update the various databases manually.
It is important to use the `Jovian_helper` environment while updating the databases. You can activate this environment with `source activate Jovian_helper` or `conda activate Jovian_helper`.
- Updating the BLAST databases (NT, NR and Taxdb)
  - NT: `cd [NT_database_location]; perl ${CONDA_PREFIX}/bin/update_blastdb.pl --decompress nt`
  - Taxdb: `cd [Taxdb_database_location]; perl ${CONDA_PREFIX}/bin/update_blastdb.pl --decompress taxdb`
- Updating the NCBI new_taxdump database
  - Run the commands in the following snippet:

        cd [New_taxdump_database_location]
        curl -o new_taxdump.tar.gz -L https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz
        curl -o new_taxdump.tar.gz.md5 -L https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz.md5
        tar -xzf new_taxdump.tar.gz
        for file in *.dmp; do awk '{gsub("\t",""); if(substr($0,length($0),length($0))=="|") print substr($0,0,length($0)-1); else print $0}' < ${file} > ${file}.delim; done

- Updating the KRONA taxonomy and accessions databases
  - Taxonomy: `cd [Krona_database_location]; bash "${CONDA_PREFIX}"/opt/krona/updateTaxonomy.sh ./`
  - Accessions: `cd [Krona_database_location]; bash "${CONDA_PREFIX}"/opt/krona/updateAccessions.sh ./`
- Updating the Virus-Host_interaction database
  - `cd [Virus-Host_database_location]; curl -o virushostdb.tsv -L ftp://ftp.genome.jp/pub/db/virushostdb/virushostdb.tsv`
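The `new_taxdump` commands above download an `.md5` file alongside the archive. One way to use it, sketched below, is to verify the archive before unpacking. This assumes the `.md5` file is in the format written by `md5sum` (checksum followed by the file name) and that `md5sum` from GNU coreutils is available; the function name `verify_md5` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check an archive against its .md5 file.
# Returns 0 if the checksum matches, non-zero otherwise.
verify_md5() {
    local archive="$1"
    local dir
    dir="$(dirname "${archive}")"
    # md5sum -c resolves the file name listed in the .md5 file relative to the
    # current directory, so run it from the archive's directory in a subshell.
    ( cd "${dir}" && md5sum --status -c "$(basename "${archive}").md5" )
}

# Toy demonstration with a made-up file instead of the real download.
demo_dir="$(mktemp -d)"
printf 'example content\n' > "${demo_dir}/new_taxdump.tar.gz"
( cd "${demo_dir}" && md5sum new_taxdump.tar.gz > new_taxdump.tar.gz.md5 )

if verify_md5 "${demo_dir}/new_taxdump.tar.gz"; then
    echo "checksum OK"
else
    echo "checksum mismatch" >&2
fi
```

In the real workflow this could be used as `verify_md5 new_taxdump.tar.gz && tar -xzf new_taxdump.tar.gz`, so that a corrupted download is not unpacked.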
Jovian is available on GitHub under an AGPL license. The virus-typing tools are public services hosted by the RIVM and developed independently of Jovian.