Installation Instructions

Florian Zwagemaker edited this page Jun 2, 2020 · 7 revisions

Compatibility

Jovian is developed on a Red Hat Enterprise Linux (RHEL) grid, and new releases are also tested in CentOS and Ubuntu Docker containers. We expect Jovian to work on other Linux distros, but we cannot guarantee stability. It also works on Windows Subsystem for Linux, but this is currently not guaranteed either.


Jovian installation

  • Download Jovian using git clone https://github.com/DennisSchmitz/Jovian.git name_of_target_folder, then navigate to the newly made folder with cd name_of_target_folder.

Database installation

Jovian requires the NCBI BLAST NT, TaxDB, New_Taxdump, Krona_DB and Virus-Host interaction DB databases. They can be installed with the command bash jovian --install-databases or bash jovian -id. You will be asked the following questions:

  1. To install all databases in the same base location, type single. To specify the location of each database manually (e.g. because you have already downloaded some of them), type individual.
  • If you chose single, you will be asked for the base path of the databases, e.g. /mnt/database/. All databases will then be downloaded below the /mnt/database/ directory.
  • If you chose individual, you will be asked for the path of each required database. You can enter the locations of previously downloaded local databases or specify the paths where the databases should be newly installed.

Download Human genome

Jovian requires a human genome reference to remove patient (privacy-sensitive) data. As explained here, you can also choose another genome for filtering, but by default Jovian is intended for human clinical samples.

  • Download the latest Human Genome version from https://support.illumina.com/sequencing/sequencing_software/igenome.html
    • Select the NCBI version of GRCh38. NB do NOT download the GRCh38Decoy version! This version will filter out certain human viruses.
  • The GRCh38 version of the human genome still contains an Epstein-Barr virus (EBV) contig, which needs to be removed as shown below:
    • Navigate to NCBI/GRCh38/Sequence/Bowtie2Index/ in the newly downloaded Human Genome.
    • Remove the EBV contig via awk '{print >out}; />chrEBV/{out="EBV.fa"}' out=temp.fa genome.fa; head -n -1 temp.fa > nonEBV.fa (source).
    • Remove EBV.fa and replace genome.fa with nonEBV.fa via rm EBV.fa; mv nonEBV.fa genome.fa
    • Activate the Jovian_helper environment via source activate Jovian_helper and index the updated genome.fa file via bowtie2-build --threads 10 genome.fa genome.fa.
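
The EBV removal steps above can be collected into one function. This is a minimal sketch and not part of Jovian: the function name remove_ebv_contig is hypothetical, and the sanity checks and the cleanup of temp.fa are additions. The awk/head approach only works when chrEBV is the last record in genome.fa, which the original command appears to rely on, and head -n -1 requires GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): strip the chrEBV record from ./genome.fa.
# Run it inside NCBI/GRCh38/Sequence/Bowtie2Index/.
# Assumption: chrEBV is the LAST record in genome.fa.
remove_ebv_contig() {
    # If there is no chrEBV header, stop here; otherwise the head command
    # below would cut off a real sequence line.
    if ! grep -q '^>chrEBV' genome.fa; then
        echo "no chrEBV record found in genome.fa; nothing to remove" >&2
        return 0
    fi
    # Lines up to and including the ">chrEBV" header go to temp.fa,
    # everything after the header goes to EBV.fa.
    awk '{print >out}; />chrEBV/{out="EBV.fa"}' out=temp.fa genome.fa || return 1
    # Drop the trailing ">chrEBV" header line (GNU head syntax).
    head -n -1 temp.fa > nonEBV.fa || return 1
    # Sanity check before overwriting the original file.
    if grep -q '^>chrEBV' nonEBV.fa; then
        echo "chrEBV header still present; genome.fa left untouched" >&2
        return 1
    fi
    rm -f EBV.fa temp.fa
    mv nonEBV.fa genome.fa
}
```

Once the function has succeeded, the index is rebuilt exactly as in the last step above: activate Jovian_helper and run bowtie2-build --threads 10 genome.fa genome.fa.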

Updating databases

For more information about (periodic automated) database updates see here.

Jovian configuration and dependency installation

  • Install all Jovian dependencies via the command bash jovian --install-dependencies or bash jovian -ic. You will be asked the questions listed below; enter the paths you specified above.

You will only have to answer these questions once. The answers will be stored in JSON format in ~/.jovian_installchoice_db and ~/.jovian_installchoice_compmode. Please note that the prompts allow tab-completion of paths.

  1. Please specify the location where your Background Reference is installed:
    This is the path you've chosen here, e.g.: /mnt/db/Reference_genomes/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index_without_EBV_virus_chr/genome.fa
  2. Please specify the location where the Krona Taxonomy database is installed:
    This is the path you've chosen here, e.g.: /mnt/db/taxonomy_krona/
  3. Please specify the location where the MGKit Taxonomy database is installed:
    This is the path you've chosen here, e.g.: /mnt/db/taxdb/
  4. Please specify the location of the 'virushostdb.tsv' file:
    This is the path you've chosen here, e.g.: /mnt/db/Virus-Host_interaction_DB/virushostdb.tsv
  5. Please specify the location of the new_taxdump 'rankedlineage.dmp.delim' file:
    This is the path you've chosen here, e.g.: /mnt/db/new_taxdump/rankedlineage.dmp.delim N.B. it's the .dmp.delim file, NOT the .dmp file.
  6. Please specify the location of the new_taxdump 'host.dmp.delim' file:
    This is the path you've chosen here, e.g.: /mnt/db/new_taxdump/host.dmp.delim N.B. it's the .dmp.delim file, NOT the .dmp file.
  7. You'll get an overview of your choices, type yes if paths are correct, no if you see a mistake.
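
Before answering the prompts, it can help to confirm that each path actually exists. The following is a minimal sketch: check_paths_exist is a hypothetical helper and not part of Jovian, and the paths in the usage comment are the example values from the list above, to be replaced by your own.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Jovian): report which of the given
# paths exist. Returns 0 if all exist, 1 if at least one is missing.
check_paths_exist() {
    local missing=0 p
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ok       $p"
        else
            echo "MISSING  $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage with the example paths from this page:
# check_paths_exist \
#     /mnt/db/taxonomy_krona/ \
#     /mnt/db/taxdb/ \
#     /mnt/db/Virus-Host_interaction_DB/virushostdb.tsv \
#     /mnt/db/new_taxdump/rankedlineage.dmp.delim \
#     /mnt/db/new_taxdump/host.dmp.delim
```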

The next two questions are about the computing-mode. If you are working on your own computer or laptop, you are working standalone. If you are working on a high-performance compute (HPC) cluster or grid computer (Grid), please ask its administrator for the required queue name.

  1. Jovian can run in two computing-modes, 'standalone' and 'HPC/Grid'. Please specify the computing-mode that you wish to use for Jovian. Do you wish to run Jovian in 'standalone' or 'grid' mode? [standalone/grid]
    Example answer: grid
  2. If you selected grid above, you'll be asked for the queue name: Please specify the name of the Queue that your local grid/HPC cluster uses. Please enter exclusively the name of the queue.
    Example answer: bio
  • All Jovian dependencies will now be installed automatically. Depending on your technical setup (i.e. file system, network connection speed and I/O speed), installation can take anywhere from 5 minutes (Lustre filesystem on enterprise SSDs) up to 2 hours (NFS filesystem on HDDs).

Configure Jupyter Notebook

Jovian uses Jupyter notebook to generate the interactive reports. You can automatically configure it via the bash jovian --configure-jupyter command. If it asks about overwriting default configurations, reply y.

Start a Jupyter Notebook server process

In order for the Jupyter report to function, you must set up a Jupyter notebook server process. Open a separate terminal and start it via bash jovian --start-jupyter. Copy and paste the reported web address into your browser to access the Jovian portal and, once an analysis is finished, the Jovian report.

N.B. If you are working on a remote or grid computer, you need to ask the system administrators to configure the network to allow access to the Jupyter notebook.


Updating databases

Once the database installation is finished, you will be provided with a script called database-updater.sh in your home directory (~/). This script updates all databases that were installed during the earlier database installation process. We strongly advise you to set up a cronjob that runs this script weekly to keep all databases up to date. Ask your system administrator to do this for you, since it requires sudo privileges.
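
A weekly schedule could look like the sketch below. The helper name, the Sunday 03:00 schedule and the log file location are illustrative assumptions; only the script name database-updater.sh and its location in the home directory come from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a crontab line that runs the updater script
# every Sunday at 03:00 and appends its output to a log file.
# Cron fields: minute hour day-of-month month day-of-week.
weekly_updater_cron_line() {
    local script="${1:-$HOME/database-updater.sh}"
    local log="${2:-$HOME/database-updater.log}"   # assumed log location
    printf '0 3 * * 0 bash %s >> %s 2>&1\n' "$script" "$log"
}

# Example usage: append the line to the existing crontab of the account
# that should run the updates.
# ( crontab -l 2>/dev/null; weekly_updater_cron_line ) | crontab -
```

Depending on how your system is set up, the entry may have to be installed by your system administrator rather than in your own crontab.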

Manually updating the databases

While we recommend using the provided script to update the databases as described above, it is also possible to update the various databases manually.
It is important to use the Jovian_helper environment while updating the databases. You can activate this environment with source activate Jovian_helper or conda activate Jovian_helper.
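
Because the manual update commands resolve their tools through ${CONDA_PREFIX}, running them in another environment points them at the wrong location. A small guard can catch this early. It is sketched here as a hypothetical helper that relies on the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated; it assumes the environment was activated by name rather than by path.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the Jovian_helper conda environment
# is the active one (assumes activation by name, not by path).
require_jovian_helper() {
    if [ "${CONDA_DEFAULT_ENV:-}" != "Jovian_helper" ]; then
        echo "Jovian_helper is not active (current: ${CONDA_DEFAULT_ENV:-none})" >&2
        return 1
    fi
}

# Example usage: only start a manual update when the guard passes.
# require_jovian_helper && cd [NT_database_location]
```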

  • Updating the BLAST databases (NT and Taxdb)
    • NT: cd [NT_database_location]; perl ${CONDA_PREFIX}/bin/update_blastdb.pl --decompress nt
    • Taxdb: cd [Taxdb_database_location]; perl ${CONDA_PREFIX}/bin/update_blastdb.pl --decompress taxdb
  • Updating the NCBI new_taxdump database
    • run the commands in the following snippet:
    cd [New_taxdump_database_location]
    
    curl -o new_taxdump.tar.gz -L https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz
    
    curl -o new_taxdump.tar.gz.md5 -L https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz.md5
    
    tar -xzf new_taxdump.tar.gz
    
    for file in *.dmp; do awk '{gsub("\t",""); if(substr($0,length($0),length($0))=="|") print substr($0,0,length($0)-1); else print $0}' < ${file} > ${file}.delim; done
  • Updating the KRONA taxonomy and accessions databases
    • Taxonomy: cd [Krona_database_location]; bash "${CONDA_PREFIX}"/opt/krona/updateTaxonomy.sh ./
    • Accessions: cd [Krona_database_location]; bash "${CONDA_PREFIX}"/opt/krona/updateAccessions.sh ./
  • Updating the Virus-Host_interaction database
    • cd [Virus-Host_database_location]; curl -o virushostdb.tsv -L ftp://ftp.genome.jp/pub/db/virushostdb/virushostdb.tsv
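
The new_taxdump steps above download new_taxdump.tar.gz.md5 but never use it. The sketch below shows a checksum check that could run before the tar step. It assumes GNU md5sum is available and that the .md5 file uses the usual "hash, two spaces, filename" layout; verify_md5 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify <archive> against <archive>.md5.
# md5sum -c re-computes the hash of the file named inside the .md5 file,
# so the check runs from the directory that contains the archive.
verify_md5() {
    local archive="$1"
    (
        cd "$(dirname "$archive")" || exit 1
        md5sum -c "$(basename "$archive").md5"
    )
}

# Example usage inside [New_taxdump_database_location]:
# verify_md5 ./new_taxdump.tar.gz && tar -xzf new_taxdump.tar.gz
```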