
Frequently Asked Questions

Florian Zwagemaker edited this page Jun 16, 2020 · 7 revisions

I get an error saying the directory is locked, what should I do?

An earlier analysis probably crashed or was cancelled by the user while the pipeline was still running, which leaves the directory locked. You can unlock the directory by running bash jovian --unlock.


Why are there multiple lines per taxid in the host/disease information table?

The Virus-Host interaction database sometimes contains multiple entries for a single taxid, meaning that there are multiple known hosts. We follow this format and print each host on its own line.
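
If you prefer one line per taxid for downstream work, the rows can be collapsed again. The sketch below only illustrates the idea: the rows, the taxids and the column order (taxid, virus, host) are made up for the example and are not the actual layout of the Jovian table.

```python
from collections import defaultdict

# Made-up rows in the "one host per line" layout (taxid, virus, host).
# These values are placeholders, not real database entries.
rows = [
    (1000001, "toy virus A", "host X"),
    (1000001, "toy virus A", "host Y"),
    (1000002, "toy virus B", "host X"),
]


def hosts_per_taxid(rows):
    """Collapse repeated taxid rows into one list of hosts per taxid."""
    grouped = defaultdict(list)
    for taxid, _virus, host in rows:
        if host not in grouped[taxid]:  # skip exact duplicates
            grouped[taxid].append(host)
    return dict(grouped)
```

Calling hosts_per_taxid(rows) on the toy rows gives two hosts for the first taxid and one host for the second.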


Why doesn't the virus typing-tool accept my query?

Please see this and this issue. The short answer: the typing tools were made for Sanger sequences and are not yet able to handle NGS datasets. This is a work in progress.


I am missing a certain taxon that I'm sure is in the dataset. How is that possible?

There are several possible reasons. The first is the stringency of the analysis: the current default values are quite strict, so the taxon may have been filtered out. Please try more relaxed settings, #TODO. The second is the LCA (Lowest Common Ancestor) analysis placing a scaffold at an unexpected taxonomic level. Imagine a sequence that is homologous between (pro)phages and bacteria: the lowest common ancestor of phages and bacteria is the theoretical root of all life (i.e. the root taxonomic level), so you will find the scaffold at that level. You can try changing bitscoreDeltaLCA: 5 to 0 in the config-file, or you can try the other LCA option via #TODO. The third is an erroneous entry in one of the public databases used, to which a scaffold then gets assigned. If none of these reasons apply, please let us know by making an issue.
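
The phage/bacteria case can be reproduced with a toy example. This is a minimal sketch, not Jovian's actual LCA code: the taxonomy and scores are invented, and it assumes that bitscoreDeltaLCA acts as a window below the best bitscore (hits within that window all take part in the LCA). Please check the config-file documentation for the exact behaviour.

```python
# Toy taxonomy as child -> parent. All names are illustrative placeholders.
PARENT = {
    "root": None,
    "Viruses": "root",
    "Bacteria": "root",
    "toy phage": "Viruses",
    "toy bacterial family": "Bacteria",
    "toy bacterium A": "toy bacterial family",
    "toy bacterium B": "toy bacterial family",
}


def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path[::-1]


def lowest_common_ancestor(taxa):
    """Return the deepest taxon shared by the lineages of all `taxa`."""
    lca = None
    for level in zip(*(lineage(t) for t in taxa)):
        if len(set(level)) != 1:  # lineages diverge at this level
            break
        lca = level[0]
    return lca


def lca_of_hits(hits, bitscore_delta):
    """hits: list of (taxon, bitscore) pairs for one scaffold.

    ASSUMPTION: hits scoring within `bitscore_delta` of the best hit
    are kept, and the scaffold is assigned to their LCA.
    """
    best = max(score for _, score in hits)
    kept = [taxon for taxon, score in hits if best - score <= bitscore_delta]
    return lowest_common_ancestor(kept)


# A scaffold with a phage hit and a slightly weaker bacterial hit.
hits = [("toy phage", 200), ("toy bacterium A", 197)]
```

Under these assumptions, lca_of_hits(hits, 5) keeps both hits and returns "root", whereas lca_of_hits(hits, 0) keeps only the best hit and returns "toy phage".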


I don't care about removing the human data because my samples are from other species. Can I automatically remove data from those species as well?

Yes. Although we focus on patient privacy, since Jovian was developed for clinical samples, you can enter any reference sequence you like. To do so, change background_ref: /path/to/file/genome.fa in the config-file to the location of the genome of your desired background organism. The only limitations are that it is a fasta file and that it is indexed with bowtie2, although the indexing will be automated in a future version.
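
Before starting a run with a custom background reference, it can help to check that the bowtie2 index is actually present. The sketch below makes one assumption that you should verify for your Jovian version: that the index prefix is the fasta path itself (e.g. genome.fa.1.bt2). The function names are hypothetical helpers. Only the bowtie2-build call and the index file suffixes come from bowtie2.

```python
from pathlib import Path

# Files written by bowtie2-build for a small index.
# Large indexes use the same names with ".bt2l" instead of ".bt2".
SMALL_SUFFIXES = [
    ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2",
]


def missing_index_files(prefix):
    """Return the index files missing for `prefix` (empty list = complete)."""
    prefix = str(prefix)
    small = [prefix + suffix for suffix in SMALL_SUFFIXES]
    large = [path + "l" for path in small]
    missing_small = [p for p in small if not Path(p).exists()]
    missing_large = [p for p in large if not Path(p).exists()]
    # The index is usable if either flavour is fully present.
    if not missing_small or not missing_large:
        return []
    return missing_small


def build_command(fasta, prefix=None):
    """Return the argv for bowtie2-build <reference_in> <bt2_base>.

    ASSUMPTION: when no prefix is given, the fasta path is used as prefix.
    """
    return ["bowtie2-build", str(fasta), str(prefix or fasta)]
```

If missing_index_files returns a non-empty list, the command from build_command can be run, for example with subprocess.run(build_command(fasta), check=True), to create the index.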


How can some scaffolds still be assigned to Homo sapiens? I thought Jovian removed human data?

The human genome is a consensus genome built from many individuals from around the globe. It does not capture all diversity in the human gene pool and therefore cannot completely remove all human data. You can improve this by selecting a reference genome that is closer to your target population, e.g. if you sequence mainly Dutch samples, the GoNL genome might be better suited.


Why does installing the pipeline take so long?

See answer below.

Why do you have to install the software for every analysis?

The answer to both these questions is the same: it is a consequence of making the analysis replicable and reproducible. Briefly, a separate virtual environment is created for each of the different analysis steps in the pipeline, and these environments take some time to build. Since they are created from hard-coded recipes, we know exactly which software was used, and users can easily revert to the same environment using the Jovian Git hash (a unique methodological fingerprint). This allows you to replicate your own (old) results and allows other groups to reproduce your results (if you share the raw data with them). It also allows us developers to easily validate analyses, track and fix bugs, and compare results between versions.

Therefore, we recommend installing Jovian in a fixed location and only specifying different input directories. Once an analysis is finished, you can archive the results using bash jovian --archive and transfer them to back-up systems for long-term storage.


In the scaffold viewer and the minority variant table, the DoC values of forward and reverse orientated reads do not add up to the value reported in the "Total_depth_of_coverage" column.

Correct, please see this link for an explanation.


About the versions of Jovian

Jovian versions are released as x.y.z

  • x is the major version number. It changes only for really significant changes to the project. Normally, a change in the major version number also means that backwards compatibility may be broken. This is something we intend to avoid: though we cannot guarantee it, we will try to make future versions as compatible as possible with earlier releases.
  • y is the minor version number. It is used for our normal release cycle of "anything that is larger than a simple patch", covering everything from larger-than-usual patches to large changes in the functionality of Jovian. Most, if not all, of our current plans will be released under this number.
    This number ranges from 0 to 99.
  • z is the build/patch number. It is used for quick bugfixes and small changes that do not have a significant impact on the actual analysis workflow(s). In some cases we might append a letter to this number for sub-versioning, for example: 1.0.01a
    This number may range from 0 to 999.
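
If you need to compare or sort version strings in your own scripts, the scheme above can be parsed as follows. This is a minimal sketch and not part of Jovian: the function name and the regular expression are our own illustration of the x.y.z rules (y from 0 to 99, z from 0 to 999, optional single trailing letter).

```python
import re

# x.y.z with an optional single trailing letter, e.g. "1.0.01a".
# y is limited to two digits (0-99), z to three digits (0-999).
VERSION_RE = re.compile(r"^(\d+)\.(\d{1,2})\.(\d{1,3})([a-z]?)$")


def parse_version(text):
    """Return (major, minor, patch, letter) for a version string."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid x.y.z version: {text!r}")
    major, minor, patch, letter = match.groups()
    return int(major), int(minor), int(patch), letter
```

Because the result is a tuple, it can be used directly as a sort key, e.g. sorted(versions, key=parse_version). A version without a letter sorts before the same version with a letter.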