
Fails to assemble de novo genome #2

@Dv1t

Hello VStrains team, thank you for developing such a great tool. While using it, however, I ran into the following problem.
I attempted to assemble the complete HIV genome from sample SRR29407826. Assembly with coronaSPAdes (spades.py --corona) worked fine.
However, VStrains crashes when assembling it.
First, there was an error related to rev_dict in VStrains_PE_Inference.py: the dictionary did not contain lowercase nucleotides and therefore raised a KeyError. I fixed it by replacing:
rev_dict = {"A": "T", "T": "A", "C": "G", "G": "C"}
with this:
rev_dict = { "A": "T", "T": "A", "C": "G", "G": "C", "a": "t", "t": "a", "c": "g", "g": "c" }
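
Extending the dictionary covers A/C/G/T in either case. A possibly more general alternative is to upper-case the sequence before complementing it. The helper below is a hypothetical sketch, not code from VStrains_PE_Inference.py; it also maps N to N, while any other IUPAC code passes through unchanged instead of raising, which could hide unexpected input.

```python
# Hypothetical helper, NOT taken from VStrains_PE_Inference.py.
# Upper-cases the sequence first so lowercase bases never cause a KeyError.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of seq in uppercase.

    Characters outside ACGTN (e.g. other IUPAC codes) are left unchanged
    rather than raising, so unexpected input is not reported.
    """
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ACGTacgtN"))  # NACGTACGT
```
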
But then a new issue occurred. After these messages in the CLI log:

----------------------Paired-End Information Alignment----------------------
Start aligning reads to gfa nodes
Number of processed reads: 0

VStrains freezes indefinitely and does not proceed any further.

Details worth mentioning
In the same log there is a suspicious message:

INFO - graph kmer size: 0
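
I have not confirmed how VStrains computes this value. Assuming it derives the k-mer size from the overlap field of the GFA link (L) lines, links that report 0M or * would explain a size of 0. The hypothetical script below (not part of VStrains) tallies that overlap field across all L lines so the SPAdes graph can be checked.

```python
import sys
from collections import Counter


def link_overlaps(lines):
    """Tally the overlap (CIGAR) field of GFA L records.

    GFA 1.x link line: L <from> <from_orient> <to> <to_orient> <overlap> [tags]
    """
    overlaps = Counter()
    for line in lines:
        if line.startswith("L\t"):
            fields = line.rstrip("\n").split("\t")
            overlaps[fields[5] if len(fields) > 5 else "<missing>"] += 1
    return overlaps


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python link_overlaps.py assembly_graph_after_simplification.gfa
    with open(sys.argv[1]) as fh:
        for overlap, count in link_overlaps(fh).most_common():
            print(f"{overlap}\t{count}")
```

If most links report 0M or *, that would be consistent with the "graph kmer size: 0" message; if they all report the same non-zero overlap, the cause is probably elsewhere.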

Also, VStrains cannot read the assembly_graph_after_simplification.gfa file (output by SPAdes) unless I manually change the version in its header from 1.2 to 1.0.
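
The manual header edit can be scripted. The sketch below is hypothetical (not a VStrains or SPAdes feature): it copies the GFA and sets the VN:Z tag on the H line to 1.0. It does not convert record types that GFA 1.0 does not define (for example the J jump lines introduced in GFA 1.2), so it only reproduces the same workaround.

```python
import sys


def downgrade_gfa_header(lines, target="1.0"):
    """Yield GFA lines with the header's VN:Z version tag set to target.

    Only the H line is modified; all other records are passed through as-is.
    """
    for line in lines:
        if line.startswith("H\t"):
            fields = line.rstrip("\n").split("\t")
            fields = [f"VN:Z:{target}" if f.startswith("VN:Z:") else f for f in fields]
            line = "\t".join(fields) + "\n"
        yield line


def main(src, dst):
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(downgrade_gfa_header(fin))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python downgrade_gfa.py in.gfa out_v1.0.gfa
    main(sys.argv[1], sys.argv[2])
```
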

Steps to reproduce

  1. Assemble with SPAdes:
spades.py --corona -1 SRR29407826_1.fastq -2 SRR29407826_2.fastq -o spades_G_SRR29407826
  2. Run VStrains:
vstrains -a spades -g spades_G_SRR29407826/assembly_graph_after_simplification.gfa \
-p spades_G_SRR29407826/contigs.paths \
-o vstrains_G -fwd SRR29407826_1.fastq -rve SRR29407826_2.fastq
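
To make the freeze easier to reproduce without a terminal hanging forever, the two steps can be wrapped in a script with a timeout. This is a hypothetical sketch: the command-line arguments are copied from the steps above, while the timeout values are arbitrary assumptions, and the manual GFA header edit between the two steps is not included.

```python
import subprocess
import sys

# Values copied from the reproduction steps above.
READS_1 = "SRR29407826_1.fastq"
READS_2 = "SRR29407826_2.fastq"
SPADES_OUT = "spades_G_SRR29407826"
VSTRAINS_OUT = "vstrains_G"


def spades_cmd():
    return ["spades.py", "--corona", "-1", READS_1, "-2", READS_2, "-o", SPADES_OUT]


def vstrains_cmd():
    return [
        "vstrains", "-a", "spades",
        "-g", f"{SPADES_OUT}/assembly_graph_after_simplification.gfa",
        "-p", f"{SPADES_OUT}/contigs.paths",
        "-o", VSTRAINS_OUT,
        "-fwd", READS_1, "-rve", READS_2,
    ]


def run(cmd, timeout_s):
    """Run cmd; kill it and return False if it exceeds timeout_s seconds."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        subprocess.run(cmd, check=True, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        print(f"timed out after {timeout_s} s: {' '.join(cmd)}", file=sys.stderr)
        return False


if __name__ == "__main__" and "--run" in sys.argv:
    # Assumed limits: 4 h for SPAdes, 1 h for VStrains.
    if run(spades_cmd(), timeout_s=4 * 3600):
        run(vstrains_cmd(), timeout_s=3600)
```
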

Files with reads:
SRR29407826.zip
VStrains log:
vstrains.log
Spades log:
spades.log
