assemble_seq_idba

Biopiece: assemble_seq_idba

Description

assemble_seq_idba assembles sequences in the stream using IDBA-UD and outputs the contig sequences.

An assembly directory must be specified. assemble_seq_idba leaves the original assembly files in this directory unless the -X (--clean) switch is used.

Consult the IDBA-UD documentation for more information.

IDBA-UD must be installed in order for assemble_seq_idba to work.
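
A quick way to check for the dependency before starting a long run is sketched below. It assumes that the IDBA-UD executable is named idba_ud and is located through PATH; how assemble_seq_idba itself finds the program is not stated here, and have_tool is a hypothetical helper name.

```shell
# Return success if the named program can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: the IDBA-UD executable is called idba_ud.
if have_tool idba_ud; then
    echo "idba_ud found"
else
    echo "idba_ud not found on PATH" >&2
fi
```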

Usage

... | assemble_seq_idba [options] -d <dir>

Options

[-?          | --help]                #  Print full usage description.
[-d <dir>    | --dir=<dir>]           #  Assembly directory.
[-k <uint>   | --kmer_min=<uint>]     #  Minimum k-mer value                                         -  Default=20
[-K <uint>   | --kmer_max=<uint>]     #  Maximum k-mer value                                         -  Default=100
[-c <uint>   | --count_min=<uint>]    #  Filtering threshold for each k-mer                          -  Default=2
[-p <uint>   | --pairs_min=<uint>]    #  Minimum number of pair-end connections to join two contigs  -  Default=3
[-P <uint>   | --prefix_len=<uint>]   #  Length of the prefix of k-mer used to split k-mer table     -  Default=3
[-C <uint>   | --cpus=<uint>]         #  Number of CPUs                                              -  Default=0 (all)
[-X          | --clean]               #  Remove directory upon completed assembly.
[-I <file!>  | --stream_in=<file!>]   #  Read input from stream file                                 -  Default=STDIN
[-O <file>   | --stream_out=<file>]   #  Write output to stream file                                 -  Default=STDOUT
[-v          | --verbose]             #  Verbose output.

Examples

The example below illustrates a de novo assembly of a Lactococcus lactis strain. The sequences are read with read_fastq and trimmed with trim_seq before being piped to assemble_seq_idba. Following the assembly, the contigs are written to a file in FASTA format with write_fasta, and finally the contig sequences are analyzed with analyze_assembly:

read_fastq -i Lactococcus_NCDO0505.fq |
trim_seq |
assemble_seq_idba -d IDBA -v |
write_fasta -o Lactococcus_NCDO0505.contigs |
analyze_assembly -x

N50: 5296
MAX: 35366
MIN: 50
MEAN: 533
TOTAL: 2833428
COUNT: 5308
---

Note that verbose output from assemble_seq_idba is enabled with the -v switch.
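
To make the N50 figure in the analyze_assembly output easier to interpret, here is a minimal sketch that computes N50 from a list of contig lengths. It uses the common definition: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. Whether analyze_assembly uses exactly this definition is an assumption, and the function name n50 and the sample lengths are made up for illustration.

```shell
# Compute N50 from contig lengths, one per line on stdin.
n50() {
    # Sort lengths from longest to shortest, then accumulate
    # until at least half of the total length is covered.
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                sum += len[i]
                if (sum * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Total length is 54; the contigs 10, 9 and 8 cover 27, so this prints 8.
printf '%s\n' 2 3 4 5 6 7 8 9 10 | n50
```

A higher N50 at a similar TOTAL generally indicates a less fragmented assembly.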

Author

[email protected]

August 2012

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

assemble_seq_idba is part of the Biopieces framework.

http://www.biopieces.org
