Martin Asser Hansen edited this page Oct 2, 2015 · 5 revisions

Biopiece: read_fasta

Description

read_fasta reads sequence entries from FASTA files. Each entry consists of a header line, which is a '>' followed by the sequence name, and then one or more lines of sequence that continue until the next entry or the end of the file. Each entry is emitted as a biopiece record of the following form:

SEQ_NAME: test
SEQ_LEN: 10
SEQ: ATCGATCGAC
---
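
To make the mapping from a FASTA entry to a record concrete, here is a minimal Python sketch. It is not the Biopieces implementation; `parse_fasta`, `make_record` and `format_record` are hypothetical helper names, and skipping blank lines is an assumption.

```python
from typing import Dict, Iterable, Iterator, List


def make_record(name: str, chunks: List[str]) -> Dict[str, object]:
    """Join the sequence lines and build a record with the three keys shown above."""
    seq = "".join(chunks)
    return {"SEQ_NAME": name, "SEQ_LEN": len(seq), "SEQ": seq}


def parse_fasta(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one record per FASTA entry (illustrative helper, not Biopieces code)."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if name is not None:
                yield make_record(name, chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)
    # The end of input closes the last entry.
    if name is not None:
        yield make_record(name, chunks)


def format_record(record: Dict[str, object]) -> str:
    """Render a record as 'KEY: value' lines terminated by '---'."""
    lines = [f"{key}: {value}" for key, value in record.items()]
    lines.append("---")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [">test", "ATCGA", "TCGAC"]
    for record in parse_fasta(example):
        print(format_record(record))
```

Run as a script, the sketch joins the two sequence lines of the `test` entry into a single 10-base sequence and prints it in the record layout shown above.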

Input files may be compressed with gzip or bzip2.
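
How read_fasta detects compression is not described here. One common approach, sketched below in Python as an assumption, is to inspect the first bytes of the file for the gzip or bzip2 signature. `open_maybe_compressed` is a hypothetical helper name, not part of Biopieces.

```python
import bz2
import gzip


def open_maybe_compressed(path):
    """Open a text file that may be gzip- or bzip2-compressed (illustrative helper)."""
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":  # gzip signature
        return gzip.open(path, "rt")
    if magic == b"BZh":  # bzip2 signature
        return bz2.open(path, "rt")
    return open(path, "r")  # otherwise treat the file as plain text
```

Checking the file content rather than the extension means a compressed file is still read correctly if it has been renamed.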

For more about the FASTA format:

http://en.wikipedia.org/wiki/Fasta_format

Usage

read_fasta [options] -i <FASTA file(s)>

Options

[-?          | --help]               #  Print full usage description.
[-i <files!> | --data_in=<files!>]   #  Comma separated list of files or glob expression to read.
[-n <uint>   | --num=<uint>]         #  Limit number of records to read.
[-I <file!>  | --stream_in=<file!>]  #  Read input stream from file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output stream to file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

To read all FASTA entries from a file:

read_fasta -i test.fna

To read in only 10 records from a FASTA file:

read_fasta -n 10 -i test.fna

To read all FASTA entries from multiple files:

read_fasta -i test1.fna,test2.fna

To read FASTA entries from multiple files using a glob expression:

read_fasta -i '*.fna'
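
In the glob example the pattern is quoted so that the shell passes it to read_fasta unexpanded. The Python sketch below illustrates how a comma-separated list of names or glob patterns could be expanded and how a record limit such as `-n` could be applied. It is an assumption-laden illustration, not the Biopieces code: `expand_inputs` and `limit_records` are hypothetical names, and applying the limit to the total number of records across all files is an assumption.

```python
import glob
from itertools import islice


def expand_inputs(spec):
    """Expand a comma-separated list of paths or glob patterns into file names."""
    paths = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        matches = sorted(glob.glob(part))
        # Keep the literal name when nothing matches, so that a missing
        # file can be reported when it is opened.
        paths.extend(matches if matches else [part])
    return paths


def limit_records(records, num=None):
    """Return an iterator over at most num records; num=None means no limit."""
    return iter(records) if num is None else islice(records, num)
```

For example, `expand_inputs("test1.fna,test2.fna")` returns the two names in the order given, while a pattern such as `*.fna` is replaced by the sorted list of matching files.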

See also

read_align

write_fasta

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

read_fasta is part of the Biopieces framework.

http://www.biopieces.org
