extract_seq
extract_seq extracts a subsequence from the sequence of every record in the stream and replaces the original sequence with that subsequence. Any ASCII-encoded quality SCORE string (Solexa style) found in a sequence record is trimmed in the same way.
... | extract_seq [options]
[-? | --help] # Print full usage description.
[-b <uint> | --beg=<uint>] # Begin position of subsequence (first residue=1)
[-e <uint> | --end=<uint>] # End position of subsequence
[-l <uint> | --len=<uint>] # Length of subsequence
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTA entry in the file test.fna:
>test
ACGACGCATNNNNNNactgatcga
To obtain the subsequence from position 5 (the first residue is 1) to position 10, we first read in the sequence using read_fasta and then pipe the stream to extract_seq:
read_fasta -i test.fna | extract_seq -b 5 -e 10
SEQ: CGCATN
SEQ_LEN: 6
SEQ_NAME: test
---
Note the positions (the first position is 1) and the returned sequence:
1        10        20
|        |         |
123456789012345678901234
ACGACGCATNNNNNNactgatcga
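The coordinates above are 1-based and include both the begin and the end position. As a quick cross-check outside Biopieces, the same convention is used by the standard `cut -c` utility, so the example can be reproduced on the bare sequence string. This is only an illustration of the coordinate system; it is not how extract_seq is implemented, and it does not handle records or quality scores.

```shell
# Not a Biopiece: plain POSIX cut, used only to illustrate the coordinates.
s=ACGACGCATNNNNNNactgatcga

# Characters 5 through 10, counting from 1 and including both ends.
sub=$(printf '%s\n' "$s" | cut -c5-10)

echo "$sub"      # CGCATN
echo "${#sub}"   # 6 (end - begin + 1)
```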
We could also have specified a length with -l instead of an end position with -e:
read_fasta -i test.fna | extract_seq -b 5 -l 5
SEQ: CGCAT
SEQ_LEN: 5
SEQ_NAME: test
---
Now, if we only specify the begin position, what happens?
read_fasta -i test.fna | extract_seq -b 5
SEQ: CGCATNNNNNNactgatcga
SEQ_LEN: 20
SEQ_NAME: test
---
Or if we only specify the end position?
read_fasta -i test.fna | extract_seq -e 10
SEQ: ACGACGCATN
SEQ_LEN: 10
SEQ_NAME: test
---
Or what about if we only specify the length?
read_fasta -i test.fna | extract_seq -l 5
SEQ: ACGAC
SEQ_LEN: 5
SEQ_NAME: test
---
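The examples so far suggest a simple set of defaults: a missing begin position means 1, a length means "end = begin + length - 1", and with neither end nor length the subsequence runs to the end of the sequence. The sketch below models that behaviour on a plain string. The helper name `subseq` and its argument order are hypothetical, and letting the length take precedence when both an end and a length are given is an assumption of the sketch; this page does not say what extract_seq does in that case.

```shell
# subseq SEQ BEG END LEN
# Hypothetical helper, not part of Biopieces. Pass "" for an option that is not set.
subseq() {
    str=$1
    beg=${2:-1}          # no begin position: start at residue 1
    end=$3
    len=$4
    if [ -n "$len" ]; then
        end=$((beg + len - 1))   # assumption: a length overrides any end position
    elif [ -z "$end" ]; then
        end=${#str}              # neither end nor length: run to the last residue
    fi
    printf '%s\n' "$str" | cut -c"$beg"-"$end"
}

s=ACGACGCATNNNNNNactgatcga
subseq "$s" 5  10 ""   # CGCATN                (like -b 5 -e 10)
subseq "$s" 5  "" 5    # CGCAT                 (like -b 5 -l 5)
subseq "$s" 5  "" ""   # CGCATNNNNNNactgatcga  (like -b 5)
subseq "$s" "" 10 ""   # ACGACGCATN            (like -e 10)
subseq "$s" "" "" 5    # ACGAC                 (like -l 5)
```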
That is quite practical if we want the first five residues of all the sequences, but what if we want the last five residues? Easy! We use reverse_seq to reverse the sequences, then we extract the first five residues (which are in fact the last five residues of the original), and then we reverse the sequences again with reverse_seq:
read_fasta -i test.fna | reverse_seq | extract_seq -l 5 | reverse_seq
SEQ: atcga
SEQ_LEN: 5
SEQ_NAME: test
---
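The reverse, trim, reverse idea is not specific to Biopieces. A minimal sketch of the same trick on a plain string, assuming the `rev` utility is available (it ships with util-linux and the BSDs but is not part of POSIX):

```shell
# Not a Biopiece: the same reverse / take-first-five / reverse trick on a bare string.
s=ACGACGCATNNNNNNactgatcga

last5=$(printf '%s\n' "$s" | rev | cut -c1-5 | rev)

echo "$last5"   # atcga
```

Reversing twice restores the original residue order, so only the position of the cut changes.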
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
extract_seq is part of the Biopieces framework.