Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: substitute_vals

Description

substitute_vals searches the values of keys in the stream and replaces matches using Perl regular expressions (see Examples). Flags are available for case-insensitive and global replacement.

Usage

... | substitute_vals --search=<regex> --replace=<string> [options]

Options

[-?          | --help]               #  Print full usage description.
[-s <string> | --search=<string>]    #  Regex search.
[-r <string> | --replace=<string>]   #  Regex replace.
[-i          | --ignore_case]        #  Case insensitive search.
[-g          | --global]             #  Global replacement.
[-k <list>   | --keys=<list>]        #  List of keys whose values to substitute.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.
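
The way the options combine can be sketched in Python. This is an illustrative emulation and not the Biopieces implementation: the function name substitute_vals, its parameters, and the dict-based record are assumptions made for the example, and it assumes that without --global only the first match in each value is replaced, as with a Perl s/// without the g modifier. Python's re module is also not fully identical to Perl regex (for example, group references in the replacement are written \1 rather than $1).

```python
import re

def substitute_vals(record, search, replace, keys=None,
                    ignore_case=False, global_replace=False):
    """Return a copy of record with a regex substitution applied to its values.

    Hypothetical helper: keys=None means "all keys", mirroring a call without -k.
    """
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(search, flags)
    # In re.sub, count=0 replaces every match and count=1 only the first one.
    count = 0 if global_replace else 1
    targets = list(record) if keys is None else keys
    result = dict(record)
    for key in targets:
        if key in result:
            result[key] = pattern.sub(replace, str(result[key]), count=count)
    return result

record = {"SEQ_NAME": "test1", "SEQ": "AGNNNCT", "SEQ_LEN": "7"}

# No keys given: every value is searched, including SEQ_NAME and SEQ_LEN.
print(substitute_vals(record, r"\d", "", global_replace=True))

# Restricted to SEQ, case-insensitive, global.
print(substitute_vals(record, "n", "", keys=["SEQ"],
                      ignore_case=True, global_replace=True))
```

Restricting the substitution with keys is usually the safer choice, since a broad pattern such as \d would otherwise also alter name and length fields.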

Examples

Consider the following sequences in FASTA format in the file test.fna:

>test1
AGNCTTTTCATTCTGACTGCAACGGGCAATACCTGCCGTGAGTAAATNNN
>test2
TGGGCGTTNNNNNGCAGGTAAATAGGCTTCTGTNNGACGTACTATAACGT
>test3
NNNNATAGTACTACAGTAACGAAAGTCNNGGATTTTTCTGAAGAGCTTTA

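To remove all digits, use substitute_vals like this. As the output shows, without -k the substitution applies to the values of all keys, so SEQ_LEN is emptied and the trailing digit is stripped from SEQ_NAME: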

read_fasta -i test.fna | substitute_vals -s '\d' -r '' -g

SEQ: AGNCTTTTCATTCTGACTGCAACGGGCAATACCTGCCGTGAGTAAATNNN
SEQ_LEN:
SEQ_NAME: test
---
SEQ: TGGGCGTTNNNNNGCAGGTAAATAGGCTTCTGTNNGACGTACTATAACGT
SEQ_LEN:
SEQ_NAME: test
---
SEQ: NNNNATAGTACTACAGTAACGAAAGTCNNGGATTTTTCTGAAGAGCTTTA
SEQ_LEN:
SEQ_NAME: test
---

To remove all N's from the sequences only, restrict the substitution to the SEQ key with -k:

read_fasta -i test.fna | substitute_vals -k SEQ -s 'N' -r '' -g

SEQ: AGCTTTTCATTCTGACTGCAACGGGCAATACCTGCCGTGAGTAAAT
SEQ_LEN: 50
SEQ_NAME: test1
---
SEQ: TGGGCGTTGCAGGTAAATAGGCTTCTGTGACGTACTATAACGT
SEQ_LEN: 50
SEQ_NAME: test2
---
SEQ: ATAGTACTACAGTAACGAAAGTCGGATTTTTCTGAAGAGCTTTA
SEQ_LEN: 50
SEQ_NAME: test3
---

We can further restrict the removal to blocks of three or more consecutive N's, leaving shorter runs in place:

read_fasta -i test.fna | substitute_vals -k SEQ -s 'N{3,}' -r '' -g

SEQ: AGNCTTTTCATTCTGACTGCAACGGGCAATACCTGCCGTGAGTAAAT
SEQ_LEN: 50
SEQ_NAME: test1
---
SEQ: TGGGCGTTGCAGGTAAATAGGCTTCTGTNNGACGTACTATAACGT
SEQ_LEN: 50
SEQ_NAME: test2
---
SEQ: ATAGTACTACAGTAACGAAAGTCNNGGATTTTTCTGAAGAGCTTTA
SEQ_LEN: 50
SEQ_NAME: test3
---
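
The difference between the two patterns used above can be checked outside Biopieces with a short Python sketch. It applies the same regular expressions to the three test sequences with Python's re module, which behaves like Perl regex for these simple patterns; the helper name remove_matches is made up for the example.

```python
import re

sequences = {
    "test1": "AGNCTTTTCATTCTGACTGCAACGGGCAATACCTGCCGTGAGTAAATNNN",
    "test2": "TGGGCGTTNNNNNGCAGGTAAATAGGCTTCTGTNNGACGTACTATAACGT",
    "test3": "NNNNATAGTACTACAGTAACGAAAGTCNNGGATTTTTCTGAAGAGCTTTA",
}

def remove_matches(seq, pattern):
    """Delete every match of pattern from seq (global replacement)."""
    return re.sub(pattern, "", seq)

for name, seq in sequences.items():
    no_n = remove_matches(seq, r"N")          # every N is removed
    no_runs = remove_matches(seq, r"N{3,}")   # only runs of three or more N's
    print(name, no_n, no_runs, sep="\n  ")
```

The quantifier {3,} matches three or more repetitions, so the isolated N in test1 and the NN pairs in test2 and test3 survive. Note also that the outputs above still list SEQ_LEN: 50 after the substitution, so the length field appears not to be recomputed by substitute_vals.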

See also

transliterate_vals

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

January 2013

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

substitute_vals is part of the Biopieces framework.

http://www.biopieces.org