Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: duplicate_record

Description

duplicate_record uses the value of a specified key as a count and outputs the record that many times. This is useful for de-collapsing sequence records, i.e. expanding records that each stand for several identical sequences.
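The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Biopieces implementation: the function name `duplicate_records` and the dict-based record are hypothetical, and passing records that lack the key through unchanged is an assumption this page does not confirm.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def duplicate_records(records: Iterable[Record], key: str) -> Iterator[Record]:
    """Yield each record as many times as the integer stored under `key`."""
    for record in records:
        if key not in record:
            # Assumption: records without the key pass through unchanged.
            yield record
            continue
        for _ in range(int(record[key])):
            # Yield a copy so later edits to one duplicate do not affect the others.
            yield dict(record)


# One collapsed record standing for two identical sequences.
collapsed = [{"SEQ_NAME": "testB", "SEQ": "ATCAAATTRARCCRCA", "COUNT": "2"}]
expanded = list(duplicate_records(collapsed, "COUNT"))
```

Here `expanded` holds two equal but independent copies of the input record, each still carrying the original COUNT value.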

Usage

... | duplicate_record [options]

Options

[-?          | --help]               #  Print full usage description.
[-k <string> | --key=<string>]       #  Key with value to use as duplication count.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following FASTA entries in the file test.fna:

>testA_1
GAGTCAAATTRARCCRCA
>testB_2
ATCAAATTRARCCRCA
>testC_3
ATCATTRARCCRCA

If the value following the _ in the sequence name represents the number of times the sequence was found, we can expand each entry into that many records using split_vals and duplicate_record like this:

read_fasta -i test.fna | split_vals -k SEQ_NAME -K SEQ_NAME,COUNT | duplicate_record -k COUNT

SEQ: GAGTCAAATTRARCCRCA
SEQ_LEN: 18
SEQ_NAME: testA
COUNT: 1
---
SEQ: ATCAAATTRARCCRCA
SEQ_LEN: 16
SEQ_NAME: testB
COUNT: 2
---
SEQ: ATCAAATTRARCCRCA
SEQ_LEN: 16
SEQ_NAME: testB
COUNT: 2
---
SEQ: ATCATTRARCCRCA
SEQ_LEN: 14
SEQ_NAME: testC
COUNT: 3
---
SEQ: ATCATTRARCCRCA
SEQ_LEN: 14
SEQ_NAME: testC
COUNT: 3
---
SEQ: ATCATTRARCCRCA
SEQ_LEN: 14
SEQ_NAME: testC
COUNT: 3
---
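For readers without Biopieces installed, the pipeline above can be approximated in plain Python to check the expected record count. This is a rough sketch under stated assumptions, not how the Biopieces tools are implemented: `parse_fasta` and `expand` are hypothetical helpers, sequences are assumed to sit on a single line, and the name is split at its last underscore only.

```python
FASTA = """\
>testA_1
GAGTCAAATTRARCCRCA
>testB_2
ATCAAATTRARCCRCA
>testC_3
ATCATTRARCCRCA
"""


def parse_fasta(text):
    """Yield (name, sequence) pairs; assumes each sequence is on one line."""
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:]
        elif line and name is not None:
            yield name, line


def expand(text):
    """Split each name into name and count, then repeat the record count times."""
    records = []
    for full_name, seq in parse_fasta(text):
        # Assumption: the count is whatever follows the last underscore.
        name, count = full_name.rsplit("_", 1)
        record = {"SEQ": seq, "SEQ_LEN": len(seq), "SEQ_NAME": name, "COUNT": int(count)}
        records.extend(dict(record) for _ in range(int(count)))
    return records


records = expand(FASTA)
for record in records:
    for key, value in record.items():
        print(f"{key}: {value}")
    print("---")
```

The number of expanded records equals the sum of the counts in the input names (1 + 2 + 3 = 6), which is a quick sanity check on any de-collapsed data set.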

See also

read_fasta

split_vals

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

February 2012

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

duplicate_record is part of the Biopieces framework.

http://www.biopieces.org
