duplicate_record
Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions
duplicate_record uses the value of a specified key as the number of copies of each record to emit. This is useful for de-collapsing sequence records.
... | duplicate_record [options]
[-? | --help] # Print full usage description.
[-k <string> | --key=<string>] # Key with value to use as duplication count.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTA entries in the file test.fna:
>testA_1
GAGTCAAATTRARCCRCA
>testB_2
ATCAAATTRARCCRCA
>testC_3
ATCATTRARCCRCA
If the value following the _ in the sequence name represents the number of times the sequence was found, we can expand the entries using split_vals and duplicate_record like this:
read_fasta -i test.fna | split_vals -k SEQ_NAME -K SEQ_NAME,COUNT | duplicate_record -k COUNT
SEQ: GAGTCAAATTRARCCRCA
SEQ_LEN: 18
SEQ_NAME: testA
COUNT: 1
---
SEQ: ATCAAATTRARCCRCA
SEQ_LEN: 16
SEQ_NAME: testB
COUNT: 2
---
SEQ: ATCAAATTRARCCRCA
SEQ_LEN: 16
SEQ_NAME: testB
COUNT: 2
---
SEQ: ATCATTRARCCRCA
SEQ_LEN: 14
SEQ_NAME: testC
COUNT: 3
---
SEQ: ATCATTRARCCRCA
SEQ_LEN: 14
SEQ_NAME: testC
COUNT: 3
---
SEQ: ATCATTRARCCRCA
SEQ_LEN: 14
SEQ_NAME: testC
COUNT: 3
---
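For readers who want to see the logic of the pipeline above outside of Biopieces, here is a minimal Python sketch of the same three steps. It is not the Biopieces implementation: the function names parse_fasta, split_name_count and duplicate_by_count are hypothetical stand-ins, and the sketch assumes that the name is split on the last underscore and that every record carries an integer count.

```python
from typing import Dict, Iterable, Iterator

# The test.fna entries from the example above.
FASTA = """\
>testA_1
GAGTCAAATTRARCCRCA
>testB_2
ATCAAATTRARCCRCA
>testC_3
ATCATTRARCCRCA
"""

Record = Dict[str, object]


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield one record per FASTA entry (hypothetical stand-in for read_fasta)."""
    for entry in text.split(">")[1:]:
        header, *seq_lines = entry.strip().splitlines()
        seq = "".join(seq_lines)
        yield {"SEQ_NAME": header, "SEQ": seq, "SEQ_LEN": len(seq)}


def split_name_count(records: Iterable[Record], delimiter: str = "_") -> Iterator[Record]:
    """Split SEQ_NAME into SEQ_NAME and COUNT.

    Assumption: the split happens on the last delimiter only.
    """
    for rec in records:
        name, count = str(rec["SEQ_NAME"]).rsplit(delimiter, 1)
        yield {**rec, "SEQ_NAME": name, "COUNT": count}


def duplicate_by_count(records: Iterable[Record], key: str) -> Iterator[Record]:
    """Emit each record as many times as the integer stored under key.

    Assumption: every record has the key and its value parses as an integer.
    """
    for rec in records:
        for _ in range(int(str(rec[key]))):
            yield dict(rec)  # a fresh copy per emitted record


if __name__ == "__main__":
    pipeline = duplicate_by_count(split_name_count(parse_fasta(FASTA)), "COUNT")
    for rec in pipeline:
        for field in ("SEQ", "SEQ_LEN", "SEQ_NAME", "COUNT"):
            print(f"{field}: {rec[field]}")
        print("---")
```

Note that the count is the total number of copies emitted, not the number of extra copies: testA with a count of 1 appears once, and the three input entries expand to six records in total.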
Martin Asser Hansen - Copyright (C) - All rights reserved.
February 2012
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
duplicate_record is part of the Biopieces framework.