Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: order_pairs

Description

order_pairs orders records with paired-end sequence data where the sequence names use either the Illumina 1.5 scheme, where names end in /1 or /2, or the Illumina 1.8 scheme, where names contain a space followed by 1 or 2 and then a :. The records are output in interleaved order, which is required by paired-end aware assembly programs. order_pairs uses a hashing scheme for this and does not sort by sequence name.
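To make the hashing idea concrete, here is a minimal Python sketch of how records could be paired by name without sorting. It is not the actual order_pairs implementation: the regular expressions, the function names split_name and order_pairs, the choice to emit orphans at the end of the stream, and the error raised for unrecognized names are all assumptions made for illustration. Records are modelled as plain dictionaries with a SEQ_NAME key.

```python
import re

# Assumed patterns for the two naming schemes described above.
# Illumina 1.5: name ends in /1 or /2.
# Illumina 1.8: name contains a space, then 1 or 2, then a colon.
PAIR_PATTERNS = [
    re.compile(r"^(.+)/([12])$"),
    re.compile(r"^(\S+) ([12]):"),
]


def split_name(seq_name):
    """Return (base_name, mate_number) for a recognized sequence name."""
    for pattern in PAIR_PATTERNS:
        match = pattern.match(seq_name)
        if match:
            return match.group(1), int(match.group(2))
    # Assumption: this sketch rejects names that match neither scheme.
    raise ValueError("unrecognized pair naming scheme: " + seq_name)


def order_pairs(records):
    """Yield records interleaved as mate 1, mate 2, tagged with ORDER."""
    pending = {}  # (base name, mate number) -> record waiting for its mate
    for record in records:
        base, mate = split_name(record["SEQ_NAME"])
        # The partner of mate 1 is mate 2 and vice versa (3 - mate).
        partner = pending.pop((base, 3 - mate), None)
        if partner is None:
            pending[(base, mate)] = record
            continue
        first, second = (partner, record) if mate == 2 else (record, partner)
        yield dict(first, ORDER="paired")
        yield dict(second, ORDER="paired")
    # Anything still waiting never found its mate.
    for (base, mate), record in pending.items():
        yield dict(record, ORDER="orphan %d" % mate)


if __name__ == "__main__":
    demo = [
        {"SEQ_NAME": "read_a/1"},
        {"SEQ_NAME": "read_b/2"},
        {"SEQ_NAME": "read_a/2"},
    ]
    for rec in order_pairs(demo):
        print(rec["SEQ_NAME"], rec["ORDER"])
```

A hash lookup per record keeps the work roughly linear in the number of records, at the cost of holding unmatched records in memory until their mate arrives.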

Using order_pairs is important after filtering steps where one record of a pair may have been discarded. For each record, the value of the ORDER key indicates whether the record was paired or an orphan (paired, orphan 1, or orphan 2), and you can use grab to filter the records accordingly.

SEQ_NAME: HWI-ST575:107:C0HE6ACXX:5:1101:1832:2218 1:N:0:TAGCTG
SEQ: GCTTTGACATAGTCGCTCCAGAATTGCCAGCTAGGGTTAGCTTGGCAACTGCAGCGACGTAATGTGCTGTGGCAGATCAATTTATCTGTTTTGAATCA
SEQ_LEN: 98
SCORES: ^P^PJ\Y`eea`e[daYdecggadgdXJIYVbdc`efg_cdedI^aXIO^abeb\eL_daQU^_V]``]UGTZ\^BBBBBBBBBBBBBBBBBBBBBBB
ORDER: paired
---
SEQ_NAME: HWI-ST575:107:C0HE6ACXX:5:1101:1832:2218 2:N:0:TAGCTG
SEQ: GGTTATCGATCTGGAAAAAGCAACTAAACCTAAAGCTAAACCACGTAGCGCCGGGTAAATGATTCAAAACAGATAAATTGATCTGCCACAGCACATTA
SEQ_LEN: 98
SCORES: ^VYPJQ`c^JJ[b[efg^dHJ`aa`adXd_ZXXbIIIY[af_H^aWHWPZ[`gggFFZ^bd_Z]Zb_]ba\^ZGY_`TZ``cc[[bbR]]]^aaXQ[bbb
ORDER: paired
---
SCORES: ffffcfffffded^eddddddbdcdeedcefecfefdffecabccBB`b`
SEQ: CCNAGGAGGAGNCAATAAGAGACCATTCGTATATGATCTCTCAGGAGAGC
SEQ_LEN: 50
SEQ_NAME: ILLUMINA-52179E_0004:2:1:1044:7943#TTAGGC/1
ORDER: orphan 1
---
SCORES: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
SEQ: NNNNNNNNGGNNCNANNANNNNGTNNNTNGNANNNNCNNANTTGNNNNNN
SEQ_LEN: 50
SEQ_NAME: ILLUMINA-52179E_0004:2:1:1041:14486#TTAGGC/2
ORDER: orphan 2
---
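If you want to inspect such output outside of a Biopieces pipeline, the records can be split on the ORDER key with a few lines of Python. This is a hedged sketch only: it assumes the stream consists of "KEY: value" lines with each record terminated by a --- line, as in the listing above, and the function names parse_records and split_by_order are hypothetical.

```python
def parse_records(text):
    """Yield one dictionary per record from 'KEY: value' lines ended by '---'."""
    record = {}
    for line in text.splitlines():
        if line.strip() == "---":
            if record:
                yield record
            record = {}
        elif ": " in line:
            # Split on the first ': ' only, so values may contain colons.
            key, value = line.split(": ", 1)
            record[key] = value


def split_by_order(records):
    """Separate records into paired ones and orphans using the ORDER key."""
    paired, orphans = [], []
    for record in records:
        if record.get("ORDER") == "paired":
            paired.append(record)
        else:
            orphans.append(record)
    return paired, orphans


if __name__ == "__main__":
    stream = (
        "SEQ_NAME: read_a/1\nORDER: paired\n---\n"
        "SEQ_NAME: read_a/2\nORDER: paired\n---\n"
        "SEQ_NAME: read_b/1\nORDER: orphan 1\n---\n"
    )
    paired, orphans = split_by_order(parse_records(stream))
    print(len(paired), "paired;", len(orphans), "orphan")
```

Within a pipeline, grab on the ORDER key does the same job and avoids leaving the Biopieces stream.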

Usage

... | order_pairs [options]

Options

[-?         | --help]               #  Print full usage description.
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file   -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to stream file   -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

If you have two paired-end sequence files that use the Illumina 1.5 or 1.8 scheme for naming pairs, you can order them with order_pairs like this:

read_fastq -i test1.fq,test2.fq | order_pairs | write_fastq -o combi.fq -x

If you filter your sequences and discard one member of a pair, you can run the data through order_pairs and then use grab on the ORDER key to discard any unmatched records:

read_fastq -i combi.fq |            # Read in Illumina data
trim_seq |                          # Trim ends according to quality scores
grab -e "SEQ_LEN>30" |              # Keep entries with sequence longer than 30
order_pairs |                       # Make sure the pairs are in order
grab -p 'pair' -k ORDER |           # Grab paired records
write_fastq -o combi_trimmed.fq -x  # Write to new file

See also

read_fastq

write_fastq

trim_seq

grab

assemble_seq_idba

assemble_seq_velvet

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.


May 2011

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

order_pairs is part of the Biopieces framework.

http://www.biopieces.org
