Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: find_pairs

Description

find_pairs reports the sequence names of records with paired-end sequence data, where the sequence names use either the Illumina 1.5 scheme, in which names end in /1 or /2, or the Illumina 1.8 scheme, in which names contain a space followed by 1 or 2 and then a :. Only the sequence names are output, in interleaved order. The records look like this:

SEQ_NAME: HWI-ST575:107:C0HE6ACXX:5:1101:1832:2218 1:N:0:TAGCTG
---
SEQ_NAME: HWI-ST575:107:C0HE6ACXX:5:1101:1832:2218 2:N:0:TAGCTG
---

find_pairs makes locating such paired records more efficient.
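
The two naming schemes described above can be illustrated with a small Python sketch. This is not the Biopieces implementation; the function names `split_name` and `find_paired_names` and both regular expressions are assumptions made for illustration, based only on the description of the Illumina 1.5 and 1.8 name formats given here.

```python
import re

# Illumina 1.8 (as described above): a space, then 1 or 2, then a colon.
NEW_SCHEME = re.compile(r"^(\S+) ([12]):")
# Illumina 1.5 (as described above): the name ends in /1 or /2.
OLD_SCHEME = re.compile(r"^(.+)/([12])$")


def split_name(seq_name):
    """Return (base_name, mate_number) for a recognised name, else None."""
    for pattern in (NEW_SCHEME, OLD_SCHEME):
        match = pattern.match(seq_name)
        if match:
            return match.group(1), int(match.group(2))
    return None


def find_paired_names(seq_names):
    """Return names whose mate is also present, interleaved as mate 1, mate 2."""
    mates = {}
    for name in seq_names:
        parsed = split_name(name)
        if parsed is None:
            continue  # Name follows neither scheme; skip it.
        base, mate = parsed
        mates.setdefault(base, {})[mate] = name

    paired = []
    for by_mate in mates.values():
        if 1 in by_mate and 2 in by_mate:
            paired.extend([by_mate[1], by_mate[2]])
    return paired


names = ["a/1", "b/1", "a/2", "x 1:N:0:T", "x 2:N:0:T", "c/2"]
paired_names = find_paired_names(names)
```

In this sketch, records without a mate (`b/1` and `c/2`) are dropped, and the remaining names are emitted pairwise in the order their base names were first seen.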

Usage

... | find_pairs [options]

Options

[-?         | --help]               #  Print full usage description.
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file   -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to stream file   -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

If you have two paired-end sequence files whose names follow the Illumina 1.5 or 1.8 scheme, you can locate the paired records in two steps with find_pairs:

Step 1:

read_fastq -i test1.fq,test2.fq | find_pairs | write_tab -k -o seq_names.tab -x

Step 2:

read_fastq -i test1.fq,test2.fq | grab -E seq_names.tab | write_fastq -o test_pairs.fq -x
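
For readers without Biopieces at hand, the two steps can be approximated in plain Python. This is only a sketch of the idea, not what the pipeline above does internally: the helpers `read_fastq_records`, `mate_key`, and `extract_pairs` are hypothetical names, the parser assumes strict four-line FASTQ records, and the filter assumes exact name matching with records kept in input order.

```python
import io
import re


def read_fastq_records(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # Drop the leading '@'.


def mate_key(name):
    """Return (base_name, mate) for Illumina 1.8 or 1.5 style names, else None."""
    match = re.match(r"^(\S+) ([12]):", name) or re.match(r"^(.+)/([12])$", name)
    return (match.group(1), match.group(2)) if match else None


def extract_pairs(handles):
    records = [rec for handle in handles for rec in read_fastq_records(handle)]

    # Step 1: find base names for which both mate 1 and mate 2 are present.
    seen = {}
    for name, _, _ in records:
        key = mate_key(name)
        if key is not None:
            seen.setdefault(key[0], set()).add(key[1])
    keep = {base for base, mates in seen.items() if mates == {"1", "2"}}

    # Step 2: keep only records whose base name was found in step 1.
    result = []
    for record in records:
        key = mate_key(record[0])
        if key is not None and key[0] in keep:
            result.append(record)
    return result


# Two tiny in-memory "files" standing in for test1.fq and test2.fq.
file1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nTTTT\n+\nIIII\n")
file2 = io.StringIO("@a/2\nGGGG\n+\nIIII\n")
pairs = extract_pairs([file1, file2])
```

Here `b/1` has no mate in the second file, so only the two `a` records survive.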

See also

order_pairs

read_fastq

write_fastq

grab

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

November 2012

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

find_pairs is part of the Biopieces framework.

http://www.biopieces.org
