find_pairs
Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions
find_pairs reports the sequence names of records with paired-end sequence data, where the sequence
names follow either the Illumina 1.5 scheme, in which names end in /1 or /2, or the Illumina 1.8 scheme,
in which the names contain a space followed by 1 or 2 and then a :. Only the sequence names are
output, in interleaved order. The records look like this:
SEQ_NAME: HWI-ST575:107:C0HE6ACXX:5:1101:1832:2218 1:N:0:TAGCTG
---
SEQ_NAME: HWI-ST575:107:C0HE6ACXX:5:1101:1832:2218 2:N:0:TAGCTG
---
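The name matching described above can be sketched in Python. This is only an illustration of the two naming schemes, not the find_pairs implementation; the function names `split_name` and `paired_names` and the regular expressions are assumptions made for this example.

```python
import re

# Assumed patterns for the two schemes described above (illustrative only):
# Illumina 1.5: the name ends in /1 or /2.
# Illumina 1.8: the name contains a space, then 1 or 2, then a colon.
_SCHEME_15 = re.compile(r"^(.+)/([12])$")
_SCHEME_18 = re.compile(r"^(\S+) ([12]):")


def split_name(name):
    """Return (base_name, read_number), or None if neither scheme matches."""
    match = _SCHEME_15.match(name) or _SCHEME_18.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def paired_names(names):
    """Return the names whose mate is also present, interleaved as read 1, read 2."""
    by_base = {}
    for name in names:
        parsed = split_name(name)
        if parsed is None:
            continue  # name follows neither scheme; skip it
        base, read = parsed
        by_base.setdefault(base, {})[read] = name

    interleaved = []
    for reads in by_base.values():
        if 1 in reads and 2 in reads:
            interleaved.extend([reads[1], reads[2]])
    return interleaved
```

Names without a mate, or names that follow neither scheme, are simply dropped in this sketch.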
find_pairs making that more efficient.
... | find_pairs [options]
[-? | --help] # Print full usage description.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
If you have two paired-end sequence files that use the Illumina 1.5 or 1.8 scheme for naming pairs, you can locate the paired records in two steps with find_pairs:
Step 1:
read_fastq -i test1.fq,test2.fq | find_pairs | write_tab -k -o seq_names.tab -x
Step 2:
read_fastq -i test1.fq,test2.fq | grab -E seq_names.tab | write_fastq -o test_pairs.fq -x
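As a rough illustration of what the second step achieves, the sketch below keeps only the FASTQ records whose name appears in a list of paired names. It is plain Python, not Biopieces code; the helpers `parse_fastq` and `keep_named` are hypothetical, and the sketch assumes well-formed four-line FASTQ records.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    # zip over the same iterator four times groups the lines in fours.
    for header, seq, _plus, qual in zip(it, it, it, it):
        # Drop the leading '@' and the trailing newline from the header.
        yield header.rstrip("\n")[1:], seq.rstrip("\n"), qual.rstrip("\n")


def keep_named(records, names):
    """Return the records whose name is in names, preserving input order."""
    wanted = set(names)
    return [record for record in records if record[0] in wanted]
```

For example, `keep_named(parse_fastq(open("test1.fq")), names)` would return the records from `test1.fq` whose names are in `names`, where `names` stands for the list written in Step 1.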
Martin Asser Hansen - Copyright (C) - All rights reserved.
November 2012
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
find_pairs is part of the Biopieces framework.