mate_pair_dist
mate_pair_dist calculates the distance between mate-pair reads that have been mapped to one or more sequences, such as a genome or a set of contigs. The resulting information can be used to assess the integrity of the reads.
Mate-pairs are located by mapping the mate-pair reads against a genome or a set of contigs using one of the following mapping tools:
Where the following keys are present in the output:
- S_ID
- STRAND
- Q_ID
- S_BEG
The input mate-pair reads must have Illumina-type read names, where the first read ID is followed by /1 and the second read ID is followed by /2. Input order does not matter.
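The naming convention above can be illustrated with a minimal Python sketch. The function name split_mate_suffix is hypothetical and is not part of Biopieces; it only shows how a read ID ending in /1 or /2 can be split into a shared base name and a mate number.

```python
def split_mate_suffix(q_id):
    """Split an Illumina-type read ID into (base_name, mate_number).

    Hypothetical helper for illustration only, not Biopieces code.
    Returns None if the ID does not end in /1 or /2.
    """
    base, sep, mate = q_id.rpartition("/")
    if sep and base and mate in ("1", "2"):
        return base, int(mate)
    return None


# Both mates of a pair share the same base name.
first = split_mate_suffix("1_ClditxwXsN1/1")
second = split_mate_suffix("1_ClditxwXsN1/2")
print(first, second)
```

Reads whose IDs lack the suffix cannot be paired this way, which is presumably why the naming requirement exists.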
mate_pair_dist then separates the reads based on S_ID and STRAND and, for each Q_ID, outputs all distances between /1 and /2 records. Thus, records of the below type are output if mate-pairs are found:
Q_ID1: 1_ClditxwXsN1/1
S_ID: M1_c1
Q_ID2: 1_ClditxwXsN1/2
DIST: 1210
---
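The grouping described above can be sketched in Python. This is an illustration under stated assumptions, not the actual mate_pair_dist implementation: the function name mate_pair_distances is hypothetical, records are modelled as plain dictionaries, and the distance is assumed to be the absolute difference between the two S_BEG values, which the text above does not specify.

```python
from collections import defaultdict
from itertools import product


def mate_pair_distances(records):
    """Pair /1 and /2 hits per (S_ID, STRAND, read base name).

    Assumption: DIST is the absolute difference of the S_BEG values.
    The real tool may define the distance differently.
    """
    groups = defaultdict(lambda: {1: [], 2: []})

    for rec in records:
        base, sep, mate = rec["Q_ID"].rpartition("/")
        if not sep or mate not in ("1", "2"):
            continue  # skip reads without an Illumina-type /1 or /2 suffix
        key = (rec["S_ID"], rec["STRAND"], base)
        groups[key][int(mate)].append(rec)

    output = []
    for (s_id, _strand, _base), mates in groups.items():
        # Emit every combination of a /1 hit with a /2 hit.
        for first, second in product(mates[1], mates[2]):
            output.append({
                "Q_ID1": first["Q_ID"],
                "S_ID": s_id,
                "Q_ID2": second["Q_ID"],
                "DIST": abs(int(second["S_BEG"]) - int(first["S_BEG"])),
            })
    return output


# Made-up input values chosen to reproduce the shape of the record above.
example = [
    {"Q_ID": "1_ClditxwXsN1/2", "S_ID": "M1_c1", "STRAND": "+", "S_BEG": 1310},
    {"Q_ID": "1_ClditxwXsN1/1", "S_ID": "M1_c1", "STRAND": "+", "S_BEG": 100},
]
print(mate_pair_distances(example))
```

Because records are grouped before pairing, the order in which the /1 and /2 hits arrive does not affect the result.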
... | mate_pair_dist [options]
[-? | --help] # Print full usage description.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Here is a two-step analysis of mate-pair read integrity. First we index a FASTA file with contigs using create_bowtie_index:
read_fasta -i contigs.fna | create_bowtie_index -d my_dir -i contigs -x
Then we read the Illumina reads with read_fastq, map them with bowtie_seq, and determine the mate-pair distances with mate_pair_dist, which we finally plot:
read_fastq -i reads.fq | bowtie_seq -i my_dir/contigs -m 3 | mate_pair_dist | plot_lendist -k DIST -x
Martin Asser Hansen - Copyright (C) - All rights reserved.
September 2009
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
mate_pair_dist is part of the Biopieces framework.