find_SNPs
Martin Asser Hansen edited this page Oct 2, 2015
find_SNPs locates single nucleotide polymorphisms (SNPs) and deletion/insertion polymorphisms (DIPs) in SAM records in the stream. It does so by parsing the ALIGN field of each SAM record and, for each event, emitting a record like this:
REC_TYPE: SNP
S_ID: gi|48994873|gb|U00096.2|
POS: 993405
EVENT: G>C
SNP_COUNT: 1
TYPE: MISMATCH
---
The position POS is 0-based and corresponds to the exact position in the subject sequence.
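The ALIGN value that find_SNPs reads is a comma-separated list of offset:ref>query events, as in the 97:C>A,99:G>C example further down. The following is a minimal Python sketch of splitting such a value into events, not the Biopieces implementation. It assumes that "-" marks a gap in indel events and uses the hypothetical TYPE labels INSERTION and DELETION for them; only MISMATCH is documented on this page.

```python
from dataclasses import dataclass


@dataclass
class AlignEvent:
    """One event from an ALIGN value such as '97:C>A'."""
    offset: int  # offset within the alignment, as in the example on this page
    ref: str     # subject residue; "-" for a gap is an assumption
    alt: str     # query residue; "-" for a gap is an assumption

    @property
    def event(self) -> str:
        # Same "ref>alt" notation as the EVENT key of the SNP records.
        return f"{self.ref}>{self.alt}"

    @property
    def type(self) -> str:
        # MISMATCH is the documented TYPE; INSERTION and DELETION are
        # hypothetical labels for the DIP cases.
        if self.ref == "-":
            return "INSERTION"
        if self.alt == "-":
            return "DELETION"
        return "MISMATCH"


def parse_align(align: str) -> list[AlignEvent]:
    """Split a comma-separated ALIGN value into AlignEvent objects."""
    events = []
    for item in filter(None, align.split(",")):  # skip empty items
        offset, change = item.split(":", 1)
        ref, alt = change.split(">", 1)
        events.append(AlignEvent(int(offset), ref, alt))
    return events


for e in parse_align("97:C>A,99:G>C"):
    print(e.offset, e.event, e.type)
```

Each parsed event maps onto one emitted record: EVENT comes straight from the ref>alt pair, while POS additionally depends on where the alignment starts in the subject sequence.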
... | find_SNPs [options]
[-? | --help] # Print full usage description.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following SAM entry in the file test.sam:
@SQ SN:gi|48994873|gb|U00096.2| LN:4639675
ID00081401 16 gi|48994873|gb|U00096.2| 3405 37 100M * 0 0 CGCACGGGCGACATCTGGCAGGCTTCATTCACGCCTGCTATTCCCGTCAGCCTGAGCTTGCCGCGAAGCTGATGAAAGATGTTATCGCTGAACCCTAACC * XT:A:U NM:i:2 X0:i:1 X1:i:0 XM:i:XO:i:0 XG:i:0 MD:Z:97C1G0
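The two mismatches in this entry can also be read off its standard MD tag (MD:Z:97C1G0) together with SEQ; SAM stores SEQ on the forward reference strand even for flag 16, so the offsets line up directly. The sketch below is only an illustration of that cross-check and is not necessarily how read_sam builds the ALIGN field. It assumes a CIGAR that is a single match run (100M here) and rejects MD deletions; md_mismatches and to_align are hypothetical helper names.

```python
import re


def md_mismatches(md: str, seq: str) -> list[tuple[int, str, str]]:
    """Return (offset, reference_base, read_base) for each mismatch in an MD tag.

    Assumes a CIGAR made of one match run (such as 100M): no insertions,
    soft clips or deletions, so read offsets equal reference offsets.
    """
    if md.startswith("MD:Z:"):
        md = md[len("MD:Z:"):]
    if "^" in md:
        raise ValueError("deletions (^...) are not handled in this sketch")
    mismatches = []
    offset = 0
    # The MD string alternates match-run lengths with mismatched reference bases.
    for run, ref_base in re.findall(r"(\d+)([A-Z]?)", md):
        offset += int(run)
        if ref_base:
            mismatches.append((offset, ref_base, seq[offset]))
            offset += 1  # the mismatched base itself occupies one position
    return mismatches


def to_align(mismatches: list[tuple[int, str, str]]) -> str:
    """Render mismatches in the offset:ref>query notation of the ALIGN field."""
    return ",".join(f"{off}:{ref}>{alt}" for off, ref, alt in mismatches)


SEQ = ("CGCACGGGCGACATCTGGCAGGCTTCATTCACGCCTGCTATTCCC"
       "GTCAGCCTGAGCTTGCCGCGAAGCTGATGAAAGATGTTATCGCTGAACCCTAACC")
print(to_align(md_mismatches("MD:Z:97C1G0", SEQ)))  # 97:C>A,99:G>C
```

The result matches the ALIGN value that read_sam reports for this entry in the output below.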
To locate SNPs in this file, read it with read_sam and pipe the records into find_SNPs:
read_sam -i test.sam | find_SNPs
REC_TYPE: SAM
Q_ID: ID00081401
STRAND: -
S_ID: gi|48994873|gb|U00096.2|
S_BEG: 3405
MAPQ: 37
CIGAR: 100M
SEQ: CGCACGGGCGACATCTGGCAGGCTTCATTCACGCCTGCTATTCCCGTCAGCCTGAGCTTGCCGCGAAGCTGATGAAAGATGTTATCGCTGAACCCTAACC
ALIGN: 97:C>A,99:G>C
---
REC_TYPE: SNP
S_ID: gi|48994873|gb|U00096.2|
POS: 973405
EVENT: C>A
SNP_COUNT: 1
TYPE: MISMATCH
---
REC_TYPE: SNP
S_ID: gi|48994873|gb|U00096.2|
POS: 993405
EVENT: G>C
SNP_COUNT: 1
TYPE: MISMATCH
---
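Downstream code often needs only the SNP records from this stream. Based on the output shown above, where each record is a set of KEY: value lines closed by ---, a minimal Python sketch for picking them out could look like this; read_records and snp_records are hypothetical names, and the parser assumes no record spans a blank-line-separated block or other format variant not shown here.

```python
def read_records(text: str) -> list[dict[str, str]]:
    """Parse records made of 'KEY: value' lines, each terminated by '---'."""
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            if current:
                records.append(current)
            current = {}
        elif line:
            # Split on the first ": " only, so values may contain colons.
            key, _, value = line.partition(": ")
            current[key] = value
    if current:  # tolerate a missing final separator
        records.append(current)
    return records


def snp_records(text: str) -> list[dict[str, str]]:
    """Keep only the records that find_SNPs emits (REC_TYPE: SNP)."""
    return [r for r in read_records(text) if r.get("REC_TYPE") == "SNP"]


sample = """REC_TYPE: SAM
Q_ID: ID00081401
ALIGN: 97:C>A,99:G>C
---
REC_TYPE: SNP
S_ID: gi|48994873|gb|U00096.2|
POS: 973405
EVENT: C>A
SNP_COUNT: 1
TYPE: MISMATCH
---
"""
for rec in snp_records(sample):
    print(rec["S_ID"], rec["POS"], rec["EVENT"], rec["TYPE"])
```

To use it on real data, capture the pipeline output (for example with find_SNPs -O or --stream_out) and pass the file contents to snp_records; the SAM records that find_SNPs passes through are filtered out by their REC_TYPE.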
Martin Asser Hansen - Copyright (C) - All rights reserved.
September 2011
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
find_SNPs is part of the Biopieces framework.