fold_seq
fold_seq folds nucleotide sequences into secondary structures, which is useful for illustrating, e.g., miRNA hairpins. The resulting folding information may be uploaded to a custom UCSC Genome Browser track.
fold_seq currently uses RNAfold from the Vienna RNA package as its folding engine, so RNAfold must be installed for fold_seq to work. Read more about RNAfold here:
http://www.tbi.univie.ac.at/RNA/
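To illustrate what a wrapper around RNAfold has to extract, here is a minimal Python sketch that parses the structure line RNAfold prints for each sequence. It assumes RNAfold's usual output layout, where the dot-bracket structure is followed by the minimum free energy in parentheses (e.g. "..((...)).. ( -1.20)"). The function name parse_rnafold_line is hypothetical and is not part of Biopieces or the Vienna RNA package.

```python
import re

# Assumption: RNAfold prints the MFE structure followed by the free energy
# in parentheses, possibly padded with spaces, e.g. "..((...)).. ( -1.20)".
STRUCT_LINE = re.compile(r"^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold_line(line):
    """Return (dot_bracket, free_energy) from an RNAfold structure line."""
    match = STRUCT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an RNAfold structure line: {line!r}")
    structure, energy = match.groups()
    return structure, float(energy)


# Usage with a made-up structure line:
structure, energy = parse_rnafold_line("..((((...))))... ( -3.40)")
print(structure, energy)  # prints ..((((...))))... -3.4
```

The regular expression only accepts dots and round brackets in the structure, so lines in other formats are rejected with a ValueError rather than silently misparsed.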
... | fold_seq [options]
[-? | --help] # Print full usage description.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following RNA sequence entry in FASTA format from the file test.fna:
>MI0000116
UUCAGCCUUUGAGAGUUCCAUGCUUCCUUGCAUUCAAUAGUUAUAUUCAAGCAUAUGGAAUGUAAAGAAGUAUGGAGCGAAAUCUGGCGAG
We can read that file with read_fasta and pipe the sequence to fold_seq:
read_fasta -i test.fna | fold_seq
SIZE: 91
FREE_ENERGY: -36.20
SCORE: 36
SEQ: UUCAGCCUUUGAGAGUUCCAUGCUUCCUUGCAUUCAAUAGUUAUAUUCAAGCAUAUGGAAUGUAAAGAAGUAUGGAGCGAAAUCUGGCGAG
SEQ_NAME: MI0000116
SEQ_LEN: 91
SEC_STRUCT: ....(((...((..((((((((((((.((((((((.((((((.......))).))).)))))))).))))))))))))....)).)))...
CONF: 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 <truncated>
---
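Each fold_seq result travels down the pipeline as a record of "KEY: VALUE" lines terminated by "---", as in the output above. The following minimal Python sketch reads and writes such records and keeps only records below a FREE_ENERGY threshold, roughly what one would use grab for. The record layout is inferred from the example output, the helper names are hypothetical, and the second record in the usage example (weak_hairpin) is made up for illustration.

```python
import io


def read_records(stream):
    """Yield one dict per record from a stream of 'KEY: VALUE' lines.

    Assumption: a line consisting of '---' terminates each record, as in
    the fold_seq example output.
    """
    record = {}
    for line in stream:
        line = line.rstrip("\n")
        if line == "---":
            if record:
                yield record
            record = {}
        elif line:
            key, sep, value = line.partition(": ")
            if not sep:
                raise ValueError(f"malformed line: {line!r}")
            record[key] = value
    if record:  # tolerate a missing final separator
        yield record


def write_record(record, stream):
    """Write one record as 'KEY: VALUE' lines followed by '---'."""
    for key, value in record.items():
        stream.write(f"{key}: {value}\n")
    stream.write("---\n")


def select_by_energy(records, max_energy):
    """Keep records whose FREE_ENERGY (kcal/mol) is at or below max_energy."""
    for record in records:
        if float(record["FREE_ENERGY"]) <= max_energy:
            yield record


# Usage: the second record is a hypothetical weak hairpin.
example = io.StringIO(
    "SEQ_NAME: MI0000116\nFREE_ENERGY: -36.20\n---\n"
    "SEQ_NAME: weak_hairpin\nFREE_ENERGY: -2.10\n---\n"
)
for rec in select_by_energy(read_records(example), -20.0):
    print(rec["SEQ_NAME"])  # prints MI0000116
```

Values are kept as strings when read, so numeric keys such as FREE_ENERGY are converted only where they are compared.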
The resulting record contains the secondary structure in dot-bracket notation as the value of SEC_STRUCT: each "(" pairs with a matching ")", and "." marks an unpaired base. The FREE_ENERGY key is useful for selecting stable structures (lower values mean more stable folds), and grab is your friend here. The CONF key holds per-position confidence values for the folding; these are not currently used and are all set to 1.
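To make the dot-bracket notation concrete, here is a minimal Python sketch that recovers the explicit base pairs from a SEC_STRUCT value by matching brackets with a stack. The function name base_pairs is hypothetical; positions are 0-based.

```python
def base_pairs(structure):
    """Return sorted 0-based (i, j) base pairs encoded by a dot-bracket string."""
    stack = []  # positions of '(' still waiting for their partner
    pairs = []
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(base_pairs("((..))"))  # prints [(0, 5), (1, 4)]
```

A stack is the natural choice because the brackets in a secondary structure without pseudoknots nest like parentheses: each ")" pairs with the most recent unmatched "(".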
Now, if you have a fully configured UCSC Genome Browser installation, you can upload the secondary structure information. To do this you need a BED entry and the folded sequence. The reasonable way to do this is to keep the coordinates of the sequences you wish to fold and display in a BED file and then apply the magic:
read_bed -i <BED file> | get_genome_seq -g <genome> | fold_seq | upload_to_ucsc -d <genome> -t <my_table_rnaSecStr> -x
Notice that the table name must contain the string 'rnaSecStr' to display the folding information in the Genome Browser.
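If you script this step, it can help to check the 'rnaSecStr' requirement before anything is uploaded. The following minimal Python sketch assembles the pipeline shown above from its parts, using only the commands and flags given there, and rejects table names without 'rnaSecStr'. The function name and the example file, genome, and table names are hypothetical placeholders.

```python
import shlex


def ucsc_upload_pipeline(bed_file, genome, table):
    """Assemble the read_bed | ... | upload_to_ucsc pipeline as a shell string."""
    # The Genome Browser only displays folding data from tables whose
    # name contains 'rnaSecStr', so refuse anything else up front.
    if "rnaSecStr" not in table:
        raise ValueError(f"table name must contain 'rnaSecStr': {table!r}")
    steps = [
        ["read_bed", "-i", bed_file],
        ["get_genome_seq", "-g", genome],
        ["fold_seq"],
        ["upload_to_ucsc", "-d", genome, "-t", table, "-x"],
    ]
    # shlex.join quotes each argument so paths with spaces stay intact.
    return " | ".join(shlex.join(step) for step in steps)


print(ucsc_upload_pipeline("mirna.bed", "hg18", "hg18_mirna_rnaSecStr"))
```

The returned string could then be handed to a shell (for example with subprocess.run(..., shell=True, check=True)) on a machine where the Biopieces tools are installed.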
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
fold_seq is part of the Biopieces framework.