read_bed
The BED (Browser Extensible Data) format is a tabular format for data pertaining to one of the eukaryotic genomes in the UCSC Genome Browser. A BED entry consists of up to 12 columns, of which the first three are mandatory.
- CHR - the name of the chromosome.
- CHR_BEG - the chromosome begin position.
- CHR_END - the chromosome end position.
- Q_ID - the name of the feature.
- SCORE - a score between 0 and 1000.
- STRAND - the orientation of the feature.
- THICK_BEG - begin position of 'thick' drawing (e.g. the coding region in gene displays; UTRs are drawn thin).
- THICK_END - end position of 'thick' drawing (e.g. the coding region in gene displays; UTRs are drawn thin).
- ITEMRGB - RGB color code for feature.
- BLOCKCOUNT - number of exon blocks.
- BLOCKSIZES - list of block sizes.
- Q_BEGS - list of block begins.
In addition, read_bed adds three helper columns to each record:
- REC_TYPE - the type of record, here BED.
- BED_LEN - the length of the entire feature.
- BED_COLS - the number of BED columns (for speed).
A typical 12-column BED record looks like this:
STRAND: -
Q_ID: AA695812
CHR_END: 31601
THICK_END: 31601
SCORE: 0
CHR_BEG: 31176
BED_LEN: 426
REC_TYPE: BED
BLOCKCOUNT: 1
CHR: chr4
THICK_BEG: 31176
Q_BEGS: 0,
BLOCKSIZES: 426,
ITEMRGB: 0
BED_COLS: 12
---
For more about the BED format:
http://genome.ucsc.edu/FAQ/FAQformat#format1
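The mapping from BED columns to record keys described above can be sketched in Python. This is only an illustration, not the read_bed implementation: the helper name `bed_line_to_record` and the choice of which keys to store as integers are assumptions. The formula for BED_LEN is inferred from the example record (31601 - 31176 + 1 = 426), and the sketch copies coordinates through unchanged without modelling any conversion read_bed may perform.

```python
# Column keys in BED file order, as listed in the description above.
BED_KEYS = [
    "CHR", "CHR_BEG", "CHR_END", "Q_ID", "SCORE", "STRAND",
    "THICK_BEG", "THICK_END", "ITEMRGB", "BLOCKCOUNT", "BLOCKSIZES", "Q_BEGS",
]
# Keys stored as integers in this sketch (an assumption, not read_bed behaviour).
INT_KEYS = {"CHR_BEG", "CHR_END", "SCORE", "THICK_BEG", "THICK_END", "BLOCKCOUNT"}


def bed_line_to_record(line):
    """Map one tab-separated BED line onto the record keys (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if not 3 <= len(fields) <= 12:
        raise ValueError("expected 3 to 12 columns, got %d" % len(fields))

    record = {}
    for key, value in zip(BED_KEYS, fields):
        record[key] = int(value) if key in INT_KEYS else value

    # Helper columns. BED_LEN = CHR_END - CHR_BEG + 1 is inferred from the
    # example record; no coordinate conversion is modelled here.
    record["REC_TYPE"] = "BED"
    record["BED_LEN"] = record["CHR_END"] - record["CHR_BEG"] + 1
    record["BED_COLS"] = len(fields)
    return record


# Rebuild the example record from a single 12-column line.
example = "chr4\t31176\t31601\tAA695812\t0\t-\t31176\t31601\t0\t1\t426,\t0,"
for key, value in bed_line_to_record(example).items():
    print("%s: %s" % (key, value))
print("---")
```

The list fields (BLOCKSIZES, Q_BEGS) are kept as the comma-terminated strings shown in the example record rather than being split into lists.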
read_bed [options] -i <BED file(s)>
[-? | --help] # Print full usage description.
[-i <files!> | --data_in=<files!>] # Comma separated list of files or glob expression to read.
[-c <uint> | --cols=<uint>] # Number of columns to read.
[-n <uint> | --num=<uint>] # Limit number of records to read.
[-C | --check] # Check integrity of BED entries.
[-I <file!> | --stream_in=<file!>] # Read input stream from file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output stream to file - Default=STDOUT
[-v | --verbose] # Verbose output.
To read all BED entries from a file:
read_bed -i test.bed
To read in only 10 records from a BED file:
read_bed -n 10 -i test.bed
To read in only 3 columns from a BED file:
read_bed -c 3 -i test.bed
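The effect of the -n and -c options can be pictured with a short Python sketch. It is not how read_bed is implemented; the function name `first_records` and its parameters `num` and `cols` are hypothetical stand-ins for the two options, and header or comment lines in the input are not handled.

```python
from itertools import islice


def first_records(lines, num=None, cols=None):
    """Yield field lists for at most `num` lines, keeping at most `cols` columns.

    `num=None` reads every line; `cols=None` keeps every column.
    """
    for line in islice(lines, num):
        fields = line.rstrip("\n").split("\t")
        yield fields[:cols]


# Three made-up BED lines, used only for illustration.
lines = [
    "chr1\t100\t200\tfeat1\t0\t+\n",
    "chr1\t300\t400\tfeat2\t0\t-\n",
    "chr2\t500\t600\tfeat3\t0\t+\n",
]

# Comparable in spirit to: read_bed -n 2 -c 3 -i test.bed
for fields in first_records(lines, num=2, cols=3):
    print(fields)
```

Because the function is a generator over any iterable of lines, an open file handle can be passed instead of a list, and reading stops as soon as the limit is reached.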
To check the integrity of the BED entries, use the -C switch, which raises an error if a BED entry is malformed:
read_bed -C -i test.bed
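The checks that -C actually performs are not listed here. As a rough idea of what an integrity check on a record could look like, the following Python sketch tests a few properties that follow from the column descriptions (for example, a score between 0 and 1000). The function name `check_bed_record`, the exact set of rules, and the accepted strand symbols are assumptions.

```python
def check_bed_record(record):
    """Raise ValueError if a record violates one of the assumed rules."""
    for key in ("CHR", "CHR_BEG", "CHR_END"):
        if key not in record:
            raise ValueError("missing mandatory key %s" % key)

    if record["CHR_BEG"] < 0 or record["CHR_BEG"] > record["CHR_END"]:
        raise ValueError("CHR_BEG must be >= 0 and <= CHR_END")

    if "SCORE" in record and not 0 <= record["SCORE"] <= 1000:
        raise ValueError("SCORE must be between 0 and 1000")

    # Assumed set of strand symbols.
    if "STRAND" in record and record["STRAND"] not in ("+", "-", "."):
        raise ValueError("STRAND must be one of '+', '-' or '.'")

    if "BLOCKCOUNT" in record:
        # Lists carry a trailing comma, so drop empty items after splitting.
        sizes = [s for s in record.get("BLOCKSIZES", "").split(",") if s]
        begs = [b for b in record.get("Q_BEGS", "").split(",") if b]
        if len(sizes) != record["BLOCKCOUNT"] or len(begs) != record["BLOCKCOUNT"]:
            raise ValueError("BLOCKCOUNT does not match BLOCKSIZES/Q_BEGS")


# Values taken from the example record in the description.
good = {
    "CHR": "chr4", "CHR_BEG": 31176, "CHR_END": 31601, "Q_ID": "AA695812",
    "SCORE": 0, "STRAND": "-", "BLOCKCOUNT": 1, "BLOCKSIZES": "426,", "Q_BEGS": "0,",
}
check_bed_record(good)  # passes silently

bad = dict(good, SCORE=2000)
try:
    check_bed_record(bad)
except ValueError as err:
    print("malformed entry:", err)
```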
To read all BED entries from multiple files:
read_bed -i test1.bed,test2.bed
To read BED entries from multiple files using a glob expression:
read_bed -i '*.bed'
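The two ways of naming input files (a comma-separated list or a glob expression) could be resolved to a file list roughly as follows. This is a minimal Python sketch, not the Biopieces implementation; the helper name `expand_data_in` and the rule that only items containing wildcard characters are globbed are assumptions.

```python
import glob


def expand_data_in(arg):
    """Expand a comma-separated list of file names and/or glob expressions."""
    files = []
    for part in arg.split(","):
        part = part.strip()
        if not part:
            continue
        if any(ch in part for ch in "*?["):
            # Glob expression: keep matching files, sorted for stable order.
            files.extend(sorted(glob.glob(part)))
        else:
            # Plain file name: keep as given (existence is not checked here).
            files.append(part)
    return files


print(expand_data_in("test1.bed,test2.bed"))
print(expand_data_in("*.bed"))
```

Quoting the glob expression on the command line, as in the example above, keeps the shell from expanding it before the program sees it.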
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
read_bed is part of the Biopieces framework.