classify_taxonomy

Biopiece: classify_taxonomy

Description

Warning: classify_taxonomy is under active development and testing.

classify_taxonomy parses the taxonomy string from the Q_ID of each record in the stream. For each Q_ID, a taxonomy tree is created with one node per level (kingdom, phylum, class, etc.); each node holds the taxonomic name at that level as well as the count and the mean identity score. The -l switch trims the taxonomic trees so that only the lowest common ancestor is output. The -s switch adds the cluster size to the count; the cluster size is parsed from the Q_ID, which may be suffixed with _<cluster count>.
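The tree-building, -s and -l behaviour described above can be illustrated with a minimal Python sketch. It assumes that each hit arrives as a list of (level, name) pairs plus an identity score, that the mean score is weighted by cluster size when -s is used, and that the cluster size is a trailing _<digits> on the Q_ID. The names Node, add_hit, cluster_size and lowest_common_ancestor are hypothetical and not part of Biopieces, and the scores and Q_ID in the usage lines are illustrative only. The LCA step here simply descends while a node has exactly one child, which may differ in detail from what -l actually does.

```python
import re


class Node:
    """A taxonomy node: a name at one level plus a count and a summed score."""

    def __init__(self, level, name):
        self.level = level
        self.name = name
        self.count = 0
        self.score_sum = 0.0
        self.children = {}  # child name -> Node at the next level

    def mean_score(self):
        return self.score_sum / self.count if self.count else 0.0


def cluster_size(q_id):
    """Hypothetical: read a trailing '_<cluster count>' suffix; default to 1."""
    match = re.search(r"_(\d+)$", q_id)
    return int(match.group(1)) if match else 1


def add_hit(root, levels, score, q_id, use_size=False):
    """Add one hit, given as (level, name) tuples, to the tree under root.
    With use_size (cf. -s) the hit is weighted by the Q_ID's cluster size."""
    weight = cluster_size(q_id) if use_size else 1
    node = root
    for level, name in levels:
        node = node.children.setdefault(name, Node(level, name))
        node.count += weight
        node.score_sum += score * weight


def lowest_common_ancestor(root):
    """Descend while the path is unbranched (cf. -l); return the deepest node."""
    node = root
    while len(node.children) == 1:
        node = next(iter(node.children.values()))
    return node


# Illustrative input: two hits for one Q_ID that disagree at class level.
root = Node("root", "root")
add_hit(root, [("kingdom", "Archaea"), ("phylum", "Euryarchaeota"),
               ("class", "Methanococci")], 0.98, "read1_3", use_size=True)
add_hit(root, [("kingdom", "Archaea"), ("phylum", "Euryarchaeota"),
               ("class", "Thermococci")], 0.90, "read1_3", use_size=True)
lca = lowest_common_ancestor(root)
print(lca.level, lca.name, lca.count, round(lca.mean_score(), 2))
# phylum Euryarchaeota 6 0.94
```

Because the two hits split at class level, the deepest unbranched node is the phylum, and with -s style weighting each hit contributes a count of 3.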

classify_taxonomy only works on GreenGenes-formatted headers, where the sequence name contains a taxonomy string of the form:

k__Archaea; p__Euryarchaeota; c__Methanococci; o__Methanococcales [...]
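A taxonomy string of this form can be split into (level, name) pairs as in the Python sketch below. The prefix-to-level mapping follows the k__/p__/c__/o__ pattern shown above, extended to family, genus and species; skipping empty ranks such as "s__" is an assumption, and parse_taxonomy is a hypothetical helper, not the code used by classify_taxonomy.

```python
# Assumed mapping from GreenGenes rank prefixes to level names.
RANKS = {
    "k": "kingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_taxonomy(tax_string):
    """Split a string such as 'k__Archaea; p__Euryarchaeota' into a list of
    (level, name) tuples, skipping ranks that have no name (e.g. 'g__')."""
    levels = []
    for field in tax_string.split(";"):
        field = field.strip()
        if not field:
            continue
        prefix, sep, name = field.partition("__")
        if not sep or prefix not in RANKS:
            raise ValueError(f"not a GreenGenes rank field: {field!r}")
        if name:  # unassigned ranks are left empty, e.g. 's__'
            levels.append((RANKS[prefix], name))
    return levels


print(parse_taxonomy("k__Archaea; p__Euryarchaeota; c__Methanococci; o__Methanococcales"))
# [('kingdom', 'Archaea'), ('phylum', 'Euryarchaeota'), ('class', 'Methanococci'), ('order', 'Methanococcales')]
```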

The output records look like this:

REC_TYPE: Classification
LEVEL: phylum
NAME: SM2F11
COUNT: 3
SCORE: 0.65
---
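The record layout above (one KEY: VALUE line per field, ended by a --- line) can be written and read back with a few lines of Python. This is a minimal sketch that mirrors only the layout shown here; format_record and parse_records are hypothetical helpers, not the Biopieces library, and values are read back as plain strings.

```python
def format_record(record):
    """Serialize a dict as one 'KEY: VALUE' line per field plus a '---' line."""
    lines = [f"{key}: {value}" for key, value in record.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"


def parse_records(text):
    """Parse 'KEY: VALUE' blocks terminated by '---' lines into dicts."""
    records, current = [], {}
    for line in text.splitlines():
        if line == "---":
            records.append(current)
            current = {}
        elif line:
            key, _, value = line.partition(": ")
            current[key] = value
    return records


record = {"REC_TYPE": "Classification", "LEVEL": "phylum",
          "NAME": "SM2F11", "COUNT": 3, "SCORE": 0.65}
text = format_record(record)
print(text, end="")
```

Printing text reproduces the example record above line for line.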

Usage

... | classify_taxonomy [options]

Options

[-?         | --help]               # Print full usage description.
[-m <uint>  | --min_count=<uint>]   # Debranch nodes where count <= min_count.
[-l         | --LCA]                # Output lowest common ancestor.
[-s         | --size]               # Parse cluster size from Q_IDs.
[-I <file!> | --stream_in=<file!>]  # Read input from stream file     -  Default=STDIN
[-O <file>  | --stream_out=<file>]  # Write output to stream file     -  Default=STDOUT
[-v         | --verbose]            # Verbose output.
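The -m option debranches nodes whose count is at or below min_count. The Python sketch below shows one way to do this on a small tree of dicts, assuming that removing a node also removes its whole subtree; debranch and leaf are hypothetical helpers and the counts are illustrative.

```python
def debranch(node, min_count):
    """Drop every child subtree whose count is <= min_count (in place)."""
    node["children"] = [c for c in node["children"] if c["count"] > min_count]
    for child in node["children"]:
        debranch(child, min_count)
    return node


def leaf(name, count):
    return {"name": name, "count": count, "children": []}


tree = {"name": "Archaea", "count": 10,
        "children": [leaf("Euryarchaeota", 8), leaf("Crenarchaeota", 2)]}
debranch(tree, 2)
print([c["name"] for c in tree["children"]])  # ['Euryarchaeota']
```

Crenarchaeota is removed because its count (2) is not greater than min_count (2), matching the "count <= min_count" rule in the option description.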

Examples

Here is an example of a complete taxonomic pipeline:

read_sff -ci data.sff |
extract_seq -l 500 |
trim_seq -l 10 |
grab -e 'SEQ_LEN >= 50' |
denoise_seq -vi 1 -r 0.6 |
denoise_seq -vi 0.98 -c 2 |
findsim_seq -vSQd sequences_16S_all_gg_2011_1_unaligned.fasta.gz |
grab -e 'REC_TYPE eq findsim' |
classify_taxonomy -ls |
grab -e 'REC_TYPE eq Classification' |
write_tab -ck COUNT,SCORE,LEVEL,NAME -o result.tab -x

Author

[email protected]

October 2012

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

classify_taxonomy is part of the Biopieces framework.

http://www.biopieces.org
