Manatee version 1.3
Manatee is a tool for detection, quantification, and analysis of small ncRNAs
from next-generation sequencing data.
- perl
- Set::IntervalTree: perl package
- SAMtools: needs to be installed and added to your PATH
- Bowtie: executable file included in Manatee package, no installation required
Install the required dependencies, then run the main Manatee script as described in the usage section.
cpan
install Set::IntervalTree
The following components are included in the Manatee package:
bowtie-1.0.1 % directory with bowtie aligner
config % configuration file
Manatee % Perl core program for sRNA analysis
README.md % this file
`manatee -config <file> -i <file> -o <dir>`

| Argument | Description |
| --- | --- |
| `-config` | Path to configuration file. |
| `-i` | Path to pre-processed FASTQ or FASTA file. Valid formats: .fa, .fasta, .fastq, .fq, .fa.gz, .fasta.gz, .fastq.gz, .fq.gz. |
| `-o` | Path to directory where the output will be stored. |


`manatee [OPTIONS] -i <file> -o <dir> -index <ebwt> -genome <file> -annotation <file>`

| Argument | Description |
| --- | --- |
| `-i` | Path to pre-processed FASTQ or FASTA file. Valid formats: .fa, .fasta, .fastq, .fq, .fa.gz, .fasta.gz, .fastq.gz, .fq.gz. |
| `-o` | Path to directory where the output will be stored. |
| `-index` | Path and basename of the genome Bowtie index to be searched. The basename is the name of any of the index files up to but not including the final .1.ebwt/.rev.1.ebwt/etc. |
| `-genome` | Path to genome FA or FASTA file. |
| `-annotation` | Path to non-coding annotation file. The file should contain the following tab-separated fields: chromosome, strand, start position, end position, biotype, transcript id, transcript name. |

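
The annotation layout described above can be checked before a run. The following is a minimal Python sketch, not part of Manatee, assuming the file has no header row, that the seven fields appear in the listed order, and that strand is written as `+` or `-` (the strand encoding is an assumption). The function name `check_annotation` and the example rows are hypothetical.

```python
import csv
import io

# Field order as listed for the annotation file (assumed: no header row).
COLUMNS = ["chromosome", "strand", "start", "end",
           "biotype", "transcript_id", "transcript_name"]


def check_annotation(handle):
    """Return a list of (line_number, message) problems in a tab-separated annotation."""
    problems = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(COLUMNS):
            problems.append(
                (line_number, f"expected {len(COLUMNS)} fields, found {len(row)}"))
            continue
        record = dict(zip(COLUMNS, row))
        if record["strand"] not in ("+", "-"):  # assumed strand encoding
            problems.append((line_number, "strand must be + or -"))
        if not (record["start"].isdigit() and record["end"].isdigit()):
            problems.append((line_number, "start and end must be non-negative integers"))
        elif int(record["start"]) > int(record["end"]):
            problems.append((line_number, "start is greater than end"))
    return problems


# Made-up rows for illustration only.
example = io.StringIO(
    "chr1\t+\t100\t170\tmiRNA\ttx0001\texample-1\n"
    "chr1\t+\t300\t380\tmiRNA\n"
    "chr2\t-\t500\t400\tsnoRNA\ttx0003\texample-3\n"
)
for line_number, message in check_annotation(example):
    print(f"line {line_number}: {message}")
```

For a real file, pass an open handle instead of the `io.StringIO` example, for instance `check_annotation(open(path, newline=""))`.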

| Option | Description |
| --- | --- |
| | Path and basename of the transcriptome Bowtie index to be searched. The basename is the name of any of the index files up to but not including the final .1.ebwt/.rev.1.ebwt/etc. If left blank and no index exists, Manatee generates a transcriptome index from the provided non-coding annotation and stores it in the transcripts directory. |
| `-cores` | Number of alignment cores (default: -cores 1). |
| `-collapse` | Collapse reads with the same genomic sequence. This setting significantly reduces the execution time. Possible values: yes/no (default: -collapse yes). |
| | Maximum number of mismatches in genomic alignments (default: mismatches=1). |
| `-m` | Maximum number of multimapping loci, passed as -m to Bowtie. The mapping algorithm is applied only to reads with at most m multi-mapped loci. Reads whose multi-mapped loci exceed -m are aligned against the transcriptome (default: -m 50). |
| `-s` | Strand-specific mode of the algorithm (default: -s yes). |
| `-cd` | Minimum number of unannotated reads per cluster (default: -cd 5). |
| `-cdi` | Clusters of unannotated reads are merged if the distance between them is less than or equal to cdi (default: -cdi 50). |

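
To illustrate how the two cluster settings interact, here is a minimal Python sketch of distance-based merging followed by a read-count filter. It is not Manatee's implementation. It assumes that distance means the gap between the end of one cluster and the start of the next, that merging happens before the minimum-read filter, and that all clusters lie on the same chromosome and strand. The function name `merge_clusters` and the example values are hypothetical.

```python
def merge_clusters(clusters, max_distance=50, min_reads=5):
    """Merge nearby clusters, then drop those with too few reads.

    clusters: list of (start, end, read_count) tuples on one chromosome and strand.
    max_distance plays the role of -cdi, min_reads the role of -cd (assumed semantics).
    """
    merged = []
    for start, end, reads in sorted(clusters):
        if merged and start - merged[-1][1] <= max_distance:
            # Close enough to the previous cluster: extend it and add the reads.
            last_start, last_end, last_reads = merged[-1]
            merged[-1] = (last_start, max(last_end, end), last_reads + reads)
        else:
            merged.append((start, end, reads))
    return [cluster for cluster in merged if cluster[2] >= min_reads]


# Made-up clusters: the first two are 30 bases apart and merge into one
# cluster with 7 reads; the third stays separate and is dropped (2 < 5).
print(merge_clusters([(100, 120, 3), (150, 170, 4), (400, 420, 2)]))
# prints [(100, 170, 7)]
```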
A successful run produces the following three output files in the output directory:
- `<inputName>_Manatee_counts.tsv`
- `<inputName>_Manatee_clusters.tsv`
- `<inputName>_Manatee_isomirs.tsv`

Depending on the input, `<inputName>_Manatee_clusters.tsv` might not be generated.
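
A quick way to confirm that a run finished is to look for these files. The Python sketch below is a hypothetical helper, not part of Manatee. The caller supplies the input name explicitly, because how Manatee derives `<inputName>` from the input path is not stated here. The clusters file is treated as optional, as noted above.

```python
from pathlib import Path

# File kinds taken from the output names listed above.
KINDS = ("counts", "clusters", "isomirs")


def expected_outputs(output_dir, input_name):
    """Map each output kind to the path it is expected to have."""
    return {kind: Path(output_dir) / f"{input_name}_Manatee_{kind}.tsv"
            for kind in KINDS}


def missing_outputs(output_dir, input_name):
    """Return names of required output files that do not exist.

    The clusters file is skipped because it might not be generated.
    """
    return [path.name
            for kind, path in expected_outputs(output_dir, input_name).items()
            if kind != "clusters" and not path.exists()]
```

For example, `missing_outputs("results", "sample")` returns an empty list when both `sample_Manatee_counts.tsv` and `sample_Manatee_isomirs.tsv` exist in `results` (both names are placeholders).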
- Input data should be trimmed for adapters and barcodes before running Manatee. Reads that are too short or of low sequencing quality should also be discarded from the input.
- An example annotation file in GTF format compatible with Manatee is included in the 'annotation' branch.
- Genome and transcriptome Bowtie index files should be built using Bowtie 1. Bowtie 1 is included in the Manatee package.
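
As a sketch of the index handling mentioned in these notes, the Python helpers below derive an index basename from an index file name and assemble a call to Bowtie 1's indexer, whose basic form is `bowtie-build <reference_in> <ebwt_base>`. The helper names are hypothetical, and the location of the bundled `bowtie-build` executable is an assumption, so it is passed in as a parameter.

```python
import re
import subprocess

# Bowtie 1 index files end in .1.ebwt ... .4.ebwt, .rev.1.ebwt, .rev.2.ebwt
# (or .ebwtl for large indexes).
EBWT_SUFFIX = re.compile(r"(\.rev)?\.\d\.ebwtl?$")


def index_basename(index_file):
    """Strip the trailing .1.ebwt / .rev.1.ebwt style suffix from an index file path."""
    return EBWT_SUFFIX.sub("", index_file)


def bowtie_build_command(fasta, basename, bowtie_build="bowtie-build"):
    """Argument list for: bowtie-build <reference_in> <ebwt_base>."""
    return [bowtie_build, fasta, basename]


def build_index(fasta, basename, bowtie_build="bowtie-build"):
    """Run the indexer; raises CalledProcessError if it exits with an error."""
    subprocess.run(bowtie_build_command(fasta, basename, bowtie_build), check=True)
```

With these helpers, `index_basename("index/genome.rev.1.ebwt")` gives `index/genome`, which is the form a Bowtie index basename takes. The file names here are placeholders.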
The "ELIXIR-GR: Managing and Analysing Life Sciences Data (MIS: 5002780)" project is co-financed by Greece and the European Union - European Regional Development Fund.