Manatee: detection and quantification of small non-coding RNAs from next-generation sequencing data

jehandzlik/Manatee


Manatee

Manatee version 1.3

What is Manatee?

Manatee is a tool for detection, quantification, and analysis of small ncRNAs 
from next-generation sequencing data.

DEPENDENCIES

  1. perl
  2. Set::IntervalTree: perl package
  3. SAMtools: needs to be installed and added to your PATH
  4. Bowtie: executable file included in Manatee package, no installation required

INSTALLATION (Unix/Linux)

Install the required dependencies, then run the Manatee main script as described in the usage sections below.

Set::IntervalTree

cpan

install Set::IntervalTree
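
Before a first run it can help to confirm that the dependencies listed above are reachable. The following is a minimal sketch, not part of the Manatee package; the helper names `have_cmd`, `have_perl_module`, and `check_manatee_deps` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Manatee) for checking the listed dependencies.

# Succeeds if the named executable is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Prints one status line per dependency; missing ones are reported, not fatal.
check_manatee_deps() {
    for exe in perl samtools; do
        if have_cmd "$exe"; then echo "ok: $exe"; else echo "missing: $exe"; fi
    done
    if have_perl_module Set::IntervalTree; then
        echo "ok: Set::IntervalTree"
    else
        echo "missing: Set::IntervalTree"
    fi
}

check_manatee_deps
```

Bowtie is not checked here because the README states that its executable ships with the package.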

PACKAGE FILES

The following components are included in the Manatee package.

bowtie-1.0.1       % directory with bowtie aligner

config             % configuration file

Manatee            % Perl core program for sRNA analysis

README.md          % this file

USAGE with configuration file

Syntax:

manatee -config <file> -i <file> -o <dir>

-config <file>

Path to configuration file.

-i  <file>

Path to pre-processed FASTQ or FASTA file. Valid formats: .fa, .fasta, .fastq, .fq, .fa.gz, .fasta.gz, .fastq.gz, .fq.gz.

-o <dir>

Path to directory where the output will be stored.
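
As an illustration of the configuration-file mode, the sketch below checks the input file name against the extensions listed for `-i` and then prints the resulting command as a dry run. The helper `is_valid_input` and all paths are hypothetical, and the check is case-sensitive, which is an assumption about how Manatee treats extensions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file name ends in one of the
# extensions listed for -i.
is_valid_input() {
    case "$1" in
        *.fa|*.fasta|*.fastq|*.fq|*.fa.gz|*.fasta.gz|*.fastq.gz|*.fq.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Placeholder paths, for illustration only.
config="config"
input="sample_trimmed.fastq.gz"
outdir="manatee_out"

if is_valid_input "$input"; then
    # Dry run: print the command instead of executing it.
    echo "manatee -config $config -i $input -o $outdir"
else
    echo "unsupported input format: $input" >&2
fi
```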

USAGE with input parameters

Syntax:

manatee [OPTIONS] -i <file> -o <dir> -index <ebwt> -genome <file> -annotation <file>

-i <file>

Path to pre-processed FASTQ or FASTA file. Valid formats: .fa, .fasta, .fastq, .fq, .fa.gz, .fasta.gz, .fastq.gz, .fq.gz.

-o <dir>

Path to directory where the output will be stored.

-index <ebwt>

Path and basename of the genome Bowtie index to be searched. The basename is the name of any of the index files up to but not including the final .1.ebwt/.rev.1.ebwt/etc.

-genome <file>

Path to genome FA or FASTA file.

-annotation <file>

Path to the non-coding annotation file. The file should contain the following tab-separated fields: chromosome, strand, start locus, end locus, biotype, transcript ID, transcript name.
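
A quick sanity check of the annotation file can catch formatting problems before a run. The sketch below is a hypothetical helper, not part of Manatee: it verifies that every line has exactly seven tab-separated fields and that the start and end columns (3 and 4) are integers. It assumes the file has no header, comment, or blank lines, which the README does not specify.

```shell
#!/bin/sh
# Hypothetical helper: reports lines that do not have exactly 7 tab-separated
# fields, or whose start/end (columns 3 and 4) are not non-negative integers.
# Returns non-zero if any such line is found.
check_annotation() {
    awk -F '\t' '
        NF != 7 {
            printf "line %d: expected 7 fields, found %d\n", NR, NF
            bad = 1
            next
        }
        $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ {
            printf "line %d: start/end must be integers\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example call (placeholder file name):
# check_annotation ncRNA_annotation.tsv
```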

OPTIONS

-t_index <ebwt>

Path and basename of the transcriptome Bowtie index to be searched. The basename is the name of any of the index files up to but not including the final .1.ebwt/.rev.1.ebwt/etc. If this option is left blank and no index exists, Manatee generates a transcriptome index from the provided non-coding annotation and stores it in the transcripts directory.

-cores <int>

Number of alignment cores (default: -cores 1).

-collapse <yes/no>

Collapse reads with the same genomic sequence. This setting significantly reduces execution time. Possible values: yes/no (default: -collapse yes).

-mismatches <int> 

Maximum number of mismatches in genomic alignments (default: -mismatches 1).

-m <int>

Maximum number of multimapping loci (the -m setting in the Bowtie execution). The mapping algorithm is applied only to reads that map to at most m loci. Reads that map to more than m loci are aligned against the transcriptome (default: -m 50).

-s <yes/no>

Strand-specific mode of the algorithm (default: -s yes).

-cd <int>

Minimum number of unannotated read abundances per cluster (default: -cd 5).

-cdi <int>

Clusters of unannotated reads are merged if the distance between them is less than or equal to cdi (default: -cdi 50).
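
To show how the required parameters and the options above combine on one command line, here is a minimal dry-run sketch. The function `build_cmd` and all paths are hypothetical. Every option is set to its documented default except `-cores`, and the options are placed before the required parameters, following the syntax line.

```shell
#!/bin/sh
# Hypothetical helper: prints a Manatee command line in parameter mode.
# $1 = input file, $2 = output dir, $3 = genome index basename,
# $4 = genome FASTA, $5 = annotation file
build_cmd() {
    echo manatee -cores 4 -collapse yes -mismatches 1 -m 50 -s yes \
        -cd 5 -cdi 50 \
        -i "$1" -o "$2" -index "$3" -genome "$4" -annotation "$5"
}

# Placeholder paths, for illustration only. The index is given as a
# basename, without the .1.ebwt suffix.
build_cmd sample_trimmed.fastq.gz manatee_out indexes/genome \
    genome.fa ncRNA_annotation.tsv
```

Printing the command first makes it easy to review the settings before running it on a large data set.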

OUTPUT

A successful run will produce the following three output files in the output directory:

<inputName>_Manatee_counts.tsv

<inputName>_Manatee_clusters.tsv

<inputName>_Manatee_isomirs.tsv

Depending on the input, <inputName>_Manatee_clusters.tsv might not be generated.
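
After a run, the output directory can be checked for the files named above. The helper below is a hypothetical sketch. How `<inputName>` is derived from the input path is not stated in this README, so the name is passed explicitly.

```shell
#!/bin/sh
# Hypothetical helper: for a given output directory and <inputName>, reports
# which of the three documented output files are present.
report_outputs() {
    outdir=$1
    name=$2
    for suffix in counts clusters isomirs; do
        f="$outdir/${name}_Manatee_${suffix}.tsv"
        if [ -f "$f" ]; then echo "found: $f"; else echo "absent: $f"; fi
    done
}

# Example call (placeholder names):
# report_outputs manatee_out sample_trimmed
```

An absent clusters file is not necessarily an error, since that file may not be generated for every input.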

ADDITIONAL COMMENTS

  • Input data should be trimmed for adapters and barcodes before running Manatee. Reads that are too short or have low sequencing quality should also be discarded from the input.
  • An example annotation file in GTF format compatible with Manatee is included in the 'annotation' branch.
  • Genome and transcriptome Bowtie index files should be built using Bowtie 1. Bowtie 1 is included in the Manatee package.
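
A genome index can be built with Bowtie 1's `bowtie-build`, which takes a reference FASTA file and an index basename. The wrapper below is a hypothetical sketch. The location of `bowtie-build` inside the bundled `bowtie-1.0.1` directory is an assumption about the package layout, so the path is passed as an argument.

```shell
#!/bin/sh
# Hypothetical wrapper: builds a Bowtie 1 index from a FASTA file.
# $1 = path to bowtie-build, $2 = reference FASTA, $3 = index basename
build_bowtie1_index() {
    "$1" "$2" "$3"
}

# Example call (placeholder paths; the bowtie-build location is an assumption):
# build_bowtie1_index bowtie-1.0.1/bowtie-build genome.fa indexes/genome
```

The basename given as the third argument is the value to pass to `-index`.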

FUNDING

The project "ELIXIR-GR: Managing and Analysing Life Sciences Data (MIS: 5002780)" is co-financed by Greece and the European Union - European Regional Development Fund.
