SAINT


Compared to existing alignment-free sequence comparison methods, SAINT offers the following advantages:

SAINT is computationally faster and more memory-efficient because sequence data are processed in a compressed embedding space, which is much faster to search and more compact to store.
SAINT is a weakly supervised learning method: the embedding function is learned automatically from easily acquired data. Unlike existing deep-learning-based alignment-free methods, SAINT does not require laborious collection of accurate alignment distances for training.
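
Weak supervision means the training signal comes from labels that are easy to obtain rather than from computed alignment distances. The script name triplet_model.py suggests a triplet-based objective; the sketch below is a hypothetical illustration (not SAINT's code) of how (anchor, positive, negative) triplets could be sampled from taxonomy labels alone. The helper name sample_triplets and the toy genus labels are assumptions.

```python
import random
from collections import defaultdict


def sample_triplets(labels, n_triplets, seed=0):
    """Sample (anchor, positive, negative) index triplets from labels.

    Hypothetical helper, not part of SAINT: positives share the anchor's
    label, negatives do not. Only labels are used, no alignment distances.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    # An anchor needs at least one other sample carrying the same label.
    anchors = [i for i, lab in enumerate(labels) if len(by_label[lab]) > 1]
    if not anchors or len(by_label) < 2:
        raise ValueError("need >= 2 labels and one label with >= 2 samples")
    triplets = []
    for _ in range(n_triplets):
        a = rng.choice(anchors)
        lab = labels[a]
        p = rng.choice([i for i in by_label[lab] if i != a])
        neg_label = rng.choice([other for other in by_label if other != lab])
        n = rng.choice(by_label[neg_label])
        triplets.append((a, p, n))
    return triplets


# Toy labels for illustration only.
labels = ["Escherichia", "Escherichia", "Bacillus", "Bacillus", "Vibrio"]
for a, p, n in sample_triplets(labels, 3):
    print(labels[a], labels[p], labels[n])
```

Each triplet only encodes "a and p share a label, n does not", which is all a triplet loss needs.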

Version Release Notes

Version 1.0

This is the first version of the SAINT pipeline.
A demo of running SAINT is given below.

Package installation and configuration

Pre-install running environment

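Unix or Linux operating system.
A CPU is sufficient for the computation; no GPU is needed.
Python 3.
The scripts use the standard-library modules sys, optparse, os, random and collections, and the third-party packages numpy, pandas, keras and scikit-learn (sklearn), which must be installed.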
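
A quick way to confirm the environment before running the scripts is to check that every listed module can be found. This is a minimal sketch; the helper name missing_modules is hypothetical and not part of SAINT.

```python
import importlib.util
import sys

# Modules named in this README. sys, optparse, os, random and collections
# ship with Python; numpy, pandas, keras and sklearn are third-party.
REQUIRED = ["sys", "optparse", "os", "random", "collections",
            "numpy", "pandas", "keras", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3,):
        sys.exit("SAINT requires Python 3")
    missing = missing_modules(REQUIRED)
    print("missing:", missing if missing else "none")
```

Note that the sklearn module is installed from PyPI under the package name scikit-learn.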

Detailed steps

Download the source code to a directory of your choice, e.g. '/home/user/SAINT'.
Change into that directory:
```
  $ cd /home/user/SAINT 
```
Extract the zip file:
```
  $ unzip ./resource/kmer.zip
```
If multiple Python versions are installed on your system, make sure that the python command used below runs Python 3.

The demo of SAINT

The dataset was downloaded from NCBI. For the 232 bacterial genomes, SAINT uses the KMC tool to convert the FASTA files into k-mer frequency files.
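
KMC is a compiled command-line tool and is what SAINT actually uses. Purely to illustrate what a k-mer frequency profile is, here is a simplified pure-Python sketch that counts overlapping k-mers in one sequence and normalizes the counts. It is a stand-in under stated assumptions: it does not reproduce KMC's output format, counts only the forward strand, and skips k-mers containing ambiguous bases; the function name kmer_frequencies is hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Return relative frequencies for all 4**k DNA k-mers (hypothetical helper).

    Simplified illustration, not KMC: forward strand only, and k-mers
    containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # A fixed ordering over the full k-mer alphabet gives equal-length profiles.
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product("ACGT", repeat=k)
    }


freqs = kmer_frequencies("ACGTACGTNNACG", 2)
print(freqs["AC"], freqs["CG"])
```

With k = 6 (the value used in the training command of this README), a full profile has 4^6 = 4096 entries.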

Run SAINT

Usage of SAINT

The main scripts are triplet_model.py and taxonomy_localization.py, which accept the following options:

-h, --help: show this help message and exit

-i, --inputcsv: CSV file with the taxonomy of the input data.

-d, --kmer_frequency_dir: directory containing the k-mer frequency files.

-t, --test_name: file listing the names of the test samples.

-k, --kofKTuple: the k-mer length k (the k of the k-tuple).

-e, --epochNum: number of training epochs.

-o, --output: output directory.
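
The option list maps naturally onto Python's optparse module, which this README lists as a dependency. The following hypothetical sketch shows how such options could be declared and parsed; the int types and dest names are assumptions, not taken from SAINT's source.

```python
from optparse import OptionParser


def build_parser():
    """Hypothetical parser exposing the documented options (not SAINT's code)."""
    parser = OptionParser()  # optparse adds -h/--help automatically
    parser.add_option("-i", "--inputcsv", dest="inputcsv",
                      help="CSV file with the taxonomy of the input data")
    parser.add_option("-d", "--kmer_frequency_dir", dest="kmer_frequency_dir",
                      help="directory containing the k-mer frequency files")
    parser.add_option("-t", "--test_name", dest="test_name",
                      help="file listing the names of the test samples")
    # Assumed integer options.
    parser.add_option("-k", "--kofKTuple", dest="kofKTuple", type="int",
                      help="k-mer length k")
    parser.add_option("-e", "--epochNum", dest="epochNum", type="int",
                      help="number of training epochs")
    parser.add_option("-o", "--output", dest="output",
                      help="output directory")
    return parser


opts, args = build_parser().parse_args(
    ["-i", "resource/data.csv", "-d", "resource/kmer/",
     "-k", "6", "-e", "30", "-o", "output/"])
print(opts.kofKTuple, opts.epochNum, opts.output)
```

Options that are not given (here -t) are left as None, so a real script would need to validate required inputs itself.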

Train the SAINT model

Create a folder for the model files:
```
  $ mkdir output
```
Run triplet_model.py:
```
  $ python code/triplet_model.py -i resource/data.csv -d resource/kmer/ -t resource/test_name.txt -k 6 -e 30 -o output/
```
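
The training script's name suggests it optimizes a triplet objective. A common form is the triplet margin loss, which pulls an anchor toward a positive (same taxon) and pushes it away from a negative (different taxon) by at least a margin. The NumPy sketch below computes that loss for a batch of embeddings; squared Euclidean distance and margin = 1.0 are assumptions, and SAINT's actual Keras model may differ.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet margin loss over a batch (illustrative sketch).

    loss = mean(max(0, ||a - p||^2 - ||a - n||^2 + margin))
    Squared Euclidean distance and margin=1.0 are assumptions.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Toy 2-D embeddings: the first triplet is already well separated,
# the second violates the margin.
a = np.array([[0.0, 0.0], [1.0, 1.0]])
p = np.array([[0.1, 0.0], [1.0, 0.9]])
n = np.array([[2.0, 0.0], [1.0, 1.2]])
print(triplet_loss(a, p, n))
```

Triplets that already satisfy the margin contribute zero, so training effort concentrates on the hard cases.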

Predict the taxonomy of unknown species and evaluate the performance of SAINT

Run taxonomy_localization.py:
```
  $ python code/taxonomy_localization.py -i resource/data.csv -d resource/kmer/ -t resource/test_name.txt  -o output/
```
The outputs are written to ./output/predict_taxonomy.txt and ./output/Accuracy.txt.
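
This README does not describe how taxonomy_localization.py assigns taxonomy. As an illustration of why retrieval in a compact embedding space is cheap, here is a hypothetical sketch that swaps in a simple 1-nearest-neighbour rule: each query embedding takes the label of its closest reference embedding, and accuracy is the fraction of correct labels. The function names and toy data are assumptions.

```python
import numpy as np


def predict_by_nearest_neighbor(ref_emb, ref_labels, query_emb):
    """Hypothetical stand-in for taxonomy localization (1-nearest neighbour).

    Each query receives the label of its closest reference embedding
    under Euclidean distance.
    """
    # Pairwise squared distances via broadcasting, shape (n_query, n_ref).
    d = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=2)
    return [ref_labels[j] for j in d.argmin(axis=1)]


def accuracy(predicted, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy embeddings and labels for illustration only.
ref = np.array([[0.0, 0.0], [5.0, 5.0]])
ref_labels = ["Bacillus", "Vibrio"]
queries = np.array([[0.5, -0.2], [4.8, 5.1], [0.2, 0.1]])
pred = predict_by_nearest_neighbor(ref, ref_labels, queries)
print(pred, accuracy(pred, ["Bacillus", "Vibrio", "Vibrio"]))
```

For large reference sets, an index structure (for example a k-d tree) would avoid computing the full distance matrix.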

GitHub repository: Ying-Lab/SAINT