Comparing genomic data is critical for estimating evolutionary relationships, understanding microbiome dynamics, and exploring the developmental process of embryogenesis.
Both alignment-based and alignment-free methods play an important role in genomic comparison. However, most existing measures rely on a fixed dissimilarity function or model, which may not suit every scenario.
Therefore, in this study we propose MELT, a metric learning model with a triplet network that learns a suitable dissimilarity measure for hierarchical or longitudinal genomic and microbiome data. The construction of the training triplet dataset reflects the dissimilarity characteristics of the application scenario, so MELT offers a data-driven, scenario-oriented framework for adaptive metric comparison.
- MELT learns useful representations of genomic data from reference relationships of the form "data A is closer to data B than to data C", which reflect the dissimilarity characteristics of the application scenario.
- Instead of exact alignment distances, MELT requires only dissimilarity comparisons among A, B, and C for training; the embedding function is learned automatically from these comparisons (see the sketch below). MELT is therefore particularly suitable for scenarios without clear categorical information, such as hierarchical or longitudinal datasets.
Experiments on comparing genomic sequences, temporal microbiome samples, and gene expression profiles from scRNA-seq demonstrate MELT's strong performance across these three application scenarios.
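The core idea can be illustrated in a few lines of PyTorch. This is only a minimal sketch of triplet-based metric learning, not MELT's actual architecture: the input dimension (4,096 features for 6-mer frequencies), layer sizes, and margin below are illustrative assumptions.

```python
# Minimal triplet-network sketch (illustrative; not MELT's actual architecture).
# Anchor A, a "closer" sample B, and a "farther" sample C share one embedding
# network; the triplet loss pushes d(A, B) below d(A, C) by a margin.
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Maps a k-mer frequency vector to a low-dimensional embedding."""
    def __init__(self, in_dim=4096, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x):
        return self.net(x)

model = EmbeddingNet()
loss_fn = nn.TripletMarginLoss(margin=1.0)

# Dummy batch: 8 triplets of 4**6 = 4096-dimensional 6-mer frequency vectors.
a, b, c = (torch.rand(8, 4096) for _ in range(3))
loss = loss_fn(model(a), model(b), model(c))  # encourages d(a,b) + margin < d(a,c)
loss.backward()
print(loss.item())
```

In MELT, the triplets (A, B, C) are constructed so that their relative dissimilarities reflect the hierarchy or time ordering of the data, which is what makes the learned metric scenario-specific.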
Conda-pack is a command-line tool for creating archives of conda environments that can be installed on other systems and locations.
Here we use conda-pack to package our environments and provide them on the release page, so you can download them and run our code without pre-installing Anaconda or Python.
- Make sure your operating system is Linux.
- Run the following commands in a Linux terminal.
- Detailed steps:
- Download the source code to your directory.
$git clone https://github.com/Ying-Lab/MELT.git
$cd MELT/
- Download the environment .tar.gz files. This will take some time depending on your internet connection.
$wget https://github.com/Ying-Lab/MELT/releases/download/v1.0/MELT-pytorch-env.tar.gz
$wget https://github.com/Ying-Lab/MELT/releases/download/v1.0/MELT-tensorflow-env.tar.gz
- Extract each environment .tar.gz file into its own directory.
$mkdir MELT-pytorch-env
$tar -zxvf MELT-pytorch-env.tar.gz -C MELT-pytorch-env
$mkdir MELT-tensorflow-env
$tar -zxvf MELT-tensorflow-env.tar.gz -C MELT-tensorflow-env
- You have now successfully installed the environments. Use one of the following commands to activate an environment.
- activate pytorch env
$source MELT-pytorch-env/bin/activate
- activate tensorflow env
$source MELT-tensorflow-env/bin/activate
If the conda-pack archives are difficult to download, you can use this method to install the environments instead.
- Run the following commands in a Linux terminal.
- Detailed steps:
- Install Miniconda to manage your environment (you can skip this step if you already have Conda installed).
$wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$bash Miniconda3-latest-Linux-x86_64.sh -b
- Restart your terminal so that conda is on your PATH.
- Download the source code to your directory.
$git clone https://github.com/Ying-Lab/MELT.git
- Create the environments from the .yml files; Conda will download the packages. The new environments are named 'MELT-pytorch-env' and 'MELT-tensorflow-env'.
$cd MELT/
$conda env create -f MELT-pytorch-env.yml
$conda env create -f MELT-tensorflow-env.yml
- Activate environment
- activate pytorch env
$conda activate MELT-pytorch-env
- activate tensorflow env
$conda activate MELT-tensorflow-env
Now you've successfully installed the environments.
- We provide a trained model for each of the three experiments (a loading sketch follows this list):
- Experiment1: MELT/demo_experiment1/resource/trained_model.h5
- Experiment2: MELT/demo_experiment2/trained_model.pth
- Experiment3: MELT/demo_experiment3/trained_model.pth
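If you want to inspect these checkpoints outside the demo scripts, the sketch below shows one way to load them. The save formats are assumptions (a full Keras model for the .h5; a full module or a state_dict for the .pth), so treat the demo scripts as the authoritative loading code.

```python
# Hedged checkpoint-loading sketch; each branch only needs its own environment.
try:
    import tensorflow as tf  # available in MELT-tensorflow-env
    # compile=False skips restoring the custom training loss at load time.
    m1 = tf.keras.models.load_model(
        "demo_experiment1/resource/trained_model.h5", compile=False)
    m1.summary()
except ImportError:
    pass

try:
    import torch  # available in MELT-pytorch-env
    # torch.load may return a full module or a state_dict, depending on how
    # the checkpoint was saved; inspect the result before using it.
    ckpt = torch.load("demo_experiment2/trained_model.pth", map_location="cpu")
    print(type(ckpt))
except ImportError:
    pass
```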
Three small datasets are used for demonstration purposes.
Please activate tensorflow-env first
$source MELT-tensorflow-env/bin/activate
or
$conda activate MELT-tensorflow-env
- Usage of MELT
The main command-line options are as follows:
-h, --help: show help information
-i, --inputcsv: the taxonomy of the input data
-d, --kmer_frequency_dir: the directory of k-mer frequency files (see the sketch after this list)
-t, --test_name: the file listing the test names
-k, --kofKTuple: the k-mer length (the value of k)
-e, --epochNum: the number of epochs
-o, --output: the output directory
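For orientation, the sketch below shows what a k-mer frequency representation encodes. It is illustrative only: the demo ships precomputed k-mer files in resource/kmer/, and the repository's exact file format may differ.

```python
# Illustrative k-mer frequency computation (the demo already provides
# precomputed k-mer files; this only shows what such a vector encodes).
from itertools import product

def kmer_frequencies(seq, k=6):
    """Return the normalized frequency of every DNA k-mer in seq."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [c / total for c in counts.values()] if total else list(counts.values())

vec = kmer_frequencies("ACGTACGTACGTTTGACATG" * 10, k=6)
print(len(vec))  # 4096 features for k = 6
```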
- Run MELT to train the model.
Extract the zip file:
$unzip demo_experiment1/resource/kmer.zip -d demo_experiment1/resource/
Create a new folder for the model output:
$mkdir demo_experiment1/output
Enter the demo_experiment1 directory:
$cd demo_experiment1/
Run triplet_model.py:
$python code/triplet_model.py -i resource/data.csv -d resource/kmer/ -t resource/test_name.txt -k 6 -e 30 -o output/
- Predict the taxonomy of unknown species.
Run taxonomy_localization.py:
$python code/taxonomy_localization.py -i resource/data.csv -d resource/kmer/ -t resource/test_name.txt -o output/
The output is written to ./output/predict_taxonomy.txt.
We also provide a trained model in resource/trained_model.h5.
- Go back to the MELT/ directory
$cd ..
Please activate pytorch-env first
$source MELT-pytorch-env/bin/activate
or
$conda activate MELT-pytorch-env
- Enter the demo_experiment2 directory
$cd demo_experiment2/
- Run MELT to train the model. By default it saves the model from every epoch during training to the ./model/ directory.
$python ./train.py
- Predict embeddings for the test data. Here we provide a trained model in ./trained_model.pth. This will produce two CSV files in the current directory (see the sketch after the command):
- train_embedding.csv
- test_embedding.csv
$python ./predict.py
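Once the CSV files are written, the learned dissimilarity between two samples is simply the distance between their embeddings. The sketch below is a hedged example of a nearest-neighbor query over the exported files; the column layout (one row per sample, numeric embedding columns) is an assumption, so check the CSV headers first. The same pattern applies to the experiment 3 outputs.

```python
# Hedged sketch: nearest-neighbor queries over the exported embeddings.
import numpy as np
import pandas as pd

train = pd.read_csv("train_embedding.csv")
test = pd.read_csv("test_embedding.csv")

# Keep only the numeric embedding columns (assumed layout).
tr = train.select_dtypes("number").to_numpy()
te = test.select_dtypes("number").to_numpy()

# Euclidean distance in the embedding space is the learned dissimilarity.
d = np.linalg.norm(te[:, None, :] - tr[None, :, :], axis=-1)  # (n_test, n_train)
print(d.argmin(axis=1))  # index of the nearest training sample per test sample
```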
- Go back to the MELT/ directory
$cd ..
Please activate pytorch-env first
$source MELT-pytorch-env/bin/activate
or
$conda activate MELT-pytorch-env
- Enter the demo_experiment3 directory
$cd ./demo_experiment3
- Run MELT to train the model. By default it saves the model from every epoch during training to the ./model/ directory.
$python ./train.py
- Predict embeddings for the test data. Here we provide a trained model in ./trained_model.pth. This will produce two CSV files in the current directory:
- all_data_embedding.csv
- test_data_embedding.csv
$python ./predict.py