HiCLSTM

Hi-C-LSTM is a framework to build Hi-C intrachromosomal representations which are useful for element identification and in-silico alterations.

Hi-C-LSTM Model

Note:

This might not be SOTA for Hi-C representation learning anymore. Refer to Akita, Orca, and Origami. Researchers interested in a preliminary hierachical version of Hi-C-LSTM can check HirarChy and those looking to apply contastive learning to Hi-C data can get started with HiCClip.

Requirements

The following software was installed on Ubuntu 16.04

Python 3.7.10
CUDA 10.1 with libcudnn.so.7
torch 1.8.0
captum 0.3.1
numpy 1.21.0
pandas 1.2.4
scipy 1.7.0
matplotlib 3.4.2
tensorboard 2.5.0
seaborn 0.11.1

Install the above dependencies using installers like pip. The typical install time is about 1 hour. No non-standard hardware is required.

To prepare data:

get HiC data: Download Juicer Tools and run ./extract_chromosomes.sh. Specify right path for juicer tools jar.
run compute_genome_length.py to create file with rounded, cumulative chromosome lengths

Demo

Use the hic_chr22.txt file as input for demo
Partition the file based on training and testing needs
Model parameters, hyperarameters, and output directories can be changed in ./code/config.py.

Train model:

./code/train_model.py

Specify the model_name of your choice
In the DataLoader, under the get_data_loader(cfg, cell) function, specify the chromosomes to be used
For the demo case, use chromosome 22. Change the directory of input Hi-C data to .data2/
Expected output is a trained model called model_name. Expected training time is less than 8 minutes per epoch on GeForce GTX 1080 Ti GPU.

Test model:

./code/test_model.py

Use the trained model model_name to test on the remainder of the chromosome 22 data
Expect the MSE of the model as the output along with predictions and representations. Expected testing time is less than 5 minutes.

How to use Representations

Extract the representations of size representation_size from the prediction file
Align them with the genome at 10Kbp resolution
Use for downstream tasks of preference like classification of genomic phenomena and in-silico mutagenesis.

Name		Name	Last commit message	Last commit date
Latest commit History 1,018 Commits
data2		data2
saved_models		saved_models
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HiCLSTM

Hi-C-LSTM Model

Note:

Requirements

To prepare data:

Demo

Train model:

Test model:

How to use Representations

About

Releases

Packages

Languages

License

kevinbdsouza/Hi-C-LSTM

Folders and files

Latest commit

History

Repository files navigation

HiCLSTM

Hi-C-LSTM Model

Note:

Requirements

To prepare data:

Demo

Train model:

Test model:

How to use Representations

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages