ChromMovie: A Molecular Dynamics Approach for Simultaneous Modeling of Chromatin Conformation Changes from Multiple Single-Cell Hi-C Maps.
ChromMovie is an Openmm based molecular dynamics simulation model for modeling 3D chromatin structure evolution up to the chromosome level and at a single cell resolution. Employs hierarchical modeling scheme and can be used with either CPU or GPU acceleration.
ChromMovie software was primarily tested on Unix-based systems and with Python version 3.10.0. Before running ChromMovie please install the required packages listed in requirements.txt
.
pip install -r requirements.txt
We recommend running ChromMovie on a GPU for optimal performance. To do so, please ensure that CUDA is properly configured on your system. To utilize the GPU, set the platform
parameter to either CUDA
or OpenCL
. If a GPU is not available, use CPU
(see "Simulation Arguments" section).
The input to ChromMovie consists of the ordered list of single cell interaction data. The data should reflect the cellular process of interest (cell cycle, cell maturation, etc.). The data are expected to be placed in a separate folder input
. The files are expected to be in alphabetical order (see examples
). Each file should contain the single cell contacts in .csv
format with or without the header. First few rows of an example file may look like this:
chrom1,start1,end1,chrom2,start2,end2
chr3,128668059,128668108,chr3,159257092,159257218
chr3,50107348,50107498,chr3,51161425,51161468
chr3,53660133,53660244,chr3,54171138,54171157
If you are running ChromMovie for the first time, we encourage you to try some of our examples in examples
folder.
The simulation parameters used by ChromMovie are stored in YAML file. Example specification is provided in config.yaml
file. The path to the YAML configuration file is the main input parameter of ChromMovie script.
After preparing the data and YAML configuration file, ChromMovie algorithm can be run for example with the following command:
python3 -m ChromMovie -i config.yaml
Simulation results will be saved in the folder output
. A typical output of ChromMovie simulation consists of the following data:
config.yaml
- Configuration file used for this particular simulation. This is a copy of the input configuration file.energy.csv
- File containing diagnostic information about Energy and Temperature of the simulation.simulation_reportXXX.pdf
-.pdf
file with diagnostic information and figures created for each resolution XXX used in the simulation. Only generated ifpdf_report==True
.struct_00_init.cif
-.cif
file containing merged information about the initial structure for each simulation frame. Typically initial structure is a self-avoiding random walk.struct_XX_resYYY_init.cif
-.cif
file containing merged information about the initial structure at each resolution of the simulation (after initial energy minimization).struct_XX_resYYY_ready.cif
-.cif
file containing merged information about the final structure at each resolution of the simulation.frames_cif
- Folder containing all of the structures in.cif
format for different steps and frames of the simulation and the lowest of the resolutions of the simulation.frames_npy
- Folder containing contact heatmaps in.npy
format created from the input contact data for all of the resolutions of the simulation.
The YAML configuration file for ChromMovie uses a number of configurable parameters. List of parameter default values recommended by the authors and their descriptions are provided in the table below:
Argument Name | Type | Value | Description |
---|---|---|---|
input | str | examples/example1_cell_cycle | Folder containing input scHi-C contacts in csv format. If 'None' simulated scHi-C maps are going to be used. |
output | str | results | Output folder for storing simulation results |
n | int | 10 | Number of scHi-C frames to be generated. Only applicable if input==None. |
m | int | 100 | Size of scHi-C frames to be generated (number of beads). Only applicable if input==None. |
n_contacts | int | 50 | Number of contacts to be drawn from each in silico structure. Only applicable if input==None. |
artificial_structure | int | 1 | Index of the in silico structure to be generated as simulation input. |
genome | str | mm10 | Genome assembly of the input data. Currently supported assemblies: hg19, hg38, mm10, GRCm39. |
chrom | str | chr1 | Chromosome to be modeled. The input files are going to be filtered for intra-chromosomal contacts within this chromosome. |
pdf_report | bool | True | Whether to save the simulation diagnostics in a pdf format. |
remove_problematic | bool | False | A flag indicating whether at each resolution round problematic contacts that the simulation was unable to resolve, should be removed. |
platform | str | OpenCL | Available platoforms: CPU, CUDA and OpenCL. |
resolutions | str | 5,2 | Resolutions to be used for hierarchical modeling. Expected to be in the form of comma separated integer of float numbers in the units of Mb. |
N_steps | int | 100 | Number of simulation steps to take at every resolution |
burnin | int | 0 | Number of simulation steps before starting collecting the simulation diagnostic data |
MC_step | int | 1 | Simulation diagnostic data is going to be collected every MC_step |
sim_step | int | 20 | The simulation step of Langevin integrator |
ev_formula | str | harmonic | Type of the Excluded Volume (EV) repulsion. Available types: harmonic |
ev_min_dist | float | 1.0 | Excluded Volume (EV) minimal distance |
ev_coef | float | 50.0 | Excluded Volume (EV) force coefficient |
ev_coef_evol | bool | False | Enable or disable the changing EV coefficient value. |
bb_formula | str | harmonic | Type of the Backbone (BB) potential. Available types: harmonic, gaussian |
bb_opt_dist | float | 1.0 | Backbone (BB) optimal distance |
bb_lin_thresh | float | 2.0 | Backbone (BB) distance after which the potential grows linearly. |
bb_coef | float | 1000.0 | Backbone (BB) force coefficient |
bb_coef_evol | bool | False | Enable or disable the changing BB coefficient value. |
sc_formula | str | harmonic | Type of the Single cell contact (SC) potential. Available types: harmonic, gaussian |
sc_opt_dist | float | 1.0 | Single cell contact (SC) optimal distance |
sc_lin_thresh | float | 2.0 | Single cell contact (SC) distance after which the potential grows linearly. |
sc_coef | float | 100.0 | Single cell contact (SC) force coefficient |
sc_coef_evol | bool | False | Enable or disable the changing SC coefficient value. |
ff_formula | str | harmonic | Type of the Frame force (FF) potential. Available types: harmonic, gaussian |
ff_opt_dist | float | 1.0 | Frame force (FF) optimal distance |
ff_lin_thresh | float | 2.0 | Frame force (FF) distance after which the potential grows linearly. |
ff_coef | float | 100.0 | Frame force (FF) force coefficient |
ff_coef_evol | bool | False | Enable or disable the changing FF coefficient value. |
The software is freely distributed under the GNU GENERAL PUBLIC LICENSE (GPL-3.0).