✨ Welcome to the official repository for "GerNA-Bind: Geometric-informed RNA-ligand Binding Specificity Prediction with Deep Learning". This work is a collaborative effort by Yunpeng Xia, Jiayi Li, Chu Yi-Ting, Jiahua Rao, Chen Jing, Will Hua, Dong-Jun Yu, Xiucai Chen, and Shuangjia Zheng from Shanghai Jiao Tong University.
🚀 We introduce GerNA-Bind, a geometric deep learning framework that excels in predicting RNA-ligand binding specificity by integrating multi-modal RNA-ligand representations. GerNA-Bind achieves state-of-the-art performance, successfully identifying 19 compounds binding to oncogenic MALAT1 RNA through high-throughput screening. Wet-lab validation confirmed three compounds with submicromolar affinities, showcasing its potential for advancing RNA-targeted drug discovery.
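# Create the environment from the provided YAML file
conda env create -f gernabind.yaml -y
conda activate gernabind
# Alternatively, create a conda environment and install the dependencies manually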
conda create -y -n gernabind python=3.8
conda activate gernabind
#conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=12.2 -c pytorch -c nvidia
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0
# Install other dependencies
pip install rna-fm==0.2.2
pip install ml_collections==0.1.1
pip install simtk==0.1.0
pip install openmm==8.1.1
pip install torchdrug==0.2.1
pip install torch_geometric==2.4.0
pip install equiformer-pytorch
pip install edl_pytorch==0.0.2
pip install rdkit==2023.9.5
pip install biopython==1.79
pip install pandas==1.5.3
pip install scikit-learn==1.2.2
pip install prody==2.4.1
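After installation, a quick check that the main packages are visible to the interpreter can save a failed run later. The snippet below is a minimal sketch, not part of the repository: the `check_modules` helper and the `MODULES` list are hypothetical, and the list covers only a subset of the packages above, using their import names (for example, `biopython` is imported as `Bio` and `scikit-learn` as `sklearn`).

```python
from importlib import util

# Import names (not pip names) for some of the dependencies installed above.
# This list is an illustrative subset, not an official requirement list.
MODULES = [
    "torch",
    "torch_geometric",
    "rdkit",
    "Bio",       # pip package: biopython
    "pandas",
    "sklearn",   # pip package: scikit-learn
    "openmm",
    "prody",
]


def check_modules(names):
    """Return {import_name: bool} telling whether each module can be located.

    find_spec only locates a top-level module; it does not import it,
    so this check is fast and has no side effects.
    """
    return {name: util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(MODULES).items():
        print(f"{name:16s} {'ok' if found else 'MISSING'}")
```

Run it inside the activated `gernabind` environment; any package reported as `MISSING` can be reinstalled with the matching `pip install` line above.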
Refer to the following guides for setting up datasets:
- RNA-Small Molecule Screening datasets: Robin and Biosensor
- RNA-Ligand Complex Structure datasets: Hariboss
We use RhoFold+ to generate RNA 3D structures and RNAfold (version 2.5.1) to generate RNA 2D structures.
You can process data through the following steps:
python data_utils/process_data.py --fasta example/a.fasta --smile example/mol.txt --RhoFold_path your_RhoFold_project_path --RhoFold_weight RhoFold_model_weight_path
The processed data will be saved in the ./data folder as "new_data.pkl".
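To confirm that processing produced something usable, you can inspect the output file. The sketch below assumes only that "new_data.pkl" is a standard Python pickle; its internal layout is not documented here, so the hypothetical `summarize_pickle` helper reports just the top-level type, length, and (for a dict) the first few keys. If the file stores project-specific or PyTorch objects, loading it may require running from the repository root inside the `gernabind` environment. Only unpickle files you trust.

```python
import os
import pickle


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object.

    Returns a dict with the type name, the length (if the object has one),
    and up to ten keys (if the object is a dict).
    """
    with open(path, "rb") as handle:
        obj = pickle.load(handle)

    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        summary["keys"] = list(obj.keys())[:10]
    return summary


if __name__ == "__main__":
    # Output location stated in this README; adjust if you changed it.
    path = os.path.join("data", "new_data.pkl")
    if os.path.exists(path):
        print(summarize_pickle(path))
    else:
        print(f"{path} not found; run data_utils/process_data.py first")
```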
We have already processed the Robin and Biosensor datasets. You can download the processed data from Zenodo.
We provide training scripts so that you can train the model yourself.
python train_model.py --dataset Robin --split_method random --model_output_path Model/
Download the model weights and put them into the "Model" folder, which holds the model checkpoint. You can run the script in the ./Model folder directly to get the model weights.
bash Model/get_weights.sh
You can use our model to screen for small molecules that bind a target RNA.
python inference_affinity.py
You can also use our model to predict the binding sites of a target RNA. Running the script below produces the RNA_binding.csv file for that RNA.
python inference_binding_site.py
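Once the script finishes, you may want a quick look at the result. The sketch below makes two assumptions that this README does not state: that RNA_binding.csv is written to the current working directory, and that its first row is a header. Since the column layout is not documented here, the hypothetical `preview_csv` helper simply returns the header and the first few rows without interpreting them.

```python
import csv
import os


def preview_csv(path, n_rows=5):
    """Return (header, rows) where rows holds at most n_rows data rows."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file is empty
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    # Assumed output location; change the path if the script writes elsewhere.
    path = "RNA_binding.csv"
    if os.path.exists(path):
        header, rows = preview_csv(path)
        print(header)
        for row in rows:
            print(row)
    else:
        print(f"{path} not found; run inference_binding_site.py first")
```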
Commercial use of the model or of generated data is not permitted; see license.md for details.
Our work builds upon EquiFormer, Evidential Deep Learning, MONN, RNA-FM, RhoFold, and TankBind. Thanks for their excellent work and open-source contributions.