Skip to content
/ RDesign Public

The official implementation of the ICLR'24 paper RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design.

Notifications You must be signed in to change notification settings

A4Bio/RDesign

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design

GitHub stars GitHub forks

Introduction

While artificial intelligence has made remarkable strides in revealing the relationship between biological macromolecules' primary sequence and tertiary structure, designing RNA sequences based on specified tertiary structures remains challenging. Though existing approaches in protein design have thoroughly explored structure-to-sequence dependencies in proteins, RNA design still confronts difficulties due to structural complexity and data scarcity.

In this study, we aim to systematically construct a data-driven RNA design pipeline. We crafted a large, well-curated benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure. More importantly, we proposed a hierarchical data-efficient representation learning framework that learns structural representations through contrastive learning at both cluster-level and sample-level to fully leverage the limited data. Extensive experiments demonstrate the effectiveness of our proposed method, providing a reliable baseline for future RNA design tasks.

Dataset

We carefully collected representative RNA tertiary structure data from two sources, RNAsolo and the Protein Data Bank (PDB). The refined data has been released here. Please download the datasets and organize them as follows.

RDesign
├── API
├── assets
├── checkpoints
├── methods
├── model
└── data
    ├── RNAsolo
    │   ├── train_data.pt
    │   ├── val_data.pt
    │   ├── test_data.pt

Citation

If you are interested in our repository and our paper, please cite the following paper:

@misc{tan2023hierarchical,
    title={Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design}, 
    author={Cheng Tan and Yijie Zhang and Zhangyang Gao and Hanqun Cao and Stan Z. Li},
    year={2023},
    eprint={2301.10774},
    archivePrefix={arXiv},
    primaryClass={q-bio.BM}
}

Feedback

If you have any issue about this work, please feel free to contact me by email:

About

The official implementation of the ICLR'24 paper RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages