This repository is the official implementation of our paper:
Learning Invariant Molecular Representation in Latent Discrete Space
Xiang Zhuang, Qiang Zhang*, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, Huajun Chen* (* denotes correspondence)
Advances in Neural Information Processing Systems (NeurIPS) 2023
To run the code successfully, the following dependencies need to be installed:
Python 3.8
torch 1.10.1
torch_geometric 2.0.4
torch_scatter 2.0.9
torch_cluster 1.6.0
torch_sparse 0.6.13
torch_spline_conv 1.2.1
rdkit_pypi 2022.9.5
vector_quantize_pytorch 1.0.7
ogb 1.3.6
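The pinned versions above can be compared against the active environment with a short script. This is a minimal sketch and not part of the repository: the helper name check_versions is hypothetical, and it assumes the installed distribution names match the names in the list (for example rdkit_pypi).

```python
from importlib import metadata

# Versions pinned in the dependency list; distribution names are assumed to match it.
REQUIRED = {
    "torch": "1.10.1",
    "torch_geometric": "2.0.4",
    "torch_scatter": "2.0.9",
    "torch_cluster": "1.6.0",
    "torch_sparse": "0.6.13",
    "torch_spline_conv": "1.2.1",
    "rdkit_pypi": "2022.9.5",
    "vector_quantize_pytorch": "1.0.7",
    "ogb": "1.3.6",
}


def check_versions(required=REQUIRED):
    """Return {package: (expected, installed or None)} for every mismatch."""
    problems = {}
    for name, expected in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # CUDA-specific wheels may carry a local suffix such as "+cu113"; ignore it.
        if installed is None or installed.split("+")[0] != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package is installed at the pinned version.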
This repository also depends on GOOD and DrugOOD. Please follow the installation instructions provided for each package:
- GOOD (Version 1.1.1)
- Repository: https://github.com/divelab/GOOD/
- Installation: Please follow the instructions provided in the repository to install.
- DrugOOD (Version 0.0.1)
- Repository: https://github.com/tencent-ailab/DrugOOD
- Installation: Please follow the instructions provided in the repository to install.
The data used in the experiments can be downloaded from the following sources:
- GOOD
- DrugOOD
- Download from link.
- Extract the downloaded file and save the contents in the drugood-data-chembl30 directory.
An example of the folder hierarchy after adding the data files:
├── data
│ ├── GOODHIV
│ ├── GOODPCBA
│ ├── GOODZINC
├── drugood-data-chembl30
│ ├── lbap_core_ec50_assay.json
│ └── ...
├── models
│ ├── model.py
│ └── ...
├── run.py
└── README.md
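Before launching a run, the layout can be compared with the example hierarchy above. The following is a hedged sketch rather than a script shipped with the repository: the function name missing_paths is hypothetical, and the checked entries are only those spelled out in the tree (the "..." placeholders are skipped).

```python
from pathlib import Path

# Entries taken from the example hierarchy; "..." placeholders are not checked.
EXPECTED = [
    "data/GOODHIV",
    "data/GOODPCBA",
    "data/GOODZINC",
    "drugood-data-chembl30/lbap_core_ec50_assay.json",
    "models/model.py",
    "run.py",
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout matches the example hierarchy.")
```

If only some datasets are needed, pass a shorter list as the expected argument.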
python run.py --dataset GOODZINC --domain scaffold --shift concept --num_e 4000 --bs 256 --gamma 0.5 --inv_w 0.01 --reg_w 0.5 --gpu 0 --exp_name ZINC --exp_id scaffold-concept
Running parameters and descriptions are as follows:
Parameter | Description | Choices |
---|---|---|
dataset | name of dataset | GOODHIV, GOODZINC, GOODPCBA, ic50_assay, ic50_scaffold, ic50_size, ec50_assay, ec50_scaffold, ec50_size |
domain | environment-splitting strategy | scaffold, size (only needs to be specified for GOOD datasets) |
shift | type of distribution shift | covariate, concept (only needs to be specified for GOOD datasets) |
num_e | codebook size | - |
bs | batch size | - |
gamma | threshold | - |
inv_w | - | - |
reg_w | - | - |
gpu | which GPU to use | - |
exp_name | experiment name | - |
exp_id | experiment ID | - |
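The table notes that domain and shift apply only to GOOD datasets. A small helper, sketched here under that reading, assembles the run.py argument list and leaves both flags out for the DrugOOD splits. The function name build_command and its validation logic are hypothetical; only the flag names and choices come from the table, and run.py itself may handle these cases differently.

```python
import shlex

GOOD_DATASETS = {"GOODHIV", "GOODZINC", "GOODPCBA"}


def build_command(dataset, domain=None, shift=None, **options):
    """Build an argument list for run.py from the parameters in the table."""
    cmd = ["python", "run.py", "--dataset", dataset]
    if dataset in GOOD_DATASETS:
        # GOOD datasets need both the splitting strategy and the shift type.
        if domain not in ("scaffold", "size"):
            raise ValueError("domain must be 'scaffold' or 'size' for GOOD datasets")
        if shift not in ("covariate", "concept"):
            raise ValueError("shift must be 'covariate' or 'concept' for GOOD datasets")
        cmd += ["--domain", domain, "--shift", shift]
    # Remaining options (num_e, bs, gamma, inv_w, reg_w, gpu, ...) pass through unchanged.
    for key, value in options.items():
        cmd += [f"--{key}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "GOODZINC", domain="scaffold", shift="concept",
        num_e=4000, bs=256, gamma=0.5, inv_w=0.01, reg_w=0.5,
        gpu=0, exp_name="ZINC", exp_id="scaffold-concept",
    )
    print(shlex.join(cmd))
```

Returning a list rather than a string lets the result be passed directly to subprocess.run without shell quoting concerns.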
The training hyperparameters for each dataset are listed in the Appendix, and the corresponding checkpoints are available on the release page.
python eval.py --dataset GOODZINC --domain scaffold --shift concept --load_path checkpoint/GOODZINC-scaffold-concept.pkl
The load_path parameter specifies the path of the checkpoint to load.
If you use or extend our work, please cite the paper as follows:
@InProceedings{zhuang2023learning,
title={Learning Invariant Molecular Representation in Latent Discrete Space},
author={Xiang Zhuang and Qiang Zhang and Keyan Ding and Yatao Bian and Xiao Wang and Jingsong Lv and Hongyang Chen and Huajun Chen},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}