Benchmarking deep learning models generating molecules in 3D. The details of metrics computed, specifically the Validity3D, and results of our benchmark on 6 models are available in our manuscript on arxiv and this is the package containing all the code to benchmark your models!
We recommend to install the genbench3d environement, but here is a list of main requirements:
- Python > 3.9
- RDKit > 2022.09 (molecule handling + 2022.09 required for rdDetermineBonds)
- openbabel (for molecule protonation + second bond determination option)
- vina (for Vina docking score)
- ADFRsuite (for Vina protein preparation)
- meeko (for ligand preparation for Vina)
- mdanalysis (for protein extraction)
- PDBFixer (fix the chain identification that is duplicated in CrossDocked files)
- access to Schrodinger Glide CLI (optional)
- csd python api (for Gold PLP score, optional)
I might put it on pip/conda if I have time, but for now it is a standalone. For installation, we recommend using mamba instead of conda that does the same job but much faster! Run these commands to download genbench3d, then install in development mode using pip:
git clone https://github.com/bbaillif/genbench3d.git
cd genbench3d
mamba env create -f environment.yml
mamba activate genbench3d
pip install -e . # install genbench3d in current environment
Before using GenBench3D, you must choose a source of reference 3D data to be used by the Validity3D metric. The current options are:
- LigBoundConf PDB subset (minimized version): Publicly available, download here. Then, set its path in the data/ligboundconf_path in the config/default.yaml file
- CSD Drug: Used in our manuscript, requires access to the CSD data and CSD Python API. You can download the drug subset here
- MolList: Any RDKit molecule list you want; Just have a name for it ! See the Scripting section.
If you plan to compare generated molecules to training molecules for models benchmarked in our work, you need to download CrossDocked: The original CrossDocked v1.1 dataset can be downloaded from here (make sure you have enough space because there are a lot of files), while the processed CrossDocked dataset (extracting pockets and ligands only for RMSD < 1A) used in e.g. Pocket2Mol can be downloaded here.
You can run the convert_crossdocked_split.py script with another conda environment you own that has pytorch installed to transform the datasplit (in .pt pickle format containing torch objects) to a .p format without torch object to be run with minimal dependancies with the provided genbench3d environment.
Finally, change the paths of the different data sources and executable in the config/default.yaml (or copy/paste this file and change it accordingly)
A basic usage is given by the sb_benchmark_mols.py script. We provide an example to benchmark all metrics :
python sb_benchmark_mols.py -c config/default.yaml -i examples/pocket2mol_generated_2z3h.sdf -o results_pocket2mol_generated_2z3h.json -p test_set/BSD_ASPTE_1_130_0/2z3h_A_rec.pdb -n test_set/BSD_ASPTE_1_130_0/2z3h_A_rec_1wn6_bst_lig_tt_docked_3.sdf --do_conf_analysis
-i
is the input sdf filepath containing the molecules to benchmark.-o
is the output json filepath to store the benchmark results.-p
is the pdb structure filepath for the protein target of interest that was used for molecule generation-n
is the native ligand filepath (to locate the pocket)--do_conf_analysis
to compute all metrics (otherwise, only computes the structure-based metrics)
By default, the -s
(reference source) argument is set to ligboundconf
, which is the publicly available. You can set to csd_drug
if you have CSD access and the CSD Python API.
By default, only Vina is computed. To compute Glide scores, add the --glide
argument. To compute Gold PLP scores, add the --gold
argument.
To compute the structure-based metrics only on 3D-valid molecules, add the --valid_only
argument (requires the --do_conf_analysis
argument).
To perform the analysis on relaxed (local pocket-ligand minimization), add the -m
argument.
A simpler version without structure-based metrics is given in benchmark_mols.py script. We provide an example:
python benchmark_mols.py -c config/default.yaml -i examples/pocket2mol_generated_2z3h.sdf -o results_pocket2mol_generated_2z3h.json
Here again you can use the -m
argument, but if that's the case don't forget to include the -p
(pdb_structure) and -n
(native ligand)
Use the -h
argument for a recap of all possible arguments. If you have any question or find some bugs, don't hesitate to open a GitHub Issue!
You can download the data used in the manuscript on figshare
I produced the paper results using the benchmark.py, benchmark_baselines.py and structure_based_benchmark.py scripts. I created classes to handle the retrieval of generated molecules (and minimization) for each model, and benchmarked for each target of the 100 targets in the CrossDocked test set (minus 24 targets for which ResGen generated no molecules). All targets in test_set/ligand_filenames.txt, and the actual subset used for benchmarking in test_set/ligand_filenames_subset.txt, the latter file was generated with python get_test_subset.py
.
Analysis of the results was done with benchmark.ipynb, structure_based_benchmark.ipynb and pattern_distributions.ipynb (not the cleanest notebooks, but you should be able to run it with the genbench3d environement).
The first step is to load the configuration file to indicate the working directories and default parameters.
import yaml
config_path = 'config/default.yaml' # change path accordingly
config = yaml.safe_load(open(args.config_path, 'r'))
The second step is to setup a reference geometry from a data source. The CSDDrug source was used in our manuscript, but you can define any reference from a list of molecules or sdf file. We provide an example for LigBoundConf, a public subset of the PDB with high quality structural data.
from genbench3d.data.source import SDFSource, MolListSource
from genbench3d.geometry import ReferenceGeometry
# The values and kernel densities used to compute the Validity3D are
# saved in files for fast processing, the name allows to retrive them
ligboundconf_name = 'LigBoundConf'
ligboundconf_path = config['data']['ligboundconf_path'] # Set accordingly
source = SDFSource(ligands_path=ligboundconf_path,
name=ligboundconf_name)
# Or using a mol list
source_mol_list = Chem.SDMolSupplier(ligboundconf_path, removeHs=False)
source = MolListSource(mol_list=source_mol_list,
name=ligboundconf_name)
reference_geometry = ReferenceGeometry(source=source,
root=config['benchmark_dirpath'],
minimum_pattern_values=config['genbench3d']['minimum_pattern_values'],)
The root argument is used to save the values and kernel densities for the extracted reference geometry, and the minimum pattern values is the number of values required for a pattern to have its kernel density computed (if lower than 50, the default behaviour during validity3D evaluation is to simplify the pattern, and if simplified, to consider the geometry as valid in the absence of sufficient data)
The main usage is to compute all metrics from a list of RDKit molecules mol_list
:
from genbench3d import GenBench3D
# mol_list is a list of RDKit molecules
benchmark = GenBench3D(reference_geometry=reference_geometry,
config=config['genbench3d'])
results = benchmark.get_results_for_mol_list(mol_list)
Under the hood, the GenBench3D is transforming the molecule list (generated molecules or training molecules) into a ConfEnsembleLibrary
, a structure that groups the conformations of the same molecule (i.e. molecule topological graph and stereochemistry) into unique ConfEnsemble
(wrapper around a single RDKit molecule having multiple Conformer), under a default name that is the SMILES representation
from conf_ensemble import ConfEnsembleLibrary
cel = ConfEnsembleLibrary.from_mol_list(mol_list)
cel.library
is a dictionary in the form {SMILES (or other specified name): conf_ensemble}
conf_ensemble.mol
gives the RDKit molecule containing each listed conformation of that molecule in the original mol_list
To compute metrics that depends on the training molecules (e.g. training similarity, novelty2D and 3D), you need to set the training mols with the corresponding method before running the get_results:
# training_mols is a list of RDKit molecules corresponding to training ligands
training_cel = ConfEnsembleLibrary.from_mol_list(training_mols)
benchmark.set_training_cel(training_cel)
When dealing with pocket-based generative models, you can use the SBGenBench3D
version to add metrics related to activity. This new object needs the pocket (extracted from the protonated = "clean" protein produced during Vina processing) and native ligands (as native_ligand
RDKit molecule) as input.
You can setup the Vina calculation by creating a VinaProtein
object: this legacy class preprocess the raw pdb files from CrossDocked (select protein only, add hydrogens) and generate pdbqt files for Vina input.
You can also setup the Glide calculation with a GlideProtein
or a Gold calculation with a VinaProtein
. The benchmark is setup as followed
from genbench3d import SBGenBench3D
from genbench3d.data.structure import VinaProtein, Pocket, GlideProtein
# native_ligand is a RDKit molecule
# original_structure_path is a string pdb path to the initial protein file
vina_protein = VinaProtein(pdb_filepath=original_structure_path,
prepare_receptor_bin_path=config['bin']['prepare_receptor_bin_path'],)
glide_protein = GlideProtein(pdb_filepath=vina_protein.protein_clean_filepath,
native_ligand=native_ligand,
glide_output_dirpath=config['glide_working_dir'],
glide_path=config['bin']['glide_path'],
structconvert_path=config['bin']['structconvert_path'],)
pocket = Pocket(pdb_filepath=vina_protein.protein_clean_filepath,
native_ligand=native_ligand,
distance_from_ligand=config['pocket_distance_from_ligand'])
sb_benchmark = SBGenBench3D(reference_geometry=reference_geometry,
config=config['genbench3D'],
pocket=pocket,
native_ligand=native_ligand)
sb_benchmark.setup_vina(vina_protein,
config['vina'],
add_minimized=True)
sb_benchmark.setup_glide(glide_protein,
glide_path=config['bin']['glide_path'],
add_minimized=True)
sb_benchmark.setup_gold_plp(vina_protein)
sb_benchmark.set_training_cel(training_cel)
results = sb_benchmark.get_results_for_mol_list(mols=gen_mols,
n_total_mols=n_total_mols)
Additionnally, you can relax molecules inside the binding pocket (i.e. local energy minimization using MMFF94s forcefield, keeping protein atom fixed and allowing up to 1 Angstrom ligand heavy atom deviation, maximum 1000 steps) with the ComplexMinimizer:
from genbench3d.data import ComplexMinimizer
# mol is a RDKit molecule with a single conformer
complex_minimizer = ComplexMinimizer(pocket,
config=config['minimization'])
mini_mol = complex_minimizer.minimize_ligand(mol)
Aggregated compiled metrics can be computed for the benchmark results:
import numpy as np
import pandas as pd
summary = {}
for metric_name, values in results.items():
if isinstance(values, dict): # e.g. Ring proportion
for key, value in values.items():
summary[metric_name + str(key)] = value
elif isinstance(values, list):
summary[metric_name] = np.nanmedian(values) # values can have nan
else: # float or int
summary[metric_name] = values
print(summary)
Can be found in the config/default.yaml file:
Parameter | Definition |
---|---|
minimum_pattern_values | Minimum number of values for a geometric pattern to define a Kernel Density for the q-value (i.e. normalized likelihood) computation |
tfd_threshold | Minimum value for Torsion Fingerprint Deviation between 2 conformers to consider them different (for Uniqueness3D and Novelty3D definitions) |
q_value_threshold | Minimum q-value for a value for a geometric pattern to be considered as valid |
steric_clash_safety_ratio | The minimal distance between two atoms (as computed using vdW radii) is multiplied by this safety ratio before clash detection (clash = distance < minimum) |
maximum_ring_plane_distance | Minimum perpendicular distance for an aromatic ring atom to the plane (computed with SVG) |
consider_hydrogens | Consider hydrogens in the geometric patterns. Default to False, as most methods don't generate hydrogen atom positions, and these can be easily predicted with RDKit for instance |
include_torsions_in_validity3D | Whether to include torsions in the validity3D (q-value aggregation). Default to False, as the profile of torsions between the reference and generated molecules (adapted to a given target) is different |
add_minimized_docking_scores | Whether to include molecule adjustement for docking score (Vina and Glide) computation |
overwrite_results | Whether to overwrite the results (for the benchmark.py and structure_based_benchmark.py to reproduce paper results) |
Metric | Definition |
---|---|
Validity2D | Fraction of molecules parsed by RDKit (the n_total_mols can be setup manually in all get_metrics functions) |
Uniqueness2D | Fraction of unique molecules (based on SMILES with stereochemistry embeded) |
Novelty2D | Fraction of molecules with a stereo-SMILES not in training molecules (if input) |
Diversity2D | Average Tanimoto dissimilarity (1 - similarity) of Morgan Fingerprints radius 3 with stereochemistry considered between all generated molecules |
Ring size proportion | Distribution of the proportion of observed ring sizes |
Molecular weight (MW) | Self-explanatory |
logP | Using RDKit |
SAScore | Using RDKit implementation of SAScore |
Quantitative Estimate of Drug-likeness (QED) | Using RDKit implementation of QED |
Metric | Definition |
---|---|
Validity3D | Fraction of molecular conformations with valid bond lengths and valence angles, based on kernel density estimations from values observed in the CSD Drug Subset, see Validity3D class for details, flat aromatic rings and no intramolecular steric clash |
Uniqueness3D | Fraction of (3D-valid by default) unique conformations based on Torsion Fingerprint Deviation (TFD) with a default threshold of 0.2. |
Novelty3D | Fraction of (3D-valid by default) conformations different from those observed in the training set (only on the set of common molecules between training and generated) |
Diversity3D | Average interconformation deviation computed using the TFD (on 3D-valid by default) |
MMFF94s strain energy | Using up to 1000 minimization step, computed using the MMFF94s RDKit implementation |
Metric | Definition |
---|---|
Steric clash | Fraction of molecules clashing with the protein |
Out of pocket | Fraction of molecules having a centroid 10 Angstrom away from the native ligand centroid |
Absolute Vina score | Using Vina Python package |
Vina score relative to test ligand | Using Vina Python package |
Absolute Gold PLP score | Using CSD Python API, CCDC tools required |
Gold PLP score relative to test ligand | Using CSD Python API, CCDC tools required |
Absolute Glide score | Requires Schrodinger Glide command line interface |
Glide score relative to test ligand | Requires Schrodinger Glide command line interface |
Earlier versions of GenBench3D used the ESPSIM and Interaction FingerPrints (IFP), that are commented out in the main code, we decided to put the emphasis on scoring functions
You have different ways of inputing molecules in the benchmark. If the model generated xyz coordinates in an ASE db (i.e. GSchNet) you can use the ASEDBReader
ase_db_path = "your favorite path"
ase_reader = ASEDBReader(ase_db_path)
This reader is directly embedded in the benchmark:
benchmark = GenBench3D(reference_geometry=reference_geometry,
config=config['genbench3d'])
metrics = benchmark.get_metrics_for_ase_db(filepath=ase_db_path,
cel_name='generated_molecules')