Loch

CUDA accelerated Grand Canonical Monte Carlo (GCMC) water sampling code. Built on top of Sire, BioSimSpace and OpenMM.

Installation

First, create a conda environment with the required dependencies:

conda create -f environment.yaml
conda activate loch

Next, clone the repository and install the package:

git clone https://github.com/openbiosim/loch
cd loch
pip install -e .

Alternatively, to install loch into an existing conda environment:

conda install -c conda-forge openbiosim loch

Or, for the development version:

conda install -c conda-forge -c openbiosim/label/dev loch

How does it work?

Instead of computing the energy change for each trial insertion/deletion with OpenMM, the calculation is performed at the reaction field (RF) level using a custom CUDA kernel, allowing multiple candidates to be evaluated simultaneously. Particle mesh Ewald (PME) is handled via the method for sampling from an approximate potential (in this case the RF potential) introduced here. Parallelisation of the insertion and deletion trials is achieved using the strategy described in this paper. loch has been designed to be modular, allowing standalone GCMC sampling, or integration with OpenMM-based molecular dynamics simulation code, e.g. as has been done in the SOMD2 free-energy perturbation engine. See our whitepaper for further technical details.

Usage

Load the molecular system of interest, e.g.:

import sire as sr

mols = sr.load_test_files("bpti.prm7", "bpti.rst7")

Create a GCMCSampler:

from loch import GCMCSampler

sampler = GCMCSampler(
    mols,
    reference = "(resnum 10 and atomname CA) or (resnum 43 and atomname CA)",
    num_attempts=10000,
    batch_size=1000,
    cutoff_type="pme",
    cutoff="10 A",
    radius="4 A",
    temperature="298 K",
    num_ghost_waters=50,
    bulk_sampling_probability=0.1,
    log_level="info",
)

Here the reference is a Sire selection string for the atoms that define the centre of geometry of the GCMC sphere. Each GCMC move consists of a total of num_attempts random insertion and deletion attempts, with batch_size number of attempts being performed in parallel. The bulk_sampling_probability controls the probability performing a bulk sampling move, i.e. performing attempts within the entire simulation box, rather than just within the GCMC sphere.

Get the GCMC system:

In order to perform a simulation we need to get back the GCMC system, which contains an additional num_ghost_waters number of ghost water molecules that are used for insertion moves.

gcmc_system = sampler.system()

Create an OpenMM context:

We can directly use the Sire dynamics interface to create an OpenMM context for us, e.g.:

d = gcmc_system.dynamics(
    integrator="langevin_middle",
    temperature="298 K",
    pressure=None,
    cutoff_type="pme",
    cutoff="10 A",
    constraint="h_bonds"
    timestep="2 fs",
)

Note

While we have used Sire to create the OpenMM context, you can also write the GCMC system to file and create the OpenMM context manually. See here for an example of how to perform an OpenMM-to-Sire roundtrip.

Note

GCMC sampling must be performed in the NVT ensemble, hence the pressure is set to None in the above example. However, bulk sampling moves can be used as an effective barostat.

In order to enable crash recovery during dynamics, we next need to bind the GCMCSampler to the Sire dynamics object. This makes sure that the water state is correctly reset in the OpenMM context when restarting from a crash:

sampler.bind_dynamics(d)

Run dynamics with GCMC sampling:

# Set the cycle frequency for saving ghost residue indices.
frame_frequency = 50

# Run 1ns of dynamics and perform GCMC moves every 1ps.
for i in range(1000):
    # Run 1ps of dynamics.
    d.run("1ps", energy_frequency="50ps", frame_frequency="50ps")

    # Perform a GCMC move.
    moves = sampler.move(d.context())

    # If we hit the frame frequency, then save the current ghost residue indices.
    if i > 0 and (i + 1) % frame_frequency == 0:
        sampler.write_ghost_residues()

    # Print the current status.
    print(
        f"Cycle {i}, N = {sampler.num_waters()}, "
        f"insertions = {sampler.num_insertions()}, "
        f"deletions = {sampler.num_deletions()}"
    )
    print(
        f"Current potential energy: {d.current_potential_energy().value():.3f} kcal/mol"
    )

# Save the trajectory.
mols = d.commit()
sr.save(mols.trajectory(), "gcmc_traj.dcd")

Note

loch is designed to be compatible with grand, so you can make use of the grand.utils module to perform post-simulation analysis, such as trajectory processing and water cluster analysis.

Calibrating the GCMC potential

The default GCMC potential parameters are calibrated for the TIP3P water model at 298 K and 1 bar. To use loch with other water models we provide a utility module for calibrating the excess chemical potential and standard volume for a given water model at a specified temperature and pressure. The excess chemical potential is computed via an alchemical decoupling simulation of a water molecule in bulk solvent, while the standard volume is computed from constant pressure simulations of bulk water.

To compute the excess chemical potential, you will first need to install the SOMD2 package into your conda environment:

conda install -c openbiosim somd2

The excess chemical potential can be computed as follows:

import sire as sr
from loch import excess_chemical_potential

mols = sr.load("water.prm7", "water.rst7")

mu_ex = excess_chemical_potential(
    mols,
    temperature="298 K",
    pressure="1 bar",
    cutoff="10 A",
    runtime="5 ns",
    num_lambda=24,
    replica_exchange=True,
)

Here water.prm7 and water.rst7 are the topology and co-ordinates files for an equilibrated bulk water system. Any file format supported by Sire can be used. The num_lambda parameter controls the number of alchemical intermediate states used in the decoupling simulation, while the replica_exchange flag enables replica exchange between the alchemical states to enhance sampling.

Once finished, mu_ex will contain the computed excess chemical potential in units kcal/mol.

Note that the simulation requires a system with CUDA support. Please set the CUDA_VISIBLE_DEVICES environment variable accordingly.

The standard volume can be computed as follows:

import sire as sr
from loch import standard_volume

mols = sr.load("water.prm7", "water.rst7")

v_std = standard_volume(
    mols,
    temperature="298 K",
    pressure="1 bar",
    cutoff="10 A",
    num_samples=1000,
    sample_interval="1 ps",
)

Once finished, v_std will contain the computed standard volume in units of Å³.

Examples

A full set of examples can be found in the examples directory.

GCMC Free Energy Perturbation

Free Energy Perturbation (FEP) with GCMC using loch is supported via the SOMD2 package.

Notes

Make sure that nvcc is in your PATH. If you require a different nvcc to that provided by conda, you can set the PYCUDA_NVCC environment variable to point to the desired nvcc binary, or use the nvcc kwarg in the GCMCSampler constructor. Depending on your setup, you may also need to install the cuda-nvvm package from conda-forge.
A future version supporting AMD GPUs via PyOpenCL is planned.
OpenMM-to-Sire roundtrip example:

from openmm.app import AmberPrmtopFile, AmberInpcrdFile, PDBFile
from loch import GCMCSampler

import parmed as pmd
import sire as sr

# Here we have an existing OpenMM system in omm_system.

# Load a PDB representation of the topology.
pdb = PDBFile("system.pdb")

# Create a ParmEd structure.
structure = pmd.openmm.load(
    pdb.topology,
    omm_system,
    xyz=pdb.positions,
)

# Save the structure to AMBER format files.
struct.save("system.prmtop")
struct.save("system.inpcrd")

# Load the system into Sire.
mols = sr.load("system.prmtop", "system.inpcrd")

# Define the GCMC reference selection. (This should be adjusted as needed.)
gcmc_reference = "(resnum 10 and atomname CA) or (resnum 43 and atomname CA)"

# Create the GCMCSampler.
sampler = GCMCSampler(
    mols,
    reference = gcmc_reference,
)

# Get the GCMC system.
gcmc_system = sampler.system()

# Save the GCMC system to AMBER format files.
sr.save(gcmc_system, ["gcmc_system.prmtop", "gcmc_system.inpcrd"])

# Load the GCMC system back into OpenMM.
prmtop = AmberPrmtopFile("gcmc_system.prmtop")
inpcrd = AmberInpcrdFile("gcmc_system.inpcrd")

Acknowledgements

We thank the Essex Lab and grand for the inspiration.
Many thanks to Gregory Ross for clarifying the parallelisation scheme described here.

Name		Name	Last commit message	Last commit date
Latest commit History 418 Commits
.github/workflows		.github/workflows
.img		.img
actions		actions
examples		examples
recipes/loch		recipes/loch
src/loch		src/loch
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
WHITEPAPER.md		WHITEPAPER.md
environment.yaml		environment.yaml
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Loch

Installation

How does it work?

Usage

Calibrating the GCMC potential

Examples

GCMC Free Energy Perturbation

Notes

Acknowledgements

About

Uh oh!

Releases 1

Packages

Contributors 2

Uh oh!

Languages

License

OpenBioSim/loch

Folders and files

Latest commit

History

Repository files navigation

Loch

Installation

How does it work?

Usage

Calibrating the GCMC potential

Examples

GCMC Free Energy Perturbation

Notes

Acknowledgements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Uh oh!

Languages

Packages