From 80aab0d3aa3b847288620e8a9fb08a5ee57390b9 Mon Sep 17 00:00:00 2001
From: Sander Vandenhaute
Date: Wed, 24 Jul 2024 16:30:23 -0400
Subject: [PATCH] more docs

---
 docs/data.md        |  59 +++++++++++++----------
 docs/hamiltonian.md | 114 ++++++++++++++++++++++++++++++++------------
 docs/index.md       |   4 +-
 mkdocs.yml          |  10 ++--
 4 files changed, 123 insertions(+), 64 deletions(-)

diff --git a/docs/data.md b/docs/data.md
index a852f8a..6368c82 100644
--- a/docs/data.md
+++ b/docs/data.md
@@ -72,34 +72,40 @@
 simulation engines (i-PI, GROMACS, OpenMM, to name a few) require that the start
 configuration of the system is given in its canonical orientation.
 
-To understand what is meant by 'generating data in the future', consider the following
+## representing an atomic structure **from the future**
+
+Consider the following
 example: imagine that we have a trajectory of atomic geometries, and we wish to
-minimize each of their potential energies and inspect the final optimized geometry for
-each state in the trajectory. In pseudo-code, this would look something like this:
+minimize each of their potential energies and inspect the result (for example to find the
+global minimum).
+In pseudo-code, this would look something like this:
+
+!!! note "pseudocode"
+    ```py
+    minima = []  # tracks the result of the optimization for each snapshot in the trajectory
+    for state in trajectory:
+        local_minimum = geometry_optimization(state)
+        minima.append(local_minimum)
+
+    global_minimum = find_lowest(*minima)
+    ```
-
-```
-for state in trajectory:
-    final = geometry_optimization(state)
-    detect_minimum(final)
-
-```
 In "normal", _synchronous_ execution, when entering the first iteration of the loop,
 Python would start executing the first geometry optimization right away and *wait* for it
 to complete, before moving on to the next iteration. This makes little sense, since we know in advance
-that each of the optimizations is in fact independent. As such, the loop can in fact be completed
-much quicker if we would simply execute each optimization in parallel (as opposed to
-serial).
+that each of the optimizations is independent. As such, the loop can be completed
+much quicker if we simply execute each optimization in parallel.
+This is referred to as _asynchronous_ execution, because the individual optimizations
+now proceed independently of one another.
 The intended way to achieve this in Python is by using the built-in
 [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html) library,
 and it provides the foundation of psiflow's efficiency and scalability.
 
 Aside from _asynchronous_ execution, we also want _remote_ execution.
 Geometry optimizations, molecular dynamics simulations, ML training, quantum chemistry
-calculations, ... are rarely ever performed on a local laptop.
-Instead, they should ideally be forwarded towards e.g. a SLURM/PBS(Pro) scheduler or an
+calculations, ... are rarely ever performed on a local workstation.
+Instead, they should ideally be forwarded to e.g. a SLURM/PBS(Pro)/SGE scheduler or an
 AWS/GCP instance. To achieve this, psiflow is built with
 [Parsl](https://github.com/parsl/parsl): a DoE-funded Python package which
-extends the native _concurrent_ library with
+extends the native `concurrent.futures` library with
 the ability to offload execution towards remote compute resources.
 Parsl (and `concurrent.futures`) is founded on two concepts: apps and futures.
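+Before looking at Parsl itself, the snippet below illustrates the future concept with
+nothing but the standard library; this sketch is purely illustrative and not part of the
+psiflow API:
+
+```py
+from concurrent.futures import ThreadPoolExecutor
+
+def expensive_sum(a, b):
+    return a + b  # stand-in for an actually expensive calculation
+
+with ThreadPoolExecutor() as executor:
+    future = executor.submit(expensive_sum, 1, 2)  # returns a Future immediately
+    print(future.result())  # blocks until the value is available; prints 3
+```
+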
 In their simplest
@@ -126,6 +132,8 @@
 print(sum_future.result()) # now compute the actual result; this will print
 ```
 The return value of Parsl apps is not the actual result (in this case, an integer), but an AppFuture that will store the result of the function evaluation after it has completed.
+By calling `.result()` on the future, we block the main program until the sum has
+actually been computed, and obtain its value.
 For more information, check out the
 [Parsl documentation](https://parsl.readthedocs.io/en/stable/).
 
 In our geometry optimization example from before, we would implement the function
@@ -135,17 +143,20 @@ Importantly, when organized in this way, Python will go through the loop almost
 instantaneously, observe that we want to perform a number of `geometry_optimization`
 calculations, and offload those calculations separately, independently, and immediately
 to whatever compute resource it has available. As such, all optimizations will
 effectively run in parallel:
 
-```
-for state in trajectory:
-    final_future = geometry_optimization_app(state)  # completes instantaneously!
-    detect_minimum_app(final_future)  # completes instantaneously!
-    type(final_future)  # AppFuture
-
-# all geometry optimizations are running
-
-print(final_future.result())  # print the obtained `Geometry` from the last loop
-
-```
+!!! note "pseudocode"
+    ```py
+    minima = []  # list which tracks *futures* of the optimizations
+    for state in trajectory:
+        local_minimum = geometry_optimization(state)  # not an actual geometry, but a *future*
+        minima.append(local_minimum)
+
+    global_minimum = find_lowest(*minima)  # not an actual geometry, but a *future*
+
+    # all optimizations are running at this point in the code
+
+    global_minimum.result()  # will wait until all optimizations have actually completed
+    ```
 
 ## representing multiple structures
 
@@ -206,5 +217,5 @@ print(energies.result().shape) # energies is an AppFuture, so we
 forces = train.get('forces') # get the forces of all geometries in the training set
 print(forces.result().shape) # forces is an AppFuture, so we need to call .result()
-# (n, 3)
+# (n, natoms, 3)
 ```
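+
+Since these arrays are padded whenever the geometries in a dataset contain different
+numbers of atoms, reductions over them are most conveniently written with NumPy's
+nan-aware functions. A minimal sketch, reusing the `train` dataset from above and
+assuming (as for the stress above) that padded entries are NaN:
+
+```py
+import numpy as np
+
+forces = train.get('forces').result()             # (n, natoms, 3), NaN-padded
+largest = np.nanmax(np.abs(forces), axis=(1, 2))  # largest force component per geometry
+print(largest.shape)                              # (n,)
+```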
diff --git a/docs/hamiltonian.md b/docs/hamiltonian.md
index 7991907..c89098b 100644
--- a/docs/hamiltonian.md
+++ b/docs/hamiltonian.md
@@ -1,62 +1,114 @@
-In Born-Oppenheimer-based molecular simulation, atomic nuclei are treated as classical particles that are subject to *effective* interactions, which are determined by the quantum mechanical behavior of the electrons.
-These interactions determine the interatomic forces which are used in a dynamic simulation to propagate the atomic positions from one timestep to the next.
-In more advanced schemes, researchers may modify these effective interactions to include biasing forces (e.g. in order to induce a phase transition), or perform an alchemical transformation between two potential energy surfaces (when computing relative free energies).
-
-The ability to combine various energy contributions in an arbitrary manner allows for a very natural definition of many algorithms in computational statistical physics.
-To accomodate for all these use cases, psiflow provides a simple abstraction for *"a function which accepts an atomic geometry and returns energies and forces"*: the `Hamiltonian` class.
-Examples of Hamiltonians are a specific ML potential, a bias potential on a collective variable, or a quadratic approximation to a potential energy minimum.
-
-By far the simplest hamiltonian is the Einstein crystal, which binds atoms to a certain reference position using harmonic springs with a single, fixed force constant.
+In Born-Oppenheimer-based molecular simulation, atomic nuclei are treated as classical
+particles that are subject to *effective* interactions, which result from the quantum
+mechanical behavior of the electrons. These interactions determine the interatomic forces
+which are used in a dynamic simulation to propagate the atomic positions from one timestep
+to the next.
+Traditionally, dynamic simulations required an explicit evaluation of these effective
+forces by means of a quantum mechanical calculation (e.g. DFT(B)).
+More recently, it has become clear that it is much more efficient to perform such
+simulations using a machine-learned representation of the interaction energy, i.e. an ML
+potential.
+The development and application of ML potentials throughout large simulation workflows is
+one of the core applications of psiflow.
+
+The `Hamiltonian` class is used to represent any type of interaction potential.
+Examples are pre-trained, 'universal' models (e.g. [MACE-MP0](https://arxiv.org/abs/2401.00096)),
+ML potentials trained within psiflow (see [ML potentials](model.md)), or a quadratic
+(hessian-based) approximation to a local energy minimum, to name a few.
+In addition, various sampling schemes employ bias potentials which are superimposed on the
+QM-based Born-Oppenheimer surface in order to drive the system
+along specific reaction coordinates (e.g. metadynamics, umbrella sampling).
+Such bias potentials are also instances of a `Hamiltonian`.
+
+By far the simplest hamiltonian is the Einstein crystal, which binds atoms to a certain
+reference position using harmonic springs with a single, fixed force constant.
 
 ```py
 from psiflow.geometry import Geometry
 from psiflow.hamiltonians import EinsteinCrystal
 
+# isolated H2 molecule
 geometry = Geometry.from_string('''
 2
 
 H 0.0 0.0 0.0
 H 0.0 0.0 0.8
 ''')
 
-einstein = EinsteinCrystal(
-    reference_geometry=geometry.positions, # positions at which all springs are at rest
-    force_constant=0.1, # force constant, in eV / A**2
-    )
+einstein = EinsteinCrystal(geometry, force_constant=0.1)  # force constant in eV/A**2
 ```
 
-As mentioned earlier, the key feature of hamiltonians is that they take as input an atomic geometry, and spit out an energy, a set of forces, and optionally also virial stress.
-Because hamiltonians might require specialized resources for their evaluation (e.g. an ML potential which gets executed on a GPU), evaluation of a hamiltonian does not necessarily happen instantly (e.g. if a GPU node is not immediately available). Similar to how `Dataset` instances return futures of a `Geometry` when a particular index is queried, hamiltonians return a future when asked to evaluate the energy/forces/stress of a particular `Geometry`:
+As mentioned earlier, the key feature of hamiltonians is that they represent an
+interaction energy between atoms, i.e. they output an energy (and its gradients) when
+given a geometry as input.
+Because hamiltonians might require specialized resources for their evaluation (e.g. an ML
+potential which gets executed on a GPU), evaluation of a hamiltonian does not necessarily
+happen instantly (e.g. if a GPU node is not immediately available). Similar to how
+`Dataset` instances return futures of a `Geometry` when a particular index is queried,
+hamiltonians return a future when asked to evaluate the energy/forces/stress of a
+particular `Geometry`:
 
 ```py
-future = einstein.evaluate(geometry) # returns an AppFuture of the Geometry; evaluates instantly
-evaluated = future.result() # calling result makes us wait for it to actually complete
+energy = einstein.compute(geometry, 'energy') # AppFuture of an energy (np.ndarray with shape (1,))
+print(energy.result()) # wait for the result to complete, and print it (in eV)
+
+data = Dataset.load('snapshots.xyz') # N snapshots
+energy, forces, stress = einstein.compute(data) # energy and gradients for each snapshot in data
 
-assert evaluated.energy is not None # the energy of the hamiltonian
-assert not np.any(np.isnan(evaluated.per_atom.forces)) # a (N, 3) array with forces
+assert energy.result().shape == (N,) # one energy per snapshot
+assert forces.result().shape == (N, max_natoms, 3) # forces for each snapshot, with padded natoms
+assert stress.result().shape == (N, 3, 3) # stress; filled with NaNs if not applicable
 ```
 
-One of the most commonly used hamiltonians will be that of MACE, one of the most ubiquitous ML potentials.
-There exist reasonably accurate pretrained models which can be used for exploratory purposes.
+A particularly important hamiltonian is MACE, one of the most ubiquitous ML potentials.
 These are readily available in psiflow:
 
 ```py
-from psiflow.hamiltonians import get_mace_mp0
+import numpy as np
+from psiflow.hamiltonians import MACEHamiltonian
 
-mace = get_mace_mp0() # downloads MACE-MP0 from github
-future = mace.evaluate(geometry) # evaluates the MACE potential on the geometry
+mace = MACEHamiltonian.mace_mp0() # downloads MACE-MP0 from github
+forces = mace.compute(geometry, 'forces') # evaluates the MACE potential on the geometry
+
+forces = forces.result() # wait for evaluation to complete and get the actual value
+
+assert np.dot(forces[0], forces[1]) < 0 # forces on the two H atoms point in opposite directions
+
+assert np.allclose(np.sum(forces, axis=0), 0.0) # net force vanishes (translational invariance)
+```
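+
+Because `compute` returns futures, independent evaluations are submitted immediately and
+may run concurrently on the available resources. As a small sketch, reusing the `mace`
+and `geometry` objects from above and adding a stretched H2 geometry purely for
+illustration:
+
+```py
+stretched = Geometry.from_string('''
+2
+
+H 0.0 0.0 0.0
+H 0.0 0.0 0.9
+''')
+
+e0 = mace.compute(geometry, 'energy')   # returns a future immediately
+e1 = mace.compute(stretched, 'energy')  # both evaluations may run in parallel
+print(e1.result() - e0.result())        # blocks until both energies are available (in eV)
+```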
+A unique feature of psiflow `Hamiltonian` instances is the ability to create a new
+hamiltonian from a linear combination of two or more existing hamiltonians.
+For example, a linear interpolation between two hamiltonians,
+
+$$
+H = \alpha H_0 + (1 - \alpha) H_1
+$$
+
+is very straightforward to express in code:
+
 ```py
 data = Dataset.load('train.xyz')
 mix = 0.5 * einstein + 0.5 * mace # MixtureHamiltonian with E = 0.5 * E_einstein + 0.5 * E_mace
-energies_mix = mix.evaluate(data).get('energy')
+energies_mix = mix.compute(data, 'energy') # AppFuture; call .result() to get the actual array
 ```
+
+This allows for a very natural definition of many algorithms in computational statistical
+physics (e.g. Hamiltonian replica exchange, thermodynamic integration, biased dynamics).
+Let us consider the particular example of [umbrella
+sampling](https://wires.onlinelibrary.wiley.com/doi/10.1002/wcms.66) using
+[PLUMED](https://www.plumed.org/), in which a harmonic bias on a reaction coordinate is
+added on top of a physical hamiltonian. A sketch for our H2 molecule, with illustrative
+values for the restraint:
+
+```py
+from psiflow.hamiltonians import PlumedHamiltonian
+
+plumed_input = """
+UNITS LENGTH=A
+d: DISTANCE ATOMS=1,2
+RESTRAINT ARG=d AT=0.8 KAPPA=10
+"""
+bias = PlumedHamiltonian(plumed_input) # bias potential defined by the PLUMED input above
+
+biased = mace + bias # hamiltonians can simply be added
+```

diff --git a/docs/index.md b/docs/index.md
index 678077e..7b07bc7 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -9,10 +9,10 @@ hide:
 
 Psiflow is a scalable molecular simulation engine for chemistry and materials science applications. It supports:
 
-- **quantum mechanical calculations** at various levels of theory (GGA and hybrid DFT, post-HF methods such as MP2 or RPA, and even coupled cluster; using CP2K|GPAW|ORCA)
+- **quantum mechanical calculations** at various levels of theory (GGA and hybrid DFT, post-HF methods such as MP2 or RPA, and even coupled cluster; using CP2K | GPAW | ORCA)
 - **trainable interaction potentials** as well as easy-to-use universal potentials, e.g. [MACE-MP0](https://arxiv.org/abs/2401.00096)
-- a wide range of **sampling algorithms**: NVE|NVT|NPT, path-integral molecular dynamics, alchemical replica exchange, metadynamics, phonon-based sampling, ... (thanks to [i-PI](https://ipi-code.org/))
+- a wide range of **sampling algorithms**: NVE | NVT | NPT, path-integral molecular dynamics, alchemical replica exchange, metadynamics, phonon-based sampling, ... (thanks to [i-PI](https://ipi-code.org/))
 
 Users may define arbitrarily complex workflows and execute them **automatically** on local, HPC, and/or cloud infrastructure. To achieve this, psiflow is built using [Parsl](https://parsl-project.org/): a parallel execution library which manages job submission and workload distribution.
diff --git a/mkdocs.yml b/mkdocs.yml
index fad069b..ff247c5 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -65,11 +65,7 @@ markdown_extensions:
 #    - javascripts/mathjax.js
 #    - https://polyfill.io/v3/polyfill.min.js?features=es6
 #    - https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js
-
+#
 extra_javascript:
-  - javascripts/katex.js
-  - https://unpkg.com/katex@0/dist/katex.min.js
-  - https://unpkg.com/katex@0/dist/contrib/auto-render.min.js
-
-extra_css:
-  - https://unpkg.com/katex@0/dist/katex.min.css
+  - javascripts/mathjax.js
+  - https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js