Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove global state #60

Open
jngrad opened this issue May 17, 2024 · 1 comment
Open

Remove global state #60

jngrad opened this issue May 17, 2024 · 1 comment

Comments

@jngrad
Copy link
Member

jngrad commented May 17, 2024

Global variables are notorious for introducing race conditions, which can cause bugs that are hard to reproduce and investigate. They also have shared ownership with multiple objects, thus client code cannot make assumptions about their state.

To illustrate pitfalls from global variables, consider the following example:

class pymbe_library():
    import pint
    units = pint.UnitRegistry()
    SEED = None
    def __init__(self, SEED):
        self.SEED = SEED
        unit_length = 0.355 * self.units.nm
        self.units.define(f"reduced_length = {unit_length}")

This is a simplified version of the pyMBE class that was available up to and including d10d172. The class members units and SEED are bound to the class, not to class instances. You can access them from outside with print(pymbe_library.SEED).

Here is a MWE:

pmb1 = pymbe_library(SEED=42)
pmb2 = pymbe_library(SEED=43)
print(f"{pmb1.SEED=}")
print(f"{pmb2.SEED=}")
print(f"{pmb1.units=}")
print(f"{pmb2.units=}")
print(id(pmb1.units) == id(pmb2.units))

Output:

pmb1.SEED=42
pmb2.SEED=43
pmb1.units=<pint.registry.UnitRegistry object at 0x7235c663fa0>
pmb2.units=<pint.registry.UnitRegistry object at 0x7235c663fa0>
True

The SEED members are different, but the unit members are shared between the two objects (same memory address). These members differ in behavior because integers are immutable, while Pint objects are mutable (quick refresher). The rules for an object's mutability and scope can be a bit difficult to grasp, and while Python developers should know about them, we cannot expect them to know how the pyMBE class constructor uses these rules internally to manage the lifetime and ownership of its class members. I would advocate for the principle of least surprise by making these class members fully owned by the class instance, i.e. the constructed object, rather than by the class itself.

This can be achieved without any API change with the following rewrite:

import pint
class pymbe_library():
    def __init__(self, SEED):
        self.SEED = SEED
        self.units = pint.UnitRegistry()
        unit_length = 0.355 * self.units.nm
        self.units.define(f"reduced_length = {unit_length}")

Output of the MWE shows that each object has its own Pint unit registry:

pmb1.SEED=42
pmb2.SEED=43
pmb1.units=<pint.registry.UnitRegistry object at 0x106e81cd03a0>
pmb2.units=<pint.registry.UnitRegistry object at 0x106e81ca7790>
False

The same issue applies to the data frame. What happens if we create two class instances to load different parameter sets? The answer is not straightforward, because pandas sometimes returns copies, sometimes returns views, and sometimes allows data to be modified in-place.

Coming back to the unit system, we have another source of counter-intuitive behavior in pymbe_library.set_reduced_units():

pyMBE/pyMBE.py

Line 2511 in 37491e8

def set_reduced_units(self, unit_length=None, unit_charge=None, temperature=None, Kw=None, verbose=True):

Here is the relevant part of the method body:

pyMBE/pyMBE.py

Lines 2528 to 2544 in 37491e8

self.units=pint.UnitRegistry()
if unit_length is None:
unit_length=0.355*self.units.nm
if temperature is None:
temperature=298.15 * self.units.K
if unit_charge is None:
unit_charge=self.units.e
if Kw is None:
Kw = 1e-14
self.N_A=6.02214076e23 / self.units.mol
self.Kb=1.38064852e-23 * self.units.J / self.units.K
self.e=1.60217662e-19 *self.units.C
self.kT=temperature*self.Kb
self.Kw=Kw*self.units.mol**2 / (self.units.l**2)
self.units.define(f'reduced_energy = {self.kT} ')
self.units.define(f'reduced_length = {unit_length}')
self.units.define(f'reduced_charge = {unit_charge}')

We are creating a new unit registry. Which means we cannot pass a custom temperature as argument, because we end up multiplying two quantities whose unit belong to different registries:

import pint
units = pint.UnitRegistry()
Kb = 1.38064852e-23 * units.J / units.K
units = pint.UnitRegistry()
temperature = 298.15 * units.K
kT = temperature * Kb

Output:

Traceback (most recent call last):
ValueError: Cannot operate with Quantity and Quantity of different registries.

This exception is not necessarily raised by passing custom values to the other function arguments, but it might be raised later in the simulation script... It is simply not possible to safely pass custom values, because line 2528 guarantees that our custom values won't belong to the same registry as the one used by pyMBE.

Deleting line 2528 doesn't help, because when we call units.define(f"reduced_length = {unit_length}"), we end up re-defining the unit. I just found out the hard way how dangerous that is:

import pint
units = pint.UnitRegistry()

unit_length = 0.355 * units.nm
units.define(f"reduced_length = {unit_length}")
unit_length=units.Quantity(1,'reduced_length')
print(unit_length.to('nm'), "=", unit_length)

unit_length = 0.201 * units.nm
units.define(f"reduced_length = {unit_length}")
# units._build_cache() # <--- never forget this line when re-defining a custom unit!!!
unit_length=units.Quantity(1,'reduced_length')
print(unit_length.to('nm'), "=", unit_length)

Output:

0.355 nanometer = 1 reduced_length
0.355 nanometer = 1 reduced_length

The new reduced_length value is not used because the cache is out-of-date. See hgrecco/pint#1773 (comment) and the source code in hgrecco/[email protected]:pint/facets/plain/registry.py#L517-L521 for more details. By default it should raise a warning, but in my environment (Python 3.10.12, Pint 0.23), the warnings don't get displayed in the terminal.

It would probably be safer to let the user define their own Pint unit registry, and pass it as a units argument of the pyMBE class constructor. In this way, the correct set up of the unit registry becomes the responsibility of the user rather than ours, and we reduce the risk of making a mistake, like the one I made above when re-defining units, or the one currently in pyMBE where the temperature cannot be set, or the one that used to be in the pyMBE constructor where the energy unit was silently ignored (#55 (comment)). If the user doesn't provide a units, we define a default one.

@jngrad
Copy link
Member Author

jngrad commented May 17, 2024

Online discussion with @pm-blanco: move the unit registry creation to the class constructor, remove line 2528, and introduce units._build_cache() in set_reduced_units() to rebuild the cache when reduced units are redefined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant