feat: hierarchical, multi-source settings manager
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, with system, global, local, and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih committed Oct 8, 2024
1 parent 3099d87 commit a30ea9f
Showing 15 changed files with 1,443 additions and 1 deletion.
2 changes: 2 additions & 0 deletions datasalad/__init__.py
@@ -1,3 +1,5 @@
from __future__ import annotations

from datasalad._version import __version__

__all__ = [
204 changes: 204 additions & 0 deletions datasalad/settings/__init__.py
@@ -0,0 +1,204 @@
"""Hierarchical, multi-source settings management
This module provides a framework for implementing a system where
information items can be read from and written to any number of sources.
These sources are ordered to implement a simple query precedence rule.
An example of such a system is the layered Git config setup, with
system, global, local and other scopes.
The framework is built on three main classes:
- :class:`Setting`: an individual information item
- :class:`Source`: base class for a settings provider
- :class:`Settings`: the top-level API for a multi-source settings manager
Basic usage
-----------
To establish a settings manager instance one needs to create an instance of
:class:`Settings` and supply it with any instances of sources that the manager
should consider. Importantly, the order in which the sources are declared also
represents the precedence rule for reporting: items from sources declared first
take precedence over items from sources declared later.
>>> from datasalad.settings import Settings, Setting, Environment, Defaults
>>> defaults = Defaults()
>>> settings = Settings(
... {
... 'env': Environment(var_prefix='myapp_'),
... # any number of additional sources could be here
... 'defaults': defaults,
... }
... )
It often makes sense to use a dedicated instance of :class:`Defaults` (a
variant of :class:`InMemory`) as a base source. It can be populated on import
to collect all default settings of an application, and simplifies
implementations, because all possible settings are known to this instance.
>>> defaults['myconf'] = Setting('default-value')
>>> settings['myconf'].value
'default-value'
It is also possible to equip a setting with a callable that performs
type-coercion or validation:
>>> defaults['myapp_conf'] = Setting('5', coercer=int)
>>> settings['myapp_conf'].value
5
This coercer is inherited (unless overridden), even when the value with the
highest precedence is retrieved from a different source that does not
provide a coercer itself.
>>> # set value for `myapp_conf` in the `env` source
>>> settings.sources['env']['myapp_conf'] = Setting('123')
>>> settings['myapp_conf'].value
123
Advanced usage
--------------
The usage patterns shown above are often all that is needed.
However, the framework also supports more specialized solutions.
Setting keys need not be of type ``str``, but can be any hashable type,
and need not be homogeneous across (or even within) individual
sources, as long as they are hashable:
>>> defaults[(0, 1, 2)] = Setting(object)
There is support for multiple values registered under a single key, even within
a single source. The standard accessor methods (:meth:`__getitem__`, and
:meth:`~Settings.get`), however, always return a single item only. In case of
multiple available values, they return an item that is the composition of item
properties with the highest precedence. In contrast, the
:meth:`~Settings.getall` method returns all items across all sources as
a ``tuple``.
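For instance, continuing the example above, :meth:`~Settings.getall` reports
each source's own item, so the environment item is not coerced here (a small
illustration):
>>> [s.value for s in settings.getall('myapp_conf')]
[5, '123']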
The :class:`Settings` class does not support setting values. Instead, the
desired source has to be selected explicitly via the :attr:`~Settings.sources`
property (as shown in the example above). This allows individual sources to
offer an API and behavior that is optimally tuned for a particular source type,
rather than being constrained by a common denominator across all possible source
types. Sources are registered and selected via a unique, use-case-specific
identifier, which makes it clear in application code what kind of source is
being written to.
It is also possible to use this framework with custom :class:`Setting`
subclasses, possibly adding properties or additional methods. The
:class:`Settings` class variable :attr:`~Settings.item_type` can be set to
such a subclass; it is used to wrap default values returned by
:meth:`~Settings.get` and :meth:`~Settings.getall`.
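For example (a minimal sketch; ``MySetting`` and ``MySettings`` are made-up
names for illustration):
>>> class MySetting(Setting):
...     pass
>>> class MySettings(Settings):
...     item_type = MySetting
>>> mysettings = MySettings({'defaults': Defaults()})
>>> type(mysettings.get('unknown-key', 'fallback')) is MySetting
True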
Implement custom sources
------------------------
Custom sources can be implemented by subclassing
:class:`~datasalad.settings.Source`, and implementing methods for its
``dict``-like interface. Different (abstract) base classes are provided for
common use cases.
- :class:`~datasalad.settings.Source`
- :class:`~datasalad.settings.WritableSource`
- :class:`~datasalad.settings.WritableMultivalueSource`
- :class:`~datasalad.settings.CachingSource`
:class:`~datasalad.settings.Source` is the most basic class, suitable
for any read-only source. It requires implementing the following
private methods (see the class documentation for details):
- :meth:`~datasalad.settings.Source._reinit`
- :meth:`~datasalad.settings.Source._load`
- :meth:`~datasalad.settings.Source._get_item`
- :meth:`~datasalad.settings.Source._get_keys`
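A minimal read-only source might be sketched like this (illustrative only;
``HardcodedSource`` and its single item are made up for this example):
>>> from datasalad.settings import Setting, Source
>>> class HardcodedSource(Source):
...     def _reinit(self):
...         pass
...     def _load(self):
...         pass
...     def _get_keys(self):
...         return {'pi'}
...     def _get_item(self, key):
...         if key == 'pi':
...             return Setting('3.14159', coercer=float)
...         raise KeyError(key)
>>> HardcodedSource()['pi'].value
3.14159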
:class:`~datasalad.settings.WritableSource` extends the interface
with methods for modification of a writable source, and requires
the additional implementation of:
- :meth:`~datasalad.settings.Source._set_item`
- :meth:`~datasalad.settings.Source._del_item`
The property :meth:`~datasalad.settings.WritableSource.is_writable` returns
``True`` by default. It can be reimplemented to report a particular source
instance as read-only, even if it is theoretically writable, for example due to
insufficient permissions.
:class:`~datasalad.settings.CachingSource` is a writable source implementation
with an in-memory cache. It only requires implementing
:meth:`~datasalad.settings.Source._load` if set items need not be written to
the underlying source, but are only cached in memory. Otherwise, all standard
getters and setters need to be wrapped accordingly.
Lastly, :class:`~datasalad.settings.InMemory` is a readily usable,
"source-less" items source, which is also the basis for
:class:`~datasalad.settings.Defaults`.
Notes on type-coercion and validation
-------------------------------------
Type-coercion and validation are done solely on access of a
:class:`~datasalad.settings.setting.Setting` instance's
:attr:`~datasalad.settings.setting.Setting.value` property. There is no
on-load validation that rejects invalid configuration immediately. This
approach is taken to avoid spending time on items that might never actually
be accessed.
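For example, constructing a setting with a value its coercer would reject
does not raise; only accessing the value does (a small illustration):
>>> broken = Setting('not-a-number', coercer=int)
>>> broken.value
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: 'not-a-number'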
There is also no generic on-write validation. This has to be done for each
source implementation separately and explicitly. There is no assumption of
homogeneity regarding what types and values are acceptable across sources.
API reference
-------------
.. currentmodule:: datasalad.settings
.. autosummary::
:toctree: generated
Settings
Setting
Source
WritableSource
WritableMultivalueSource
CachingSource
Environment
InMemory
Defaults
UnsetValue
"""

from __future__ import annotations

from .defaults import Defaults
from .env import Environment
from .setting import (
Setting,
UnsetValue,
)
from .settings import Settings
from .source import (
CachingSource,
InMemory,
Source,
WritableMultivalueSource,
WritableSource,
)

__all__ = [
'CachingSource',
'Defaults',
'Environment',
'InMemory',
'Setting',
'Settings',
'Source',
'WritableSource',
'WritableMultivalueSource',
'UnsetValue',
]
54 changes: 54 additions & 0 deletions datasalad/settings/defaults.py
@@ -0,0 +1,54 @@
from __future__ import annotations

import logging
from typing import (
TYPE_CHECKING,
Hashable,
)

from datasalad.settings.source import InMemory

if TYPE_CHECKING:
from datasalad.settings.setting import Setting

lgr = logging.getLogger('datasalad.settings')


class Defaults(InMemory):
"""Source for collecting implementation defaults of settings
Such defaults are not loaded from any source. Clients have to set any items
for which they want a default to be known. There would typically be only one
instance of this class, and it would then be the sole source of this
information.
The difference from :class:`InMemory` is minimal. It is limited
to emitting a debug-level log message when setting the value of an item
that has already been set before.
>>> from datasalad.settings import Defaults, InMemory, Setting, Settings
>>> defaults = Defaults()
>>> defaults['myswitch'] = Setting(
... 'on', coercer=lambda x: {'on': True, 'off': False}[x]
... )
>>> defaults['myswitch'].value
True
>>> settings = Settings({'overrides': InMemory(), 'defaults': defaults})
>>> settings['myswitch'].value
True
>>> settings.sources['overrides']['myswitch'] = Setting('off')
>>> settings['myswitch'].value
False
>>> settings.sources['overrides']['myswitch'] = Setting('broken')
>>> settings['myswitch'].value
Traceback (most recent call last):
KeyError: 'broken'
"""

def _set_item(self, key: Hashable, value: Setting) -> None:
if key in self:
# resetting a default is an unusual event.
# __setitem__ does not allow for a dedicated "force" flag,
# so we leave a message at least
lgr.debug('Resetting %r default', key)
super()._set_item(key, value)
167 changes: 167 additions & 0 deletions datasalad/settings/env.py
@@ -0,0 +1,167 @@
from __future__ import annotations

import logging
from os import (
environ,
)
from os import (
name as os_name,
)
from typing import (
TYPE_CHECKING,
Hashable,
)

from datasalad.settings.setting import Setting
from datasalad.settings.source import WritableSource

if TYPE_CHECKING:
from collections.abc import Collection

lgr = logging.getLogger('datasalad.settings')


class Environment(WritableSource):
"""Process environment source
This is a stateless source implementation that gets and sets items directly
in the process environment.
Environment variables to be read can be filtered by declaring a name
prefix. More complex filter rules can be implemented by replacing the
:meth:`include_var()` method in a subclass.
It is possible to transform an environment variable name to a setting key
(and vice versa), by implementing the methods
:meth:`get_key_from_varname()` and :meth:`get_varname_from_key()`.
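A name-mangling subclass might be sketched like this (illustrative only;
the ``MYAPP_`` prefix and the class name are made up for this example):
>>> from datasalad.settings import Environment
>>> class MyAppEnvironment(Environment):
...     def get_key_from_varname(self, name):
...         # strip the prefix and lower-case the rest
...         return name[len('MYAPP_'):].lower()
...     def get_varname_from_key(self, key):
...         # inverse transform (illegal-name checks omitted in this sketch)
...         return f'MYAPP_{key}'.upper()
>>> env = MyAppEnvironment(var_prefix='MYAPP_')
>>> env.get_key_from_varname('MYAPP_CONF')
'conf'
>>> env.get_varname_from_key('conf')
'MYAPP_CONF'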
.. attention::
Due to peculiarities of the behavior of Python's ``os.environ`` on the
windows platform (and ``os2``), all variable names are converted to
upper case, and are effectively treated as case-insensitive, on that
platform. For this default implementation this implies that the
:meth:`~Environment.keys` method can only ever return uppercase keys.
Reimplement :meth:`~Environment.get_key_from_varname` to change this.
Retrieving a value for an individual key will nevertheless work for the
default implementation even with a lowercase or mixed case key.
"""

def __init__(
self,
*,
var_prefix: str | None = None,
):
super().__init__()
self._var_prefix = (
var_prefix.upper()
if var_prefix is not None and os_name in ('os2', 'nt')
else var_prefix
)

def _reinit(self):
"""Does nothing"""

def _load(self) -> None:
"""Does nothing
All accessors inspect the process environment directly.
"""

def _get_item(self, key: Hashable) -> Setting:
return Setting(value=environ[self.get_varname_from_key(key)])

def _set_item(self, key: Hashable, value: Setting) -> None:
name = self.get_varname_from_key(key)
environ[name] = str(value.value)

def _del_item(self, key: Hashable) -> None:
name = self.get_varname_from_key(key)
del environ[name]

def _get_keys(self) -> Collection:
"""Returns all keys that can be determined from the environment
.. attention::
Due to peculiarities of the behavior of Python's ``os.environ`` on
the windows platform (and ``os2``), this method can only report
uppercase keys with the default implementation. Reimplement
:meth:`get_key_from_varname()` to modify this behavior.
"""
varmap = {
k: self.get_key_from_varname(k)
for k, v in environ.items()
if self.include_var(name=k, value=v)
}
_keys = set(varmap.values())
if len(_keys) < len(varmap):
allkeys = list(varmap.values())
lgr.warning(
'Ambiguous ENV variables map on identical keys: %r',
{
key: [k for k in sorted(varmap) if varmap[k] == key]
for key in _keys
if allkeys.count(key) > 1
},
)
return _keys

def __str__(self):
return f'Environment[{self._var_prefix}]' if self._var_prefix else 'Environment'

def __contains__(self, key: Hashable) -> bool:
# we only need to reimplement this due to Python's behavior of
# force-modifying environment variable names on Windows. Only
# talking directly to environ accounts for that
return self.get_varname_from_key(key) in environ

def __repr__(self):
# TODO: list keys?
return 'Environment()'

def include_var(
self,
name: str,
value: str, # noqa: ARG002 (default implementation does not need it)
) -> bool:
"""Determine whether to source a setting from an environment variable
This default implementation tests whether the name of the variable
starts with the ``var_prefix`` given to the constructor.
Reimplement this method to perform custom tests.
"""
return name.startswith(self._var_prefix or '')

def get_key_from_varname(self, name: str) -> Hashable:
"""Transform an environment variable name to a setting key
This default implementation returns the unchanged name as a key.
Reimplement this method and ``get_varname_from_key()`` to perform
custom transformations.
"""
return name

def get_varname_from_key(self, key: Hashable) -> str:
"""Transform a setting key to an environment variable name
This default implementation only checks for illegal names and raises a
``ValueError`` in that case. Otherwise it returns the key as a string.
.. attention::
Due to peculiarities of the behavior of Python's ``os.environ``
on the windows platform, all variable names are converted to
upper case, and are effectively treated as case-insensitive,
on that platform.
"""
varname = str(key)
if '=' in varname or '\0' in varname:
msg = "illegal environment variable name (contains '=' or NUL)"
raise ValueError(msg)
if os_name in ('os2', 'nt'):
# https://stackoverflow.com/questions/19023238/why-python-uppercases-all-environment-variables-in-windows
return varname.upper()
return varname
124 changes: 124 additions & 0 deletions datasalad/settings/setting.py
@@ -0,0 +1,124 @@
from __future__ import annotations

from copy import copy
from typing import (
Any,
Callable,
)


class UnsetValue:
"""Placeholder type to indicate a value that has not been set"""


class Setting:
"""Representation of an individual setting"""

def __init__(
self,
value: Any | UnsetValue = UnsetValue,
*,
coercer: Callable | None = None,
lazy: bool = False,
):
"""
``value`` can be of any type. A setting instance created with the
default :class:`UnsetValue` represents a setting with no known value.
The ``coercer`` is a callable that processes a setting value
on access via :attr:`value`. This callable can perform arbitrary
processing, including type conversion and validation.
If ``lazy`` is ``True``, ``value`` must be a callable that requires
no parameters. This callable will be executed each time :attr:`value`
is accessed, and its return value is passed to the ``coercer``.
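A brief illustration of lazy evaluation combined with a coercer (sketch):
>>> s = Setting(lambda: 2 + 2, coercer=str, lazy=True)
>>> s.value
'4'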
"""
if lazy and not callable(value):
msg = 'callable required for lazy evaluation'
raise ValueError(msg)
self._value = value
self._coercer = coercer
self._lazy = lazy

@property
def pristine_value(self) -> Any:
"""Original, uncoerced value"""
return self._value

@property
def value(self) -> Any:
"""Value of a setting after coercion
For a lazy setting, accessing this property also triggers the
evaluation.
"""
# we ignore the type error here
# "error: "UnsetValue" not callable"
# because we rule this out in the constructor
val = self._value() if self._lazy else self._value # type: ignore [operator]
if self._coercer:
return self._coercer(val)
return val

@property
def coercer(self) -> Callable | None:
"""``coercer`` of a setting, or ``None`` if there is none"""
return self._coercer

@property
def is_lazy(self) -> bool:
"""Flag whether the setting evaluates on access"""
return self._lazy

def update(self, other: Setting) -> None:
"""Update the item from another
This replaces the ``value`` and ``coercer`` of this setting with those
of the other setting. If the other's ``value`` is :class:`UnsetValue`,
the ``value`` is not updated. Likewise, if the other's ``coercer``
is ``None``, no update is made. An update to or from a ``lazy``
value will also update the ``lazy`` property accordingly.
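A brief illustration (sketch):
>>> s = Setting(coercer=int)
>>> s.update(Setting('5'))
>>> s.value
5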
"""
if other._value is not UnsetValue: # noqa: SLF001
self._value = other._value # noqa: SLF001
# we also need to synchronize the lazy eval flag
# so we can do the right thing (TM) with the
# new value
self._lazy = other._lazy # noqa: SLF001

if other._coercer: # noqa: SLF001
self._coercer = other._coercer # noqa: SLF001

def __str__(self) -> str:
# wrap the value in the classname to make clear that
# the actual object type is different from the value
return f'{self.__class__.__name__}({self._value})'

def __repr__(self) -> str:
# wrap the value in the classname to make clear that
# the actual object type is different from the value
return (
f'{self.__class__.__name__}('
f'{self._value!r}'
f', coercer={self._coercer!r}'
f', lazy={self._lazy}'
')'
)

def __eq__(self, item: object) -> bool:
"""
This default implementation of equality comparison only compares the
types, values, lazy flags, and coercers of the two items. If additional
criteria are relevant for derived classes, :meth:`__eq__` has to be
reimplemented.
"""
if not isinstance(item, type(self)):
return False
return (
self._lazy == item._lazy
and self._value == item._value
and self._coercer == item._coercer
)

def copy(self):
"""Return a shallow copy of the instance"""
return copy(self)
147 changes: 147 additions & 0 deletions datasalad/settings/settings.py
@@ -0,0 +1,147 @@
from __future__ import annotations

from copy import copy
from itertools import chain
from types import MappingProxyType
from typing import (
TYPE_CHECKING,
Any,
Hashable,
)

from datasalad.settings.setting import Setting

if TYPE_CHECKING:
from datasalad.settings import Source


class Settings:
"""Query across different sources of settings
This class implements key parts of the standard ``dict`` interface
(with some additions).
An instance is initialized with an ordered mapping of source identifiers
to :class:`~datasalad.settings.Source` instances. The order reflects
the precedence rule with which settings and their properties are selected
for reporting across sources. Sources declared earlier take precedence over
sources declared later.
When an individual setting is requested via the ``__getitem__()`` method, a
"flattened" representation of the item across all sources is determined and
returned. This is not necessarily a setting that exists in this exact form
at any source. Instead, for each setting property the value from the source
with the highest precedence is looked up and used for the return item.
In practice, this means that, for example, a ``coercer`` can come from a
lower-precedence source and the setting's ``value`` from a different
higher-precedence source.
See :meth:`~Settings.getall` for an alternative access method.
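A small illustration of this flattening across two sources (a sketch using
:class:`~datasalad.settings.InMemory` and :class:`~datasalad.settings.Defaults`):
>>> from datasalad.settings import Defaults, InMemory, Setting, Settings
>>> s = Settings({'overrides': InMemory(), 'defaults': Defaults()})
>>> s.sources['defaults']['x'] = Setting('1', coercer=int)
>>> s.sources['overrides']['x'] = Setting('2')
>>> s['x'].value
2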
"""

item_type: type = Setting
"""Type to wrap default value in for :meth:`get()` and
:meth:`getall()`."""

def __init__(
self,
sources: dict[str, Source],
):
# we keep the sources strictly separate.
# the order here matters and represents the
# precedence rule
self._sources = sources

@property
def sources(self) -> MappingProxyType:
"""Read-only mapping of source identifiers to source instance
This property is used to select individual sources for source-specific
operations, such as writing a setting to an underlying source.
"""
return MappingProxyType(self._sources)

def __len__(self):
return len(self.keys())

def __getitem__(self, key: Hashable) -> Setting:
"""Some"""
# this will become the return item
item: Setting | None = None
# now go from the back
# - start with the first Setting class instance we get
# - update a copy of this particular instance with all information
# from sources with higher priority and flatten it across
# sources
for s in reversed(self._sources.values()):
update_item = None
try:
update_item = s[key]
except KeyError:
# source does not have it, proceed
continue
if item is None:
# copy the item, so that the update() below does not modify
# the original item in place and destroy its integrity
item = copy(update_item)
continue
# we run the update() method of the first item we ever found.
# this will practically make the type produced by the lowest
# precedence source define the behavior. This is typically
# some kind of implementation default
item.update(update_item)
if item is None:
# there was nothing
raise KeyError(key)
return item

def __contains__(self, key: Hashable):
return any(key in s for s in self._sources.values())

def keys(self) -> set[Hashable]:
"""Returns all setting keys known across all sources"""
return set(chain.from_iterable(s.keys() for s in self._sources.values()))

def get(self, key: Hashable, default: Any = None) -> Setting:
"""Return a particular setting identified by its key, or a default
The composition of the returned setting follows the same rules
as the access via ``__getitem__``.
When the ``default`` value is not given as an instance of
:class:`~datasalad.settings.Setting`, it will be
automatically wrapped into the one given by :attr:`Settings.item_type`.
"""
try:
return self[key]
except KeyError:
return self._get_default_setting(default)

def getall(
self,
key: Hashable,
default: Any = None,
) -> tuple[Setting, ...]:
"""Returns a tuple of all known setting instances for a key across sources
If no source has any information for a given key, a length-one tuple
with a :class:`~datasalad.settings.Setting` instance for the given
``default`` value is returned.
"""
# no flattening, get all from all
items: tuple[Setting, ...] = ()
for s in reversed(self._sources.values()):
if key in s:
# we checked before, no need to handle a default here
items = (
(*items, *s.getall(key))
if hasattr(s, 'getall')
else (*items, s[key])
)
return items if items else (self._get_default_setting(default),)

def _get_default_setting(self, default: Any) -> Setting:
if isinstance(default, Setting):
return default
return self.item_type(value=default)
345 changes: 345 additions & 0 deletions datasalad/settings/source.py
@@ -0,0 +1,345 @@
from __future__ import annotations

from abc import (
ABC,
abstractmethod,
)
from typing import (
TYPE_CHECKING,
Any,
Generator,
Hashable,
)

from datasalad.settings.setting import Setting

if TYPE_CHECKING:
from collections.abc import Collection


class Source(ABC):
"""Abstract base class a settings source.
This class offers a ``dict``-like interface. Individual settings can be
retrieved via the standard accessor methods :meth:`~Source.__getitem__`,
and :meth:`~Source.get`.
If the underlying source can represent multiple settings under a single
key, the standard accessors :meth:`Source.__getitem__` and
:meth:`Source.get` nevertheless return a single
:class:`~datasalad.settings.Setting` only. It is the decision of the
source implementation to select the most appropriate setting to return.
Such multi-value sources should reimplement :meth:`Source._getall`
to provide access to all values for a given key.
A number of methods have to be implemented for any concrete source
(see their documentation for details on the expected behavior):
- :meth:`Source._reinit` (see :meth:`Source.reinit`)
- :meth:`Source._load` (see :meth:`Source.load`)
- :meth:`Source._get_keys` (see :meth:`Source.keys`)
- :meth:`Source._get_item` (see :meth:`Source.__getitem__`)
This class is in itself a suitable base for a generic read-only setting
source. For other scenarios, alternative base classes are also available:
- :class:`~datasalad.settings.WritableSource`
- :class:`~datasalad.settings.WritableMultivalueSource`
- :class:`~datasalad.settings.CachingSource`
- :class:`~datasalad.settings.InMemory`
"""

item_type: type = Setting
"""Type to wrap default value in for :meth:`get()`"""

def load(self) -> None:
"""Load items from the underlying source.
This default implementation calls a source's internal `_load()` method.
It is expected that after calling this method, an instance of this
source reports on items according to the latest state of the source.
No side-effects are implied. Particular implementations may
even choose to have this method be a no-op.
Importantly, calling this method does not imply a call to
:meth:`~Source.reinit`. If a from-scratch reload is desired,
:meth:`~Source.reinit` must be called explicitly.
"""
self._load()

def reinit(self) -> Source:
"""Re-initialize source instance
Re-initializing is resetting any state of the source interface instance
such that a subsequent :meth:`~Source.load` fully synchronizes the
reporting of settings with the state of the underlying source. Calling
this method does *not* imply resetting the underlying settings source
(e.g., removing all settings from the source).
This method returns ``self`` for convenient chaining of a ``load()``
call.
"""
self._reinit()
return self

def __getitem__(self, key: Hashable) -> Setting:
"""Calls a source's internal `_get_item()` method"""
return self._get_item(key)

def keys(self) -> Collection:
"""Returns all setting keys known to a source"""
return self._get_keys()

@property
def is_writable(self) -> bool:
"""Flag whether configuration item values can be set at the source
This default implementation returns ``False``.
"""
return False

def get(self, key: Hashable, default: Any = None) -> Setting:
"""Return a particular setting identified by its key, or a default
This method calls ``__getitem__``, and returns the default on
a ``KeyError`` exception.
When the ``default`` value is not given as an instance of
:class:`~datasalad.settings.Setting`, it will be
automatically wrapped into the one given by :attr:`Source.item_type`.
"""
try:
return self[key]
except KeyError:
return self._get_default_setting(default)

def getall(self, key: Hashable, default: Any = None) -> tuple[Setting, ...]:
"""Return all individual settings registered for a key
Derived classes for sources that can represent multiple values for
a single key should reimplement the internal accessor :meth:`Source._getall`
appropriately.
"""
try:
return self._getall(key)
except KeyError:
return (self._get_default_setting(default),)

def _getall(self, key: Hashable) -> tuple[Setting, ...]:
"""Returns all settings for a key, or raises ``KeyError``
This default implementation returns a length-one tuple with the
return value of :meth:`~Source.__getitem__`.
"""
return (self[key],)

def __len__(self) -> int:
return len(self.keys())

def __contains__(self, key: Hashable) -> bool:
return key in self.keys()

def __iter__(self) -> Generator[Hashable]:
yield from self.keys()

def _get_default_setting(self, default: Any) -> Setting:
if isinstance(default, Setting):
return default
return self.item_type(value=default)

@abstractmethod
def _load(self) -> None:
"""Implement to load settings from the source"""

@abstractmethod
def _reinit(self) -> None:
"""Implement to reinitialize the state of the source interface"""

@abstractmethod
def _get_keys(self) -> Collection:
"""Implement to return the collection of keys for a source"""

@abstractmethod
def _get_item(self, key: Hashable) -> Setting:
"""Implement to return a single item
Or raise ``KeyError`` if there is none.
"""


class WritableSource(Source):
"""Extends ``Source`` with a setter interface
By default, the :attr:`is_writable` property of a class instance is
``True``.
"""

@property
def is_writable(self) -> bool:
"""Flag whether configuration item values can be set at the source
This default implementation returns ``True``.
"""
return True

def __setitem__(self, key: Hashable, value: Setting) -> None:
"""Assign a single (exclusive) setting to the given key"""
self._ensure_writable()
self._set_item(key, value)

def __delitem__(self, key: Hashable):
"""Remove the key and all associated information from the source"""
self._ensure_writable()
self._del_item(key)

def _ensure_writable(self):
if not self.is_writable:
msg = 'Source is (presently) not writable'
raise RuntimeError(msg)

@abstractmethod
def _set_item(self, key: Hashable, value: Setting) -> None:
"""Implement to set a value for a key in a source"""

@abstractmethod
def _del_item(self, key: Hashable):
"""Implement to remove a key form the source"""


class WritableMultivalueSource(WritableSource):
def add(self, key: Hashable, value: Setting) -> None:
"""Add a additional setting under a given key
If the key is not yet known, it will be added, and the given
setting will become the first assigned value.
"""
self._ensure_writable()
self._add(key, value)

def _add(self, key: Hashable, value: Setting) -> None:
"""Internal method to add a setting to a key
This default implementation relies on retrieving any existing setting
values and then reassigning via
:meth:`WritableMultivalueSource.setall`. Reimplement to support more
efficient approaches for a given source type.
"""
if key in self:
self.setall(key, (*self.getall(key), value))
else:
self[key] = value

def setall(self, key: Hashable, values: tuple[Setting, ...]) -> None:
"""Implement to set all given values in the underlying source"""
self._ensure_writable()
self._setall(key, values)

@abstractmethod
def _setall(self, key: Hashable, values: tuple[Setting, ...]) -> None:
""" """


class CachingSource(WritableMultivalueSource):
"""Extends ``WritableSource`` with an in-memory cache
On first access of any setting the ``reinit()`` and ``load()`` methods of a
subclass are called.
On load, an implementation can use the standard ``__setitem__()`` method of
this class directly to populate the cache. Any subsequent read access is
reported directly from this cache.
Subclasses should generally reimplement ``__setitem__()`` to call the base
class implementation in addition to setting a value in the actual source.
"""

def __init__(self) -> None:
super().__init__()
self.__items: dict[Hashable, Setting | tuple[Setting, ...]] | None = None

@property
def _items(self) -> dict[Hashable, Setting | tuple[Setting, ...]]:
if self.__items is None:
self.reinit().load()
if TYPE_CHECKING:
assert self.__items is not None
return self.__items

def _reinit(self) -> None:
# particular implementations may not use this facility,
# but it is provided as a convenience. Maybe factor
# it out into a dedicated subclass even.
self.__items = {}

def __len__(self) -> int:
return len(self._items)

def _get_item(self, key: Hashable) -> Setting:
val = self._items[key]
if isinstance(val, tuple):
return val[-1]
return val

def _set_item(self, key: Hashable, value: Setting) -> None:
self._items[key] = value

def _del_item(self, key: Hashable):
del self._items[key]

def __contains__(self, key: Hashable) -> bool:
return key in self._items

def _get_keys(self) -> Collection[Hashable]:
return self._items.keys()

def _setall(self, key: Hashable, values: tuple[Setting, ...]) -> None:
self._items[key] = values

def __repr__(self) -> str:
return f'{self.__class__.__name__}({self._items!r})'

def __str__(self) -> str:
return ''.join(
(
f'{self.__class__.__name__}(',
','.join(
# we use the pristine value here to avoid issues
# with validation/coercion failures when rendering
# sources
f'{k}=({",".join(repr(val.pristine_value) for val in v)})'
if isinstance(v, tuple)
else f'{k}={v.pristine_value!r}'
for k, v in self._items.items()
),
')',
)
)

def _getall(self, key: Hashable) -> tuple[Setting, ...]:
# ok to let KeyError bubble up
val = self._items[key]
if isinstance(val, tuple):
return val
return (val,)


class InMemory(CachingSource):
"""Extends ``CachingSource`` with a no-op ``load()`` implementation
This class provides a directly usable implementation of a setting source
that manages all settings in memory only, and does not load information
from any actual source.
"""

is_writable = True

def _load(self) -> None:
"""Does nothing
An instance of :class:`InMemory` has no underlying source
to load from.
"""

def __str__(self):
return f'{self.__class__.__name__}'
Empty file.
53 changes: 53 additions & 0 deletions datasalad/settings/tests/test_defaults.py
@@ -0,0 +1,53 @@
import logging
import sys
from os.path import dirname

from ..defaults import Defaults
from ..setting import Setting


def test_defaultsrc(caplog):
d = Defaults()
assert str(d) == 'Defaults'

# smoke test NO-OP method
d.load()

target_key = 'some.key'
orig_value = 'mike'
updated_value = 'allnew'

assert target_key not in d
assert d.get(target_key, 'default').value == 'default'
assert d.get(target_key, Setting('default2')).value == 'default2'
d[target_key] = Setting(orig_value)
assert d[target_key].value == orig_value
assert 'Resetting' not in caplog.text
with caplog.at_level(logging.DEBUG):
# we get a debug message when a default is reset
d[target_key] = Setting(updated_value)
assert 'Resetting' in caplog.text
assert d[target_key].value == updated_value
del d[target_key]
assert target_key not in d

d[target_key] = Setting(orig_value)
assert len(d) == 1
d.reinit()
assert target_key not in d
assert len(d) == 0


def test_defaultsrc_dynamic():
d = Defaults()
target_key = 'some.key'
dynset = Setting(
lambda: sys.executable,
coercer=dirname,
lazy=True,
)
assert dynset.value == dirname(sys.executable)

d[target_key] = dynset
item = d[target_key]
assert item.value == dirname(sys.executable)
142 changes: 142 additions & 0 deletions datasalad/settings/tests/test_env.py
@@ -0,0 +1,142 @@
from os import (
environ,
)
from os import (
name as os_name,
)
from typing import Hashable
from unittest.mock import patch

import pytest

from ..env import Environment
from ..setting import Setting


def test_envsrc():
assert str(Environment()) == 'Environment'
assert str(Environment(var_prefix='DATALAD_')) == 'Environment[DATALAD_]'
assert repr(Environment()) == 'Environment()'

# smoke test NO-OP methods
env = Environment()
env.reinit().load()


def test_envsrc_illegal_keys():
env = Environment()
# prevent any accidental modification
with patch.dict(environ, {}):
with pytest.raises(ValueError, match='illegal'):
env['mustnothave=char'] = 'some'
with pytest.raises(ValueError, match='illegal'):
env['mustnothave\0char'] = 'some'


# traditional datalad name transformation approach
class DataladLikeEnvironment(Environment):
def get_key_from_varname(self, name: str) -> Hashable:
return name.replace('__', '-').replace('_', '.').casefold()

def get_varname_from_key(self, key: Hashable) -> str:
# note that this is not actually a real inverse transform
return str(key).replace('.', '_').replace('-', '__').upper()


def test_envsrc_get(monkeypatch):
target_key = 'datalad.chunky-monkey.feedback'
target_value = 'ohmnomnom'
absurd_must_be_absent_key = 'nobody.would.use.such.a.key'
with monkeypatch.context() as m:
m.setenv('DATALAD_CHUNKY__MONKEY_FEEDBACK', 'ohmnomnom')
env = DataladLikeEnvironment(var_prefix='DATALAD_')
assert target_key in env.keys() # noqa: SIM118
assert target_key in env
assert env.get(target_key).value == target_value
# default is wrapped into Setting if needed
assert env.get(absurd_must_be_absent_key, target_value).value is target_value
assert (
env.get(absurd_must_be_absent_key, Setting(value=target_value)).value
is target_value
)
# assert env.getvalue(target_key) == target_value
# assert env.getvalue(absurd_must_be_absent_key) is None
assert len(env)


def test_envsrc_ambiguous_keys(monkeypatch, caplog):
target_key = 'datalad.chunky-monkey.feedback'
target_value = 'ohmnomnom'
with monkeypatch.context() as m:
# define two different setting that map on the same key
# with datalad's mapping rules
m.setenv('DATALAD_CHUNKY__monkey_FEEDBACK', 'würg')
m.setenv('DATALAD_CHUNKY__MONKEY_FEEDBACK', 'ohmnomnom')
env = DataladLikeEnvironment(var_prefix='DATALAD_')
# we still get the key's value
assert env[target_key].value == target_value
# negative test to make the next one count
assert 'map on identical' not in caplog.text
assert env.keys() == {target_key}
# we saw a log message complaining about the ambiguous
# key
if os_name not in ('os2', 'nt'):
# not testing on platforms where Python handles vars
# in case insensitive manner
assert (
'Ambiguous ENV variables map on identical keys: '
"{'datalad.chunky-monkey.feedback': "
"['DATALAD_CHUNKY__MONKEY_FEEDBACK', "
"'DATALAD_CHUNKY__monkey_FEEDBACK']}"
) in caplog.text


def test_envsrc_set():
env = Environment()

with patch.dict(environ, {}):
env['some.key'] = Setting(value='mike')
assert 'some.key' in env

# the instance is stateless, restoring the original
# env removes any knowledge of the key
assert 'some.key' not in env


def test_envsrc_del():
env = Environment()

with patch.dict(environ, {}):
env['some.key'] = Setting(value='mike')
assert 'some.key' in env
del env['some.key']
assert 'some.key' not in env

# the instance is stateless, restoring the original
# env removes any knowledge of the key
assert 'some.key' not in env


def test_envsrc_set_matching_transformed():
env = DataladLikeEnvironment(var_prefix='DATALAD_')
env_name = 'DATALAD_SOME_KEY'
orig_value = 'mike'
updated_value = 'allnew'

with patch.dict(environ, {env_name: orig_value}):
assert 'datalad.some.key' in env
assert env['datalad.some.key'].value == orig_value
env['datalad.some.key'] = Setting(updated_value)
# the new value is set for the inverse-transformed
# variable name
assert environ.get(env_name) == updated_value


def test_envsrc_lowercase_keys():
with patch.dict(environ, {}):
env = Environment(var_prefix='myapp_')
env['myapp_conf'] = Setting(123, coercer=str)
assert (
env.keys() == {'MYAPP_CONF'} if os_name in ('os2', 'nt') else {'myapp_conf'}
)
assert env['myapp_conf'].value == '123'
37 changes: 37 additions & 0 deletions datasalad/settings/tests/test_setting.py
@@ -0,0 +1,37 @@
import pytest

from ..setting import Setting


def test_setting():
with pytest.raises(ValueError, match='callable required'):
Setting(5, lazy=True)

test_val = 5
item = Setting(lambda: test_val, lazy=True)
assert item.is_lazy is True
assert item.value == test_val

assert 'lambda' in str(item)

test_val = 4
item.update(Setting(str(test_val), coercer=int))
assert item.is_lazy is False
assert item.value == test_val

item.update(Setting(coercer=float))
assert item.value == float(test_val)


def test_setting_derived_copy():
class MySetting(Setting):
def __init__(self, allnew: str):
self.allnew = allnew

target = 'dummy'
ms = MySetting(target)
ms_c = ms.copy()
assert ms_c.allnew == target

# __eq__ considers the derived type and rejects
assert ms != Setting(target)
66 changes: 66 additions & 0 deletions datasalad/settings/tests/test_settings.py
@@ -0,0 +1,66 @@
import sys

import pytest

from ..defaults import Defaults
from ..setting import Setting
from ..settings import Settings
from ..source import InMemory


def test_settings():
man = Settings(
{
'mem1': InMemory(),
'mem2': InMemory(),
'defaults': Defaults(),
}
)

assert list(man.sources.keys()) == ['mem1', 'mem2', 'defaults']
assert len(man) == 0
target_key = 'some.key'
assert target_key not in man
with pytest.raises(KeyError):
man[target_key]

man.sources['defaults'][target_key] = Setting('0', coercer=int)
assert man[target_key].value == 0

man.sources['mem2'][target_key] = Setting('1', coercer=float)
man.sources['mem1'][target_key] = Setting('2')

coerced_target = 2.0
item = man[target_key]
assert item.value == coerced_target
assert item.coercer == float

vals = man.getall(target_key)
assert isinstance(vals, tuple)
# one per source here
# TODO: enhance test case to have a multi-value setting in a single source
nsources = 3
assert len(vals) == nsources
assert [v.value for v in vals] == [0, 1.0, '2']

vals = man.getall('idonotexist')
assert isinstance(vals, tuple)
assert vals == (Setting(None),)

vals = man.getall('idonotexist', Setting(True))
assert isinstance(vals, tuple)
assert vals == (Setting(True),)

assert man.get('idonotexist').value is None
assert (
man.get(
'idonotexist',
# makes little actual sense, but exercises a lazy
# default setting
Setting(
lambda: sys.executable,
lazy=True,
),
).value
is sys.executable
)
100 changes: 100 additions & 0 deletions datasalad/settings/tests/test_source.py
@@ -0,0 +1,100 @@
import pytest

from ..setting import Setting
from ..source import (
CachingSource,
InMemory,
Source,
WritableSource,
)


class DummyCachingSource(CachingSource):
def _load(self):
pass


def test_inmemorysrc():
mem = InMemory()
assert str(mem) == 'InMemory'

target_key = 'dummy'
mem[target_key] = Setting('dummy')
assert mem.getall('dummy') == (Setting('dummy'),)
assert str(InMemory()) == 'InMemory'


def test_cachingsource():
ds = DummyCachingSource()
ds['mike'] = Setting('one')
assert ds['mike'] == Setting('one')
assert ds.get('mike') == Setting('one')
assert str(ds) == "DummyCachingSource(mike='one')"
assert repr(ds) == (
'DummyCachingSource(' "{'mike': Setting('one', coercer=None, lazy=False)})"
)

ds.add('mike', Setting('two'))
assert ds['mike'].value == 'two'
assert ds.get('mike').value == 'two'
assert ds.getall('mike') == (Setting('one'), Setting('two'))

assert ds.getall('nothere') == (Setting(None),)
assert ds.getall('nothere', Setting(True)) == (Setting(True),)

ds.add('notherebefore', Setting('butnow'))
assert ds['notherebefore'].value == 'butnow'


def test_settings_base_default_methods():
class DummySource(Source):
def _load(self): # pragma: no cover
pass

def _reinit(self): # pragma: no cover
pass

def _get_item(self, key): # pragma: no cover
return Setting(f'key_{key}')

def _get_keys(self):
return {'mykey', 'plain', 'tuple'}

src = DummySource()
assert 'mykey' in src
# smoke test for __iter__
assert set(src) == src.keys()

assert not src.is_writable

assert src.get('plain').value == 'key_plain'
assert src.getall('plain') == (Setting('key_plain'),)


def test_settings_writable_not_writable():
class DummySource(WritableSource):
@property
def is_writable(self):
return False

def _load(self): # pragma: no cover
pass

def _reinit(self): # pragma: no cover
pass

def _get_item(self, key): # pragma: no cover
raise NotImplementedError

def _get_keys(self):
raise NotImplementedError

def _set_item(self):
raise NotImplementedError

def _del_item(self):
raise NotImplementedError

src = DummySource()
with pytest.raises(RuntimeError):
src['dummy'] = Setting('irrelevant')
1 change: 1 addition & 0 deletions docs/index.rst
@@ -78,6 +78,7 @@ Also see the :ref:`modindex`.
runners
iterable_subprocess
itertools
settings
Why ``datasalad``?
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -132,7 +132,7 @@ extra-dependencies = [
]
[tool.hatch.envs.cz.scripts]
check-commits = [
# check all commit messages since we switched to convential commits
# check all commit messages since we switched to conventional commits
# only (no merge commits also)
"cz check --rev-range a518855b7e2a08d8a5ba6a36070425457271857b..HEAD",
]
