Side effect apis + registry #3876

dmadisetti · 2025-02-21T22:51:12Z

Description

The discussion in #3270 (@leventov) highlighted that the current cache model is sensitive to non-obvious side effects (e.g. a cache miss in one block not invalidating another block that depends on it).

A simple case follows:

i = random()
with mo.persistent_cache("make_dependent"):
    dependent = UnserializableObj(i)

with mo.persistent_cache("use_dependent"):
    result = dependent.action()

Here, result will always be the same, because use_dependent resolves dependent by its execution path hash, which does not change between runs. At the very least, a hash miss in make_dependent should invalidate use_dependent.

Let's have persistent_cache create a side_effect entry associated with a cell. ExecutionPath hashes will now also consume the side effect data created during the execution of the relevant block.

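class _cache_context():
    # ...
    def __exit__(self, ...):
        # ...
        # Record this block's cache hash as a side effect of its cell.
        context.side_effect_registry.register(cell_id, cache.hash)

def hash_and_dequeue_execution_refs(self, ...):
    # ...
    side_effects = set()
    for ancestor_id in to_hash:
        # Assumes a `get` accessor on the registry that returns a set.
        side_effects |= context.side_effect_registry.get(ancestor_id)
    # ...
    # Assuming hash_alg is a hashlib object, update() needs bytes rather
    # than a set, so feed the hashes in a stable (sorted) order.
    for side_effect in sorted(side_effects):
        self.hash_alg.update(side_effect)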
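
As a rough, self-contained illustration of the idea above (not marimo's actual implementation), the sketch below folds side-effect hashes recorded for ancestor cells into a downstream hash. The names `registry`, `execution_hash`, and `cell_a` are hypothetical stand-ins, and SHA-256 is an arbitrary choice for the example.

```python
import hashlib

# Hypothetical stand-in for context.side_effect_registry: cell id -> hashes.
registry: dict[str, set[bytes]] = {}


def execution_hash(code: str, ancestors: list[str]) -> str:
    """Hash a block's code plus any side effects recorded by its ancestors."""
    hash_alg = hashlib.sha256(code.encode())
    side_effects: set[bytes] = set()
    for ancestor_id in ancestors:
        side_effects |= registry.get(ancestor_id, set())
    # Sets are unordered, so sort before hashing to get a stable digest.
    for side_effect in sorted(side_effects):
        hash_alg.update(side_effect)
    return hash_alg.hexdigest()


code = "result = dependent.action()"
before = execution_hash(code, ["cell_a"])

# Simulate make_dependent missing its cache and registering its new hash.
registry.setdefault("cell_a", set()).add(hashlib.sha256(b"i=0.42").digest())
after = execution_hash(code, ["cell_a"])

# The downstream key now differs, so use_dependent would be recomputed.
print(before != after)
```

With no registered side effects the key depends only on the code, which mirrors the stale-result problem described above; once the upstream block registers a hash, the downstream key moves with it.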

Suggested solution

There are a couple other places a side effect registry could come into play:

marimo.random: could ensure that the call is consistent with the notebook, so that running the same cell gives you the same random number (the PRNG increment depends on the total number of calls up to that point in the DAG, and a cache skip would be associated with a jump ahead). As a side note, I think JAX has a pretty nice model: https://docs.jax.dev/en/latest/random-numbers.html

marimo.cache_timeout: The reason I thought of this again. A user on Discord reported they would like their SQL queries to be cached until some timeout. This could be done through some form of cache cleanup, but issuing a side effect is a neat way to achieve this as well. Conversely, mo.sql calls could issue a side effect behind the scenes.

marimo.fetch: Get a network value. Could also have a time expiry

marimo.file: file change watcher issues a side effect

marimo.env: Environmental variable access
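
To make the list above a bit more concrete, here is a minimal sketch of how a few of these side effects might reduce to a state hash. The helper names (`timeout_state`, `file_state`, `env_state`) are hypothetical and not part of marimo; the timeout bucket follows the `(now() - original_time) // time_interval` idea, while hashing file contents and environment values is just one plausible choice.

```python
import hashlib
import os
import tempfile
import time
from typing import Optional


def timeout_state(original_time: float, time_interval: float,
                  now: Optional[float] = None) -> bytes:
    """Bucket index that changes once every `time_interval` seconds."""
    now = time.time() if now is None else now
    bucket = int((now - original_time) // time_interval)
    return hashlib.sha256(f"timeout:{bucket}".encode()).digest()


def file_state(path: str) -> bytes:
    """Digest of the file contents; a watcher could recompute it on change."""
    with open(path, "rb") as f:
        return hashlib.sha256(b"file:" + f.read()).digest()


def env_state(name: str) -> bytes:
    """Digest of an environment variable, distinguishing unset from empty."""
    value = os.environ.get(name)
    marker = b"unset" if value is None else b"set:" + value.encode()
    return hashlib.sha256(b"env:" + name.encode() + b"=" + marker).digest()


# Same bucket within the interval, new bucket once it has elapsed.
same = timeout_state(0.0, 60.0, now=10.0) == timeout_state(0.0, 60.0, now=59.0)
expired = timeout_state(0.0, 60.0, now=10.0) != timeout_state(0.0, 60.0, now=61.0)

# Rewriting a file changes its state hash.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as f:
        f.write("v1")
    first = file_state(path)
    with open(path, "w") as f:
        f.write("v2")
    changed = file_state(path) != first

print(same, expired, changed)
```

Each helper returns bytes, so the result could be passed straight to a registry's `register` call as the `state_hash`.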

These could also all be namespaced under sideeffects:

from marimo.sideeffects import random, fetch, file, env, timeout

cat marimo/_runtime/side_effects.py

class SideEffectRegistry:
    def __init__(self) -> None:
        # Maps each cell to the state hashes of the side effects it issued.
        self.namespaces: dict[CellId_t, set[bytes]] = {}

    def register(self, cell_id: CellId_t, state_hash: bytes) -> None:
        # state_hash is somehow tied to the "side effect".
        # For instance, in random, it might be the random state;
        # for timeout, it might be (now() - original_time) // time_interval.
        if cell_id not in self.namespaces:
            self.namespaces[cell_id] = set()
        self.namespaces[cell_id].add(state_hash)

    def get(self, cell_id: CellId_t) -> set[bytes]:
        # Accessor used when hashing; cells without side effects yield an empty set.
        return self.namespaces.get(cell_id, set())

    def delete(self, cell_id: CellId_t) -> None:
        """Called when cells get cleaned up in CellRunnerKernel"""
        if cell_id in self.namespaces:
            del self.namespaces[cell_id]
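
A hedged usage sketch, tying the registry to the "random state" example from the comments: the block below uses a trimmed, standalone copy of the registry with a small `get` accessor and plain `str` cell ids, plus a hypothetical `CountedRandom` wrapper around Python's `random.Random`. The jump-ahead here simply discards draws; it is an assumption about how a cache skip could be replayed, not a description of marimo's behavior.

```python
import hashlib
import random


class SideEffectRegistry:
    """Trimmed, standalone copy of the registry sketch (str cell ids)."""

    def __init__(self) -> None:
        self.namespaces: dict[str, set[bytes]] = {}

    def register(self, cell_id: str, state_hash: bytes) -> None:
        self.namespaces.setdefault(cell_id, set()).add(state_hash)

    def get(self, cell_id: str) -> set[bytes]:
        return self.namespaces.get(cell_id, set())

    def delete(self, cell_id: str) -> None:
        self.namespaces.pop(cell_id, None)


class CountedRandom:
    """Seeded PRNG that counts calls, so a cache hit can jump ahead."""

    def __init__(self, seed: int) -> None:
        self.seed = seed
        self.calls = 0
        self._rng = random.Random(seed)

    def random(self) -> float:
        self.calls += 1
        return self._rng.random()

    def jump(self, n: int) -> None:
        # Discard n draws, as if the skipped block had actually executed.
        for _ in range(n):
            self.random()

    def state_hash(self) -> bytes:
        return hashlib.sha256(f"random:{self.seed}:{self.calls}".encode()).digest()


registry = SideEffectRegistry()

# Full run: cell_a draws twice, then a later cell draws once.
full = CountedRandom(seed=0)
full.random()
full.random()
registry.register("cell_a", full.state_hash())
later_full = full.random()

# Cached run: cell_a is skipped, so jump ahead by its two recorded draws.
cached = CountedRandom(seed=0)
cached.jump(2)
matches = cached.state_hash() in registry.get("cell_a")
later_cached = cached.random()

print(later_full == later_cached, matches)
```

Because the state hash is derived from the seed and the call count, a skipped block and an executed block leave the generator in the same registered state, and later draws agree.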

Alternative

The side effect registry lives just behind the scenes, but exposing some of the functionality through our API should allow for more reliable usage of cache invalidation.

Additional context

I have a branch from a few weeks back where I started casually hacking on this.

For the caching timeline I see:

base registry + maybe some of these "extras"
data "adapters" (i.e. read from Redis or buckets as opposed to just vanilla file systems)
async/sync executor + dataflow.Runner refactor
cell level caching executor

as completing the first bit of this story. A full white paper should go on to examine:

mandala
diskcache
joblib.Memory
functools.cache

comparisons where applicable


dmadisetti added the enhancement New feature or request label Feb 21, 2025
