Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 2 additions & 5 deletions narwhals/_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,11 +73,7 @@
CompliantSeriesT,
NativeSeriesT_co,
)
from narwhals._compliant.typing import (
EvalNames,
NativeDataFrameT,
NativeLazyFrameT,
)
from narwhals._compliant.typing import EvalNames, NativeDataFrameT, NativeLazyFrameT
from narwhals._namespace import (
Namespace,
_NativeArrow,
Expand Down Expand Up @@ -2067,6 +2063,7 @@ def deep_getattr(obj: Any, name_1: str, *nested: str) -> Any:
"""Perform a nested attribute lookup on `obj`."""
return deep_attrgetter(name_1, *nested)(obj)


class Compliant(
_StoresNative[NativeT_co], _StoresImplementation, Protocol[NativeT_co]
): ...
Expand Down
110 changes: 110 additions & 0 deletions narwhals/plugins.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
from __future__ import annotations

import sys
from functools import cache
from typing import TYPE_CHECKING, Any, Protocol, cast

from narwhals._compliant import CompliantNamespace
from narwhals._typing_compat import TypeVar

if TYPE_CHECKING:
from collections.abc import Iterator
from importlib.metadata import EntryPoints

from typing_extensions import LiteralString, TypeAlias

from narwhals._compliant.typing import (
CompliantDataFrameAny,
CompliantFrameAny,
CompliantLazyFrameAny,
CompliantSeriesAny,
)
from narwhals.utils import Version


Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also going to use this review to talk myself through the code and check my understanding with you, hope that's ok.

is __all__ below for protecting wildcard imports? i.e. stopping non-public ones getting imported too? (commenting out line doesn't break code)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also going to use this review to talk myself through the code and check my understanding with you, hope that's ok.

No problem at __all__ 😉

is __all__ below for protecting wildcard imports? i.e. stopping non-public ones getting imported too? (commenting out line doesn't break code)

Yeah that is one of the reasons

The rest is documented in: https://typing.python.org/en/latest/spec/distributing.html#library-interface-public-and-private-symbols

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dangotbanned, can I check I understand correctly: having the Plugin protocol with the two top-level functions means that if future plugin authors don't read our write-up (it's cooking..) on how to create a plugin & put different function at the interface to their plugin, typing will shout at them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

& put different function at the interface to their plugin, typing will shout at them?

I wish! 😄

No, sadly, the only user-facing reason for it is documentation.
E.g. we say something like:

The narwhals.plugins.Plugin protocol defines what we expect to find in <plugin_package_name>.__init__.py

Internally though it is what give us typing from https://github.com/narwhals-dev/narwhals/blob/7bcec6199fe83ef2ddec9c76786c9c3c49612a1f/narwhals/plugins.py#L84

        plugin: Plugin = entry_point.load()

something_undefined doesn't exist on Plugin, so we're warned when we use it 👍

image

Without that, everything is Any, so we get no help on this 👎

image

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯 ohhh I think you've just brought my understanding of Protocols forward by a biig jump. Thanks so much!! I read documentation/tutorials but they often remain surface.

__all__ = ["Plugin", "from_native"]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ll.27-39: here we're setting up the typing groundwork so that we can work with objects from plugins.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I don't know why we need both a covariant (FromNativeR_co )and invariant version (FrameT) of Frame, if that's what's going on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I don't know why we need both a covariant (FromNativeR_co )and invariant version (FrameT) of Frame, if that's what's going on?

Ahhhh the variance part is related, but don't worry about that for now!

I've written too much lately about this 😭

These have some breadcrumbs for now:


This is the meat of the question:

why we need both FromNativeR_co and FrameT?

  • CompliantNamespace introduces the Frame requirement, but what about Series?
  • {Eager,Lazy}Namespace have more complete definitions
    • From there, we want the return type(s) of from_native

PluginNamespace therefore says:

For however CompliantNamespace uses it's first TypeVar:

  • I accept any kind of compliant frame
  • Which defaults to both eager and lazy
FrameT = TypeVar(
    "FrameT",
    bound="CompliantFrameAny",
    default="CompliantDataFrameAny | CompliantLazyFrameAny",
)
class PluginNamespace(CompliantNamespace[FrameT, Any])

Additionally, here's what I know about the return of my from_native method:

  • It could be any compliant frame, or series
  • Which defaults to everything
CompliantAny: TypeAlias = (
    "CompliantDataFrameAny | CompliantLazyFrameAny | CompliantSeriesAny"
)
"""A statically-unknown, Compliant object originating from a plugin."""

FromNativeR_co = TypeVar(
    "FromNativeR_co", bound=CompliantAny, covariant=True, default=CompliantAny
)

class PluginNamespace(CompliantNamespace[FrameT, Any], Protocol[FrameT, FromNativeR_co]):
    def from_native(self, data: Any, /) -> FromNativeR_co: ...

Important

These TypeVars happen to be the same for lazy-only backends
But pandas, polars, pyarrow have more funky stuff

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woa.. yeah it's a lot.. I will trust the process, I think this is going to be a long-term project for me to first absorb this knowledge then learn to apply it!

CompliantAny: TypeAlias = (
"CompliantDataFrameAny | CompliantLazyFrameAny | CompliantSeriesAny"
)
"""A statically-unknown, Compliant object originating from a plugin."""

FrameT = TypeVar(
"FrameT",
bound="CompliantFrameAny",
default="CompliantDataFrameAny | CompliantLazyFrameAny",
)
FromNativeR_co = TypeVar(
"FromNativeR_co", bound=CompliantAny, covariant=True, default=CompliantAny
)


@cache
def _discover_entrypoints() -> EntryPoints:
from importlib.metadata import entry_points as eps

group = "narwhals.plugins"
if sys.version_info < (3, 10):
return cast("EntryPoints", eps().get(group, ()))
return eps(group=group)


class PluginNamespace(CompliantNamespace[FrameT, Any], Protocol[FrameT, FromNativeR_co]):
def from_native(self, data: Any, /) -> FromNativeR_co: ...


class Plugin(Protocol[FrameT, FromNativeR_co]):
NATIVE_PACKAGE: LiteralString

def __narwhals_namespace__(
self, version: Version
) -> PluginNamespace[FrameT, FromNativeR_co]: ...
def is_native(self, native_object: object, /) -> bool: ...


Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ll.65-79, familiar territory here, all good

@cache
def _might_be(cls: type, type_: str) -> bool:
try:
return any(type_ in o.__module__.split(".") for o in cls.mro())
except TypeError:
return False


def _is_native_plugin(native_object: Any, plugin: Plugin) -> bool:
pkg = plugin.NATIVE_PACKAGE
return (
sys.modules.get(pkg) is not None
and _might_be(type(native_object), pkg) # type: ignore[arg-type]
and plugin.is_native(native_object)
)


def _iter_from_native(native_object: Any, version: Version) -> Iterator[CompliantAny]:
for entry_point in _discover_entrypoints():
plugin: Plugin = entry_point.load()
if _is_native_plugin(native_object, plugin):
compliant_namespace = plugin.__narwhals_namespace__(version=version)
yield compliant_namespace.from_native(native_object)
Comment on lines +82 to +87
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ym-pett this is related to (#2 (comment)), and (#2 (comment))

Here is the existing section of _from_native_impl, but updated to use the new references.

I've added some comments which are mainly focused on the Iterator and None parts

Important

None can mean 3 quite different things here

def _from_native_impl(
    native_object: Any,
    *,
    pass_through: bool = False,
    eager_only: bool = False,
    eager_or_interchange_only: bool,
    series_only: bool,
    allow_series: bool | None,
    version: Version,
) -> Any | None:
    from narwhals.translate import _translate_if_compliant

    for entry_point in _discover_entrypoints():
        plugin = entry_point.load()
        if _is_native_plugin(native_object, plugin):
            compliant_namespace = plugin.__narwhals_namespace__(version=version)
            compliant_object = compliant_namespace.from_native(native_object)
            translate_result = _translate_if_compliant(
                compliant_object,
                pass_through=pass_through,
                eager_only=eager_only,
                eager_or_interchange_only=eager_or_interchange_only,
                series_only=series_only,
                allow_series=allow_series,
                version=version,
            )
            # `None` means *all of*:
            # - We have at least 1 plugin
            # - A plugin **said they could** handle `native_object`
            # - The plugin also had `CompliantNamespace.from_native`
            #   - But what that returned didn't implement `__narwhals_{dataframe,lazyframe,series}__`
            # - The caller of `nw.from_native` used `pass_through=True`
            return translate_result  # noqa: RET504

    # `None` means *either*:
    # (No plugins)
    #   - `_discover_entrypoints()` returned an empty iterable
    # (Has plugin(s))
    #   - But **couldn't** handle `native_object`
    #   - Every plugin returned `False` on `_is_native_plugin`
    return None

Copy link
Owner

@ym-pett ym-pett Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouch, so that's actually not great, as we won't know why we're not managing to read our plugin!

I'll need to have a think about how the situation is with the new version 🤔



def from_native(native_object: Any, version: Version) -> CompliantAny | None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_native - this is the function we'll actually call in translate. At first I was confused by the name _iter_from_native, but now I get it 😅 as that's what actually iterates! (or rather the generator it outputs can be iterated over with next.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the return statement, does next(_iter_from_native(native_object, version), None) mean that it'll try to iterate, but if there's no plugin, it'll return None?

Copy link
Collaborator Author

@dangotbanned dangotbanned Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the return statement, does next(_iter_from_native(native_object, version), None) mean that it'll try to iterate, but if there's no plugin, it'll return None?

Almost 😅

Try experimenting/extending/fixing this

Show playground

E.g. in an interactive session, try passing any of plugins_* into the functions and see what happens 🙂

from __future__ import annotations

from typing import TYPE_CHECKING, Any, TypedDict

# ruff: noqa: T201

if TYPE_CHECKING:
    from collections.abc import Iterable, Iterator, Sequence


class PluginMock(TypedDict):
    name: str
    is_native_plugin: bool


ok_1 = PluginMock(name="ok_1", is_native_plugin=True)
ok_2 = PluginMock(name="ok_2", is_native_plugin=True)
not_ok = PluginMock(name="not_ok", is_native_plugin=False)

plugins_1 = []
plugins_2 = [ok_1]
plugins_3 = [not_ok]
plugins_4 = [not_ok, ok_1]
plugins_5 = [ok_1, not_ok]
plugins_6 = [ok_1, ok_2]
plugins_7 = [ok_2, ok_1]


def example_iterator(loaded_entrypoints: Iterable[PluginMock]) -> Iterator[PluginMock]:
    for plugin in loaded_entrypoints:
        name = plugin["name"]
        is_native_plugin = plugin["is_native_plugin"]
        print(f"Checking {name!r} ...")
        if is_native_plugin:
            print(f"  Success: {name!r}")
            yield plugin
        else:
            print(f"  Fail: {name!r}")
    print("Exhausted 🥵")


def consumers(plugins: Sequence[PluginMock]) -> Any:
    a = example_iterator(plugins)
    b = next(example_iterator(plugins))
    c = next(example_iterator(plugins), "default?")
    d = list(example_iterator(plugins))
    e_it = example_iterator(plugins)
    e_1 = next(e_it)
    e_2 = next(e_it, "another default?")
    f, *g = example_iterator(plugins)
    return a, b, c, d, e_1, e_2, f, g

Relating back to (#2 (comment)) ...

What we know for sure is that None means we didn't find a plugin that could handle native_object

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nicee! will have a play now!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhh wow this is soooo weird. It's making my eyes bleed and I needed print statements; still don't quite understand why it's happening, but at least I can see it happening:

from plugins5 onwards things get weird, for b & c we don't seem to iterate over the full list of plugins? So as long as the first plugin is ok, it stops iterating? I need to still figure out why though 😅

plugin_4 output image
plugin_5 output image

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps: learned about extended iterable unpacking in the process :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from plugins_5 onwards things get weird, for b & c we don't seem to iterate over the full list of plugins?
So as long as the first plugin is ok, it stops iterating?
I need to still figure out why though

@ym-pett here's some docs links that may be helpful, in order of accessibility 🙂:

The key parts in (#2 (comment)) are:

  • It only yields when is_native_plugin == True
  • How many times that happens will depend on both
    • The way you consume the iterator
    • How many items pass that condition

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at (#2 (comment))

we don't need to change how this works any time soon 😅
But I'm curious if this description sparks any ideas on potential issues it may have?

Say for example we had 2 plugins (P1, P2) which support native_object.
Currently, P2 will never be used - because we always use the first (P1).

I think that's a bit of an issue alone.
Now, consider that P1 could still later fail during the call to _translate_if_compliant. Our problem is then that we have no means to backtrack and try any other plugin - because we bet on the first working 😳.

By using an Iterator, we could instead do something like this (later, not in this PR 😅)

from __future__ import annotations

from typing import TYPE_CHECKING, Any, TypedDict

# ruff: noqa: T201

if TYPE_CHECKING:
    from collections.abc import Iterable, Iterator, Sequence


class PluginMock(TypedDict):
    name: str
    is_native_plugin: bool


ok_1 = PluginMock(name="ok_1", is_native_plugin=True)
ok_2 = PluginMock(name="ok_2", is_native_plugin=True)
not_ok = PluginMock(name="not_ok", is_native_plugin=False)


def example_iterator(loaded_entrypoints: Iterable[PluginMock]) -> Iterator[PluginMock]:
    for plugin in loaded_entrypoints:
        name = plugin["name"]
        is_native_plugin = plugin["is_native_plugin"]
        print(f"Checking {name!r} ...")
        if is_native_plugin:
            print(f"  Success: {name!r}")
            yield plugin
        else:
            print(f"  Fail: {name!r}")
    print("Exhausted 🥵")


def translate_if_compliant(thing: Any) -> Any:
    if thing is ok_1:
        return None
    return thing


def from_native() -> Any:
    """Super verbose control-flow to make the example clearer."""
    plugins_8 = [ok_1, not_ok, ok_2]
    it = example_iterator(plugins_8)
    first = next(it, None)
    if first is not None:
        first_result = translate_if_compliant(first)
        if first_result is not None:
            return first_result
        second = next(it, None)
        if second is not None:
            second_result = translate_if_compliant(second)
            if second_result is not None:
                return second_result
    return None

Copy link
Owner

@ym-pett ym-pett Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you so much for this @dangotbanned !! Sorry I seem to have forgot to hit 'comment' again! 🤦

I need to read the links you've given me & play with this!

"""Attempt to convert `native_object` to a Compliant object, using any available plugin(s).
Arguments:
native_object: Raw object from user.
version: Narwhals API version.
Returns:
If the following conditions are met
- at least 1 plugin is installed
- at least 1 installed plugin supports `type(native_object)`
Then for the **first matching plugin**, the result of the call below.
This *should* be an object accepted by a Narwhals Dataframe, Lazyframe, or Series:
plugin: Plugin
plugin.__narwhals_namespace__(version).from_native(native_object)
In all other cases, `None` is returned instead.
Comment on lines +97 to +108
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to describe the current behavior, but I'm not saying it should be the end goal.

(more qualifying) we don't need to change how this works any time soon 😅
But I'm curious if this description sparks any ideas on potential issues it may have?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I sense a rhetorical question, but I don't quite see far enough. I wondered about the line

compliant_object = plugins.from_native(native_object, version) in translate.py.

But it tries this and if it returns None, carries on with the rest of translate. (I tried it by uninstalling nw-daft & running some nw example with pandas). So that's not a problem..

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, I think the examples worked, I'm a little embarrassed now that I didn't see the problem first time round: we only get a call to the plugin.__narwhals_namespace with the first matching plugin, if I've got this right, it won't care about the next plugin after that, even if it had the necessary functions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to be embarrassed!

if I've got this right, it won't care about the next plugin after that, even if it had the necessary functions?

💯

"""
return next(_iter_from_native(native_object, version), None)
Empty file removed narwhals/plugins/__init__.py
Empty file.
47 changes: 0 additions & 47 deletions narwhals/plugins/_utils.py

This file was deleted.

27 changes: 12 additions & 15 deletions narwhals/translate.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
from functools import wraps
from typing import TYPE_CHECKING, Any, Callable, Literal, TypeVar, overload

from narwhals import plugins
from narwhals._constants import EPOCH, MS_PER_SECOND
from narwhals._namespace import (
is_native_arrow,
Expand Down Expand Up @@ -35,7 +36,6 @@
is_pyarrow_scalar,
is_pyarrow_table,
)
from narwhals.plugins._utils import _is_native_plugin, discover_entrypoints

if TYPE_CHECKING:
from narwhals.dataframe import DataFrame, LazyFrame
Expand Down Expand Up @@ -564,20 +564,17 @@ def _from_native_impl( # noqa: C901, PLR0911, PLR0912, PLR0915
raise TypeError(msg)
return Version.V1.dataframe(InterchangeFrame(native_object), level="interchange")

for entry_point in discover_entrypoints():
plugin = entry_point.load()
if _is_native_plugin(native_object, plugin):
compliant_namespace = plugin.__narwhals_namespace__(version=version)
compliant_object = compliant_namespace.from_native(native_object)
return _translate_if_compliant(
compliant_object,
pass_through=pass_through,
eager_only=eager_only,
eager_or_interchange_only=eager_or_interchange_only,
series_only=series_only,
allow_series=allow_series,
version=version,
)
compliant_object = plugins.from_native(native_object, version)
if compliant_object is not None:
return _translate_if_compliant(
compliant_object,
pass_through=pass_through,
eager_only=eager_only,
eager_or_interchange_only=eager_or_interchange_only,
series_only=series_only,
allow_series=allow_series,
version=version,
)

if not pass_through:
msg = f"Expected pandas-like dataframe, Polars dataframe, or Polars lazyframe, got: {type(native_object)}"
Expand Down
Loading