Contributor

@ym-pett ym-pett commented Aug 12, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

started trying to adapt nw so that it can read daft dataframes, but I might be barking up the wrong tree: I installed daft in the repo so that it would be available to nw. I have a feeling we want to avoid that, and I'm working as if I were adding an actual _daft module rather than prep for a plugin.

note to self: if installing daft was correct, need to add this to an install file, think it's pyproject.toml

@dangotbanned
Member

@ym-pett did you know about Marco's PR?

Apologies if you've discussed this privately 😅

@MarcoGorelli
Member

MarcoGorelli commented Aug 12, 2025

thanks for starting this!

😄 @dangotbanned yeah we'd discussed this

My hope is that we can use Daft as the way to develop the plugin system, and it can serve as a reference implementation

Here's what we're aiming for:

If a user has narwhals-daft installed, then they should be able to run

import narwhals as nw
import daft

df_native = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})

df = nw.from_native(df_native)
result = df.select("a", nw.col("b") * nw.col("a"))
print(result.collect())

This needs to be done in a way that won't be specific to Daft, so that anyone can register their own plugin without Narwhals having any knowledge about it. In this PR it's currently all Daft-specific

The packaging docs around entry-points might be useful here:

I'll also cc @camriddell into the conversation, as IIRC he'd also thought about pluggable backends


For prior art on plugins and entry-points, I think https://github.com/PyCQA/flake8 might also be good to look at

@ym-pett
Contributor Author

ym-pett commented Aug 13, 2025

thanks both, I'll revert the current changes - I feel like I had to go down the wrong route first to see what this actually consists of! :)

I can now go through the materials armed with more background! 🦾

@ym-pett ym-pett closed this Aug 13, 2025
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 3c8b34b to 11fe33f Compare August 13, 2025 10:32
@ym-pett
Contributor Author

ym-pett commented Aug 13, 2025

oops, didn't mean to close this! will reopen when I push new content!

@ym-pett ym-pett reopened this Aug 13, 2025
@ym-pett
Contributor Author

ym-pett commented Aug 13, 2025

based on flake8 example - will try to get something more sensible in next

@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 1ac87eb to cfd156f Compare August 13, 2025 16:28
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from b004d4e to e3a2f3b Compare August 14, 2025 13:34
Comment on lines 541 to 553
for plugin in discovered_plugins:
    obj = plugin.load()
    frame = obj.dataframe.DaftLazyFrame

    # from obj.dataframe import DaftLazyFrame
    try:
        df_compliant = frame(native_object, version=Version.MAIN)
        return df_compliant.to_narwhals()
    except:
        # try the next plugin
        continue
Member

ooh, nice! does this work?

to make it not daft-specific, perhaps we could aim to have something like

try:
    df_compliant = obj.from_native(native_object, version=Version.MAIN)

?

This would mean making a top-level function in narwhals-daft too, and then we document that plugin authors are expected to implement this
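A rough sketch of the backend-agnostic loop this implies, assuming every plugin exposes a top-level `from_native` callable. The helper name `from_native_via_plugins`, the fake plugins, and the "raise `TypeError` to decline" convention are all assumptions for illustration; the `version=` argument from the snippet above is left out to keep it short.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any, Iterable


def from_native_via_plugins(native_object: Any, plugins: Iterable[Any]) -> Any | None:
    """Return the first plugin's wrapper for `native_object`, or None if none accept it."""
    for plugin in plugins:
        try:
            return plugin.from_native(native_object)
        except TypeError:
            # Assumed convention for this sketch: a plugin raises TypeError
            # for objects it does not recognise, so we move on to the next one.
            continue
    return None


def _dict_only(obj: Any) -> tuple[str, Any]:
    if not isinstance(obj, dict):
        raise TypeError(f"dict plugin cannot handle {type(obj)}")
    return ("dict-plugin", obj)


def _list_only(obj: Any) -> tuple[str, Any]:
    if not isinstance(obj, list):
        raise TypeError(f"list plugin cannot handle {type(obj)}")
    return ("list-plugin", obj)


# Stand-ins for loaded plugin modules; only the `from_native` attribute matters.
fake_plugins = [
    SimpleNamespace(from_native=_dict_only),
    SimpleNamespace(from_native=_list_only),
]

handled = from_native_via_plugins([1, 2, 3], fake_plugins)
unhandled = from_native_via_plugins(42, fake_plugins)
```

Catching a specific exception rather than using a bare `except:` means genuine bugs inside a plugin still surface instead of being silently skipped.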

Contributor Author

I have a feeling I'm doing something weird with how things are imported. Maybe it's that the top-level __init__ file needs altering in narwhals-daft.

with your suggestion, t.py no longer works and I get the error

TypeError: Expected pandas-like dataframe, Polars dataframe, or Polars lazyframe, got: <class 'daft.dataframe.dataframe.DataFrame'>

that dataframe.dataframe looks weird to me... would you expect that structure?

the type of plugin is <class 'importlib.metadata.EntryPoint'>, so I figured I had to load the module via that, for obj I then get the type <class 'module'>

the only way I could get access to the DaftLazyFrame (haven't tried simple LazyFrame yet) was by assigning it to a variable name, I couldn't do something like

from obj.dataframe import DaftLazyFrame (the error then is ModuleNotFoundError: No module named 'obj')

I think at the moment this all leaves us too bound to daft, and I bet I'm breaking a million coding conventions, eek!

I suspect I need to do a better job of exposing the modules within narwhals-daft, but I haven't found out how yet
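On the `ModuleNotFoundError` mentioned above: an `import` statement resolves module names, not local variables, so `from obj.dataframe import ...` looks for an installed package literally called `obj`. A small sketch of the alternatives, using the stdlib `json` package as a stand-in for the loaded plugin module (that substitution is purely for illustration):

```python
import importlib

# Stand-in for the module object returned by `plugin.load()`.
obj = importlib.import_module("json")

# Attribute access works on the module object held in a variable.
decoder_cls = obj.decoder.JSONDecoder

# Alternatively, import the submodule by its real dotted name.
submodule = importlib.import_module(f"{obj.__name__}.decoder")

# Using the variable name as if it were a package name fails, because no
# package called "obj" is installed.
try:
    importlib.import_module("obj.decoder")
except ModuleNotFoundError as exc:
    missing_name = exc.name
```

So assigning `obj.dataframe.DaftLazyFrame` to a variable is a perfectly normal way to reach into a dynamically loaded module.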

Member

perhaps in narwhals_daft/__init__.py you could make a from_native function, and then here use obj.from_native?

Contributor Author

nice, will try that!

Member

Should plugin detection come before we check our own written backends? That way if someone really wanted to override our pandas backend they would be empowered to?

Member

🤔 yeah maybe
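To make the ordering question concrete, here is a toy sketch in which converters return `None` to decline an object. The function `convert` and the `plugins_first` flag are hypothetical names for this example only; nothing here reflects how Narwhals dispatches today.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence

# A converter returns a wrapped object, or None to decline.
Converter = Callable[[Any], Any]


def convert(
    native_object: Any,
    plugins: Sequence[Converter],
    builtins: Sequence[Converter],
    *,
    plugins_first: bool,
) -> Any:
    """Try converters in the chosen order and return the first non-None result."""
    ordered = [*plugins, *builtins] if plugins_first else [*builtins, *plugins]
    for converter in ordered:
        wrapped = converter(native_object)
        if wrapped is not None:
            return wrapped
    msg = f"Unsupported type: {type(native_object)}"
    raise TypeError(msg)


def builtin_list(obj: Any) -> Any:
    return ("builtin", obj) if isinstance(obj, list) else None


def plugin_list(obj: Any) -> Any:
    return ("plugin", obj) if isinstance(obj, list) else None


# Both converters claim lists, so the order decides which one wins.
overridden = convert([1], [plugin_list], [builtin_list], plugins_first=True)
default = convert([1], [plugin_list], [builtin_list], plugins_first=False)
```

Checking plugins first lets a third party override a built-in backend, at the cost of running plugin code on every conversion, including for natively supported objects.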

@ym-pett ym-pett force-pushed the create_fromnative_daft branch from d72da25 to ca01346 Compare August 15, 2025 08:49
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 319b3d6 to 90ad973 Compare August 16, 2025 11:44
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 5e3be1f to 76232df Compare August 16, 2025 14:15
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 495d727 to ebc1a8f Compare August 16, 2025 14:26
Comment on lines +90 to +91
def from_native(native_object: Any, version: Version) -> CompliantAny | None:
    """Attempt to convert `native_object` to a Compliant object, using any available plugin(s).
Member

Contributor Author

@ym-pett ym-pett Sep 25, 2025

Contributor Author

I've added #3130 in the linked issues as I think it'll fix the type checking failure in the tests.

enable-cache: "true"
cache-suffix: pytest-full-coverage-${{ matrix.python-version }}
cache-dependency-glob: "pyproject.toml"
- name: install-reqs
Contributor Author

@ym-pett ym-pett Sep 30, 2025

hm I tried to set l.87 to be the same as in the 'main' branch, but maybe this has contributed to the additional test failures? I will try out locally if going back to the line which doesn't specify the duckdb version prevents these new test failures. Can't quite make out if they're all duckdb related though.

Member

hey - doesn't look like it's the same, could you make sure you've fetched upstream first please?

the only diff we should be seeing here is

+      - name: install-test-plugin
+        run: uv pip install -e tests/test_plugin --system

Contributor Author

@MarcoGorelli, I'm a bit confused as to what's happening with these nightly tests now failing (Min, old, and nightly versions / nightlies (3.12, ubuntu-latest) (pull_request)); locally my pytest runs without any failures.

Similarly for PyTest / pytest-full-coverage (3.11, ubuntu-latest) (pull_request).

Wondering if I have introduced an error into the pytest.yml? Does the order in which the processes are listed matter?

Contributor Author

I took out "duckdb<1.4" as I thought that might be the issue, but it doesn't seem to be the case

Contributor Author

> hey - doesn't look like it's the same, could you make sure you've fetched upstream first please?
>
> the only diff we should be seeing here is
>
> +      - name: install-test-plugin
> +        run: uv pip install -e tests/test_plugin --system


just seen your comment now - will do!

Contributor Author

ok I think the diff is looking how it should now

@ym-pett
Contributor Author

ym-pett commented Sep 30, 2025

ah still those nightly versions and full-coverage tests failing, is there anything I can investigate for those?

@MarcoGorelli
Member

the nightly one is unrelated, but the coverage one does show some missing coverage:

Name                                         Stmts   Miss Branch BrPart  Cover   Missing
----------------------------------------------------------------------------------------
narwhals/plugins.py                             36      2      4      0    95%   69-70
tests/test_plugin/test_plugin/dataframe.py      47      1      0      0    98%   35
----------------------------------------------------------------------------------------
TOTAL                                        24760      3   2722      0    99%

the typing issue is also related:

/home/runner/work/narwhals/narwhals/tests/test_plugin/test_plugin/__init__.py
  /home/runner/work/narwhals/narwhals/tests/test_plugin/test_plugin/__init__.py:16:12 - error: Cannot instantiate abstract class "DictNamespace"
    "CompliantNamespace.is_native" is not implemented (reportAbstractUsage)
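For context on that typing error, here is a small sketch of the same situation using `abc`. The classes below are invented for illustration, and the real `CompliantNamespace` may well be defined differently; the point is only that a subclass stays abstract until every abstract method has an implementation.

```python
from abc import ABC, abstractmethod
from typing import Any


class Namespace(ABC):
    """Hypothetical base class with one required method."""

    @abstractmethod
    def is_native(self, obj: Any) -> bool: ...


class IncompleteNamespace(Namespace):
    """Forgets to implement `is_native`, so it is still abstract."""


class DictBackedNamespace(Namespace):
    """Implements the required method, so it can be instantiated."""

    def is_native(self, obj: Any) -> bool:
        return isinstance(obj, dict)


try:
    IncompleteNamespace()
except TypeError as exc:
    # The message names the missing method; exact wording varies by Python version.
    error_message = str(exc)

namespace = DictBackedNamespace()
accepts_dict = namespace.is_native({"a": [1, 2, 3]})
```

A type checker reports the same problem statically, which is why the failure shows up in the typing job even when the tests themselves pass.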

@ym-pett
Contributor Author

ym-pett commented Oct 8, 2025

#2978 (comment) - thanks Marco, will try to fix, may have some more concrete questions :)

class DictNamespace(CompliantNamespace[DictLazyFrame, Any]):
    def __init__(self, *, version: Version) -> None:
        self._version = version

Contributor Author

wasn't sure of my definition of is_native, but it was passing locally.

in my local version I'm not getting any test failures using pytest, but on the remote the coverage fails; my problem is I can't see the details like you got @MarcoGorelli.

I click on the failing test [Image], get a long printout with the most informative being: [Image]

I've tried running just the plugins tests with pytest tests/test_plugin/test_plugin and pytest tests/test_plugin/, and I realise no tests have run, so maybe that's why I'm not getting any failures locally? Or am I running this incorrectly?

Member

ah the tests are passing, it's just telling you that those lines of code aren't being run

  • the narwhals/plugins.py one is a defensive check (i presume) so i'd say it's ok to pragma: no cover it
  • for the other uncovered methods (_with_native / is_native), unless you write a test which hits them, i'd suggest to turn them into not_implemented
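A sketch of both suggestions side by side. `NotImplementedMethod` is a simplified, hypothetical stand-in written for this example and is not Narwhals' own `not_implemented` helper; `DictFrame` is likewise invented. The `# pragma: no cover` comment is the standard coverage.py exclusion marker, and placing it on the `if` line excludes the whole block.

```python
from __future__ import annotations

from typing import Any


class NotImplementedMethod:
    """Hypothetical stand-in: accessing the attribute on an instance raises."""

    def __set_name__(self, owner: type, name: str) -> None:
        self._name = name

    def __get__(self, instance: Any, owner: type | None = None) -> Any:
        if instance is None:
            return self
        msg = f"{self._name!r} is not implemented for {type(instance).__name__}"
        raise NotImplementedError(msg)


class DictFrame:
    def __init__(self, data: dict[str, list[Any]]) -> None:
        if not isinstance(data, dict):  # pragma: no cover
            # Defensive check that the test suite never triggers.
            msg = f"Expected dict, got {type(data)}"
            raise TypeError(msg)
        self._data = data

    def columns(self) -> list[str]:
        return list(self._data)

    # Declared but deliberately unsupported, so there is no body to cover.
    _with_native = NotImplementedMethod()


frame = DictFrame({"a": [1, 2, 3]})
column_names = frame.columns()

try:
    frame._with_native
except NotImplementedError as exc:
    not_implemented_message = str(exc)
```

The trade-off is that anything calling the unsupported method now fails loudly, so this only suits methods that no tested code path actually reaches.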

Contributor Author

ah ok, will do that for the moment, would like to figure the tests for those out when I have more time though. Putting this on my personal backlog :)

Contributor Author

ps: I'm so silly, only just realised I can look at the 'Missing' column to see which line is causing the problem! 🤦

        return self

    @property
    def columns(self) -> list[str]:
Contributor Author

I don't know if it's ok to use the pragma: no cover trick here, it won't accept not_implemented as it conflicts with another instantiation.

I'm not sure if I can use the same trick in the tests/plugins_test.py file, will try & just roll back commit if not

    def columns(self) -> list[str]:  # pragma: no cover
        return list(self._native_frame.keys())

    _with_native = not_implemented()
Contributor Author

@ym-pett ym-pett Oct 8, 2025

having _with_native as not_implemented, plus the pragma: no cover silencing in tests/plugins_test.py (ll.15-16), tricks it into saying we have full test coverage, but this now complains that

E NotImplementedError: '_with_version' is not implemented for: <Implementation.UNKNOWN: 'unknown'>.

I'm not sure what I've done so far is kosher; it seems weird to silence tests that were explicitly written in the tests/plugins_test.py file.

Labels
enhancement New feature or request extensibility
5 participants