Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
202 commits
Select commit Hold shift + click to select a range
eb6a2f8
feat: Port some simple impls
dangotbanned Nov 19, 2025
5033437
feat: Add `gather_every`
dangotbanned Nov 19, 2025
dfe9155
feat: Add `drop_nulls`
dangotbanned Nov 19, 2025
6fbb7e7
prepping for windowed `is_{duplicated,unique}`
dangotbanned Nov 19, 2025
9c0b13e
fix(typing): Tighten up `StoresNativeT_co` bound
dangotbanned Nov 20, 2025
618c83f
feat: Impl and update `clip`
dangotbanned Nov 20, 2025
7da8f5b
fix: Don't allow `__len__` to be used for `__bool__`
dangotbanned Nov 20, 2025
9f23d8e
fix: Ensure `with_columns` always broadcasts
dangotbanned Nov 20, 2025
039cb4a
feat: Partial impl `kurtosis`, `skew`
dangotbanned Nov 20, 2025
cc1cffb
chore: Make the `rolling_*` gap more visible
dangotbanned Nov 20, 2025
03dfef5
feat: Use native `kurtosis` and `skew` when available
dangotbanned Nov 20, 2025
291dffd
chore(typing): Happy mypy
dangotbanned Nov 20, 2025
4a360b5
feat: Support `kurtosis`, `skew` in group_by
dangotbanned Nov 20, 2025
33f7707
refactor: De-duplicate `kurtosis`, `skew`
dangotbanned Nov 20, 2025
2e39678
feat(DRAFT): Impl and update `mode`
dangotbanned Nov 20, 2025
84ce86c
fix `mode_all` and port tests
dangotbanned Nov 20, 2025
a0b6395
test: Add tests for `is_{duplicated,unique}`
dangotbanned Nov 21, 2025
4c25b05
feat: Add `is_{duplicated,unique}` + support in `over`
dangotbanned Nov 21, 2025
b880b21
feat(DRAFT): Port `fill_null_with_strategy`
dangotbanned Nov 21, 2025
df0d9b8
test: Add `fill_null_test`
dangotbanned Nov 21, 2025
58c18e4
feat: Avoid `numpy` dependency in `fill_null_with_strategy`
dangotbanned Nov 22, 2025
2f632b4
test: Reveal index out-of-bounds errors
dangotbanned Nov 22, 2025
d65a73b
fix: Avoid index OOB error
dangotbanned Nov 22, 2025
bd2b75a
perf: Reuse `is_valid` mask
dangotbanned Nov 22, 2025
3f7900b
perf: Avoid 1/3 reverses for `fill_null("backward",limit=...)`
dangotbanned Nov 22, 2025
818471a
perf: Noop when we have nothing to fill/fill with
dangotbanned Nov 22, 2025
0e82e20
chore: renaming
dangotbanned Nov 22, 2025
50e95af
refactor: Finalize `_fill_null_forward_limit`
dangotbanned Nov 22, 2025
591b700
tidy
dangotbanned Nov 22, 2025
103e4d4
start adding `replace_strict`
dangotbanned Nov 22, 2025
a19d38a
update `Expr.replace_strict` signature, partial parsing
dangotbanned Nov 22, 2025
ce57dcc
test: Port non-default tests
dangotbanned Nov 23, 2025
2eab41a
test: Port the default tests
dangotbanned Nov 23, 2025
7ec10f7
feat(DRAFT): Support `replace_strict(default=...)`
dangotbanned Nov 23, 2025
801f283
split out ops logic
dangotbanned Nov 23, 2025
050b60c
add tests for scalar
dangotbanned Nov 23, 2025
c2faad0
feat: Add scalar paths for `replace_strict`
dangotbanned Nov 23, 2025
cec550c
finally start working on `rolling_expr`
dangotbanned Nov 23, 2025
08477b5
feat: Add `rolling_{sum,mean}`
dangotbanned Nov 23, 2025
d98d5bf
test: Add `rolling_{sum,mean}.over(order_by=...)`
dangotbanned Nov 23, 2025
69ec056
chore: Update cov/todos
dangotbanned Nov 23, 2025
db262b5
feat(DRAFT): Add `rolling_{var,std}`
dangotbanned Nov 23, 2025
f0d182d
perf: Don't create nulls and then replace
dangotbanned Nov 23, 2025
f351758
feat: Add `CompliantSeries` dunders
dangotbanned Nov 24, 2025
299d9d1
start adding `ArrowSeries` dunders
dangotbanned Nov 24, 2025
afb195b
feat: Impl `ArrowSeries` dunders
dangotbanned Nov 24, 2025
a17c312
feat: Impl the other `ArrowSeries` methods
dangotbanned Nov 24, 2025
8e5c156
"fix" `rolling_var`
dangotbanned Nov 24, 2025
9bd8bcd
get sugary, with an extended `zip_with`
dangotbanned Nov 24, 2025
988694d
add `test_rolling_std`
dangotbanned Nov 24, 2025
7dbb882
Add `test_rolling_{var,std}_order_by`
dangotbanned Nov 24, 2025
fdd901c
test: Limit values to `abs_tol=1e-6`
dangotbanned Nov 24, 2025
99bb1b4
pull out `ddof` handling
dangotbanned Nov 24, 2025
50d7f0d
add `shift(fill_value=0)` back
dangotbanned Nov 24, 2025
2e745f2
remove unreachable `window_size == 0` branches
dangotbanned Nov 24, 2025
1b6370f
consistently use `ArrowSeries.pow`
dangotbanned Nov 24, 2025
7a24c51
Add `test_rolling_expr_invalid`
dangotbanned Nov 24, 2025
582da64
Move rolling to `ArrowSeries`, tidy up
dangotbanned Nov 24, 2025
ef2a4f5
to-done
dangotbanned Nov 24, 2025
c519f70
quick+dirty `map_batches(returns_scalar=True)`
dangotbanned Nov 24, 2025
aa52033
test: Move existing `map_batches` tests
dangotbanned Nov 25, 2025
100d82e
test: Clean up and doc `map_batches`
dangotbanned Nov 25, 2025
3f3884f
test: Add `test_map_batches_invalid`
dangotbanned Nov 25, 2025
6707919
test(typing): happier mypy
dangotbanned Nov 25, 2025
c87fb74
fix: raise when `returns_scalar=False` does not return a `Series`
dangotbanned Nov 25, 2025
24fb350
fix: Check the correct flags for mutual exclusivity 😅
dangotbanned Nov 25, 2025
632932f
test: Also check list scalar
dangotbanned Nov 25, 2025
951272c
dont repeat yourself, dont repeat yourself, ...
dangotbanned Nov 25, 2025
2685791
remove dead code
dangotbanned Nov 25, 2025
11ed36a
chore(typing): Move `map_batches` to `EagerExpr`
dangotbanned Nov 25, 2025
2f1ca62
feat: Add and use `ArrowSeries.diff(n=...)` for rolling
dangotbanned Nov 25, 2025
d586d55
perf?: Use `diff(window_size)` in `_rolling_sum` too
dangotbanned Nov 25, 2025
7169fef
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 25, 2025
6c836c5
feat(DRAFT): Add `struct.field`
dangotbanned Nov 25, 2025
b17b343
chore: Remove finished todo
dangotbanned Nov 26, 2025
ce3ec21
test: Move comments to test ids
dangotbanned Nov 26, 2025
cd04606
test: Tidy up `replace_strict`
dangotbanned Nov 26, 2025
e71a3bc
chore: remove unused
dangotbanned Nov 26, 2025
0c2d425
test: Reveal some `struct.field` bugs
dangotbanned Nov 26, 2025
bcf197b
fix: Ensure generated method names preserve accessors
dangotbanned Nov 26, 2025
6188c96
fix: Handle `struct.field(...)` in `ExprIR` expansion
dangotbanned Nov 26, 2025
72f17e2
fix: Use the right names for select/alias
dangotbanned Nov 26, 2025
2f48f10
test: `{kurtosis,skew}.over(*partition_by)`
dangotbanned Nov 26, 2025
84c01ac
feat: Add `cat.get_categories`
dangotbanned Nov 26, 2025
895cbe2
test: xfail `pyarrow<15.0`
dangotbanned Nov 26, 2025
95f2d1c
tidy
dangotbanned Nov 26, 2025
7674b60
feat: Add `list.len`
dangotbanned Nov 27, 2025
e6e2045
cov
dangotbanned Nov 27, 2025
d45f45c
feat: Add `{DataFrame,Series}.gather_every`
dangotbanned Nov 27, 2025
f493547
refactor: Clear out the cobwebs
dangotbanned Nov 27, 2025
94b676c
remove unused keywords from `kurtosis`
dangotbanned Nov 27, 2025
f3aacf0
feat: More `nan`s and `null`s
dangotbanned Nov 27, 2025
05e5c7a
Mark more features as not implemented
dangotbanned Nov 27, 2025
33d1511
feat: Add `Expr.{ceil,floor}`
dangotbanned Nov 28, 2025
237e0f2
start adding missing `list` ops
dangotbanned Nov 28, 2025
dd051a3
test: Add `test_list_contains_invalid`
dangotbanned Nov 28, 2025
2b458bb
test: Add `test_list_get`
dangotbanned Nov 28, 2025
e91f04f
add `test_list_get_invalid`
dangotbanned Nov 28, 2025
55d20ea
feat: Add `Expr.list.get`
dangotbanned Nov 28, 2025
d8363e1
fix: Ensure nulls are preserved in `Expr.unique` during `group_by`
dangotbanned Nov 28, 2025
2f3efe1
feat: Add native `str.zfill`
dangotbanned Nov 28, 2025
d0cc2df
revert autocomplete
dangotbanned Nov 28, 2025
d5771bb
Adapt hand-rolled `str.zfill` from `main`
dangotbanned Nov 29, 2025
b97aff9
feat: Add `str.{slice,len_chars}`
dangotbanned Nov 29, 2025
7be2d29
Add nodes for missing namespace methods
dangotbanned Nov 29, 2025
f633ce1
feat: Impl most of `ArrowStringNamespace`
dangotbanned Nov 29, 2025
f4cecb0
refactor: Extend ` _unary_function`, docs, fix typing
dangotbanned Nov 30, 2025
8456f6f
feat: Add `coalesce`
dangotbanned Nov 30, 2025
06dc04b
feat: Add `format`
dangotbanned Nov 30, 2025
11fad99
start adding `{all,any}_horizontal(ignore_nulls=True)`
dangotbanned Nov 30, 2025
29e1f2a
feat: Support `{all,any}_horizontal(ignore_nulls=True)`
dangotbanned Nov 30, 2025
4a86c68
🧹🧹🧹
dangotbanned Nov 30, 2025
91f222d
feat: Add `{DataFrame,Series}.sample`
dangotbanned Dec 1, 2025
d88d2c1
test: cover `with_replacement=True`
dangotbanned Dec 1, 2025
e5c2165
test: xfail `over(order_by=...).meta.root_names()` on nightly
dangotbanned Dec 2, 2025
6c16ff5
refactor: Move `unique` to `Series`
dangotbanned Dec 2, 2025
c537650
feat: Add `Expr.sample`*
dangotbanned Dec 2, 2025
4885cba
test: Cover `str.replace(_all)`
dangotbanned Dec 2, 2025
9881c93
feat: Support some forms of `str.replace(value: Expr)`
dangotbanned Dec 2, 2025
44a9d1e
feat(DRAFT): Pull more out of `str_replace`
dangotbanned Dec 3, 2025
0fea5b5
cov
dangotbanned Dec 3, 2025
91cfa33
feat: Support `str.replace(value:Expr, n=-1)` as well 🥳🥳🥳
dangotbanned Dec 3, 2025
60b6433
factor-in `ReplaceExpr`
dangotbanned Dec 3, 2025
41634b6
feat: Support `str.replace_all(value: Expr)`
dangotbanned Dec 3, 2025
3b03f2a
partially handle `ignore_nulls` in `{list,str}_join`
dangotbanned Dec 3, 2025
2786a09
test: Cover more in `str.replace_all`
dangotbanned Dec 3, 2025
9cc469b
refactor: Merge `str_replace` impls
dangotbanned Dec 3, 2025
3f0d8e0
test: Add a test for `str.replace(value: Expr, n>1)`
dangotbanned Dec 3, 2025
abe577d
feat: Support `str.replace(value: Expr, n>1)`
dangotbanned Dec 3, 2025
24f2358
cursed typing
dangotbanned Dec 3, 2025
10100a2
feat(DRAFT): Add native `linear_space`
dangotbanned Nov 29, 2025
048f372
feat(DRAFT): More `Series.hist` prep
dangotbanned Dec 4, 2025
9e92e08
feat: Add `DataFrame.to_struct`
dangotbanned Dec 4, 2025
1172f82
feat: Add `Series.struct.unnest`
dangotbanned Dec 4, 2025
36ff66e
kinda fix `struct` typing
dangotbanned Dec 4, 2025
5bd4651
test: un-xfail all but 1 `hist` test
dangotbanned Dec 4, 2025
d69ad77
feat: rough port of `ArrowSeries.hist`
dangotbanned Dec 4, 2025
36f3b68
cov
dangotbanned Dec 4, 2025
da791a7
fix: `pyarrow<15` compat
dangotbanned Dec 4, 2025
9ab3d00
import
dangotbanned Dec 4, 2025
642dcb4
fix: Don't slice twice
dangotbanned Dec 4, 2025
2f3a557
perf: Don't slice twice (again)
dangotbanned Dec 4, 2025
e4f0f83
fix: `pyarrow<18` compat
dangotbanned Dec 4, 2025
8886421
fix: `pyarrow<21` compat
dangotbanned Dec 4, 2025
bb1d1b0
chore: remove outdated comments
dangotbanned Dec 5, 2025
b1b6f85
refactor: Reuse `is_between`, rather than re-implementing
dangotbanned Dec 5, 2025
08ad7f1
refactor: Keep everything-but `np.searchsorted` native
dangotbanned Dec 5, 2025
96e5fb5
refactor: Make `BooleanArray.{false,true}_count` easier to access
dangotbanned Dec 5, 2025
57c854c
feat: Add `Series.{cast,drop_nulls,drop_nans}`
dangotbanned Dec 5, 2025
bcea3fb
test: Simplify, parallelize `hist(bin_count=...)`
dangotbanned Dec 5, 2025
a443e74
test: break up smaller `hist` stuff
dangotbanned Dec 5, 2025
251cff5
tidier?
dangotbanned Dec 5, 2025
129b8e6
factor-in `_hist_calculate_bins`
dangotbanned Dec 5, 2025
1e1d2cb
move `search_sorted`
dangotbanned Dec 5, 2025
31f4a7d
refactor: `_hist_is_empty_series` -> `is_only_nulls`
dangotbanned Dec 5, 2025
679a216
refactor: Move `hist_with_bins` -> `ArrowExpr.hist_bins`
dangotbanned Dec 5, 2025
f756df0
refactor: Move more of `hist_with_bin_count` -> `pyarrow`
dangotbanned Dec 6, 2025
6132607
refactor: More `hist` organizing
dangotbanned Dec 6, 2025
3e83593
stop fiddling w/ `hist` impl (for now)
dangotbanned Dec 6, 2025
96cb095
test: Split out more tools for test gen
dangotbanned Dec 6, 2025
6f74bf9
test: Split up `test_hist_bins`
dangotbanned Dec 6, 2025
4af7b10
chore: re-cover things
dangotbanned Dec 6, 2025
9f3d9ac
fix: Ensure `Expr.hist` only returns a struct when we have multiple f…
dangotbanned Dec 6, 2025
7ad60df
chore: Align `Expr.hist` defaults with `pl.Expr.hist`
dangotbanned Dec 6, 2025
2d198dd
test: Cover `Expr.hist`
dangotbanned Dec 6, 2025
8283a47
feat(expr-ir): Add `{DataFrame,Series}.explode(empty_as_nulls, keep_n…
dangotbanned Dec 9, 2025
5ecd62f
test: Cover `DataFrame.to_struct`
dangotbanned Dec 9, 2025
33c9498
remove outdated comments
dangotbanned Dec 9, 2025
03fbcfb
refactor: Move `Series.hist` impl -> `CompliantSeries.hist`
dangotbanned Dec 9, 2025
ca22bac
refactor(typing): Align type param positions between `Expr`, `Series`…
dangotbanned Dec 9, 2025
7530214
docs(typing): Highlight and explain cycling dependency issue
dangotbanned Dec 9, 2025
66f1163
tests: cov `Series.cast`
dangotbanned Dec 9, 2025
68c08c0
test: A whole bunch of `str.*` method coverage
dangotbanned Dec 9, 2025
f50278d
fix: Actually support `Sequence` in `Series.gather`
dangotbanned Dec 9, 2025
937bc57
feat(expr-ir): Add `linear_space` (#3349)
dangotbanned Dec 10, 2025
d238ac6
feat(expr-ir): Support `list.unique`
dangotbanned Dec 10, 2025
c2c3d78
fix(typing): Well that my goof, apparently
dangotbanned Dec 10, 2025
5f001b4
test: Cover unique on `ListScalar`
dangotbanned Dec 10, 2025
cd3b626
fix: `pyarrow<14` compat
dangotbanned Dec 10, 2025
ced4ae1
use acero more directly in `list.unique`
dangotbanned Dec 10, 2025
2ce64d6
refactor: somewhat cleaner `list_unique`
dangotbanned Dec 11, 2025
a10bf73
fix: Avoid `pa.array` bug w/ `[<pyarrow.ListScalar: None>]`
dangotbanned Dec 11, 2025
27992a1
perf: Add broadcast fastpath for 1 column
dangotbanned Dec 11, 2025
1519d7f
perf: Add a fastpath for all valid
dangotbanned Dec 11, 2025
3ae4113
test: Reveal `Scalar` bugs
dangotbanned Dec 11, 2025
7b2e06d
fix: Handle 2/4 scalar `list.unique` cases
dangotbanned Dec 11, 2025
e50a654
fix: Correctly handle `Scalar` in all `list.unique` cases
dangotbanned Dec 11, 2025
7eb40c7
docs: Write an essay on `list.unique`
dangotbanned Dec 11, 2025
bad7df1
test: Add `list.contains` tests
dangotbanned Dec 11, 2025
8cf2173
test: Add `DataFrame.explode` fail case
dangotbanned Dec 11, 2025
c6dc2a8
fix: Support `explode` when only column in table#
dangotbanned Dec 11, 2025
8fe333e
feat(expr-ir): Support `ArrowExpr.list.contains(IntoExpr)`
dangotbanned Dec 11, 2025
1a69b68
fix: Raise and suggest solution for len(1) Series broadcast
dangotbanned Dec 12, 2025
f10823f
test: Add `test_list_contains_scalar`
dangotbanned Dec 12, 2025
61cfa52
feat(expr-ir): Support `ArrowScalar.list.contains`
dangotbanned Dec 12, 2025
1e58264
chore: Update cov
dangotbanned Dec 12, 2025
38a698e
docs: Add a note on `CompliantExpr.dt`]
dangotbanned Dec 12, 2025
30e5f48
feat: Expose `Series.struct.schema`
dangotbanned Dec 12, 2025
1f9da68
refactor: Pull out tools from list ops
dangotbanned Dec 12, 2025
2d84d14
feat(DRAFT): `Expr.list.join(ignore_nulls=True)` progress
dangotbanned Dec 12, 2025
fbb71a6
fix: Handle `[None, None , ...]` in `list.join(ignore_nulls=True)`
dangotbanned Dec 13, 2025
4e2da36
chore: remove outdated comment
dangotbanned Dec 13, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions narwhals/_plan/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,15 @@
all,
all_horizontal,
any_horizontal,
coalesce,
col,
concat_str,
date_range,
exclude,
format,
int_range,
len,
linear_space,
lit,
max,
max_horizontal,
Expand All @@ -37,12 +40,15 @@
"all",
"all_horizontal",
"any_horizontal",
"coalesce",
"col",
"concat_str",
"date_range",
"exclude",
"format",
"int_range",
"len",
"linear_space",
"lit",
"max",
"max_horizontal",
Expand Down
21 changes: 17 additions & 4 deletions narwhals/_plan/_dispatch.py
Original file line number Diff line number Diff line change
Expand Up @@ -182,7 +182,20 @@ def _method_name(tp: type[ExprIRT | FunctionT]) -> str:


def get_dispatch_name(expr: ExprIR | type[Function], /) -> str:
"""Return the synthesized method name for `expr`."""
return (
repr(expr.function) if is_function_expr(expr) else expr.__expr_ir_dispatch__.name
)
"""Return the synthesized method name for `expr`.

Note:
Refers to the `Compliant*` method name, which may be *either* more general
or more specialized than what the user called.
"""
dispatch: Dispatcher[Any]
if is_function_expr(expr):
from narwhals._plan import expressions as ir

if isinstance(expr, (ir.RollingExpr, ir.AnonymousExpr)):
dispatch = expr.__expr_ir_dispatch__
else:
dispatch = expr.function.__expr_ir_dispatch__
else:
dispatch = expr.__expr_ir_dispatch__
return dispatch.name
4 changes: 2 additions & 2 deletions narwhals/_plan/_function.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,8 @@ def __init_subclass__(
**kwds: Any,
) -> None:
super().__init_subclass__(*args, **kwds)
if accessor:
config = replace(config or FEOptions.default(), accessor_name=accessor)
if accessor_name := accessor or cls.__expr_ir_config__.accessor_name:
config = replace(config or FEOptions.default(), accessor_name=accessor_name)
if options:
cls._function_options = staticmethod(options)
if config:
Expand Down
7 changes: 6 additions & 1 deletion narwhals/_plan/_guards.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@
NativeSeriesT,
Seq,
)
from narwhals.typing import NonNestedLiteral
from narwhals.typing import NonNestedLiteral, PythonLiteral

T = TypeVar("T")

Expand All @@ -38,6 +38,7 @@
bytes,
Decimal,
)
_PYTHON_LITERAL_TPS = (*_NON_NESTED_LITERAL_TPS, list, tuple, type(None))


def _ir(*_: Any): # type: ignore[no-untyped-def] # noqa: ANN202
Expand Down Expand Up @@ -68,6 +69,10 @@ def is_non_nested_literal(obj: Any) -> TypeIs[NonNestedLiteral]:
return obj is None or isinstance(obj, _NON_NESTED_LITERAL_TPS)


def is_python_literal(obj: Any) -> TypeIs[PythonLiteral]:
return isinstance(obj, _PYTHON_LITERAL_TPS)


def is_expr(obj: Any) -> TypeIs[Expr]:
return isinstance(obj, _expr().Expr)

Expand Down
10 changes: 7 additions & 3 deletions narwhals/_plan/_parse.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,11 @@
is_selector,
)
from narwhals._plan.common import flatten_hash_safe
from narwhals._plan.exceptions import invalid_into_expr_error, is_iterable_error
from narwhals._plan.exceptions import (
invalid_into_expr_error,
is_iterable_error,
list_literal_error,
)
from narwhals._utils import qualified_type_name
from narwhals.dependencies import get_polars
from narwhals.exceptions import InvalidOperationError
Expand Down Expand Up @@ -127,7 +131,7 @@ def parse_into_expr_ir(
expr = col(input)
elif isinstance(input, list):
if list_as_series is None:
raise TypeError(input) # pragma: no cover
raise list_literal_error(input)
expr = lit(list_as_series(input))
else:
expr = lit(input, dtype=dtype)
Expand Down Expand Up @@ -331,7 +335,7 @@ def _combine_predicates(predicates: Iterator[ExprIR], /) -> ExprIR:
inputs = (first,)
else:
return first
return AllHorizontal().to_function_expr(*inputs)
return AllHorizontal(ignore_nulls=False).to_function_expr(*inputs)


def _is_iterable(obj: Iterable[T] | Any) -> TypeIs[Iterable[T]]:
Expand Down
6 changes: 4 additions & 2 deletions narwhals/_plan/arrow/acero.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@
import operator
from functools import reduce
from itertools import chain
from typing import TYPE_CHECKING, Any, Final, Union, cast
from typing import TYPE_CHECKING, Any, Final, Literal, Union, cast

import pyarrow as pa # ignore-banned-import
import pyarrow.acero as pac
Expand Down Expand Up @@ -61,7 +61,9 @@
"""

Target: TypeAlias = OneOrSeq[Field]
Aggregation: TypeAlias = "_Aggregation"
Aggregation: TypeAlias = Union[
"_Aggregation", Literal["hash_kurtosis", "hash_skew", "kurtosis", "skew"]
]
AggregateOptions: TypeAlias = "_AggregateOptions"
Opts: TypeAlias = "AggregateOptions | None"
OutputName: TypeAlias = str
Expand Down
15 changes: 14 additions & 1 deletion narwhals/_plan/arrow/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

from typing import TYPE_CHECKING, Any, ClassVar, Generic

from narwhals._plan.arrow.functions import BACKEND_VERSION
from narwhals._plan.arrow.functions import BACKEND_VERSION, random_indices
from narwhals._typing_compat import TypeVar
from narwhals._utils import Implementation, Version, _StoresNative

Expand Down Expand Up @@ -43,6 +43,10 @@ def _with_native(self, native: NativeT) -> Self:
msg = f"{type(self).__name__}._with_native"
raise NotImplementedError(msg)

def __len__(self) -> int:
msg = f"{type(self).__name__}.__len__"
raise NotImplementedError(msg)

if BACKEND_VERSION >= (18,):

def _gather(self, indices: Indices) -> NativeT:
Expand All @@ -57,5 +61,14 @@ def gather(self, indices: Indices | _StoresNative[ChunkedArrayAny]) -> Self:
ca = self._gather(indices.native if is_series(indices) else indices)
return self._with_native(ca)

def gather_every(self, n: int, offset: int = 0) -> Self:
return self._with_native(self.native[offset::n])

def slice(self, offset: int, length: int | None = None) -> Self:
return self._with_native(self.native.slice(offset=offset, length=length))

def sample_n(
self, n: int = 1, *, with_replacement: bool = False, seed: int | None = None
) -> Self:
mask = random_indices(len(self), n, with_replacement=with_replacement, seed=seed)
return self.gather(mask)
100 changes: 79 additions & 21 deletions narwhals/_plan/arrow/dataframe.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
from narwhals._plan.arrow.series import ArrowSeries as Series
from narwhals._plan.compliant.dataframe import EagerDataFrame
from narwhals._plan.compliant.typing import namespace
from narwhals._plan.exceptions import shape_error
from narwhals._plan.expressions import NamedIR
from narwhals._utils import Version, generate_repr
from narwhals.schema import Schema
Expand All @@ -24,16 +25,18 @@
from collections.abc import Iterable, Iterator, Mapping, Sequence

import polars as pl
from typing_extensions import Self
from typing_extensions import Self, TypeAlias

from narwhals._plan.arrow.typing import ChunkedArrayAny
from narwhals._plan.arrow.typing import ChunkedArrayAny, ChunkedOrArrayAny
from narwhals._plan.compliant.group_by import GroupByResolver
from narwhals._plan.expressions import ExprIR, NamedIR
from narwhals._plan.options import SortMultipleOptions
from narwhals._plan.options import ExplodeOptions, SortMultipleOptions
from narwhals._plan.typing import NonCrossJoinStrategy
from narwhals.dtypes import DType
from narwhals.typing import IntoSchema

Incomplete: TypeAlias = Any


class ArrowDataFrame(
FrameSeries["pa.Table"], EagerDataFrame[Series, "pa.Table", "ChunkedArrayAny"]
Expand All @@ -48,6 +51,10 @@ def _with_native(self, native: pa.Table) -> Self:
def _group_by(self) -> type[GroupBy]:
return GroupBy

@property
def shape(self) -> tuple[int, int]:
return self.native.shape

def group_by_resolver(self, resolver: GroupByResolver, /) -> GroupBy:
return self._group_by.from_resolver(self, resolver)

Expand All @@ -68,11 +75,16 @@ def __len__(self) -> int:

@classmethod
def from_dict(
cls, data: Mapping[str, Any], /, *, schema: IntoSchema | None = None
cls,
data: Mapping[str, Any],
/,
*,
schema: IntoSchema | None = None,
version: Version = Version.MAIN,
) -> Self:
pa_schema = Schema(schema).to_arrow() if schema is not None else schema
native = pa.Table.from_pydict(data, schema=pa_schema)
return cls.from_native(native, version=Version.MAIN)
return cls.from_native(native, version=version)

def iter_columns(self) -> Iterator[Series]:
for name, series in zip(self.columns, self.native.itercolumns()):
Expand All @@ -96,10 +108,12 @@ def to_polars(self) -> pl.DataFrame:

return pl.DataFrame(self.native)

def _evaluate_irs(self, nodes: Iterable[NamedIR[ExprIR]], /) -> Iterator[Series]:
ns = namespace(self)
from_named_ir = ns._expr.from_named_ir
yield from ns._expr.align(from_named_ir(e, self) for e in nodes)
def _evaluate_irs(
self, nodes: Iterable[NamedIR[ExprIR]], /, *, length: int | None = None
) -> Iterator[Series]:
expr = namespace(self)._expr
from_named_ir = expr.from_named_ir
yield from expr.align((from_named_ir(e, self) for e in nodes), default=length)

def sort(self, by: Sequence[str], options: SortMultipleOptions | None = None) -> Self:
return self.gather(fn.sort_indices(self.native, *by, options=options))
Expand All @@ -121,6 +135,19 @@ def with_row_index_by(
column = fn.unsort_indices(indices)
return self._with_native(self.native.add_column(0, name, column))

def to_struct(self, name: str = "") -> Series:
native = self.native
if fn.TO_STRUCT_ARRAY_ACCEPTS_EMPTY:
struct = native.to_struct_array()
elif fn.HAS_FROM_TO_STRUCT_ARRAY:
if len(native):
struct = native.to_struct_array()
else:
struct = fn.chunked_array([], pa.struct(native.schema))
else:
struct = fn.struct(native.column_names, native.columns)
return Series.from_native(struct, name, version=self.version)

def get_column(self, name: str) -> Series:
chunked = self.native.column(name)
return Series.from_native(chunked, name, version=self.version)
Expand All @@ -136,6 +163,12 @@ def drop_nulls(self, subset: Sequence[str] | None) -> Self:
native = self.native.filter(~to_drop)
return self._with_native(native)

def explode(self, subset: Sequence[str], options: ExplodeOptions) -> Self:
builder = fn.ExplodeBuilder.from_options(options)
if len(subset) == 1:
return self._with_native(builder.explode_column(self.native, subset[0]))
return self._with_native(builder.explode_columns(self.native, subset))

def rename(self, mapping: Mapping[str, str]) -> Self:
names: dict[str, str] | list[str]
if fn.BACKEND_VERSION >= (17,):
Expand All @@ -144,20 +177,26 @@ def rename(self, mapping: Mapping[str, str]) -> Self:
names = [mapping.get(c, c) for c in self.columns]
return self._with_native(self.native.rename_columns(names))

# NOTE: Use instead of `with_columns` for trivial cases
def with_series(self, series: Series) -> Self:
"""Add a new column or replace an existing one.

Uses similar semantics as `with_columns`, but:
- for a single named `Series`
- no broadcasting (use `Scalar.broadcast` instead)
- no length checking (use `with_series_checked` instead)
"""
return self._with_native(with_array(self.native, series.name, series.native))

def with_series_checked(self, series: Series) -> Self:
expected, actual = len(self), len(series)
if len(series) != len(self):
raise shape_error(expected, actual)
return self.with_series(series)

def _with_columns(self, exprs: Iterable[Expr | Scalar], /) -> Self:
native = self.native
columns = self.columns
height = len(self)
for into_series in exprs:
name = into_series.name
chunked = into_series.broadcast(height).native
if name in columns:
i = columns.index(name)
native = native.set_column(i, name, chunked)
else:
native = native.append_column(name, chunked)
return self._with_native(native)
names_and_columns = ((e.name, e.broadcast(height).native) for e in exprs)
return self._with_native(with_arrays(self.native, names_and_columns))

def select_names(self, *column_names: str) -> Self:
return self._with_native(self.native.select(list(column_names)))
Expand Down Expand Up @@ -200,3 +239,22 @@ def partition_by(self, by: Sequence[str], *, include_key: bool = True) -> list[S
from_native = self._with_native
partitions = partition_by(self.native, by, include_key=include_key)
return [from_native(df) for df in partitions]


def with_array(table: pa.Table, name: str, column: ChunkedOrArrayAny) -> pa.Table:
column_names = table.column_names
if name in column_names:
return table.set_column(column_names.index(name), name, column)
return table.append_column(name, column)


def with_arrays(
table: pa.Table, names_and_columns: Iterable[tuple[str, ChunkedOrArrayAny]], /
) -> pa.Table:
column_names = table.column_names
for name, column in names_and_columns:
if name in column_names:
table = table.set_column(column_names.index(name), name, column)
else:
table = table.append_column(name, column)
return table
Loading
Loading