@dangotbanned dangotbanned commented Dec 10, 2025

Description

This one started as a plan to expose the linear_space I adapted from np.linspace.
I added that for use in hist, where on main there is currently a dependency on np.linspace.

Show usage

def hist_zeroed_data(
    arg: int | Sequence[float], *, include_breakpoint: bool
) -> Mapping[str, Iterable[Any]]:
    # NOTE: If adding `linear_space` and `zeros` to `CompliantNamespace`, consider moving this.
    n = arg if isinstance(arg, int) else len(arg) - 1
    if not include_breakpoint:
        return {"count": zeros(n)}
    bp = linear_space(0, 1, arg, closed="right") if isinstance(arg, int) else arg[1:]
    return {"breakpoint": bp, "count": zeros(n)}

lower: NativeScalar = fn.min_(native)
upper: NativeScalar = fn.max_(native)
if lower.equals(upper):
    # All data points are identical - use unit interval
    rhs = fn.lit(0.5)
    lower, upper = fn.sub(lower, rhs), fn.add(upper, rhs)
bins = fn.linear_space(lower.as_py(), upper.as_py(), bin_count + 1)
data = fn.hist_bins(native, bins, include_breakpoint=include)

def _linear_space(
    self,
    start: float,
    end: float,
    num_samples: int,
    *,
    closed: Literal["both", "none"] = "both",
) -> _1DArray:
    from numpy import linspace  # ignore-banned-import

    return linspace(start=start, stop=end, num=num_samples, endpoint=closed == "both")

However, after I'd ported the polars.linear_space tests and noticed the direct comparison to np.linspace, I got ... curious
So I did my digging and found the polars rust impl - which was a lot cleaner in handling ClosedInterval. So, I stole that instead 😄
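
For illustration only, here is a minimal pure-Python sketch of the general idea of resolving every `ClosedInterval` case up front and only then generating the values. It is not the code in this PR and not the polars implementation; the name `linear_space_sketch`, the `list[float]` return type, and the single-sample behaviour are assumptions made for the example.

```python
from __future__ import annotations

from typing import Literal

ClosedInterval = Literal["both", "left", "right", "none"]


def linear_space_sketch(
    start: float,
    end: float,
    num_samples: int,
    *,
    closed: ClosedInterval = "both",
) -> list[float]:
    """Hypothetical helper: evenly spaced values between `start` and `end`."""
    if num_samples < 0:
        msg = f"num_samples must be non-negative, got {num_samples}"
        raise ValueError(msg)
    if num_samples == 0:
        return []
    if num_samples == 1:
        # Assumed convention for a single sample: take whichever endpoint
        # is included, or the midpoint if neither is.
        if closed == "none":
            return [(start + end) / 2]
        return [float(end)] if closed == "right" else [float(start)]

    # Resolve the interval kind first: how many equal steps span the range,
    # and whether the first sample is shifted one step past `start`.
    if closed == "both":
        divisions, offset = num_samples - 1, 0
    elif closed == "left":
        divisions, offset = num_samples, 0
    elif closed == "right":
        divisions, offset = num_samples, 1
    else:  # "none"
        divisions, offset = num_samples + 1, 1

    step = (end - start) / divisions
    # A single generation path serves all four cases.
    return [start + (i + offset) * step for i in range(num_samples)]
```

With this shape, the `closed="right"` call in `hist_zeroed_data` needs no special casing: only `divisions` and `offset` differ between interval kinds.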

I'm deferring support on these for now:

Related issues

`hist` depends on it, and if it gets extended to support `nw.Time` - then we've got ourselves a pretty powerful feature
I wanna share some fixtures between these
Have some more tests to add from `polars` for `linear_space`
I found this one pretty interesting
The `pyarrow` version never uses `np.linspace`, but will use `np.arange` when `pyarrow<21`
Handling the specific case on equal dtypes isn't hard, but it does make the existing code messy
I'd rather revisit this with the other bits I wanna change
It's nicer that this has all the `ClosedInterval` cases handled before generating the range
Realised I haven't been running the doctests, oops
The change in `ByName` was a fix from #3233
@dangotbanned dangotbanned added internal pyarrow Issue is related to pyarrow backend labels Dec 10, 2025
@dangotbanned dangotbanned marked this pull request as ready for review December 10, 2025 18:41
@dangotbanned dangotbanned merged commit 937bc57 into expr-ir/plz-finish-arrow-expr Dec 10, 2025
23 of 35 checks passed
@dangotbanned dangotbanned deleted the expr-ir/linear-space-nw branch December 10, 2025 18:44
