Skip to content

Commit

Permalink
BUG: Ensure einsum uses chunking (now that nditer doesn't) (numpy#28043)
Browse files Browse the repository at this point in the history
* BUG: Ensure einsum uses chunking (now that nditer doesn't)

The nditer refactor enabled the `GROWINNER` for reductions, this
was not the case before (I am not sure about cases where the reduction
is in an outer axis, but I don't think so).

Either way, as the test says, chunking always improves the precision
of large sorts if their mean is nonzero and sklearn noticed lower
precision.

We could only remove growinner if there is a reduction, but it seems
like a 1% performance hit for the simplest (non-trivial) case where
GROWINNER had done something before.

* Refine test and add code comment
  • Loading branch information
seberg authored Dec 20, 2024
1 parent 031a890 commit 9aa5cda
Show file tree
Hide file tree
Showing 2 changed files with 31 additions and 1 deletion.
5 changes: 4 additions & 1 deletion numpy/_core/src/multiarray/einsum.c.src
Original file line number Diff line number Diff line change
Expand Up @@ -1032,10 +1032,13 @@ PyArray_EinsteinSum(char *subscripts, npy_intp nop,
NPY_ITER_NBO|
NPY_ITER_ALIGNED|
NPY_ITER_ALLOCATE;
/*
* Note: We skip GROWINNER here because this gives a partially stable
* summation for float64. Pairwise summation would be better.
*/
iter_flags = NPY_ITER_EXTERNAL_LOOP|
NPY_ITER_BUFFERED|
NPY_ITER_DELAY_BUFALLOC|
NPY_ITER_GROWINNER|
NPY_ITER_REFS_OK|
NPY_ITER_ZEROSIZE_OK;
if (out != NULL) {
Expand Down
27 changes: 27 additions & 0 deletions numpy/_core/tests/test_einsum.py
Original file line number Diff line number Diff line change
Expand Up @@ -1282,3 +1282,30 @@ def test_overlap():
#gh-10080, out overlaps one of the operands
c = np.einsum('ij,jk->ik', a, b, out=b)
assert_equal(c, d)

def test_einsum_chunking_precision():
"""Most einsum operations are reductions and until NumPy 2.3 reductions
never (or almost never?) used the `GROWINNER` mechanism to increase the
inner loop size when no buffers are needed.
Because einsum reductions work roughly:
def inner(*inputs, out):
accumulate = 0
for vals in zip(*inputs):
accumulate += prod(vals)
out[0] += accumulate
Calling the inner-loop more often actually improves accuracy slightly
(same effect as pairwise summation but much less).
Without adding pairwise summation to the inner-loop it seems best to just
not use GROWINNER, a quick tests suggest that is maybe 1% slowdown for
the simplest `einsum("i,i->i", x, x)` case.
(It is not clear that we should guarantee precision to this extend.)
"""
num = 1_000_000
value = 1. + np.finfo(np.float64).eps * 8196
res = np.einsum("i->", np.broadcast_to(np.array(value), num)) / num

# At with GROWINNER 11 decimals succeed (larger will be less)
assert_almost_equal(res, value, decimal=15)

0 comments on commit 9aa5cda

Please sign in to comment.