Skip to content

Conversation

@fjetter
Copy link
Contributor

@fjetter fjetter commented Mar 26, 2025

This came up in dask/dask#11844 (comment)

I don't see a reason why you shouldn't be able to use the dask native da.reduction. This way, dask owns the tree reduction logic and it will work as intended once we move to an expression backend.

The only caveat with this is that the very final reduction / Session merge would happen locally. This should typically only be a very small number of sessions (one per dask.array). The only reason why this would be bad is if that merge would be expensive or would require IO but I believe this is just merging metadata, isn't it?

cc @dcherian

@paraseba paraseba requested a review from dcherian March 27, 2025 23:25
@dcherian
Copy link
Contributor

dcherian commented Mar 28, 2025

The only reason why this would be bad is if that merge would be expensive or would require IO but I believe this is just merging metadata, isn't it?

Yes, we were being a bit extra around being scale-aware here. I figured if I wanted tree-reduce across arrays, I'd need to write out the graph, so I just did it for the arrays themselves too. I am surprised this just works, are these Session objects being automatically cast to object arrays of the right shape under the hood in _concatenate2? They must be. I had originally thought that I'd need to write custom combine and aggregate ...

@fjetter
Copy link
Contributor Author

fjetter commented Mar 28, 2025

There is no _concatenate2 involved since we're disabling concatenate in the reduction. This way the chunks are just passed to merge_sessions as a nested list, e.g.

image

and there is no final concat step because the final reduction is also a merge_sessions which returns a singleton.

Sorry for the typing mess and thanks for cleaning up

@fjetter
Copy link
Contributor Author

fjetter commented Apr 1, 2025

sorry for the linting problems. I didn't setup pre-commit because I struggled with the uv setup, etc. I think it should be working

@fjetter
Copy link
Contributor Author

fjetter commented Apr 2, 2025

ok, rust ci is now failing with... is this related?

advisories FAILED, bans ok, licenses ok, sources ok
error: Recipe `check-deps` failed on line 34 with exit code 1
error: Recipe `pre-commit` failed on line 47 with exit code 1

I'll try to reproduce. Main doesn't seem to eb affected

@dcherian
Copy link
Contributor

dcherian commented Apr 2, 2025

@fjetter the PR is fine, this is an unrelated error.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fjetter!

@dcherian dcherian enabled auto-merge (squash) April 3, 2025 20:38
@paraseba paraseba disabled auto-merge April 3, 2025 21:22
@paraseba paraseba merged commit 69e9e43 into earth-mover:main Apr 3, 2025
7 of 8 checks passed
@fjetter fjetter deleted the dask_array_reduction branch April 4, 2025 07:48
dcherian added a commit that referenced this pull request Apr 4, 2025
* main:
  Release version v0.2.12 (#894)
  Use dask array native reduction (#864)
  Update sample-datasets page (#887)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants