feat: add a non-touching ak.zip, called 'ak.zip_no_broadcast' #3390
I think we need to be careful here about what we can check without touching data. I could add, e.g., a check that the lengths of all given layouts are the same, which would touch all shapes instead of one but is significantly safer. I'm curious where we should draw the boundary of how "unsafe" this function should be. Edit: I added a check for equal lengths, as otherwise this could be really dangerous. Touching lengths is anyway ok with
@pfackeldey my feeling for an To implement such a function, we would build the result form using the type of e.g. the first array (or to be smarter, whichever array has the most "well-known" type above the record). I think this would require the Just food for thought :)
Hi @agoose77, thanks for your comment! I assumed you had already given this some thought a while ago in dask-contrib/dask-awkward#536, so I'm really happy to have your input. I'm not sure what kind of assertion you have in mind. The only way this I could simply add this assertion here to enforce the same values for non- What else would be dangerous? I'm not sure I have fully understood your suggestion yet.
@pfackeldey - looks good to me! Thanks! I wonder if a test where this operation fails would be useful?
@pfackeldey I'm replying to
we can 100% avoid constructing invalid arrays at runtime. We just need to ensure that enforcing the check doesn't trigger touching. The way to do this would be:
Maybe that's what you're proposing, I'm not totally sure!
I'll respond to both of you directly @agoose77 and @ianna:
Let me add the second check and rename the function, then I think it's good 👍
Ok, @ianna and @agoose77 can you have another look at it?
This PR adds a non-touching version of `ak.zip` called `ak.unsafe_zip`. This is useful to re-arrange record arrays in coffea without touching any contents.

The price we're paying to achieve non-touching zipping behavior is safety, i.e. broken arrays can be constructed. `ak.unsafe_zip` assumes that all input arrays have the same layouts (and lengths!), and currently can only work on `ak.contents.NumpyArray` or `ak.contents.ListOffsetArray`. Nesting record arrays can be achieved by nesting `ak.unsafe_zip` calls, similar to how `ak.zip` works currently.

For usage with coffea, this needs to be daskified similar to `ak.zip`, see: https://github.com/dask-contrib/dask-awkward/blob/main/src/dask_awkward/lib/structure.py#L1272-L1344

cc @nsmith-
See also: dask-contrib/dask-awkward#536
non-touching show-case:
No data is touched. However, touching shape cannot be avoided, but for this we already have `unknown_length`.