
feat: to/from PyTorch JaggedTensor #3246

Open
wants to merge 21 commits into main
Conversation

maxymnaumchyk
Collaborator

No description provided.


codecov bot commented Sep 17, 2024

Codecov Report

Attention: Patch coverage is 23.25581% with 66 lines in your changes missing coverage. Please review.

Project coverage is 81.85%. Comparing base (b749e49) to head (c68321c).
Report is 162 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/awkward/operations/ak_to_jaggedtensor.py | 20.00% | 40 Missing ⚠️ |
| src/awkward/operations/ak_from_jaggedtensor.py | 23.52% | 26 Missing ⚠️ |
Additional details and impacted files
| Files with missing lines | Coverage Δ |
|---|---|
| src/awkward/operations/__init__.py | 100.00% <100.00%> (ø) |
| src/awkward/operations/ak_from_jaggedtensor.py | 23.52% <23.52%> (ø) |
| src/awkward/operations/ak_to_jaggedtensor.py | 20.00% <20.00%> (ø) |

... and 105 files with indirect coverage changes

@maxymnaumchyk
Collaborator Author

@jpivarski should I also leave out the "keep_regular" parameter, since it does the same thing as ak.from_regular()? It's much the same situation we talked about today (with the "padded" parameter).

@jpivarski
Member

You're right: it is. The situation is that we should be providing an interface to the user that's like Lego bricks that they can put together however they like. If there's an alternative way of doing something, it shouldn't be a feature of the new functions, because then we'd have to explain why someone would use one or the other.

I agree that the padded and keep_regular arguments are more convenient if that's exactly what someone wants; passing padded=True or keep_regular=True is easier than the multi-step process it would be with the other method. However, these shortcuts would only work in the cases that these functions apply to, which are limited by the capabilities of TensorFlow and PyTorch's ragged array implementations.
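The "Lego bricks" principle above can be sketched in plain Python. The helpers below are hypothetical illustrations of the idea, not the actual Awkward Array API: rather than baking a keep_regular=True flag into the conversion function, the user composes two small, single-purpose functions.

```python
# Illustrative sketch of the "Lego bricks" principle (hypothetical helper
# names, not the real Awkward Array API): instead of a keep_regular=True
# flag on one big function, the user composes two small functions.

def from_offsets(values, offsets):
    """Rebuild variable-length lists from flat values + offsets (ragged form)."""
    return [values[start:stop] for start, stop in zip(offsets, offsets[1:])]

def to_regular(lists, size):
    """Reinterpret lists as a regular (fixed-size) 2-D structure, if possible."""
    assert all(len(row) == size for row in lists), "lists are not regular"
    return lists  # in Awkward, this would change the type, not the data

# A 2x3 regular array stored in ragged form:
values = [1, 2, 3, 4, 5, 6]
offsets = [0, 3, 6]

ragged = from_offsets(values, offsets)   # [[1, 2, 3], [4, 5, 6]]
regular = to_regular(ragged, size=3)     # same data, now asserted regular
```

Skipping the second step is exactly what a keep_regular flag would do, which is why the flag adds no capability, only a second way to say the same thing.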

@maxymnaumchyk
Collaborator Author

Thanks for such a detailed answer!

@ianna ianna marked this pull request as ready for review September 25, 2024 14:03
@jpivarski left a comment (Member)

This is great work! But it might be solving the wrong problem. As we discussed at the meeting, this fbgemm-gpu-cpu module is not something ML users seem to be familiar with, so adding to/from functions wouldn't help them. It doesn't seem to be the interface that they use to implement DeepSets and GNNs, the ML models that might actually involve ragged data. So we're going to follow up with ML experts to find out what interfaces they really do need.

Meanwhile, as discussed at the meeting, you'll be adding

  • ak.to_torch using Content.to_backend_array
  • ak.from_torch

for rectilinear arrays only (allow_record=False and allow_missing=False, following ak_to_numpy.py).
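A NumPy-only sketch of what the rectilinear case amounts to (an assumption about what ak.to_torch would do internally; torch is deliberately not imported here): purely rectilinear data densifies to a contiguous ndarray, which torch.from_numpy() could then wrap without copying.

```python
import numpy as np

# Sketch (assumption): rectilinear data becomes one contiguous ndarray.
# torch.from_numpy(dense) could then wrap this buffer zero-copy; torch
# itself is intentionally left out so the example stays self-contained.

nested = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # rectilinear: every row has length 3
dense = np.ascontiguousarray(nested)

# A ragged input like [[1.0], [2.0, 3.0]] has no dense shape, which is
# why the ragged case needs a separate representation (see below).
```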

Another two functions would be needed to pre-process an idiomatic Awkward Array into the kind of interface that PyTorch-Geometric needs, which isn't a single RaggedTensor object as in TensorFlow; it's a few separate, completely rectilinear arrays. "The way to do it" needs to be explained in a User Guide that puts all of these functions together, rather than as a single function that tries to do everything in one call.
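The "few separate, completely rectilinear arrays" idea can be sketched with NumPy alone. This is an assumption about the target interface, modeled on PyTorch-Geometric's batch convention of flat values plus "ptr" (offsets) and "batch" (item-to-list) index arrays:

```python
import numpy as np

# Sketch (assumption, modeled on PyTorch-Geometric's Batch convention):
# a ragged batch becomes a few completely rectilinear arrays --
# flat values, a "ptr" offsets array, and a "batch" membership vector.

lists = [[10, 20], [30], [40, 50, 60]]

values = np.concatenate([np.asarray(x) for x in lists])  # flat, rectilinear
counts = np.array([len(x) for x in lists])
ptr = np.concatenate([[0], np.cumsum(counts)])           # offsets: [0, 2, 3, 6]
batch = np.repeat(np.arange(len(lists)), counts)         # [0, 0, 1, 2, 2, 2]

# Round trip: recover the original ragged lists from the flat arrays.
recovered = [values[ptr[i]:ptr[i + 1]].tolist() for i in range(len(lists))]
```

Each of the output arrays is an ordinary dense tensor, which is exactly the kind of input a framework without ragged-array support can consume.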

Although I'm setting this to "request changes," we'll likely be closing this PR and following up with new ones.
