Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Array4: __array_interface__ #17

Closed

Conversation

ax3l
Copy link
Member

@ax3l ax3l commented Feb 15, 2021

  • draft the __array_interface__
  • handle C->F and F->C array index properly (invert shape & stride from/to AMReX)
  • check how to handle offsets expressed with begin/end
  • handle 1D-3D(4D) numpy array adoption properly
  • write type dispatch wrapper for Array4 construction
  • implement __cuda_array_interface__
  • ☮️ ❤️ 🌈

Close #9

Implement the `__array_interface__` in Array for.
@sayerhs sayerhs marked this pull request as draft February 15, 2021 14:00
Copy link
Contributor

@sayerhs sayerhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good @ax3l seems like numpy <-> Array4 roundtrip is working already?

Currently, I see a conflict between how a user would expect to index into an array in Python vs. AMReX C++ code. In AMReX we expect to index using a Box so the starting index will not be zero. And in presence of ghost cells, negative indices will yield the ghost cells on the lo side.

[](Array4<T> const & a4) {
std::stringstream s;
s << a4.size();
return "<amrex.Array4 of size '" + s.str() + "'>";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should indicate the polymorphic type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in #19

Comment on lines +72 to +77
auto shape = py::make_tuple( // Buffer dimensions
len.x < 0 ? 0 : len.x,
len.y < 0 ? 0 : len.y,
len.z < 0 ? 0 : len.z//, // zero-size shall not have negative dimension
//a4.ncomp
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this handle ghost cells?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet at all, let's have a VC about how you solved it in the past.

So far I am mainly experimenting with the data interface (trying the GPU version next)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in #19


auto a4 = std::make_unique< Array4<T> >();
a4.get()->p = (T*)buf.ptr;
a4.get()->begin = Dim3{0, 0, 0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably needs to be adjusted to a box index space

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this function here is mainly to support views into numpy and other python buffer objects via Array4.
In #19 currently mapped to a zero-based index space for incoming buffer protocols.

Potentially can implement shift-like functions later on to adjust the index space if we like to.

py::buffer_info buf = arr.request();

auto a4 = std::make_unique< Array4<T> >();
a4.get()->p = (T*)buf.ptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly, this aliases the Array4 to the python buffer. Idk enough about the buffer protocol, so I am asking here: will arr be kept alive as long as this instance of pyamrex.Array4?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what happens when the user changes to contents of arr?

Copy link
Member Author

@ax3l ax3l Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, at the moment (and I am just trying the design of the data access) I made the Python Array4 behave like the C++ Array4: it's purely a view to data that is managed elsewhere.

If you create a numpy array and pass it to an Array4, the data is owned by Python. If you return an Array4 that is created on from C++ data, then the C++ side owns the data.

This might not be the final design because it can shoot Python users in the foot and I'll discuss with @sayerhs (feel free to join!) how to design this in the end. In his prior design, he basically did not expose Array4 as a user-creatable type at all but instead used it to return views to C++ data only. This sounds to me like something we might want to do again - but I need to catch up with him to see the details.

# assert(arr[1, 1, 1] == 42)

# copy to numpy
c_arr2np = np.array(arr, copy=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question -- but I need to learn eventually -- is this the "official" way to define a amrex -> python deepcopy (i.e. using the np.array constructor)? Or do we need to define a dedicated deepcopy also?

Copy link
Contributor

@JBlaschke JBlaschke Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followup question: do we need to define a non-aliasing constructor (i.e. python -> amrex deepcopy)?

I can imaging that for data analytics applications we might want to keep a time series in python (looking at you machine learning folks) -- for that we need the former (amrex -> python) which is covered by this.

For experimental science applications we frequently have python provide a data source. A python -> amrex deepcopy (probably a simple copy constructor?) would be helpful in ensuring that the data is owned by amrex.

Come to think of it: A lot of algorithms in amrex are involve creating copies of data and then applying functions to them. Users might want to preserve this workflow.

My opinion is that this case best approach would be to expose the Array4 copy constructor (assuming that hasn't been deleted), and settle on the workflow: i) shallow-copy into amrex; ii) call copy constructor to create a deep copy.

Copy link
Member Author

@ax3l ax3l Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some details above. This is just a view at the moment, the same way as the C++ Array4 is a plain view to non-owned data.

I just added the numpy overload so I can test around with it, maybe in the end we do not let users create this type directly because it will easily lead to memory issues if used wrongly (and segfaulting interpreters are never well received by users, even if they caused it by not following an API contract).

@ax3l
Copy link
Member Author

ax3l commented Mar 18, 2021

Discussed today:

  • we will make Array4(Real/Int) createable only by MultiFabs (amrex::Box indexing semantics by default for everything: index -1 is ghost)
  • add explicit as_array / as_cupy functions for non-owning data from it: (numpy/Python Buffer Interface semantics for everything: index -1 is end of array)
  • make data copies explicit or fail if coming from device (check also pytorch & tensorflow interfaces for good vibes)

Refs.:

# Looping
for mfi in mfab:
    bx = mfi.tilebox()
    marr = mfab.array(mfi)

    for i, j, k in bx:
        mar[i, j, k, 0] = 10.0 * i
        mar[i, j, k, 1] = 10.0 * j
        mar[i, j, k, 2] = 10.0 * k



template< typename T >
void make_Array4(py::module &m, std::string typestr)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in #19: we want to make sure that users do not construct Array4s on their own and only get them from MFIter loops

@ax3l
Copy link
Member Author

ax3l commented Mar 22, 2022

This is now merged into #19 and improved therein.

@ax3l ax3l mentioned this pull request Mar 26, 2022
@ax3l ax3l closed this Mar 26, 2022
@ax3l ax3l deleted the topic-arrayBufferProtocols branch March 26, 2022 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discussion on mapping between amrex, numpy.ndarray, and torch.tensor data types
3 participants