[FEA]: Implement Arrow PyCapsule Interface #1332

Open
kylebarron opened this issue Jan 30, 2024 · 4 comments
Labels
External Issues filed by people outside the team feature request New feature or request Needs Triage Need team to review and classify

Comments

@kylebarron

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Medium

Please provide a clear description of problem you would like to solve.

I have an interest in growing the GeoArrow ecosystem and making projects interoperable. I'm also developing lonboard, a Python library that uses GeoArrow with deck.gl for visualization of millions of geometries in a Jupyter notebook. I think that would complement cuspatial well, as cuspatial already uses GeoArrow and does not implement its own visualization.

Arrow recently published the PyCapsule Interface spec, which lets a consumer call an object's __arrow_c_stream__ method and construct a table without knowing anything about the producer. This feature request is for cuspatial to implement that spec. For example, in lonboard you could pass a cuspatial.GeoDataFrame into lonboard.viz and it would just work because of the __arrow_c_stream__ method.
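
As a rough consumer-side sketch of that idea, assuming pyarrow 14 or later is installed: the helper names supports_arrow_stream and to_arrow_table are illustrative only and are not part of lonboard, cuspatial, or pyarrow.

```python
def supports_arrow_stream(obj):
    """Return True if obj exposes the Arrow PyCapsule stream method."""
    return callable(getattr(obj, "__arrow_c_stream__", None))


def to_arrow_table(obj):
    """Build a pyarrow.Table from any producer of an Arrow C stream.

    The consumer never inspects the producer's type; it relies only on
    the __arrow_c_stream__ method defined by the PyCapsule Interface.
    """
    if not supports_arrow_stream(obj):
        raise TypeError(
            f"{type(obj).__name__} does not implement __arrow_c_stream__"
        )
    # Imported lazily so the capability check works without pyarrow.
    import pyarrow as pa  # assumption: pyarrow >= 14

    return pa.table(obj)
```

Any producer that implements the method, whether a pyarrow.Table or (hypothetically) a cuspatial.GeoDataFrame, could then be passed to to_arrow_table without a type-specific code path.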

Describe any alternatives you have considered

Right now it looks like to_arrow exists only on the GeoSeries object and not on the GeoDataFrame object. Maybe a first approach is to implement #1288 and then add the public __arrow_c_stream__ dunder?

I'm not sure how the interfacing with GPU memory works; I suppose __arrow_c_stream__ could call table.to_geoarrow_pyarrow() and finish by calling the pyarrow __arrow_c_stream__ method?
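
A minimal sketch of that delegation idea, with loud assumptions: HostStreamExporter and its to_host_table argument are hypothetical and stand in for whatever cuspatial would use to copy GPU data into a host-side Arrow table. Only the __arrow_c_stream__(requested_schema=None) signature comes from the PyCapsule Interface spec.

```python
class HostStreamExporter:
    """Hypothetical wrapper that exports data through a host Arrow table."""

    def __init__(self, to_host_table):
        # to_host_table: zero-argument callable returning an object that
        # already implements __arrow_c_stream__ (for example a pyarrow.Table).
        self._to_host_table = to_host_table

    def __arrow_c_stream__(self, requested_schema=None):
        # Materialize the data on the host, then let the host table
        # produce the "arrow_array_stream" capsule.
        host_table = self._to_host_table()
        return host_table.__arrow_c_stream__(requested_schema)
```

With pyarrow installed, something like pa.table(HostStreamExporter(lambda: some_pyarrow_table)) should round-trip the data. The trade-off is a device-to-host copy on every export.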

Additional context

See also the discussion and links in pyarrow (apache/arrow#39195) and GeoPandas (geopandas/geopandas#3156).

I don't have an NVIDIA GPU so I'm unable to test solutions :/

@kylebarron kylebarron added the feature request New feature or request label Jan 30, 2024
@GPUtester GPUtester added Needs Triage Need team to review and classify External Issues filed by people outside the team labels Jan 30, 2024
@GPUtester
Contributor

Hi @kylebarron!

Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
In the meantime, feel free to add any relevant information to this issue.

@thomcom
Contributor

thomcom commented Feb 1, 2024

This is a fantastic feature request @kylebarron, thanks for pointing it out!

@mroeschke
Contributor

Noting that in the 24.08 release, GeoDataFrame will have __arrow_c_stream__ because it subclasses cudf.DataFrame (rapidsai/cudf#16310). However, the export goes through cudf.DataFrame.to_arrow, so I suspect the geo types might not be correctly converted yet.

@kylebarron
Author

If geometry arrays are already stored as Arrow, then all you need to do is ensure the geospatial metadata is appropriately applied in the schema when exporting.
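
One way this could look on the host side, as a sketch only: the function names are illustrative, pyarrow is assumed to be installed, and the ARROW:extension:name and ARROW:extension:metadata keys follow the Arrow extension-type convention that GeoArrow uses. How cuspatial would pick the extension name and CRS for each column is not covered here.

```python
import json


def geoarrow_field_metadata(extension_name, crs=None):
    """Build Arrow field metadata marking a column as a GeoArrow type."""
    metadata = {b"ARROW:extension:name": extension_name.encode()}
    ext_meta = {} if crs is None else {"crs": crs}
    metadata[b"ARROW:extension:metadata"] = json.dumps(ext_meta).encode()
    return metadata


def tag_geometry_column(table, column, extension_name, crs=None):
    """Return a pyarrow.Table whose schema carries the GeoArrow metadata."""
    import pyarrow as pa  # assumption: pyarrow is installed

    index = table.schema.get_field_index(column)
    field = table.schema.field(index).with_metadata(
        geoarrow_field_metadata(extension_name, crs)
    )
    schema = table.schema.set(index, field)
    # The column data is reused as-is; only the schema changes.
    return pa.Table.from_arrays(table.columns, schema=schema)
```

For instance, tag_geometry_column(table, "geometry", "geoarrow.point", crs="EPSG:4326") would label a point column, provided its storage layout already matches what GeoArrow expects for points.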

Projects
Status: Todo
Development

No branches or pull requests

4 participants