[BUG]: cuDF errors on GeoDataFrame Print #1125

Open
jarmak-nv opened this issue May 5, 2023 · 1 comment
Labels
bug Something isn't working Needs Triage Need team to review and classify

Comments

jarmak-nv commented May 5, 2023

Version

23.04

On which installation method(s) does this occur?

Conda

Describe the issue

After creating a cuSpatial GeoDataFrame, printing it raises an error:

TypeError: data type 'geometry' not understood

Minimum reproducible example

import geopandas as gpd
import numpy as np
import cuspatial
from shapely.geometry import LineString

# Make some lines
xmin, ymin, xmax, ymax = -180, -90, 180, 90
n_points = np.random.randint(2, 100, size=1000)
x_points = np.random.uniform(xmin, xmax, size=(1000, np.max(n_points)))
y_points = np.random.uniform(ymin, ymax, size=(1000, np.max(n_points)))
lines = [LineString(np.column_stack((x[:n], y[:n]))) for x, y, n in zip(x_points, y_points, n_points)]

# Make a gdf, then cuspatial gdf
gdf = gpd.GeoDataFrame(geometry=lines)
cuspatial_gdf = cuspatial.from_geopandas(gdf)
print(cuspatial_gdf)

Relevant log output

TypeError                                 Traceback (most recent call last)
File ~/miniconda3/envs/rapids-23.04-release2/lib/python3.10/site-packages/IPython/core/formatters.py:344, in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:

File ~/miniconda3/envs/rapids-23.04-release2/lib/python3.10/site-packages/nvtx/nvtx.py:101, in annotate.__call__.<locals>.inner(*args, **kwargs)
     98 @wraps(func)
     99 def inner(*args, **kwargs):
    100     libnvtx_push_range(self.attributes, self.domain.handle)
--> 101     result = func(*args, **kwargs)
    102     libnvtx_pop_range(self.domain.handle)
    103     return result

File ~/miniconda3/envs/rapids-23.04-release2/lib/python3.10/site-packages/cudf/core/dataframe.py:1906, in DataFrame._repr_latex_(self)
   1904 @_cudf_nvtx_annotate
   1905 def _repr_latex_(self):
-> 1906     return self._get_renderable_dataframe().to_pandas()._repr_latex_()

File ~/miniconda3/envs/rapids-23.04-release2/lib/python3.10/site-packages/cudf/core/dataframe.py:1875, in DataFrame._get_renderable_dataframe(self)
   1873     upper = cudf.concat([upper_left, upper_right], axis=1)
   1874     lower = cudf.concat([lower_left, lower_right], axis=1)
...
File column.pyx:380, in cudf._lib.column.Column._view()

File types.pyx:241, in cudf._lib.types.dtype_to_data_type()

TypeError: data type 'geometry' not understood

Environment details

No response

Other/Misc.

In a Jupyter notebook, the GeoDataFrame is still printed after the error is raised:

                                              geometry
0    LINESTRING (-45.35532 -6.81397, 20.22481 -61.0...
1    LINESTRING (8.74951 -82.15693, -76.63895 -31.9...
2    LINESTRING (-69.24599 -77.87897, -42.02738 -11...
3    LINESTRING (-84.89946 -44.72954, -29.47745 -64...
4    LINESTRING (162.67521 -50.75755, -84.80769 82....
..                                                 ...
995  LINESTRING (-167.43410 19.66316, -158.42750 13...
996  LINESTRING (-9.51915 15.09780, 144.76755 68.10...
997  LINESTRING (61.13300 -80.36317, 28.90036 58.69...
998  LINESTRING (20.38292 -71.43508, -64.48469 11.0...
999  LINESTRING (-14.11138 30.30998, -88.91027 -88....

[1000 rows x 1 columns]
(GPU)
@jarmak-nv jarmak-nv added bug Something isn't working Needs Triage Need team to review and classify labels May 5, 2023
thomcom commented May 5, 2023

This error comes from how cudf slices up dataframes that have more rows than pd.options.display.max_rows when rendering them. We need to implement _get_renderable_dataframe to avoid the issue.

Until then, you can avoid the issue by slicing the dataframe yourself before printing it, or by using .head() or .tail().
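A minimal sketch of that workaround, assuming (per the comment above) that frames no longer than pd.options.display.max_rows print without hitting the failing slicing path. The helper name preview_parts and its n parameter are hypothetical and not part of cudf or cuspatial; it relies only on len(), .head() and .tail().

```python
def preview_parts(frame, n=5):
    """Split a frame into small head/tail pieces that can be printed one by one.

    Hypothetical helper: works with any object that supports len(),
    .head(n) and .tail(n), such as the cuspatial GeoDataFrame in the
    reproducer above. Keep 2 * n at or below pd.options.display.max_rows
    (60 by default in pandas) so no piece needs to be truncated.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(frame) <= 2 * n:
        # Small enough already: hand back the frame unchanged.
        return [frame]
    # Otherwise return the first and last n rows as separate pieces.
    return [frame.head(n), frame.tail(n)]


# Usage with the reproducer above (not executed here, needs a GPU):
# for part in preview_parts(cuspatial_gdf):
#     print(part)
```

Printing the two pieces separately loses the ".." row that the normal truncated view shows between head and tail, but it keeps each printed frame short enough that cudf should not need to slice and concatenate it.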

Projects
Status: Todo