Break up intake-xarray into separate drivers? #61

Open
danielballan opened this issue Feb 28, 2020 · 6 comments

@danielballan
Member

Intake-xarray lumps together several different I/O backends. Grouping drivers by what they return (xarray in this case) is different from the usual pattern. We propose to break intake-xarray into intake-netcdf, intake-tiff with rasterio support (and maybe also tifffile support?), intake-png, and intake-zarr. We can maintain intake-xarray going forward as a metapackage that depends on these as a convenience and for back-compat.

@martindurant
Member

You may well have a point, but the original rationale was that all of these call xarray loader functions.

@danielballan
Member Author

Yes, I think that is sensible early on, just as we currently package a bunch of unrelated drivers together in databroker._drivers with the intention of splitting them up once things stabilize. Maybe this is worth doing eventually, with some shared utility library containing the xarray loader.

@martindurant martindurant transferred this issue from intake/intake Mar 10, 2020
@jbednar
Contributor

jbednar commented Mar 10, 2020

Seems fine to me...

@joshmoore

Happy to write it up as a separate issue, but if this would help load zarrs which were not written with xarray, I'd also be in favor.

I'm currently running into a KeyError on _ARRAY_DIMENSIONS.

Stacktrace
Traceback (most recent call last):
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 163, in _get_zarr_dims_and_attrs
    dimensions = zarr_obj.attrs[dimension_key]
  File "/Users/jamoore/opt/zarr/zarr/attrs.py", line 64, in __getitem__
    return self.asdict()[item]
KeyError: '_ARRAY_DIMENSIONS'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "intake_test.py", line 3, in <module>
    ds = cat.idr_ebi_6001240.to_dask()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake_xarray/base.py", line 69, in to_dask
    return self.read_chunked()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake_xarray/base.py", line 44, in read_chunked
    self._load_metadata()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake/source/base.py", line 117, in _load_metadata
    self._schema = self._get_schema()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake_xarray/base.py", line 18, in _get_schema
    self._open_dataset()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake_xarray/xzarr.py", line 31, in _open_dataset
    self._ds = xr.open_zarr(self._mapper, **self.kwargs)
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 599, in open_zarr
    ds = maybe_decode_store(zarr_store)
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 582, in maybe_decode_store
    drop_variables=drop_variables,
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/conventions.py", line 570, in decode_cf
    vars, attrs = obj.load()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/common.py", line 123, in load
    (_decode_variable_name(k), v) for k, v in self.get_variables().items()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 290, in get_variables
    (k, self.open_store_variable(k, v)) for k, v in self.ds.arrays()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/core/utils.py", line 402, in FrozenDict
    return Frozen(dict(*args, **kwargs))
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 290, in <genexpr>
    (k, self.open_store_variable(k, v)) for k, v in self.ds.arrays()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 274, in open_store_variable
    dimensions, attributes = _get_zarr_dims_and_attrs(zarr_array, _DIMENSION_KEY)
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 167, in _get_zarr_dims_and_attrs
    "required for xarray to determine variable dimensions." % (dimension_key)
KeyError: 'Zarr object is missing the attribute `_ARRAY_DIMENSIONS`, which is required for xarray to determine variable dimensions.'
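For anyone hitting the same error, the condition xarray checks here can be approximated before opening the store. The following is only a rough sketch, assuming a local directory store in the Zarr v2 layout (a `.zarray` file per array, with attributes in a sibling `.zattrs` JSON file). The helper names `arrays_missing_dims` and `make_demo_store` are hypothetical, and the demo store contains placeholder metadata only, not real array data.

```python
import json
import os
import tempfile

DIMENSION_KEY = "_ARRAY_DIMENSIONS"  # attribute named in the KeyError above


def arrays_missing_dims(store_root):
    """Return relative paths of Zarr v2 arrays lacking the dimension attribute."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(store_root):
        if ".zarray" not in filenames:
            continue  # not an array directory (may be a group)
        attrs = {}
        if ".zattrs" in filenames:
            with open(os.path.join(dirpath, ".zattrs")) as f:
                attrs = json.load(f)
        if DIMENSION_KEY not in attrs:
            missing.append(os.path.relpath(dirpath, store_root))
    return sorted(missing)


def make_demo_store(root):
    """Write metadata for two hypothetical arrays, one with the attribute, one without."""
    for name, attrs in [("with_dims", {DIMENSION_KEY: ["y", "x"]}), ("plain", {})]:
        array_dir = os.path.join(root, name)
        os.makedirs(array_dir)
        with open(os.path.join(array_dir, ".zarray"), "w") as f:
            json.dump({"zarr_format": 2}, f)  # placeholder, not full array metadata
        with open(os.path.join(array_dir, ".zattrs"), "w") as f:
            json.dump(attrs, f)


with tempfile.TemporaryDirectory() as demo_root:
    make_demo_store(demo_root)
    demo_missing = arrays_missing_dims(demo_root)
    print(demo_missing)
```

Any array this reports would presumably trigger the same KeyError in `xr.open_zarr`, so a store with entries in that list is a candidate for a plain-array reader rather than the xarray-based driver. Remote stores would need a different way of listing and reading the metadata files.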

@martindurant
Member

load zarrs which were not written with xarray

Like intake.source.zarr.ZarrArraySource ("ndzarr")?

@joshmoore

Thanks, @martindurant. ZarrArraySource does work for my data.
