Error reading zarr store from an IPFS gateway #130

Open
lgloege opened this issue Sep 3, 2022 · 2 comments

lgloege commented Sep 3, 2022

Thank you for building intake-xarray, this is an awesome package! I am having trouble reading a zarr datastore via an IPFS gateway. I am trying to read a NOAA SST dataset with an intake catalog.

This code reads the NOAA SST dataset using only xarray:

import xarray as xr
zarr_store = "https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/"
ds = xr.open_dataset(zarr_store, engine="zarr")

This code works as expected. Now I want to build an intake catalog to read this store, so I wrote the following simple catalog, catalog_ipfs.yaml:

---
plugins:
  source:
    - module: intake_xarray

sources:
  SST:
    args:
      urlpath: https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/
      xarray_kwargs:
        engine: zarr
    driver: intake_xarray.netcdf.NetCDFSource

I then try reading data from it using the following code:

import intake
import xarray as xr
# Open the catalog and read the SST source defined in it
catalog = './catalog_ipfs.yaml'
cat = intake.open_catalog(catalog)
ds = cat.SST().read()

When I run this code, I get the following ValueError:

ValueError: Starting with Zarr 2.11.0, stores must be subclasses of BaseStore, if your store exposes the MutableMapping interface wrap it in Zarr.storage.KVStore. Got <File-like object HTTPFileSystem, https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/>

Any thoughts on how to resolve this issue? I am confused because I thought that, in the background, intake_xarray was just calling xr.open_dataset("https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/", engine="zarr"), which I know works from the example above.

Here is the version of each package I am using:

intake.__version__ = '0.6.6'
intake_xarray.__version__ = '0.6.1'
xr.__version__ = '2022.6.0'
zarr.__version__ = '2.12.0'

I appreciate any help you can provide, thanks!

@observingClouds

Hi @lgloege
I'm excited to see that more and more data is hosted on IPFS. Have you tried accessing this data directly via the IPFS protocol rather than via HTTP? Here is an example of what the syntax looks like. This way you would also be independent of the gateway. You will need to install ipfsspec, though.
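
As a rough sketch of that suggestion, the gateway URL from this issue can be rewritten as an ipfs:// URI and handed to xarray. The helper names gateway_url_to_ipfs_uri and open_via_ipfs are made up for illustration. The snippet assumes that installing ipfsspec makes the ipfs:// protocol available to fsspec, and that a local IPFS node or a default gateway is reachable; neither has been verified against this dataset.

```python
from urllib.parse import urlparse


def gateway_url_to_ipfs_uri(url: str) -> str:
    """Turn a gateway URL like https://<host>/ipfs/<cid>/... into ipfs://<cid>/...

    Hypothetical helper, not part of ipfsspec or intake-xarray.
    """
    path = urlparse(url).path              # e.g. "/ipfs/<cid>/sub/dir"
    parts = path.split("/", 3)             # ["", "ipfs", "<cid>", "sub/dir"]
    if len(parts) < 3 or parts[1] != "ipfs" or not parts[2]:
        raise ValueError(f"not an IPFS gateway URL: {url!r}")
    cid = parts[2]
    rest = parts[3] if len(parts) > 3 else ""
    return f"ipfs://{cid}/{rest}" if rest else f"ipfs://{cid}"


def open_via_ipfs(gateway_url: str):
    """Open the Zarr store through the IPFS protocol instead of HTTP.

    Assumption: ipfsspec is installed, so fsspec can resolve ipfs:// URIs.
    """
    import xarray as xr  # third-party, imported lazily

    return xr.open_dataset(gateway_url_to_ipfs_uri(gateway_url), engine="zarr")
```

Calling open_via_ipfs("https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/") would then request ipfs://bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm, provided the content is still pinned somewhere on the network.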

@observingClouds

The line driver: intake_xarray.netcdf.NetCDFSource also looks suspicious. I would rather have written:

sources:
  SST:      
    args:
      urlpath: https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/
    driver: zarr
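
A minimal sketch of how that catalog could be written out and opened follows. The error message above reports that Zarr received a file-like object, which suggests the NetCDF driver opens the URL as a single file before handing it to xarray; the zarr driver is expected to treat the URL as a store instead, though this has not been confirmed for this dataset. The helper names build_catalog_yaml and open_source are hypothetical, and to_dask() is used here in place of read() to get a lazily loaded dataset.

```python
URLPATH = "https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/"


def build_catalog_yaml(name: str, urlpath: str) -> str:
    """Return catalog text for one source that uses the zarr driver.

    Hypothetical helper; it only formats the YAML suggested above.
    """
    return (
        "sources:\n"
        f"  {name}:\n"
        "    driver: zarr\n"
        "    args:\n"
        f"      urlpath: {urlpath}\n"
    )


def open_source(catalog_path: str, name: str):
    """Open one catalog entry lazily as an xarray dataset.

    Assumption: intake and intake-xarray are installed and the
    gateway is reachable.
    """
    import intake  # third-party, imported lazily

    cat = intake.open_catalog(catalog_path)
    return cat[name].to_dask()
```

To try it, write build_catalog_yaml("SST", URLPATH) to catalog_ipfs.yaml and call open_source("./catalog_ipfs.yaml", "SST").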
