recursively combine items from catalog using xarray.auto_combine #29

Open
rabernat opened this issue Jan 20, 2019 · 5 comments

Comments

@rabernat

Xarray has a function called auto_combine which takes several datasets and combines them into one, using a series of heuristics to figure out the best way to merge. This is used internally in open_mfdataset.

It would be quite cool if I could take an intake catalog containing many xarray datasets in a hierarchy and open any point of this hierarchy into a single xarray dataset.

Sorry for the vague issue, but I just wanted to jot this idea down before I forgot it. Could be very useful in multiple contexts (e.g. intake/intake-thredds#2).

@martindurant
Member

I believe the AliasSource might be a good model to derive from for this, where you pass the original catalogue plus some parameters to the CombinedXarraySource (or whatever) and it instantiates the xarrays from the cat for each input parameter and then calls auto_combine.
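To make the shape of that suggestion concrete, here is a minimal sketch. It is not derived from AliasSource or any intake base class; `CombinedSource` and its arguments are hypothetical, and the only intake behaviour assumed is that a catalog can be indexed by entry name and that each source has a `to_dask()` method.

```python
from typing import Any, Callable, Mapping, Sequence


class CombinedSource:
    """Hypothetical source: load several catalog entries and combine them lazily."""

    def __init__(
        self,
        catalog: Mapping[str, Any],
        entries: Sequence[str],
        combine: Callable[[list], Any],
        load: Callable[[Any], Any] = lambda source: source.to_dask(),
    ) -> None:
        self.catalog = catalog
        self.entries = list(entries)
        self.combine = combine
        self.load = load
        self._combined = None  # cache so the entries are only loaded once

    def read(self) -> Any:
        if self._combined is None:
            missing = [name for name in self.entries if name not in self.catalog]
            if missing:
                raise KeyError(f"entries not found in catalog: {missing}")
            parts = [self.load(self.catalog[name]) for name in self.entries]
            self._combined = self.combine(parts)
        return self._combined
```

Passing `combine=xr.combine_by_coords` (or whichever xarray combining function suits the data) would give the behaviour described above; a real implementation would presumably subclass an intake source so that it can itself be listed in a catalog.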

@rabernat
Author

@martindurant, I'm afraid I can't understand how to use AliasSource. Could you give an example?

@martindurant
Member

I'll try to knock something up for you, @rabernat. However, were you expecting a source which could:

  • make a combined xarray out of several that have already been defined in a catalog; perhaps all the entries in a catalog matching some name pattern or condition, or
  • something that can automatically combine all sources, say, within a thredds server matching a path or other condition
  • something else?
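For the first option, selecting entries by name pattern or condition could look roughly like this. `select_entries` is a hypothetical helper, and the entry names are made up for illustration; the matching itself uses the standard-library `fnmatch` module.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List, Optional


def select_entries(
    names: Iterable[str],
    pattern: str = "*",
    condition: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return the sorted entry names matching a glob pattern and an optional predicate."""
    return [
        name
        for name in sorted(names)
        if fnmatchcase(name, pattern) and (condition is None or condition(name))
    ]


# Hypothetical entry names standing in for the keys of a catalog.
names = ["sst_2018", "sst_2019", "wind_2019", "sst_climatology"]
recent_sst = select_entries(names, "sst_*", condition=lambda n: n[-4:].isdigit())
```

Since iterating over an intake catalog yields its entry names, `select_entries(list(cat), "sst_*")` should work on a real catalog, and the selected entries could then be opened and handed to a combining function.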

@martindurant
Member

Sorry this has slipped through the net.
Is there still a need here? I see that in some intake-related repos, there are already ways to combine xarray-compatible datasets, but maybe we still want something in intake-xarray itself.

@pbranson

This is of interest to me also.

We have several datasets consisting of hourly output of n-d gridded data in netCDF format (tens of thousands of files).

I have played with defining the urlpath with parameters, which works well to open one time-point. Using urlpath with a glob pattern also works, ultimately calling open_mfdataset to get all the metadata.

I have looked through the code of intake-esm to try and get a better understanding of how that works too.

The metadata (aside from the time coordinate) is consistent across the files, so theoretically I should be able to read the first file and infer from the filenames the complete stack. But I am failing to understand how to implement this in Intake and would appreciate some pointers.
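The filename-based inference could be sketched as follows. This assumes, purely for illustration, that each filename embeds a stamp like `20190115T03`; the regular expression, the helper names, and the example paths are all hypothetical. The point is that the time coordinate and the set of files needed for a given time slice can be worked out without opening any file beyond the first.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

# Hypothetical naming convention: <prefix>_<YYYYMMDD>T<HH>.nc
STAMP = re.compile(r"(\d{8}T\d{2})")


def time_from_path(path: str) -> datetime:
    """Parse the hourly timestamp embedded in a file name."""
    match = STAMP.search(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"no timestamp found in {path!r}")
    return datetime.strptime(match.group(1), "%Y%m%dT%H")


def build_time_index(paths: Iterable[str]) -> List[Tuple[datetime, str]]:
    """Return (time, path) pairs sorted by time."""
    return sorted((time_from_path(p), p) for p in paths)


def files_in_range(index: List[Tuple[datetime, str]], start: datetime, stop: datetime) -> List[str]:
    """Select only the files whose timestamp falls within [start, stop]."""
    return [path for when, path in index if start <= when <= stop]


paths = [
    "/data/model_20190115T01.nc",
    "/data/model_20190115T00.nc",
    "/data/model_20190116T00.nc",
]
index = build_time_index(paths)
subset = files_in_range(index, datetime(2019, 1, 15), datetime(2019, 1, 15, 23))
```

Only the files in `subset` would then need to be passed to `open_mfdataset` for a given selection, which avoids reading metadata from every file up front.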

From a user perspective the functionality I am going for is:

import intake

cat = intake.open_catalog('catalog.yaml')
ds = cat['dataset'].to_xarray()  # desired method; intake-xarray sources expose to_dask()
cropped = ds.sel(lon=slice(110, 113), lat=slice(-33, -30), time=slice('2019-01-15', '2019-02-15'))
cropped.to_netcdf(...)

Should I write a plugin using the DataSourceMixin? I'm trying to work out how the file paths are mapped into the metadata for the concatenated (time in my case) coordinate.

Thanks for any suggestions.

3 participants