how to template path expansion for opendap sources #63

Open
jhamman opened this issue Mar 26, 2020 · 13 comments

Comments

@jhamman

jhamman commented Mar 26, 2020

Suppose I have an OPeNDAP URL like this:

urlpath: http://thredds..../MET/{{ variable }}/{{ variable }}_{{ '%04d' % year }}.nc

and I want to template the expansion of the year variable from 1979..2020 so that all years are concatenated into a single dataset. How would I do this?

For reference, here is a working intake datasource:

  gridmet_opendap:
    description: 'GRIDMET data from OpeNDAP'
    args:
      urlpath: http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/{{ variable }}/{{ variable }}_{{ '%04d' % year }}.nc
      auth: urs
      chunks:
        lat: 585
        lon: 1386
    driver: opendap
    parameters:
      variable:
        description: climate variable
        type: str
        default: pr
        allowed: ["pr", "tmmn", "tmmx"]
      year:
        description: year
        type: int
        default: 2000
@martindurant
Member

I assume you need to explicitly declare your pattern (path_as_pattern?), but I am not sure of the syntax.
@jsignell ?

@jsignell
Member

I think you just need to get rid of this chunk:

      year:
        description: year
        type: int
        default: 2000

@jhamman
Author

jhamman commented Mar 30, 2020

I don't think that will work:

UndefinedError: 'year' is undefined

How would intake-xarray know to expand year without some configuration? OPeNDAP doesn't support globbing, so unless we have some way to tell it what the list of URLs should be, I expect we will end up with messages like:

HTTPError: 404 Not Found
Error {
    code = 404;
    message = "MET/pr/pr_*.nc";
};

@martindurant
Member

The whole point of path_as_pattern is to do globs and use the results to fill out the variables. The opendap source doesn't handle multiple target URLs at all.
So in this case, you want the number 2000 to appear in the URL and to have it also be a length-one coordinate; all so that then you can do the concat in your own code?

@jhamman
Author

jhamman commented Mar 30, 2020

I'm trying to avoid this pattern:

years = range(1979, 2021)  # range() excludes its stop value, so this covers 1979..2020
ds = xr.concat([cat.climate.gridmet_opendap(year=year).to_dask() for year in years], dim='day')

If this were a filesystem, I could have intake concat the individual years together using a simple glob pattern.

@martindurant
Member

Your solution doesn't look too bad, though ;)
Given the lack of glob for opendap, you could write this into the driver, or write a new driver specifically for the list-of-urls case. You could only coerce the existing path_as_pattern code if you could find a way to glob the URLs.

@jsignell
Member

Ah I see. Well you could pass a list of urls and a path_as_pattern.
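
To illustrate what pattern matching over an explicit list of URLs would have to do, here is a minimal sketch that recovers `variable` and `year` from each URL with a regular expression. It is only a stand-in for the idea: `fields_from_url` and `PATTERN` are hypothetical names, and this is not the code path Intake's path_as_pattern actually uses.

```python
import re

# Hypothetical pattern mirroring the URL layout from the catalog entry above:
# .../MET/<variable>/<variable>_<4-digit year>.nc
PATTERN = re.compile(r"/MET/(?P<variable>\w+)/(?P=variable)_(?P<year>\d{4})\.nc$")


def fields_from_url(url):
    """Extract the templated fields from a single URL."""
    match = PATTERN.search(url)
    if match is None:
        raise ValueError(f"URL does not match the expected pattern: {url}")
    return {"variable": match["variable"], "year": int(match["year"])}


BASE = "http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET"
urls = [f"{BASE}/pr/pr_{year:04d}.nc" for year in (1999, 2000)]
fields = [fields_from_url(url) for url in urls]
```

Each entry in `fields` could then label the dataset opened from the matching URL before concatenation.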

@jhamman
Author

jhamman commented Mar 31, 2020

As I've thought about this more, I've realized what I'm really after is a new feature in intake that would allow me to specify the range of a parameter:

      year:
        description: year
        type: int
        range:
          min: 1979
          max: 2020

When the range key is present, intake would essentially construct a list of urls. Thoughts on how this may work out?

@jsignell
Member

I see what you mean, but I think that would have limited utility and would introduce complexity into an already pretty brittle system. For instance, say you have a couple of these things (for month and year): what happens when certain combinations don't exist (no year: 2020, month: 12)?

Side note: I would think range would indicate the allowable min-max not fill in all the values between.
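
A small sketch of the combination problem: expanding two ranges gives their full cross product, so something has to drop the pairs that have no data. The cutoff used here (data ending in March 2020) is purely an assumed value for illustration, and `expand_pairs` is a hypothetical helper.

```python
from itertools import product


def expand_pairs(years, months, last_available):
    """Return (year, month) pairs up to and including last_available."""
    # Tuples compare element by element, so (2020, 4) > (2020, 3) is filtered out.
    return [(y, m) for y, m in product(years, months) if (y, m) <= last_available]


# Assumed cutoff for illustration only: data exist through March 2020.
pairs = expand_pairs(range(2019, 2021), range(1, 13), last_available=(2020, 3))
```

A plain min/max range per parameter cannot express this kind of cutoff, which is the complexity being pointed out.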

@jbednar
Contributor

jbednar commented Mar 31, 2020

> Side note: I would think range would indicate the allowable min-max not fill in all the values between.

Right -- range doesn't indicate how to fill in; the data may be from every year (a good default!), but e.g. Census data would only be every 10 years, so filling in every integer value is only a default, not a full solution...

@martindurant
Member

martindurant commented Mar 31, 2020

> I would think range would indicate the allowable min-max not fill in all the values between.

That is exactly what it means at the moment. I might imagine more complex kinds of parameter expansion through the Intake parameter system, but it would need to be pretty sophisticated. Right now, it replaces one value in a string or gives a complete replacement value (where the type is not string). A new block for producing the list of strings might look like:

    args:
      auth: urs
      chunks:
        lat: 585
        lon: 1386
    driver: opendap
    parameters:
      variable:
        description: climate variable
        type: str
        default: pr
        allowed: ["pr", "tmmn", "tmmx"]
      urlpath: 
        template: "http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/{{ variable }}/{{ variable }}_{{ '%04d' % year }}.nc"
        description: year
        type: expand_range
        range:
          min: 1979
          max: 2020
          step: 1

(where the templating would need to be recursive, because variable must be substituted into the string before the string->list expansion is done)
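
The two-stage substitution described above can be sketched with the standard library. This is only an illustration under stated assumptions: `expand_range` is a hypothetical helper, not an existing Intake feature, and `string.Template` placeholders such as `${year}` stand in for the Jinja syntax used in the catalog.

```python
from string import Template


def expand_range(template, name, spec, **params):
    """Return one string per value in the inclusive range described by spec."""
    # Stage 1: fill in ordinary parameters; safe_substitute leaves ${name} untouched.
    partial = Template(Template(template).safe_substitute(**params))
    values = range(spec["min"], spec["max"] + 1, spec.get("step", 1))
    # Stage 2: string -> list expansion, zero-padded like '%04d' in the catalog.
    return [partial.substitute({name: f"{value:04d}"}) for value in values]


TEMPLATE = (
    "http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/"
    "${variable}/${variable}_${year}.nc"
)
urls = expand_range(
    TEMPLATE, "year", {"min": 1979, "max": 2020, "step": 1}, variable="pr"
)
```

The resulting list holds one URL per year; the opendap source would still need to accept multiple URLs for this to be usable.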

@jsignell
Member

jsignell commented Apr 1, 2020

What we really want is for list comprehensions to be allowed in catalogs, right? Perhaps this is just a case where YAML isn't the right format.

@aaronspring
Collaborator

If you get a catalog.xml for that thredds server, you can now use intake-thredds https://intake-thredds.readthedocs.io/en/latest/tutorial.html#loading-a-catalog
