
A cf-xarray compliance checker? #366

Open
kthyng opened this issue Sep 30, 2022 · 9 comments

Comments

@kthyng
Contributor

kthyng commented Sep 30, 2022

Would something like this be in scope for cf-xarray? It would need to be fairly loosely defined, but maybe a minimum would be that a Dataset has all its axes and coordinates defined? Variables would need standard_names? Though some variables, like "angle" on a ROMS grid, don't usually have standard names.
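A minimal sketch of that bare minimum, written against plain {name: attrs} dicts so it runs without xarray (with xarray you could pass {k: v.attrs for k, v in ds.coords.items()} and the same for ds.data_vars). The function name and the exemption set are hypothetical, and the sketch only looks at the axis and standard_name attributes, whereas cf-xarray's own coordinate detection also considers other attributes such as units.

```python
def minimal_cf_issues(coords, data_vars, exempt=frozenset()):
    """Return human-readable problems that would stop basic cf-xarray use.

    coords, data_vars: dicts mapping variable name -> attribute dict.
    exempt: data variables not expected to have a standard_name
        (e.g. "angle" on a ROMS grid). All names here are hypothetical.
    """
    issues = []

    # Axes: at least one coordinate should declare axis = X, Y, Z or T.
    declared = {attrs.get("axis") for attrs in coords.values()}
    if not (declared & {"X", "Y", "Z", "T"}):
        issues.append("no coordinate declares an 'axis' attribute")

    # Coordinates: at least one should be identifiable by its standard_name.
    if not any(attrs.get("standard_name") for attrs in coords.values()):
        issues.append("no coordinate has a 'standard_name' attribute")

    # Data variables: each non-exempt one should carry a standard_name.
    for name, attrs in data_vars.items():
        if name not in exempt and not attrs.get("standard_name"):
            issues.append(f"data variable {name!r} has no standard_name")

    return issues
```

The exemption set keeps the check loose, as suggested: variables like "angle" are skipped rather than reported.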

@dcherian
Contributor

dcherian commented Oct 3, 2022

A number of these exist:

So I don't think we should reinvent it. It would be nice if we could run the checker on a Dataset using ds.cf.check(checker="ioos"), for example.

cc @ocefpaf

@malmans2
Member

malmans2 commented Oct 3, 2022

For another project I was looking at CF checkers last week, and it looks like the available options are mostly command-line tools meant to check netCDF files.

It would be great if cf-xarray allowed checking any format supported by xarray, as well as datasets that have not been written to disk. I also think it would be great to use the other checkers in the backend, but it looks like changes are needed in compliance-checker and cf-checker before that is possible (i.e., the checkers only accept paths right now; they would have to accept xarray datasets as well).

@dcherian
Contributor

dcherian commented Oct 3, 2022

It'd be nice to build an API connection, but worst case we can write a tiny dataset with all the attributes to /tmp/check.nc, run the checker on that, and print the output to the screen.
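A rough sketch of that worst-case route, assuming the IOOS compliance-checker command-line tool is installed and on PATH and that it takes a --test option naming the check suite (e.g. cf). The function names are hypothetical; the dataset only needs xarray's sizes, isel and to_netcdf, and a temporary directory stands in for /tmp/check.nc. Keeping two elements per dimension is an assumption about what "tiny" means.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def checker_command(path, test="cf"):
    """Build a compliance-checker command line for one file (assumed CLI)."""
    return ["compliance-checker", f"--test={test}", str(path)]


def check_in_memory_dataset(ds, test="cf"):
    """Write a tiny slice of an xarray Dataset to a temp file and check it.

    Only the first two elements along each dimension are kept, so the file
    is small but still carries every variable and attribute. Checks that
    depend on the data values may behave differently on such a slice.
    """
    tiny = ds.isel({dim: slice(0, 2) for dim in ds.sizes})
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "check.nc"
        tiny.to_netcdf(path)
        result = subprocess.run(
            checker_command(path, test), capture_output=True, text=True
        )
    # Print the checker's report to the screen.
    print(result.stdout)
    if result.stderr:
        print(result.stderr, file=sys.stderr)
    return result.returncode
```

Building the command separately keeps the part that depends on an external tool small and easy to swap for another checker.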

@ocefpaf
Contributor

ocefpaf commented Oct 3, 2022

I have mixed feelings. While I don't want to overload cf-xarray with functionality that exists elsewhere, this could be a nice idea b/c:

  1. what @malmans2 said above
  2. compliance-checker is super verbose and sometimes you don't want a full CF check, just a bare-bones "what is missing so I can plot this automatically, or load this data into analysis X." In a way, iris used to be like that but has become more and more restrictive over time.

I guess that, instead of becoming a compliance checker, cf-xarray could have a "verbose mode" where all the compliance issues are printed when loading a dataset.
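One way such a verbose mode could surface issues is through Python's warnings machinery. This is a minimal sketch, assuming the issues have already been collected as strings; CFComplianceWarning and report_cf_issues are hypothetical names, not part of cf-xarray.

```python
import warnings


class CFComplianceWarning(UserWarning):
    """Hypothetical warning category for CF issues found at load time."""


def report_cf_issues(issues, verbose=True):
    """Emit one CFComplianceWarning per issue when verbose mode is on.

    A dedicated category lets users silence these messages with
    warnings.simplefilter("ignore", CFComplianceWarning) or turn them
    into errors with warnings.simplefilter("error", CFComplianceWarning).
    """
    if not verbose:
        return
    for issue in issues:
        warnings.warn(issue, CFComplianceWarning, stacklevel=2)
```

Using warnings rather than print keeps the default quiet-or-loud decision with the user's warning filters instead of a new global setting.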

@dcherian
Contributor

dcherian commented Oct 3, 2022

"what is missing so I can plot this automatically, or load this data into analysis X."

This is hard to define!

@ocefpaf
Contributor

ocefpaf commented Oct 3, 2022

> This is hard to define!

Indeed! That is why cc (compliance-checker) is super verbose, kind of all or nothing. However, @kthyng's suggestion above looks like a nice start:

  1. axes and coordinates
  2. valid standard_names
  3. enough variables defined to compute, say, z

Beyond that we would get into the weeds of CF, but those 3 checks would cover almost all plotting with labels.
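For the "valid standard_names" item, a check could compare each name against the CF standard name table, which CF publishes as XML. A minimal sketch, assuming the table lists names as <entry id="..."> elements and renamed names as <alias id="..."> elements with an <entry_id> child; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_standard_names(xml_text):
    """Return (names, aliases) from CF standard name table XML.

    Assumed layout: <entry id="name"> for each standard name and
    <alias id="old_name"><entry_id>new_name</entry_id></alias> for renames.
    """
    root = ET.fromstring(xml_text)
    names = {entry.get("id") for entry in root.iter("entry")}
    aliases = {
        alias.get("id"): alias.findtext("entry_id") for alias in root.iter("alias")
    }
    return names, aliases


def invalid_standard_names(var_attrs, names, aliases=None):
    """Map variable name -> problem for standard_names not found in the table."""
    aliases = aliases or {}
    problems = {}
    for var, attrs in var_attrs.items():
        # Missing standard_names are a separate check, so skip them here.
        parts = str(attrs.get("standard_name") or "").split()
        if not parts:
            continue
        # CF allows an optional modifier after the name ("... standard_error").
        base = parts[0]
        if base in names:
            continue
        if base in aliases:
            problems[var] = f"{base!r} is an alias of {aliases[base]!r}"
        else:
            problems[var] = f"{base!r} is not in the standard name table"
    return problems
```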

@kthyng
Contributor Author

kthyng commented Oct 7, 2022

I wrote some tests for a package: https://github.com/NOAA-ORR-ERD/model_catalogs/blob/main/model_catalogs/tests/test_catalogs.py#L326-L369

When the models are read in with the package, they should be usable by cf-xarray in a basic way. I am finding I need this functionality again, which is why I thought it could be useful in cf-xarray itself. It could warn a user if no axes or coordinates are known for a Dataset/DataArray, and list which data_vars do not have standard_names. I also like the check @ocefpaf suggested for being able to calculate z.

@dcherian
Contributor

dcherian commented Jun 5, 2024

NASA-specific compliance checker: https://github.com/eugenegesdisc/diwg-data-compliance-test

@DWesl

DWesl commented Jun 25, 2024

> This is hard to define!
>
> Indeed! That is why cc is super verbose, kind of all or nothing. However, @kthyng's suggestion above looks like a nice start:
>
>   1. axes and coordinates
>   2. valid standard_names

I'd suggest allowing long_name as an alternative for variables that aren't in the standard name table yet. You could add a warning pointing to the forum for proposing new standard names if you want to discourage long_name without standard_name.
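The long_name fallback could be expressed as a two-level result: a warning when only long_name is present, an error when neither is. A minimal sketch on {name: attrs} dicts; label_issues and the hint text are hypothetical.

```python
FORUM_HINT = "consider proposing a new standard name to the CF community"


def label_issues(data_vars):
    """Return (level, message) pairs for data variables lacking a standard_name.

    data_vars: dict mapping variable name -> attribute dict.
    A long_name alone is accepted with a warning; nothing at all is an error.
    """
    results = []
    for name, attrs in data_vars.items():
        if attrs.get("standard_name"):
            continue
        if attrs.get("long_name"):
            results.append(
                ("warning", f"{name!r} has only a long_name; {FORUM_HINT}")
            )
        else:
            results.append(
                ("error", f"{name!r} has neither standard_name nor long_name")
            )
    return results
```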

>   3. enough variables defined to compute say z for example

Everything mentioned in formula_terms or similar, at a guess? Or do you want enough information to convert from the model vertical coordinate to a geometric vertical coordinate?
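The first option ("everything mentioned in formula_terms") is straightforward to check. A minimal sketch, assuming formula_terms holds whitespace-separated "term: variable" pairs as in the CF examples (a ROMS-style s-coordinate is used in the usage example below); the function names are hypothetical and this does not perform the conversion to a geometric vertical coordinate.

```python
def parse_formula_terms(text):
    """Parse a CF formula_terms string such as "s: s_rho C: Cs_r depth: h".

    Assumes whitespace-separated "term: variable" pairs.
    """
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError(f"odd number of tokens in formula_terms: {text!r}")
    terms = {}
    for term, var in zip(tokens[::2], tokens[1::2]):
        if not term.endswith(":"):
            raise ValueError(f"expected 'term:' but got {term!r}")
        terms[term[:-1]] = var
    return terms


def missing_formula_variables(variables, coord_name):
    """Return {term: variable} for formula_terms entries absent from variables.

    variables: dict mapping variable name -> attribute dict (a stand-in
    for the variables of a Dataset).
    """
    formula = variables[coord_name].get("formula_terms", "")
    terms = parse_formula_terms(formula)
    return {term: var for term, var in terms.items() if var not in variables}
```

For example, a dataset with s_rho carrying formula_terms "s: s_rho C: Cs_r eta: zeta depth: h depth_c: hc" but no zeta variable would be reported as missing {"eta": "zeta"}.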

> More than that we would get into the weeds of CF but those 3 lines ensure almost all of plotting with labels.

I'd suggest a fourth check for units: it's possible to guess them from values, but I like having them explicit.
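A units check in the same spirit could simply report variables with a missing or blank units attribute, with an explicit allow-list for variables where units are not expected. A minimal sketch; units_issues and the allow-list parameter are hypothetical, and it does not try to validate the units string itself.

```python
def units_issues(variables, no_units_ok=frozenset()):
    """Return names of variables whose units attribute is missing or blank.

    variables: dict mapping variable name -> attribute dict.
    no_units_ok: variables for which a missing units attribute is acceptable.
    """
    missing = []
    for name, attrs in variables.items():
        if name in no_units_ok:
            continue
        units = attrs.get("units")
        if units is None or not str(units).strip():
            missing.append(name)
    return missing
```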


5 participants