Add metadata and description of variables to output files #401
Comments
Entirely agreed. This is a key part of following the FAIR principles (https://www.go-fair.org/fair-principles/). #354 doesn't necessarily get us all the way there; our feature detection output will still be a pandas DataFrame for now, which has frustratingly limited metadata options. We have a couple of options for resolving that. We could simply output xarray if users input xarray rather than iris data. The issue there is that our users likely don't have a workflow set up for that xarray output (but they would have to opt into using xarray by changing their workflow anyway). We could also make it an option, and decide down the road whether to disable pandas output or make it non-default. After #354, but before 1.6.0 is released, I think we should make sure that we have an xarray output option with the appropriate metadata. Perhaps that would be a good topic for the tobathon next week. How we implement it (default or an option) would be a good discussion; I think there are reasonable points on both sides. Longer-term, we should have options (I think there's another issue for this) to output or combine everything into a single file, although that gets challenging given how large segmentation output can get.
I think outputting xarray is the way to go because, as you say, with the xarray transition users have to change their workflow anyway. And yes, it is frustrating that pandas DataFrames have such limited options for metadata. A question that I think we have not discussed extensively is whether we only want to switch from iris to xarray, or also replace all pandas DataFrame operations internally. Pandas DataFrames still have some very useful functionality, so maybe it would make sense to output even the features as xarray but keep pandas internally? I am not sure about this.
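To make the "pandas internally, xarray on output" idea concrete, here is a minimal sketch of what the conversion step could look like. It is only an illustration under assumptions: the helper `features_to_xarray`, the `VARIABLE_METADATA` dictionary, the column names, and the description strings are all hypothetical and are not tobac's actual API or variable definitions.

```python
import pandas as pd
import xarray as xr

# Hypothetical per-variable descriptions (assumption: names and wording are
# placeholders for illustration, not tobac's real definitions).
VARIABLE_METADATA = {
    "frame": {"long_name": "index of the time step in the input data"},
    "hdim_1": {"long_name": "feature position along the first horizontal dimension"},
    "threshold_value": {"long_name": "threshold at which the feature was detected"},
}


def features_to_xarray(features: pd.DataFrame, global_attrs: dict) -> xr.Dataset:
    """Convert a feature DataFrame to an xarray Dataset with metadata attached.

    Hypothetical helper: the DataFrame stays the internal representation and
    is only converted at output time.
    """
    # Each column becomes a data variable along the DataFrame's index dimension.
    ds = xr.Dataset.from_dataframe(features)
    # Attach a description to every variable we have metadata for.
    for name, attrs in VARIABLE_METADATA.items():
        if name in ds:
            ds[name].attrs.update(attrs)
    # File-level metadata goes into the Dataset's global attributes.
    ds.attrs.update(global_attrs)
    return ds


# Small made-up feature table, standing in for feature detection output.
features = pd.DataFrame(
    {
        "frame": [0, 0, 1],
        "hdim_1": [10.5, 42.0, 11.2],
        "threshold_value": [1.0, 2.0, 1.0],
    }
)
ds = features_to_xarray(features, {"title": "example feature output"})
```

With something like this, the variable descriptions would travel with the data when the Dataset is written to a file, instead of living only in the documentation. Whether this should be the default output or an opt-in option is the open question from the discussion above.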
Good idea, I also thought that this is something we could take up at the tobathon since it would be useful to get input from users who are not currently developers.
Do you mean something like our |
As part of the xarray transition, we should add some metadata and description of variables to the output files that are created with tobac. Part of it can be left to the user (e.g. the user-specific bulk statistics), but for projects like MCSMIP where tobac data is shared and published, it would be helpful to open the files and see what our definitions of variables are (e.g., what we currently only have listed here).