[Project Proposal]: Various IOOS Compliance Checker topics #40

Open
jcermauwedu opened this issue Mar 19, 2024 · 22 comments
Labels
2024 (Topic to be executed during 2024 event)
code sprint topic (Proposed topic for a code sprint activity)

Comments

@jcermauwedu

jcermauwedu commented Mar 19, 2024

Project Description

At the IOOS DMAC meeting, there was general agreement that work could be put into the IOOS Compliance Checker. Other IOOS toolsets may also benefit from related work.

General topics:

  • Tackling current GitHub issues & pull requests
  • Standards
  • Test Suites
  • Community Engagement
  • Features

Standards

  • Consideration and addition of a plug-in for OG-1.0 data format
  • CF support and UGRID

Test Suite

Solicit participation in creating example datasets that apply the OG-1.0 data format, based on the published format document as it currently stands.

The example datasets will also need to be assessed for interoperability issues with CF, ACDD and NCEI.

A personal goal for this project is to continue work on acoustic datasets, with a focus on the OG-1 data format, and to resolve existing issues or open new ones for the IOOS Compliance Checker.

As time permits, examine how adopting the OG-1.0 data format affects glider processing packages.

Community Engagement

GOAL: Increase community involvement in this and other IOOS toolsets.

NOTE: These topics could also serve as templates to other IOOS toolsets.

  • On-boarding new community members
  • Issue submission templates (e.g. bug reporting)

Features

  • Testing using DOI references?
  • Catalog testing?
  • Allow testing to work through an API?

Expected Outcomes

Community Engagement

  • Review successful GitHub communities
  • Create some lightweight issue templates

Code and Documentation

  • Finish cleaning up the pytest suites for the IOOS Compliance Checker and pocean-core.
  • Rough in a new plugin for the OG-1.0 data format for the IOOS Compliance Checker.
  • Participants will learn how to run tests, run groups of tests, troubleshoot tests, and run individual tests.
  • Expand documentation for IOOS toolsets.

Standards

OG-1.0 Examples

  • Generate as many unique example datasets as needed to exercise the OG-1.0 data format, working toward a set of gold-standard examples to guide and test the future OG-1.0 compliance tester.
  • Review the IOOS Compliance Checker and its plugins, and at least brainstorm design elements for moving the toolset forward with a potential OG-1.0 plugin. Attempt to close out current issues and others as necessary.
  • Discuss other impacts of the OG-1.0 data format on existing toolsets used by the community, e.g. the IOOS Compliance Checker, pocean-core, and ERDDAP.
  • Report back to OGC with any recommendations on the OG-1.0 data format. Share any additional recommendations back to respective toolset developers.

Skills required

It would be useful to have a working knowledge of Python and familiarity with the netCDF4, xarray, and pytest packages.

Expertise

Novice

Topic Lead(s)

@jcermauwedu

It would be great to have co-leaders to share experiences with related issues.

Relevant links

Discussion and Issues

OceanGlidersCommunity/OG-format-user-manual#165
OceanGlidersCommunity/OG-format-user-manual#92
OceanGlidersCommunity/OG-format-user-manual#172

Software

https://docs.python.org/3/
https://docs.xarray.dev/en/stable/
https://unidata.github.io/netcdf4-python/
https://docs.pytest.org/en/8.0.x/
https://github.com/ioos/compliance-checker
https://github.com/pyoceans/pocean-core
https://github.com/ERDDAP/erddap

Example datasets and templates

https://www.ncei.noaa.gov/netcdf-templates
https://github.com/ERDDAP/erddapTest
http://test.opendap.org/

@jcermauwedu jcermauwedu added the code sprint topic Proposed topic for a code sprint activity label Mar 19, 2024
@callumrollo
Contributor

I'd be interested in contributing to this alongside #34

@mwengren mwengren added the 2024 Topic to be executed during 2024 event label Mar 29, 2024
@benjwadams

I'm a maintainer on IOOS Compliance Checker and would be happy to help with this.

@ChrisBarker-NOAA

Perhaps one focused sub-topic could be the NOS OFSs (Operational Forecast Systems), which, as a rule, are not CF-compliant:

  • Run them through the compliance checker(s)
  • Check them by hand
  • Document the ways in which they are not compliant, and what needs to be done to bring them into compliance.

@ChrisBarker-NOAA

UGRID was recently added to CF, as of version 1.11.

I don't think the compliance checker(s) have kept up. There are a couple out there for UGRID, but I'm not sure of the status:

https://github.com/pp-mo/ugrid-checks

https://github.com/ioos/cc-plugin-ugrid

@jcermauwedu jcermauwedu changed the title [Project Proposal]: Generate dataset examples for OG-1 data format and examine potential impacts on the IOOS compliance checker [Project Proposal]: Various IOOS Compliance Checker topics Apr 4, 2024
@jcermauwedu
Author

Thank you for all the feedback. I will continue to iterate on this proposal as additional feedback rolls in.

@ocefpaf
Member

ocefpaf commented Apr 26, 2024

@jcermauwedu I'd like to experiment with the OG standard as a compliance-checker plugin to check its feasibility during the code sprint. If we can leverage the existing cc-glider-plugin, that may help; if not, maybe we can cross this idea out and move on to the next one.

@jcermauwedu
Author

I think it should be fairly straightforward to clone the glider plugin to create a cc-og-plugin.

@jcermauwedu
Author

@callumrollo @ocefpaf As was mentioned today, there are three areas of interest: (1) OG plugin, (2) CF, (3) improving community engagement. We can bootstrap the OG plugin with one or more tests and prepare it for the eventual release of information later in June 2024. There is plenty to do.

@MathewBiddle
Contributor

Thank you for taking the time to propose this topic! Based on the Code Sprint topic survey, this topic has garnered a lot of interest.

Following the contributing guidelines on selecting a code sprint topic I have assigned this topic to @jcermauwedu . Unless indicated otherwise, the assignee will be responsible for identifying a plan for the code sprint topic, establishing a team, and taking the lead on executing said plan. The first action for the lead is to:

@jcermauwedu
Author

@MathewBiddle The code of conduct link on Contributing: Ground Rules gives a 404.

@MathewBiddle
Contributor

Webpage https://ioos.github.io/ioos-code-sprint/2024/topics/02-compliance-checker-topics.html

Thanks for the heads up on the Code of Conduct. We are discussing what an organization wide one should be in this issue ioos/.github#10

@jcermauwedu
Author

I think it should be fairly straightforward to clone the glider plugin to create a cc-og-plugin.

I used some boilerplate framework from the glider and UGRID plugins to create the OG plugin. @ocefpaf: Would you create a new IOOS repo, cc-plugin-og, with the Apache 2 license? I will copy the boilerplate code over to it. REF: https://github.com/uw-farlab/cc-plugin-og

The basic operation seems functional. It just needs to be populated with proper content.

$ compliance-checker -l
IOOS compliance checker available checker suites:
 - OG:1.0
 - UGRID:2.0
 - acdd:1.1
 - acdd:1.3
 - cf:1.6
...

$ compliance-checker -t OG -D
====
 OG 
====
- check_basic_requirements

  Check basic OG stated conventions.
   * Format follows the CF 1.8 convention.
   * Format follows the ACDD 1.3 convention.
   * Variables are identified in capital letters.
   * Attributes are identified in lower case.

$ pytest
=========================================================== test session starts ============================================================
platform linux -- Python 3.11.9, pytest-8.2.0, pluggy-1.5.0
rootdir: /home/portal/src/cc-plugin-og
configfile: pyproject.toml
plugins: flake8-1.1.1, requests-mock-1.12.1, time-machine-2.14.1
collected 1 item                                                                                                                           

tests/test_basicchecks.py .                                                                                                          [100%]

============================================================ 1 passed in 0.16s =============================================================
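For the "Format follows the CF 1.8 / ACDD 1.3 convention" items in check_basic_requirements, one way the convention part could be checked is by reading the global Conventions attribute. This is a minimal, standard-library-only sketch, not the plugin's actual code; the helper name missing_conventions and the rule that entries are separated by commas and/or whitespace are assumptions.

```python
import re


def missing_conventions(conventions: str, required: list[str]) -> list[str]:
    """Return the entries of `required` not named in a Conventions string.

    Hypothetical helper. ASSUMPTION: entries are separated by commas and/or
    whitespace; comparison is case-insensitive ("cf-1.8" matches "CF-1.8").
    """
    stated = {tok.lower() for tok in re.split(r"[,\s]+", conventions) if tok}
    # Keep the caller's spelling in the result so messages read naturally.
    return [name for name in required if name.lower() not in stated]
```

For example, missing_conventions("CF-1.8, ACDD-1.3", ["CF-1.8", "ACDD-1.3"]) returns an empty list, while a file declaring only "CF-1.0" would be reported as missing both.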

@ocefpaf
Member

ocefpaf commented May 8, 2024

@ocefpaf: Would you create a new IOOS repo, cc-plugin-og, with the Apache 2 license? I will copy the boilerplate code over to it. REF: https://github.com/uw-farlab/cc-plugin-og

I don't have admin privileges to create repos. While I do believe we should move it to IOOS at some point, it is nice to keep it under your account, where you (we?) have more control. When the project is more mature, we can move it to IOOS. What do you think @MathewBiddle?

@MathewBiddle
Contributor

I agree with @ocefpaf. Once ready, feel free to submit a "New IOOS Repository Request" using the issue form linked at
https://github.com/ioos/governance/issues/new/choose

@jcermauwedu
Author

Perhaps one focused sub-topic could be the NOS OFSs (Operational Forecast Systems), which, as a rule, are not CF-compliant:

  • Run them through the compliance checker(s)
  • Check them by hand
  • Document the ways in which they are not compliant, and what needs to be done to bring them into compliance.

@ChrisBarker-NOAA @dpsnowden Is there a URL from which to source some of these datasets for checking? Where should the feedback go?

@ChrisBarker-NOAA

ChrisBarker-NOAA commented May 21, 2024

@jcermauwedu: Yes please!

The OFSs are all served up via TDS servers and also on AWS.

The AWS ones are here:

https://noaa-nos-ofs-pds.s3.amazonaws.com/index.html

It would be nice to have a complete list, maybe a utility to download a set, or ...

Where might that go?

In the compliance checker repo?

Maybe a new repo specifically for OFS compliance?
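As a starting point for a download utility, the S3 object URLs could be generated rather than hunted for by hand. This is a minimal sketch; the path layout is an assumption copied from the single SFBOFS filename used in the wget command below (model/netcdf/YYYYMM/nos.model.fields.nHHH.YYYYMMDD.tCCz.nc), and it may not hold for every model or file type. The function names are hypothetical.

```python
import os
from datetime import date
from urllib.request import urlretrieve

# Public bucket mentioned above.
BUCKET_URL = "https://noaa-nos-ofs-pds.s3.amazonaws.com"


def ofs_fields_url(model: str, run_date: date, cycle: int, hour: int) -> str:
    """Build the URL of one OFS "fields" file.

    ASSUMPTION: layout inferred from one SFBOFS example; `hour` fills the
    nHHH token and `cycle` fills the tCCz token.
    """
    model = model.lower()
    name = f"nos.{model}.fields.n{hour:03d}.{run_date:%Y%m%d}.t{cycle:02d}z.nc"
    return f"{BUCKET_URL}/{model}/netcdf/{run_date:%Y%m}/{name}"


def ofs_fields_urls(model: str, run_date: date, cycles, hours) -> list[str]:
    """URLs for every (cycle, hour) pair; no check is made that they exist."""
    return [ofs_fields_url(model, run_date, c, h) for c in cycles for h in hours]


def download(urls, dest_dir: str = ".") -> None:
    """Fetch each URL into dest_dir (requires network access)."""
    for url in urls:
        urlretrieve(url, os.path.join(dest_dir, url.rsplit("/", 1)[-1]))
```

ofs_fields_url("sfbofs", date(2024, 5, 22), cycle=3, hour=6) reproduces the SFBOFS URL used with wget below; which cycles and hours actually exist for each OFS would still need to be confirmed against the bucket.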

@jcermauwedu
Author

jcermauwedu commented May 22, 2024

It was quite a hunting expedition to find an unstructured grid example; there is a ton of data out there. I finally located one.

$ wget https://noaa-nos-ofs-pds.s3.amazonaws.com/sfbofs/netcdf/202405/nos.sfbofs.fields.n006.20240522.t03z.nc

$ ugrid-checker nos.sfbofs.fields.n006.20240522.t03z.nc 

UGRID conformance checks complete.

List of checker messages :
  *** FAIL R502 : Mesh data variable "u" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "u" has dimensions ('time', 'siglay', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "v" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "v" has dimensions ('time', 'siglay', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "tauc" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "tauc" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "temp" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "temp" has dimensions ('time', 'siglay', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "salinity" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "salinity" has dimensions ('time', 'siglay', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "short_wave" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "short_wave" has dimensions ('time', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "net_heat_flux" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "net_heat_flux" has dimensions ('time', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "uwind_speed" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "uwind_speed" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "vwind_speed" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "vwind_speed" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "wet_nodes" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "wet_nodes" has dimensions ('time', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "wet_cells" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "wet_cells" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "wet_nodes_prev_int" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "wet_nodes_prev_int" has dimensions ('time', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "wet_cells_prev_int" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "wet_cells_prev_int" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  ... WARN A903 : dataset has Conventions="CF-1.0", which does not contain a UGRID convention statement of the form "UGRID-<major>.<minor>".

Total of 27 problems logged :
  26 Rxxx requirement failures
  1 Axxx advisory recommendation warnings

Done.

Error codes and conformance documentation for the ugrid-checks code: https://ugrid-conventions.github.io/ugrid-conventions/conformance/

REF: https://github.com/pp-mo/ugrid-checks

Compliance Checker UGRID:2.0 response:

$ compliance-checker -t UGRID:2.0 nos.sfbofs.fields.n006.20240522.t03z.nc 
Running Compliance Checker on the datasets from: ['nos.sfbofs.fields.n006.20240522.t03z.nc']


--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                          Version 5.1.2.dev30+gf543e4f                          
                     Report generated 2024-05-22T21:25:29Z                      
                                   UGRID:2.0                                    
                    https://github.com/ioos/cc-plugin-ugrid                     
--------------------------------------------------------------------------------
                               Corrective Actions                               
nos.sfbofs.fields.n006.20240522.t03z.nc has 1 potential issue


                               Highly Recommended                               
--------------------------------------------------------------------------------
Run UGRID checks if mesh variables are present in the data
* No mesh variables are detected in the data; all checks fail.
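Both reports appear to point at the same gap: the data variables carry mesh="fvcom_mesh", but the file has no variable by that name, so there is no UGRID mesh topology variable (one with cf_role = "mesh_topology") for either checker to find. A minimal sketch of that cross-reference check follows. It works on a plain dict of variable name to attributes, which is an assumed representation (in practice the attributes would be read with netCDF4 or xarray), and it covers only the R502-style reference, not the R509 dimension check.

```python
def mesh_reference_problems(variables: dict) -> list[str]:
    """Report data variables whose "mesh" attribute does not name a UGRID
    mesh topology variable (a variable with cf_role = "mesh_topology").

    `variables` maps variable name -> dict of that variable's attributes.
    """
    problems = []
    for name, attrs in variables.items():
        mesh = attrs.get("mesh")
        if mesh is None:
            continue  # not a mesh data variable
        target = variables.get(mesh)
        if target is None:
            problems.append(f'"{name}" has mesh="{mesh}", which is not a variable in the dataset.')
        elif target.get("cf_role") != "mesh_topology":
            problems.append(f'"{name}" has mesh="{mesh}", which lacks cf_role="mesh_topology".')
    return problems


def find_mesh_topologies(variables: dict) -> list[str]:
    """Names of variables that declare themselves as mesh topologies."""
    return [n for n, a in variables.items() if a.get("cf_role") == "mesh_topology"]
```

Adding a variable named fvcom_mesh with cf_role = "mesh_topology" (plus the connectivity and coordinate attributes UGRID requires) is what would clear the reference failures; the dimension failures would need separate attention.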

@ChrisBarker-NOAA

and lots of problems with it ;-)

If you want some smaller examples, you can use the OFS Subsetter:

SFBOFS and NGOFS2 are unstructured-grid models (and the Great Lakes ones, I think).

Not a good UI, but it works.

@jcermauwedu
Author

It looks like, for those, the mesh/grid is not contained within the netCDF file. Is it stored externally in the "OFS_Grid_Datum" directory? Is that why the packages are not detecting a mesh variable?

$ head -5 sfbofs.2dm 
MESH2D
MESHNAME SFBOFS              
E3T       1     93      1      2      1
E3T       2     93     92      1      1
E3T       3      3     93      2      1
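If the mesh really does live only in the external .2dm file, one stopgap for checking would be to read the triangle records back out. This is a minimal sketch, assuming the usual SMS .2dm layout in which each E3T line holds an element id, three node ids, and a material id (consistent with the lines above, but not verified against OFS documentation). Whether the element order matches the netCDF nele dimension is also an assumption, and the function names are hypothetical.

```python
def parse_2dm_triangles(lines) -> dict[int, tuple[int, int, int]]:
    """Collect E3T (three-node triangle) records from .2dm text.

    ASSUMED record layout: E3T <element id> <node 1> <node 2> <node 3> <material id>.
    Node ids are kept as written (1-based).
    """
    triangles = {}
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "E3T":
            elem_id, n1, n2, n3 = (int(f) for f in fields[1:5])
            triangles[elem_id] = (n1, n2, n3)
    return triangles


def face_node_connectivity(triangles) -> list[list[int]]:
    """Order triangles by element id, giving rows shaped like a UGRID
    face_node_connectivity array (which would need start_index = 1)."""
    return [list(triangles[k]) for k in sorted(triangles)]
```

With the file open, face_node_connectivity(parse_2dm_triangles(f)) gives one row per triangle; the node coordinates would need a similar pass over the file's node records, which are not shown above.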

@jcermauwedu
Author

There was an initial plan to run the CF tests automatically whenever the OG format is selected for testing. Why not just enable both on the command line:

$ compliance-checker -v -t og:1.0 -t cf:1.8 ~/src/upstream/data/sea076_20230906T0852_R.nc
Running Compliance Checker on the datasets from: ['/home/portal/src/upstream/data/sea076_20230906T0852_R.nc']
Using cached standard name table v49 from /home/portal/.local/share/compliance-checker/cf-standard-name-table-test-49.xml


--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                          Version 5.1.2.dev30+gf543e4f                          
                     Report generated 2024-05-23T01:16:17Z                      
                                     cf:1.8                                     
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html
--------------------------------------------------------------------------------
All tests passed!


--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                          Version 5.1.2.dev30+gf543e4f                          
                     Report generated 2024-05-23T01:16:17Z                      
                                     og:1.0                                     
  https://oceangliderscommunity.github.io/OG-format-user-manual/OG_Format.html  
--------------------------------------------------------------------------------
                               Corrective Actions                               
sea076_20230906T0852_R.nc has 2 potential issues


                                   Mandatory                                    
--------------------------------------------------------------------------------
Check that all attribute names are lowercase.
* Global attribute Metadata_Conventions should be lowercase: metadata_conventions
* Variable CNDC attribute URI should be lowercase: uri
* Variable DOXY attribute URI should be lowercase: uri
* Variable PRES attribute URI should be lowercase: uri
* Variable PSAL attribute URI should be lowercase: uri
* Variable TEMP attribute URI should be lowercase: uri
* Variable DENSITY attribute URI should be lowercase: uri
* Variable TIME attribute URI should be lowercase: uri
* Variable TIME_GPS attribute URI should be lowercase: uri

Missing mandatory variables.
* Variable PLATFORM_SERIAL_NUMBER is missing

The CF-1.9 work is not fully complete, but it is complete enough to test the OG 1.0 requirements at the CF-1.9 and CF-1.10 level; the class just needs to be enabled in the compliance checker. It was mentioned that once the CF-1.9 work is complete, we can also enable it for CF-1.10, as there is not much difference between the two versions. UGRID will be included for CF-1.11.

The run above now tests four distinct rulesets for Ocean Gliders 1.0:

  • All variable names are capitalized
  • All attribute names are lowercase, with some exceptions.
  • Checking for mandatory variables.
  • The ability to check against another referenced standard. Not quite CF-1.10 yet :)
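The first three rulesets amount to simple name checks. Below is a standard-library sketch that mirrors the wording of the report above where possible, working on plain dicts (global attribute names, and variable name to attribute names). The function name, the dict inputs, and the default exemption set are assumptions; the exceptions the plugin actually allows are not listed here.

```python
# ASSUMED exemptions: CF-mandated mixed-case names. The real list is not given above.
DEFAULT_EXEMPT = frozenset({"Conventions", "_FillValue"})


def og_naming_issues(global_attrs, variables, mandatory, exempt=DEFAULT_EXEMPT) -> list[str]:
    """Sketch of three OG 1.0 rules: upper-case variable names, lower-case
    attribute names (minus `exempt`), and presence of mandatory variables.

    global_attrs: iterable of global attribute names
    variables:    dict of variable name -> iterable of its attribute names
    mandatory:    iterable of required variable names
    """
    issues = []
    for attr in global_attrs:
        if attr not in exempt and attr != attr.lower():
            issues.append(f"Global attribute {attr} should be lowercase: {attr.lower()}")
    for var, attrs in variables.items():
        if var != var.upper():
            issues.append(f"Variable {var} should be uppercase: {var.upper()}")
        for attr in attrs:
            if attr not in exempt and attr != attr.lower():
                issues.append(f"Variable {var} attribute {attr} should be lowercase: {attr.lower()}")
    for var in mandatory:
        if var not in variables:
            issues.append(f"Variable {var} is missing")
    return issues
```

Keeping the exemption set as a parameter makes it easy to settle the "some exceptions" list separately from the checks themselves.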

@ChrisBarker-NOAA

“It looks like for those, the mesh/grid is not contained within the netCDF file.”

Something is off - by “those” do you mean from the OFS subsetter? They have always been complete for me.

I’ll try to get you one.

@jcermauwedu
Author

Something else to work through on Thursday: instrument representation appears to differ between OG 1.0 and CF. The IOOS checker looks at each variable for an instrument attribute that attaches it to an instrument, or to a package recording multiple variables (e.g. a CTD). The OG 1.0 format does the reverse: a list of instruments is mapped to the list of variables, as defined below. This has caused an issue for the CF checker. This use of the instrument attribute was first defined in the IOOS Glider DAC netCDF 2.0 format specification, under the dimensionless container variable types.

global attribute:

string :instrument = "WET Labs {Sea-Bird WETLabs} ECO Puck Triplet BBFL2-IRB scattering fluorescence sensor",
    "Oxygen optode 4831", "Unpumped CT sail CTD", "Seaglider M1 Glider data logger" ;

variables:

PARAMETER = "TEMP_CPU_CHLA", "FLUORESCENCE_CHLA", "DPHASE_DOXY", 
    "MOLAR_DOXY", "TPHASE_DOXY", "TEMP_DOXY", "OXYSAT_DOXY", "PRES", 
    "SIGMA_T", "CNDC", "TEMP", "PSAL", "LATITUDE_GPS", "LONGITUDE_GPS", 
    "GLIDER_ROLL", "LATITUDE", "GLIDER_PITCH", "LONGITUDE", "GLIDER_DEPTH" ;

 PARAMETER_SENSOR = 
    "WET Labs {Sea-Bird WETLabs} ECO Puck Triplet BBFL2-IRB scattering fluorescence sensor", 
    "WET Labs {Sea-Bird WETLabs} ECO Puck Triplet BBFL2-IRB scattering fluorescence sensor", 
    "Oxygen optode 4831", "Oxygen optode 4831", "Oxygen optode 4831", 
    "Oxygen optode 4831", "Oxygen optode 4831", "Unpumped CT sail CTD", 
    "Unpumped CT sail CTD", "Unpumped CT sail CTD", "Unpumped CT sail CTD", 
    "Unpumped CT sail CTD", "Seaglider M1 Glider data logger", 
    "Seaglider M1 Glider data logger", "Seaglider M1 Glider data logger", 
    "Seaglider M1 Glider data logger", "Seaglider M1 Glider data logger", 
    "Seaglider M1 Glider data logger", "Seaglider M1 Glider data logger" ;
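Since PARAMETER and PARAMETER_SENSOR are parallel arrays, converting between the two directions is mechanical. This is a minimal sketch with hypothetical function names: one helper groups variables by sensor (the OG 1.0 direction), one gives the per-variable sensor (the direction the IOOS checker looks for), and one flags sensors missing from the global instrument list. Mapping sensor strings to IOOS-style instrument container variable names is not covered and would need its own convention.

```python
from collections import defaultdict


def _check_lengths(parameters, sensors) -> None:
    if len(parameters) != len(sensors):
        raise ValueError("PARAMETER and PARAMETER_SENSOR lengths differ")


def parameters_by_sensor(parameters, sensors) -> dict[str, list[str]]:
    """OG 1.0 direction: sensor -> variables it records, in file order."""
    _check_lengths(parameters, sensors)
    grouped = defaultdict(list)
    for param, sensor in zip(parameters, sensors):
        grouped[sensor].append(param)
    return dict(grouped)


def sensor_by_parameter(parameters, sensors) -> dict[str, str]:
    """Per-variable direction: variable -> the sensor that recorded it."""
    _check_lengths(parameters, sensors)
    return dict(zip(parameters, sensors))


def unlisted_instruments(instrument_attr, sensors) -> set[str]:
    """Sensors named in PARAMETER_SENSOR but absent from the global instrument list."""
    return set(sensors) - set(instrument_attr)
```

Applied to the arrays above, sensor_by_parameter would give, for example, PRES -> "Unpumped CT sail CTD", which is the per-variable link the IOOS checker expects to find on each variable.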

Projects
None yet
Development

No branches or pull requests

7 participants