@VeckoTheGecko
Contributor

@VeckoTheGecko VeckoTheGecko commented Dec 2, 2025

SGrid acts as a standard that we can use to encode information about grid staggering etc. in the metadata.

I found the logic for Sgrid parsing in xgcm (https://github.com/xgcm/xgcm/blob/c9d4302b146d070c099a833cab853157539d01f7/xgcm/sgrid.py ) to be very difficult to follow and unnecessarily complicated (with limited testing).

Rather than rely directly on this parsing in Xgcm, I decided to create our own Sgrid convention parsing for the following reasons:

  • We want tooling in Parcels for both parsing and creating Sgrid datasets (as we plan to use Sgrid conventions as the central way for us to create FieldSets - where other methods such as from_mitgcm flow through a from_conventions classmethod). The functions and tooling in xgcm aren't easily adaptable for serializing sgrid metadata.
  • Improved reliability of Sgrid parsing
  • Improved error messaging

cc @jbusecke (drop a comment if you agree that this sgrid parsing is more readable - and we can see about eventual upstreaming into xgcm)


Update

This PR is now ready for review.

  • New files
    • src/parcels/_core/sgrid.py: Adds data structures for SGRID metadata that closely follow the convention, as well as tooling for serializing/deserializing between attrs dictionaries and these objects. This allows you to validate the SGRID metadata and navigate the result with rich type hints. The focus here has been on correctness: broken metadata will cause the parser to fail informatively - this is a feature (see appendix for a bit of nuanced discussion). The focus has also been on the required attributes (and vertical_dimensions in the 2D case), as that is what Parcels needs.
    • tests/strategies/sgrid.py: Provides various Hypothesis test strategies for generating SGRID-related metadata, including strategies for the metadata objects themselves.
    • tests/test_sgrid.py: Tests SGRID metadata parsing and the ingestion of this metadata into xgcm.Grid objects. This relies heavily on property-based testing, as it's best suited here.

The components introduced here are modular - allowing for the potential upstreaming into other libraries and projects.

Example usage

# %%
import xarray as xr
from parcels._core import sgrid
from pprint import pprint

with open("tmp.cdl", "w") as f: # CDL on documentation website
    f.write("""\
netcdf foo {
dimensions:
    time = UNLIMITED ;
    inode = 10 ;
    jnode = 20 ;
    knode = 30 ;
    iface = 9 ;
    jface = 19 ;
    kface = 29 ;

variables:
    char time(time) ;
        time:standard_name = "time" ;
        time:long_name = "time" ;
        time:units = "seconds since 2015-01-01 00:00:00" ;
    float u(time, kface, jface, inode) ;
        u:description = "x-velocity" ;
        u:units = "m s-1" ;
        u:grid = "grid" ;
        u:location = "face1" ;
    float v(time, kface, jnode, iface) ;
        v:description = "y-velocity" ;
        v:units = "m s-1" ;
        v:grid = "grid" ;
        v:location = "face2" ;
    float w(time, knode, jface, iface) ;
        w:description = "z-velocity" ;
        w:units = "m s-1" ;
        w:grid = "grid" ;
        w:location = "face3" ;
    float c(time, kface, jface, iface) ;
        c:description = "some concentration" ;
        c:grid = "grid" ;
        c:location = "volume" ;
    float node_lat(knode, jnode, inode) ;
        node_lat:standard_name = "latitude" ;
        node_lat:units = "degree_north" ;
    float node_lon(knode, jnode, inode) ;
        node_lon:standard_name = "longitude" ;
        node_lon:units = "degree_east" ;
    float node_elevation(knode, jnode, inode) ;
        node_elevation:description = "elevation" ;
        node_elevation:units = "m" ;

    int grid ;
        grid:cf_role = "grid_topology" ;
        grid:topology_dimension = 3 ;
        grid:node_dimensions = "inode jnode knode" ;
        grid:volume_dimensions = "iface: inode (padding: none) jface: jnode (padding: none) kface: knode (padding: none)" ;
        grid:node_coordinates = "node_lon node_lat node_elevation" ;
}
""")


# # %%
# !ncgen -o tmp.nc4 tmp.cdl

# %%
ds = xr.open_dataset("tmp.nc4", decode_times=False)
print("GRID ATTRS ON DATASET:")
pprint(f"{ds.grid.attrs}")

# %%
grid = sgrid.Grid3DMetadata.from_attrs(ds.grid.attrs)


# %%
print("GRID3DMETADATA PARSED:")
print(f"{grid.cf_role=!r}")
print(f"{grid.volume_dimensions=!r}")
print(f"{grid.node_dimensions=!r}")


# %%
attrs = ds.grid.attrs
assert ds.grid.attrs == attrs

# %%
sgrid.Grid2DMetadata.from_attrs(attrs)  # fails as expected since attrs is for 3D grid


# %%

output:

GRID ATTRS ON DATASET:
("{'cf_role': 'grid_topology', 'topology_dimension': np.int32(3), "
 "'node_dimensions': 'inode jnode knode', 'volume_dimensions': 'iface: inode "
 "(padding: none) jface: jnode (padding: none) kface: knode (padding: none)', "
 "'node_coordinates': 'node_lon node_lat node_elevation'}")
GRID3DMETADATA PARSED:
grid.cf_role='grid_topology'
grid.volume_dimensions=(DimDimPadding(dim1='iface', dim2='inode', padding=<Padding.NONE: 'none'>), DimDimPadding(dim1='jface', dim2='jnode', padding=<Padding.NONE: 'none'>), DimDimPadding(dim1='kface', dim2='knode', padding=<Padding.NONE: 'none'>))
grid.node_dimensions=('inode', 'jnode', 'knode')
Traceback (most recent call last):
  File "/Users/Hodgs004/coding/repos/parcels/src/parcels/_core/sgrid.py", line 115, in from_attrs
    face_dimensions=load_mappings(attrs["face_dimensions"]),
                                  ~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'face_dimensions'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/Hodgs004/coding/repos/parcels/tmp.py", line 86, in <module>
    sgrid.Grid2DMetadata.from_attrs(attrs)  # fails as expected since attrs is for 3D grid
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Hodgs004/coding/repos/parcels/src/parcels/_core/sgrid.py", line 119, in from_attrs
    raise SGridParsingException(f"Failed to parse Grid2DMetadata from {attrs=!r}") from e
parcels._core.sgrid.SGridParsingException: Failed to parse Grid2DMetadata from attrs={'cf_role': 'grid_topology', 'topology_dimension': np.int32(3), 'node_dimensions': 'inode jnode knode', 'volume_dimensions': 'iface: inode (padding: none) jface: jnode (padding: none) kface: knode (padding: none)', 'node_coordinates': 'node_lon node_lat node_elevation'}
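
For readers unfamiliar with the DimDimPadding entries in the output above: each one records a "dim1: dim2 (padding: type)" triple from an SGRID attribute such as volume_dimensions. Below is a minimal sketch of how such a string could be parsed. It is not the code in this PR: parse_mappings, expected_dim1_size and the regex are hypothetical, and only the fully padded form of the mapping is handled. The size offsets are my reading of the SGRID convention (none: one fewer than the node dimension, low/high: equal, both: one more).

```python
import re
from dataclasses import dataclass
from enum import Enum


class Padding(Enum):
    NONE = "none"
    LOW = "low"
    HIGH = "high"
    BOTH = "both"


@dataclass(frozen=True)
class DimDimPadding:
    dim1: str  # e.g. the face/volume dimension ("iface")
    dim2: str  # the node dimension it is staggered against ("inode")
    padding: Padding


# Hypothetical pattern for one "dim1: dim2 (padding: type)" entry.
_MAPPING = re.compile(r"(\w+):\s*(\w+)\s*\(padding:\s*(\w+)\)")


def parse_mappings(s: str) -> tuple[DimDimPadding, ...]:
    """Strictly parse a whitespace-separated sequence of padded mappings."""
    matches = list(_MAPPING.finditer(s))
    # Anything left once the valid entries are removed is malformed input.
    leftover = _MAPPING.sub("", s).strip()
    if not matches or leftover:
        raise ValueError(f"Cannot parse SGRID dimension mapping from {s!r}")
    # Padding(...) raises ValueError for an unknown padding type.
    return tuple(DimDimPadding(m[1], m[2], Padding(m[3])) for m in matches)


# Size of dim1 relative to dim2 for each padding type (assumption, see above).
_OFFSET = {Padding.NONE: -1, Padding.LOW: 0, Padding.HIGH: 0, Padding.BOTH: 1}


def expected_dim1_size(dim2_size: int, padding: Padding) -> int:
    return dim2_size + _OFFSET[padding]
```

With the CDL above, inode = 10 and padding "none" gives an expected iface size of 9, which matches the declared dimensions.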

Appendix

There are two kinds of users who would want to parse SGRID data:

  1. data providers: They are creating their own SGRID compliant datasets. They want to make sure that the SGRID data they create is compliant with the standard.
  2. data users: They are using the SGRID compliant datasets. If there is an error during parsing, that might be ok depending on what the attribute being parsed was and

I opted to write this code leaning towards (1) for the following reasons:

  • The use of this code in Parcels is itself from a "data provider" POV - we are using SGRID as an interchange format for the from_nemo etc. methods
  • Focussing on correctness (as well as the rigorous testing of correctness) has the biggest benefit to the SGRID ecosystem - as this code can be easily upstreamed and contribute to creating a "canonical parser" (see the comment on xgcm/xgcm#686, "[BUG] Improve reliability of SGRID metadata parsing")
  • I would prefer that we informatively error on broken metadata rather than try to fix it ourselves. Fix your metadata, then pass it to us. If popular SGRID "compliant" models ship with broken metadata and it becomes problematic we can revisit this.
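
To make the "informatively error" stance concrete, here is a minimal sketch of a strict from_attrs that wraps any low-level failure in a single parsing exception while keeping the original cause, in the same shape as the traceback shown earlier. MiniGrid2DMetadata is a hypothetical, heavily reduced stand-in and not the Grid2DMetadata class from this PR; the checks it performs are illustrative assumptions.

```python
from dataclasses import dataclass


class SGridParsingException(Exception):
    """Raised when attrs cannot be parsed as SGRID metadata."""


@dataclass(frozen=True)
class MiniGrid2DMetadata:  # hypothetical reduced model, for illustration only
    cf_role: str
    topology_dimension: int
    node_dimensions: tuple[str, str]

    @classmethod
    def from_attrs(cls, attrs: dict) -> "MiniGrid2DMetadata":
        try:
            if attrs["cf_role"] != "grid_topology":
                raise ValueError("cf_role must be 'grid_topology'")
            # int() also accepts NumPy integer scalars such as np.int32(3)
            topology_dimension = int(attrs["topology_dimension"])
            if topology_dimension != 2:
                raise ValueError(f"expected topology_dimension 2, got {topology_dimension}")
            node_dimensions = tuple(attrs["node_dimensions"].split())
            if len(node_dimensions) != 2:
                raise ValueError("expected exactly two node dimensions")
            return cls(attrs["cf_role"], topology_dimension, node_dimensions)
        except (KeyError, ValueError, TypeError, AttributeError) as e:
            # No attempt to repair the metadata: report the offending attrs
            # and chain the low-level error so the root cause stays visible.
            raise SGridParsingException(
                f"Failed to parse {cls.__name__} from {attrs=!r}"
            ) from e
```

The `raise ... from e` chaining means a data provider sees both the high-level message (which attrs failed) and the underlying reason (for example a missing key), without the parser guessing at a fix.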

@VeckoTheGecko
Contributor Author

cc @jbusecke (drop a comment if you agree that this sgrid parsing is more readable - and we can see about eventual upstreaming into xgcm)

Happy to just add this to the agenda for our next meeting - that way I can give you a rundown on SGRID if you need it.

@VeckoTheGecko
Contributor Author

Ok, did a self review. +600 lines seems like a large diff, but most of the lines are in the SGRID data model/Hypothesis strategies sections - this shows how the code relates to the standard which is really useful.

@erikvansebille keen to have your review (happy to do async, or in person on Wednesday if you prefer)

Do you want me to also work on ingestion code (i.e., .from_conventions) this week? (happy to do that in a separate PR)

Member

@erikvansebille erikvansebille left a comment

Thanks @VeckoTheGecko, just did a first review. I'm not sure I understand the concept of a DimDimPadding, perhaps good to talk that through. Otherwise looks good, with a few suggestions below

Member

@erikvansebille erikvansebille left a comment

OK, all good then. One minor typo fix below

@VeckoTheGecko
Contributor Author

Thanks! Merging on green 🚀

@VeckoTheGecko VeckoTheGecko enabled auto-merge (squash) December 10, 2025 12:45
@VeckoTheGecko VeckoTheGecko merged commit 6442f06 into Parcels-code:v4-dev Dec 10, 2025
9 of 10 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Parcels development Dec 10, 2025