
Proposal for an object-oriented grids API #27

Open: JamesVarndell wants to merge 8 commits into develop from feature/grids-api



@JamesVarndell (Contributor) commented Oct 24, 2025

Description

This PR introduces a new object-oriented API for working with geographical grids in earthkit-geo. It proposes an alternative to raw grid specification dictionaries for representing grids, offering a more Pythonic, high-level and self-documenting approach.

Note: This PR is a proposal and needs wider approval before merging.

Why add an extra API?

The existing way to specify grids in earthkit is with a grid spec dictionary:

grid = {"grid": "H32", "order": "ring"}

This is a rich specification which can define many different grid types, and will remain a useful way to define grids. However, it has several drawbacks for high-level earthkit users:

  • Hard-to-discover documentation - it's just a dictionary, so there's no obvious place to look for help
  • Hidden semantics - uses magic strings like "H32" whose meaning must be known in advance
  • Late error detection - issues in the grid definition won't be caught until the spec is passed into another function (regridding, plotting, etc.)
  • No IDE support - no autocompletion, type hints, or inline documentation
  • Ambiguous parameters - what does {"grid": [5, 5]} mean? The parameter names aren't visible

What is the proposed alternative?

The proposal is to provide, alongside the grid spec dictionaries, an object-oriented API with a specialised class for each grid type:

grid = ek.geo.grids.HEALPix(nside=32, order="ring")
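As an illustration only, here is a minimal sketch of what such a class could look like, assuming a plain frozen dataclass that validates its arguments on construction. The class body, the size() method and the grid_spec property are assumptions, not the actual earthkit-geo implementation; the pixel count uses the standard HEALPix formula of 12 * nside**2, and the power-of-two check reflects the standard requirement for nested ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HEALPix:
    """Hypothetical HEALPix grid class (illustration only, not earthkit-geo's API)."""

    nside: int
    order: str = "ring"

    def __post_init__(self):
        # Validate eagerly so that mistakes surface when the object is created.
        if not isinstance(self.nside, int) or self.nside < 1:
            raise ValueError(f"nside must be a positive integer, got {self.nside!r}")
        if self.order not in ("ring", "nested"):
            raise ValueError(f"order must be 'ring' or 'nested', got {self.order!r}")
        # Nested ordering is only defined when nside is a power of two.
        if self.order == "nested" and self.nside & (self.nside - 1):
            raise ValueError("nested ordering requires nside to be a power of two")

    def size(self):
        # A HEALPix grid has 12 * nside**2 pixels.
        return 12 * self.nside**2

    @property
    def grid_spec(self):
        # Dictionary form using the "H<nside>" naming seen in grid specs.
        return {"grid": f"H{self.nside}", "order": self.order}


grid = HEALPix(nside=32, order="ring")
print(grid.size())     # 12288
print(grid.grid_spec)  # {'grid': 'H32', 'order': 'ring'}
```

A typo such as order="rnig" raises immediately here, whereas the same typo in a dictionary would only be noticed by whichever function consumes it.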

This brings many benefits:

  • Easy-to-find documentation - each grid class has comprehensive docstrings with grid-specific parameters well-documented
  • Separation of concerns - each grid type accepts only its relevant parameters, reducing overloading and cognitive load
  • Better discoverability - available grids and their interfaces are easily found through imports, autocomplete and IDE exploration
  • Early error detection - errors are flagged on instantiation, making debugging easier
  • Pythonic and self-documenting - follows Python best practices and standard library patterns
  • Rich functionality - classes provide useful methods and properties such as size(), shape, to_latlon() and plot()
  • Clear intent - class and argument names reveal meaning, e.g. Equirectangular(dlon=5, dlat=5) vs. {"grid": [5, 5]}
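The early-error-detection and clear-intent points can be illustrated with a second hypothetical class. This sketch assumes a global equirectangular grid whose latitudes include both poles, so a 5° x 5° grid has 180/5 + 1 = 37 latitudes and 360/5 = 72 longitudes. The class, its validation rules and the [dlon, dlat] ordering used in grid_spec are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Equirectangular:
    """Hypothetical global regular lat/lon grid (illustration only)."""

    dlon: float
    dlat: float

    def __post_init__(self):
        # Reject increments that cannot tile the globe, at construction time.
        for name, step, span in (("dlon", self.dlon, 360), ("dlat", self.dlat, 180)):
            if step <= 0:
                raise ValueError(f"{name} must be positive, got {step}")
            ratio = span / step
            # Small tolerance so that steps like 0.1 are not rejected by float rounding.
            if abs(ratio - round(ratio)) > 1e-9:
                raise ValueError(f"{name}={step} does not divide {span} degrees evenly")

    @property
    def shape(self):
        # Latitudes run pole to pole inclusive; longitudes wrap, so 360 is not repeated.
        return (round(180 / self.dlat) + 1, round(360 / self.dlon))

    @property
    def grid_spec(self):
        # Assumed ordering: [east-west increment, north-south increment].
        return {"grid": [self.dlon, self.dlat]}


grid = Equirectangular(dlon=5, dlat=5)
print(grid.shape)      # (37, 72)
print(grid.grid_spec)  # {'grid': [5, 5]}

try:
    Equirectangular(dlon=7, dlat=5)
except ValueError as err:
    print(err)  # dlon=7 does not divide 360 degrees evenly
```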

Factory methods for interoperability

This API does not replace the grid specs. You can still instantiate a Grid from a spec with a factory method:

grid = ek.geo.grids.Grid.from_dict({"grid": "H32", "order": "ring"})

All Grid classes can also be converted to a canonical grid spec, i.e. one that represents the Grid unambiguously:

grid.grid_spec  # {"grid": "H32", "order": "ring"}
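A factory like Grid.from_dict could dispatch on the contents of the spec. The sketch below is a hypothetical implementation covering only two grid types: it recognises HEALPix names of the form "H<nside>" with a regular expression, defaults the order to "ring", and treats a two-element list as [dlon, dlat]. These parsing rules are assumptions, not the full grid spec syntax. A useful property to test is that a canonical spec survives a round trip through the object unchanged.

```python
import re
from dataclasses import dataclass


class Grid:
    """Hypothetical base class with a factory for grid spec dictionaries."""

    @classmethod
    def from_dict(cls, spec):
        value = spec.get("grid")
        # "H32" -> HEALPix(nside=32); order defaults to "ring" in this sketch.
        if isinstance(value, str):
            match = re.fullmatch(r"H(\d+)", value)
            if match:
                return HEALPix(nside=int(match.group(1)), order=spec.get("order", "ring"))
        # [dlon, dlat] -> Equirectangular (assumed ordering).
        if isinstance(value, (list, tuple)) and len(value) == 2:
            return Equirectangular(dlon=value[0], dlat=value[1])
        raise ValueError(f"unsupported grid spec: {spec!r}")


@dataclass(frozen=True)
class HEALPix(Grid):
    nside: int
    order: str = "ring"

    @property
    def grid_spec(self):
        return {"grid": f"H{self.nside}", "order": self.order}


@dataclass(frozen=True)
class Equirectangular(Grid):
    dlon: float
    dlat: float

    @property
    def grid_spec(self):
        return {"grid": [self.dlon, self.dlat]}


spec = {"grid": "H32", "order": "ring"}
grid = Grid.from_dict(spec)
print(grid)                    # HEALPix(nside=32, order='ring')
print(grid.grid_spec == spec)  # True
```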

Grid specs can also still be used directly.

Example notebook

See the example notebook for a high-level overview.

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

@sandorkertesz (Collaborator)

@JamesVarndell, eckit (and the grid) should be an optional dependency. Otherwise the tests fail and we will not be able to merge this into develop. We cannot have failing tests in develop.
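One common pattern for an optional dependency is to import it lazily, at the point where a feature needs it, and raise a clear error only then; in the test suite, pytest.importorskip can be used to skip, rather than fail, tests whose module is unavailable. The helper name optional_import and its error message are hypothetical, and the sketch deliberately does not assume the import name of the eckit Python bindings.

```python
import importlib


def optional_import(name, feature):
    """Import an optional module, raising a helpful error only when it is needed.

    Hypothetical helper: `name` is the importable module name and `feature`
    describes what requires it (used only in the error message).
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"{feature} requires the optional dependency '{name}'; "
            "install it to use this feature"
        ) from err


# Called inside the function that needs the module, not at package import time,
# so the rest of the package (and its tests) keeps working without it.
math_module = optional_import("math", "this example")
print(math_module.sqrt(16.0))  # 4.0
```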

codecov-commenter commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 4.16667% with 207 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.77%. Comparing base (1c8f747) to head (f565a24).

Files with missing lines | Patch % | Lines
tests/test_grids.py | 4.16% | 207 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop      #27       +/-   ##
============================================
- Coverage    97.44%   52.77%   -44.68%     
============================================
  Files            7        8        +1     
  Lines          235      451      +216     
  Branches         7        7               
============================================
+ Hits           229      238        +9     
- Misses           3      210      +207     
  Partials         3        3               


@chpolste (Member) commented Nov 4, 2025

I'm working on a feature for earthkit-meteo where I could use this! I need to check whether a given array is compatible with a gridspec, so having access to properties like shape and to_latlon would be very useful.
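For this use case, a compatibility check only needs the grid's shape. A minimal sketch, assuming grids expose a shape tuple and that arrays store grid points either in their trailing dimensions (after, say, a leading time or level axis) or flattened into the last axis; the function name and both matching rules are hypothetical.

```python
from math import prod
from types import SimpleNamespace


def is_compatible(values, grid):
    """Hypothetical check: do the trailing dimensions of `values` fit `grid.shape`?

    Also accepts arrays whose last axis holds all grid points flattened into
    one dimension. Works with any object exposing a `.shape` tuple, such as a
    NumPy array.
    """
    grid_shape = tuple(grid.shape)
    shape = tuple(values.shape)
    if shape[-len(grid_shape):] == grid_shape:
        return True
    return len(shape) >= 1 and shape[-1] == prod(grid_shape)


# Stand-ins for a grid and arrays; real code would pass NumPy arrays.
grid = SimpleNamespace(shape=(37, 72))
print(is_compatible(SimpleNamespace(shape=(10, 37, 72)), grid))  # True
print(is_compatible(SimpleNamespace(shape=(2664,)), grid))       # True
print(is_compatible(SimpleNamespace(shape=(36, 72)), grid))      # False
```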

@chpolste (Member)

I’d like to propose extending this PR with a slightly more elaborate class hierarchy that already accommodates projections, as these will become relevant sooner or later.

My proposed structure would encapsulate the coordinates and the projection separately in dedicated classes and then combine them to form a grid. Alternative naming: grid/gridpoints for the unprojected information, coordinates/coordinate system for the projected information; suggestions welcome.

We can have a basic ListedCoordinates class for unstructured grids and further classes that generate coordinates from a function, e.g. rectilinear, HEALPix, etc. For the projection, we can start with a lat/lon identity (essentially EPSG:4326 / WGS84) and expand from there.

A coarse outline:

from abc import ABC, abstractmethod


class Grid:

    def __init__(self, coordinates, projection):
        self.coordinates = coordinates
        self.projection = projection

    @classmethod
    def from_dict(cls, spec):
        ...

    def to_latlon(self):
        return self.projection.project(self.coordinates)

    def plot(self):
        ...

    @property
    def shape(self):
        return self.coordinates.shape


class Coordinates(ABC):

    @property
    @abstractmethod
    def shape(self):
        ...

    @abstractmethod
    def __iter__(self):
        ...

class ListedCoordinates(Coordinates):
    ...

class Rectilinear(Coordinates):
    ...

class HEALPix(Coordinates):
    ...

...


class Projection:
    
    @classmethod
    def from_proj(cls, proj_string):
        ...

    @classmethod
    def from_crs(cls, crs):
        ...

    def project(self, coords):
        ...
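To check that the outline hangs together, here is a minimal runnable instance of it. It assumes coordinates are stored as (latitude, longitude) pairs and that the first projection is the lat/lon identity mentioned above; ListedCoordinates follows the outline, while LatLonIdentity and the pair-based format are simplifying assumptions (a real implementation would likely use arrays).

```python
from abc import ABC, abstractmethod


class Coordinates(ABC):
    """Abstract source of grid points, as in the outline."""

    @property
    @abstractmethod
    def shape(self): ...

    @abstractmethod
    def __iter__(self): ...


class ListedCoordinates(Coordinates):
    """Unstructured points given explicitly as (lat, lon) pairs."""

    def __init__(self, points):
        self._points = [tuple(p) for p in points]

    @property
    def shape(self):
        # Unstructured grids are one-dimensional: one entry per point.
        return (len(self._points),)

    def __iter__(self):
        return iter(self._points)


class LatLonIdentity:
    """Identity projection: coordinates are already geographic lat/lon."""

    def project(self, coords):
        # Split the (lat, lon) pairs into separate latitude and longitude lists.
        points = list(coords)
        return [lat for lat, _ in points], [lon for _, lon in points]


class Grid:
    def __init__(self, coordinates, projection):
        self.coordinates = coordinates
        self.projection = projection

    @property
    def shape(self):
        return self.coordinates.shape

    def to_latlon(self):
        return self.projection.project(self.coordinates)


grid = Grid(ListedCoordinates([(51.5, -0.1), (48.9, 2.4)]), LatLonIdentity())
print(grid.shape)        # (2,)
print(grid.to_latlon())  # ([51.5, 48.9], [-0.1, 2.4])
```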

Limited-area grids could be configured through the coordinate specification (only generating points for a limited area), or additionally by filtering the coordinates/gridpoints with dedicated filter objects (e.g. based on a feature/shape). These filters could be attached to a Grid instance at or after instantiation, and applied either before or after projection.
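A filter of this kind could be a small callable that decides, point by point, whether to keep a coordinate. The sketch assumes a rectangular lat/lon bounding box applied before projection, with no handling of boxes that cross the dateline, and reuses the (lat, lon) pair format; BoundingBoxFilter and apply_filters are hypothetical names, and shape-based filters would follow the same pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBoxFilter:
    """Hypothetical filter keeping points inside a lat/lon box (no dateline wrapping)."""

    north: float
    west: float
    south: float
    east: float

    def __call__(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def apply_filters(points, filters):
    """Keep only the (lat, lon) points that every filter accepts."""
    return [(lat, lon) for lat, lon in points if all(f(lat, lon) for f in filters)]


europe = BoundingBoxFilter(north=72, west=-25, south=34, east=45)
points = [(51.5, -0.1), (40.7, -74.0), (48.9, 2.4)]
print(apply_filters(points, [europe]))  # [(51.5, -0.1), (48.9, 2.4)]
```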

This approach maps well onto how GDAL, rasterio and GeoTIFF solve these problems. It would also disentangle the coordinate and projection concerns that are currently combined in the grid class I’ve proposed here: ecmwf/earthkit-regrid#109.

@Oisin-M (Contributor) commented Nov 14, 2025

def to_latlon(self):
    return self.projection.project(self.coordinates)

Just to flag: we will not always get latitude and longitude out of this (e.g. for LAEA grids), so I would propose relaxing the naming to something like to_coords.
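One way to reflect this in the design is to let each projection declare the names of the coordinates it produces, so that a to_coords method returns lat/lon for geographic output and, for example, x/y for a planar projection such as LAEA. The axes attribute and the dictionary return type are assumptions, and ScaledPlanar is a trivial scaling stand-in, not a real LAEA implementation.

```python
class LatLonIdentity:
    """Geographic output: coordinates are returned unchanged."""

    axes = ("lat", "lon")

    def project(self, coords):
        return list(coords)


class ScaledPlanar:
    """Stand-in for a planar projection such as LAEA: NOT a real map projection."""

    axes = ("x", "y")

    def __init__(self, metres_per_unit):
        self.metres_per_unit = metres_per_unit

    def project(self, coords):
        return [(a * self.metres_per_unit, b * self.metres_per_unit) for a, b in coords]


def to_coords(points, projection):
    """Return projected coordinates keyed by the projection's own axis names."""
    projected = projection.project(points)
    return {axis: [p[i] for p in projected] for i, axis in enumerate(projection.axes)}


points = [(1.0, 2.0), (3.0, 4.0)]
print(to_coords(points, LatLonIdentity()))    # {'lat': [1.0, 3.0], 'lon': [2.0, 4.0]}
print(to_coords(points, ScaledPlanar(1000)))  # {'x': [1000.0, 3000.0], 'y': [2000.0, 4000.0]}
```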
