grib_tree function to properly handle ECMWF ensemble data #546

Open
nishadhka opened this issue Feb 28, 2025 · 1 comment

grib_tree function fails to properly handle ECMWF ensemble data

Issue Description

The grib_tree function in kerchunk.grib2 doesn't properly handle ECMWF ensemble forecast data, specifically:

  1. It fails to recognize and preserve ensemble member information
  2. It significantly reduces the number of groups in the output compared to the input
  3. It doesn't provide a way to access the ensemble dimension in the resulting zarr structure

Reproduction

When processing ECMWF ensemble data with 19 variables and 51 ensemble members (969 total message groups):

import fsspec
import datatree
from kerchunk.grib2 import scan_grib, grib_tree

# ECMWF open-data ensemble forecast (enfo), 00z run, 0h step
date_str = "20240229"
ecmwf_s3url = f"s3://ecmwf-forecasts/{date_str}/00z/ifs/0p25/enfo/{date_str}000000-0h-enfo-ef.grib2"

# Scan every GRIB message in the file, then build the hierarchical reference tree
esc_groups = scan_grib(ecmwf_s3url)
original_tree = grib_tree(esc_groups)

# Open the reference tree as a DataTree through fsspec's reference filesystem
gfs_dt = datatree.open_datatree(
    fsspec.filesystem("reference", fo=original_tree).get_mapper(""),
    engine="zarr",
    consolidated=False,
)

# The key test: can we access ensemble members?
print(gfs_dt.keys())  # Check for variables

The resulting structure loses ensemble information, making it impossible to distinguish between different ensemble members in the output.
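
To make the group counts concrete, here is a toy sketch in plain Python. It does not call kerchunk and is not a description of how grib_tree is implemented; the variable names, the 0 to 50 member numbering, and the grouping keys are all hypothetical. It only illustrates one way a reduction like the one reported could arise: if messages are grouped by variable name alone, 969 messages collapse into 19 groups, whereas a key that also includes the member number keeps all 969 distinct.

```python
from collections import defaultdict
from itertools import product

# Hypothetical stand-ins for scanned GRIB messages. Each record carries only a
# variable name and an ensemble member number; real scan_grib output is richer.
variables = [f"var{i:02d}" for i in range(19)]  # placeholder names, not ECMWF short names
members = range(51)  # 51 members, numbered 0..50 purely for illustration
messages = [{"varname": v, "number": m} for v, m in product(variables, members)]

# Grouping on the variable name alone: all members of a variable share one group.
by_variable = defaultdict(list)
for msg in messages:
    by_variable[msg["varname"]].append(msg)

# Grouping on (variable, member): every message keeps its own slot.
by_variable_and_member = {(msg["varname"], msg["number"]): msg for msg in messages}

print(len(messages), len(by_variable), len(by_variable_and_member))
```

Whether this is the actual cause in grib_tree would need to be checked against the kerchunk source; the sketch only shows why the member number has to be part of the key, or a dimension of its own, for members to stay distinguishable.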

This gist explains the situation and a way forward for including the ensemble number in the grib_tree output.

nishadhka added a commit to icpac-igad/grib-index-kerchunk that referenced this issue Feb 28, 2025
@nishadhka changed the title from "grib_tree function fails to properly handle ECMWF ensemble data" to "grib_tree function to properly handle ECMWF ensemble data" on Feb 28, 2025
@martindurant
Member

@emfdavid, do you have time to think about this?
