variable/*.json: root terms carry tile-specific or operation-specific long_names #190

@JanStreffing

a handful of root variable/*.json entries have a long_name that's specific to one PFT tile or one daily-extreme operation, even though the term itself is the generic root. branded variants then fail the wcrp_cmip7 ATTR004 long_name registry check on real files even when the file's long_name is correct for its branding.

example, current state in variable/cveg.json:

{
    "id": "cveg",
    "long_name": "Carbon Mass in Vegetation on Grass Tiles",
    "description": "..."
}

while variable/cveggrass.json / cvegshrub.json / cvegtree.json all have "long_name": null and the per-tile text sitting in description. so a CMIP7 file with branded variable cveg_tavg-u-hxy-shb (Shrub) and long_name = "Carbon Mass in Vegetation on Shrub Tiles" (the correct text for its tile) fails the registry check because the registry says cveg.long_name = "Carbon Mass in Vegetation on Grass Tiles".
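a minimal sketch of the failure mode, assuming the check strips the branding suffix and compares against the root entry only. `REGISTRY` and `check_long_name` are hypothetical stand-ins for illustration, not the esgvoc or cc-plugin-wcrp API:

```python
# Hypothetical in-memory stand-in for the registry (not the esgvoc API).
REGISTRY = {
    "cveg": {"long_name": "Carbon Mass in Vegetation on Grass Tiles"},
    "cvegshrub": {"long_name": None},
}


def check_long_name(branded_variable: str, file_long_name: str) -> bool:
    """Compare a file's long_name against the ROOT term only (assumed behaviour)."""
    # "cveg_tavg-u-hxy-shb" -> "cveg"
    root = branded_variable.split("_", 1)[0].lower()
    return file_long_name == REGISTRY[root]["long_name"]


# The file's long_name is correct for the Shrub tile, yet the check fails
# because the root entry carries the Grass text.
shrub_ok = check_long_name(
    "cveg_tavg-u-hxy-shb", "Carbon Mass in Vegetation on Shrub Tiles"
)
print(shrub_ok)  # False
```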

same shape affects several other root terms - all carry a long_name that belongs to a specific variant:

| root | current long_name |
| --- | --- |
| cveg | "Carbon Mass in Vegetation on Grass Tiles" |
| hurs | "Daily Minimum Near-Surface Relative Humidity over Crop Tile" |
| tas | "Daily Minimum Near-Surface Air Temperature" |
| mrsol | "Mean soil water content at a depth of 1 m" |
| gpp | "Carbon Mass Flux out of Atmosphere Due to Gross Primary Production on Land [kgC m-2 s-1]" |
| npp | "Net Primary Production on Grass Tiles as Carbon Mass Flux [kgC m-2 s-1]" |
| ra | "Autotrophic Respiration on Shrub Tiles as Carbon Mass Flux [kgC m-2 s-1]" |
| rh | "Heterotrophic Respiration on Shrub Tiles as Carbon Mass Flux [kgC m-2 s-1]" |

proposed fix on the registry side:

  1. root term long_name becomes the generic form (e.g. cveg.long_name = "Carbon Mass in Vegetation"), or is set to null like the other generic descriptors.
  2. tile/operation variants (cveggrass, cvegshrub, cvegtree, hursmin, hursmax, tasmin, tasmax, etc.) get the specific long_name populated - usually the text already sitting in their description field.
  3. for cases where there's no separate variant term (e.g. gpp is itself the root), generalize the long_name so it doesn't carry a tile token.
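steps 1 and 2 could be sketched roughly as below. `split_long_names` is a hypothetical helper, and the `description` strings are illustrative placeholders rather than the exact registry text; the real change would of course be made in the variable/*.json files:

```python
def split_long_names(terms, root_id, generic_long_name, variant_ids):
    """Return a copy of `terms` with a generic root long_name and
    variant long_names filled from their description when still null."""
    updated = {term_id: dict(term) for term_id, term in terms.items()}
    # Step 1: the root term gets the generic form.
    updated[root_id]["long_name"] = generic_long_name
    # Step 2: each variant takes the specific text from its description.
    for variant_id in variant_ids:
        term = updated[variant_id]
        if term.get("long_name") is None:
            term["long_name"] = term["description"]
    return updated


# Illustrative input; description values are assumed for the example.
terms = {
    "cveg": {"long_name": "Carbon Mass in Vegetation on Grass Tiles"},
    "cveggrass": {"long_name": None,
                  "description": "Carbon Mass in Vegetation on Grass Tiles"},
    "cvegshrub": {"long_name": None,
                  "description": "Carbon Mass in Vegetation on Shrub Tiles"},
}

fixed = split_long_names(
    terms, "cveg", "Carbon Mass in Vegetation", ["cveggrass", "cvegshrub"]
)
```

the helper copies the input so the original entries stay untouched, which makes it easy to diff before and after.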

related plugin issue: cc-plugin-wcrp's variable long_name lookup uses variable_id.lower() (the root term) rather than the branded-variant id, so even after the registry is split per variant, the plugin will need to look up the variant first. happy to file a sister PR upstream once this side is settled.
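a rough sketch of what "look up the variant first" could mean on the plugin side. `lookup_long_name` and the dict-shaped `registry` are hypothetical, not the plugin's actual code, and how a branded variable maps to a variant id (e.g. the `shb` label to `cvegshrub`) is left to the caller as an open assumption:

```python
from typing import Optional


def lookup_long_name(registry: dict, root_id: str,
                     variant_id: Optional[str] = None) -> Optional[str]:
    """Prefer the variant term's long_name, fall back to the root term."""
    for term_id in (variant_id, root_id):
        if term_id is None:
            continue
        entry = registry.get(term_id.lower())
        # Skip missing terms and terms whose long_name is null.
        if entry and entry.get("long_name"):
            return entry["long_name"]
    return None


# Registry state after the proposed split (illustrative values).
registry = {
    "cveg": {"long_name": "Carbon Mass in Vegetation"},
    "cvegshrub": {"long_name": "Carbon Mass in Vegetation on Shrub Tiles"},
}

variant_name = lookup_long_name(registry, "cveg", "cvegshrub")
root_name = lookup_long_name(registry, "cveg")
```

falling back to the root keeps variables without a dedicated variant term (the gpp case above) working with the same lookup.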

observed against esgvoc cmip7@1.2.6 / WCRP-universe (registry-source DB).
