ESGF_Node|LUCIDexample

Matthew Harris edited this page Oct 7, 2013 · 8 revisions

LUCID

This is an example of how to adapt the publisher for use by a CMIP5-related project.

These are notes I made while configuring the node and publishing data for the LUCID project. They might be incomplete, and there might be better or easier ways to achieve the same goal. Feel free to correct or comment on anything here, thanks. --estani

Summary

  • add handler
  • lucid project configuration
  • lucid model/project
  • thredds_root to new lucid root
  • thredds url too?

LUCID procedure for a standard ESGF datanode

  1. Create directory to hold project catalogs at /esg/content/thredds/lucid (make sure the user publishing has write access to it)

  2. Add a catalog reference to /esg/content/thredds/catalog.xml that points to the main location of the lucid catalog

  3. Add the model name and project to /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/etc/esgcet_models_table.txt (or any other file pointed at by the config file (esg.ini) in use)

lucid | MPI-ESM-LR | http://www.ileaps.org/index.php?option=com_content&task=view&id=99 | LUCID
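For step 2, a THREDDS catalog reference is a catalogRef element in the parent catalog. The following is a minimal Python sketch, not part of the publisher. The helper name add_catalog_ref and the href value lucid/catalog.xml are assumptions for illustration, so adjust them to where the lucid catalog actually lives. It returns the modified XML as a string and leaves writing the file to you.

```python
import xml.etree.ElementTree as ET

# Namespaces used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def add_catalog_ref(catalog_xml, title, href):
    """Return catalog_xml with one extra <catalogRef> appended to the root."""
    # Keep the usual prefixes when the tree is serialized again.
    ET.register_namespace("", THREDDS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    root = ET.fromstring(catalog_xml)
    ref = ET.SubElement(root, "{%s}catalogRef" % THREDDS_NS)
    ref.set("{%s}title" % XLINK_NS, title)
    ref.set("{%s}href" % XLINK_NS, href)  # hypothetical location, see above
    ref.set("name", "")
    return ET.tostring(root, encoding="unicode")
```

ElementTree drops XML comments when parsing, so on a hand-maintained catalog.xml it may be safer to add the element by hand and use the sketch only to see what it should look like.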

  1. Copy a valid esg.ini that will be used for this project

  2. Change the following values in the esg.ini:

    thredds_root = /esg/content/thredds/lucid
    thredds_url = http://cmip2.dkrz.de/thredds/lucid
    thredds_root_catalog_name = LUCID catalog
    thredds_dataset_roots =
      esg_dataroot | /esg/data
      ... #don't delete anything you have had here! If you do, those catalogs will get erased too!
      lucid | /gpfs_750/projects/LUCID/data/lucid

project_options =
  cmip5 | CMIP5 / IPCC Fifth Assessment Report | 1
  ipcc4 | IPCC Fourth Assessment Report | 2
  test | Test Project | 3
  lucid | Land-Use and Climate, Identification of robust impacts | 4

Don't remove anything from thredds_dataset_roots; just add what you need. If you do remove entries, the publisher will discard the catalogs of all missing entries when the TDS restarts.
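Before switching to the edited esg.ini, it can help to check that no dataset root was lost. The following is a minimal Python 3 sketch, not part of the publisher. It assumes thredds_dataset_roots lives in the [DEFAULT] section and uses the "name | path" layout shown above; the function names are hypothetical.

```python
import configparser


def dataset_roots(ini_text, section="DEFAULT"):
    """Parse thredds_dataset_roots from esg.ini text into {name: path}."""
    # RawConfigParser avoids interpolating %(...)s templates elsewhere in esg.ini.
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    raw = parser.get(section, "thredds_dataset_roots")
    roots = {}
    for line in raw.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "|" not in line:
            continue  # skip blank or placeholder lines
        name, path = (part.strip() for part in line.split("|", 1))
        roots[name] = path
    return roots


def removed_roots(old_ini_text, new_ini_text):
    """Names present in the old configuration but missing from the new one."""
    old = dataset_roots(old_ini_text)
    new = dataset_roots(new_ini_text)
    return sorted(set(old) - set(new))
```

An empty list from removed_roots means every previously configured root is still present in the new file.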

  1. Add the following lucid project description:

#------------------------------------------------------------------------------------------
# Project-specific configuration
# LUCID
[project:lucid]

# LUCID experiments
# project | experiment_name | experiment_description
experiment_options =
  lucid | L2A26 | model run without landuse change (after yr 2005) and with atmospheric CO2 from RCP2.6 scenario
  lucid | L2A85 | model run without landuse change (after yr 2005) and with atmospheric CO2 from RCP8.5 scenario

# Define the categories to be used for this project:
#   name | category_type | is_mandatory | is_thredds_property | display_order

categories =
  project | enum | true | true | 0
  experiment | enum | true | true | 1
  product | enum | true | true | 2
  model | string | true | true | 3
  time_frequency | enum | true | true | 4
  realm | enum | true | true | 5
  cmor_table | enum | true | true | 6
  ensemble | string | true | true | 7
  institute | enum | true | true | 8
  forcing | string | false | true | 9
  title | string | false | true | 10
  creator | enum | false | false | 11
  publisher | enum | false | false | 12
  creation_time | string | false | true | 13
  format | fixed | false | true | 14
  source | text | false | false | 15
  drs_id | string | false | true | 16
  description | text | false | false | 99

category_defaults =
  product | requested

# Enumerated values
realm_options = atmos, ocean, land, landIce, seaIce, aerosol, atmosChem, ocnBgchem
time_frequency_options = yr, mon, day, 6hr, 3hr, subhr, monClim, fx
cmor_table_options = 3hr, 6hrLev, 6hrPlev, Amon, LImon, Lmon, OImon, Oclim, Omon, Oyr, aero, cf3hr, cfDay, cfMon, cfOff, cfSites, day, fx, grids
institute_options =  BCC, CAWCR, CCCMA, CMCC, CNRM-CERFACS, CSIRO-QCCCE, EC-EARTH, GFDL, GISS, INM, IPSL, LASG, MIROC, MOHC, MPI-M, MRI, NCAR, NCC, NIMR, PCMDI

product_options = output1, output2, output

# Class name of the LUCID project handler.
handler = esgcet.config.lucid_handler:LUCIDHandler

# Format of generated dataset IDs
parent_id = wdcc.lucid
dataset_id = lucid.%(product)s.%(institute)s.%(model)s.%(experiment)s.%(time_frequency)s.%(realm)s.%(cmor_table)s.%(ensemble)s

# Directory format. This is used to determine field values by matching directory names.
#directory_format = /data/publish_test/cmip5_test #not used
dataset_name_format = lucid.%(product)s.%(institute)s.%(model)s.%(experiment)s.%(time_frequency)s.%(realm)s.%(cmor_table)s.%(ensemble)s.v%(version)s

# Exclude these variables from THREDDS catalogs. They are still added to the database.
thredds_exclude_variables = a, a_bnds, alev1, alevel, alevhalf, alt40, b, b_bnds, basin, bnds, bounds_lat, bounds_lon, dbze, depth, depth0m, depth100m, depth_bnds, geo_region, height, height10m, height2m, lat, lat_bnds, latitude, latitude_bnds, layer, lev, lev_bnds, location, lon, lon_bnds, longitude, longitude_bnds, olayer100m, olevel, oline, p0, p220, p500, p560, p700, p840, plev, plev3, plev7, plev8, plev_bnds, plevs, pressure1, region, rho, scatratio, sdepth, sdepth1, sza5, tau, tau_bnds, time, time1, time2, time_bnds, vegtype

# Maps
maps = institute_map, las_time_delta_map

institute_map = map(model : institute)
  MPI-ESM-LR | MPI-M

las_time_delta_map = map(time_frequency : las_time_delta)
  yr      | 1 year
  mon     | 1 month
  day     | 1 day
  6hr     | 6 hours
  3hr     | 3 hours
  subhr   | 1 minute
  monClim | 1 month
  fx      | fixed

# Set true if files follow the IPCC standard of one variable per file.
# If set, the THREDDS metadata is organized as per-variable datasets.
# Otherwise, the datasets are assumed to be per-time.
variable_per_file = true
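
The dataset_id template uses Python-style %(name)s placeholders. The following minimal sketch shows how such a template expands; it only illustrates the string substitution and is not the publisher's own code. The field values (Lmon, r1i1p1, and so on) are hypothetical examples, and the institute is looked up from the model as in institute_map.

```python
# Template copied from the dataset_id option of the [project:lucid] section.
DATASET_ID = ("lucid.%(product)s.%(institute)s.%(model)s.%(experiment)s."
              "%(time_frequency)s.%(realm)s.%(cmor_table)s.%(ensemble)s")

# Mirrors institute_map = map(model : institute).
INSTITUTE_MAP = {"MPI-ESM-LR": "MPI-M"}


def build_dataset_id(fields):
    """Fill the template, deriving the institute from the model if absent."""
    values = dict(fields)
    if "institute" not in values:
        values["institute"] = INSTITUTE_MAP[values["model"]]
    return DATASET_ID % values


# Hypothetical field values for one dataset.
example = {
    "product": "output1",
    "model": "MPI-ESM-LR",
    "experiment": "L2A26",
    "time_frequency": "mon",
    "realm": "land",
    "cmor_table": "Lmon",
    "ensemble": "r1i1p1",
}
print(build_dataset_id(example))
# lucid.output1.MPI-M.MPI-ESM-LR.L2A26.mon.land.Lmon.r1i1p1
```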

  1. Create the lucid handler by copying the ipcc5 one:

    cp /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/ipcc5_handler.py /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/lucid_handler.py

  2. Then alter the file a little (mostly replacing cmip5 with lucid, but beware: there is a cmip5_product that needs to stay as it is!). The dot in the first pattern is escaped because an unescaped dot matches any character, including the underscore in cmip5_product.

    sed -e 's#cmip5\.#lucid.#' -e 's#IPCC5#LUCID#' -e 's#CMIP5#LUCID#' -i /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/lucid_handler.py
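
Whether the dot in the first sed pattern is escaped matters for cmip5_product. A small Python sketch with the equivalent regular expressions shows the difference; the helper sed_first is hypothetical and only mimics a sed substitution without the g flag.

```python
import re


def sed_first(pattern, replacement, line):
    """Mimic sed 's#pattern#replacement#': replace the first match on a line."""
    return re.sub(pattern, replacement, line, count=1)


# Unescaped dot: matches any character, so the underscore is swallowed.
print(sed_first(r"cmip5.", "lucid.", "cmip5_product"))   # lucid.product
# Escaped dot: only a literal "cmip5." is rewritten.
print(sed_first(r"cmip5\.", "lucid.", "cmip5_product"))  # cmip5_product
print(sed_first(r"cmip5\.", "lucid.", "cmip5.output1"))  # lucid.output1
```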

  • I published the GeoMIP data by following the steps at http://esgf.org/wiki/ESGF_Node/LUCIDexample , using "geomip" instead of "lucid" and "GeoMIP" instead of "LUCID". One point needs care at the 8th step (the sed command above). After running sed on geomip_handler.py as:

    sed -e 's#cmip5\.#geomip.#' -e 's#IPCC5#GeoMIP#' -e 's#CMIP5#GeoMIP#' -i /usr/local/cdat/lib/python2.6/site-packages/esgcet-2.8.5-py2.6.egg/esgcet/config/geomip_handler.py

    we need to change "result = (project_id[:5]=="GeoMIP")" to "result = (project_id[:6]=="GeoMIP")" in line 144 of geomip_handler.py. Otherwise, publishing the GeoMIP data with "--project geomip" fails with the error "project_id must be GeoMIP", because project_id[:5] is "GeoMI", not "GeoMIP".

Regards,

Qizhong Wu 2012/04/16
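
The slice width in that check has to match the length of the project name. "LUCID" has five characters, like "CMIP5", so the copied check works unchanged for LUCID, while "GeoMIP" has six. A minimal Python sketch of the difference follows; the function names are hypothetical, and only the comparison itself is taken from the comment above.

```python
def fixed_width_check(project_id, name, width=5):
    """The copied check: compare a fixed-width prefix with the project name."""
    return project_id[:width] == name


def prefix_check(project_id, name):
    """A width-independent variant: slice by the length of the name."""
    return project_id[:len(name)] == name


print(fixed_width_check("LUCID", "LUCID"))    # True
print(fixed_width_check("GeoMIP", "GeoMIP"))  # False, "GeoMI" != "GeoMIP"
print(prefix_check("GeoMIP", "GeoMIP"))       # True
```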

  1. Alter the __init__.py so that it points to the new handler (there is certainly a simpler way to achieve this, but you'll have to find out how. Feel free to correct this entry if you do!)

    from ipcc4_handler import IPCC4Handler
    from ipcc5_handler import IPCC5Handler
    from tamip_handler import TAMIPHandler
    from obs4mips_handler import Obs4mipsHandler
    from lucid_handler import LUCIDHandler
    builtinProjectHandlers = {
        'basic_builtin' : BasicHandler,
        'ipcc4_builtin' : IPCC4Handler,
        'ipcc5_builtin' : IPCC5Handler,
        'lucid_builtin' : LUCIDHandler,
        'tamip_builtin' : TAMIPHandler,
        'obs4mips_builtin' : Obs4mipsHandler,
        }
    builtinFormatHandlers = {
        'netcdf_builtin' : CdunifFormatHandler,
        }

  2. Now add the new project and model names to the database by pointing to the new esg.ini, which holds the project information:

    esginitialize -c -i lucid.esg.ini

  3. If you use a map file, start by ingesting the data:

    esgpublish --map test.dataset.map -i lucid.esg.ini

  4. If everything looks fine, proceed to create the TDS catalogs:

    esgpublish --map test.dataset.map --project lucid -i lucid.esg.ini --noscan --thredds

  5. And finally try to publish to the gateway

    esgpublish --map test.dataset.map --project lucid -i lucid.esg.ini --publish
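
The last four commands have to run in this order, and a later one makes little sense if an earlier one failed. The following is a minimal Python sketch that assembles them and stops at the first failure. The function names are hypothetical, only flags shown above are used, and it assumes the tools exit with a non-zero status on failure, which this page does not confirm.

```python
import subprocess


def publish_commands(map_file, ini_file, project):
    """Return the initialize/scan/THREDDS/publish commands in order."""
    return [
        ["esginitialize", "-c", "-i", ini_file],
        ["esgpublish", "--map", map_file, "-i", ini_file],
        ["esgpublish", "--map", map_file, "--project", project,
         "-i", ini_file, "--noscan", "--thredds"],
        ["esgpublish", "--map", map_file, "--project", project,
         "-i", ini_file, "--publish"],
    ]


def run_all(commands, runner=subprocess.check_call):
    """Run each command in turn; check_call raises on a non-zero exit."""
    for command in commands:
        runner(command)
```

For the example on this page the call would be run_all(publish_commands("test.dataset.map", "lucid.esg.ini", "lucid")). Checking the output after each step by hand, as described above, is still advisable.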

The configuration is very tricky, so check the FAQ and the Publisher documentation if anything fails.
