Commit

Merge pull request #705 from ACCESS-NRI/development
Periodical merge
atteggiani authored Jul 19, 2024
2 parents 63da524 + d937e08 commit 233a1a3
Showing 30 changed files with 592 additions and 515 deletions.
4 changes: 2 additions & 2 deletions docs/about/user_support/index.md
@@ -53,9 +53,9 @@ Click on the questions to unfold the answers.

Both observational and model data are hosted by the National Computational Infrastructure (NCI) under different projects.

Go to our [**Observational Data**](/model_evaluation/model_evaluation_observational_catalogs) section on the ACCESS-Hive to learn how to find and access observational data.
Go to our [**Observational Data**](/model_evaluation/data/observations) section on the ACCESS-Hive to learn how to find and access observational data.

Go to our [**Model Data**](/model_evaluation/model_evaluation_model_catalogs) section on the ACCESS-Hive to learn how to find and access model data.
Go to our [**Model Data**](/model_evaluation/data/model_catalogs) section on the ACCESS-Hive to learn how to find and access model data.

In both cases, you need to have access to the specific projects at NCI in order to read the data. For more information, check the [**First Steps**](/getting_started/first_steps) page.

Binary file removed docs/assets/model_evaluation/netcdf_1.png
Binary file not shown.
Binary file added docs/assets/model_evaluation/xarray2.png
74 changes: 74 additions & 0 deletions docs/model_evaluation/data/data_format.md
@@ -0,0 +1,74 @@
# Data Formats and Standards

<!-- For this content, I have used a lot of text from this website: https://pro.arcgis.com/en/pro-app/latest/help/data/multidimensional/fundamentals-of-netcdf-data-storage.htm -->

Model evaluation often requires comparison across different models, such as for the [Coupled Model Intercomparison Project (CMIP)](https://wcrp-cmip.org). However, comparing output from different models can be tricky due to the multiple data formats and standards used across models. This is why ACCESS-NRI supports and encourages the use of common, community-supported data formats and variables.

## Data Standards
Data standards are agreed-upon guidelines for the "representation, format, definition, structuring, tagging, transmission, manipulation, use, and management" of datasets (definition from [Geoscience Australia](https://www.ga.gov.au/data-pubs/datastandards)). Abiding by these standardized guidelines allows for, among other things, easier sharing and combining of data, as well as a better understanding of which quantities can be compared across datasets, which is very important for model evaluation.

An example data standard in climate models is the use of [Climate and Forecast metadata conventions (CF conventions)](http://cfconventions.org). These are designed to promote the processing and sharing of _NetCDF_ files (described in more detail below). The conventions specify metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data.

Metadata is information about the data, which can include variable names, dimension names, units, grid information and many others. Standardized metadata can also be more easily made machine readable, allowing software packages to interpret, for example, variable names automatically and making data analysis more efficient and less error prone. The machine readability of standardized formats thus facilitates building software applications with powerful extraction, regridding and display capabilities.
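As a minimal sketch of what such metadata can look like in practice, the snippet below attaches CF-style attributes to a small in-memory array using the Python package *xarray*. The attribute names `standard_name`, `long_name` and `units` come from the CF conventions; the variable name `tas`, the grid and the values are illustrative assumptions, and `describe` is a hypothetical helper.

```python
import numpy as np
import xarray as xr

# Hypothetical near-surface air temperature on a tiny 2 x 3 grid (made-up values)
tas = xr.DataArray(
    data=np.array([[288.1, 289.4, 290.0], [287.5, 288.8, 289.2]]),
    dims=("lat", "lon"),
    coords={"lat": [-35.0, -30.0], "lon": [145.0, 150.0, 155.0]},
    name="tas",
    attrs={
        "standard_name": "air_temperature",  # taken from the CF standard name table
        "long_name": "Near-Surface Air Temperature",
        "units": "K",
    },
)


def describe(variable):
    """Build a label from CF-style attributes instead of guessing from the name."""
    return f"{variable.attrs['standard_name']} [{variable.attrs['units']}]"


print(describe(tas))
```

Because the label is derived from standardized attributes rather than the model-specific variable name, the same helper would work on any dataset that carries these attributes.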

Currently, many models do not abide by the CF conventions by default. However, there is a software library called [CMOR (Climate Model Output Rewriter)](https://cmor.llnl.gov) that translates native climate model output into output that complies with the CF conventions. The process of CMORizing is specifically designed for model intercomparison projects, like CMIP.

## Network Common Data Form (NetCDF)

Numerous organisations and scientific groups worldwide have adopted a file format called [_NetCDF_](https://www.unidata.ucar.edu/software/netcdf/) as a standard way to store multidimensional scientific data.

<i>NetCDF</i>, which has the file extension <i>*.nc</i>, is a self-describing, machine-independent data format for array-oriented scientific data.

<ul>
<li><b>Self-describing</b>
<br>
<i>*.nc</i> files include not only the data, but also a header with metadata that describes the data layout.

<li><b>Machine-independent</b>
<br>
<i>*.nc</i> files can be accessed by computers with different ways of storing integers, characters and floating-point numbers.

<li><b>Array-oriented</b>
<br>
<i>*.nc</i> data typically consists of variables (e.g., temperature and humidity) stored as arrays that span one or more named dimensions (e.g., latitude, longitude and time).
<br>
<br>
<div style="text-align: center;">
<img src="../../../assets/model_evaluation/xarray2.png" alt="Schematic of a NetCDF file with data (temperature and pressure as variables stored over the dimensions latitude, longitude, and time) and metadata" title="xarray https://xarray.dev/" width="75%"/>
</div>
</ul>

Data in a *NetCDF* file is stored in the form of arrays, where each *NetCDF* dimension has a name and a length. Different NetCDF variables and coordinates can have different numbers of dimensions.

For example, surface temperature variation over time at a fixed location would be stored as a one-dimensional array (with dimension *time*), whereas surface temperature that varies over a region at a fixed point in time would be stored as a two-dimensional array (with dimensions *longitude, latitude*). An example of three-dimensional (3D) data would be surface temperature varying with time over a region (with dimensions *longitude, latitude, time*), and four-dimensional (4D) data would be temperature varying with time over a region with varying altitude (with dimensions *longitude, latitude, altitude, time*).
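The four cases above can be sketched with in-memory NumPy arrays. The grid sizes are arbitrary assumptions, and the dimension order follows the text above; real files often use a different order (for example, with time first).

```python
import numpy as np

# Arbitrary grid sizes chosen only for illustration
n_lon, n_lat, n_alt, n_time = 4, 3, 2, 5

examples = {
    "1D: time series at a fixed location": (
        ("time",),
        np.zeros(n_time),
    ),
    "2D: snapshot over a region": (
        ("longitude", "latitude"),
        np.zeros((n_lon, n_lat)),
    ),
    "3D: region varying with time": (
        ("longitude", "latitude", "time"),
        np.zeros((n_lon, n_lat, n_time)),
    ),
    "4D: region, altitude and time": (
        ("longitude", "latitude", "altitude", "time"),
        np.zeros((n_lon, n_lat, n_alt, n_time)),
    ),
}

for label, (dims, array) in examples.items():
    # One array axis per named dimension; the shape lists each dimension's length
    print(f"{label}: dims={dims}, shape={array.shape}")
```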

## Loading NetCDF files

There are many ways to read NetCDF files; a common one is the Python package *xarray*.
<br>
For more information, refer to a <a href="https://docs.xarray.dev/en/stable/getting-started-guide/quick-overview.html" target="_blank">quick overview of xarray</a> and <a href="https://tutorial.xarray.dev/intro.html" target="_blank">xarray tutorials</a>.

*xarray* is a Python package available through the conda environment on NCI.
<br>
Hence, you can either use it directly (as shown below) or through the dataset capabilities of the [ACCESS-NRI Model Intake Catalog Tool](/model_evaluation/data/model_catalogs).

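```python
import xarray as xr

# Open the NetCDF file ("example.nc" is a placeholder path)
dataset = xr.open_dataset("example.nc")
# In a notebook, a bare expression on the last line displays a summary of the dataset
dataset
```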

<div style="text-align: center;">
<img src="../../../assets/model_evaluation/netcdf_example.jpg" alt="Example of an actual NetCDF file with data (precipitation/rainfall over the dimensions latitude, longitude, and time) and metadata." title="Picture from https://pro.arcgis.com/en/pro-app/latest/help/data/multidimensional/fundamentals-of-netcdf-data-storage.html" width="60%"/>
</div>
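Once a dataset is open, its dimensions, variables and metadata can be inspected, and subsets can be selected by coordinate value. The sketch below builds a small in-memory dataset as a stand-in for `example.nc`, so it runs without a NetCDF file on disk. The variable name `pr`, the coordinates, the units and the values are assumptions for illustration, not the contents of any real file.

```python
import numpy as np
import xarray as xr

# Stand-in for xr.open_dataset("example.nc"): made-up precipitation on a 3 x 2 x 4 grid
time = np.array(["2000-01-01", "2000-02-01", "2000-03-01"], dtype="datetime64[ns]")
dataset = xr.Dataset(
    data_vars={
        "pr": (("time", "lat", "lon"), np.arange(24, dtype=float).reshape(3, 2, 4)),
    },
    coords={"time": time, "lat": [-40.0, -30.0], "lon": [140.0, 145.0, 150.0, 155.0]},
    attrs={"title": "Illustrative dataset"},
)
dataset["pr"].attrs["units"] = "mm/day"

print(dict(dataset.sizes))      # dimension names and lengths
print(list(dataset.data_vars))  # variable names
print(dataset["pr"].attrs)      # per-variable metadata

# Pick the grid point nearest to a location of interest, then average over time
point = dataset["pr"].sel(lat=-31.0, lon=146.0, method="nearest")
time_mean = point.mean(dim="time")
print(float(time_mean))
```

Selecting by coordinate value rather than by array index keeps the analysis code independent of the grid layout of a particular file.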

## Other Data formats

NetCDF is described in detail here because it is the most common format for climate data, and comparison and optimized evaluation workflows are easiest when all data is in the same format. [Observational data](/model_evaluation/data/observations) can come from different institutions and be measured with various instruments. These institutions may manage their data for users other than climate researchers, so the data can come in other formats, including plain text. Such data can be [_CMORised_](#data-standards) for use in evaluation frameworks. Reach out on the [Hive Forum](https://forum.access-hive.org.au) for assistance, or to suggest datasets that may be missing or could be useful.
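As a rough illustration of bringing plain-text observations into the same array-based form, the sketch below parses a small CSV table with the Python standard library and wraps it in an *xarray* object with CF-style metadata. This is not the CMOR workflow itself, only a simplified stand-in; the observations, column names and attribute values are all invented for the example.

```python
import csv
import io

import numpy as np
import xarray as xr

# Hypothetical plain-text observations (invented values)
raw_text = """date,temperature_degC
2020-01-01,21.5
2020-01-02,23.0
2020-01-03,19.5
"""

rows = list(csv.DictReader(io.StringIO(raw_text)))
times = np.array([row["date"] for row in rows], dtype="datetime64[ns]")
# Convert from degrees Celsius to kelvin so the units match the metadata below
values_kelvin = np.array([float(row["temperature_degC"]) for row in rows]) + 273.15

observations = xr.DataArray(
    values_kelvin,
    dims=("time",),
    coords={"time": times},
    name="tas",
    attrs={"standard_name": "air_temperature", "units": "K"},
)
print(observations)
```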


<h6>References</h6>
<ul class="references">
<li>
<a href = "https://pro.arcgis.com/en/pro-app/latest/help/data/multidimensional/fundamentals-of-netcdf-data-storage.htm" target="_blank">https://pro.arcgis.com/en/pro-app/latest/help/data/multidimensional/fundamentals-of-netcdf-data-storage.htm</a>
</li>
31 changes: 31 additions & 0 deletions docs/model_evaluation/data/index.md
@@ -0,0 +1,31 @@
# Data


<div class="card-container">
<a href="data_format" class="vertical-card aspect-ratio1to1">
<div class="card-image-container">
<img src="../../../assets/model_evaluation/netcdf_example.png" alt="Data format and standards" title="Picture from https://pro.arcgis.com/en/pro-app/latest/tool-reference/geostatistical-analyst/ga-layer-3d-to-netcdf.htm" class="img-contain white-background with-padding"></img>
</div>
<div class="card-text-container bold ">Data Format</div>
</a>
<a href="variables" class="vertical-card aspect-ratio1to1">
<div class="card-image-container">
<img src="../../assets/model_evaluation/model_evaluation_variables.png" alt="Data variables" class="img-contain white-background with-padding"></img>
</div>
<div class="card-text-container bold ">Data Variables</div>
</a>
</div>
<div class="card-container">
<a href="observations" class="vertical-card aspect-ratio1to1">
<div class="card-image-container">
<img src="../../assets/model_evaluation/model_evaluation_obs_catalog.jpg" alt="A picture of a seismograph recording seismic waves during an earthquake visualises the link to our Observational Data Catalogue. Image credit: Wf Sihardian—EyeEm/Getty Images" title="Image credit: Wf Sihardian—EyeEm/Getty Images" class="img-cover"></img>
</div>
<div class="card-text-container bold ">Observational Data</div>
</a>
<a href="model_catalogs" class="vertical-card aspect-ratio1to1">
<div class="card-image-container">
<img src="../../assets/model_evaluation/model_evaluation_model_catalog.jpg" alt="Model Data" class="img-contain white-background with-padding"></img>
</div>
<div class="card-text-container bold ">Model Data</div>
</a>
</div>
Original file line number Diff line number Diff line change
@@ -1,16 +1,17 @@
# ACCESS-NRI Intake catalog
# Accessing Model Data on Gadi

The ACCESS-NRI Intake catalog aims to provide a way for Python users to discover and load data across a broad range of climate data products available on <i>Gadi</i>.
To assist with finding and accessing model data on Gadi, ACCESS-NRI has created a catalog called the ACCESS-NRI Intake catalog.
This aims to provide a way for Python users to discover and load data across a broad range of climate data products available on <i>Gadi</i>.

For detailed information, tutorials and more, please go to:
<div class="card-container">
For detailed information, tutorials and more, please go to <a href="https://access-nri-intake-catalog.readthedocs.io/en/latest/index.html" target="_blank">ACCESS-NRI intake catalog documentation</a>.
<!-- <div class="card-container">
<a href="https://access-nri-intake-catalog.readthedocs.io/en/latest/index.html" class="vertical-card aspect-ratio2to1" target="_blank">
<div class="card-image-container">
<img src="../../assets/model_evaluation/accessnri_intake.png" alt="ACCESS-NRI intake catalog documentation" class="img-contain white-background with-padding"></img>
</div>
<div class="card-text-container bold ">Documentation</div>
</a>
</div>
</div> -->

## What is the ACCESS-NRI Intake catalog?

@@ -20,7 +21,7 @@ Each entry in the table corresponds to a different product, where the columns co

The ACCESS-NRI Intake catalog enables users to find products that satisfy their query and to subsequently load their data without having to know the location and structure of the underlying files.

## Showcase: use ACCESS-NRI Intake to find, load and plot data
## Example: use ACCESS-NRI Intake to find, load and plot data

A simple use case of the ACCESS-NRI Intake catalog is a user who wants to plot a timeseries of a variable from a specific data product.
<br>
@@ -53,5 +54,5 @@ plt.grid()
```

<div style="text-align: center;">
<img src="../../assets/model_evaluation/intake_example.png" alt="Plot of a timeseries of global average temperatures" width="50%"/>
<img src="../../../assets/model_evaluation/intake_example.png" alt="Plot of a timeseries of global average temperatures" width="50%"/>
</div>
38 changes: 38 additions & 0 deletions docs/model_evaluation/data/observations.md
@@ -0,0 +1,38 @@
# Observational Data on Gadi

NCI not only hosts numerous datasets for climate research, it also manages and optimises curated data collections. These collections make it easier to access and use datasets created by different organisations, which streamlines evaluations that draw on many datasets.

ACCESS-NRI has a curated data collection for Climate Model Evaluation in NCI project [*ct11*](https://my.nci.org.au/mancini/project/ct11/join). This collection, the _ACCESS-NRI Replicated Datasets for Climate Model Evaluation_, provides observational datasets in a format that can be ingested by the evaluation frameworks that ACCESS-NRI supports. More information and updates can be found in this [post](https://forum.access-hive.org.au/t/official-release-of-the-access-nri-replicated-datasets-for-climate-model-evaluation-nci-data-collection/1661).

Here, you can browse and search the available <a href="https://geonetwork.nci.org.au/" target="_blank">NCI data collections</a>.

<!-- <div class="card-container">
<a href="https://geonetwork.nci.org.au/" target="_blank" class="vertical-card aspect-ratio2to1">
<div class="card-image-container">
<img src="/assets/model_evaluation/logo_nci_data_catalogs.png" alt="NCI Data Collections" class="img-contain white-background"></img>
</div>
<div class="card-text-container bold">Search for data here</div>
</a>
</div> -->

Some examples of NCI data collections include:
<ul>
<li>
<a href="https://esgf.llnl.gov/" target="_blank">Earth System Grid Federation</a> data hosted at the <a href="https://esgf.nci.org.au/projects/esgf-nci/" target="_blank">NCI ESGF Node</a>.
<li>
<a href="https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5" target="_blank">ECMWF atmospheric reanalyses (ERA5)</a> data. For more information, refer to the <a href="https://opus.nci.org.au/display/ERA5/ERA5+Community+Home" target="_blank">NCI ERA5 Community Page</a>.
<li>
<a href="https://copernicus.nci.org.au/sara.client/#/home" target="_blank">Sentinel Australasia Regional Access (SARA)</a> multi-petabyte data obtained from the European Space Agency’s Sentinel satellites.
<br>
</ul>


NCI also has a <a href="https://opus.nci.org.au/display/NDP/Data+Catalogue" target="_blank">user guide</a> for finding, accessing and citing data.

For example, the catalogue entry for the above-mentioned _ACCESS-NRI Replicated Datasets for Climate Model Evaluation_ can be found by entering *ACCESS-NRI* in the <i>NCI Data Catalogue Search</i> field.

<div style="text-align: center;">
<a href="https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f7199_2480_5432_9703" target="_blank"><img src="../../../assets/model_evaluation/obs_data_ct11.png" alt="Edited Screenshot of NCI Data Catalogue for ACCESS-NRI replicated data" width="75%"/></a>
</div>

<!-- In particular, we want to highlight the Coupled Model Intercomparison Project Phases 6 and 5 that are hosted by NCI as a sponsor of the [Earth System Grid Federation (ESGF)](https://esgf.nci.org.au/projects/esgf-nci/). The ESGF are federated data centres across the globe that enable access to the largest archive of climate data world-wide. This portal allows you to find, select and download data files from the federation. -->
26 changes: 26 additions & 0 deletions docs/model_evaluation/data/variables.md
@@ -0,0 +1,26 @@
# Data Variables

<!-- For this content, I have used a lot of text from this website: https://pro.arcgis.com/en/pro-app/latest/help/data/multidimensional/fundamentals-of-netcdf-data-storage.htm -->

For climate modelling, we need to store multidimensional scientific data (variables) such as temperature, humidity, pressure, wind speed and direction.

Variables can be stored in multidimensional [data formats](/model_evaluation/data/data_format) such as <i>NetCDF</i> as arrays defined over one or more dimensions, where each dimension has a name and a length.


## Common variables

Variables used in climate modelling can differ in terms of naming conventions, units, etc. While this may be for historical reasons, the use of common variables is key not only for ease and compatibility when working with the data, but also to unite the climate modelling community. Hence, many projects follow or relate to the [CF conventions](http://cfconventions.org). The projects below maintain widely used variable lists.

<!-- We have created a prototype of markdown files with variable tables that can be queried via jquery -->
<!-- Because they were not ready for quick searches (jquery with extended html tables is slow), we did not include them in the Legacy Release (July/August 2023). -->
<!-- The code and markdown files are hosted on a github repository, however: https://github.com/svenbuder/access_model_variables -->

### CMIP6 variables
You can search the extensive list of Coupled Model Intercomparison Project phase 6 <a href="https://clipc-services.ceda.ac.uk/dreq/index/var.html" target="_blank">(CMIP6) variables</a> by either the MIP variable name or associated CF standard name.

### ERA5 variables
ERA5 is the fifth generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis of the global climate, which spans a period from January 1940 to present. ERA5 provides hourly estimates of a significant number of atmospheric, land and oceanic climate variables.
<br>
<br>
A full list of ERA5 parameters is available on the <a href="https://codes.ecmwf.int/grib/param-db/" target="_blank">ECMWF database</a>. It covers both the <a href="https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Parameterlistings" target="_blank">ERA5 parameter listings</a> and the <a href="https://confluence.ecmwf.int/display/CKB/ERA5-Land%3A+data+documentation#ERA5Land:datadocumentation-parameterlistingParameterlistings" target="_blank">ERA5-Land parameter listings</a>.
