Merge pull request #477 from ACCESS-Hive/dev/sven/intake
shorter MED intake catalog description with showcase
dougiesquire authored Jul 28, 2023
2 parents 7c1a3fe + 112f829 commit e3964bf
Showing 3 changed files with 33 additions and 45 deletions.
Binary file added docs/assets/model_evaluation/accessnri_intake.png
Binary file added docs/assets/model_evaluation/intake_example.png
78 changes: 33 additions & 45 deletions docs/model_evaluation/model_evaluation_model_catalogs/index.md
@@ -1,59 +1,47 @@
# ACCESS-NRI intake Model Catalog

ACCESS-NRI hosts output from a number of model runs for you on National Computational Infrastructure (NCI) storage.
The ACCESS-NRI intake catalog aims to provide a way for Python users to discover and load data across a broad range of climate data products available on the Australian NCI supercomputer Gadi. For detailed information, tutorials and more, please go to the
<div class="card-container">
<a href="https://access-nri-intake-catalog.readthedocs.io/en/latest/index.html" class="aspect1to2-card default-text-color">
<div class="squared-card-image-container">
<img src="../../assets/model_evaluation/accessnri_intake.png" alt="ACCESS-NRI intake catalog documentation"></img>
</div>
<div class="squared-card-text-container bold">Documentation</div>
</a>
</div>

We have set up an [ACCESS-NRI intake Catalog](https://github.com/ACCESS-NRI/access-nri-intake-catalog) package that allows you to easily search and load the model data on this storage.
The premise of this ACCESS-NRI intake Catalog is to provide a ("meta") catalog of intake-esm ("sub") catalogs, which each correspond to different "experiments".
## What is the ACCESS-NRI intake Model Catalog?

## The ACCESS-NRI intake catalog
The ACCESS-NRI catalog is essentially a table of climate data products that exist on Gadi. Each entry in the table corresponds to a different product, and the columns contain attributes associated with each product, such as the models, frequencies and variables available. Users can search on these attributes to find the products that might be useful to them. For example, a user might want to know which data products contain variables X, Y and Z at monthly frequency. The ACCESS-NRI catalog enables users to find products that satisfy their query and then load the data without having to know the location and structure of the underlying files.
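
To make the "table of products with searchable attributes" idea concrete, here is a minimal pure-Python sketch. It is a toy stand-in only and does not use the real catalog API: the product names (`product_a`, `product_b`, `product_c`), their attribute values and the `search` helper are all hypothetical and chosen purely for illustration.

```python
# Toy stand-in for the catalog table. This is NOT the access_nri API.
# Every entry below is a made-up example, not a real catalog record.
toy_catalog = [
    {
        "name": "product_a",
        "model": "ACCESS-OM2",
        "frequency": {"1day", "1mon"},
        "variable": {"temp_global_ave", "salt_global_ave"},
    },
    {
        "name": "product_b",
        "model": "ACCESS-OM2",
        "frequency": {"1day"},
        "variable": {"temp_global_ave"},
    },
    {
        "name": "product_c",
        "model": "ACCESS-ESM1-5",
        "frequency": {"1mon"},
        "variable": {"temp_global_ave", "tas"},
    },
]


def search(catalog, frequency=None, variables=()):
    """Return names of products that offer `frequency` and all of `variables`."""
    wanted = set(variables)
    return [
        entry["name"]
        for entry in catalog
        if (frequency is None or frequency in entry["frequency"])
        and wanted <= entry["variable"]
    ]


# Which products contain "temp_global_ave" at monthly frequency?
matches = search(toy_catalog, frequency="1mon", variables=["temp_global_ave"])
print(matches)
```

The point of the sketch is that a query only names attributes (frequency, variables); it never mentions file paths. The real catalog follows the same principle but also handles loading the underlying data.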

To put the huge amount of data from different experiments on the NCI storage in the palm of your hand, we provide a ("meta") catalog that you can query from Python via the `#!python intake` package and our curated catalog plugin `#!python intake.cat.access_nri`.
## Showcase: use intake to easily find, load and plot data

``` py
import intake
access_nri_catalog_sections = intake.cat.access_nri
```

To use this catalog, you need access to NCI's Gadi. Check out our [Get Started with ACCESS at NCI](../model_evaluation_getting_started/index.md) guide on how to get access.
In this showcase, we'll demonstrate one of the simplest use-cases of the ACCESS-NRI intake catalog: a user wants to plot a timeseries of a variable from a specific data product. Here, the variable is a scalar ocean variable called "temp_global_ave" and the product is an ACCESS-OM2 run called "025deg_jra55_iaf_omip2_cycle1".

Once logged in to Gadi, you will need to add the `#!python access-nri-catalog` to your `#!python conda` environments and start an [ARE JupyterLab Session](https://are.nci.org.au/pun/sys/dashboard). Check out our [ACCESS-NRI Intake Catalog](https://github.com/ACCESS-NRI/access-nri-intake-catalog/blob/main/docs/getting_started/index.rst) guide for the specific setup (note that you can only read in data from specific experiments if they are loaded through the *Storage* keyword).
First we load the catalog using

Once your JupyterLab session has started, you can access the `#!python intake` catalog to load the data. Take a look at this [Tutorial](https://github.com/ACCESS-NRI/access-nri-intake-catalog/blob/main/docs/how_tos/example_usage.ipynb).

## Example Search with our intake catalog

``` py
# Import packages for searching/loading/plotting
```python
import intake
from distributed import Client
import matplotlib.pyplot as plt

# The search process has two steps,
# comparable to searching for a book in a library:
# 1) You look for the right book/catalog sections
# 2) You look for the right book/catalog in these sections

# Load the ACCESS-NRI list of catalogs for available experiment data
# Similar to an overview of library sections
access_nri_catalog_sections = intake.cat.access_nri
catalog = intake.cat.access_nri
```

# Perform a search for names, models, variables etc.
example_section_search = access_nri_catalog_sections.search(name="cmip6_oi10")
Now we can load and plot available datasets of the variable "temp_global_ave" from the product "025deg_jra55_iaf_omip2_cycle1" using

# Once you are sufficiently happy with your search, you can load the "section"
catalog_sections = access_nri_catalog_sections.search(name="025deg_jra55_iaf_omip2_cycle1").to_source()
# and start looking for the right catalogs of interest
catalogs_of_interest = catalog_sections.search(filename="ocean_scalar.*")
```python
import matplotlib.pyplot as plt

# Start the client that allows us to load the data efficiently
client = Client(threads_per_worker=1)
client.dashboard_link
dataset_dict = catalog["025deg_jra55_iaf_omip2_cycle1"].search(
variable="temp_global_ave"
).to_dataset_dict()

# Actually load the data
experiment_data = catalogs_of_interest.to_dataset_dict(progressbar=False)
# `dataset_dict` contains two xarray Datasets, one at daily frequency and one at monthly
dataset_dict["ocean_scalar_snapshot.1day"]["temp_global_ave"].plot(label="daily")
dataset_dict["ocean_scalar.1mon"]["temp_global_ave"].plot(label="monthly")
plt.title("")
plt.legend()
plt.grid()
```

# Et voilà, you have loaded the data and can start plotting
experiment_data["ocean_scalar_snapshot.1day"]["temp_global_ave"].plot(label="daily")
experiment_data["ocean_scalar.1mon"]["temp_global_ave"].plot(label="monthly")
_ = plt.legend()
```
<div style="text-align: center;">
<img src="../../assets/model_evaluation/intake_example.png" alt="Plot of timeseries of global average temperatures" width="50%"/>
</div>
