.. currentmodule:: pygeode
We'll start by taking a look at a couple of synthetic datasets defined in the tutorial module of PyGeode. The first such dataset can be imported as follows:
.. ipython:: In [1]: import pygeode as pyg In [1]: from pygeode.tutorial import t1 In [2]: print(t1)
The dataset t1
contains one pygeode variable object, Temp
, which is
defined on a two dimensional grid of 31 latitudes and 60 longitudes. The grid
is defined by the two pygeode axes objects, lat
and lon
, which span
from 85 S to 85 N, and from 0 E to 354 E. respectively.
We can take a closer look at our variable Temp
:
.. ipython:: In [3]: print(t1.Temp)
Again, we see both physical and numerical details of the grid on which this variable is defined, as well that the temperatures are defined in degrees Kelvin.
Finally, we can look as well at one of our axes:
.. ipython:: In [4]: print(t1.lat)
which we could also access as a member of the variable Temp
: t1.Temp.lat
.
These three types of objects, variables, axes, and datasets, are the core of PyGeode. Variables (:class:`Var`) encapsulate a scalar field such as temperature or one component of the wind field. Axes (:class:`Axis`), which are in fact special cases of variables, define the geophysical grid on which the scalar field exists. Finally, datasets (:class:`Dataset`) are containers for variables. These concepts match reasonably closely to their counterparts in NetCDF files (and indeed PyGeode works very naturally with NetCDF files). We'll learn more about all three soon, but to get a quick feel of what PyGeode can do, let's try plotting the contents of our first variable:
.. ipython:: # Depending on what python interpreter you are using, you may need to run # these commands to access the plotting functionality of matplotlib In [6]: import pylab as pyl; pyl.ion(); @suppress In [6]: import matplotlib as mpl; mpl.rc('figure', figsize=(5, 3)) @savefig t1Temp.png width=5in In [6]: pyg.showvar(t1.Temp)
PyGeode annotates the axes automatically, and if the Cartopy package is installed, data on a lat-lon grid is plotted over the outline of the continents using a cylindrical projection. All plots generated by PyGeode are in fact generated using the matplotlib library; the plotvar command simply makes many educated guesses about the type of plot you might want to generate based on the variable you've passed in. Many aspects of the plot can be customized through the plotvar interface (which we'll get into later on in this tutorial), but if you find you can't tweak things just so, the plot can always be manipulated using the matplotlib library itself which gives you full control.
Now let's take a look at a second dataset:
.. ipython:: In [8]: from pygeode.tutorial import t2 In [9]: print(t2)
This is a somewhat more complicated temperature field, now defined on four dimensions: time, pressure, latitude and longitude. Note this grid has over 350 million data points--enough that the 2010-era laptop this tutorial was first written on would complain if we attempted to load all of it into memory at once. This brings us to one of the major guiding assumptions of pygeode - that since we haven't done anything that needs the data itself, none of it has yet been loaded or computed. We can, nonetheless, manipulate this variable as if it has:
.. ipython:: In [10]: Tc = pyg.climatology(t2.Temp) # Compute the climatology In [11]: Tcz = Tc.mean('lon') # Compute the zonal mean In [12]: print(Tcz) In [13]: Tcp = Tc - Tcz # Compute anomaly from the zonal mean In [13]: Tcp = Tcp.rename("T'") # Rename variable In [14]: print(Tcp)
We've now computed (at least an abstract sense) the climatolgical anomaly from
the zonal mean. Note that Tcz
is defined on a reduced grid (we've lost the
Lon
axis), and it looks and behaves just like any other PyGeode variable;
PyGeode has some automatic logic for properly aligning Tc
and Tcz
when
taking their difference. However, no data has yet been loaded, no actual
averages have been taken, and no differences computed, since we haven't done
anything that needs the data. If we now go and plot the near-surface,
high-latitude anomaly of our synthetic dataset:
.. ipython:: @savefig Tcp.png width=6in In [15]: pyg.showvar(Tcp(pres=1000, time='15 Jun 00'))
We now actually need to perform these operations (though only on one pressure level and one latitude), so PyGeode goes back and accesses the required data (only), then carries out all of our operations before sending the data off to matplotlib to produce a nice contour plot.
Implied in the operations above is another principle of PyGeode. Note that
nowhere did we need to remember which dimension of an array corresponded to
time and which corresponded to longitude; nor did we need to know what index
described the 1000 hPa pressure level, or which corresponded to the 15th of
June. Moreover, we could take the difference between two fields (Tc
and
Tcz
) which weren't defined on the same grid--PyGeode takes care of the
broadcasting and alignment of the underlying numpy arrays for us, leaving us to
think about the problem we're trying to solve in the physical coordinate space
of our data. The underlying mapping is of course still there, and if it's ever
more convenient to think in the space of the data arrays, you are still free to
do so; the relevant syntax and commands are described later in this tutorial.
'This is all fine and good,' you may be thinking, 'but how do I work with my own data?' PyGeode supports a number of data formats, though it works most naturally with NetCDF (or HDF5) files. As an example, the synthetic dataset used above can be written out to a NetCDF file (this assumes that a subdirectory called 'sample_data' already exists in your working directory):
.. ipython:: In [17]: pyg.save('sample_data/file.nc', t1)
and then read back in using:
.. ipython:: In [18]: ds = pyg.open('sample_data/file.nc') In [19]: print(ds)
PyGeode is aware of the metadata in the file and constructs variables and axes accordingly, following the CF metadata standard used for climate data. Datasets that span multiple files can be handled as well, and one can tailor these calls so that PyGeode properly recognizes your data regardless of the metadata in the file.
This concludes the introductory tutorial. We've seen the three basic objects in PyGeode: variables (:class:`Var`), axes (:class:`Axis`) and datasets (:class:`Dataset`). We've seen examples of how PyGeode carries out all operations in a 'lazy' fashion, delaying all loading and processing of data until the output is explicitly required. We've seen a few examples of the automatic annotation of plots that PyGeode takes care of for you, and finally we've seen how to save and load data to a file. The next part of these tutorials gives a deeper introduction to the :doc:`tut_basics` that one can perform on PyGeode variables.