SeaExplorers send back and record two main types of files: glider files (`*.gli.*`) that contain glider navigation information, and payload files (`*.pld1.*`) that contain the science data. These can be subset files (`*.sub.*`) that Alseamar decimates for transmission, or they can be full-resolution files from the glider (`*.raw.*`), offloaded post-mission. The raw or subset files need to be made available in a single directory for pyglider to process.
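As a quick sanity check that the input directory is laid out the way pyglider expects, you can list the files by pattern. This is only an illustrative sketch; the directory name and the `sub` suffix are hypothetical and should match your own deployment:

```python
import glob

# Hypothetical directory holding the transmitted (or offloaded) files:
rawdir = './realtime_raw/'

# Glider navigation files and payload (science) files should sit side by side:
print(sorted(glob.glob(rawdir + '*.gli.sub.*')))
print(sorted(glob.glob(rawdir + '*.pld1.sub.*')))
```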
You can download and expand example data using `get_example_data`:

```python
import pyglider.example_data as pexamp

pexamp.get_example_data('./')
```

which will add a local directory `example-data` to your current directory.
The processing routines all take a `deployment.yaml` file as an argument; information from this file is used to fill in metadata and to map sensor names to NetCDF variable names. See {ref}`ExDepl`, below.
There are four top levels to the `deployment.yaml` file (a minimal sketch follows this list):

- `metadata`: The only field that is necessary here is `glider_name`. The rest of the fields will be added to the netCDF files as top-level attributes.
- `glider_devices`: A list of the glider devices and any information about them, such as make, model, and serial number. This is optional, and is again added to the netCDF top-level attributes.
- `netcdf_variables`: These are necessary, and map from a sensor name (e.g. `source: GPCTD_CONDUCTIVITY`) to a data variable name (e.g. `conductivity`). The fields other than `source:` are optional for the processing to run, and are placed in the attributes of the netCDF variable. However, note that many of these attributes are necessary for CF compliance.
- `profile_variables`: A mapping for variables that are per-profile rather than timeseries. They include variables like a mean position and time for the profile, and mean derived ocean velocities.
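As an illustration only, a skeleton `deployment.yaml` might look like the following. Every name and value here is hypothetical; the authoritative schema is the full example at {ref}`ExDepl`:

```yaml
metadata:
  glider_name: example01          # the only required metadata field
  institution: Example Institute  # extra fields become netCDF global attributes
glider_devices:                   # optional; also copied to global attributes
  ctd:
    make: Seabird
    model: GPCTD
    serial: '0001'
netcdf_variables:
  conductivity:
    source: GPCTD_CONDUCTIVITY    # sensor name in the raw files (required)
    units: S m-1                  # optional here, but needed for CF compliance
profile_variables:
  profile_time:
    comment: mean time of the profile
```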
The example script is relatively straightforward if there is no intermediate processing. See {ref}`ExProc`, below.
Data comes from an input directory and is translated to raw, glider-dependent parquet files, which are put in a new directory. These files are useful in their own right: Apache Parquet is a column-oriented format for storing tabular data, and parquet files take up less space than netCDF or CSV and are much faster to read and write. They can be opened with `polars.read_parquet` or `pandas.read_parquet`. These files are then merged into a single monolithic parquet file, and this is translated to a CF-compliant timeseries netCDF file. Finally, individual profiles are saved, and a 2-D 1-m grid in time-depth is saved.
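A sketch of what such a script can look like is below. It follows the shape of the example script at {ref}`ExProc`, but the directory names and the `deployment.yml` filename are placeholders, and function names can differ between pyglider versions, so check the example script and your installed version before relying on it:

```python
import pyglider.ncprocess as ncprocess
import pyglider.seaexplorer as seaexplorer

rawdir = './realtime_raw/'      # input *.gli.sub.* and *.pld1.sub.* files
rawncdir = './realtime_rawnc/'  # per-file intermediate parquet files
deploymentyaml = './deployment.yml'
l0tsdir = './L0-timeseries/'
profiledir = './L0-profiles/'
griddir = './L0-gridfiles/'

# raw glider and payload files -> per-file parquet intermediates
seaexplorer.raw_to_rawnc(rawdir, rawncdir, deploymentyaml)
# merge the intermediates into a single monolithic parquet file
seaexplorer.merge_parquet(rawncdir, rawncdir, deploymentyaml, kind='sub')
# monolithic parquet -> CF-compliant timeseries netCDF
outname = seaexplorer.raw_to_timeseries(rawncdir, l0tsdir, deploymentyaml,
                                        kind='sub')
# split the timeseries into per-profile files, and grid it in time-depth
ncprocess.extract_timeseries_profiles(outname, profiledir, deploymentyaml)
ncprocess.make_gridfiles(outname, griddir, deploymentyaml)
```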
It is likely that between these steps the user will want to add screening steps or adjustments to the calibrations. PyGlider does not provide those steps.
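For example, before re-running the later steps you might open one of the intermediate parquet files and inspect it to decide on screening thresholds; the filename below is hypothetical:

```python
import pandas as pd  # polars.read_parquet works the same way

# Hypothetical intermediate file produced by the merge step above:
df = pd.read_parquet('./realtime_rawnc/sea035.pld1.sub.parquet')
print(df.describe())  # inspect value ranges before choosing screening limits
```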
(ExDepl)=
## Example deployment.yaml

```{literalinclude} <path-to-example-deployment.yaml>
:language: yaml
```

(ExProc)=
## Example processing script

```{literalinclude} <path-to-example-processing-script.py>
:language: python
```