From b9b53546fc7b0a916addc8c6d31d68025a44121a Mon Sep 17 00:00:00 2001
From: Veronica Martinez
Date: Thu, 25 Jul 2024 16:50:46 -0600
Subject: [PATCH] New guide on netCDF file format. Additional content needed

---
 .../data_management/file_formats/netcdf.md | 107 ++++++++++++++++++
 1 file changed, 107 insertions(+)
 create mode 100644 docs/source/_static/data_management/file_formats/netcdf.md

diff --git a/docs/source/_static/data_management/file_formats/netcdf.md b/docs/source/_static/data_management/file_formats/netcdf.md
new file mode 100644
index 0000000..172d314
--- /dev/null
+++ b/docs/source/_static/data_management/file_formats/netcdf.md
@@ -0,0 +1,107 @@
+# NetCDF
+>**Warning**
+> This guide needs additional information
+
+NetCDF (Network Common Data Form) is a file format that stores data in arrays. Array values may be accessed directly,
+without knowing how the data are stored, and metadata may be stored with the data.
+
+* Binary file format commonly used for scientific data
+* Self-describing, includes metadata
+* Multi-dimensional array data model
+
+#### Data Model (Essentials)
+* variable
+    * Multi-dimensional array
+    * Column-oriented: each variable as a separate entity
+* dimension
+    * Usually temporal, spatial, spectral, ...
+    * Can have unlimited length. At most one unlimited dimension is recommended, typically a growing time dimension
+* attribute
+    * Metadata: global and variable level
+* group
+    * Akin to directories
+    * Avoid unless you really need the complex structure
+
+
+## Purpose for this guideline
+
+#### Why Use NetCDF?
+* Self-describing
+    * structure captures coordinate system (functional relationship)
+    * includes metadata
+* Efficient storage
+    * packing
+    * compression
+* Efficient access
+    * chunking
+    * HTTP byte range
+    * parallel IO
+* Open specification (unlike IDL save files)
+
+## Options for this guideline
+
+* NetCDF-3 classic
+* NetCDF-4 built on HDF5
+    * recommended but prefer classic constructs
+
+## How to apply this guideline
+
+#### NetCDF Files
+* Binary format with open specification
+* Requires software libraries to read and write: C, Fortran, Java, Python, IDL, ...
+* Internal compression; don't bother compressing NetCDF files externally
+* HTTP byte range requests
+* Parallel IO
+* `.nc` file extension
+* Don't be afraid of big files
+
+#### Coordinate System
+* Dimensions should be used to define a coordinate system
+    * e.g. temporal, spatial, spectral
+    * Avoid using dimensions to group data
+    * Think "functional relationship". Each independent variable should represent a dimension.
+* coordinate variable
+    * 1D variable with dimension of the same name
+    * strictly monotonic (ordered)
+    * no missing values
+    * Independent variable of functional relationship
+    * Every dimension should have one
+* shared dimensions
+    * Variables should reuse dimensions to indicate that they share the same coordinates (domain set)
+
+#### Time as Coordinate Variable
+* If the data are a function of a single time dimension, then there should be a single time variable
+    * avoid breaking time up by date and time of day
+* Prefer numeric time units
+    * time unit since an epoch
+    * e.g. "seconds since 1970-01-01", "microseconds since 1980-01-06"
+
+#### Metadata
+* Optional but useful to make a NetCDF file self-describing
+* attribute
+    * global (dataset level)
+        * title
+        * history (provenance)
+    * variable
+        * long_name
+        * units
+* Conventions
+    * Climate and Forecast (CF)
+    * Attribute Convention for Data Discovery (ACDD)
+    * udunits: standard units
+
+#### Other useful variable attributes
+* missing_value
+    * prefer over _FillValue
+    * NaN is a good option
+* valid_range, valid_min, valid_max
+* scale_factor, add_offset (packed values)
+* cell_methods: standards for representing data cells (bins)
+    * e.g. daily average, wavelength bins
+
+## Useful Links
+* [NetCDF User's Guide](https://docs.unidata.ucar.edu/nug/current/)
+* [NetCDF ToolsUI](https://docs.unidata.ucar.edu/netcdf-java/current/userguide/toolsui_ref.html)
+
+
+Credit: Content taken from a Confluence guide written by Doug Lindholm
\ No newline at end of file
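The guide recommends a single numeric time coordinate ("seconds since 1970-01-01") and mentions packed values via `scale_factor` and `add_offset`, without showing the arithmetic. The following is a minimal sketch of both ideas using only the Python standard library, so it does not depend on any particular NetCDF package. The helper names (`encode_time`, `pack`, `unpack`) and the sample scale and offset values are hypothetical and are not part of the patch. The unpacking rule assumed here is the usual convention `unpacked = packed * scale_factor + add_offset`. In practice a NetCDF library would normally apply these conversions itself.

```python
from datetime import datetime, timezone

# Epoch taken from the guide's example unit string "seconds since 1970-01-01".
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TIME_UNITS = "seconds since 1970-01-01"


def encode_time(moment: datetime) -> float:
    """Return the seconds elapsed between EPOCH and a timezone-aware datetime."""
    return (moment - EPOCH).total_seconds()


def pack(value: float, scale_factor: float, add_offset: float) -> int:
    """Pack a physical value into an integer (hypothetical helper)."""
    return round((value - add_offset) / scale_factor)


def unpack(packed: int, scale_factor: float, add_offset: float) -> float:
    """Recover the physical value: packed * scale_factor + add_offset."""
    return packed * scale_factor + add_offset


# One sample per day, stored as a single numeric time coordinate
# instead of separate "date" and "time of day" variables.
times = [
    encode_time(datetime(2024, 7, day, tzinfo=timezone.utc))
    for day in (25, 26, 27)
]

# A coordinate variable should be strictly monotonic.
is_monotonic = all(a < b for a, b in zip(times, times[1:]))

# A temperature in kelvin packed with assumed scale_factor=0.01, add_offset=273.15.
packed = pack(293.47, scale_factor=0.01, add_offset=273.15)
restored = unpack(packed, scale_factor=0.01, add_offset=273.15)
```

Packing is lossy: the restored value matches the original only to within half of `scale_factor`, which is one reason to choose the scale from the measurement precision rather than arbitrarily.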