New guide on netCDF file format. Additional content needed
vmartinez-cu committed Jul 25, 2024
1 parent 6a8855c commit b9b5354
Showing 1 changed file with 107 additions and 0 deletions.
107 changes: 107 additions & 0 deletions docs/source/_static/data_management/file_formats/netcdf.md
@@ -0,0 +1,107 @@
# NetCDF
> **Warning**
> This guide needs additional information.

NetCDF (Network Common Data Form) is a file format that stores data in arrays. Array values can be accessed directly,
without knowing how the data are stored on disk, and metadata can be stored alongside the data.

* Binary file format commonly used for scientific data
* Self-describing, includes metadata
* Multi-dimensional array data model

#### Data Model (Essentials)
* variable
    * Multi-dimensional array
    * Column-oriented: each variable as a separate entity
* dimension
    * Usually temporal, spatial, spectral, ...
    * Can be unlimited length. One, at most, is recommended for a growing time dimension
* attribute
    * Metadata: global and variable level
* group
    * Akin to directories
    * Avoid unless you really need the complex structure


## Purpose for this guideline

#### Why Use NetCDF?
* Self-describing
    * structure captures coordinate system (functional relationship)
    * includes metadata
* Efficient storage
    * packing
    * compression
* Efficient access
    * chunking
    * HTTP byte-range requests
    * parallel IO
* Open specification (unlike IDL save files)

## Options for this guideline

* NetCDF-3 classic
* NetCDF-4 built on HDF5
    * recommended but prefer classic constructs

## How to apply this guideline

#### NetCDF Files
* Binary format with open specification
* Requires software libraries to read and write: C, Fortran, Java, Python, IDL, ...
* Internal compression is supported, so don't bother compressing NetCDF files externally
* HTTP byte range requests
* Parallel IO
* `.nc` file extension
* Don't be afraid of big files

#### Coordinate System
* Dimensions should be used to define a coordinate system
    * e.g. temporal, spatial, spectral
    * Avoid using dimensions to group data
    * Think "functional relationship". Each independent variable should correspond to a dimension.
* coordinate variable
    * 1D variable with dimension of the same name
    * strictly monotonic (ordered)
    * no missing values
    * Independent variable of functional relationship
    * Every dimension should have one
* shared dimensions
    * Variables should reuse dimensions to indicate that they share the same coordinates (domain set)
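
The coordinate variable rules above (one-dimensional, strictly monotonic, no missing values) can be checked before writing a file. The helper below is a minimal sketch using NumPy; the function name `is_coordinate_variable` is hypothetical, and treating NaN as the missing-value marker is an assumption.

```python
import numpy as np


def is_coordinate_variable(values) -> bool:
    """Return True if values could serve as a coordinate variable.

    Hypothetical helper: checks for a 1D array with no NaN (assumed to mark
    missing values) whose values are strictly increasing or strictly decreasing.
    """
    arr = np.asarray(values, dtype=float)
    if arr.ndim != 1 or arr.size == 0:
        return False
    if np.isnan(arr).any():
        return False
    steps = np.diff(arr)
    return bool(np.all(steps > 0) or np.all(steps < 0))
```

Repeated values fail the check because the ordering must be strict: `is_coordinate_variable([1, 2, 2])` is `False`, while `is_coordinate_variable([3, 2, 1])` is `True`.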

#### Time as Coordinate Variable
* If the data are a function of a single time dimension then there should be a single time variable
    * avoid breaking time up by date and time of day
* Prefer numeric time units
    * time unit since an epoch
    * e.g. "seconds since 1970-01-01", "microseconds since 1980-01-06"
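
To make the "time unit since an epoch" idea concrete, here is a small sketch using only the Python standard library. The helper names `encode_time` and `decode_time` are hypothetical, timestamps are assumed to be in UTC, and leap seconds are ignored.

```python
from datetime import datetime, timedelta, timezone

# Would be stored as the "units" attribute of the time variable.
UNITS = "seconds since 1970-01-01"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_time(moment: datetime) -> float:
    """Convert a timezone-aware datetime to seconds since the epoch."""
    return (moment - EPOCH).total_seconds()


def decode_time(seconds: float) -> datetime:
    """Convert seconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


stamp = datetime(2024, 7, 25, 12, 0, tzinfo=timezone.utc)  # illustrative value
encoded = encode_time(stamp)
restored = decode_time(encoded)
```

A single numeric variable like this keeps date and time of day together, and the units string tells readers how to decode it.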

#### Metadata
* Optional, but useful for making a NetCDF file self-describing
* attribute
    * global (dataset level)
        * title
        * history (provenance)
    * variable
        * long_name
        * units
* Conventions
    * Climate and Forecast (CF)
    * Attribute Convention for Data Discovery (ACDD)
    * udunits: standard units
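
A simple way to act on the list above is to check for the named attributes before publishing a file. The sketch below works on plain dictionaries so that it does not depend on any NetCDF library; the function name `missing_metadata` is hypothetical, and treating `title`, `history`, `Conventions`, `long_name`, and `units` as the expected set is an assumption based on this guide, not a requirement of the format.

```python
# Assumed "recommended" attribute names, taken from the list in this guide.
RECOMMENDED_GLOBAL = ("title", "history", "Conventions")
RECOMMENDED_VARIABLE = ("long_name", "units")


def missing_metadata(global_attrs: dict, variable_attrs: dict) -> list:
    """List recommended attributes that are absent.

    global_attrs maps attribute name to value; variable_attrs maps each
    variable name to its own attribute dictionary.
    """
    problems = [
        f"global: {name}" for name in RECOMMENDED_GLOBAL if name not in global_attrs
    ]
    for var_name, attrs in variable_attrs.items():
        problems += [
            f"{var_name}: {name}" for name in RECOMMENDED_VARIABLE if name not in attrs
        ]
    return problems


# Illustrative input: the dataset lacks "history", the variable lacks "units".
report = missing_metadata(
    {"title": "Example", "Conventions": "CF-1.8"},
    {"irradiance": {"long_name": "spectral irradiance"}},
)
```

With the `netCDF4` Python package, the attribute dictionaries could be taken from the `__dict__` of a dataset or variable object.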

#### Other useful variable attributes
* `missing_value`
    * prefer over `_FillValue`
    * NaN is a good option
* `valid_range`, `valid_min`, `valid_max`
* `scale_factor`, `add_offset` (packed values)
* `cell_methods`: standards for representing data cells (bins)
    * e.g. daily average, wavelength bins
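
Packing with `scale_factor` and `add_offset` follows the relation `unpacked = packed * scale_factor + add_offset`. The sketch below shows that arithmetic with NumPy for 16-bit integers. The helper names are hypothetical, and the way the two attributes are derived from the data range is one commonly used recipe, assumed here, that leaves the most negative integer free for a fill value.

```python
import numpy as np


def compute_packing(data_min: float, data_max: float, n_bits: int = 16):
    """Derive scale_factor and add_offset for a signed integer of n_bits.

    Assumed recipe: map [data_min, data_max] onto the symmetric integer range,
    leaving the most negative integer unused.
    """
    scale_factor = (data_max - data_min) / (2**n_bits - 2)
    add_offset = (data_max + data_min) / 2
    return scale_factor, add_offset


def pack(values, scale_factor: float, add_offset: float) -> np.ndarray:
    """Convert floating-point values to packed 16-bit integers."""
    scaled = (np.asarray(values, dtype=float) - add_offset) / scale_factor
    return np.round(scaled).astype(np.int16)


def unpack(packed: np.ndarray, scale_factor: float, add_offset: float) -> np.ndarray:
    """Recover approximate floating-point values from packed integers."""
    return packed * scale_factor + add_offset


values = np.array([250.0, 275.5, 300.0])  # illustrative data
scale_factor, add_offset = compute_packing(values.min(), values.max())
packed = pack(values, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)
```

Packing is lossy: each restored value can differ from the original by up to about half of `scale_factor`.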

## Useful Links
* [NetCDF User's Guide](https://docs.unidata.ucar.edu/nug/current/)
* [NetCDF ToolsUI](https://docs.unidata.ucar.edu/netcdf-java/current/userguide/toolsui_ref.html)


Credit: Content taken from a Confluence guide written by Doug Lindholm
