json2netcdf is a Python package to convert JSON data into NetCDF4 data. The motivation? A quick and easy way to write NetCDF input files without having to hand-craft a script to do so. JSON files are simple, easy to understand and write, and, crucially, follow a hierarchical format.
Features:
- Programmatic and command line interfaces.
- Converts well-formatted JSON files and Python dictionaries to NetCDF files.
- NetCDF files can be physical or in-memory (diskless).
- Nested JSON files can be specified.
- Internally uses the Python netCDF4 package and returns Dataset objects.
- Groups, attributes, dimensions, variables and multiple datatypes are supported.
You can use pip to install json2netcdf, which provides both the programmatic and command line interfaces:
$ pip install json2netcdf
A Conda environment file is also provided with the required libraries for developing or extending the package.
$ conda env create -f environment.yml
$ conda activate json2netcdf
The package has one main method, convert, which does the file/data conversion. See below for the required formatting for JSON files.
>>> import json2netcdf
>>> json2netcdf.convert(from_json={'my_var': 42}, diskless=True)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes):
variables(dimensions): int64 my_var()
groups:
from_json can be a Python dictionary or the path to a JSON file. Set diskless to True for an in-memory NetCDF dataset to be returned (default is False). to_netcdf can be used to specify the location of the output NetCDF file you want (defaults to data.nc). The convert method can be used as a context manager, and if it isn't, the user is responsible for closing the returned dataset (nc_file.close()).
Using the example/data.json file:
>>> with json2netcdf.convert(from_json='example/data.json', to_netcdf='data.nc') as nc_file:
... nc_file['var_group']['spatial_var']
...
<class 'netCDF4._netCDF4.Variable'>
int64 spatial_var(x, y)
path = /var_group
unlimited dimensions:
current shape = (2, 2)
filling on, default _FillValue of -9223372036854775806 used
For more information on using the returned NetCDF Dataset object, see the netCDF4 library documentation.
There is a command line interface which acts as a wrapper around json2netcdf.convert. It requires you to specify paths to the input JSON file and output NetCDF file:
usage: json2netcdf [-h] [-v] input output
Convert JSON to NetCDF files.
positional arguments:
input path to the input JSON file
output path to store the output NetCDF file to
optional arguments:
-h, --help show this help message and exit
-v, --verbose make terminal output more verbose
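The usage text above can be mirrored with Python's argparse module. The following is only a sketch reconstructed from that usage text, not the package's actual source; build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser matching the usage text shown above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="json2netcdf",
        description="Convert JSON to NetCDF files.",
    )
    parser.add_argument("input", help="path to the input JSON file")
    parser.add_argument("output", help="path to store the output NetCDF file to")
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="make terminal output more verbose",
    )
    return parser


# Parse an example command line: json2netcdf -v example/data.json data.nc
args = build_parser().parse_args(["-v", "example/data.json", "data.nc"])
print(args.input, args.output, args.verbose)

# A wrapper would then presumably hand the two paths to the converter, e.g.
# json2netcdf.convert(from_json=args.input, to_netcdf=args.output)
```

The two positional arguments correspond to the from_json and to_netcdf parameters of json2netcdf.convert described earlier.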
Your JSON data must be well formatted, following the conventions described below. Take a look at the example JSON file at example/data.json for an idea of how to format your JSON file. In this example, we are trying to create a NetCDF file with the following data structure:
var_group (group)
spatial_var = [[1, 2], [3, 4]]
spatiotemporal_var = [[[1.1, 1.2], [1.3, 1.4]], [[1.5, 1.6], [1.7, 1.8]]]
scalar_var = 42.0
The file will have one group var_group, with two variables: spatial_var is a 2D array and spatiotemporal_var is a 3D array. There is also a scalar variable scalar_var which only belongs to the root group. As this is a NetCDF file, we need to specify dimensions for the variables, so let's say that spatial_var has (x,y) dimensions, and spatiotemporal_var has (x,y,t) dimensions. In this example, each of these has a size of 2. To define this in the JSON file, we create a dimensions object:
{
"dimensions" : {
"x" : 2, "y" : 2, "t": 2
}
}
We can now create an object for the var_group and place these variables in it. The square bracket notation is used to tell the script what dimensions your variables have. We will also create scalar_var in the root group, which doesn't have any dimensions associated with it:
{
"dimensions" : {
"x" : 2, "y" : 2, "t": 2
},
"var_group" : {
"spatial_var[x,y]" : [[1, 2], [3, 4]],
"spatiotemporal_var[x,y,t]" : [[[1.1, 1.2], [1.3, 1.4]], [[1.5, 1.6], [1.7, 1.8]]]
},
"scalar_var": 42.0
}
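To make the square bracket notation concrete, here is a minimal sketch of how a key such as spatial_var[x,y] could be split into a variable name and its dimension names, and how the data's shape could be checked against the dimensions object. The helpers parse_variable_key and shape_of are hypothetical names for illustration; the package's internal parsing may differ.

```python
import re

# Hypothetical pattern: a name, optionally followed by [dim1,dim2,...].
KEY_PATTERN = re.compile(r"^(?P<name>[^\[\]]+?)\s*(?:\[(?P<dims>[^\[\]]*)\])?$")


def parse_variable_key(key):
    """Split "name[x,y]" into ("name", ("x", "y")); no brackets means a scalar."""
    match = KEY_PATTERN.match(key.strip())
    if match is None:
        raise ValueError(f"Malformed variable key: {key!r}")
    dims = match.group("dims")
    if dims is None:
        return match.group("name"), ()
    return match.group("name"), tuple(d.strip() for d in dims.split(",") if d.strip())


def shape_of(value):
    """Return the shape of a nested list, assuming it is rectangular."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


dimensions = {"x": 2, "y": 2, "t": 2}
name, dims = parse_variable_key("spatial_var[x,y]")

# The shape implied by the dimension names should match the data's shape.
expected_shape = tuple(dimensions[d] for d in dims)
actual_shape = shape_of([[1, 2], [3, 4]])
print(name, dims, expected_shape == actual_shape)
```

A check like this is a quick way to catch a mismatch between the declared dimensions and the nested arrays before attempting a conversion.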
Here, the dimensions are available from the root group (i.e. to all groups in the NetCDF file's hierarchy). If you want to add dimensions specifically for certain groups, you can include a dimensions object within that group.
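As an illustration of group-level dimensions, the sketch below defines t only inside var_group and looks up dimension sizes by searching the innermost group first and then its parents, which mirrors the NetCDF4 rule that a group can see dimensions defined in its ancestors. find_dimension is a hypothetical helper, not part of json2netcdf.

```python
def find_dimension(data, group_path, name):
    """Return a dimension's size, searching the innermost group first, then its parents."""
    scopes = [data]
    node = data
    for group in group_path:
        node = node[group]
        scopes.append(node)
    for scope in reversed(scopes):
        dims = scope.get("dimensions", {})
        if name in dims:
            return dims[name]
    raise KeyError(f"Dimension {name!r} is not visible from /{'/'.join(group_path)}")


data = {
    "dimensions": {"x": 2, "y": 2},
    "var_group": {
        # t is only defined for this group
        "dimensions": {"t": 3},
        "spatiotemporal_var[x,y,t]": [
            [[1, 2, 3], [4, 5, 6]],
            [[7, 8, 9], [10, 11, 12]],
        ],
    },
}

print(find_dimension(data, ["var_group"], "t"))  # defined in the group itself
print(find_dimension(data, ["var_group"], "x"))  # inherited from the root group
```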
The datatype of each variable is deduced automatically. In this example, spatial_var will have a datatype of int64, and the other variables will have a datatype of double. Internally, NumPy is responsible for deducing the variable type, and there is currently no way to specify a variable's datatype explicitly (pull requests are welcome!).
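To preview which datatype a value is likely to get, you can ask NumPy directly. This sketch assumes the conversion behaves like np.asarray on the JSON values, which is an assumption about the internals rather than documented behaviour. Note that NumPy's default integer width depends on the platform and NumPy version (int64 on most 64-bit systems).

```python
import numpy as np

# The values from the example file
spatial_var = np.asarray([[1, 2], [3, 4]])
spatiotemporal_var = np.asarray([[[1.1, 1.2], [1.3, 1.4]], [[1.5, 1.6], [1.7, 1.8]]])
scalar_var = np.asarray(42.0)

# Mixing integers and floats in one array promotes everything to float
mixed = np.asarray([1, 2.5])

for label, array in [
    ("spatial_var", spatial_var),
    ("spatiotemporal_var", spatiotemporal_var),
    ("scalar_var", scalar_var),
    ("mixed", mixed),
]:
    print(label, array.dtype, array.shape)
```

If you need floating point output for whole-number data, writing the values with a decimal point in the JSON (for example 1.0 rather than 1) is one way to steer the inferred type.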
Attributes can be added to the NetCDF file by creating an attributes object in the group you wish to add the attributes to. For example, to add attributes to the root group:
{
"dimensions" : {
"x" : 2, "y" : 2, "t": 2
},
"attributes" : {
"description" : "Example data file",
"author" : "Sam Harrison"
},
"var_group" : {
"spatial_var[x,y]" : [[1, 2], [3, 4]],
"spatiotemporal_var[x,y,t]" : [[[1.1, 1.2], [1.3, 1.4]], [[1.5, 1.6], [1.7, 1.8]]]
},
"scalar_var": 42.0
}
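Because from_json accepts either a dictionary or a path, the same structure can also be assembled in Python rather than written by hand. The following sketch builds a reduced version of the example above and serialises it with the standard json module; the variable names nx, ny and nt are illustrative only.

```python
import json

nx, ny, nt = 2, 2, 2

data = {
    "dimensions": {"x": nx, "y": ny, "t": nt},
    "attributes": {"description": "Example data file"},
    "var_group": {
        # Generates [[1, 2], [3, 4]] for nx = ny = 2
        "spatial_var[x,y]": [[i * ny + j + 1 for j in range(ny)] for i in range(nx)],
    },
    "scalar_var": 42.0,
}

# Serialise to JSON text, which could be written to a file and used as input
text = json.dumps(data, indent=2)
print(text)
```

The dictionary could equally be passed straight to json2netcdf.convert via from_json, skipping the intermediate file.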
Attributes cannot yet be added to variables. Pull requests are welcome!
The main input file specified when running the script can contain references to other JSON files in its JSON data structure, so that large data sets can be split across files. The path to the external file must be prefixed with file:: and the contents of that file will be imported in the same place as the path to the file. Therefore, either variables or entire groups can be imported. An example is given at example/external.json:
{
"dimensions" : {
"x" : 10,
"y" : 5
},
"external_group" : "file::external_group.json",
"external_var[x,y]" : "file::external_var.json"
}
Bear in mind, if importing an array variable, the dimensions of the array must be present in the parent file. Imported files can themselves include file imports.
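The import mechanism can be pictured as a recursive substitution. The sketch below is a hypothetical re-implementation for illustration, not the package's code: it replaces every string starting with file:: by the parsed contents of that file, and it assumes paths are relative to the importing file, which the text above does not specify. It does not guard against files that import each other in a cycle.

```python
import json
import tempfile
from pathlib import Path

FILE_PREFIX = "file::"


def resolve_file_references(node, base_dir):
    """Recursively replace "file::<path>" strings with the JSON they point to."""
    if isinstance(node, dict):
        return {key: resolve_file_references(value, base_dir) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_file_references(item, base_dir) for item in node]
    if isinstance(node, str) and node.startswith(FILE_PREFIX):
        path = Path(base_dir) / node[len(FILE_PREFIX):]
        with open(path, encoding="utf-8") as f:
            loaded = json.load(f)
        # Imported files may contain imports themselves, so recurse.
        # Assumption: nested paths are relative to the importing file.
        return resolve_file_references(loaded, path.parent)
    return node


def load_with_imports(path):
    """Load a JSON file and expand all file:: references inside it."""
    path = Path(path)
    with open(path, encoding="utf-8") as f:
        return resolve_file_references(json.load(f), path.parent)


# Demonstration with temporary files
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "external_var.json").write_text(json.dumps([[1, 2], [3, 4]]))
    (tmp / "main.json").write_text(json.dumps({
        "dimensions": {"x": 2, "y": 2},
        "external_var[x,y]": "file::external_var.json",
    }))
    resolved = load_with_imports(tmp / "main.json")

print(resolved)
```

Note how the dimensions for the imported array live in the parent file, as required above.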
Whilst this library is primarily for JSON to NetCDF4 conversion, you will notice that it is really just a Python dictionary to NetCDF4 converter with the ability to also open JSON files. This means it can be flexibly used to convert other markup languages without too much trouble. For example, say we have the following YAML:
dimensions:
x: 4
var_group:
my_var[x]: [1, 2, 3, 4]
We can use the PyYAML library to load this as a dict, before converting it to NetCDF:
>>> import yaml
>>> import json2netcdf
>>> data = yaml.safe_load("""
... dimensions:
... x: 4
... var_group:
... my_var[x]: [1, 2, 3, 4]
... """)
>>> json2netcdf.convert(data, diskless=True)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x(4)
variables(dimensions):
groups: var_group
This script is a simple way to create NetCDF files from JSON data, and doesn't support the full feature set that NetCDF offers. A few specific limitations are:
- There is no way to specify attributes for variables, only groups.
- Datatypes are implied and cannot be set explicitly.
- Only NetCDF4 files can be created.