Description
On behalf of an Iris User:
I'm having some trouble reading/writing NetCDF files with Iris 2.4 for datasets with string type variables. I thought I should bring what I've found to your attention and also in the hope that you might have solutions (the solution that I've found would require a small tweak to Iris' NetCDF reading - so not a viable solution for me currently). I’ve added code to demonstrate at the end of this email.
We need to read/write NetCDF files for meteorological station data. These include a list of station names, stored in a NetCDF variable of length equal to the number of stations. This corresponds to a data dimension for observations, structured in a 2D array, station index by time (not included in the example code). There are two possible approaches: using old style NetCDF character/byte arrays (undesirable - does not support special characters in our international station database) or NetCDF4 style unicode strings.
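To illustrate why fixed-width character arrays are awkward for names with special characters, here is a minimal sketch using only the Python standard library. The helper to_fixed_width_bytes is hypothetical (it is not part of Iris or netCDF4) and only mimics the padding a character array would need; it assumes UTF-8 as the byte encoding.

```python
station_names = ["Exeter", "London", "Düsseldorf"]


def to_fixed_width_bytes(names, encoding="utf-8"):
    """Hypothetical helper: encode names and pad each to the longest encoded length."""
    encoded = [name.encode(encoding) for name in names]
    width = max(len(item) for item in encoded)
    # Pad with NUL bytes so every entry has the same width, as a character array requires.
    return [item.ljust(width, b"\x00") for item in encoded], width


padded, width = to_fixed_width_bytes(station_names)
# 'Düsseldorf' has 10 characters but needs 11 bytes in UTF-8, so the byte
# width no longer matches the character count.
print(len(station_names[2]), width)

# A plain ASCII character array cannot hold the umlaut at all.
try:
    to_fixed_width_bytes(station_names, encoding="ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```

Any reader of such an array would also need to know the encoding and strip the padding, which is bookkeeping that NetCDF4 unicode strings avoid.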
We can create an example cube as follows (you might need Python 3.7 for the ü in Düsseldorf):
    station_names = np.array([u'Exeter', u'London', u'Düsseldorf'])
    station_cube = iris.cube.Cube(station_names, long_name='station_names')
To save this cube I need to give iris.save a fill_value to use. The save also doesn’t work if the string data are stored in a masked array.
The fill value is again a problem when loading from the saved NetCDF file (see traceback at the end of this email). The failing line in iris.fileformats.netcdf._get_cf_var_data (iris/fileformats/netcdf.py) reads:
    fill_value = getattr(cf_var.cf_data, '_FillValue',
                         netCDF4.default_fillvals[cf_var.dtype.str[1:]])
Problems arise in two places:
- netCDF4.default_fillvals contains no default entry for string types other than the S1 (non-unicode) dtype. This is the same problem that we had when saving without a fill_value argument set.
- cf_var.dtype.str[1:] fails because the cf_var.dtype for the loaded string data is the Python type str, which has no attribute named str.
I tried a nasty hack to stop Iris from looking for a default fill_value at the failing line. This works around the problem and the cube loads without issue. Clearly this isn't a viable solution for me to implement and I’m sure that I’m missing other complexities.
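The hack itself is not shown here, so the following is only a rough sketch of one way such a guard could look, not the change that was actually tried and not Iris code. Python evaluates the default argument of getattr eagerly, so the lookup in the default table runs (and can fail) even when the variable carries its own _FillValue; the sketch therefore defers that lookup. The function name get_fill_value, the stand-in table and the SimpleNamespace objects are all hypothetical, chosen so the example runs without Iris, netCDF4 or numpy.

```python
from types import SimpleNamespace

# Stand-in for netCDF4.default_fillvals, which is keyed by type codes such as
# 'S1', 'i4' and 'f8'. The values here are illustrative only.
STAND_IN_FILLVALS = {"S1": "\x00", "i4": -2147483647, "f8": 9.969209968386869e36}


def get_fill_value(cf_var, default_fillvals=STAND_IN_FILLVALS):
    """Hypothetical helper: return a fill value for cf_var, or None if there is none."""
    # Prefer a _FillValue stored on the variable itself.
    explicit = getattr(cf_var.cf_data, "_FillValue", None)
    if explicit is not None:
        return explicit
    # Only now consult the default table. String variables report the builtin
    # `str` as their dtype, which has no `.str` attribute, so skip the lookup.
    dtype = cf_var.dtype
    if dtype is str:
        return None
    return default_fillvals.get(dtype.str[1:])


# Stand-in variables: a string variable without _FillValue, a float64-like
# variable, and a string variable saved with fill_value='N/A'.
string_var = SimpleNamespace(cf_data=SimpleNamespace(), dtype=str)
float_var = SimpleNamespace(cf_data=SimpleNamespace(), dtype=SimpleNamespace(str="<f8"))
tagged_var = SimpleNamespace(cf_data=SimpleNamespace(_FillValue="N/A"), dtype=str)

print(get_fill_value(string_var))
print(get_fill_value(float_var))
print(get_fill_value(tagged_var))
```

Returning None for string data means the caller has to cope with a missing fill value, which is probably where the other complexities mentioned above would show up.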
I hope that this makes sense and is of some use to you. Our current solution involves over 10,000 individual NetCDF files, one for each station, as we can store Unicode strings in NetCDF attributes with no problem. The large overhead for I/O of lots of small NetCDF files is rather cumbersome in our application and for end users of the dataset.
Example code for station name I/O
    import numpy as np
    import iris

    filename = 'string_test.nc'

    # Set up a numpy array of station names to be saved. Umlaut may not work prior to Python 3.7.
    #station_names = np.array([u'Exeter', u'London', u'Düsseldorf'])
    station_names = np.array([u'Exeter', u'London', u'Dusseldorf'])

    # Make our cube to save - station_names cannot be a masked array or iris.save falls over
    station_cube = iris.cube.Cube(station_names, long_name='station_names')

    # Save and load to test. fill_value must be set or iris.save will fall over
    # (no corresponding data type in netCDF4.default_fillvals).
    iris.save(station_cube, filename, fill_value='N/A')

    # Reload data. This fails
    loaded_station_cube = iris.load_cube(filename)
This returns the following traceback:
    Traceback (most recent call last):
      File "", line 1, in
      File "/[path]/lib/python3.7/site-packages/iris/__init__.py", line 387, in load_cube
        cubes = _load_collection(uris, constraints, callback).cubes()
      File "/[path]/lib/python3.7/site-packages/iris/__init__.py", line 325, in _load_collection
        result = iris.cube._CubeFilterCollection.from_cubes(cubes, constraints)
      File "/[path]/lib/python3.7/site-packages/iris/cube.py", line 157, in from_cubes
        for cube in cubes:
      File "/[path]/lib/python3.7/site-packages/iris/__init__.py", line 312, in _generate_cubes
        for cube in iris.io.load_files(part_names, callback, constraints):
      File "/[path]/lib/python3.7/site-packages/iris/io/__init__.py", line 210, in load_files
        for cube in handling_format_spec.handler(fnames, callback):
      File "/[path]/lib/python3.7/site-packages/iris/fileformats/netcdf.py", line 714, in load_cubes
        cube = _load_cube(engine, cf, cf_var, filename)
      File "/[path]/lib/python3.7/site-packages/iris/fileformats/netcdf.py", line 524, in _load_cube
        data = _get_cf_var_data(cf_var, filename)
      File "/[path]/lib/python3.7/site-packages/iris/fileformats/netcdf.py", line 510, in _get_cf_var_data
        netCDF4.default_fillvals[cf_var.dtype.str[1:]])
    AttributeError: type object 'str' has no attribute 'str'
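The final AttributeError can be reproduced without Iris or a NetCDF file. The minimal sketch below assumes only what the traceback reports, namely that the dtype of the loaded string variable is the builtin str type; the variable names are illustrative.

```python
# The loaded string variable reports the builtin `str` type as its dtype.
dtype = str

try:
    # Same expression as in the failing line: take the type code from dtype.str.
    key = dtype.str[1:]
    message = None
except AttributeError as err:
    message = str(err)

print(message)
```

For a numeric variable the dtype would be a numpy dtype, whose .str attribute holds a code such as '<f8', so the same expression succeeds there.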