Commit 2a4b614

Initial dataset wrappers.
1 parent 729d0b5 commit 2a4b614

1 file changed (+47, -0 lines)

@@ -0,0 +1,47 @@
# Copyright Iris contributors
#
# This file is part of Iris and is released under the BSD license.
# See LICENSE in the root of the repository for full licensing details.
"""Module providing netCDF dataset wrappers with automatic character encoding.

The requirement is to convert numpy fixed-width unicode arrays, on writing, to a
variable which is declared as a byte (character) array with a fixed-length string
dimension.

Numpy unicode string arrays are ones with dtypes of the form "U<character-width>".
Character variables are held in numpy as arrays of dtype "S1", with the bytes of each
encoded string laid out along a fixed-length "string dimension".

In principle, the netCDF4 package already performs these translations, but in practice
current releases do not work for any encoding other than "ascii". This includes UTF-8,
which is the most obvious and desirable "general" solution.

There is also the question of whether we should adopt UTF-8 as our default.
Current discussions on this are inconclusive, and neither the CF conventions nor the
NetCDF User Guide state definitively what the valid values of "_Encoding" are, or what
the effective default is, even though both mention the "_Encoding" attribute as a
possible way to handle the issue.

We therefore adopt the following interpretation:

* when reading, in the absence of an "_Encoding" attribute, we will attempt to decode
  bytes as UTF-8
* when writing string data, in the absence of an "_Encoding" attribute (on the Iris
  cube or coord object), we will first attempt to encode the data as "ascii". If this
  succeeds, we will save it as is (with no "_Encoding" attribute), but if it fails we
  will encode it as UTF-8 **and** add an "_Encoding='UTF-8'" attribute.

Where an "_Encoding" attribute is provided to Iris, we will honour it where possible,
identifying the codec with "codecs.lookup". This means we support the encodings in the
Python Standard Library, and any name aliases which it recognises.

See:

* known problems: https://github.com/Unidata/netcdf4-python/issues/1440
* suggestions for how this "ought" to work, discussed in the netcdf-c library:
  https://github.com/Unidata/netcdf-c/issues/402

"""
from iris.fileformats.netcdf._thread_safe_nc import DatasetWrapper

class EncodedDataset(DatasetWrapper):
    """A dataset wrapper that translates variable data according to byte encodings."""
    pass
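The docstring's core conversion turns a numpy "U<n>" array into an "S1" character array with an extra fixed-length string dimension. The sketch below shows that round trip in plain numpy. It deliberately avoids netCDF4's own helpers, because the docstring reports them as unreliable beyond "ascii". The helper names `encode_to_chars` and `decode_from_chars` are hypothetical and are not part of Iris. The sketch also assumes that the string dimension is sized by the longest *encoded* string, in bytes.

```python
from typing import Optional

import numpy as np


def encode_to_chars(
    strings: np.ndarray, encoding: str = "utf-8", strlen: Optional[int] = None
) -> np.ndarray:
    """Hypothetical helper: "U<n>" array -> "S1" array with a trailing string dimension."""
    encoded = [str(s).encode(encoding) for s in strings.ravel()]
    if strlen is None:
        # Size by the longest encoded string in bytes (at least 1): for multi-byte
        # encodings this can exceed the "U<n>" character width.
        strlen = max([1] + [len(b) for b in encoded])
    # Unused cells stay as zero (NUL) bytes, which numpy shows as b"".
    chars = np.zeros((len(encoded), strlen), dtype="S1")
    for i, b in enumerate(encoded):
        if len(b) > strlen:
            raise ValueError(f"{b!r} does not fit a string dimension of {strlen}")
        if b:
            chars[i, : len(b)] = np.frombuffer(b, dtype="S1")
    return chars.reshape(strings.shape + (strlen,))


def decode_from_chars(chars: np.ndarray, encoding: str = "utf-8") -> np.ndarray:
    """Hypothetical inverse: join the bytes along the last axis, then decode."""
    rows = chars.reshape(-1, chars.shape[-1])
    # numpy drops trailing NULs from "S1" elements, so padding joins as b"".
    decoded = [b"".join(row).decode(encoding) for row in rows]
    return np.array(decoded, dtype=str).reshape(chars.shape[:-1])


names = np.array(["abc", "ñé"])
chars = encode_to_chars(names)
print(chars.shape)  # (2, 4): "ñé" is 2 characters but 4 bytes in UTF-8
print(decode_from_chars(chars))
```

The string-dimension length depends on the encoding as well as the data. The same two names need only 3 bytes per string in latin-1.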

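The reading and writing rules listed in the module docstring can be sketched as two small functions. This is a minimal illustration, not the Iris implementation; the commit itself only adds a stub class. The names `encode_for_writing` and `decode_on_reading` are hypothetical, and a plain dict stands in for the attributes of the cube or coord being saved. The resulting bytes would still need to be laid out as an "S1" character array before writing.

```python
from typing import Dict, List, Tuple

import numpy as np


def encode_for_writing(
    strings: np.ndarray, attributes: Dict[str, str]
) -> Tuple[List[bytes], Dict[str, str]]:
    """Hypothetical: apply the write-side "_Encoding" rule, returning bytes and attributes."""
    attrs = dict(attributes)  # copy, so the caller's attributes are not mutated
    values = [str(s) for s in strings.ravel()]
    if "_Encoding" in attrs:
        # An explicit "_Encoding" is honoured as given.
        return [v.encode(attrs["_Encoding"]) for v in values], attrs
    try:
        # First choice: plain ascii, saved with no "_Encoding" attribute.
        return [v.encode("ascii") for v in values], attrs
    except UnicodeEncodeError:
        # Fallback: UTF-8, recorded in the attributes so readers can decode it.
        attrs["_Encoding"] = "UTF-8"
        return [v.encode("utf-8") for v in values], attrs


def decode_on_reading(raw: List[bytes], attributes: Dict[str, str]) -> List[str]:
    """Hypothetical: apply the read-side rule, using "_Encoding" if present, else UTF-8."""
    encoding = attributes.get("_Encoding", "utf-8")
    return [b.decode(encoding) for b in raw]


raw, attrs = encode_for_writing(np.array(["abc", "xyz"]), {})
print(attrs)  # {} : pure ascii needs no "_Encoding"
raw, attrs = encode_for_writing(np.array(["abc", "ñé"]), {})
print(attrs)  # {'_Encoding': 'UTF-8'}
print(decode_on_reading(raw, attrs))  # ['abc', 'ñé']
```

The two rules fit together because UTF-8 is a superset of ascii. Data saved as plain ascii without an "_Encoding" attribute therefore still decodes correctly under the UTF-8 reading default.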

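The docstring says a user-supplied "_Encoding" is identified with `codecs.lookup`, so Python's aliases are accepted and normalised. A minimal sketch of that step follows. The helper name `resolve_encoding` is hypothetical. The extra rejection of non-text codecs (such as "rot13") is an assumption added here, not something the docstring states.

```python
import codecs


def resolve_encoding(name: str) -> str:
    """Hypothetical: map an "_Encoding" attribute value to Python's canonical codec name."""
    # Raises LookupError if Python does not recognise the name or alias.
    info = codecs.lookup(name)
    # Assumed extra check: codecs.lookup also finds non-text codecs such as
    # "rot13" or "base64"; str.encode rejects those with a LookupError.
    "".encode(info.name)
    return info.name


for alias in ["UTF8", "utf_8", "US-ASCII", "latin1"]:
    print(alias, "->", resolve_encoding(alias))
```

Storing the canonical name, rather than whatever spelling the user supplied, keeps later comparisons such as "is this UTF-8?" simple.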