# Copyright Iris contributors
#
# This file is part of Iris and is released under the BSD license.
# See LICENSE in the root of the repository for full licensing details.
"""Module providing access to netcdf datasets with automatic character encoding.

The requirement is to convert numpy fixed-width unicode arrays, on writing, to a
variable which is declared as a byte (character) array with a fixed-length string
dimension.

Numpy unicode string arrays are ones with dtypes of the form "U<character-width>".
Numpy character variables have the dtype "S1", and map to a fixed-length "string
dimension".

In principle, netCDF4 already performs these translations, but in practice current
releases are not functional for anything other than "ascii" encoding -- including
UTF-8, which is the most obvious and desirable "general" solution.

There is also the question of whether we should implement UTF-8 as our default.
Current discussions on this are inconclusive, and neither the CF conventions nor the
NetCDF User Guide are definite on what the possible values of "_Encoding" are, or
what the effective default is, even though both mention the "_Encoding" attribute as
a potential way to handle the issue.

Because of this, we interpret as follows:

* in the absence of an "_Encoding" attribute, we will attempt to decode bytes as
  UTF-8
* when writing string data, in the absence of an "_Encoding" attribute (on the Iris
  cube or coord object), we will attempt to encode data with "ascii": if this
  succeeds, we will save as-is (with no "_Encoding" attribute), but if it fails we
  will encode as UTF-8 **and** add an "_Encoding='UTF-8'" attribute.

Where an "_Encoding" attribute is provided to Iris, we will honour it where possible,
identifying it with "codecs.lookup": this means we support the encodings in the
Python Standard Library, and the name aliases which it recognises.

See:

* known problems: https://github.com/Unidata/netcdf4-python/issues/1440
* suggestions for how this "ought" to work, discussed in the netcdf-c library:
  * https://github.com/Unidata/netcdf-c/issues/402

"""
from iris.fileformats.netcdf._thread_safe_nc import DatasetWrapper


class EncodedDataset(DatasetWrapper):
    """A dataset wrapper that translates variable data according to byte encodings."""

    pass
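The encode/decode rules described in the docstring can be sketched roughly as
follows. This is an illustration only, not the module's actual implementation: the
helper names `encode_strings_for_save` and `decode_strings_on_load` are invented
here, and the real `EncodedDataset` would apply equivalent translations inside its
variable read/write operations.

```python
import codecs


def encode_strings_for_save(strings, encoding=None):
    """
    Encode Python strings as bytes for writing to a netcdf character variable.

    Returns (encoded_bytes, encoding_attribute), where encoding_attribute is the
    value to record in an "_Encoding" attribute, or None when no attribute is
    needed.  NOTE: a hypothetical helper, for illustration only.
    """
    if encoding is not None:
        # Honour a user-supplied "_Encoding", normalised via codecs.lookup so that
        # standard-library name aliases (e.g. "utf8") are accepted.
        codec = codecs.lookup(encoding)
        return [s.encode(codec.name) for s in strings], codec.name
    try:
        # No encoding specified: attempt plain ascii first ...
        return [s.encode("ascii") for s in strings], None
    except UnicodeEncodeError:
        # ... falling back to UTF-8, and recording it in an "_Encoding" attribute.
        return [s.encode("utf-8") for s in strings], "utf-8"


def decode_strings_on_load(byte_strings, encoding=None):
    """Decode byte strings read from a character variable back to unicode."""
    # In the absence of an "_Encoding" attribute, attempt to decode as UTF-8.
    codec = codecs.lookup(encoding or "utf-8")
    return [b.decode(codec.name) for b in byte_strings]
```

Note that routing the supplied encoding through `codecs.lookup` both validates it
early (an unknown name raises `LookupError`) and canonicalises aliases to a single
recorded name.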