Proposal

Approach

The scope of what is currently named miranda.convert should be explicitly focused on the treatment of xarray-compatible data formats (i.e. from NetCDF and Zarr) as Dataset objects, to Dataset objects. All functionality focused on conversion (i.e. from CSV, FWF, MySQL, etc.) should be in a separate module. The API for these conversion functions should be similar between projects, with the functional layout as follows (a minimal sketch follows below):
_function_that_contains_logic_to_convert_to_xarray()
_function_that_helps_write_out_datasets_as_files()
Private, used to pass keyword arguments to the file-naming function and to leverage miranda.io for outputting files.
Called from the public function via the call signature (output_dir=Path() and file_format={"zarr", "netcdf"}).
function_that_can_be_called_by_user()
Public, should have a near-identical call signature between projects.
Should be written in such a way that dask or multiprocessing kwargs can be passed along to it for asynchronous conversion (if possible).
Values and metadata corrections should NOT be performed at this step. The goal is to have objects that are easily passed to the existing miranda.convert pipeline. Conversion to CF-compliant values should not be performed here.
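A minimal sketch of that three-function layout, assuming a CSV-based station-obs source (convert_station_obs and the underscored helpers below are placeholder names for illustration, not existing miranda functions):

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd
import xarray as xr


def _convert_station_obs_to_xarray(infile: Path) -> xr.Dataset:
    # Private: holds the format-specific conversion logic (here, CSV -> Dataset via pandas).
    df = pd.read_csv(infile, index_col=0)
    return xr.Dataset.from_dataframe(df)


def _write_converted_dataset(
    ds: xr.Dataset, *, output_dir: Path, file_format: str = "zarr"
) -> Path:
    # Private: in miranda this would pass keyword arguments to the shared file-naming
    # helper and hand the write off to miranda.io; a plain xarray write stands in here.
    suffix = "zarr" if file_format == "zarr" else "nc"
    outfile = output_dir / f"converted.{suffix}"
    if file_format == "zarr":
        ds.to_zarr(outfile, mode="w")
    else:
        ds.to_netcdf(outfile)
    return outfile


def convert_station_obs(
    infile: Path, *, output_dir: Path, file_format: str = "zarr", **parallel_kwargs
) -> Path:
    # Public: near-identical call signature between projects; parallel_kwargs is a
    # placeholder for dask/multiprocessing options that could be forwarded for
    # asynchronous conversion. No value or metadata corrections happen here.
    ds = _convert_station_obs_to_xarray(infile)
    return _write_converted_dataset(ds, output_dir=output_dir, file_format=file_format)
```

Because the public function does nothing beyond conversion and hand-off to the writer, its output can be passed directly to the existing miranda.convert pipeline for corrections and CF-compliance.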
Regardless of project, these functions should make use of a handful of common configurations:
IO:
Function that writes out NetCDF or Zarr files, separated by variable (gridded) or by station and variable (station-obs).
Function that names files using a standardized approach based on the keyword arguments supplied to it (both helpers are sketched below).
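A rough sketch of those two IO helpers; the names (name_output_file, write_by_variable) and the file-name pattern are assumptions for illustration, not miranda.io's actual API:

```python
from __future__ import annotations

from pathlib import Path

import xarray as xr


def name_output_file(*, variable: str, project: str, frequency: str, file_format: str = "zarr") -> str:
    # Builds a standardized file name purely from the keyword arguments supplied to it.
    suffix = "zarr" if file_format == "zarr" else "nc"
    return f"{variable}_{frequency}_{project}.{suffix}"


def write_by_variable(
    ds: xr.Dataset, *, output_dir: Path, project: str, frequency: str, file_format: str = "zarr"
) -> list[Path]:
    # Writes one file per data variable (the gridded case); the station-obs case would
    # additionally split along the station dimension.
    written = []
    for variable in ds.data_vars:
        outfile = output_dir / name_output_file(
            variable=str(variable), project=project, frequency=frequency, file_format=file_format
        )
        if file_format == "zarr":
            ds[[variable]].to_zarr(outfile, mode="w")
        else:
            ds[[variable]].to_netcdf(outfile)
        written.append(outfile)
    return written
```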
Metadata:
Like in miranda.convert, JSON files should be leveraged for populating the metadata of newly converted datasets (a sketch follows below).
For simplicity (until a common JSON schema is determined), the JSON files used for data corrections should not be shared with those for data conversion.
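For illustration, a minimal sketch of JSON-driven metadata population; apply_metadata and the "Header"/"variables" layout are assumptions, not the schema actually used by miranda.convert:

```python
import json
from pathlib import Path

import xarray as xr


def apply_metadata(ds: xr.Dataset, json_file: Path) -> xr.Dataset:
    # Reads global and per-variable attributes from a JSON file and applies them to a
    # freshly converted Dataset. The "Header"/"variables" layout is an assumed schema.
    config = json.loads(json_file.read_text())
    ds.attrs.update(config.get("Header", {}))
    for variable, attrs in config.get("variables", {}).items():
        if variable in ds.data_vars:
            ds[variable].attrs.update(attrs)
    return ds
```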
It's not clear to me whether we should be drilling down further by having sub-sub-modules by data provider. To be determined.