Currently, when creating a QGDataset, all the QGField objects managed by the Dataset are immediately instantiated and kept for later use. This generally means there are (at least) two copies of any input/output data in memory when using the xarray interface: one in the QGField instances and one in the xarray datasets.
I would like to keep the data exclusively in xarray datasets/dataarrays as much as possible and only make temporary copies for the QGFields when computing. More data could then be handled within the same amount of memory, I think. An issue here is that the three processing steps (interpolation, reference state, LWA/fluxes) depend on each other and use the QGField to store intermediate results, so in most applications the objects must be kept around at least for some time.
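A minimal sketch of the idea: keep the full time series in an array-backed container and materialize a short-lived field object only for the duration of one computation, so the extra copy is released right after. `Field`, `interpolate`, and `process_timestep` here are hypothetical stand-ins, not the real QGField API.

```python
import numpy as np

class Field:
    """Hypothetical stand-in for QGField; holds a temporary copy."""
    def __init__(self, data):
        self.data = data            # temporary copy for this computation only

    def interpolate(self):
        return self.data * 2.0      # placeholder for the real processing step

def process_timestep(all_data, t):
    """Create a Field only inside this scope; the copy is freed on return."""
    field = Field(all_data[t])
    result = field.interpolate()
    return result                   # field (and its copy) is garbage-collected here

data = np.arange(6.0).reshape(3, 2)     # stands in for an xarray dataset
out = [process_timestep(data, t) for t in range(data.shape[0])]
```

The trade-off is that intermediate results needed by later steps (e.g. the reference state) would have to be returned and stored back into xarray structures rather than kept on the field object.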
Integrating further with dask would allow the computations to be distributed over multiple processors, enabling parallelism and allowing even larger datasets to be processed. For this to work, the computation steps must be converted into a dask graph. I think this could be done either with dask.delayed or, maybe even better, with xarray.apply_ufunc. I'm not sure how well the OO interface can be wrapped with these, though; I think they are more geared towards a functional interface.
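One way to bridge the OO/functional gap might be to express the three dependent steps as pure functions that pass their results forward explicitly; those are exactly the shape dask.delayed can chain into a graph. The function bodies below are placeholders, not the real QGField computations:

```python
# Each step takes its inputs explicitly and returns its result,
# instead of storing intermediates on a QGField object.
def interpolate(raw):
    return [x + 0.5 for x in raw]

def reference_state(interpolated):
    return sum(interpolated) / len(interpolated)

def lwa_and_fluxes(interpolated, ref):
    return [x - ref for x in interpolated]

# With dask, each call would be wrapped lazily, e.g.:
#   interp = dask.delayed(interpolate)(raw)
#   ref    = dask.delayed(reference_state)(interp)
#   lwa    = dask.delayed(lwa_and_fluxes)(interp, ref)
#   lwa.compute()
# Eager equivalent for illustration:
raw = [0.0, 1.0, 2.0]
interp = interpolate(raw)
ref = reference_state(interp)
lwa = lwa_and_fluxes(interp, ref)
```

The dependency between steps is then encoded in the argument flow, so dask can schedule independent timesteps in parallel while still respecting the interpolation → reference state → LWA ordering within each one.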
A new QGField object is created at every timestamp. Figure out a more memory-efficient way to handle time-series data.
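One possible direction, sketched under the assumption that the field's work arrays dominate memory use: preallocate a single reusable buffer and overwrite it in place for each timestep, rather than constructing a fresh object per timestamp. `ReusableField` is a hypothetical illustration, not existing API.

```python
import numpy as np

class ReusableField:
    """Hypothetical field that reuses one work buffer across timesteps."""
    def __init__(self, shape):
        self.buf = np.empty(shape)      # allocated once, up front

    def load(self, snapshot):
        np.copyto(self.buf, snapshot)   # overwrite in place, no new allocation

    def compute(self):
        return float(self.buf.sum())    # placeholder for the real processing

series = np.ones((4, 3))                # 4 timesteps; stands in for a dataset
field = ReusableField(series.shape[1:])

totals = []
for t in range(series.shape[0]):
    field.load(series[t])               # same buffer reused every iteration
    totals.append(field.compute())
```

This keeps peak memory at roughly one timestep's worth of working data regardless of series length, at the cost of serializing the per-timestep computations (which the dask approach above could then parallelize differently).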
See Chris's pull request #45 for details.