
Solve Memory Issues of QGField #50

Open · csyhuang opened this issue on Aug 21, 2022 · 1 comment
Labels: contributions welcome, enhancement (Proposed additional functionality to the package)

Comments

@csyhuang (Owner)

A new QGField object is created at every timestamp. Figure out a more memory-efficient way to handle time-series data.

See pull request #45 from Chris for details.

csyhuang added the enhancement and contributions welcome labels on Aug 21, 2022
csyhuang mentioned this issue on Aug 29, 2022
@chpolste (Collaborator)

Maybe to clarify what this issue is about:

Currently, when creating a QGDataset, all the QGField objects managed by the Dataset are immediately instantiated and kept for later use. This generally means there are (at least) two copies of any input/output data in memory when using the xarray interface: one in the QGField instances and one in the xarray datasets.

I would like to keep the data in xarray datasets/dataarrays as much (and as exclusively) as possible, and only temporarily make copies for the QGFields when computing. More data could then be handled with the same amount of memory, I think. An issue here is that the three processing steps (interpolation, reference state, LWA/fluxes) depend on each other and use the QGField to store the intermediate results, so in most applications the objects must be kept around at least for some time.
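To illustrate the idea, here is a minimal sketch of per-timestep lazy instantiation. `MockField` and `iter_fields` are hypothetical stand-ins (not the package's API); the point is that only the field for the timestep currently being processed holds a copy of the data, instead of every QGField being materialized up front:

```python
# Hypothetical sketch: build one field per timestep on demand instead of
# keeping a field object for every timestep alive at once.

class MockField:
    """Stand-in for QGField: copies its input data on construction."""
    def __init__(self, snapshot):
        self.data = list(snapshot)  # the temporary per-timestep copy

    def interpolate(self):
        # Placeholder for the real interpolation step.
        return [x * 2.0 for x in self.data]


def iter_fields(timeseries, field_cls):
    """Yield a freshly built field for each timestep, one at a time."""
    for snapshot in timeseries:
        yield field_cls(snapshot)


# Usage: at most one MockField copy of the input exists at any moment;
# each field can be garbage-collected once its result is extracted.
timeseries = [[1.0, 2.0], [3.0, 4.0]]
results = [field.interpolate() for field in iter_fields(timeseries, MockField)]
```

The trade-off noted above still applies: because the three processing steps share intermediate results stored on the field, each field must live until its last step has run.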

Integrating further with dask would allow the computations to be distributed across multiple processors, enabling parallelism and allowing even larger datasets to be processed. For this to work, the computation steps must be converted into a dask graph. I think this could be done with dask.delayed, or maybe even better with xarray.apply_ufunc. I'm not sure, though, how well the OO interface can be wrapped with these; I think they are geared more towards a functional interface.
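A minimal dask.delayed sketch of the three dependent steps might look like this. The function bodies are placeholders, not the package's actual computations; the real versions would wrap the corresponding QGField methods behind a functional interface:

```python
import dask

# Placeholder implementations of the three dependent processing steps.
@dask.delayed
def interpolate(snapshot):
    return [float(x) for x in snapshot]

@dask.delayed
def reference_state(interpolated):
    return sum(interpolated) / len(interpolated)

@dask.delayed
def lwa_fluxes(interpolated, ref):
    return [x - ref for x in interpolated]

# Build the task graph for each timestep; nothing runs until compute().
# The dependency structure (interpolation -> reference state -> fluxes)
# is expressed by passing delayed results into later steps.
snapshots = [[1.0, 3.0], [2.0, 6.0]]
tasks = []
for snap in snapshots:
    interp = interpolate(snap)
    ref = reference_state(interp)
    tasks.append(lwa_fluxes(interp, ref))

# dask can now schedule independent timesteps in parallel across workers.
results = dask.compute(*tasks)
```

With the graph expressed this way, intermediate results live only as long as the scheduler needs them, rather than being pinned to long-lived QGField objects.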


2 participants