
Yadav and Michalak (2013) Kronecker products

Y&M (2013)-style linops for three operators

Y&M (2013)-style linops for two arrays

Y&M (2013) reduced posterior uncertainties

We don’t usually care about uncertainties at a three-hour scale. Summing the first axis of B to a monthly time scale and reporting that should be plenty. We don’t often care about sub-monthly fluxes either, but the flexibility can be useful.
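A minimal sketch of the aggregation, assuming B is stored as an (n_t * n_x, n_t * n_x) covariance with the 3-hourly time axis first; all names and sizes here are illustrative:

```python
import numpy as np

# Illustrative sizes: n_t 3-hourly steps, n_x flux locations.
n_t, n_x = 16, 5
B = np.eye(n_t * n_x)                    # stand-in for the real covariance

# month_of_step[t] gives the month containing 3-hourly step t;
# here the toy period spans two "months" of equal length.
month_of_step = np.repeat([0, 1], n_t // 2)
n_months = month_of_step.max() + 1

# time_agg[m, t] = 1 when step t falls in month m, so time_agg
# sums the first axis to monthly resolution.
time_agg = np.zeros((n_months, n_t))
time_agg[month_of_step, np.arange(n_t)] = 1.0

# Aggregate time, leave space alone, then form agg B agg^T.
agg = np.kron(time_agg, np.eye(n_x))
B_monthly = agg @ B @ agg.T              # (n_months*n_x, n_months*n_x)
```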

Implement

Test implementation

Compare performance for different input chunk-sizes

Doing everything in memory is only slightly slower, so there should be some optimum where caching is still good but I’m not spending so long building a huge dask graph and iterating through it. Around 4e3 on a side seems decent. Smaller chunks run out of memory creating the task graph, while larger chunks do a bunch of work then get stuck in sem_wait and pthread_cond_wait. Keeping everything in memory is a good way to avoid straying from the narrow range where dask works. This is now implemented.
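A sketch of the comparison, with made-up sizes; the dask calls are real, the timing harness is left to %timeit or similar:

```python
import numpy as np
import dask.array as da

data = np.random.default_rng(0).standard_normal((8_000, 8_000))

# ~4e3 on a side was the sweet spot: smaller chunks spend their
# time building and iterating a huge task graph, larger ones stall
# in sem_wait/pthread_cond_wait.
for side in (1_000, 4_000, 8_000):
    arr = da.from_array(data, chunks=side)
    arr.dot(arr.T).sum().compute()       # time each iteration
```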

Investigate nonzero increments for spatial structure

The R packages geostatsp, geoR, fields, and RandomFields would probably be useful, as would the Python package scikit-gstat. To be clear, this would use the “large” increments to try to fit a variogram, then use that fit to iterate.
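A sketch of the scikit-gstat side, with made-up coordinates and increments standing in for the real ones:

```python
import numpy as np
from skgstat import Variogram

# Made-up "large" increments at random locations.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(200, 2))
increments = rng.standard_normal(200)

# Fit an empirical variogram plus a model; the fitted range, sill,
# and nugget would feed the next iterate of the spatial structure.
vario = Variogram(coords, increments, model="exponential", n_lags=15)
print(vario.describe())
```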

Check whether storing spectrum as a numpy array makes HomogeneousIsotropicCorrelations faster

It may be re-calculating the weights on each trip through, which would be really slow. RESULT: if everything is a dask calculation, the spectrum stays in memory.
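For comparison, a minimal sketch of the eager alternative: compute the kernel’s spectrum once as a concrete numpy array and reuse it on every matvec. This is a toy 1-D circulant operator, not HomogeneousIsotropicCorrelations itself:

```python
import numpy as np

class CirculantCorrelationSketch:
    """Toy 1-D circulant correlation operator with a cached spectrum."""

    def __init__(self, kernel_column):
        self._n = len(kernel_column)
        # One-time cost; a lazy dask graph here could be re-evaluated
        # on every trip through matvec.
        self._spectrum = np.fft.rfft(np.asarray(kernel_column))

    def matvec(self, vec):
        # Circulant matrix-vector product via the convolution theorem.
        return np.fft.irfft(self._spectrum * np.fft.rfft(vec, self._n),
                            self._n)
```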

Test LinearOperator subclasses with other shapes

Is this supposed to work? Check the scipy-provided implementations: find one that doesn’t delegate to :meth:`dot` or :meth:`solve`, which are designed to handle this case. If LinearOperator handles this, the code can be simpler. If not, expand things to handle it and add tests.

LinearOperator[N, N].dot(array_like[…, N])

I’m not entirely sure where I’d use this. What I can use more easily is LinearOperator[…, N, N].dot(array_like[N, K]).

LinearOperator[N, N].dot(array_like[N, K])

B H^T does this, and is tested by the solvers. I should write a specific test for it, though.
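A minimal version of that test, using scipy’s aslinearoperator as a stand-in for the project’s subclasses:

```python
import numpy as np
from scipy.sparse.linalg import aslinearoperator

def test_dot_with_matrix():
    """LinearOperator[N, N].dot(array_like[N, K]) should act columnwise."""
    rng = np.random.default_rng(42)
    mat = rng.standard_normal((5, 5))
    op = aslinearoperator(mat)           # swap in B H^T etc. here
    vecs = rng.standard_normal((5, 3))
    np.testing.assert_allclose(op.dot(vecs), mat @ vecs)
```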

Put glue code from example inversion into wrapper module

Function to align influence functions

Implement

Test

Function running inversion given correlation functions et al.

Takes the prior(s?), influence function, spatial and temporal correlation functions for the observation and prior errors, and the lower-level method to use; turns these into the relevant operators; reshapes everything for the inversion; and runs it.
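A sketch of the signature and data flow; every name here is hypothetical, and the dense arrays stand in for the real operator classes:

```python
import numpy as np

def run_inversion(prior, influence_fn, obs,
                  prior_spatial_corr, prior_temporal_corr, obs_corr,
                  solver):
    """Hypothetical wrapper: build operators, reshape, run `solver`."""
    # Correlation functions -> covariance operators (dense for
    # clarity; Kronecker-product linops in practice).
    B = np.kron(prior_temporal_corr, prior_spatial_corr)
    R = obs_corr
    # Reshape everything to the 2-D forms the solvers expect.
    H = influence_fn.reshape(obs.size, prior.size)
    return solver(prior.reshape(-1), B, obs.reshape(-1), R, H)
```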

Have this also generate and use the reduced B to avoid full A.

Still need to implement this in the solvers. For variational methods, it’s faster to use some other minimizer and have the wrapper calculate the errors. For OI, I already have (HBH^T + R) calculated in full, so using this directly might be faster.

Check whether saving and restoring (HBH^T + R) works for a month

I think the current method may be trying to hold B H^T in memory to avoid extra computation. Saving and restoring this matrix would make it forget it ever knew that part and recompute it as needed. RESULT: Kinda.

Figure out interface for reduced A

Var is fairly straightforward (use CG or a similar minimizer, calculate A from scratch in the wrapper), and PSAS is unreliable enough that a similar approach may be a good idea. OI calculates $(H B H^T + R)$ in full already, so using this directly is (in theory) more accurate. However, I would then need to pass in either two versions of both H and B (inside and outside the parentheses) or three of B (reduced, half reduced, and full): $(I - KH) B = B - B H^T (H B H^T + R)^{-1} H B$. RESULT: Reduced B and H.
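In numpy terms, with a hypothetical aggregation operator `agg` (so the reduced, half-reduced, and full versions of B all appear):

```python
import numpy as np

def reduced_posterior_covariance(agg, B, H, HBHT_plus_R):
    """agg A agg^T without forming the full A.

    agg A agg^T = agg B agg^T
                  - (agg B H^T) (H B H^T + R)^{-1} (agg B H^T)^T
    using the symmetry of B; HBHT_plus_R is reused from the OI solve.
    """
    B_half = agg @ B @ H.T                       # "half reduced" B H^T
    return (agg @ B @ agg.T                      # reduced B
            - B_half @ np.linalg.solve(HBHT_plus_R, B_half.T))
```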

Check whether fancier statistical methods work better.

The statsmodels statespace API might make VARIMA simplish, and the Python arch package might allow GARCH on top of that. arch also has a few cross-validators built in.
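A sketch with toy data; VARMAX and arch_model are the real entry points, everything else here is made up:

```python
import numpy as np
import pandas as pd
from arch import arch_model
from statsmodels.tsa.statespace.varmax import VARMAX

# Toy bivariate series standing in for the error time series.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((500, 2)),
                    columns=["tower_a", "tower_b"])

# VARMA(1, 1) through the statespace API.
varma_res = VARMAX(data, order=(1, 1)).fit(disp=False)

# GARCH(1, 1) on one component's residuals via arch.
resid = np.asarray(varma_res.resid)[:, 0]
garch_res = arch_model(resid, vol="Garch", p=1, q=1).fit(disp="off")
```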

Learn about cross-validation and implement it.

Check whether optimization ideas help

python3.4.4 cluster_python

2.75h cpu, ~82GiB mem, 4.75h wall

python3.6.6 inversion_environment default numpy 1.14.5

- First run: 2.75h cpu, ~80GiB mem, 115GiB vmem, 3.2h wall
- Second run, with non-nan July observations: 4.1h cpu, 90GiB mem, 111GiB vmem, 4.25h wall

inversion_environment einsum

- 9 towers, non-nan July obs and file prior: 2.1h cpu, 139GiB mem, 176GiB vmem, 1.6h wall
- 7 towers, matching July obs: 7.5h cpu, 160GiB mem, 178GiB vmem, 8.4h wall
- 7-day Gaussian temporal error correlations assumed: 6h56 cpu, 159GiB mem, 176GiB vmem, 5h39 wall
- 21-day batch inversion, don’t calculate B_HT: 1h53 to load data, 3h07 for inversion; 5h19 cpu, 116GiB mem, 136GiB vmem, 3h11 wall
- 30-day batch inversion: 4 hours to load data, >3 hours for inversion
- 26-day batch inversion, 5 procs for part of loading: 2h07 to load data, 4h12 for inversion, 19m to write (wall); 7h17 cpu, 157GiB mem, 178GiB vmem, 6h34 wall
- 30-day batch inversion, einsum output F-order, use proper flux times: 10m to load data, 1h31 for inversion, 9m to write (wall); 3h42 cpu, 111GiB mem, 133GiB vmem, 1h51 wall

Optimize YMKronecker product sums

100x100 and 1000x1000 matrices, 100000x10 test vec:

- bn.nansum: 662 ms/loop
- ndarray.sum: 628 ms/loop
- numexpr: 669 ms/loop
- preallocate: 609 ms/loop
- reshape and transpose: 536 ms/loop
- np.einsum: 255 ms/loop
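The winning variant is essentially this reshape-plus-einsum identity, which applies (A ⊗ B) without ever forming it (sizes as in the timings):

```python
import numpy as np

n, m, k = 100, 1000, 10
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))          # 100x100
B = rng.standard_normal((m, m))          # 1000x1000
x = rng.standard_normal((n * m, k))      # 100000x10 test vec

# (A kron B) @ x: reshape x to (n, m, k), contract A over the first
# axis and B over the second, then flatten back.
X = x.reshape(n, m, k)
y = np.einsum("ij,jlk,ml->imk", A, X, B).reshape(n * m, k)

# For small n and m this matches np.kron(A, B) @ x exactly.
```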

Optimize YMKronecker quadratic form

100x100 and 1000x1000 matrices, 100000x10 test vec:

- vec.T @ (op @ vec): 263 ms/loop
- op.quadratic_form(vec): 273 ms/loop

Test netCDF4 vs. h5netcdf backends

- netCDF4: half an hour to load data
- h5netcdf: refuses to work
- netCDF4: 5 hours to load data
- netCDF4: 2 hours to load data
- netCDF4+dask: 1h53 to load

See where I can use np.linalg.multi_dot
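The operator chains above are exactly what multi_dot is for: it chooses the cheapest parenthesization automatically (matrix names and sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((500, 500))
H = rng.standard_normal((30, 500))
x = rng.standard_normal(30)

# (B @ H.T) @ x builds a 500x30 intermediate; multi_dot instead
# evaluates H.T @ x first (a length-500 vector), which is cheaper.
y = np.linalg.multi_dot([B, H.T, x])
```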

Rewrite integrators using wrapper and generators

Get a Toeplitz matrix implementation using np.lib.stride_tricks.as_strided working
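A sketch of the idea, following the scipy.linalg.toeplitz convention for the first column and row; the result is a zero-copy view, so treat it as read-only:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def toeplitz_view(c, r):
    """Zero-copy Toeplitz matrix: T[i, j] = c[i-j] for i >= j, else r[j-i]."""
    c, r = np.asarray(c), np.asarray(r)
    # First column reversed, then the first row (minus its corner).
    vals = np.concatenate((c[::-1], r[1:]))
    stride = vals.strides[0]
    # Row i starts at vals[len(c) - 1 - i]; moving along a row steps
    # forward through vals, moving down a column steps backward.
    return as_strided(vals[len(c) - 1:], shape=(len(c), len(r)),
                      strides=(-stride, stride))

# import scipy.linalg
# assert (toeplitz_view([1, 2, 3], [1, 4, 5])
#         == scipy.linalg.toeplitz([1, 2, 3], [1, 4, 5])).all()
```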

Implement Multivariate Laplace noise

Ref: Samuel Kotz, Tomasz J. Kozubowski, and Krzysztof Podgórski, The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance.
URL: https://link.springer.com/book/10.1007%2F978-1-4612-0173-1#about
DOI: 10.1007/978-1-4612-0173-1
ISBN: 978-1-4612-6646-4; eISBN: 978-1-4612-0173-1

Chapter seven has an algorithm for generating multivariate asymmetric Laplace random variables, which has multivariate elliptically symmetric Laplace random variables as a special case. I only need the symmetric case, but the general case might be a good addition to scipy.
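A sketch of the symmetric special case as I understand it (Y = mean + sqrt(W) X with W ~ Exp(1) and X Gaussian); this is not the book’s exact pseudocode, and the asymmetric case adds a W-scaled drift term:

```python
import numpy as np

def multivariate_laplace(mean, cov, size, seed=None):
    """Draw elliptically symmetric multivariate Laplace samples."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean)
    w = rng.exponential(1.0, size=size)              # mixing variable
    x = rng.multivariate_normal(np.zeros(len(mean)), cov, size=size)
    return mean + np.sqrt(w)[:, np.newaxis] * x
```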

CT-Lagrange compatibility

Interface described here: https://www.esrl.noaa.gov/gmd/ccgg/carbontracker-lagrange/doc/config.html