PyMC/PyTensor Implementation of Pathfinder VI #387

Open: wants to merge 22 commits into base: main

@aphc14 aphc14 commented Oct 31, 2024

An alternative to draft PR #386 that makes greater use of PyTensor's symbolic variables and compiled functions.

Questions for Review

  1. Which implementation (this one or #386) should I continue with for future improvements?
  2. Are there additional PyTensor optimisations we could leverage?

`fit_pathfinder`
- Edited `fit_pathfinder` to produce `pathfinder_state`, `pathfinder_info`, `pathfinder_samples` and `pathfinder_idata` for closer examination of the outputs.
- Renamed the `num_samples` argument to `num_draws` to avoid `TypeError: got multiple values for keyword argument 'num_samples'`.
- Initial points are now always jittered, since Pathfinder requires jitter.

Extras
- New function 'get_jaxified_logp_ravel_inputs' to simplify previous code structure in fit_pathfinder.

Tests
- Added a test checking that the pathfinder_info variables and pathfinder_idata are consistent for a given random seed.
Add a new PyMC-based implementation of Pathfinder VI built on PyTensor operations, so that fit_pathfinder supports both the PyMC and BlackJAX backends.
- Implemented  in  to support running multiple Pathfinder instances in parallel.
- Implemented  function in  for Pareto Smoothed Importance Resampling (PSIR).
- Moved relevant pathfinder files into the  directory.
- Updated tests to reflect changes in the Pathfinder implementation and added tests for new functionalities.

aphc14 commented Nov 4, 2024

If the preferred approach is to stick with PyTensor symbolic variables rather than the non-symbolic approach in #386, I'd be happy to refactor the Multipath Pathfinder implementation in #386 to use symbolic variables and pytensor.function.

@aphc14 aphc14 force-pushed the pathfinder_w_pytensor_symbolic branch from 9bfc48c to ef2956f Compare November 7, 2024 18:04
@aphc14 aphc14 changed the title from "Pathfinder w pytensor symbolic" to "PyMC/PyTensor Implementation of Pathfinder VI" Nov 7, 2024

aphc14 commented Nov 7, 2024

This version runs much faster than #386, but the code is messier because of the many PyTensor symbolic variables created for the compiled PyTensor functions (see the lines of code between def compute_logp and def single_pathfinder). Any suggestions for a cleaner setup would be appreciated.

g: np.ndarray


class LBFGSHistoryManager:
Member

Cleaner to use a data class? Don't know.

Author

yep, I agree. dataclass now added

Summary of changes:
- Remove multiprocessing code in favour of reusing compiled  for each path
-  takes only random_seed as argument for each path
- Made the compute graph significantly smaller by using pure PyTensor ops and symbolic variables
- Added LBFGSOp to compile with pytensor.function
- Cleaned up the code that uses PyTensor variables
@aphc14 aphc14 marked this pull request as ready for review November 11, 2024 17:52
@aphc14 aphc14 marked this pull request as draft November 11, 2024 17:53
…and .

- Corrected the dimensions in comments for matrices Q and R in the  function.
- Improved numerical stability in the  calculation by changing from  to .
@@ -31,11 +31,13 @@ def fit(method, **kwargs):
arviz.InferenceData
"""
if method == "pathfinder":
# TODO: Remove this once we have a pure PyMC implementation
Member

This PR will provide that, no?

Author

the latest commit addresses this

Fixed incorrect and inconsistent posterior approximations in the Pathfinder VI
algorithm by:

1. Adding missing parentheses in the phi calculation to ensure proper order
   of operations in matrix multiplications
2. Changing the sign in mu calculation from 'x +' to 'x -' to match Stan's
   implementation (which differs from the original paper)

The resulting changes now make the posterior approximations more reliable.
Implements both sparse and dense BFGS sampling approaches for Pathfinder VI:
- Adds bfgs_sample_dense for cases where 2*maxcor >= num_params.
- Moved existing  and  computations to bfgs_sample_sparse, making the sparse use cases more explicit.

Other changes:
- Sets default maxcor=5 instead of dynamic sizing based on parameters

Dense approximations are recommended when the target distribution has stronger dependencies among its parameters.
Bigger changes:
- Made pmx.fit compatible with method='pathfinder'
- Remove JAX dependency when inference_backend='pymc' to support Windows users
- Improve runtime performance by setting trust_input=True for compiled functions

Minor changes:
- Change default num_paths from 1 to 4 for stable and reliable approximations
- Refactor LBFGS code to use dataclasses
- Update tests to handle both PyMC and BlackJAX backends
- Add LBFGSInitFailed exception for failed LBFGS initialisation
- Skip failed paths in multipath_pathfinder and track number of failures
- Handle NaN values from Cholesky decomposition in bfgs_sample
- Add checks for numerical stability in matrix operations

Slight performance improvements:
- Set allow_gc=False in scan ops
- Use FAST_RUN mode consistently
Major:
  - Added progress bar support.

Minor:
  - Added  exception for non-finite log prob values
  - Removed .
  - Allowed maxcor argument to be None, and dynamically set based on the number of model parameters.
  - Improved logging to inform users about failed paths and lbfgs initialisation.
@aphc14 aphc14 marked this pull request as ready for review November 27, 2024 17:04
@aphc14 aphc14 marked this pull request as draft November 30, 2024 16:05

aphc14 commented Nov 30, 2024

I need to make an important change to how importance sampling is done. In some tests on trickier posteriors, psir (Pareto smoothed importance resampling) tends to produce many large peaks, whereas the reference posterior (what you'd get using NUTS) has no such peaks.

With resampling turned off, you get psis instead, and the final posterior better resembles NUTS, without the odd peaks. This would differ from the original paper, though, which uses psir.

Since the choice of importance sampling (IS) method can have a big impact on the final posterior, and there are several IS methods, I plan to use a class variable that controls how IS is done based on the user inputs. I'm thinking of making psis (and not psir) the default IS behaviour, as the safest and most generally reliable option.

Shouldn't take long to fix.
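
To illustrate why resampling can turn a few heavy weights into spikes, here is a minimal sketch in plain Python. It is not the PR's implementation: the Pareto smoothing step is left out entirely, and `normalized_weights`, `resample` and the toy numbers are hypothetical names and values chosen for illustration.

```python
import math
import random
from collections import Counter


def normalized_weights(log_weights):
    """Self-normalise importance weights given log density ratios (log p - log q)."""
    m = max(log_weights)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [wi / total for wi in w]


def resample(draws, weights, num_draws, seed=0):
    """Importance resampling: draw with replacement, proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(draws, weights=weights, k=num_draws)


# Toy example: five draws, one of which has a much larger log weight.
draws = [-1.0, -0.5, 0.0, 0.5, 1.0]
log_weights = [0.0, 0.0, 0.0, 0.0, 5.0]

weights = normalized_weights(log_weights)
resampled = resample(draws, weights, num_draws=1000)
counts = Counter(resampled)  # the heavily weighted draw dominates the resample
```

In this toy case the resampled set is almost entirely copies of one draw, which is the kind of peak described above. Keeping the original draws together with their weights avoids that duplication.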

…d Computational Performance

- Significantly improved computational efficiency by combining 3 computational graphs into 1 larger compiled function. Removed non-shared inputs and used  with  for significant performance gains.
- Set default importance sampling method to 'psis' for more stable posterior results, avoiding local peaks seen with 'psir'.
- Introduce concurrency options ('thread' and 'process') for multithreading and multiprocessing. Defaults to no concurrency, as neither option has reduced the compute time much, if at all.
- Adjusted default  from 8 to 4 and  from 1.0 to 2.0 and maxcor to max(3*log(N), 5). This default setting reduces computational time and the degree to which the posterior variance is underestimated.
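
As a rough illustration of the maxcor default above and the dense/sparse rule stated earlier in this thread (dense when 2*maxcor >= num_params), here is a small sketch. The function names are hypothetical, and both the natural logarithm and rounding down to an integer are assumptions, since the commit message only gives max(3*log(N), 5).

```python
import math


def default_maxcor(num_params):
    """max(3*log(N), 5); natural log and rounding down are assumptions here."""
    return max(int(3 * math.log(num_params)), 5)


def use_dense_sampling(maxcor, num_params):
    """Dense BFGS sampling when 2*maxcor >= num_params, sparse otherwise."""
    return 2 * maxcor >= num_params


small = default_maxcor(3)    # 3*log(3) is below 5, so the floor of 5 applies
large = default_maxcor(100)  # 3*log(100) is about 13.8
```

Under these assumptions a 10-parameter model with maxcor=5 would take the dense branch, while a 100-parameter model with the default maxcor would take the sparse one.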
@aphc14 aphc14 force-pushed the pathfinder_w_pytensor_symbolic branch 2 times, most recently from 015d9f2 to e4b8996 Compare December 7, 2024 19:47
@aphc14 aphc14 marked this pull request as ready for review December 8, 2024 11:37
- Handle different importance sampling methods for reshaping and adjusting log densities.
- Modified  to return InferenceData with chain dim of size num_paths when
@aphc14 aphc14 force-pushed the pathfinder_w_pytensor_symbolic branch from 0db1733 to 885afaa Compare December 8, 2024 14:41

aphc14 commented Dec 8, 2024

Results are looking pretty good now! I've run a couple of tests on a few datasets, and it sometimes returns a posterior similar to NUTS. Speed is comparable to ADVI, and for larger models it can be 2x or more faster than ADVI. I've made some changes to the code to make it more efficient and more stable.

https://gist.github.com/aphc14/019f172e44c0c767a4f48e91a045e896


value = self.fn(self.x0)
grad = self.grad_fn(self.x0)
if np.all(np.isfinite(grad)) and np.isfinite(value):
Member

What happens when either is not finite? Clearly the arrays are uninitialized, so is it handled somewhere further down the line?

Author

Yes, it is. If either is not finite, the value and grad computed at x0 are not added to history_manager, but lbfgs continues until it stops. At the very end of lbfgs, it stores suitable value and grad values. If the arrays are uninitialised, the check on line 114, elif (result.status == 2) or (history_manager.count <= 1), lets the user know that a path has failed to initialise, and it moves on to the next path.
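
A minimal sketch of the behaviour described in this reply, in plain Python rather than the PR's NumPy-based code. The `History` class and `path_failed` helper are hypothetical stand-ins; only the failure condition is taken from the check quoted above.

```python
import math
from dataclasses import dataclass, field


@dataclass
class History:
    """Hypothetical stand-in for the PR's history manager."""

    x: list = field(default_factory=list)
    g: list = field(default_factory=list)

    @property
    def count(self):
        return len(self.x)

    def add(self, x, value, grad):
        """Store the entry only if the value and every gradient component are finite."""
        if math.isfinite(value) and all(math.isfinite(gi) for gi in grad):
            self.x.append(list(x))
            self.g.append(list(grad))


def path_failed(status, history):
    """Mirror of the quoted check: status == 2 or at most one stored entry."""
    return status == 2 or history.count <= 1


history = History()
history.add([0.0], float("nan"), [1.0])  # skipped: value is not finite
history.add([0.1], 1.5, [float("inf")])  # skipped: gradient is not finite
history.add([0.2], 1.2, [0.3])           # stored
failed = path_failed(status=0, history=history)  # only one entry stored
```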

postprocessing_backend="cpu",
inference_backend="pymc",
model=None,
importance_sampling: Literal["psis", "psir", "identity", "none"] = "psis",
Member

@fonnesbeck fonnesbeck Dec 8, 2024

The importance_sampling argument isn't passed anywhere and is only relevant to the function when it is "none". Is this intended, or is something missing?

Author

importance_sampling is passed into multipath_pathfinder (this function ends with return _importance_sampling(...)) and into convert_flat_trace_to_idata. When the user chooses "none", it affects the sample size for idata, so convert_flat_trace_to_idata checks for "none".

"psis", "psir" and "identity" are handled in pymc-experimental/pymc_experimental/inference/pathfinder/importance_sampling.py

class LBFGSOp(Op):
__props__ = ("fn", "grad_fn", "maxcor", "maxiter", "ftol", "gtol", "maxls")

def __init__(self, fn, grad_fn, maxcor, maxiter=1000, ftol=1e-5, gtol=1e-8, maxls=1000):
Member

Add type hints throughout (and docstrings, ideally).

return idata


def alpha_recover(x, g, epsilon):
Member

Incomplete docstring and missing type hints.

@ricardoV94
Member

Add something to the docs?

Comment on lines +40 to +41
value = self.fn(self.x0)
grad = self.grad_fn(self.x0)
Member

You may want to consider a value_and_grad function, as it can avoid many repeated operations?

Author

@aphc14 aphc14 Dec 9, 2024

for my understanding, does "repeated operations" refer to value being computed twice: once in fn(x0) and again as part of grad_fn(x0)? or does "repeated operations" refer to something else?

I could change this to scipy.optimize.minimize(fun=logp_dlogp_fn, jac=True), where logp_dlogp_fn is:

First approach:

logp_dlogp_fn = model.logp_dlogp_function(
    ravel_inputs=True,
    dtype="float64",
    mode=pytensor.compile.mode.Mode(linker="cvm_nogc"),
)
logp_dlogp_fn.set_extra_values({}) # without this, i'd get an error
logp_dlogp_fn._pytensor_function.trust_input = True

I would prefer the following approach, since I can toggle between jacobian=True/False if I need to:

Second approach:

outputs, inputs = pm.pytensorf.join_nonshared_inputs(
        model.initial_point(),
        [model.logp(jacobian=jacobian), model.dlogp(jacobian=jacobian)],
        model.value_vars,
    )

logp_dlogp_fn = compile_pymc(
    [inputs], outputs, mode=pytensor.compile.mode.Mode(linker="cvm_nogc")
)
logp_dlogp_fn.trust_input = True

how does the second approach look?

Member

"Repeated operations" means operations that are shared between the logp and dlogp functions. The second approach is fine, except for the hardcoded mode: you should allow the user to pass compile_kwargs, where they can define a custom Mode if they need to.

Author

ahh I see. In that case, I might have considered the repeated operations here: #387 (comment)

I can make the changes if repeated operations weren't considered

return phi, logQ_phi


class LogLike(Op):
Member

Why is this an Op? How is it constructed? Why can't you work directly with PyTensor graphs?

Author

LogLike is initialised using logp_func, which is a compiled function that we already have. logp_func cannot take a PyTensor variable since it's already compiled, so its input has to be a NumPy array with ndim=1. I needed to vectorise logp_func so that it can take an array with ndim=3.

I wasn't sure how to make logp_func take in the symbolic input with batched dims, phi or psi, which was why I used Op.

how would you make an already compiled function receive symbolic pytensor variables as inputs?

Member

@ricardoV94 ricardoV94 Dec 9, 2024

you can use pt.vectorize or pytensor.graph.replace.vectorize_graph

Comment on lines +58 to +59
np.testing.assert_allclose(idata.posterior["mu"].mean(), 5.0, atol=1.6)
np.testing.assert_allclose(idata.posterior["tau"].mean(), 4.15, atol=1.5)
Member

These are quite big atols

Author

They are. I think it's due to the Pathfinder algorithm. I compared it to Stan's multipath Pathfinder (teal), and mu appears to be about -2 away from the reference posterior (red), which is taken from posteriordb.
[image: Stan's multipath Pathfinder (teal) against the posteriordb reference posterior (red)]

The image below shows PyMC's Pathfinder and NUTS (blue) estimates. The default multipath Pathfinder (orange) agrees with Stan's default multipath Pathfinder (teal) above. Getting closer to the reference or NUTS posterior requires different settings for jitter and num_paths.
[image: PyMC's Pathfinder (orange) against NUTS (blue)]

As for tau, the original reference value was 4.15. I just had a look at what PyMC NUTS returns, and it's somewhere around 3.5, whereas estimates from PyMC Pathfinder are around 3.0. I guess I can set it to np.testing.assert_allclose(idata.posterior["tau"].mean(), 3.5, atol=0.6)

Comment on lines +105 to +108
assert beta.eval().shape == (L, N, 2 * J)
assert gamma.eval().shape == (L, 2 * J, 2 * J)
assert phi.eval().shape == (L, num_samples, N)
assert logq.eval().shape == (L, num_samples)
Member

Can you test something more than the shapes?

@ricardoV94 ricardoV94 added the enhancements New feature or request label Dec 9, 2024
Comment on lines +95 to +112
# setting jacobian = True, otherwise get very high values for pareto k.
outputs, inputs = pm.pytensorf.join_nonshared_inputs(
model.initial_point(),
[model.logp(jacobian=jacobian), model.dlogp(jacobian=jacobian)],
model.value_vars,
)

logp_func = compile_pymc(
[inputs], outputs[0], mode=pytensor.compile.mode.Mode(linker="cvm_nogc")
)
logp_func.trust_input = True

dlogp_func = compile_pymc(
[inputs], outputs[1], mode=pytensor.compile.mode.Mode(linker="cvm_nogc")
)
dlogp_func.trust_input = True

return logp_func, dlogp_func
Author

Relates to #387 (comment)

Would the operations be shared between logp and dlogp here, considering that outputs comes from both model.logp and model.dlogp? Or would they not, since the two are compiled separately (one compiled fn for logp, a separate one for dlogp)?

This would help me determine whether I need to change the existing code to the version below:

outputs, inputs = pm.pytensorf.join_nonshared_inputs(
        model.initial_point(),
        [model.logp(jacobian=jacobian), model.dlogp(jacobian=jacobian)],
        model.value_vars,
    )

logp_dlogp_fn = compile_pymc(
    [inputs], outputs
)
logp_dlogp_fn.trust_input = True

Member

you have to compile both outputs (logp and dlogp) together, then the compiled function will avoid repeated operations.

If you compile different functions it won't. Does that answer your question?
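
The idea can be shown with a plain-Python analogy (this is not PyTensor, and every name below is hypothetical): when the value and the gradient share an intermediate term, two separate functions each evaluate it, while a joint function evaluates it once.

```python
import math

calls = {"shared": 0}


def shared_term(x):
    """Intermediate used by both the value and the gradient."""
    calls["shared"] += 1
    return math.exp(-0.5 * x * x)


def value(x):
    return shared_term(x)


def grad(x):
    # d/dx exp(-x^2 / 2) = -x * exp(-x^2 / 2)
    return -x * shared_term(x)


def value_and_grad(x):
    s = shared_term(x)  # evaluated once, reused for both outputs
    return s, -x * s


# Separate functions evaluate the shared term twice.
calls["shared"] = 0
v, g = value(1.0), grad(1.0)
separate_calls = calls["shared"]

# The joint function evaluates it once and returns the same numbers.
calls["shared"] = 0
v2, g2 = value_and_grad(1.0)
joint_calls = calls["shared"]
```

Compiling both outputs in one function plays the role of `value_and_grad` here: the compiler can see both outputs and merge what they have in common.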

@fonnesbeck
Member

It would be useful to have a progress bar. At the moment we only get number of parameters and maxcor, which is hard to translate into expected runtime.
