Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add loo_expectation #2301

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

aadya940
Copy link
Contributor

@aadya940 aadya940 commented Dec 22, 2023

Related to #2059


📚 Documentation preview 📚: https://arviz--2301.org.readthedocs.build/en/2301/

@aadya940
Copy link
Contributor Author

aadya940 commented Jan 7, 2024

Hello! Can anyone please review this PR and let me know If I'm missing out something?

@sethaxen sethaxen self-requested a review February 14, 2024 13:59
@OriolAbril OriolAbril self-requested a review February 21, 2024 22:07
Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really sorry about not reviewing until now. Let me know if still interested, if not I will finish this PR so when merged you get credit for the contribution.

cc @sethaxen I would appreciate an assist on making sure I am not messing up input/output shapes particularly

Also, api-wise, it looks like the most common scenario is using posterior predictive samples as values. Given the first input is InferenceData, what would you think about allowing values=None and using the posterior preditive when that happens?

Comment on lines +877 to +878
values: ndarray
A vector of quantities to compute expectations for.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add information about the shape of values here. My understanding is that is should be an array/dataarray with the same shape as the pointwise log likelihood (e.g. chain, draw, obs_id) which in general won't work here. I think there should be an extra check for when the input is a dataarray so that chain, draw dimensions get stacked into __sample__ one, otherwise something like:

loo_expectation(data, data.posterior_predictive.y)

would not work as is.

Comment on lines +879 to +881
pointwise: bool, optional
If True the pointwise predictive accuracy will be returned. Defaults to
``stats.ic_pointwise`` rcParam.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed, it is not used anywhere

Comment on lines +885 to +886
**kwargs:
Additional keyword arguments to pass to the `psislw` function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

psislw only takes two arguments, which are already provided explicitly, so passing any kwargs here would end up as a keyword not recognized error when calling psislw.

Comment on lines +889 to +890
expectation: float
The computed expectation of `values` across LOO posteriors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also have an indication of the expected output shape. From the examples in https://mc-stan.org/loo/reference/E_loo.html it looks like it should have the shape of pointwise log likelihood values minus __sample__ dimension.

log_weights, _ = psislw(-log_likelihood, reff=reff, **kwargs)

# Numerically stable Weighted sum
# Do computations in the log-space for numerical stability
Copy link
Member

@OriolAbril OriolAbril Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right before that I would add a check for DataArrays (preferred input type) to see if they have chain and draw dimensions and if so stack them. Then sum only along the __sample__ dimension. also, as only stack, pointwise functions and sum are used, I think a Dataset should also be a valid input which would allow idata.posterior_predictive as default input for values

# Numerically stable Weighted sum
# Do computations in the log-space for numerical stability
w_exp = log_weights + np.log(np.abs(values))
_expectation = (np.sign(values) * np.exp(w_exp)).sum()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_expectation = (np.sign(values) * np.exp(w_exp)).sum()
expectation = (np.sign(values) * np.exp(w_exp)).sum(dim="__sample__")

The variable is only defined within the scope of the function, no need to add any underscore to the name.

Comment on lines +544 to +546
log_likelihood = get_log_likelihood(centered_eight)
log_likelihood = log_likelihood.stack(__sample__=("chain", "draw"))
values = np.arange(1, log_likelihood.shape[-1] + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think values should the the log likelihood directly, no extra processing or using its shape to create different objects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants