az.compare stacking weights do not sum to one #2359
Comments
Thanks for reporting, it looks quite strange. Have you been able to reproduce it with any other models? Are there NaNs or anything of the sort in the data? Also, could you share some more info about the "grid batch job"? The 2nd one is much closer to summing to 1 and might be due to numerical stability issues (in which case a fix could be renormalizing the weights right before returning them). But the 1st case looks weirder. I am also a bit surprised to see only 2 decimals on the table; is that the exact same output you got from `az.compare`?
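To illustrate the renormalization idea mentioned above (a sketch of the proposed workaround, not the library's actual code):

```python
import numpy as np

# Hypothetical post-processing step: renormalize the stacking weights
# so that floating-point drift cannot push the sum away from 1
weights = np.array([0.54, 0.48])   # made-up weights that sum to 1.02
weights = weights / weights.sum()
print(weights, weights.sum())      # now sums to 1 up to float precision
```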
No, you are right, we rounded for the logs. I switched that off. Maybe I found part of the issue: the ArviZ inference data showed older versions of arviz and numpyro than what I had in both of my environments. The loo objects were created with arviz 0.17.0 and numpyro 0.13.2. We updated all environments and reran everything with arviz 0.18.0 and numpyro 0.15.1. Since then, both environments give me the same answer, but it is yet another different answer, and it still does not sum to 1 (0.7183 + 0.3622 = 1.0805).
Even though these are re-runs, all columns but the weight are very similar (each model is big: 345,539 observations). Unfortunately, my colleague didn't save the original 0.17 arviz loo objects before re-running, so I cannot replicate the old numbers anymore.
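For reference, InferenceData records the creating library versions in each group's attrs, which is where a mismatch like the one described above shows up (here `idata` is a placeholder for one of the fitted models):

```python
# InferenceData stores creation metadata on each group's attrs
print(idata.posterior.attrs["arviz_version"])
print(idata.posterior.attrs["inference_library"])          # e.g. "numpyro"
print(idata.posterior.attrs["inference_library_version"])
```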
All your estimates have `warning=True`, so the underlying LOO estimates may not be fully reliable either. As for normalization, there is probably a bug in the way it is implemented right now; we'll look into it and fix it.
How did you do that? Is there an easy way for us to check?
Yes, that is correct
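For anyone following along, the `warning` flag and the Pareto k diagnostics are carried on the ELPDData object that `az.loo()` returns, so a check could look like the following sketch (`idata_a` is a placeholder for a fitted model; `pointwise=True` is needed to keep the per-observation k values):

```python
import arviz as az
import numpy as np

# az.loo() with pointwise=True keeps the per-observation Pareto k values
loo_a = az.loo(idata_a, pointwise=True)
print(loo_a.warning)                        # True if any khat exceeded the threshold
print(np.sum(loo_a.pareto_k.values > 0.7))  # count of problematic observations
```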
Apologies, I didn't phrase that well. What I meant was: we rounded to two decimals when printing the compare dataframe to the log files of our scripts. That's why you only saw two decimals in the first table. We didn't do any rounding or adjustments to the loo objects before feeding them to az.compare. Noted and agreed on the `warning=True`.
Dear ArviZ team,
Issue
When using arviz.compare() to compute stacking weights for model averaging, we got (a) weights that do not sum to one and (b) two different sets of weights in two different runs with the same data (literally, all other numbers in the compare output are the same, except for the weights). Below are the two compare result dfs we got.
First run (on a grid batch job):
Arviz version: 0.18.0
Pandas version: 2.0.3
Numpy version: 1.24.3
Python version: 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
OS info: posix, Linux, 4.19.0-25-amd64
Second run (on Jupyter server using the same resources):
Arviz version: 0.18.0
Pandas version: 2.0.3
Numpy version: 1.24.3
Python version: 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
OS info: posix, Linux, 4.19.0-25-amd64
Both runs produce the same content for all columns except the weight column, and in both cases the weights do not sum to 1. (I have checked for differences between these two run environments and could not find the source.)
Code that produces these tables from a bunch of loo objects, constructed with az.loo():
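The snippet itself did not survive the copy; below is a minimal sketch of an equivalent call, using ArviZ's bundled example data in place of the actual models:

```python
import arviz as az

# Stand-ins for the real models (the actual ones have 345,539 observations)
loo_a = az.loo(az.load_arviz_data("centered_eight"))
loo_b = az.loo(az.load_arviz_data("non_centered_eight"))

# method="stacking" is the default; shown explicitly for clarity
cmp_df = az.compare({"model_a": loo_a, "model_b": loo_b}, method="stacking")
print(cmp_df)
```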
The models are quite big, but if necessary, I can provide more data.
Expected behavior
Unless I'm mistaken, the weights should sum to 1.00.
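Stated as a direct check (with `cmp_df` as in the sketch above):

```python
import numpy as np

# Stacking weights form a simplex, so their sum should be 1 up to float precision
assert np.isclose(cmp_df["weight"].sum(), 1.0), cmp_df["weight"].sum()
```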