SNPE_C: Lower acceptance rate when using a large number of simulations #1416
-
Hi, just wanted to ask: is it expected to have a lower acceptance rate when training on a larger simulation dataset (23M)? I found that there seem to be leakage problems when sampling the posterior for some observed data points, which wasn't the case with a posterior trained on a smaller simulation dataset (2M). I am working with a simulator with 6 model parameters and 9 observables. Thanks!
-
Hi there!

I think this behavior is possible. I expect that the observations for which the acceptance rate is low are misspecified, i.e., they systematically differ from the simulated training dataset. If this is the case, then the posterior is ill-defined, and no amount of training data will fix the issue.

A simple fix is to add noise to the simulated training dataset so that it covers a broader range of observations; see the sketch below. There is also a range of more advanced methods (see e.g. here), but these are not implemented in the `sbi` toolbox.
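For concreteness, here is a minimal, self-contained sketch of the noise idea, assuming the standard single-round SNPE workflow in `sbi`. The toy simulator, prior bounds, dataset size, and noise scale are placeholders, not your actual setup (6 parameters, 9 observables, 23M simulations):

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Toy stand-in for the real problem: 6 parameters, 9 observables.
prior = BoxUniform(low=-2.0 * torch.ones(6), high=2.0 * torch.ones(6))
WEIGHTS = torch.randn(6, 9)  # fixed hypothetical forward map

def toy_simulator(theta: torch.Tensor) -> torch.Tensor:
    return theta @ WEIGHTS + 0.1 * torch.randn(theta.shape[0], 9)

theta = prior.sample((10_000,))
x = toy_simulator(theta)

# Broaden the training data with additive Gaussian noise on the observables.
# The scale is a tuning choice; it should roughly match how far real
# observations might deviate from the simulator.
sigma = 0.05 * x.std(dim=0)
x_noisy = x + sigma * torch.randn_like(x)

inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x_noisy).train()
posterior = inference.build_posterior(density_estimator)

# Sampling rejects draws that fall outside the prior support; `sbi` warns
# when the acceptance rate is low, which is how leakage shows up in practice.
x_o = toy_simulator(prior.sample((1,)))
samples = posterior.sample((1_000,), x=x_o)
```

The only change relative to the standard workflow is the `x_noisy` line; everything else is plain single-round SNPE.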
Hope this helps!
Michael