
Check self-consistency of base measure sampler and logdensity #13

Open
sethaxen opened this issue Aug 22, 2021 · 6 comments
@sethaxen
Member

For any measure μ that has the following properties:

  1. the base measure ν is a probability measure
  2. we can draw samples from ν
  3. we can compute the log density of μ,

we can use importance sampling to check the self-consistency of the log density and sampling.

To do so, we draw N times from ν and average the density of μ with respect to ν over the sample. For a correctly normalized distribution, the result should be close to 1. Failure indicates that either the sampler for ν or the logdensity of μ is incorrectly implemented.

We can do this with a function like the following, which demonstrates that our logdensity implementation for VonMisesFisher{Stiefel} may be incorrectly normalized:

using MeasureTheory  # basemeasure and logdensity
using StatsFuns, Random

function check_logdensity_normalization(rng, μ, n)
    ν = basemeasure(μ)
    # log of n⁻¹ ∑ exp(logdensity(μ, xᵢ)) with xᵢ ~ ν; ≈ 0 (i.e. log 1) when correctly normalized
    return mapreduce(_ -> logdensity(μ, rand(rng, ν)), logaddexp, 1:n) - log(n)
end
function check_logdensity_normalization(μ, n)
    return check_logdensity_normalization(Random.default_rng(), μ, n)
end

julia> c = [10, 0, 0];

julia> d = VonMisesFisher(Sphere(2); c = c);

julia> d2 = VonMisesFisher(Stiefel(3, 1); F = reshape(c, 3, 1));

julia> check_logdensity_normalization(MersenneTwister(20), d, 10000000)  # looks fine
0.0007839621430072441

julia> check_logdensity_normalization(MersenneTwister(30), d2, 10000000)  # oh no
-17.995624071284112
@sethaxen
Member Author

@cscherrer it might be a good idea to include something like this in the test function for measures in MeasuresBase.

@sethaxen
Member Author

The logic of this is based on

1 = ∫ dμ(x) = ∫ dμ(x)/dν(x) dν(x) = ∫ π(x) dν(x) ≈ n⁻¹ ∑ π(xᵢ), where xᵢ ~ ν.

Using similar logic, we have

1 = ∫ dν(x) = ∫ dν(x)/dμ(x) dμ(x) = ∫ π(x)⁻¹ dμ(x) ≈ n⁻¹ ∑ π(xᵢ)⁻¹, where xᵢ ~ μ.

This is more delicate, because it effectively estimates the total mass of ν: if μ is much more concentrated than ν, the estimate is very poor. Still, this second identity could be a good way to check self-consistency of both the logdensity and the sampler of μ when μ is relatively diffuse. An alternative is to designate a subset of the support over which to estimate the total mass, ideally a region that is well covered by μ and whose volume is known (or can be estimated by importance sampling from the base measure). Then use draws from μ and its log density to estimate the mass of that region and compare.
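As a toy illustration of this second check, here is a hypothetical sketch in plain Julia (not the MeasureTheory API: the densities are hand-written normals and all names are made up). Averaging π(xᵢ)⁻¹ over draws from μ estimates the total mass of ν, which should be 1:

```julia
using Random, Statistics

# Toy setup: sample from μ = Normal(0, 1); take ν = Normal(0, 0.8) as the "base".
# ν is more concentrated than μ, so the inverse ratios have finite variance.
logpdf_normal(x, σ) = -0.5 * (x / σ)^2 - log(σ) - 0.5 * log(2π)
logπ(x) = logpdf_normal(x, 1.0) - logpdf_normal(x, 0.8)  # log dμ/dν

function check_total_mass(rng, n)
    xs = randn(rng, n)                     # draws from μ
    logr = -logπ.(xs)                      # log π(xᵢ)⁻¹
    m = maximum(logr)
    return m + log(mean(exp.(logr .- m)))  # log(n⁻¹ ∑ π(xᵢ)⁻¹), ≈ 0 if consistent
end
```

With a million draws the result should sit within a few thousandths of 0; a large deviation would implicate either the sampler or the density.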

@cscherrer

Great idea! (2) will miss in most cases, since you can't sample from Lebesgue measure. But we could maybe sample from a Cauchy and use log-density with respect to that.

One question I'm not sure of is reliability. The PSIS paper has some discussion about this, but I'll have to have another look to recall details.

@cscherrer

For easy reference later, from the PSIS paper:

The textbook examples of poorly performing importance samplers occur in low dimensions when the proposal distribution has lighter tails than the target, but it would be a mistake to assume that heavy tailed proposals will stabilize importance samplers. This intuition is particularly misplaced in high dimensions, where importance sampling can fail even when the ratios have finite variance. MacKay (2003, Section 29.2) provides an example of what goes wrong in high dimensions: essentially, the ratios rs will vary by several orders of magnitude and the estimator (1) will be dominated by a few draws. Hence, even if the approximating distribution is chosen so that the importance ratios are bounded and thus (1) has finite variance, the bound can be so high and the variance so large that the behaviour of the self-normalized importance sampling estimator is practically indistinguishable from an estimator with infinite variance. This suggests that if we want an importance sampling method that works in high dimensions, we need to move beyond being satisfied with estimates that have finite variance and towards methods with built-in error assessment.

sethaxen added a commit that referenced this issue Aug 22, 2021
sethaxen added a commit that referenced this issue Aug 22, 2021
@sethaxen
Member Author

(2) will miss in most cases, since you can't sample from Lebesgue measure. But we could maybe sample from a Cauchy and use log-density with respect to that.

Yeah, (1) has the same problem. But if we have an automated way to compute density wrt a different base measure (this is what 3-arg logdensity is for, right? Does this currently work?), then one could provide any measure whose support is a superset of that of the measure, which would default to the base measure.
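For instance (a hypothetical plain-Julia sketch, not the 3-arg logdensity API; the function names are made up), a density with respect to Lebesgue measure could be checked by importance sampling from a heavy-tailed Cauchy proposal, as suggested above:

```julia
using Random, Statistics

logpdf_target(x) = -0.5 * x^2 - 0.5 * log(2π)  # stand-in log dμ/dLebesgue: Normal(0, 1)
logpdf_cauchy(x) = -log(π) - log1p(x^2)        # log density of the standard Cauchy proposal

function check_against_proposal(rng, n)
    xs = tan.(π .* (rand(rng, n) .- 0.5))      # inverse-CDF sampling from the Cauchy
    logw = logpdf_target.(xs) .- logpdf_cauchy.(xs)
    m = maximum(logw)
    return m + log(mean(exp.(logw .- m)))      # ≈ 0 if the target density is normalized
end
```

The proposal's tails are heavier than the target's, so the importance ratios have finite variance and the estimate is stable.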

One question I'm not sure of is reliability. The PSIS paper has some discussion about this, but I'll have to have another look to recall details.

Good point. If necessary, we could use PSIS to help with this, and IIRC (and as the quote indicates), this would also give us a way to diagnose when the test failed because importance sampling just didn't work. But the first thing we could try is the simple truncation correction they give in eq (3).
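A minimal sketch of that style of truncation, assuming the Ionides-style rule of capping each weight at √n times the mean weight (the function name is made up):

```julia
using Statistics

# Cap each importance weight at √n × the mean weight, then average in log space.
function truncated_log_mean_weight(logw::AbstractVector{<:Real})
    n = length(logw)
    m = maximum(logw)
    w = exp.(logw .- m)       # rescaled weights for numerical stability
    cap = sqrt(n) * mean(w)   # truncation threshold
    return m + log(mean(min.(w, cap)))
end
```

Equal weights pass through unchanged, while a handful of extreme weights are pulled down toward the bulk, trading a small bias for much lower variance.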

This may not be very useful for MeasureTheory.jl itself, since it currently uses Distributions for most samplers, but I'm at least going to add a test function for this to this repo, since it successfully diagnosed problems with almost all of my logdensity implementations.

@cscherrer

(this is what 3-arg logdensity is for, right? Does this currently work?)

It needs more work, I wouldn't trust it yet.

Good point. If necessary, we could use PSIS to help with this, and IIRC (and as the quote indicates), this would also give us a way to diagnose when the test failed because importance sampling just didn't work. But the first thing we could try is the simple truncation correction they give in eq (3).

👍

This may not be very useful for MeasureTheory.jl itself, since it currently uses Distributions for most samplers, but I'm at least going to add a test function for this to this repo, since it successfully diagnosed problems with almost all of my logdensity implementations.

Great to hear! It's nice that it's relatively cheap to try, and if you hit instability it will be pretty obvious - tests will fail occasionally, which should be easy to adjust for if/when it comes up.
