The stochastic differential equations example in the Bayesian DiffEq tutorial doesn't work. #2216
Comments
Possibly related: TuringLang/docs#420
The error message indicates that |
Another question I think is worth raising: is this example even worth including in the tutorial? It's a bit unclear what it's actually inferring, no? It feels like it's almost an ABC approach where we are also inferring the epsilon? (Someone actually called the Turing.jl docs out for this at BayesComp 2023, haha)
Btw, the posterior estimates for the ODE in this tutorial are also not robust. I tried different generative parameter values, and the posterior credible intervals do not include the generative parameters.
What do you say @devmotion @yebai? Should we just drop it from the tutorial?
I know some people find this useful for illustrating how to use DiffEqs in Turing -- maybe we can try to improve it?
I'm not suggesting removing the tutorial; I'm suggesting we remove only the SDE example. Can someone explain what we're actually doing in the SDE example? We're somehow defining a likelihood based on the mean of the SDE simulations, but also treating the variance of the likelihood as a parameter to be inferred? Does this make sense to do? Seems almost like an ABC thingy but without decreasing the "epsilon" (which is the variance being inferred here)
We don't base it on the mean of the SDE simulations, do we? I don't remember the previous iterations of the example, but based on its explanation the intention was to compute the likelihood of a single SDE simulation based on all trajectories of the SDE simulations ("data"). So indeed the use of a single simulation seems intentional.

Additionally, the example mentions SGHMC might be more suitable - I wonder why we don't use it?

But generally I agree: we should fix broken examples or remove them temporarily. Failing examples give a bad impression.

Related: I had completely forgotten, but based on a suggestion by @mschauer I had played around with the preconditioned Crank-Nicolson algorithm in the context of SDEs a few years ago (with AdvancedMH but also with DynamicPPL/Turing): https://gist.github.com/devmotion/37d8d706938364eeb900bc3678860da6 I haven't touched it in a while, so I assume it might require a few updates by now.
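For readers unfamiliar with the idea, here is a minimal, untested sketch of a preconditioned Crank-Nicolson (pCN) update targeting a posterior with a standard Gaussian prior over the latent noise increments; the names (`pcn_step`, `loglik`) are illustrative and not taken from the gist or from any Turing/AdvancedMH API.

using Random

# One pCN step. The proposal is reversible with respect to the N(0, I) prior,
# so the acceptance ratio involves only the log-likelihood `loglik`.
function pcn_step(rng, x, logℓx, loglik; β=0.1)
    x′ = sqrt(1 - β^2) .* x .+ β .* randn(rng, length(x))
    logℓx′ = loglik(x′)
    return log(rand(rng)) < logℓx′ - logℓx ? (x′, logℓx′) : (x, logℓx)
end

# Usage sketch: `x` holds the latent Wiener increments of the SDE, and `loglik`
# maps them to the data log-likelihood via a (deterministic, given x) SDE solve.
# x = randn(rng, d); logℓ = loglik(x)
# for _ in 1:10_000
#     x, logℓ = pcn_step(rng, x, logℓ, loglik)
# end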
I'm currently working on using Turing to fit the parameters of an SDE, so this topic is of particular interest to me. Since I'm already testing out MCMC samplers with my model, maybe I can help with fixing the current example.

But I don't quite understand the goal of the example either. When I first saw the example in the documentation, I assumed the point was to simulate multiple trajectories of the stochastic differential equation, calculate the likelihood of each trajectory based on the data independently, and then combine them into one likelihood (by multiplying the likelihoods). Is this the goal? Or how exactly are they using multiple trajectories to calculate the likelihood? The way it is currently using the solver, it doesn't look like multiple trajectories are simulated at all.
Oh no, you're completely right! 👍 I don't know why I had this impression, but it was clearly wrong 🤦
Lovely:)
No, this is effectively what the example is doing:)
This is probably a mistake, yeah.
That indeed seems like it would be very useful here!
So I am working on this example, and I'm still trying to figure out where in the process things go wrong. I experimented with the priors and initial parameters, but even when they are set to the true values, the sampler still fails, so I think the problem runs deeper. During my troubleshooting, I tried samplers other than NUTS, and when I attempted to use DynamicHMC I got an error that may be insightful. To be honest, though, I'm unfamiliar with how the internals of Turing and DynamicHMC work, so maybe someone else here can give a little more insight. Here is the code I used. The notable differences are that I generated sdedata by simulating the equations, taking the mean of the trajectories, and then adding noise to the mean. Of course I also added the DynamicHMC import.

using Turing
using DynamicHMC
using DifferentialEquations
using DifferentialEquations.EnsembleAnalysis
# Load StatsPlots for visualizations and diagnostics.
using StatsPlots
using LinearAlgebra
# Set a seed for reproducibility.
using Random
Random.seed!(14);
u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
function multiplicative_noise!(du, u, p, t)
x, y = u
du[1] = p[5] * x
du[2] = p[6] * y
end
p = [1.5, 1.0, 3.0, 1.0, 0.1, 0.1]
function lotka_volterra!(du, u, p, t)
x, y = u
α, β, γ, δ = p
du[1] = dx = α * x - β * x * y
du[2] = dy = δ * x * y - γ * y
end
prob_sde = SDEProblem(lotka_volterra!, multiplicative_noise!, u0, tspan, p)
ensembleprob = EnsembleProblem(prob_sde)
data = solve(ensembleprob, SOSRI(); saveat=0.1, trajectories=1000)
plot(EnsembleSummary(data))
# Take the mean of the 1000 simulations and add noise to the mean to simulate noisy observations
sdedata = reduce(hcat, timeseries_steps_mean(data).u) + 0.8 * randn(size(reduce(hcat, timeseries_steps_mean(data).u)))
# Plot simulation and noisy observations.
scatter!(data[1].t, sdedata'; color=[1 2], label="")
@model function fitlv_sde(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.3, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
γ ~ truncated(Normal(3.2, 0.25); lower=2.2, upper=4)
δ ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
ϕ1 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)
ϕ2 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)
# Simulate stochastic Lotka-Volterra model.
p = [α, β, γ, δ, ϕ1, ϕ2]
predicted = solve(prob, SOSRI(); p=p, saveat=0.1)
# Early exit if simulation could not be computed successfully.
if predicted.retcode !== :Success
Turing.@addlogprob! -Inf
return nothing
end
# Observations.
for i in 1:length(predicted)
data[:, i] ~ MvNormal(predicted[i], σ^2 * I)
# end
end
return nothing
end;
model_sde = fitlv_sde(sdedata, prob_sde)
dynNUTS = externalsampler(DynamicHMC.NUTS())
chain_sde = sample(
model_sde,
dynNUTS,
5000;
initial_params=[1.5, 1.3, 1.2, 2.7, 1.2, 0.12, 0.1],
progress=true,
)
plot(chain_sde)

The error that resulted is as follows:

┌ Info: failed to find initial stepsize
│ q =
│ 7-element Vector{Float64}:
│ 0.4054651081081644
│ ⋮
│ p =
│ 7-element Vector{Float64}:
│ 0.14413067672624455
│ ⋮
└ κ = Gaussian kinetic energy (Diagonal), √diag(M⁻¹): Fill(1.0, 7)
ERROR: DynamicHMC error: DynamicHMC.DynamicHMCError("Starting point has non-finite density.", (hamiltonian_logdensity = -Inf, logdensity = -Inf, position = [0.4054651081081644, -0.4054651081081643, -0.13353139262452288, -0.9555114450274363, -0.13353139262452288, -0.6190392084062238, -1.0986122886681098]))
hamiltonian_logdensity = -Inf
logdensity = -Inf
position = [0.4054651081081644, -0.4054651081081643, -0.13353139262452288, -0.9555114450274363, -0.13353139262452288, -0.6190392084062238, -1.0986122886681098]
Stacktrace:
[1] _error(message::String; kwargs::@Kwargs{hamiltonian_logdensity::Float64, logdensity::Float64, position::Vector{…}})
@ DynamicHMC ~/.julia/packages/DynamicHMC/gSe0t/src/utilities.jl:27
[2] local_log_acceptance_ratio
@ ~/.julia/packages/DynamicHMC/gSe0t/src/stepsize.jl:77 [inlined]
[3] warmup(sampling_logdensity::DynamicHMC.SamplingLogDensity{…}, stepsize_search::InitialStepsizeSearch, warmup_state::DynamicHMC.WarmupState{…})
@ DynamicHMC ~/.julia/packages/DynamicHMC/gSe0t/src/mcmc.jl:137
[4] #30
@ ~/.julia/packages/DynamicHMC/gSe0t/src/mcmc.jl:430 [inlined]
[5] BottomRF
@ ./reduce.jl:86 [inlined]
[6] afoldl(::Base.BottomRF{…}, ::Tuple{…}, ::InitialStepsizeSearch, ::TuningNUTS{…}, ::TuningNUTS{…}, ::TuningNUTS{…}, ::TuningNUTS{…}, ::TuningNUTS{…}, ::TuningNUTS{…}, ::TuningNUTS{…})
@ Base ./operators.jl:544
[7] _foldl_impl
@ ./reduce.jl:68 [inlined]
[8] foldl_impl
@ ./reduce.jl:48 [inlined]
[9] mapfoldl_impl
@ ./reduce.jl:44 [inlined]
[10] mapfoldl
@ ./reduce.jl:175 [inlined]
[11] foldl
@ ./reduce.jl:198 [inlined]
[12] _warmup
@ ~/.julia/packages/DynamicHMC/gSe0t/src/mcmc.jl:428 [inlined]
[13] #mcmc_keep_warmup#32
@ ~/.julia/packages/DynamicHMC/gSe0t/src/mcmc.jl:505 [inlined]
[14] initialstep(rng::TaskLocalRNG, model::DynamicPPL.Model{…}, spl::DynamicPPL.Sampler{…}, vi::DynamicPPL.TypedVarInfo{…}; kwargs::@Kwargs{…})
@ TuringDynamicHMCExt ~/.julia/packages/Turing/cH3wV/ext/TuringDynamicHMCExt.jl:81
[15] step(rng::TaskLocalRNG, model::DynamicPPL.Model{…}, spl::DynamicPPL.Sampler{…}; initial_params::Vector{…}, kwargs::@Kwargs{})
@ DynamicPPL ~/.julia/packages/DynamicPPL/E4kDs/src/sampler.jl:116
[16] step
@ ~/.julia/packages/DynamicPPL/E4kDs/src/sampler.jl:99 [inlined]
[17] macro expansion
@ ~/.julia/packages/AbstractMCMC/YrmkI/src/sample.jl:130 [inlined]
[18] macro expansion
@ ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:328 [inlined]
[19] macro expansion
@ ~/.julia/packages/AbstractMCMC/YrmkI/src/logging.jl:9 [inlined]
[20] mcmcsample(rng::TaskLocalRNG, model::DynamicPPL.Model{…}, sampler::DynamicPPL.Sampler{…}, N::Int64; progress::Bool, progressname::String, callback::Nothing, discard_initial::Int64, thinning::Int64, chain_type::Type, initial_state::Nothing, kwargs::@Kwargs{…})
@ AbstractMCMC ~/.julia/packages/AbstractMCMC/YrmkI/src/sample.jl:120
[21] sample(rng::TaskLocalRNG, model::DynamicPPL.Model{…}, sampler::DynamicPPL.Sampler{…}, N::Int64; chain_type::Type, resume_from::Nothing, initial_state::Nothing, kwargs::@Kwargs{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/E4kDs/src/sampler.jl:93
[22] sample
@ ~/.julia/packages/DynamicPPL/E4kDs/src/sampler.jl:83 [inlined]
[23] #sample#4
@ ~/.julia/packages/Turing/cH3wV/src/mcmc/Inference.jl:263 [inlined]
[24] sample
@ ~/.julia/packages/Turing/cH3wV/src/mcmc/Inference.jl:256 [inlined]
[25] #sample#3
@ ~/.julia/packages/Turing/cH3wV/src/mcmc/Inference.jl:253 [inlined]
[26] top-level scope
@ ~/workspace/BayenesianEstimationofSDE-Lotka-Volterra.jl:109
Some type information was truncated. Use `show(err)` to see complete types.

Somehow the sampler is choosing negative values for the starting point, which of course causes the density calculations to be non-finite, so choosing an initial step size is impossible. I suspect that this is also what is going on with the default NUTS sampler. When researching this error, I came across this issue in DiffEqBayes, which may be of relevance. Any insight as to why this might be happening?
Okay, so update: I noticed that the simulations from the sampled parameters in the model were not being run as an ensemble, but rather as a single simulation. Since the intention is to use all the trajectories together to compute the likelihood, I computed 1000 trajectories and then calculated the likelihood from them in a for-loop. Somehow, this change fixed the initial step size issue. I cannot figure out why this would be the case, since each observation in the loop should have the same types as in the original code -- one would expect that running an ensemble with one trajectory would therefore have the same output as the original code I ran. Here is the updated code I ran:

using Turing
using DifferentialEquations
using DifferentialEquations.EnsembleAnalysis
# Load StatsPlots for visualizations and diagnostics.
using StatsPlots
using LinearAlgebra
# Set a seed for reproducibility.
using Random
Random.seed!(14);
u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
function multiplicative_noise!(du, u, p, t)
x, y = u
du[1] = p[5] * x
return du[2] = p[6] * y
end
p = [1.5, 1.0, 3.0, 1.0, 0.1, 0.1]
function lotka_volterra!(du, u, p, t)
x, y = u
α, β, γ, δ = p
du[1] = dx = α * x - β * x * y
return du[2] = dy = δ * x * y - γ * y
end
prob_sde = SDEProblem(lotka_volterra!, multiplicative_noise!, u0, tspan, p)
ensembleprob = EnsembleProblem(prob_sde)
data = solve(ensembleprob, SOSRI(); saveat=0.1, trajectories=1000)
plot(EnsembleSummary(data))
# Take the mean of the 1000 simulations and add noise to the mean to simulate noisy observations
sdedata = reduce(hcat, timeseries_steps_mean(data).u) + 0.8 * randn(size(reduce(hcat, timeseries_steps_mean(data).u)))
# Plot simulation and noisy observations.
scatter!(data[1].t, sdedata'; color=[1 2], label="")
@model function fitlv_sde(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.3, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
γ ~ truncated(Normal(3.2, 0.25); lower=2.2, upper=4)
δ ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
ϕ1 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)
ϕ2 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)
# Simulate stochastic Lotka-Volterra model.
p = [α, β, γ, δ, ϕ1, ϕ2]
# Note: `remake` returns a new problem rather than mutating `prob`, so assign the result.
prob = remake(prob; p=p)
ensembleprob = EnsembleProblem(prob)
predicted = solve(ensembleprob, SOSRI(); p=p, saveat=0.1, trajectories=1000)
## I think this is outdated; in my code I've had to do Symbol(predicted.retcode) to get the same output
## I also don't know the best way to handle this with an ensemble simulation; should all the simulations be ignored if one did not compute successfully?
# Early exit if simulation could not be computed successfully.
# if predicted.retcode !== :Success
# Turing.@addlogprob! -Inf
# return nothing
# end
# Observations.
for j in 1:length(predicted)
for i in 1:length(predicted[j])
data[:, i] ~ MvNormal(predicted[j][i], σ^2 * I)
end
end
return nothing
end;
model_sde = fitlv_sde(sdedata, prob_sde)
chain_sde = sample(
model_sde,
NUTS(0.25),
5000;
initial_params=[1.5, 1.3, 1.2, 2.7, 1.2, 0.12, 0.1],
progress=false,
)
plot(chain_sde)

I don't have an output to show because the sampling takes a long time on my machine (it's been a few hours and it's still only at 8%), but it did give this little tidbit, which suggests the problem was solved:

┌ Info: Found initial step size
└ ϵ = 0.0015625

EDIT: Ah! Having reviewed my last two comments, I think I have figured it out. I made a change in the code that I thought was insignificant, but which was actually quite important. In the original example, there is a line to exit early if the simulation does not succeed:

# Early exit if simulation could not be computed successfully.
if predicted.retcode !== :Success
Turing.@addlogprob! -Inf
return nothing
end

However, the original way that return codes are handled by SciMLBase has been deprecated. The return codes are now an enum type, so the comparison `predicted.retcode !== :Success` is always true and the model always adds -Inf to the log probability, which explains why no valid starting point could be found. The check should instead be:

# Early exit if simulation could not be computed successfully.
if !SciMLBase.successful_retcode(predicted)
Turing.@addlogprob! -Inf
return nothing
end

By the way, how should this be handled with the ensemble simulation? In my updated version, I computed 1000 trajectories rather than only one (as is done in the original example). We can iterate over the 1000 trajectories and check for failed return codes, but if even just one out of a thousand fails, the entire set of simulations is thrown out. Is this a result we want?
Lovely stuff @gjgress!
Thank you for discovering this 🙏 Regarding many vs. a single trajectory, I believe we have the following scenarios:

Single trajectory: This is using a random likelihood, which is going to be invalid for something like NUTS if you don't also infer the noise used in the SDE solve (as @devmotion is doing in his PCN example). Hence, the example in the tutorial is incorrect. Of course we can also view it as a one-sample approximation to the multiple-trajectories approach, but this will have too high a variance to be a valid thing to use in NUTS (we're entering the territory of pseudo-marginal samplers).

Multiple trajectories: Here there are two things we can do: a) use the expectation of the trajectories as the mean of the likelihood, or b) compute the likelihood of the data under every simulated trajectory. It seems to me that what we want here is the latter.

So all in all, yes, I think you're correct that we want to use multiple trajectories and compute the likelihood for each of the realizations, i.e. your last example code :)
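To make the two options concrete, here is a rough sketch of what each would look like inside the @model from the updated example above; these are fragments rather than standalone code, `predicted`/`data`/`σ` are as in that example, and `timeseries_steps_mean` comes from DifferentialEquations.EnsembleAnalysis:

# (a) likelihood centred on the expectation (ensemble mean) of the trajectories
μ = reduce(hcat, timeseries_steps_mean(predicted).u)
for i in 1:size(μ, 2)
    data[:, i] ~ MvNormal(μ[:, i], σ^2 * I)
end

# (b) likelihood of the data under every simulated trajectory
# (the approach adopted in the updated example above)
for j in 1:length(predicted), i in 1:length(predicted[j])
    data[:, i] ~ MvNormal(predicted[j][i], σ^2 * I)
end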
Ideally we should reject all the trajectories if even one of them fails to solve (since this means that the particular realization of the parameters resulted in "invalid" trajectories). It might also be crucial to the adaptation of the parameters in something like NUTS. Note that you might also be able to speed this computation up significantly by using threads (if you're not already); a rough sketch follows.
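For what it's worth, an untested sketch of how the solve / early-exit part inside the model above could reject a parameter draw when any trajectory fails, and thread the ensemble solve. `EnsembleThreads()` and `SciMLBase.successful_retcode` are standard SciML APIs; the surrounding names mirror the updated example and are assumptions.

using SciMLBase  # for successful_retcode

predicted = solve(
    ensembleprob, SOSRI(), EnsembleThreads();  # parallelise the trajectory solves
    p=p, saveat=0.1, trajectories=1000,
)

# Reject the whole parameter draw if any single trajectory failed to solve.
if !all(SciMLBase.successful_retcode(sol) for sol in predicted)
    Turing.@addlogprob! -Inf
    return nothing
end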
@torfjelde Thanks for the insight on the statistical side of things! Glad to hear that my approach works best here after all. With all this said, is there anything in my last example that ought to be changed for the documentation (aside from the comments I put here and there)? If it is fine, I can submit a pull request for the documentation. I'll also regenerate the plots that were used in the example.
By the way, are there any pseudo-marginal samplers available in Turing or related libraries?
This was my conclusion as well, but I don't have much experience with this type of problem, so I wanted to be sure 👍
I already am for my project, but I wanted to reproduce the code exactly as written when I was testing this example. I may use threads when generating the plots for the documentation.
The SDE tutorial has been extracted into a new tutorial and converted to a Quarto notebook. Feel free to push changes to the PR TuringLang/docs#446
I found the 'entire trajectory simulation from prior'-based inference used in the tutorial's SDE example quite unusual from a statistical signal processing perspective. Generally, the variance of the SDE's prior can become unboundedly large at large time steps (unless the process is stationary). This means that simulating the entire trajectory can result in inefficient samples that are very far from the observations at large time steps.

To address this issue, we usually use sequential Monte Carlo methods when the parameter is fixed. This will simulate/infer p(trajectories | data, parameter) incrementally, point by point or piece by piece, and produce an (unnormalised) likelihood approximation that can be used for correcting the parameter.

Generally, an SDE defines a state-space model prior for time series, making the parameter estimation of an SDE a static parameter estimation problem in state-space models. For a review, see "On Particle Methods for Parameter Estimation in State-Space Models". This is also precisely the problem that PMCMC was designed to address. A canonical PMCMC should produce reliable results for this task, albeit perhaps inefficiently.
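For illustration, here is a rough, untested sketch of the kind of bootstrap particle filter described above, estimating log p(data | θ, σ) for the stochastic Lotka-Volterra model with a simple Euler-Maruyama discretisation. All names (`euler_maruyama_step`, `log_likelihood_pf`, ...) are illustrative and not part of the tutorial or of Turing.

using Distributions, LinearAlgebra, Statistics

# One Euler-Maruyama step of the stochastic Lotka-Volterra model.
function euler_maruyama_step(u, θ, dt)
    α, β, γ, δ, ϕ1, ϕ2 = θ
    x, y = u
    dW = sqrt(dt) .* randn(2)
    return [x + (α * x - β * x * y) * dt + ϕ1 * x * dW[1],
            y + (δ * x * y - γ * y) * dt + ϕ2 * y * dW[2]]
end

# Bootstrap particle filter: propagate particles through the SDE prior between
# observation times, weight them by the observation density, and resample.
# For simplicity, particles are propagated by `saveat` before each data column
# is scored.
function log_likelihood_pf(data, θ, σ; u0=[1.0, 1.0], dt=0.01, saveat=0.1, n_particles=500)
    particles = [copy(u0) for _ in 1:n_particles]
    substeps = round(Int, saveat / dt)
    logZ = 0.0
    for i in 1:size(data, 2)
        for k in 1:n_particles, _ in 1:substeps
            particles[k] = euler_maruyama_step(particles[k], θ, dt)
        end
        logw = [logpdf(MvNormal(particles[k], σ^2 * I), data[:, i]) for k in 1:n_particles]
        m = maximum(logw)
        w = exp.(logw .- m)
        logZ += m + log(mean(w))                                          # incremental evidence
        particles = particles[rand(Categorical(w ./ sum(w)), n_particles)]  # resample
    end
    return logZ
end

A pseudo-marginal/PMCMC sampler would plug this noisy but unbiased likelihood estimate into a Metropolis-Hastings update over θ; for the actual tutorial one would solve the SDE with SOSRI() between observation times rather than hand-rolling Euler-Maruyama.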
This section is broken (again?) https://turinglang.org/v0.31/tutorials/10-bayesian-differential-equations/#inference-of-a-stochastic-differential-equation
The plot in the tutorials shows no sampling (the chains are just flat lines).
Running the section gives this warning (over and over):
The summary is just the initial parameters: