do operator / conditioning #5280
-
I like it, and we can definitely weigh the pros and cons of the different approaches. I don't think we should restrict ourselves to what is possible right now; we should also think about what we would like the ideal API to look like. For my money, I do like the last code snippet, where we don't even set an observed when defining the model:

# 1. Define the joint P(mu, sigma, x)
with pm.Model() as m:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    x = pm.Normal("obs", mu, sigma)

# 2. Generate N samples from P(x | mu=0, sigma=0.1)
with m:
    data = pm.sample_cond(var=x, mu=0, sigma=0.1, samples=100)
    # alternative 2: sample all RVs not specified - infer x
    data = pm.sample_cond(mu=0, sigma=0.1, samples=100)
    # alternative 3
    data = pm.sample_prior_predictive(mu=0, sigma=0.1, samples=100)

# 3. Sample from the posterior
with m:
    # idea 1
    idata = pm.sample(observed={x: data})
    # idea 2
    m.set_observed(x=data)  # or x.set_observed(data)
    idata = pm.sample()
    # idea 3: do posterior inference with sample_cond. Do we still need
    # pm.sample()? Related to alternative 2 above; in this case, we might
    # want to just add the conditioning to pm.sample() directly.
    idata = pm.sample_cond(x=data, samples=1000)
-
Wanted to contribute to the discussion a notebook on counterfactuals. I think I have it done right, though if others are better versed in the topic than I am, I'd be open to correction. https://github.com/ericmjl/causality/blob/master/docs/07-do-operator.ipynb
-
@ericmjl had another really neat idea:

counterfactual_model = model.do(x=3)
with counterfactual_model:
    pm.sample()

Could also do stochastic do operators:

counterfactual_model = model.do(x=pm.Normal("x", mu=3))

I think what this is really doing though is just replacing RVs, so maybe the more apt name would be:
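To make the "just replacing RVs" point concrete, here is a minimal sketch of that replacement done directly at the graph level with aesara.clone_replace. This only illustrates the underlying mechanism; the model and variable names are made up, and it is not the proposed model.do API.

import aesara
import aesara.tensor as at
import pymc as pm

with pm.Model() as m:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    x = pm.Normal("x", mu, sigma)

# "do(mu=3)" as graph surgery: clone x's graph with the mu RV swapped out
# for a constant, which also cuts mu's incoming edges in the cloned graph
x_do = aesara.clone_replace(x, replace={mu: at.constant(3.0)})

A model.do() that returns a new model would presumably need to apply this kind of replacement to every variable downstream of the intervened node.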
-
I like
-
So this is either a genius idea or it isn't... We can replace a node with a constant, or a stochastic, and cut the incoming edges. But does it also make sense to be able to do more advanced graph surgery? We should be able to surgically attach an entire model into another model, right?

What I'm thinking of is kind of inspired by human cognition. Let's say we go to school and learn about the causal structure of Interesting Thing A. The next day we go to school and learn about the causal structure of Interesting Thing B. Then, if it is pointed out to us that Interesting Things A and B are related (e.g. they share a node), we can combine our understanding into a larger causal graph.

This could feasibly be a big deal. If we were interested not just in parameter estimation of a proposed model but also in causal discovery, then the ability to focus in on sub-problems would make the overall problem much more tractable. Kind of speculating here, but can't we think of this sort of thing happening when we revise our beliefs about the causal structure of the world?

Anyway... while we are in brainstorming mode, I'd just like to put forward the proposal that we could replace a node (with the do operator) with an entire model and get back a new, combined model.
-
Replacing a node with a model: isn't that similar to a stochastic node? We do no inference there and only query the model about our hypothesis. If you had a model, you could pass in the random variable from its idata posterior. Is that what you mean? The result would combine estimated model 1 with estimated model 2 into a counterfactual trace with predictions, i.e. yet another idata.
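A rough sketch of what "passing a node from another model's idata posterior" could look like in practice; this is purely illustrative, and the stand-in samples, variable names, and the use of an interpolated empirical prior are my assumptions.

import numpy as np
import pymc as pm
from scipy import stats

# Stand-in for posterior draws of the shared node from a previously fitted
# model, e.g. idata.posterior["shared_node"].values.flatten()
samples = np.random.normal(1.0, 0.2, size=2000)

# Turn the draws into an empirical (interpolated) prior for the second model
kde = stats.gaussian_kde(samples)
x_points = np.linspace(samples.min(), samples.max(), 200)
pdf_points = kde(x_points)

with pm.Model() as model2:
    shared_node = pm.Interpolated("shared_node", x_points, pdf_points)
    # ... build the rest of model 2 on top of shared_node ...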
-
Here are a few citations that discuss weak/unreliable/uncertain interventions. These include scenarios in which the causal intervention is successful only to some extent (e.g., a doctor recommending a lifestyle change to a patient) and scenarios in which the intervention impacts the observed variables in unknown ways (e.g., a new drug that may impact one or many different genes). In all cases, the causal interventions can be modeled by augmenting the causal graph (e.g., an "intervention" node added to the graph) in ways that are more elaborate than the traditional do-operator implies.
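A hedged sketch of what such an augmented graph could look like in PyMC, with an explicit node for whether the intervention actually took hold; the names and the particular mixture structure here are illustrative assumptions, not taken from the citations.

import pymc as pm

with pm.Model() as m:
    # How reliable the intervention is (unknown, to be inferred from data)
    p_effective = pm.Beta("p_effective", 2, 2)
    # Explicit "intervention" node: did the intervention actually take hold?
    intervened = pm.Bernoulli("intervened", p_effective)
    # Natural (unintervened) mechanism for x's mean
    mu_natural = pm.Normal("mu_natural", 0, 1)
    # Value the intervention tries to set x's mean to
    mu_target = 3.0
    # x follows the target if the intervention took hold, otherwise its
    # natural mechanism; a hard do() would always use mu_target
    x = pm.Normal("x", mu=pm.math.switch(intervened, mu_target, mu_natural), sigma=1)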
-
Desired functionality
It would be pretty cool if you could:
1. define a joint distribution,
2. generate samples from it while conditioning some of its variables on chosen values, and
3. run inference on the remaining variables, conditioning on the data generated in step 2.
Below I have summarised some approaches and relevant information. This is doable now, but it would be nice if it became more of an in-built language feature with a smoother and simpler API. Thoughts and ideas very welcome.
An existing solution
Chatting with @ricardoV94 and @lucianopaz, we came up with the following code that works (in v4) with a simple example:
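The exact snippet is not shown here; a rough sketch of the kind of model-factory pattern described below might look like this (the function name, signature, and details are illustrative assumptions, not the original code):

import pymc as pm

def model_factory(mu=None, sigma=None, observed_x=None):
    """Joint P(mu, sigma, x), optionally fixing mu/sigma or observing x."""
    with pm.Model() as model:
        if mu is None:
            mu = pm.Normal("mu", 0, 1)
        if sigma is None:
            sigma = pm.HalfNormal("sigma", 1)
        pm.Normal("x", mu, sigma, observed=observed_x)
    return model

# Step 2: condition on mu=0, sigma=0.1 and draw 100 samples of x
with model_factory(mu=0.0, sigma=0.1):
    prior = pm.sample_prior_predictive(100)
data = prior.prior["x"].values.flatten()

# Step 3: condition on the generated data and infer mu and sigma
with model_factory(observed_x=data):
    idata = pm.sample()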
This is already pretty neat. You can relatively concisely define a joint distribution (or rather a function which returns a model) in step 1, condition on chosen parameter values to generate data in step 2, then in step 3 run inference, conditioning on the data generated in step 2.
While the use of the model factory is pretty simple, it would not necessarily be obvious to a newcomer to PyMC. So while it is pretty concise (and neat!), I don't think it represents the ideal API.
An alternative approach
@ricardoV94 also suggested this approach. I favour this one a little less, as it requires you to pre-commit to N at the time of defining the joint distribution.
Alternative approaches for step 2
@ricardoV94 also suggested aesara.function([mu, sigma], x), which can be used for steps 1 and 2.
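A minimal sketch of that suggestion is below; it is illustrative only (the model is made up), and note that repeated fresh draws would also require the random number generators to be updated between calls.

import aesara
import pymc as pm

with pm.Model() as m:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    x = pm.Normal("x", mu, sigma)

# Treat mu and sigma as graph inputs (cutting out their own priors) and
# compile a function that returns a draw of x for concrete parameter values
draw_x = aesara.function([mu, sigma], x)
draw_x(0.0, 0.1)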
do operator?
@ericmjl raised the point that step 2, where we condition on given values, is basically the do operator from Pearl/causal inference. So the API for step 2 could be called conditioning, or something like do, to make the link with causal inference stronger. I think this would be pretty cool!