*Bounty: 50*

I’m re-reading some of the early chapters of Pearl’s seminal *Causality*, and I’m realizing that I can’t come up with more than two good examples of (probability distribution, Bayesian network) pairs that fail as (probability distribution, **Causal** Bayesian network) pairs.

From Pearl, the formal definition of a **Causal** Bayesian Network is:

A DAG $G$ is said to be a causal Bayesian network compatible with [the set of all intervention distributions] $\mathbf{P}_*$ if and only if the following three conditions hold for every $P_x \in \mathbf{P}_*$:

(i) $P_x(v)$ is Markov relative to $G$;

(ii) $P_x(v_i \mid \text{pa}_i) = 1$ for all $V_i \in X$ whenever $v_i$ is consistent with $X = x$;

(iii) $P_x(v_i \mid \text{pa}_i) = P(v_i \mid \text{pa}_i)$ for all $V_i \notin X$ whenever $\text{pa}_i$ is consistent with $X = x$, i.e. each $P(v_i \mid \text{pa}_i)$ remains invariant to interventions not involving $V_i$.
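
For concreteness, here is a minimal numerical sketch of how conditions (ii) and (iii) play out under the truncated factorization, using a made-up two-variable chain $X \rightarrow Y$ (the CPT numbers below are arbitrary assumptions of mine, not from Pearl):

```python
# Toy chain X -> Y with made-up conditional probability tables.
p_x = {0: 0.5, 1: 0.5}                                    # P(x)
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(y | x)

def p(x, y):
    """Observational joint P(x, y), factorized along the graph X -> Y."""
    return p_x[x] * p_y_given_x[x][y]

def p_do_x(x_star, x, y):
    """Truncated factorization for do(X = x_star): drop P(x), keep P(y | x)."""
    return (1.0 if x == x_star else 0.0) * p_y_given_x[x][y]

# (ii): the intervened variable takes its set value with probability 1.
assert abs(sum(p_do_x(1, 1, y) for y in (0, 1)) - 1.0) < 1e-12
# (iii): P(y | x) is untouched by an intervention that doesn't involve Y.
assert p_do_x(1, 1, 1) == p_y_given_x[1][1]
```

Both assertions pass here because the graph actually is causal for this toy model; the counter-examples are cases where no factorization over the postulated graph can satisfy all three conditions at once.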

I’ve only come up with two potential counter-examples.

The first is the following: say we have $X$, which represents “clouds in the sky,” and $Y$, which represents “it’s raining.” Now, say we postulate a graph, $G$, in which $Y \rightarrow X$. In words, “rain causes clouds.”

In order to satisfy criterion (iii) in the above definition, $P_{\text{do}(Y = 1)}(X = 1 \mid Y = 1)$ must equal $P(X = 1 \mid Y = 1)$. However, since rain does not in fact cause clouds, were we truly able to intervene on rain, we’d find that $P_{\text{do}(Y = 1)}(X = 1 \mid Y = 1)$ would just equal $P(X = 1)$. Thus, as our intuition would lead us to believe, the graph $G$ that represents “rain causes clouds” does not qualify as a **Causal** Bayesian Network.
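
A quick numerical check of that violation, with made-up probabilities for the true clouds-cause-rain model (the 0.5, 0.8, 0.1 below are arbitrary assumptions):

```python
# True model: clouds -> rain, with assumed (made-up) probabilities.
p_clouds = 0.5                          # P(X = 1), clouds in the sky
p_rain_given_clouds = {1: 0.8, 0: 0.1}  # P(Y = 1 | X)

# Observational quantities implied by the true model.
p_rain = (p_clouds * p_rain_given_clouds[1]
          + (1 - p_clouds) * p_rain_given_clouds[0])          # P(Y = 1)
p_clouds_given_rain = p_clouds * p_rain_given_clouds[1] / p_rain  # Bayes' rule

# The proposed graph Y -> X ("rain causes clouds") needs, by (iii):
#   P_do(Y=1)(X = 1 | Y = 1) == P(X = 1 | Y = 1)
# but truly intervening on rain leaves the clouds alone:
p_clouds_do_rain = p_clouds

print(f"P(X=1 | Y=1)   = {p_clouds_given_rain:.3f}")  # 0.889
print(f"P_do(Y=1)(X=1) = {p_clouds_do_rain:.3f}")     # 0.500
assert p_clouds_given_rain != p_clouds_do_rain        # criterion (iii) fails
```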

My second example, of which I’m less sure, is the following: say we want to know the effect of some treatment ($X$), e.g. vitamins vs. no vitamins, on some health marker ($Y$). In order to do so, we’re going to run a randomized controlled trial, which will give us the intervention distribution for $\text{do}(X)$ (for both possible values of $X$). Technically, we can deal with this by modeling the treatment assignment as a variable separate from the treatment actually taken (as is often done in instrumental-variable analyses). However, say we instead model our experiment with a three-variable (rather than four-variable) graph $X \rightarrow Y \leftarrow U$ (treatment $X$, outcome $Y$, unobserved confounder $U$) that conflates treatment with treatment assignment. Then, because some participants assigned $x$ won’t actually take $x$ (imperfect compliance), the intervention distribution generated by the trial has $P_{\text{do}(X = x)}(X = x) < 1$, violating criterion (ii).
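
A small simulation of that criterion (ii) violation, with an assumed (made-up) 80% compliance rate in the trial arm we’d be labeling $\text{do}(X = 1)$:

```python
import random

random.seed(0)

# Hypothetical RCT arm assigned to vitamins. If we conflate the treatment
# actually taken (X) with the assignment, this arm plays the role of the
# "do(X = 1)" intervention distribution.
n = 100_000
p_comply = 0.8  # assumed compliance rate (made up)

x_taken = [1 if random.random() < p_comply else 0 for _ in range(n)]
p_x1 = sum(x_taken) / n

# Criterion (ii) demands P_do(X=1)(X = 1) = 1; with noncompliance it isn't.
print(f"P_do(X=1)(X = 1) ≈ {p_x1:.3f}")
assert p_x1 < 1.0
```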

I spent some time trying to generate other examples, in particular ones that violate criterion (i), but struggled to. Can others share more? I’d also love to have my second example validated (or invalidated)!