A Manual for Reading Independence

1 Marginal Independence

Let us consider two random variables X,Y. There are three equivalent ways to define the following marginal independence:

X ⊥ Y

read as X is independent of Y.

1.1 Via joint probability

P(X,Y)=P(X)P(Y)

1.2 Via conditional probability

P(X|Y)=P(X)

1.3 Via conditional probability the other way

P(Y|X)=P(Y)

Note that the equivalence between 1.2 and 1.3 implies that independence is symmetric: if Y carries no information about X, then X carries no information about Y (and, on the flip side, if X can influence Y, then Y can also influence X). So we can say that the three definitions above mean the same thing.
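To make this concrete, here is a minimal Python sketch that builds a joint distribution as a product of marginals and checks definitions 1.1 and 1.2 numerically. The marginal probabilities below are made-up numbers, chosen only for illustration:

```python
import itertools

# Made-up marginals for two binary variables X and Y.
p_x = {0: 0.3, 1: 0.7}
p_y = {0: 0.6, 1: 0.4}
# Construct the joint as a product of marginals, so X and Y are independent.
joint = {(x, y): p_x[x] * p_y[y] for x, y in itertools.product(p_x, p_y)}

def marginal(joint, axis):
    """Sum out one variable: axis=0 keeps X, axis=1 keeps Y."""
    m = {}
    for (x, y), p in joint.items():
        k = x if axis == 0 else y
        m[k] = m.get(k, 0.0) + p
    return m

mx, my = marginal(joint, 0), marginal(joint, 1)

# Definition 1.1: P(X, Y) = P(X) P(Y) for every (x, y).
assert all(abs(joint[x, y] - mx[x] * my[y]) < 1e-12 for x, y in joint)

# Definition 1.2: P(X | Y = y) = P(X) for every y with P(Y = y) > 0.
for y in p_y:
    for x in p_x:
        p_x_given_y = joint[x, y] / my[y]
        assert abs(p_x_given_y - mx[x]) < 1e-12
```

Definition 1.3 would be checked the same way with the roles of X and Y swapped.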

2 Conditional Independence

Let us consider three random variables X,Y,Z. Analogously, there are three equivalent ways to define the following conditional independence:

X ⊥ Y | Z

read as X is independent of Y given Z.

2.1 Via joint probability

P(X,Y|Z)=P(X|Z)P(Y|Z)

There is also another way of saying this, which will come in useful in probabilistic graphical models:

P(X,Y,Z) ∝ f1(X,Z) f2(Y,Z)

This is to say that the joint probability of X, Y, Z is proportional to the product of two factors f1(X,Z) and f2(Y,Z), neither of which involves both X and Y.
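As a sanity check, the following Python sketch builds a distribution proportional to f1(X,Z) f2(Y,Z) and verifies that definition 2.1 then holds. The factors f1 and f2 are arbitrary made-up nonnegative functions; any such choice works:

```python
import itertools

# Hypothetical nonnegative factors over binary variables.
f1 = {(x, z): 1.0 + x + 2 * z for x in (0, 1) for z in (0, 1)}   # f1(X, Z)
f2 = {(y, z): 2.0 + 3 * y * z for y in (0, 1) for z in (0, 1)}   # f2(Y, Z)

# P(X, Y, Z) proportional to f1(X, Z) * f2(Y, Z); normalize to a distribution.
unnorm = {(x, y, z): f1[x, z] * f2[y, z]
          for x, y, z in itertools.product((0, 1), repeat=3)}
total = sum(unnorm.values())
p = {k: v / total for k, v in unnorm.items()}

# Check definition 2.1: P(X, Y | Z = z) = P(X | Z = z) * P(Y | Z = z).
for z in (0, 1):
    p_z = sum(v for (x, y, zz), v in p.items() if zz == z)
    for x, y in itertools.product((0, 1), repeat=2):
        p_xy_given_z = p[x, y, z] / p_z
        p_x_given_z = sum(p[x, yy, z] for yy in (0, 1)) / p_z
        p_y_given_z = sum(p[xx, y, z] for xx in (0, 1)) / p_z
        assert abs(p_xy_given_z - p_x_given_z * p_y_given_z) < 1e-12
```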

2.2 Via conditional probability

P(X|Y,Z)=P(X|Z)

The story behind this expression is: if I already know Z, then Y gives me no additional information that changes my probability of X.

2.3 Via conditional probability the other way

P(Y|X,Z)=P(Y|Z)

The story behind this expression is: if I already know Z, then X gives me no additional information that changes my probability of Y.

Note that the equivalence between 2.2 and 2.3 likewise implies that probabilistic influence is bidirectional: if X can influence Y under some condition, then Y can also influence X under the same condition.
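These readings can be verified numerically. Below is a minimal Python sketch of a made-up model in which Z influences both X and Y (all conditional probability tables are invented for illustration), checking that P(X | Y, Z) = P(X | Z):

```python
import itertools

# A made-up model: Z influences both X and Y independently.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}  # keyed (x, z)
p_y_given_z = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.6}  # keyed (y, z)

joint = {(x, y, z): p_z[z] * p_x_given_z[x, z] * p_y_given_z[y, z]
         for x, y, z in itertools.product((0, 1), repeat=3)}

# Definition 2.2: P(X = x | Y = y, Z = z) = P(X = x | Z = z).
# Once Z is known, the extra observation of Y changes nothing about X.
for x, y, z in itertools.product((0, 1), repeat=3):
    p_yz = sum(joint[xx, y, z] for xx in (0, 1))
    p_x_given_yz = joint[x, y, z] / p_yz
    assert abs(p_x_given_yz - p_x_given_z[x, z]) < 1e-12
```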

3 Reading independence from probabilistic graphical models

3.1 Fundamentals

We have defined marginal and conditional independence. Now we want to read independence in the language of probabilistic graphical models. The following table covers six fundamental graphical structures and the independence relationships they imply.

| # | Probabilistic Graphical Model | Marginal Independence | Conditional Independence |
|---|-------------------------------|-----------------------|--------------------------|
| 1 | X → Y | X ⊥̸ Y | – |
| 2 | X ← Y | X ⊥̸ Y | – |
| 3 | descendant: X → W → Y | X ⊥̸ Y | X ⊥ Y given W |
| 4 | descendant: X ← W ← Y | X ⊥̸ Y | X ⊥ Y given W |
| 5 | common parent: X ← W → Y | X ⊥̸ Y | X ⊥ Y given W |
| 6 | collision: X → W ← Y | X ⊥ Y | X ⊥̸ Y given W |

The first five rows of the table above make intuitive sense. The one graphical structure where people make mistakes the most (okay, where I make mistakes the most) is the "collision": X → W ← Y. When both X and Y influence W, observing W makes X and Y dependent: knowing X then tells us something about Y, a phenomenon known as "explaining away".
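The collision case can be reproduced exactly with a made-up toy example: two independent fair coins X and Y, and their common child W = X OR Y. Before W is observed, X tells us nothing about Y; once W is observed, learning Y "explains away" some of our belief about X:

```python
from fractions import Fraction
import itertools

half = Fraction(1, 2)
# X and Y are independent fair coins; W = X OR Y is their common child.
# Keys are (x, y, w) tuples.
joint = {(x, y, x | y): half * half for x, y in itertools.product((0, 1), repeat=2)}

def cond(joint, query, given):
    """P(query | given), with events given as {position: value} over (x, y, w)."""
    match = lambda k, ev: all(k[i] == v for i, v in ev.items())
    num = sum(p for k, p in joint.items() if match(k, {**given, **query}))
    den = sum(p for k, p in joint.items() if match(k, given))
    return num / den

# Marginally, X tells us nothing about Y...
assert cond(joint, {1: 1}, {0: 1}) == half            # P(Y=1 | X=1) = 1/2
# ...but once the collision node W is observed, X and Y become dependent:
assert cond(joint, {0: 1}, {2: 1}) == Fraction(2, 3)  # P(X=1 | W=1) = 2/3
assert cond(joint, {0: 1}, {2: 1, 1: 1}) == half      # P(X=1 | W=1, Y=1) = 1/2
```

Observing Y = 1 fully explains W = 1, so the belief in X = 1 drops back from 2/3 to its prior 1/2.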

3.2 Applying to larger models

Probabilistic graphical models in practice are much larger than just three random variables. If we face a large graph and are asked whether X and Y are independent given a set of observations, we take two steps.

  1. Find all trails that link X to Y.

  2. Judge whether any of these trails are active. A trail is active only if every intermediate node on it satisfies:

    • For collision nodes (like W in X → W ← Y), the collision node itself or one of its descendants is observed.

    • For non-collision nodes, they are not observed.

If one or more active trails exist, X will be dependent on Y. If all trails are inactive, X will be independent of Y.
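The two-step procedure above can be sketched in Python. The graph, node names, and helper functions below are all hypothetical; the code enumerates simple trails in the undirected skeleton and applies the two activity rules to each:

```python
def trails(edges, src, dst):
    """Step 1: all simple trails from src to dst, ignoring edge direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        for n in nbrs.get(node, ()):
            if n not in path:
                stack.append((n, path + [n]))

def descendants(edges, node):
    """All nodes reachable from `node` by following directed edges."""
    out, frontier = set(), {node}
    while frontier:
        frontier = {b for a, b in edges if a in frontier} - out
        out |= frontier
    return out

def is_active(edges, trail, observed):
    """Step 2: a trail is active iff every collision node on it (or one of its
    descendants) is observed, and no non-collision node on it is observed."""
    for prev, node, nxt in zip(trail, trail[1:], trail[2:]):
        collision = (prev, node) in edges and (nxt, node) in edges
        if collision:
            if node not in observed and not (descendants(edges, node) & observed):
                return False
        elif node in observed:
            return False
    return True

def independent(edges, x, y, observed):
    return not any(is_active(edges, t, observed) for t in trails(edges, x, y))

# Hypothetical graph: X -> W <- Y (collision), plus W -> D (a descendant of W).
edges = {("X", "W"), ("Y", "W"), ("W", "D")}
assert independent(edges, "X", "Y", set())          # marginally independent
assert not independent(edges, "X", "Y", {"W"})      # observing W activates the trail
assert not independent(edges, "X", "Y", {"D"})      # so does observing a descendant
```

This is the classical d-separation criterion; enumerating trails is fine for small graphs, though real libraries use a linear-time reachability pass instead.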

We discriminate between collision nodes and non-collision nodes because only the collision node behaves differently across rows 3-6 of our table. For probabilistic influence to flow, we need to activate collisions by observing them, and avoid the cut caused by observing a non-collision node.

The point I tend to forget is that observing a descendant of a collision node also activates the collision. This is because observing a node's descendant indirectly observes the node itself. This reasoning uses rows 3, 4, and 6 of the table at once, which makes it quite subtle.

3.3 Markov blanket

The Markov blanket of a random variable X is a set of random variables S such that, given S, X is independent of all other random variables. Markov blankets come in useful because, when one wants to infer X with some observations available, it is enough to consider only the Markov blanket. Information about random variables outside the Markov blanket is useless for inferring X.

The Markov boundary of a random variable X in a graph consists of:

  1. All parents of X;

  2. All children of X;

  3. All the other parents of the children of X (sometimes called spouses, because they share a common child node with X).

It is obvious that we need all parents and children of X. We also need the spouse nodes because they form a collision: spouse → child ← X. Given the child, X is not independent of the spouse. Hence, spouse nodes are also needed to safely blanket X.

It seems like a mouthful to say "other parents of the children of X". An easy but informal way I remember it is to think of the Markov blanket as someone who is a bit heartless toward siblings: the blanket includes the other parents of X's children (spouses) but not the other children of X's parents (siblings).
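The three-part recipe above is easy to code. Here is a minimal Python sketch; the graph and node names are made up for illustration, including a sibling node to show that siblings are indeed left out:

```python
def markov_blanket(edges, x):
    """Parents of x, children of x, and the children's other parents (spouses)."""
    parents = {a for a, b in edges if b == x}
    children = {b for a, b in edges if a == x}
    spouses = {a for a, b in edges if b in children and a != x}
    return parents | children | spouses

# Hypothetical graph: P -> X -> C, S -> C, and P -> Sib (Sib is X's sibling).
edges = {("P", "X"), ("X", "C"), ("S", "C"), ("P", "Sib")}

# Parent P, child C, and spouse S are in; sibling Sib is heartlessly excluded.
assert markov_blanket(edges, "X") == {"P", "C", "S"}
```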

4 Remark

Usually, visualization is more intuitive than text, tables, and equations. However, I find probabilistic graphical models are not that kind of visualization: to me, they are not intuitive and are easy to misunderstand. What is more intuitive is actually the textual story behind them. Then I realized that probabilistic graphical models are not graph visualizations. They are models!

Therefore, I now think of probabilistic graphical models as yet another mathematical notation. Like any other funny-looking math notation, they are not intuitive at first sight, and it takes practice to get them right. So let us kindly grant ourselves patience. 🙃

© Yedi Zhang | Last updated: March 2023