A Manual for Reading Independence

1 Marginal Independence

Let us consider two random variables X,Y. There are three equivalent ways to define the following marginal independence:

X ⊥ Y

read as X is independent of Y.

1.1 Via joint probability

P(X,Y)=P(X)P(Y)

1.2 Via conditional probability

P(X|Y)=P(X)

1.3 Via conditional probability the other way

P(Y|X)=P(Y)

Note that the equivalence between 1.2 and 1.3 implies that independence is symmetric: if Y carries no information about X, then X carries no information about Y (and, on the flip side, if X can influence Y, then Y can also influence X). So we can say that the three definitions above mean the same thing.
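To make this concrete, here is a minimal Python sketch that builds a joint distribution as a product of marginals and checks definitions 1.1 and 1.2 numerically. The marginal probabilities below are made-up numbers, chosen only for illustration:

```python
import itertools

# Made-up marginals for two binary variables X and Y.
p_x = {0: 0.3, 1: 0.7}
p_y = {0: 0.6, 1: 0.4}
# Construct the joint as a product of marginals, so X and Y are independent.
joint = {(x, y): p_x[x] * p_y[y] for x, y in itertools.product(p_x, p_y)}

def marginal(joint, axis):
    """Sum out one variable: axis=0 keeps X, axis=1 keeps Y."""
    m = {}
    for (x, y), p in joint.items():
        k = x if axis == 0 else y
        m[k] = m.get(k, 0.0) + p
    return m

mx, my = marginal(joint, 0), marginal(joint, 1)

# Definition 1.1: P(X, Y) = P(X) P(Y) for every (x, y).
assert all(abs(joint[x, y] - mx[x] * my[y]) < 1e-12 for x, y in joint)

# Definition 1.2: P(X | Y = y) = P(X) for every y with P(Y = y) > 0.
for y in p_y:
    for x in p_x:
        p_x_given_y = joint[x, y] / my[y]
        assert abs(p_x_given_y - mx[x]) < 1e-12
```

Definition 1.3 would be checked the same way with the roles of X and Y swapped.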

2 Conditional Independence

Let us consider three random variables X,Y,Z. Analogously, there are three equivalent ways to define the following conditional independence:

X ⊥ Y | Z

read as X is independent of Y given Z.

2.1 Via joint probability

P(X,Y|Z)=P(X|Z)P(Y|Z)

There is also another way of saying this, which will come in useful in probabilistic graphical models:

P(X,Y,Z) ∝ f1(X,Z) f2(Y,Z)

This is to say that the joint probability of X, Y, Z is proportional to the product of two factors f1(X,Z) and f2(Y,Z), neither of which involves both X and Y.
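As a sanity check, the following Python sketch builds a distribution proportional to f1(X,Z) f2(Y,Z) and verifies that definition 2.1 then holds. The factors f1 and f2 are arbitrary made-up nonnegative functions; any such choice works:

```python
import itertools

# Hypothetical nonnegative factors over binary variables.
f1 = {(x, z): 1.0 + x + 2 * z for x in (0, 1) for z in (0, 1)}   # f1(X, Z)
f2 = {(y, z): 2.0 + 3 * y * z for y in (0, 1) for z in (0, 1)}   # f2(Y, Z)

# P(X, Y, Z) proportional to f1(X, Z) * f2(Y, Z); normalize to a distribution.
unnorm = {(x, y, z): f1[x, z] * f2[y, z]
          for x, y, z in itertools.product((0, 1), repeat=3)}
total = sum(unnorm.values())
p = {k: v / total for k, v in unnorm.items()}

# Check definition 2.1: P(X, Y | Z = z) = P(X | Z = z) * P(Y | Z = z).
for z in (0, 1):
    p_z = sum(v for (x, y, zz), v in p.items() if zz == z)
    for x, y in itertools.product((0, 1), repeat=2):
        p_xy_given_z = p[x, y, z] / p_z
        p_x_given_z = sum(p[x, yy, z] for yy in (0, 1)) / p_z
        p_y_given_z = sum(p[xx, y, z] for xx in (0, 1)) / p_z
        assert abs(p_xy_given_z - p_x_given_z * p_y_given_z) < 1e-12
```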

2.2 Via conditional probability

P(X|Y,Z)=P(X|Z)

The story behind this expression is: if I already know Z, then Y gives me no additional information that changes my probability of X.

2.3 Via conditional probability the other way

P(Y|X,Z)=P(Y|Z)

The story behind this expression is: if I already know Z, then X gives me no additional information that changes my probability of Y.

Note that the equivalence between 2.2 and 2.3 likewise implies that probabilistic influence is bidirectional: if X can influence Y under some condition, then Y can also influence X under the same condition.
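These readings can be verified numerically. Below is a minimal Python sketch of a made-up model in which Z influences both X and Y (all conditional probability tables are invented for illustration), checking that P(X | Y, Z) = P(X | Z):

```python
import itertools

# A made-up model: Z influences both X and Y independently.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}  # keyed (x, z)
p_y_given_z = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.6}  # keyed (y, z)

joint = {(x, y, z): p_z[z] * p_x_given_z[x, z] * p_y_given_z[y, z]
         for x, y, z in itertools.product((0, 1), repeat=3)}

# Definition 2.2: P(X = x | Y = y, Z = z) = P(X = x | Z = z).
# Once Z is known, the extra observation of Y changes nothing about X.
for x, y, z in itertools.product((0, 1), repeat=3):
    p_yz = sum(joint[xx, y, z] for xx in (0, 1))
    p_x_given_yz = joint[x, y, z] / p_yz
    assert abs(p_x_given_yz - p_x_given_z[x, z]) < 1e-12
```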

3 Reading independence from probabilistic graphical models

3.1 Fundamentals

We have defined marginal and conditional independence. Now we want to read independence in the language of probabilistic graphical models. The following table covers six fundamental graphical structures and the independence relationships they imply.

| # | Probabilistic Graphical Model | Marginal Independence | Conditional Independence |
|---|-------------------------------|-----------------------|--------------------------|
| 1 | X → Y | X ⊥̸ Y | – |
| 2 | X ← Y | X ⊥̸ Y | – |
| 3 | descendant: X → W → Y | X ⊥̸ Y | X ⊥ Y given W |
| 4 | descendant: X ← W ← Y | X ⊥̸ Y | X ⊥ Y given W |
| 5 | common parent: X ← W → Y | X ⊥̸ Y | X ⊥ Y given W |
| 6 | collision: X → W ← Y | X ⊥ Y | X ⊥̸ Y given W |

The first five rows of the table above make intuitive sense. The one graphical structure where people make mistakes the most (okay, where I make mistakes the most) is the "collision": X → W ← Y. When both X and Y influence W, observing W makes X and Y dependent: knowing X then tells us something about Y, a phenomenon known as "explaining away".
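The collision case can be reproduced exactly with a made-up toy example: two independent fair coins X and Y, and their common child W = X OR Y. Before W is observed, X tells us nothing about Y; once W is observed, learning Y "explains away" some of our belief about X:

```python
from fractions import Fraction
import itertools

half = Fraction(1, 2)
# X and Y are independent fair coins; W = X OR Y is their common child.
# Keys are (x, y, w) tuples.
joint = {(x, y, x | y): half * half for x, y in itertools.product((0, 1), repeat=2)}

def cond(joint, query, given):
    """P(query | given), with events given as {position: value} over (x, y, w)."""
    match = lambda k, ev: all(k[i] == v for i, v in ev.items())
    num = sum(p for k, p in joint.items() if match(k, {**given, **query}))
    den = sum(p for k, p in joint.items() if match(k, given))
    return num / den

# Marginally, X tells us nothing about Y...
assert cond(joint, {1: 1}, {0: 1}) == half            # P(Y=1 | X=1) = 1/2
# ...but once the collision node W is observed, X and Y become dependent:
assert cond(joint, {0: 1}, {2: 1}) == Fraction(2, 3)  # P(X=1 | W=1) = 2/3
assert cond(joint, {0: 1}, {2: 1, 1: 1}) == half      # P(X=1 | W=1, Y=1) = 1/2
```

Observing Y = 1 fully explains W = 1, so the belief in X = 1 drops back from 2/3 to its prior 1/2.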

3.2 Applying to larger models

Probabilistic graphical models in practice are much larger than just three random variables. If we face a large graph and are asked whether X and Y are independent given a set of observations, we take two steps.

  1. Find all trails that link X to Y.

  2. Judge whether any of these trails are active. A trail is active only if every intermediate node on it satisfies:

    • For collision nodes (like W in X → W ← Y), the collision node itself or one of its descendants is observed.

    • For non-collision nodes, they are not observed.

If one or more active trails exist, X will be dependent on Y. If all trails are inactive, X will be independent of Y.
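The two-step procedure above can be sketched in Python. The graph, node names, and helper functions below are all hypothetical; the code enumerates simple trails in the undirected skeleton and applies the two activity rules to each:

```python
def trails(edges, src, dst):
    """Step 1: all simple trails from src to dst, ignoring edge direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        for n in nbrs.get(node, ()):
            if n not in path:
                stack.append((n, path + [n]))

def descendants(edges, node):
    """All nodes reachable from `node` by following directed edges."""
    out, frontier = set(), {node}
    while frontier:
        frontier = {b for a, b in edges if a in frontier} - out
        out |= frontier
    return out

def is_active(edges, trail, observed):
    """Step 2: a trail is active iff every collision node on it (or one of its
    descendants) is observed, and no non-collision node on it is observed."""
    for prev, node, nxt in zip(trail, trail[1:], trail[2:]):
        collision = (prev, node) in edges and (nxt, node) in edges
        if collision:
            if node not in observed and not (descendants(edges, node) & observed):
                return False
        elif node in observed:
            return False
    return True

def independent(edges, x, y, observed):
    return not any(is_active(edges, t, observed) for t in trails(edges, x, y))

# Hypothetical graph: X -> W <- Y (collision), plus W -> D (a descendant of W).
edges = {("X", "W"), ("Y", "W"), ("W", "D")}
assert independent(edges, "X", "Y", set())          # marginally independent
assert not independent(edges, "X", "Y", {"W"})      # observing W activates the trail
assert not independent(edges, "X", "Y", {"D"})      # so does observing a descendant
```

This is the classical d-separation criterion; enumerating trails is fine for small graphs, though real libraries use a linear-time reachability pass instead.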

We discriminate between collision nodes and non-collision nodes because only the collision node behaves differently across rows 3-6 of our table. For probabilistic influence to flow, we need to activate collisions by observing them, and avoid the cut caused by observing a non-collision node.

The point I tend to forget is that observing a descendant of a collision node also activates the collision. This is because observing a node's descendant indirectly observes the node itself. This reasoning uses rows 3, 4, and 6 of the table at once, which makes it quite subtle.

3.3 Markov blanket

The Markov blanket of a random variable X is a set of random variables S such that, given S, X is independent of all other random variables. Markov blankets come in useful because, when one wants to infer X with some observations available, it is enough to consider only the Markov blanket. Information about random variables outside the Markov blanket is useless for inferring X.

The Markov boundary of a random variable X in a graph consists of:

  1. All parents of X;

  2. All children of X;

  3. All the other parents of the children of X (sometimes called spouses, because they share a common child node with X).

It is obvious that we need all parents and children of X. We also need the spouse nodes because they form a collision: spouse → child ← X. Given the child, X is not independent of the spouse. Hence, spouse nodes are also needed to safely blanket X.

It seems like a mouthful to say "other parents of the children of X". An easy but informal way I remember it is to think of the Markov blanket as someone who is a bit heartless toward siblings: the blanket includes the other parents of X's children (spouses) but not the other children of X's parents (siblings).
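The three-part recipe above is easy to code. Here is a minimal Python sketch; the graph and node names are made up for illustration, including a sibling node to show that siblings are indeed left out:

```python
def markov_blanket(edges, x):
    """Parents of x, children of x, and the children's other parents (spouses)."""
    parents = {a for a, b in edges if b == x}
    children = {b for a, b in edges if a == x}
    spouses = {a for a, b in edges if b in children and a != x}
    return parents | children | spouses

# Hypothetical graph: P -> X -> C, S -> C, and P -> Sib (Sib is X's sibling).
edges = {("P", "X"), ("X", "C"), ("S", "C"), ("P", "Sib")}

# Parent P, child C, and spouse S are in; sibling Sib is heartlessly excluded.
assert markov_blanket(edges, "X") == {"P", "C", "S"}
```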

4 Remark

Usually, visualization is more intuitive than text, tables, and equations. However, I find probabilistic graphical models are not that kind of visualization: to me, they are not intuitive and are easy to misunderstand. What is more intuitive is actually the textual story behind them. Then I realized that probabilistic graphical models are not graph visualizations. They are models!

Therefore, I now think of probabilistic graphical models as yet another mathematical notation. Like any other funny-looking math notation, they are not intuitive at first sight, and it takes practice to get them right. So let us kindly grant ourselves patience. 🙃

© Yedi Zhang | Last updated: March 2023