Isserlis' Theorem

Isserlis' theorem rewrites higher-order moments of zero-mean Gaussian random variables as a sum, over all pairings of the indices, of products of 2nd-order moments. For zero-mean Gaussian random variables $x_1, x_2, \ldots, x_d$, Isserlis' theorem tells us for even $d$

$$\mathbb{E}[x_1 x_2 \cdots x_d] = \sum_{p \in P_d^2} \prod_{\{i,j\} \in p} \mathbb{E}[x_i x_j] \tag{1}$$

where $P_d^2$ denotes the set of all partitions of $\{1, 2, \ldots, d\}$ into disjoint pairs $\{i, j\}$, and the product runs over the pairs in a given partition $p$. There are $(d-1)!!$ terms in the sum.

For odd $d$, $\mathbb{E}[x_1 x_2 \cdots x_d] = 0$. Note that the 2nd-order moments are also covariances for zero-mean random variables: $\mathbb{E}[x_i x_j] = \mathrm{Cov}[x_i, x_j]$.

As an example, Isserlis' theorem applied to the 4th-order moment goes like this

$$\mathbb{E}[x_1 x_2 x_3 x_4] = \mathbb{E}[x_1 x_2]\,\mathbb{E}[x_3 x_4] + \mathbb{E}[x_1 x_3]\,\mathbb{E}[x_2 x_4] + \mathbb{E}[x_1 x_4]\,\mathbb{E}[x_2 x_3] \tag{2}$$
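As a quick sanity check, we can verify Equation (2) numerically. Here is a minimal Monte Carlo sketch in Python (assuming NumPy; the covariance matrix below is just an arbitrary positive-definite choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric positive-definite covariance matrix.
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)

# Zero-mean correlated Gaussian samples.
x = rng.multivariate_normal(np.zeros(4), Sigma, size=2_000_000)

# Monte Carlo estimate of E[x1 x2 x3 x4].
lhs = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])

# The three-pairing sum from Equation (2), using Sigma_ij = E[x_i x_j].
rhs = (Sigma[0, 1] * Sigma[2, 3]
       + Sigma[0, 2] * Sigma[1, 3]
       + Sigma[0, 3] * Sigma[1, 2])

print(lhs, rhs)  # the two numbers agree up to Monte Carlo error
```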

Proof from Scratch

We have the random variable $x \in \mathbb{R}^d$, which follows a zero-mean Gaussian distribution $x \sim \mathcal{N}(0, \Sigma)$.

We use the quadratic identity

$$(x - \Sigma v)^\top \Sigma^{-1} (x - \Sigma v) = x^\top \Sigma^{-1} x - 2 x^\top v + v^\top \Sigma v \tag{3}$$

Because the probability density function integrates to 1, we have

$$1 = \int \frac{e^{-\frac{1}{2}(x - \Sigma v)^\top \Sigma^{-1} (x - \Sigma v)}}{\sqrt{|2\pi\Sigma|}} \, dx = \int \frac{e^{-\frac{1}{2} x^\top \Sigma^{-1} x + x^\top v - \frac{1}{2} v^\top \Sigma v}}{\sqrt{|2\pi\Sigma|}} \, dx \tag{4}$$

$$e^{\frac{1}{2} v^\top \Sigma v} = \int \frac{e^{-\frac{1}{2} x^\top \Sigma^{-1} x + x^\top v}}{\sqrt{|2\pi\Sigma|}} \, dx \tag{5}$$
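Rearranged, Equation (5) says the moment-generating function of $x$ is $\mathbb{E}[e^{x^\top v}] = e^{\frac{1}{2} v^\top \Sigma v}$. Here is a minimal Monte Carlo sketch of that identity (assuming NumPy; the specific $\Sigma$ and $v$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # arbitrary SPD covariance
v = np.array([0.3, -0.2])        # arbitrary test vector

x = rng.multivariate_normal(np.zeros(2), Sigma, size=1_000_000)

mc = np.mean(np.exp(x @ v))                # Monte Carlo estimate of E[exp(x^T v)]
closed_form = np.exp(0.5 * v @ Sigma @ v)  # left-hand side of Equation (5)

print(mc, closed_form)  # should agree up to sampling error
```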

Then we make the genius move of differentiating Equation (5) with respect to $v_1, v_2, \ldots, v_d$ and evaluating at $v_1 = v_2 = \cdots = v_d = 0$. Differentiating the right-hand side, we get exactly the moments we want

$$\begin{aligned}
\left.\frac{\partial^d}{\partial v_1 \partial v_2 \cdots \partial v_d} \int \frac{e^{-\frac{1}{2} x^\top \Sigma^{-1} x + x^\top v}}{\sqrt{|2\pi\Sigma|}} \, dx \,\right|_{v_1, v_2, \ldots, v_d = 0}
&= \int \left.\frac{\partial^d}{\partial v_1 \partial v_2 \cdots \partial v_d} \frac{e^{-\frac{1}{2} x^\top \Sigma^{-1} x + x^\top v}}{\sqrt{|2\pi\Sigma|}} \right|_{v_1, v_2, \ldots, v_d = 0} dx \\
&= \int x_1 x_2 \cdots x_d \, \frac{e^{-\frac{1}{2} x^\top \Sigma^{-1} x}}{\sqrt{|2\pi\Sigma|}} \, dx \\
&= \mathbb{E}[x_1 x_2 \cdots x_d]
\end{aligned} \tag{6}$$

Differentiating the left-hand side, we get

$$\left.\frac{\partial^d \, e^{\frac{1}{2} v^\top \Sigma v}}{\partial v_1 \partial v_2 \cdots \partial v_d}\right|_{v_1, \ldots, v_d = 0} = \left.\frac{\partial^d \left(1 + \frac{1}{2} v^\top \Sigma v + \frac{1}{2!}\left(\frac{1}{2} v^\top \Sigma v\right)^2 + \cdots\right)}{\partial v_1 \partial v_2 \cdots \partial v_d}\right|_{v_1, \ldots, v_d = 0} = \begin{cases} 0, & \text{odd } d \\[6pt] \left.\dfrac{\partial^d}{\partial v_1 \cdots \partial v_d} \dfrac{1}{(d/2)!}\left(\dfrac{1}{2} v^\top \Sigma v\right)^{d/2} \right|_{v_1, \ldots, v_d = 0}, & \text{even } d \end{cases} \tag{7}$$

The case split holds because the term $\frac{1}{k!}\left(\frac{1}{2} v^\top \Sigma v\right)^k$ is a polynomial of degree $2k$ in $v$, so only the term with $2k = d$ survives the $d$ derivatives and the evaluation at $v = 0$. This is getting really cumbersome to write. I am lazy, so let's just assume $d = 4$ as an example of the even-$d$ case. We continue the derivation for $d = 4$

$$\frac{\partial^4 \, \frac{1}{2!}\left(\frac{1}{2} v^\top \Sigma v\right)^2}{\partial v_1 \partial v_2 \partial v_3 \partial v_4} = \frac{\partial^4}{\partial v_1 \partial v_2 \partial v_3 \partial v_4} \frac{1}{8} \sum_{i, j, k, l = 1}^{4} v_i v_j v_k v_l \, \Sigma_{ij} \Sigma_{kl} \tag{8}$$

Note that only the terms containing $v_1 v_2 v_3 v_4$ in the sum have nonzero derivatives with respect to $v_1, v_2, v_3, v_4$. There are $4! = 24$ such arrangements of $\{v_1, v_2, v_3, v_4\}$. There are $\binom{4}{2}/2 = 3$ distinct products $\Sigma_{ij}\Sigma_{kl}$, namely $\Sigma_{12}\Sigma_{34}, \Sigma_{13}\Sigma_{24}, \Sigma_{14}\Sigma_{23}$ (recall that $\Sigma$ is symmetric). Each distinct product appears $24/3 = 8$ times. We thus write out Equation (8) as

$$\frac{\partial^4}{\partial v_1 \partial v_2 \partial v_3 \partial v_4} \frac{1}{8} \sum_{i, j, k, l = 1}^{4} v_i v_j v_k v_l \, \Sigma_{ij} \Sigma_{kl} = \frac{1}{8}\left(8\,\Sigma_{12}\Sigma_{34} + 8\,\Sigma_{13}\Sigma_{24} + 8\,\Sigma_{14}\Sigma_{23}\right) = \Sigma_{12}\Sigma_{34} + \Sigma_{13}\Sigma_{24} + \Sigma_{14}\Sigma_{23} \tag{9}$$

By the definition of the covariance matrix, we finish the proof

$$\mathbb{E}[x_1 x_2 x_3 x_4] = \Sigma_{12}\Sigma_{34} + \Sigma_{13}\Sigma_{24} + \Sigma_{14}\Sigma_{23} = \mathbb{E}[x_1 x_2]\,\mathbb{E}[x_3 x_4] + \mathbb{E}[x_1 x_3]\,\mathbb{E}[x_2 x_4] + \mathbb{E}[x_1 x_4]\,\mathbb{E}[x_2 x_3]$$
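Since the heavy lifting in Equations (7)–(9) is just polynomial differentiation, we can also check the $d = 4$ result symbolically. Here is a quick sketch using SymPy (just one convenient choice of tool):

```python
import sympy as sp

# Symbolic v and a symmetric 4x4 Sigma (entries S12, S13, ... shared across the diagonal).
v = sp.Matrix(sp.symbols('v1:5'))
S = sp.Matrix(4, 4, lambda i, j: sp.Symbol(f'S{min(i, j) + 1}{max(i, j) + 1}'))

mgf = sp.exp(sp.Rational(1, 2) * (v.T * S * v)[0, 0])

# d^4 / (dv1 dv2 dv3 dv4), then evaluate at v = 0.
deriv = sp.diff(mgf, *v)
result = sp.expand(deriv.subs({vi: 0 for vi in v}))

print(result)  # S12*S34 + S13*S24 + S14*S23, matching Equation (9)
```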

Although I only completed the derivation for $d = 4$, we can generalize the result to arbitrary even $d$ by induction. For instance, for $d = 6$, we can first break things down to 2 and 4 and then break down the 4. Namely, we first have

$$\begin{aligned}
\mathbb{E}[x_1 x_2 x_3 x_4 x_5 x_6] = {}& \mathbb{E}[x_1 x_2]\,\mathbb{E}[x_3 x_4 x_5 x_6] + \mathbb{E}[x_1 x_3]\,\mathbb{E}[x_2 x_4 x_5 x_6] + \mathbb{E}[x_1 x_4]\,\mathbb{E}[x_2 x_3 x_5 x_6] \\
&+ \mathbb{E}[x_1 x_5]\,\mathbb{E}[x_2 x_3 x_4 x_6] + \mathbb{E}[x_1 x_6]\,\mathbb{E}[x_2 x_3 x_4 x_5]
\end{aligned}$$

The 4th-order moment in each term can be further decomposed into three terms that only contain 2nd-order moments using Equation (2). So we eventually break the 6th-order moment into $5 \times 3 = 15$ terms, consistent with $(6-1)!! = 15$.
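This recursive decomposition is easy to mechanize. Here is a short Python sketch (the helpers `pairings` and `isserlis_moment` are made-up names for illustration) that enumerates all pairings recursively, exactly as above, and sums the covariance products per Equation (1):

```python
import numpy as np

def pairings(indices):
    """All partitions of `indices` into unordered pairs (empty input: one empty pairing)."""
    if not indices:
        return [[]]
    first, rest = indices[0], indices[1:]
    out = []
    # Pair the first index with each remaining index, then recurse on what is left.
    for k, j in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        out += [[(first, j)] + sub for sub in pairings(remaining)]
    return out

def isserlis_moment(Sigma, idx):
    """E[x_{i1} ... x_{id}] via Equation (1): sum over pairings of covariance products."""
    return sum(np.prod([Sigma[i, j] for i, j in p]) for p in pairings(idx))

Sigma = np.eye(6) + 0.3  # arbitrary SPD covariance for d = 6
print(len(pairings(tuple(range(6)))))           # 15 = (6 - 1)!!
print(isserlis_moment(Sigma, tuple(range(6))))  # the 6th-order moment E[x1 ... x6]
```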

Proof via Stein's Lemma

If one already knows Stein's lemma, Isserlis' theorem is quite easy to prove. Stein's lemma states that for zero-mean Gaussian random variables $x_1, x_2, \ldots, x_d$ and a differentiable function $f$, we have

$$\mathbb{E}[x_1 f(x_1, x_2, \ldots, x_d)] = \sum_{i=1}^{d} \mathbb{E}[x_1 x_i] \, \mathbb{E}\!\left[\frac{\partial f(x_1, x_2, \ldots, x_d)}{\partial x_i}\right]$$

Applying this lemma with $f(x_1, x_2, x_3, x_4) = x_2 x_3 x_4$ to $\mathbb{E}[x_1 x_2 x_3 x_4]$ immediately gives us Equation (2)

$$\begin{aligned}
\mathbb{E}[x_1 x_2 x_3 x_4] &= \sum_{i=1}^{4} \mathbb{E}[x_1 x_i] \, \mathbb{E}\!\left[\frac{\partial (x_2 x_3 x_4)}{\partial x_i}\right] \\
&= \mathbb{E}[x_1 x_1]\,\mathbb{E}[0] + \mathbb{E}[x_1 x_2]\,\mathbb{E}[x_3 x_4] + \mathbb{E}[x_1 x_3]\,\mathbb{E}[x_2 x_4] + \mathbb{E}[x_1 x_4]\,\mathbb{E}[x_2 x_3] \\
&= \mathbb{E}[x_1 x_2]\,\mathbb{E}[x_3 x_4] + \mathbb{E}[x_1 x_3]\,\mathbb{E}[x_2 x_4] + \mathbb{E}[x_1 x_4]\,\mathbb{E}[x_2 x_3]
\end{aligned}$$
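Like Equation (2), Stein's lemma itself can be spot-checked by Monte Carlo. Here is a minimal sketch (assuming NumPy) with $f(x_1, x_2, x_3, x_4) = x_2 x_3 x_4$ and hand-computed partial derivatives:

```python
import numpy as np

rng = np.random.default_rng(2)

A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)  # arbitrary SPD covariance
x1, x2, x3, x4 = rng.multivariate_normal(np.zeros(4), Sigma, size=2_000_000).T

# Left-hand side: E[x1 f(x)] with f = x2 x3 x4.
lhs = np.mean(x1 * x2 * x3 * x4)

# Right-hand side: sum_i E[x1 xi] E[df/dxi], where
# df/dx1 = 0, df/dx2 = x3 x4, df/dx3 = x2 x4, df/dx4 = x2 x3.
rhs = (Sigma[0, 1] * np.mean(x3 * x4)
       + Sigma[0, 2] * np.mean(x2 * x4)
       + Sigma[0, 3] * np.mean(x2 * x3))

print(lhs, rhs)  # the two sides agree up to Monte Carlo error
```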

Comments

Isserlis' theorem is specific to zero-mean Gaussian random variables. It does not extend to other distributions.

Isserlis' theorem allows the random variables to be correlated. If the random variables are independent, $x_1 \perp x_2$, we have $\mathbb{E}[x_1 x_2] = \mathbb{E}[x_1]\,\mathbb{E}[x_2]$ no matter what distributions $x_1$ and $x_2$ follow.

© Yedi Zhang | Last updated: September 2024