1 Cumulants - University of Chicagopmcc/courses/stat306/2017/cumulants.pdfwith rth cumulant r, and...


1 Cumulants

1.1 Definition

The $r$th moment of a real-valued random variable $X$ with density $f(x)$ is

$$\mu_r = E(X^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx$$

for integer $r = 0, 1, \ldots$. The value is assumed to be finite. Provided that it has a Taylor expansion about the origin, the moment generating function

$$M(\xi) = E(e^{\xi X}) = E(1 + \xi X + \cdots + \xi^r X^r/r! + \cdots) = \sum_{r=0}^{\infty} \mu_r \xi^r/r!$$

is an easy way to combine all of the moments into a single expression. The $r$th moment is the $r$th derivative of $M$ at the origin: $\mu_r = M^{(r)}(0)$.

The cumulants $\kappa_r$ are the coefficients in the Taylor expansion of the cumulant generating function about the origin

$$K(\xi) = \log M(\xi) = \sum_r \kappa_r \xi^r/r!,$$

so that $\kappa_r = K^{(r)}(0)$. Evidently $\mu_0 = 1$ implies $\kappa_0 = 0$.

The relationship between the first few moments and cumulants, obtained by extracting coefficients from the expansion, is as follows:

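$$
\begin{aligned}
\kappa_1 &= \mu_1 \\
\kappa_2 &= \mu_2 - \mu_1^2 \\
\kappa_3 &= \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3 \\
\kappa_4 &= \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 - 6\mu_1^4 \\
\kappa_5 &= \mu_5 - 5\mu_4\mu_1 - 10\mu_3\mu_2 + 20\mu_3\mu_1^2 + 30\mu_2^2\mu_1 - 60\mu_2\mu_1^3 + 24\mu_1^5
\end{aligned}
$$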

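In the reverse direction

$$
\begin{aligned}
\mu_2 &= \kappa_2 + \kappa_1^2 \\
\mu_3 &= \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3 \\
\mu_4 &= \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4 \\
\mu_5 &= \kappa_5 + 5\kappa_4\kappa_1 + 10\kappa_3\kappa_2 + 10\kappa_3\kappa_1^2 + 15\kappa_2^2\kappa_1 + 10\kappa_2\kappa_1^3 + \kappa_1^5.
\end{aligned}
$$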
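The conversions between moments and cumulants can be sketched in a few lines of Python. The sketch does not expand the logarithm term by term; it assumes the standard recursion $\mu_r = \sum_{j=1}^{r} \binom{r-1}{j-1} \kappa_j \mu_{r-j}$, which follows from differentiating $M = \exp(K)$ and is not stated in the notes. The function names are illustrative only.

```python
from math import comb, factorial

def cumulants_from_moments(mu):
    """mu[r] is the r-th moment (mu[0] == 1); returns kappa with kappa[0] == 0."""
    n = len(mu) - 1
    kappa = [0] * (n + 1)
    for r in range(1, n + 1):
        # Solve mu_r = sum_{j=1}^{r} C(r-1, j-1) kappa_j mu_{r-j} for kappa_r.
        kappa[r] = mu[r] - sum(
            comb(r - 1, j - 1) * kappa[j] * mu[r - j] for j in range(1, r)
        )
    return kappa

def moments_from_cumulants(kappa):
    """Inverse map: kappa[r] is the r-th cumulant (kappa[0] == 0)."""
    n = len(kappa) - 1
    mu = [1] + [0] * n
    for r in range(1, n + 1):
        mu[r] = sum(
            comb(r - 1, j - 1) * kappa[j] * mu[r - j] for j in range(1, r + 1)
        )
    return mu

# Unit exponential distribution: mu_r = r!, so kappa_r = (r-1)!.
mu_exp = [factorial(r) for r in range(6)]
kappa_exp = cumulants_from_moments(mu_exp)
```

For the unit exponential the recursion returns $\kappa_r = (r-1)!$, which agrees with the explicit formulas for $\kappa_1, \ldots, \kappa_5$ listed above.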


In particular, $\kappa_1 = \mu_1$ is the mean of $X$, $\kappa_2$ is the variance, and $\kappa_3 = E((X - \mu_1)^3)$. Higher-order cumulants are not the same as moments about the mean.

This definition of cumulants is nothing more than the formal relation between the coefficients in the Taylor expansion of one function $M(\xi)$ with $M(0) = 1$, and the coefficients in the Taylor expansion of $\log M(\xi)$. For example, Student's $t$ distribution on five degrees of freedom has finite moments up to order four, with infinite moments of order five and higher. The moment generating function does not exist for real $\xi \neq 0$, but the characteristic function (written here for the variable scaled by $1/\sqrt{5}$, so that the variance is $1/3$)

$$M(i\xi) = e^{-|\xi|}(1 + |\xi| + \xi^2/3)$$

is real and finite for all real $\xi$. Both $M(i\xi)$ and $K(i\xi)$ have Taylor expansions about $\xi = 0$ up to order four only:

$$
\begin{aligned}
M(i\xi) &= \bigl(1 - |\xi| + \xi^2/2 - |\xi|^3/3! + \xi^4/4! + o(\xi^4)\bigr)\,(1 + |\xi| + \xi^2/3) \\
&= 1 + \xi^2\bigl(\tfrac{1}{3} + \tfrac{1}{2} - 1\bigr) + |\xi|^3\bigl(-\tfrac{1}{3} + \tfrac{1}{2} - \tfrac{1}{6}\bigr) + \xi^4\bigl(\tfrac{1}{4!} - \tfrac{1}{3!} + \tfrac{1}{6}\bigr) + o(\xi^4) \\
&= 1 - \xi^2/6 + \xi^4/4! + o(\xi^4); \\
K(i\xi) &= -|\xi| + \log(1 + |\xi| + \xi^2/3) \\
&= \xi^2/3 - (|\xi| + \xi^2/3)^2/2 + (|\xi| + \xi^2/3)^3/3 - (|\xi| + \xi^2/3)^4/4 + \cdots \\
&= -\xi^2/6 + \xi^4/36 + o(\xi^4),
\end{aligned}
$$

which means $\mu_2 = \kappa_2 = 1/3$, $\mu_4 = 1$ and $\kappa_4 = 2/3$.

The normal distribution $N(\mu, \sigma^2)$ has cumulant generating function $\xi\mu + \xi^2\sigma^2/2$, a quadratic polynomial implying that all cumulants of order three and higher are zero. Marcinkiewicz (1935) showed that the normal distribution is the only distribution whose cumulant generating function is a polynomial, i.e., the only distribution having a finite number of non-zero cumulants. The Poisson distribution with mean $\mu$ has moment generating function $\exp(\mu(e^{\xi} - 1))$ and cumulant generating function $\mu(e^{\xi} - 1)$. Consequently all the cumulants are equal to the mean.

Two distinct distributions may have the same moments, and hence the same cumulants. This statement is fairly obvious for distributions whose moments are all infinite, or even for distributions having infinite higher-order moments. But it is much less obvious for distributions having finite moments of all orders. Heyde (1963) gave one such pair of distributions with densities

$$
\begin{aligned}
f_1(x) &= \exp(-(\log x)^2/2)/(x\sqrt{2\pi}) \\
f_2(x) &= f_1(x)\,[1 + \sin(2\pi \log x)/2]
\end{aligned}
$$


for $x > 0$. The first of these is called the log normal distribution. To show that these distributions have the same moments it suffices to show that

$$\int_0^\infty x^k f_1(x) \sin(2\pi \log x)\,dx = 0$$

for integer $k \ge 1$, which can be shown by making the substitution $\log x = y + k$.
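A numerical sanity check of this integral is easy to sketch. In the variable $y = \log x$ the integrand becomes $e^{ky}\phi(y)\sin(2\pi y)$, where $\phi$ is the standard normal density. The trapezoid rule, the integration window and the step count below are arbitrary choices made for this sketch, not part of the notes.

```python
from math import exp, pi, sin, sqrt

def perturbation_integral(k, half_width=12.0, steps=24000):
    """Trapezoid approximation of int_0^inf x^k f1(x) sin(2 pi log x) dx,
    evaluated in the variable y = log x."""
    lo, hi = k - half_width, k + half_width  # the integrand is centred at y = k
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * exp(k * y - y * y / 2) / sqrt(2 * pi) * sin(2 * pi * y)
    return total * h

def lognormal_moment(k):
    """k-th moment of the standard log normal distribution, exp(k^2 / 2)."""
    return exp(k * k / 2)
```

For integer $k$ the computed value should be negligible relative to `lognormal_moment(k)`, reflecting the fact that after the shift $y \mapsto y + k$ the integrand is an odd function.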

Cumulants of order $r \ge 2$ are called semi-invariant on account of their behaviour under affine transformation of variables (Thiele 188?, Dressel 1942). For $r \ge 2$, the $r$th cumulant of the affine transformation $a + bX$ is $b^r\kappa_r$, independent of $a$. This behaviour is considerably simpler than that of moments. However, moments about the mean are also semi-invariant, so this property alone does not explain why cumulants are so natural for statistical purposes.

The notion of a cumulant can be traced to the work of Thiele (18??), who called them semi-invariants, but the modern theory of cumulants and the associated k-statistics begins with the remarkable 1929 paper by Fisher. Fisher used the term 'cumulative moment function' for what we now call the cumulant generating function on account of its behaviour under convolution of independent random variables. For the coefficients in the expansion, the term cumulant was suggested by Hotelling in a letter to Fisher, who approved of the coinage.

Let $S = X + Y$ be the sum of two independent random variables. The moment generating function of the sum is the product

$$M_S(\xi) = E(e^{\xi(X+Y)}) = E(e^{\xi X} e^{\xi Y}) = M_X(\xi)\,M_Y(\xi),$$

and the cumulant generating function of the sum is the sum of the cumulant generating functions

$$K_S(\xi) = K_X(\xi) + K_Y(\xi).$$

Consequently, the $r$th cumulant of the sum is the sum of the $r$th cumulants. By extension, if $X_1, \ldots, X_n$ are independent real-valued random variables, the $r$th cumulant of the sum is the sum of the $r$th cumulants. If they are also identically distributed, the $r$th cumulant of the sum is $n\kappa_r$, and the $r$th cumulant of the standardized sum $n^{-1/2}(X_1 + \cdots + X_n)$ is $n^{1-r/2}\kappa_r$. Provided that the cumulants are finite, all cumulants of order $r \ge 3$ of the standardized sum tend to zero, which is a simple demonstration of the central limit theorem.
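Additivity under convolution can be checked exactly on small discrete distributions. The sketch below uses two arbitrary probability mass functions (chosen only for illustration), convolves them, and compares cumulants computed with exact rational arithmetic. The moment-to-cumulant step assumes the standard recursion $\mu_r = \sum_{j=1}^{r}\binom{r-1}{j-1}\kappa_j\mu_{r-j}$.

```python
from fractions import Fraction
from math import comb

def moments(pmf, order):
    """Raw moments mu_0..mu_order of a pmf given as {value: probability}."""
    return [sum(p * x**r for x, p in pmf.items()) for r in range(order + 1)]

def cumulants(pmf, order):
    """Cumulants kappa_0..kappa_order via the moment recursion."""
    mu = moments(pmf, order)
    kappa = [Fraction(0)] * (order + 1)
    for r in range(1, order + 1):
        kappa[r] = mu[r] - sum(
            comb(r - 1, j - 1) * kappa[j] * mu[r - j] for j in range(1, r)
        )
    return kappa

def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out

# Two arbitrary illustrative distributions.
pX = {0: Fraction(1, 2), 1: Fraction(1, 3), 3: Fraction(1, 6)}
pY = {-1: Fraction(1, 4), 2: Fraction(3, 4)}
kX, kY = cumulants(pX, 5), cumulants(pY, 5)
kS = cumulants(convolve(pX, pY), 5)
```

Each entry of `kS` equals the sum of the corresponding entries of `kX` and `kY`, whereas the raw moments of the sum do not decompose this way beyond the first order.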

Good (195?) obtained an expression for the $r$th cumulant of $X$ as the $r$th moment of the discrete Fourier transform of an independent and identically distributed sequence as follows. Let $X_1, X_2, \ldots$ be independent copies of $X$ with $r$th cumulant $\kappa_r$, and let $\omega = e^{2\pi i/n}$ be a primitive $n$th root of unity. The discrete Fourier combination

$$Z = X_1 + \omega X_2 + \cdots + \omega^{n-1} X_n$$

is a complex-valued random variable whose distribution is invariant under rotation $Z \sim \omega Z$ through multiples of $2\pi/n$. The $r$th cumulant of the sum is $\kappa_r \sum_{j=1}^{n} \omega^{rj}$, which is equal to $n\kappa_r$ if $r$ is a multiple of $n$, and zero otherwise. Consequently $E(Z^r) = 0$ for integer $r < n$ and $E(Z^n) = n\kappa_n$.
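The root-of-unity sum that drives this result can be checked numerically. The following is a minimal sketch; the function name is illustrative.

```python
import cmath

def root_of_unity_power_sum(n, r):
    """Return sum_{j=1}^{n} omega^(r*j) with omega = exp(2*pi*i/n)."""
    omega = cmath.exp(2j * cmath.pi / n)
    return sum(omega ** (r * j) for j in range(1, n + 1))

# For n = 5 the sum is 5 when r is a multiple of 5, and 0 otherwise
# (up to floating-point rounding).
values = {r: root_of_unity_power_sum(5, r) for r in range(1, 11)}
```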

1.2 Multivariate cumulants

Somewhat surprisingly, the relation between moments and cumulants is simpler and more transparent in the multivariate case than in the univariate case. Let $X = (X^1, \ldots, X^k)$ be the components of a random vector in $\mathbb{R}^k$. In a departure from the univariate notation, we write $\kappa^r = E(X^r)$ for the components of the mean vector, $\kappa^{rs} = E(X^r X^s)$ for the components of the second moment matrix, $\kappa^{rst} = E(X^r X^s X^t)$ for the third moments, and so on. It is convenient notationally to adopt Einstein's summation convention in which $\xi_r X^r$ denotes the linear combination $\xi_1 X^1 + \cdots + \xi_k X^k$, the square of the linear combination is $(\xi_r X^r)^2 = \xi_r \xi_s X^r X^s$, a sum of $k^2$ terms, and so on for higher powers.

Technically speaking, the components of a vector $X$ in $\mathcal{V} = \mathbb{R}^k$ are denoted by $X^1, \ldots, X^k$, which is abbreviated to $X^r$ using a dummy superscript ranging over the index set $[k]$. A linear functional $\xi\colon \mathcal{V} \to \mathbb{R}$ is a vector in the dual space $\mathcal{V}'$ of linear functionals. The $r$th component of $\xi$ is typically denoted by $\xi_r$ using subscripts for the coefficients, so that the value of $\xi$ at $X$ is the real number

$$\xi(X) = \xi_1 X^1 + \cdots + \xi_k X^k,$$

which is abbreviated to $\xi_r X^r$. In Einstein's notation, each repeated index should occur exactly twice, once as a subscript indexing the linear functional, and once as a superscript indexing the components of a vector in $\mathcal{V}$.

The tensor product of $X$ with $Y$ is a vector or tensor in $\mathcal{V}^{\otimes 2}$ whose $(r, s)$-component is $(X \otimes Y)^{rs} = X^r Y^s$. An arbitrary vector $A$ in $\mathcal{V}^{\otimes 2}$ is a $k \times k$ array of components $A^{rs}$, which can be decomposed as the sum of a symmetric array and a skew-symmetric array

$$A^{rs} = (A^{rs} + A^{sr})/2 + (A^{rs} - A^{sr})/2 = \operatorname{sym}_2(A)^{rs} + \operatorname{alt}_2(A)^{rs}.$$


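In this setting $\operatorname{sym}_2$ and $\operatorname{alt}_2$ are projections $\mathcal{V}^{\otimes 2} \to \mathcal{V}^{\otimes 2}$, which are linear and complementary (the image of one is the kernel of the other). The dimensions are $k(k + 1)/2$ for $\operatorname{sym}_2(\mathcal{V})$ and $k(k - 1)/2$ for $\operatorname{alt}_2(\mathcal{V})$.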
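The projection properties are easy to verify on a concrete array. The sketch below represents a tensor in $\mathcal{V}^{\otimes 2}$ as a nested list of exact rationals; the matrix `A` is an arbitrary example and the function names are illustrative.

```python
from fractions import Fraction

def sym2(A):
    """Symmetric part: (A^{rs} + A^{sr}) / 2."""
    k = len(A)
    return [[(A[r][s] + A[s][r]) / 2 for s in range(k)] for r in range(k)]

def alt2(A):
    """Skew-symmetric part: (A^{rs} - A^{sr}) / 2."""
    k = len(A)
    return [[(A[r][s] - A[s][r]) / 2 for s in range(k)] for r in range(k)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# An arbitrary 3 x 3 example with exact entries.
A = [[Fraction(v) for v in row] for row in [[1, 2, 0], [5, -1, 4], [7, 3, 2]]]
S, T = sym2(A), alt2(A)
ZERO = [[0] * 3 for _ in range(3)]
```

Applying either map twice changes nothing, each map annihilates the image of the other, and `add(S, T)` recovers `A`. For $k = 3$ the symmetric part has $6$ free entries and the skew-symmetric part has $3$.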

Note that $(X \otimes X)^{rs} = (X \otimes X)^{sr}$ implies that the tensor product of a vector $X \in \mathcal{V}$ with itself is symmetric, so $X \otimes X$ is a vector in the symmetric tensor product space $\operatorname{sym}_2(\mathcal{V})$. The tensor product map $X \mapsto X \otimes X$ defines a transformation $\mathcal{V} \to \operatorname{sym}_2(\mathcal{V})$ from one vector space into another, and although $0 \mapsto 0$, the transformation is not linear: $(X + Y)^{\otimes 2}$ is not equal to $X^{\otimes 2} + Y^{\otimes 2}$. The image of the tensor product transformation is the set of rank-one [symmetric] tensors or matrices, and the span of these matrices is the whole space $\operatorname{sym}_2(\mathcal{V})$.

The moment generating function of [the distribution of] a random variable $X$ taking values in $\mathcal{V}$ is a function $M(\xi) = E(\exp(\xi_r X^r))$ on the dual space of linear functionals. Its Taylor expansion is

$$M(\xi) = 1 + \xi_r \kappa^r + \tfrac{1}{2!}\,\xi_r \xi_s \kappa^{rs} + \tfrac{1}{3!}\,\xi_r \xi_s \xi_t \kappa^{rst} + \cdots,$$

where each of the joint moments

$$\kappa^r = E(X^r), \quad \kappa^{rs} = E(X^r X^s), \quad \kappa^{rst} = E(X^r X^s X^t), \ldots$$

is a symmetric tensor in $\mathcal{V}$, $\mathcal{V}^{\otimes 2}$, $\mathcal{V}^{\otimes 3}$, and so on.

The cumulants are defined as the coefficients $\kappa^{r,s}, \kappa^{r,s,t}, \ldots$ in the Taylor expansion

$$\log M(\xi) = \xi_r \kappa^r + \tfrac{1}{2!}\,\xi_r \xi_s \kappa^{r,s} + \tfrac{1}{3!}\,\xi_r \xi_s \xi_t \kappa^{r,s,t} + \cdots.$$

This notation does not distinguish first-order moments from first-order cumulants, but commas separating the superscripts serve to distinguish higher-order cumulants from moments:

$$\kappa^{r,s} = \operatorname{cum}_2(X^r, X^s), \quad \kappa^{r,s,t} = \operatorname{cum}_3(X^r, X^s, X^t), \ldots.$$

Each superscript in this setting denotes a vector component $r \in [k]$, not a power.

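Comparison of coefficients reveals that each moment $\kappa^{rs}, \kappa^{rst}, \ldots$ is a sum over partitions of the superscripts, each term in the sum being a product of cumulants:

$$
\begin{aligned}
\kappa^{rs} &= \kappa^{r,s} + \kappa^r \kappa^s \\
\kappa^{rst} &= \kappa^{r,s,t} + \kappa^{r,s}\kappa^t + \kappa^{r,t}\kappa^s + \kappa^{s,t}\kappa^r + \kappa^r\kappa^s\kappa^t \\
&= \kappa^{r,s,t} + \kappa^{r,s}\kappa^t[3] + \kappa^r\kappa^s\kappa^t \\
\kappa^{rstu} &= \kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,s}\kappa^{t,u}[3] + \kappa^{r,s}\kappa^t\kappa^u[6] + \kappa^r\kappa^s\kappa^t\kappa^u.
\end{aligned}
$$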


Each bracketed number indicates a sum over distinct partitions having the same block sizes. The fourth-order moment is a sum of 15 distinct cumulant products of which three are of type $2^2$ with two blocks of size two, and six are of type $2\,1^2$ with three blocks:

$$
\begin{aligned}
rs|tu\,[3] &= \{rs|tu,\ rt|su,\ ru|st\} \\
\kappa^{r,s}\kappa^{t,u}[3] &= \kappa^{r,s}\kappa^{t,u} + \kappa^{r,t}\kappa^{s,u} + \kappa^{r,u}\kappa^{s,t} \\
rs|t|u\,[6] &= \{rs|t|u,\ rt|s|u,\ ru|s|t,\ st|r|u,\ su|r|t,\ tu|r|s\} \\
\kappa^{r,s}\kappa^t\kappa^u[6] &= \kappa^{r,s}\kappa^t\kappa^u + \cdots + \kappa^{t,u}\kappa^r\kappa^s.
\end{aligned}
$$

The blocks are unlabelled, so $tu|rs \equiv rs|tu$, and each block is a subset of the four elements or labels $rstu$, so $rs|tu = sr|tu = rs|ut = sr|ut$.
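The counts quoted here can be reproduced by brute-force enumeration. The recursive generator below is one common way to list set partitions; it is a sketch written for this purpose, with illustrative names.

```python
from collections import Counter

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def partition_type(partition):
    """Block sizes in decreasing order, e.g. (2, 1, 1) for rs|t|u."""
    return tuple(sorted((len(block) for block in partition), reverse=True))

# Number of partitions of {r, s, t, u} of each type.
counts = Counter(partition_type(p) for p in set_partitions(list("rstu")))
```

The counter has one partition of type $(4)$, four of type $(3,1)$, three of type $(2,2)$, six of type $(2,1,1)$ and one of type $(1,1,1,1)$, for a total of 15.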

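In the reverse direction, each cumulant is also a sum over partitions of the indices. Each term in the sum is a product of moments, but with coefficient $(-1)^{\nu-1}(\nu - 1)!$ where $\nu$ is the number of blocks:

$$
\begin{aligned}
\kappa^{r,s} &= \kappa^{rs} - \kappa^r\kappa^s \\
\kappa^{r,s,t} &= \kappa^{rst} - \kappa^{rs}\kappa^t[3] + 2\kappa^r\kappa^s\kappa^t \\
\kappa^{r,s,t,u} &= \kappa^{rstu} - \kappa^{rst}\kappa^u[4] - \kappa^{rs}\kappa^{tu}[3] + 2\kappa^{rs}\kappa^t\kappa^u[6] - 6\kappa^r\kappa^s\kappa^t\kappa^u.
\end{aligned}
$$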

Partition notation serves one additional purpose. It establishes moments and cumulants as special cases of generalized cumulants, which include objects of the type $\kappa^{r,st} = \operatorname{cov}(X^r, X^s X^t)$, $\kappa^{rs,tu} = \operatorname{cov}(X^r X^s, X^t X^u)$, and $\kappa^{rs,t,u}$ with incompletely partitioned indices. These objects arise very naturally in statistical work involving asymptotic approximation of distributions. They are intermediate between moments and cumulants, and have characteristics of both.

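Every generalized cumulant can be expressed as a sum of certain products of ordinary cumulants. Some examples are as follows:

$$
\begin{aligned}
\kappa^{rs,t} &= \kappa^{r,s,t} + \kappa^r\kappa^{s,t} + \kappa^s\kappa^{r,t} \\
&= \kappa^{r,s,t} + \kappa^r\kappa^{s,t}[2] \\
\kappa^{rs,tu} &= \kappa^{r,s,t,u} + \kappa^{r,s,t}\kappa^u[4] + \kappa^{r,t}\kappa^{s,u}[2] + \kappa^{r,t}\kappa^s\kappa^u[4] \\
\kappa^{rs,t,u} &= \kappa^{r,s,t,u} + \kappa^{r,t,u}\kappa^s[2] + \kappa^{r,t}\kappa^{s,u}[2]
\end{aligned}
$$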

Each generalized cumulant is associated with a partition $\tau$ of the given set of indices. For example, $\kappa^{rs,t,u}$ is associated with the partition $\tau = rs|t|u$ of four indices into three blocks. Each term on the right is a cumulant product associated with a partition $\sigma$ of the same indices. The coefficient is one if the least upper bound $\sigma \vee \tau$ has a single block, otherwise zero. Thus, with $\tau = rs|t|u$, the product $\kappa^{r,s}\kappa^{t,u}$ does not appear on the right because $\sigma \vee \tau = rs|tu$ has two blocks.

As an example of the way these formulae may be used, let $X$ be a scalar random variable with cumulants $\kappa_1, \kappa_2, \kappa_3, \ldots$. By translating the second formula in the preceding list, we find that the variance of the squared variable is

$$\operatorname{var}(X^2) = \kappa_4 + 4\kappa_3\kappa_1 + 2\kappa_2^2 + 4\kappa_2\kappa_1^2,$$

reducing to $\kappa_4 + 2\kappa_2^2$ if the mean is zero.
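This identity can be checked exactly against the direct definition $\operatorname{var}(X^2) = \mu_4 - \mu_2^2$. The sketch below uses an arbitrary three-point distribution (an assumption made purely for illustration) and the explicit univariate formulas for $\kappa_1, \ldots, \kappa_4$ in terms of moments.

```python
from fractions import Fraction

def var_of_square_two_ways(pmf):
    """Return (mu_4 - mu_2^2, cumulant expression) for a pmf {value: probability}."""
    m1, m2, m3, m4 = (
        sum(p * x**r for x, p in pmf.items()) for r in range(1, 5)
    )
    # Cumulants from moments, using the explicit formulas of Section 1.1.
    k1 = m1
    k2 = m2 - m1**2
    k3 = m3 - 3 * m2 * m1 + 2 * m1**3
    k4 = m4 - 4 * m3 * m1 - 3 * m2**2 + 12 * m2 * m1**2 - 6 * m1**4
    direct = m4 - m2**2
    via_cumulants = k4 + 4 * k3 * k1 + 2 * k2**2 + 4 * k2 * k1**2
    return direct, via_cumulants

# Arbitrary illustrative distribution with non-zero mean.
pmf = {-1: Fraction(1, 5), 0: Fraction(3, 10), 2: Fraction(1, 2)}
direct, via_cumulants = var_of_square_two_ways(pmf)
```

The two returned values agree exactly, as they must for any distribution with a finite fourth moment.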

Exercises 1.2

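1.2.1 Let $\mathcal{V} = \mathbb{R}^n$, and let $A$ be a tensor in $\mathcal{V}^{\otimes 2}$ with components $A^{ij}$. Show that

$$
\begin{aligned}
\operatorname{sym}_2(A)^{ij} &= (A^{ij} + A^{ji})/2 \\
\operatorname{alt}_2(A)^{ij} &= (A^{ij} - A^{ji})/2
\end{aligned}
$$

are linear projections $\mathcal{V}^{\otimes 2} \to \mathcal{V}^{\otimes 2}$. Show also that the projections are complementary. What are the dimensions of the image spaces $\operatorname{sym}_2(\mathcal{V})$ and $\operatorname{alt}_2(\mathcal{V})$?

1.2.2 Let $A$ be a tensor in $\mathcal{V}^{\otimes 3}$ with components $A^{ijk}$. Show that

$$
\begin{aligned}
\operatorname{sym}_3(A)^{ijk} &= (A^{ijk} + A^{jik} + A^{kji} + A^{ikj} + A^{jki} + A^{kij})/6 \\
\operatorname{alt}_3(A)^{ijk} &= (A^{ijk} - A^{jik} - A^{kji} - A^{ikj} + A^{jki} + A^{kij})/6 \\
\operatorname{res}_3(A)^{ijk} &= (2A^{ijk} - A^{jki} - A^{kij})/3
\end{aligned}
$$

are linear projections $\mathcal{V}^{\otimes 3} \to \mathcal{V}^{\otimes 3}$, i.e., satisfying $T(T(A)) = T(A)$. Show also that the projections are complementary in the sense that the kernel of each one is the direct sum of the images of the other two. What are the dimensions of the image spaces $\operatorname{sym}_3(\mathcal{V})$, $\operatorname{alt}_3(\mathcal{V})$ and $\operatorname{res}_3(\mathcal{V})$?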

1.3 Partition lattice

1.3.1 Poset

A Boolean function $S \times S \to \{0, 1\}$, i.e., a subset of $S^2$, is called a partial order if the subset or relationship ($\le$) is reflexive and transitive. Reflexive means $a \le a$ [is true] for every $a \in S$; transitive means that $a \le b$ and $b \le c$ implies $a \le c$. To simplify matters, a third condition is imposed such that $a \le b$ and $b \le a$ together imply $a = b$; otherwise it is necessary to reduce the discussion to equivalence classes.


The real line is a partially ordered set that is also a total order (either $a \le b$ or $b \le a$ for every pair). The real plane or complex plane ordered componentwise ($a \le b$ if $a_1 \le b_1$ and $a_2 \le b_2$) is partially ordered but not completely ordered: there exist pairs for which $a \le b$ and $b \le a$ are both false. For any set $A$ each $S \subset 2^A$ is a collection of subsets of $A$, which is partially ordered by subset inclusion. The set of subspaces of a vector space $\mathcal{V}$ is partially ordered by subspace inclusion. The set of factorial models (factorial subspaces) generated by factors $A, B, C$ is also partially ordered by subspace inclusion.

1.3.2 Set partition

Let $n \ge 1$ be a positive integer, and let $[n] = \{1, \ldots, n\}$ be a finite set. A partition of the set $[n]$ is a collection of disjoint non-empty subsets, called blocks, whose union is $[n]$. The partition type is the set of block sizes counted with multiplicity. Since the sum of the block sizes is $n$, the partition type is a partition of the integer $n$.

For example $\{\{1, 3\}, \{2, 5\}, \{4\}\}$ and $\{\{1, 5\}, \{2, 3\}, \{4\}\}$ are two distinct partitions of $[5]$, usually abbreviated to 13|25|4 and 15|23|4. Since a partition is a set of subsets, the order of the blocks and the order within blocks are ignored. All told, there are 15 distinct partitions of $[5]$ of the same type $2 + 2 + 1$ or $2^2 1$.

A partition of $[n]$ is also an equivalence relation $B\colon [n]^2 \to \{0, 1\}$. In other words, $B \subset [n]^2$ is a symmetric Boolean matrix that is also reflexive and transitive. The matrix representations of 13|25|4 and 15|23|4 are

$$
13|25|4 = \begin{pmatrix}
1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1
\end{pmatrix},
\qquad
15|23|4 = \begin{pmatrix}
1 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1
\end{pmatrix}.
$$

Evidently the expressions 32|4|51 and 15|23|4 determine the same subset $B \subset [5]^2$, and therefore the same partition.

Let $P_n$ be the set of partitions of $[n]$. For $n \le 5$ the elements of $P_n$ grouped by partition type are

$P_1$: 1

$P_2$: 12, 1|2

$P_3$: 123, 12|3, 13|2, 23|1, 1|2|3

$P_4$: 1234, 123|4 [4], 12|34 [3], 12|3|4 [6], 1|2|3|4

$P_5$: 12345, 1234|5 [5], 123|45 [10], 123|4|5 [10], 12|34|5 [15], 12|3|4|5 [10], 1|2|3|4|5

Thus, 12|34 [3] = {12|34, 13|24, 14|23} $\subset P_4$ is the subset consisting of all distinct set partitions of type $2^2$, and 12|34|5 [15] $\subset P_5$ is the subset of 15 partitions of type $1\,2^2$ (five ways to choose the singleton, and three ways to split the remaining four into two pairs). Each subset of a given type is an orbit of the symmetric group $[n] \to [n]$ acting on partitions $P_n$ in the obvious way by permutation or re-labelling of elements.

When we say that $B$ = 15|23|4 is a partition of $[5]$, we make no distinction between different representations: (i) as a subset $B \subset [5]^2$; (ii) as the symmetric binary matrix displayed above with rows and columns indexed by $[n]$; (iii) as a Boolean function $B\colon [n]^2 \to \{0, 1\}$; (iv) as the set of disjoint non-empty subsets $B = \{\{1, 5\}, \{2, 3\}, \{4\}\}$.

The symbol $\#A$ applied to a set $A$ means the number of its elements; the same symbol $\#B$ applied to a partition $B$ denotes the number of blocks (which is the same as the rank of the matrix of $B$).

1.3.3 Sub-partition

Let $B, B'$ be two partitions of the same finite set $[n]$. If each block of $B$ is a subset of some block of $B'$ we write $B \le B'$, which is a partial order on $P_n$. As subsets of the square $[n]^2$, $B \le B'$ if and only if $B \subset B'$. The maximal partition is the one-block partition $\{[n]\}$, conventionally denoted by $\mathbf{1}$ or $\mathbf{1}_n$; the minimal partition is the $n$-block partition by singletons, conventionally denoted by $\mathbf{0}$ or $\mathbf{0}_n$. As subsets of the $n$-square, $\mathbf{1}_n = [n]^2$ is the entire square or the $n \times n$ matrix whose components are all one; $\mathbf{0}_n = \operatorname{diag}([n]^2)$ is the diagonal subset or the identity matrix $I_n$ or the Kronecker function $\delta(B, B')$. For every partition $B$ we have $\mathbf{0}_n \le B \le \mathbf{1}_n$.

To each pair of partitions $B, B' \in P_n$ there corresponds a least upper bound $B \vee B' = B' \vee B$ and a greatest lower bound $B \wedge B' = B' \wedge B$. The greatest lower bound is the intersection $B \cap B'$ of $B$ and $B'$ as subsets of $[n]^2$; equivalently, the blocks of $B \wedge B'$ are the non-empty intersections of the blocks of $B$ with the blocks of $B'$. The least upper bound is the intersection of all partitions that contain $B \cup B'$ as subsets of $[n]^2$. For example, the least upper bound of 12|34|5|6 and 15|36|2|4 is 125|346.
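Both lattice operations are straightforward to compute when a partition is stored as a set of blocks. The sketch below uses frozensets for blocks and a small `parse` helper for the bar notation; the helper assumes single-digit labels and, like the other names, is introduced here only for illustration.

```python
def parse(text):
    """'12|34|5|6' -> set of frozenset blocks (single-digit labels only)."""
    return {frozenset(int(ch) for ch in block) for block in text.split("|")}

def meet(B1, B2):
    """Greatest lower bound: the non-empty pairwise intersections of blocks."""
    return {b1 & b2 for b1 in B1 for b2 in B2 if b1 & b2}

def join(B1, B2):
    """Least upper bound: merge blocks of B1 that are linked by a block of B2."""
    blocks = [set(b) for b in B1]
    for b in B2:
        touching = [c for c in blocks if c & b]
        merged = set(b).union(*touching)
        blocks = [c for c in blocks if not (c & b)] + [merged]
    return {frozenset(c) for c in blocks}

upper = join(parse("12|34|5|6"), parse("15|36|2|4"))
lower = meet(parse("12|34|5|6"), parse("15|36|2|4"))
```

Here `upper` is the partition 125|346 quoted in the text, and `lower` is the partition of $[6]$ into singletons.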

The partition lattices $P_2$–$P_4$ are illustrated as graphs in Figure 1, with each partition as a node, and an edge joining two nodes $B < B'$ only if $B'$ is a parent of $B$, i.e., there is no intermediate partition $B''$ such that $B < B'' < B'$.

[Figure 1: Hasse diagrams for smaller partition lattices. Levels from top to bottom. $P_2$: 12; 1|2. $P_3$: 123; 12|3, 13|2, 23|1; 1|2|3. $P_4$: 1234; 123|4, 13|24, 124|3, 12|34, 134|2, 14|23, 234|1; 13|2|4, 24|1|3, 12|3|4, 34|1|2, 14|2|3, 23|1|4; 1|2|3|4.]

1.3.4 Zeta and Möbius functions

For any partially ordered set $S$, the partial order is encoded in the zeta function $\zeta(a, b) = 1$ if $a \le b$ and zero otherwise. In particular, this implies $\zeta(a, a) = 1$ for every $a \in S$. If the elements of $S$ are listed in non-decreasing order, then $\zeta$ is an upper triangular matrix with unit values along the diagonal. For example, the zeta function for $P_3$ with elements listed in decreasing number of blocks is

$$
S = \{1|2|3,\ 12|3,\ 13|2,\ 23|1,\ 123\}, \qquad
\zeta = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix},
$$

with all entries below the diagonal equal to zero, and also some of those above the diagonal.


Let $S$ be finite and let $f$ be a real-valued function on $S$, i.e., $f \in \mathbb{R}^S$. Then $F = \zeta' f$ is also a real-valued function, so $\zeta'$ is a linear function $\mathbb{R}^S \to \mathbb{R}^S$. In fact

$$F(a) = (\zeta' f)(a) = \sum_b \zeta(b, a) f(b) = \sum_{b \le a} f(b)$$

is the cumulative sum of $f$-values over the subset $b \le a$. If $S$ is a finite lattice, it has a minimal element $\mathbf{0}$, and the sum is a sum over the lattice interval $[\mathbf{0}, a]$.

The inverse function $m(a, b)$ is also upper-triangular, and satisfies

$$\sum_{b \in S} \zeta(a, b)\, m(b, c) = \delta(a, c) = \sum_{b \in S} m(a, b)\, \zeta(b, c)$$

and the sum may be restricted to the interval $[a, c] = \{b : a \le b \le c\}$. In particular, for $a < c$, the sum over the interval $[a, c]$ is zero

$$\sum_{a \le b \le c} m(b, c) = 0 = \sum_{a \le b \le c} m(a, b).$$

More explicitly, back-substitution gives $m(a, a) = 1$ for every $a$ and

$$m(a, c) = -\sum_{a < b \le c} m(b, c)$$

for $a < c$.

The Möbius function depends on the structure of the lattice. For the partition lattice $P_n$, it is sufficient to know that the value relative to the maximal one-block partition is

$$m(\sigma, \mathbf{1}_n) = (-1)^{\#\sigma - 1}(\#\sigma - 1)! = (-1)^{\#\sigma - 1}\,\Gamma(\#\sigma),$$

the gamma function with alternating sign, independent of the block sizes. More generally, if $\sigma$ is a partition of $[n]$, the restriction of $\sigma$ to $b \subset [n]$ is a partition of the subset $b$ denoted by $\sigma[b]$. For example, the restriction of $\sigma$ = 13|24|5 to $b = \{2, 3, 4, 5\}$ is $\sigma[b]$ = 24|3|5 consisting of two complete blocks and one partial block of $\sigma$. In general, the restriction $\sigma[b]$ is not a subset of the blocks of $\sigma$. For any interval $[\sigma, \tau]$ with $\sigma \le \tau$, each block $b \in \tau$ is the union of certain blocks $\sigma[b] \subset \sigma$, so the restriction $\sigma[b]$ is a subset consisting of certain blocks of $\sigma$ (with no partial blocks). The Möbius function is

$$m(\sigma, \tau) = \prod_{b \in \tau} m(\sigma[b], \mathbf{1}_b) = (-1)^{\#\sigma - \#\tau} \prod_{b \in \tau} \Gamma(\#\sigma[b])$$

for $\sigma \le \tau$, and zero otherwise.
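The closed form can be compared with the back-substitution recursion on a small lattice. The sketch below builds $P_4$ by brute force, computes $m(a, c) = -\sum_{a < b \le c} m(b, c)$ recursively, and checks it against the product formula. All names are illustrative, and the enumeration is only practical for small $n$.

```python
from functools import lru_cache
from math import factorial, prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition

def leq(a, b):
    """a <= b: every block of a is contained in some block of b."""
    return all(any(x <= y for y in b) for x in a)

N = 4
P = [frozenset(frozenset(b) for b in p)
     for p in set_partitions(list(range(1, N + 1)))]
ONE = frozenset([frozenset(range(1, N + 1))])

@lru_cache(maxsize=None)
def mobius(a, c):
    """Moebius function by back-substitution over the interval [a, c]."""
    if a == c:
        return 1
    if not leq(a, c):
        return 0
    return -sum(mobius(b, c) for b in P if b != a and leq(a, b) and leq(b, c))

def closed_form(sigma, tau):
    """(-1)^(#sigma - #tau) * prod over blocks b of tau of Gamma(#sigma[b])."""
    sign = (-1) ** (len(sigma) - len(tau))
    return sign * prod(
        factorial(sum(1 for x in sigma if x <= b) - 1) for b in tau
    )
```

On $P_4$ the recursion and the closed form agree for every pair $\sigma \le \tau$; in particular $m(\mathbf{0}_4, \mathbf{1}_4) = -3! = -6$.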


1.4 Cumulants and generalized cumulants

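The relationship between moments and cumulants is most conveniently described by summation over the partition lattice. Let $[n] = \{1, \ldots, n\}$ be the index set. For any subset $b \subset [n]$ let $\mu^b = E\bigl(\prod_{i \in b} Y^i\bigr)$ be the moment and let $\kappa^b$ be the joint cumulant of order $\#b$ of the variables $Y[b] = \{Y^i : i \in b\}$. Then

$$
\begin{aligned}
\mu^{[n]} &= \sum_{\sigma \in P_n} \prod_{b \in \sigma} \kappa^b \\
\kappa^{[n]} &= \sum_{\sigma \in P_n} (-1)^{\#\sigma - 1}(\#\sigma - 1)! \prod_{b \in \sigma} \mu^b.
\end{aligned}
$$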
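The second formula translates directly into code: enumerate the partitions, multiply the block moments, and weight by $(-1)^{\#\sigma-1}(\#\sigma-1)!$. In the sketch below the caller supplies a function `moment(block)` returning $\mu^b$ for a tuple of variable labels; that interface, and the exponential example, are assumptions made for illustration.

```python
from math import factorial, prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition

def joint_cumulant(labels, moment):
    """Sum over partitions of (-1)^(k-1) (k-1)! times the product of block moments."""
    total = 0
    for sigma in set_partitions(list(labels)):
        k = len(sigma)
        total += (-1) ** (k - 1) * factorial(k - 1) * prod(
            moment(tuple(block)) for block in sigma
        )
    return total

def exp_moment(block):
    """Moments when X and Y are independent unit exponentials: E(X^a Y^b) = a! b!."""
    return factorial(block.count("x")) * factorial(block.count("y"))

kappa4 = joint_cumulant(["x"] * 4, exp_moment)        # fourth cumulant of X
mixed = joint_cumulant(["x", "x", "y"], exp_moment)   # cum(X, X, Y)
```

Repeating a label gives the ordinary cumulant of a single variable, so `kappa4` is $3! = 6$ for the unit exponential, while the mixed cumulant `mixed` vanishes because $X$ and $Y$ are independent.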

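More generally, for any partition $\tau \in P_n$, the interval $[\mathbf{0}, \tau]$ is isomorphic with the Cartesian product $\prod_{b \in \tau} [\mathbf{0}, b]$ of smaller lattices. Consequently, the moment product $F(\tau) = \prod_{b \in \tau} \mu^b$ is expressible as a sum of cumulant products $f(\sigma) = \prod_{b \in \sigma} \kappa^b$ over partitions $\sigma$ in $[\mathbf{0}, \tau]$:

$$
\begin{aligned}
F(\tau) &= \prod_{b \in \tau} \mu^b = \sum_{\sigma \le \tau} \prod_{b \in \sigma} \kappa^b = \sum_{\sigma \le \tau} f(\sigma) \\
f(\tau) &= \prod_{b \in \tau} \kappa^b = \sum_{\sigma \le \tau} m(\sigma, \tau) \prod_{b \in \sigma} \mu^b.
\end{aligned}
$$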

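Now consider a different sort of mixed cumulant of order less than the number of variables. A partition $\tau$ splits the random variables $Y^1, \ldots, Y^n$ into disjoint subsets, one subset $Y[b]$ for each block $b \in \tau$. Now consider the joint cumulant $\kappa(\tau)$ of order $k = \#\tau$ of the variables

$$X^1 = \prod_{i \in b_1} Y^i, \quad \ldots, \quad X^k = \prod_{i \in b_k} Y^i.$$

By definition,

$$\kappa(\tau) = \operatorname{cum}_k(X^1, \ldots, X^k) = \sum_{\varsigma \ge \tau} m(\varsigma, \mathbf{1}_n)\, F(\varsigma)$$

where $F(\varsigma)$ is the moment product over the blocks of $\varsigma$. Now substitute the expression $F(\varsigma) = \sum_{\sigma \le \varsigma} f(\sigma)$ for moment products in terms of cumulant products:

$$
\begin{aligned}
\kappa(\tau) &= \sum_{\varsigma \ge \tau} m(\varsigma, \mathbf{1}_n) \sum_{\sigma \le \varsigma} f(\sigma) \\
&= \sum_{\sigma \in P_n} f(\sigma) \sum_{\substack{\varsigma \ge \tau \\ \varsigma \ge \sigma}} m(\varsigma, \mathbf{1}_n) \\
&= \sum_{\sigma \in P_n} f(\sigma) \sum_{\varsigma \ge \tau \vee \sigma} m(\varsigma, \mathbf{1}_n).
\end{aligned}
$$

The inner sum runs over the interval $[\tau \vee \sigma, \mathbf{1}_n]$, so it is zero unless $\tau \vee \sigma = \mathbf{1}_n$, in which case it is one. Hence

$$\kappa(\tau) = \sum_{\sigma : \tau \vee \sigma = \mathbf{1}_n} \prod_{b \in \sigma} \kappa^b.$$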

This result is fundamental for computing means, variances and higher-order cumulants of quadratic forms and higher-order polynomial functions of random variables. To understand how it works, we list, for various small integers $n$ and selected partitions $\tau \in P_n$, the subset of partitions $\sigma \in P_n$ satisfying the connectivity condition $\sigma \vee \tau = \mathbf{1}_n$.

A short list of some connected set-partitions

τ            {σ : σ ∨ τ = 1}
123          123, 12|3, 13|2, 23|1, 1|2|3
12|3         123, 13|2, 23|1
1|2|3        123
123|4        1234, 124|3 [3], 14|23 [3], 14|2|3 [3]
12|34        1234, 123|4 [4], 13|24 [2], 13|2|4 [4]
12|3|4       1234, 134|2 [2], 13|24 [2]
             ...excluding partitions having a singleton block...
123|456      123456, 1234|56 [6], 1245|36 [9], 124|356 [9], 12|34|56 [9], 14|25|36 [6]
12|34|56     123456, 1235|46 [12], 123|456 [6], 135|246 [4], 13|25|46 [8]

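For a scalar random variable $Y$ having zero mean, it follows from lines 5, 7, 8 of the table that

$$
\begin{aligned}
\operatorname{var}(Y^2) &= \kappa_4 + 2\kappa_2^2 \\
\operatorname{var}(Y^3) &= \kappa_6 + 15\kappa_4\kappa_2 + 9\kappa_3^2 + 15\kappa_2^3 \\
\operatorname{cum}_3(Y^2) &= \kappa_6 + 12\kappa_4\kappa_2 + 10\kappa_3^2 + 8\kappa_2^3.
\end{aligned}
$$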
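These coefficients can be recovered by enumerating the partitions $\sigma$ with $\sigma \vee \tau = \mathbf{1}_n$ and counting them by type. The sketch below tests connectivity with a small union-find structure, which is an implementation choice of this example rather than anything prescribed in the notes; all names are illustrative.

```python
from collections import Counter

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition

def join_is_one(sigma, tau, n):
    """True if sigma v tau is the one-block partition of {1, ..., n}."""
    parent = list(range(n + 1))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for block in list(sigma) + list(tau):
        for x in block[1:]:
            parent[find(x)] = find(block[0])
    return len({find(i) for i in range(1, n + 1)}) == 1

def coefficients(tau, n, zero_mean=True):
    """Count connected partitions by type; singletons are dropped if zero_mean."""
    counts = Counter()
    for sigma in set_partitions(list(range(1, n + 1))):
        sizes = tuple(sorted((len(b) for b in sigma), reverse=True))
        if zero_mean and 1 in sizes:
            continue
        if join_is_one(sigma, tau, n):
            counts[sizes] += 1
    return dict(counts)

var_y2 = coefficients([[1, 2], [3, 4]], 4)
var_y3 = coefficients([[1, 2, 3], [4, 5, 6]], 6)
cum3_y2 = coefficients([[1, 2], [3, 4], [5, 6]], 6)
```

Each key is a partition type and each value is the coefficient of the matching cumulant product, so `var_y3` maps $(4, 2)$ to 15, $(3, 3)$ to 9 and $(2, 2, 2)$ to 15. Calling `coefficients([[1, 2], [3, 4]], 4, zero_mean=False)` also keeps the terms involving $\kappa_1$.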

1.5 Gaussian moments and the Isserlis-Wick formulae

Suppose that the random vector $\psi = (\psi_1, \ldots, \psi_n)$ is zero-mean Gaussian in $\mathbb{R}^n$ with covariance matrix $K$ whose $(i, j)$ component is $K_{i,j}$. We ask for the joint cumulant of order $n$ of the $n$ squared variables

$$\kappa(\tau) = \operatorname{cum}_n(|\psi_1|^2, \ldots, |\psi_n|^2).$$


If the component variables are all equal, $\psi_1 = \cdots = \psi_n = \psi$, then $|\psi|^2$ is distributed as $\kappa_2\chi_1^2$. The cumulant generating function is $-\log(1 - 2\kappa_2\xi)/2$, and the cumulants are $\operatorname{cum}_n(|\psi|^2) = 2^{n-1}(n - 1)!\,\kappa_2^n$. The moment generating function is $(1 - 2\kappa_2\xi)^{-1/2}$, so the moment of order $n$ is proportional to the ascending factorial

$$(-1/2)^{\downarrow n}(-2\kappa_2)^n = (1/2)^{\uparrow n}(2\kappa_2)^n = \frac{(2n)!\,\kappa_2^n}{2^n\, n!}.$$

For the more general setting, $\tau$ is a partition of the set $[2n]$ of type $2^n$

$$\tau = 1,1' \mid 2,2' \mid \cdots \mid n,n'$$

in which $[n]$ and $[n]'$ are duplicate copies of the same index set, and $|\psi_r|^2$ gives rise to two matching indices $r, r'$, distinguished by primes but numerically equal. Since the only non-zero cumulants are of order two, the expression for $\kappa(\tau)$ is a sum over all partitions $\sigma$ of type $2^n$ such that $\sigma \vee \tau = \mathbf{1}_{2n}$.

Consider a cyclic permutation $\pi\colon [n] \to [n]$, with the same permutation also acting on the second copy $\pi\colon [n]' \to [n]'$, i.e.,

$$
\begin{aligned}
1 &\mapsto \pi(1) \mapsto \pi^2(1) \mapsto \cdots \mapsto \pi^{n-1}(1) \mapsto \pi^n(1) = 1 \\
1' &\mapsto \pi(1') \mapsto \pi^2(1') \mapsto \cdots \mapsto \pi^{n-1}(1') \mapsto \pi^n(1') = 1'
\end{aligned}
$$

We associate with $\pi$ a partition $\sigma$ of $[2n]$ as follows

$$\sigma = 1\,\pi(1') \mid \pi(1)\,\pi^2(1') \mid \pi^2(1)\,\pi^3(1') \mid \cdots \mid \pi^{n-1}(1)\,1'$$

so that each block contains a pair $j, \pi(j)'$ with $j \in [n]$ and $\pi(j)' \in [n]'$. Clearly, $\sigma \vee \tau = \mathbf{1}_{2n}$, and the contribution of $\sigma$ to $\kappa(\tau)$ is

$$\prod_{j=1}^{n} K_{j,\pi(j)}.$$

There are $2^{n-1}$ essentially distinct ways of assigning the primes to one element in each block of $\tau$, so the total contribution to the joint cumulant is

$$\kappa(\tau) = 2^{n-1} \sum_{\pi : \#\pi = 1} \prod_{j=1}^{n} K_{j,\pi(j)} = 2^{-1} \operatorname{cyp}(2K),$$

where the sum runs over $(n - 1)!$ permutations $\pi\colon [n] \to [n]$ having a single cycle.


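The expected value of the $n$-fold product $\mu(\tau) = E\prod_j |\psi_j|^2$ is the sum over partitions of $[n]$ of cumulant products

$$\begin{aligned}
\mu(\tau) &= \sum_{\sigma\in\mathcal{P}_n} \prod_{b\in\sigma} \kappa(\tau[b]) \\
&= \sum_{\sigma\in\mathcal{P}_n} \prod_{b\in\sigma} 2^{\#b-1} \operatorname{cyp}(K[b]) \\
&= \sum_{\sigma\in\Pi_n} 2^{n-\#\sigma} \prod_{j=1}^{n} K_{j,\sigma(j)} \\
&= \operatorname{per}_{1/2}(2K) = 2^n \operatorname{per}_{1/2}(K),
\end{aligned}$$

where $\Pi_n$ is the set of permutations of $[n]$ and, for a permutation $\sigma$, $\#\sigma$ is the number of cycles.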

This derivation implies that $\operatorname{per}_{1/2}(K) \ge 0$ for all positive semi-definite symmetric matrices, and a simple extension shows that $\operatorname{per}_{\alpha/2}(K) \ge 0$ for all positive integers $\alpha$.

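For any real number $\alpha$, the $\alpha$-permanent $\operatorname{per}_\alpha(K)$ of a square matrix $K$ of order $n$ is the sum over $n!$ permutations

$$\operatorname{per}_\alpha(K) = \sum_{\sigma:[n]\to[n]} \alpha^{\#\sigma} \prod_{j=1}^{n} K_{j,\sigma(j)},$$

where $\#\sigma$ is the number of cycles of $\sigma$. The cyclic product is the same sum restricted to single-cycle permutations:

$$\operatorname{cyp}(K) = \sum_{\sigma:\#\sigma=1} \prod_{j=1}^{n} K_{j,\sigma(j)} = \lim_{\alpha\to 0} \alpha^{-1} \operatorname{per}_\alpha(K).$$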

Note that $\operatorname{per}_\alpha(K)$ is a polynomial of degree $n$ in $\alpha$; it is homogeneous of total degree $n$ in $K$. In addition, $\operatorname{per}_{-1}(K) = (-1)^n \det(K)$.
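The identities for real Gaussian vectors can be verified by brute force for a small matrix. The following is a minimal sketch, not an efficient algorithm: the helper names (`cycle_count`, `per_alpha`, `cyp`, `gaussian_moment`, `square_moment`) and the $3 \times 3$ matrix are assumptions made for illustration, and the Gaussian moments are computed from the pairing (Isserlis/Wick) formula rather than from the partition argument in the text.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def cycle_count(perm):
    """Number of cycles (#sigma) of a permutation given as a tuple of images."""
    seen, count = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count

def per_alpha(K, alpha):
    """alpha-permanent: sum over permutations of alpha^#sigma * prod_j K[j][sigma(j)]."""
    n = len(K)
    return sum(alpha ** cycle_count(s) * prod(K[j][s[j]] for j in range(n))
               for s in permutations(range(n)))

def cyp(K):
    """Cyclic product: the same sum restricted to single-cycle permutations."""
    n = len(K)
    return sum(prod(K[j][s[j]] for j in range(n))
               for s in permutations(range(n)) if cycle_count(s) == 1)

def gaussian_moment(K, idx):
    """E(prod_{i in idx} psi_i) for a zero-mean real Gaussian vector,
    as a sum over pairings of the index list (Isserlis/Wick formula)."""
    if not idx:
        return 1
    first, rest = idx[0], idx[1:]
    return sum(K[first][rest[i]] * gaussian_moment(K, rest[:i] + rest[i + 1:])
               for i in range(len(rest)))

def square_moment(K, *js):
    """E(prod_j psi_j^2) for the listed component indices."""
    return gaussian_moment(K, tuple(i for j in js for i in (j, j)))

K = [[4, 1, 2], [1, 3, 1], [2, 1, 5]]       # illustrative positive-definite matrix
K2 = [[2 * entry for entry in row] for row in K]
half = Fraction(1, 2)

moment = square_moment(K, 0, 1, 2)           # E(psi_1^2 psi_2^2 psi_3^2)

# Third joint cumulant of the squares, from moments of orders 1, 2, 3.
cum3 = (moment
        - square_moment(K, 0) * square_moment(K, 1, 2)
        - square_moment(K, 1) * square_moment(K, 0, 2)
        - square_moment(K, 2) * square_moment(K, 0, 1)
        + 2 * square_moment(K, 0) * square_moment(K, 1) * square_moment(K, 2))

print(moment == per_alpha(K2, half))         # mu(tau) = per_{1/2}(2K)
print(cum3 == 2 ** 2 * cyp(K))               # kappa(tau) = 2^{n-1} cyp(K), n = 3
print(per_alpha(K, -1))                      # equals (-1)^3 det(K)
```

Enumerating all $n!$ permutations is only practical for small $n$; the point is to confirm the formulas, not to compute permanents at scale.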

In applications to quantum mechanics, $\psi$ arises as a wave function, which is regarded as a zero-mean complex-valued Gaussian process. Each component is distributed symmetrically with respect to rotation in the complex plane, and the joint distribution is invariant with respect to scalar multiplication by a unit complex number. For this setting, all the algebraic joint moments and cumulants of $\psi$ are zero. The only non-zero cumulants are those of the form $E(\psi_r \bar\psi_s) = K_{rs} = \bar K_{sr}$, and higher-order products involving an equal number of conjugated and non-conjugated terms.

The argument given above for Gaussian cumulants applies equally to complex-valued Gaussian processes. The details are a little simpler in the complex case because the label-assignment factor $2^{n-1}$ does not arise: $[n]$ is the set of indices, $[n]'$ is the conjugated copy, and there are no symmetries from switching elements. As a result,

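$$\kappa(\tau) = \operatorname{cum}_n(|\psi_1|^2, \ldots, |\psi_n|^2) = \operatorname{cyp}(K)$$

$$\mu(\tau) = E(|\psi_1|^2 \cdots |\psi_n|^2) = \operatorname{per}_1(K) = \operatorname{per}(K).$$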


This derivation implies that $\operatorname{per}(K) \ge 0$ for all positive semi-definite Hermitian matrices, and a simple extension shows that $\operatorname{per}_\alpha(K) \ge 0$ for all positive integers $\alpha$.

Exercises 1.5

1.5.1 Let $\psi$ be a complex-valued zero-mean process on some space with covariance function $K(x, x') = E(\psi(x)\bar\psi(x'))$. Let $\mathbf{x} = \{x_1, \ldots, x_n\}$ and $\mathbf{x}' = \{x'_1, \ldots, x'_n\}$ be two subsets of $n$ points in the domain, possibly but not necessarily equal. Use the matching argument given above to compute the joint cumulants

$$\operatorname{cum}_{2n}(\psi(x_1), \ldots, \psi(x_n), \bar\psi(x'_1), \ldots, \bar\psi(x'_n)),$$

$$\operatorname{cum}_n(\psi(x_1)\bar\psi(x'_1), \ldots, \psi(x_n)\bar\psi(x'_n)),$$

and the joint moment

$$E\Big(\prod_{x\in\mathbf{x}} \psi(x) \prod_{x'\in\mathbf{x}'} \bar\psi(x')\Big).$$

Both of these are complex numbers.

1.5.2 Explain how the answers to the preceding exercise are modified if $\psi$ is a real-valued Gaussian process.

1.6 Exponential families

2 Generating functions and formal power series

2.1 The vector spaces Seq and Seqn

To any sequence of scalars $f = (f_1, f_2, \ldots)$, in which $f_n$ is a scalar, we may associate a formal polynomial in a variable $t$ by

$$f(t) = f_1 t + f_2 t^2/2! + f_3 t^3/3! + \cdots + f_n t^n/n! + \cdots$$

Thus $f(t)$ is the generating function for the sequence $f$; we say it is a generating function of exponential type because $f_n$ is the coefficient of $t^n/n!$ in the polynomial. In particular, the constant sequence with $f_n = 1$ corresponds to the function $\exp(t) - 1$, while the sequence $m_n = (-1)^{n-1}(n-1)!$ corresponds to the inverse function $m(t) = \log(1 + t)$. Some further examples are given below.

Function $f(t)$              Coefficient sequence $f_n$ of $(f_1, f_2, \ldots)$

$(1 - t)^{-1} - 1$           $n!$
$-\log(1 - t)$               $(n-1)!$
$\exp(t) - 1$                $1$
$\exp(e^t - 1) - 1$          $B_n$ (Bell number)
$\log(1 + t)$                $(-1)^{n-1}(n-1)!$
$(1 + t)^\alpha - 1$         $\alpha^{\downarrow n} = \alpha(\alpha - 1) \cdots (\alpha - n + 1)$

Whether $f$ is regarded as a sequence of scalars or as a formal polynomial, the set of such sequences or formal polynomials is denoted by Seq. To say that $f$ is a formal polynomial is to imply that certain operations on sequences $f, g$ are carried out as if these were polynomials. The simplest such operations are addition and scalar multiplication. Thus if $\alpha$ is a scalar, $\alpha f$ is a sequence whose components are $(\alpha f_1, \alpha f_2, \ldots)$, and the associated polynomial is $(\alpha f)(t) = \alpha f(t)$. Likewise, addition operates component-wise, so that

$$f + g = (f_1 + g_1, f_2 + g_2, \ldots, f_n + g_n, \ldots)$$

$$(f + g)(t) = f(t) + g(t)$$

The set of sequences with these operations of addition and scalar multiplication is a vector space denoted by $\mathrm{Seq}^+$. The restriction to finite sequences of $n$ components is a vector space of dimension $n$ denoted by $\mathrm{Seq}^+_n$.

2.2 Composition of series

Let $g(\xi)$ be a formal power series of exponential type in the variable $\xi$

$$g(\xi) = \sum_{r=1}^{\infty} g_r \xi^r/r!.$$

Evidently each monomial $g^2(\xi), g^3(\xi), \ldots$ is also a formal power series. The coefficient of $\xi^n/n!$ in the monomial $g^2$ is

$$g^2(\xi)[\xi^n/n!] = \sum_{r+s=n} \frac{n!}{r!\,s!}\, g_r g_s$$

where the sum runs over ordered pairs $(r, s)$ in the square $[n]^2$ subject to the restriction that they add to $n$. If $n$ is odd, every pair occurs twice, once as $(r, s)$ and once as $(s, r)$; if $n$ is even, every pair occurs twice except for the pair on the diagonal with $r = s = n/2$. For example, if $n = 6$, the sum runs over pairs $(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)$ with transpose pairs contributing equally.

For combinatorial accounting purposes, it is more convenient to combine the symmetric pairs, in effect to ignore the order of the two parts. Instead of summing over strictly positive integer vectors adding to $n$ (compositions of $n$), we combine similar terms and sum over integer partitions having two parts. For $n = 6$, the total contribution of the $\{1, 5\}$ terms is $2! \times 6\, g_1 g_5$, the total contribution of the $\{2, 4\}$ terms is $2! \times 15\, g_2 g_4$, and the contribution of the $(3, 3)$ term is $1! \times 20\, g_3^2 = 2! \times 10\, g_3^2$. The number of partitions of $[6]$ of type $5 + 1$ is 6, the number of type $4 + 2$ is 15, and the number of type $3^2$ is 10 (not 20). For that reason, it is most convenient to write the quadratic monomial as a sum over partitions of the set $[n]$ into two blocks

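$$g^2(\xi)[\xi^n/n!] = \sum_{r+s=n} \frac{n!}{r!\,s!}\, g_r g_s = 2! \sum_{\sigma\in\mathcal{P}_n:\#\sigma=2}\; \prod_{b\in\sigma} g_{\#b},$$

where $\sigma$ is a partition of $[n]$, $\#\sigma$ is the number of elements (blocks), $b \in \sigma$ is a block, and $\#b$ is the block size.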

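The extension to higher-order monomials follows the same pattern

$$g^3(\xi)[\xi^n/n!] = \sum_{r+s+t=n} \frac{n!}{r!\,s!\,t!}\, g_r g_s g_t = 3! \sum_{\sigma\in\mathcal{P}^{(3)}_n}\; \prod_{b\in\sigma} g_{\#b}$$

$$g^k(\xi)[\xi^n/n!] = \sum_{r_1+\cdots+r_k=n} \frac{n!}{r_1! \cdots r_k!}\, g_{r_1} \cdots g_{r_k} = k! \sum_{\sigma\in\mathcal{P}^{(k)}_n}\; \prod_{b\in\sigma} g_{\#b},$$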

where $\mathcal{P}^{(k)}_n \subset \mathcal{P}_n$ is the subset of partitions having $k$ blocks. The first sum runs over $k$-part compositions of $n$, which are strictly positive integer vectors $r = (r_1, \ldots, r_k)$ whose components add to $n$. Two compositions consisting of the same components in a different order determine the same integer partition. If the components of $r$ are distinct integers, there are $k!$ permutations giving rise to the same product, and $n!/(r_1! \cdots r_k!)$ is the number of partitions of the set $[n]$ having $k$ blocks of sizes $r_1, \ldots, r_k$. Otherwise, if some of the components of $r$ are equal, the number of compositions giving rise to a particular product $g_{r_1} \cdots g_{r_k}$ is reduced by the product of the factorials of the multiplicities. For example, if $n = 10$ and $k = 4$, the point $r = (1, 2, 3, 4)$ gives rise to the product $g_1 g_2 g_3 g_4$, and there are $k!$ compositions giving rise to the same product. However, if $n = 8$ and $k = 5$, the composition $r = (2, 2, 2, 1, 1)$ gives rise to $g_2^3 g_1^2$, but there are only $k!/(3!\,2!)$ similar compositions giving rise to the same product.

To each integer partition $m = 1^{m_1} 2^{m_2} \cdots n^{m_n}$ having $k = m_{\cdot} = \sum_j m_j$ parts, there correspond $k!/(m_1!\, m_2! \cdots)$ compositions with parts $r_1, \ldots, r_k$ in some order. In the expansion of $g^k(\xi)$, each composition $r$ of $n$ has a combinatorial factor $n!/r! = n!/(r_1! \cdots r_k!)$. This means that the product associated with the integer partition $m$

$$g^m = g_1^{m_1} g_2^{m_2} \cdots g_n^{m_n}$$

has a combinatorial factor

$$\frac{n!}{r_1! \cdots r_k!} \times \frac{k!}{m_1! \cdots m_n!} = \frac{n!\, k!}{\prod_j (j!)^{m_j}\, m_j!},$$

which is $k!$ times the number of set partitions of $[n]$ corresponding to the given integer partition. In other words, these awkward combinatorial factors are automatically accommodated by summation over set partitions:

$$\frac{g^k(\xi)}{k!} = \sum_{n\ge k} \frac{\xi^n}{n!} \sum_{\sigma\in\mathcal{P}^{(k)}_n}\; \prod_{b\in\sigma} g_{\#b}.$$
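The equality between the sum over compositions and the sum over set partitions can be confirmed by direct enumeration for small $n$. The sketch below is a brute-force check under assumed names (`set_partitions`, `coeff_by_compositions`, `coeff_by_set_partitions`) and an arbitrary integer sequence `g`; none of these come from the notes.

```python
from itertools import product
from math import factorial, prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new singleton block.
        yield [[first]] + smaller

def coeff_by_compositions(g, k, n):
    """Coefficient of xi^n/n! in g^k: sum over k-part compositions of n."""
    total = 0
    for r in product(range(1, n + 1), repeat=k):
        if sum(r) == n:
            multinomial = factorial(n) // prod(factorial(ri) for ri in r)
            total += multinomial * prod(g[ri] for ri in r)
    return total

def coeff_by_set_partitions(g, k, n):
    """The same coefficient as k! times a sum over k-block partitions of [n]."""
    block_sum = sum(prod(g[len(b)] for b in sigma)
                    for sigma in set_partitions(list(range(n)))
                    if len(sigma) == k)
    return factorial(k) * block_sum

g = [0, 2, -1, 3, 5, 7, 11]   # g[0] is an unused placeholder; values are arbitrary

checks = {(k, n): (coeff_by_compositions(g, k, n), coeff_by_set_partitions(g, k, n))
          for n in range(1, 7) for k in range(1, n + 1)}
print(all(a == b for a, b in checks.values()))
print(checks[(2, 6)])   # the n = 6 quadratic example: 12 g1 g5 + 30 g2 g4 + 20 g3^2
```

For the sequence chosen here the quadratic coefficient at $n = 6$ is $12 \cdot 14 + 30 \cdot (-5) + 20 \cdot 9 = 198$ by either route.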

Now let $f$ be another formal power series of exponential type. The compositional product $fg$ is a sequence whose polynomial is $f(g(\xi))$. Note that $fg$ is ordinarily different from $gf$, so the product operation is not commutative. However, it is associative, so $fgh$ is well defined. By definition, the series expansion of the composition $(fg)(\xi)$ is a linear combination of the monomials $g^k(\xi)/k!$ with coefficients $f_k$, giving

$$(fg)(\xi) = \sum_{k=1}^{\infty} f_k \frac{g^k(\xi)}{k!} = \sum_{n=1}^{\infty} \frac{\xi^n}{n!} \sum_{\sigma\in\mathcal{P}_n} f_{\#\sigma} \prod_{b\in\sigma} g_{\#b}$$

$$(fg)_n = \sum_{\sigma\in\mathcal{P}_n} f_{\#\sigma} \prod_{b\in\sigma} g_{\#b}.$$

With this operation, the space of sequences or formal exponential generating functions is a vector space with a non-commutative compositional product $(f, g) \mapsto fg$ that is linear in the first argument but not in the second.
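The formula for $(fg)_n$ translates directly into a short program. The sketch below is one possible rendering, with assumed names (`set_partitions`, `compose`) and sequences stored as lists whose index 0 is an unused placeholder. It reproduces two rows of the table in Section 2.1: composing $\exp(t) - 1$ with itself gives the Bell numbers, and composing $\log(1 + t)$ with $\exp(t) - 1$ gives the identity.

```python
from math import prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def compose(f, g, N):
    """Return [0, (fg)_1, ..., (fg)_N] using (fg)_n = sum_sigma f_#sigma prod_b g_#b."""
    out = [0] * (N + 1)
    for n in range(1, N + 1):
        out[n] = sum(f[len(sigma)] * prod(g[len(b)] for b in sigma)
                     for sigma in set_partitions(list(range(n))))
    return out

N = 5
ones = [0] + [1] * N               # exp(t) - 1
log1p = [0, 1, -1, 2, -6, 24]      # log(1 + t): (-1)^(n-1) (n-1)!

bell = compose(ones, ones, N)      # exp(e^t - 1) - 1
identity = compose(log1p, ones, N) # log(1 + (e^t - 1)) = t

# Two sequences chosen to show that the product is not commutative.
f = [0, 1, 1, 0, 0, 0]             # f(xi) = xi + xi^2/2
g = [0, 1, 0, 1, 0, 0]             # g(xi) = xi + xi^3/6
fg, gf = compose(f, g, N), compose(g, f, N)

print(bell[1:], identity[1:])
print(fg[1:], gf[1:])
```

For the pair `f`, `g` above the two products first differ at $n = 4$, where $(fg)_4 = 4 f_2 g_1 g_3 = 4$ and $(gf)_4 = 6 g_3 f_1^2 f_2 = 6$.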

Exercises 2.2

2.2.1 Use Faà di Bruno's formula to derive the coefficient $(fg)_n$ in the Taylor expansion of the composition.


2.3 Inverse function

The unit sequence $e = (1, 0, 0, \ldots)$, i.e., $e(\xi) = \xi$, is the compositional identity satisfying $eg = ge = g$ for every $g$. If $g$ is given with $g_1 \ne 0$, the compositional equation $fg = e$ has a left-inverse solution, satisfying $f_1 = g_1^{-1}$, and recursively

$$f_n = -g_1^{-n} \sum_{1\le k<n}\; \sum_{\sigma\in\mathcal{P}^{(k)}_n} f_k \prod_{b\in\sigma} g_{\#b}$$

for $n > 1$. This produces $f_1 = g_1^{-1}$,

$$\begin{aligned}
f_2 g_1^3 &= -g_2, \\
f_3 g_1^5 &= -g_1 g_3 + 3 g_2^2, \\
f_4 g_1^7 &= -g_4 g_1^2 + 10 g_1 g_2 g_3 - 15 g_2^3, \\
f_5 g_1^9 &= -g_5 g_1^3 + 15 g_4 g_2 g_1^2 + 10 g_3^2 g_1^2 - 105 g_3 g_2^2 g_1 + 105 g_2^4, \\
f_6 g_1^{11} &= -g_6 g_1^4 + 21 g_5 g_2 g_1^3 + 35 g_4 g_3 g_1^3 - 210 g_4 g_2^2 g_1^2 - 280 g_3^2 g_2 g_1^2 + 1260 g_3 g_2^3 g_1 - 945 g_2^5.
\end{aligned}$$

Conversely, the equation $gf = e$ has a right-inverse solution satisfying $f_1 = g_1^{-1}$, and subsequently

$$f_n = -g_1^{-1} \sum_{1<k\le n}\; \sum_{\sigma\in\mathcal{P}^{(k)}_n} g_k \prod_{b\in\sigma} f_{\#b}$$

for $n > 1$. This sequence produces the same solution, which is

$$f_k g_1^k = \sum_{\nu=1}^{k-1} (-1)^\nu g_1^{-\nu} \sum_{\sigma\in\mathcal{P}^{(\star\nu)}_{k+\nu-1}}\; \prod_{b\in\sigma} g_{\#b}$$

for $k > 1$, where the sum runs over partitions of $[k + \nu - 1]$ having $\nu$ non-singleton blocks. (Why?)

Note that if $g_k = 1$ for every $k$, then $f = (1, -1, 2, -6, 24, -120, \ldots)$, and the extension to general $k \ge 1$ is $f_k = (-1)^{k-1}(k-1)!$; conversely, if $f_k = 1$ for every $k$, then $g_k = (-1)^{k-1}(k-1)!$.
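The left-inverse recursion can be run as written and compared with the explicit expressions for $f_2, f_3, f_4$ and with the special case $g_k = 1$. This is a sketch under assumed names (`set_partitions`, `compose`, `left_inverse`) and an arbitrary test sequence; exact rational arithmetic stands in for the symbolic manipulation.

```python
from fractions import Fraction
from math import prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def compose(f, g, N):
    """Return [0, (fg)_1, ..., (fg)_N]; index 0 is an unused placeholder."""
    return [0] + [sum(f[len(s)] * prod(g[len(b)] for b in s)
                      for s in set_partitions(list(range(n))))
                  for n in range(1, N + 1)]

def left_inverse(g, N):
    """Solve fg = e by f_n = -g_1^{-n} sum_{k<n} sum_{sigma in P_n^(k)} f_k prod g_#b."""
    g1 = Fraction(g[1])
    f = [Fraction(0)] * (N + 1)
    f[1] = 1 / g1
    for n in range(2, N + 1):
        s = sum(f[len(sigma)] * prod(g[len(b)] for b in sigma)
                for sigma in set_partitions(list(range(n)))
                if len(sigma) < n)
        f[n] = -s / g1**n
    return f

N = 6
g = [0, 2, -1, 3, 5, 7, 11]                 # arbitrary sequence with g_1 != 0
f = left_inverse(g, N)
e = [0, 1, 0, 0, 0, 0, 0]                   # unit sequence e(xi) = xi

print(compose(f, g, N) == e, compose(g, f, N) == e)   # left and right inverse agree
print(left_inverse([0] + [1] * N, N)[1:])             # case g_k = 1 for every k
```

Checking `compose(g, f, N) == e` as well illustrates the statement that the right-inverse recursion produces the same solution.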

For $f(\xi) = \log(1 + \xi)$, the coefficients are $f_n = (-1)^{n-1}(n-1)!$, in which case

$$\log(1 + g(\xi))[\xi^n/n!] = \sum_{\sigma\in\mathcal{P}_n} (-1)^{\#\sigma-1}(\#\sigma-1)! \prod_{b\in\sigma} g_{\#b}$$

is the expression for the cumulant of order $n$ in terms of moment products of total order $n$. Conversely, for $f(\xi) = \exp(\xi) - 1$ with $f_n = 1$, we obtain the expression for the moment of order $n$ in terms of cumulant products of total order $n$.


2.4 The vector spaces $\mathrm{Seq}^\star$ and $\mathrm{Seq}^\star_n$

Consider now a different operation (convolution) on the space of sequences, one that is more readily understood as an operation on polynomials than as an operation on sequences. For any scalar $\alpha$ and sequences $f, g$, define the sequences $f \star g$ and $\alpha \star g$ by

$$(f \star g)(\xi) = f(\xi) + g(\xi) + f(\xi)g(\xi),$$

$$(\alpha \star g)(\xi) = (1 + g(\xi))^\alpha - 1.$$

Equivalently, $1 + (f \star g)(\xi) = (1 + f(\xi))(1 + g(\xi))$, showing that this operation is commutative: $f \star g = g \star f$. In addition, $1 \star f = f$ and $f \star f = 2 \star f$ for every $f$, the zero vector is the zero sequence, and so on.

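In terms of the augmented sequences with $f_0 = g_0 = 1$, the components of $f \star g$ and $\alpha \star g$ are

$$(f \star g)_n = \sum_{j=0}^{n} \binom{n}{j} f_j g_{n-j}$$

$$(\alpha \star g)_n = \sum_{\sigma\in\mathcal{P}_n} \alpha^{\downarrow\#\sigma} \prod_{b\in\sigma} g_{\#b}$$

where $\sigma \in \mathcal{P}_n$ is a partition of $[n]$ containing $\#\sigma$ blocks, and for each block $b \in \sigma$, $\#b$ is the number of elements. The descending factorial product $\alpha^{\downarrow r} = \alpha(\alpha - 1) \cdots (\alpha - r + 1)$ is the $r$th Taylor coefficient associated with the function $(1 + \xi)^\alpha - 1$.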

The space of sequences with these operations is a vector space denoted by $\mathrm{Seq}^\star$, the vector-space properties being more obvious from the polynomial representation than the sequence representation. The restriction to finite sequences of $n$ components, or to polynomials of degree less than or equal to $n$, is an $n$-dimensional vector space denoted by $\mathrm{Seq}^\star_n$, which may be identified with the subspace of $\mathrm{Seq}^\star$ having zero components for $r > n$. In $\mathrm{Seq}^\star_n$ the product $1 + (f \star g)(\xi)$ is the restriction of the polynomial product $(1 + f(\xi))(1 + g(\xi))$ to terms of degree $\le n$, so that $(f \star g)_r = 0$ for $r > n$. Likewise for $\alpha \star f$. This restriction by polynomial degree is a linear projection $\mathrm{Seq}^\star \to \mathrm{Seq}^\star_n$, i.e., it commutes with vector-space operations.

By a homomorphism, we mean a linear transformation $T\colon \mathrm{Seq}^\star \to \mathrm{Seq}$ that also acts on the finite-dimensional restrictions, i.e. $T\,\mathrm{Seq}^\star_n \subset \mathrm{Seq}_n$. The set of such linear transformations is itself a vector space, closed under addition and scalar multiplication. It is readily seen that the logarithmic transformation acting on polynomials $1 + f(\xi) \mapsto \log(1 + f(\xi))$ carries polynomials in $\mathrm{Seq}^\star$ to polynomials in Seq, and also carries $\mathrm{Seq}^\star_n$ to $\mathrm{Seq}_n$ preserving vector-space operations. It is also apparent that there is essentially only one such transformation. Thus $\mathrm{Hom}(\mathrm{Seq}^\star, \mathrm{Seq})$ is the one-dimensional space of linear transformations that are scalar multiples of

$$\begin{aligned}
(Tg)_1 &= g_1 \\
(Tg)_2 &= g_2 - g_1^2 \\
(Tg)_3 &= g_3 - 3 g_2 g_1 + 2 g_1^3 \\
(Tg)_n &= \sum_{\sigma\in\mathcal{P}_n} m_{\#\sigma} \prod_{b\in\sigma} g_{\#b}
\end{aligned}$$

where $m_r = (-1)^{r-1}(r-1)!$ is the coefficient defining the logarithmic function $m(t) = \log(1 + t)$. This $T$ is, of course, the transformation that generates cumulants from moments. Despite the occurrence of multiplicative terms, $T$ is a linear transformation $\mathrm{Seq}^\star \to \mathrm{Seq}$ on vector spaces.
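The claim that $T$ is linear from $\mathrm{Seq}^\star$ to Seq means $T(f \star g) = Tf + Tg$ and $T(\alpha \star g) = \alpha\, Tg$. The sketch below checks both identities on truncated sequences with exact arithmetic. The function names (`star`, `scalar_star`, `falling`, `T`) and the test sequences are assumptions made for this example.

```python
from fractions import Fraction
from math import comb, factorial, prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def partition_sum(coef, g, N):
    """[0, c_1, ..., c_N] with c_n = sum_sigma coef(#sigma) prod_b g_#b."""
    return [0] + [sum(coef(len(s)) * prod(g[len(b)] for b in s)
                      for s in set_partitions(list(range(n))))
                  for n in range(1, N + 1)]

def star(f, g, N):
    """(f * g)_n = sum_j C(n, j) f_j g_{n-j} on the augmented sequences f_0 = g_0 = 1."""
    fa, ga = [1] + list(f[1:]), [1] + list(g[1:])
    return [0] + [sum(comb(n, j) * fa[j] * ga[n - j] for j in range(n + 1))
                  for n in range(1, N + 1)]

def falling(alpha, r):
    """Descending factorial alpha (alpha - 1) ... (alpha - r + 1)."""
    return prod(alpha - i for i in range(r))

def scalar_star(alpha, g, N):
    """(alpha * g)_n = sum_sigma falling(alpha, #sigma) prod_b g_#b."""
    return partition_sum(lambda k: falling(alpha, k), g, N)

def T(g, N):
    """Logarithmic transformation with m_r = (-1)^(r-1) (r-1)!."""
    return partition_sum(lambda k: (-1) ** (k - 1) * factorial(k - 1), g, N)

N = 5
f = [0, 1, 2, 3, 4, 5]          # index 0 is an unused placeholder
g = [0, 2, -1, 3, 5, 7]
alpha = Fraction(5, 3)

additive = T(star(f, g, N), N) == [a + b for a, b in zip(T(f, N), T(g, N))]
homogeneous = T(scalar_star(alpha, g, N), N) == [alpha * t for t in T(g, N)]
print(additive, homogeneous)
```

Read with $g$ as a moment sequence, `T(g, N)` returns the corresponding cumulants, so the first three entries reproduce the displayed formulas for $(Tg)_1, (Tg)_2, (Tg)_3$.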

3 Approximation of distributions

3.1 Edgeworth approximation

3.2 Saddlepoint approximation

4 Samples and sub-samples

A function $f\colon \mathbb{R}^n \to \mathbb{R}$ is symmetric if $f(x_1, \ldots, x_n) = f(x_{P(1)}, \ldots, x_{P(n)})$ for each permutation $P$ of the arguments. For example, the total $T_n = x_1 + \cdots + x_n$, the average $T_n/n$, the min, max and median are symmetric functions, as are the sum of squares $S_n = \sum x_i^2$, the sample variance $s_n^2 = (S_n - T_n^2/n)/(n-1)$ and the mean absolute deviation $\sum |x_i - x_j|/(n(n-1))$.

A vector $x$ in $\mathbb{R}^n$ is an ordered list of $n$ real numbers $(x_1, \ldots, x_n)$ or a function $x\colon [n] \to \mathbb{R}$ where $[n] = \{1, \ldots, n\}$. For $m \le n$, a 1–1 function $\varphi\colon [m] \to [n]$ is a sample of size $m$, the sampled values being $x\varphi = (x_{\varphi(1)}, \ldots, x_{\varphi(m)})$. All told, there are $n(n-1) \cdots (n-m+1)$ distinct samples of size $m$ that can be taken from a list of length $n$. A sequence of functions $f_n\colon \mathbb{R}^n \to \mathbb{R}$ is consistent under subsampling if, for each pair $f_m, f_n$ with $m \le n$,

$$f_n(x) = \operatorname{ave}_\varphi f_m(x\varphi),$$

where $\operatorname{ave}_\varphi$ denotes the average over samples of size $m$. For $m = n$, this condition implies only that $f_n$ is a symmetric function.

Although the total and the median are both symmetric functions, neither is consistent under subsampling. For example, the median of the numbers $(0, 1, 3)$ is one, but the average of the medians of samples of size two is $4/3$. However, the average $\bar x_n = T_n/n$ is sampling consistent. Likewise the sample variance $s_n^2 = \sum (x_i - \bar x_n)^2/(n-1)$ with divisor $n - 1$ is sampling consistent, but the mean squared deviation $\sum (x_i - \bar x_n)^2/n$ with divisor $n$ is not. Other sampling consistent functions include Fisher's k-statistics, the first few of which are $k_{1,n} = \bar x_n$, $k_{2,n} = s_n^2$ for $n \ge 2$,

$$k_{3,n} = n \sum (x_i - \bar x_n)^3/((n-1)(n-2))$$

$$k_{4,n} =$$

defined for $n \ge 3$ and $n \ge 4$ respectively.

For a sequence of independent and identically distributed random variables, the k-statistic of order $r \le n$ is the unique symmetric function such that $E(k_{r,n}) = \kappa_r$. Fisher (1929) derived the variances and covariances. The connection with finite-population sub-sampling was developed by Tukey (1954).
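The subsampling condition $f_n(x) = \operatorname{ave}_\varphi f_m(x\varphi)$ is easy to test by enumerating all 1–1 samples. The sketch below reproduces the $(0, 1, 3)$ example from the text and checks the average, the two variance divisors and $k_3$; the helper names and the second data vector are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import permutations
from statistics import median

def subsample_average(f, x, m):
    """Average of f over all 1-1 samples phi: [m] -> [n] of the list x."""
    values = [f([x[i] for i in phi]) for phi in permutations(range(len(x)), m)]
    return sum(values) / len(values)

def total(v):
    return sum(v)

def mean(v):
    return sum(v) / len(v)

def sample_variance(v):
    """Sum of squared deviations with divisor n - 1."""
    xbar = mean(v)
    return sum((xi - xbar) ** 2 for xi in v) / (len(v) - 1)

def mean_squared_deviation(v):
    """Sum of squared deviations with divisor n."""
    xbar = mean(v)
    return sum((xi - xbar) ** 2 for xi in v) / len(v)

def k3(v):
    """Third k-statistic: n * sum (x_i - xbar)^3 / ((n - 1)(n - 2))."""
    n, xbar = len(v), mean(v)
    return n * sum((xi - xbar) ** 3 for xi in v) / ((n - 1) * (n - 2))

x = [Fraction(0), Fraction(1), Fraction(3)]                 # example from the text
y = [Fraction(0), Fraction(0), Fraction(1), Fraction(2)]    # illustrative data for k3

print(median(x), subsample_average(median, x, 2))           # 1 versus 4/3
print(total(x), subsample_average(total, x, 2))             # not consistent
print(mean(x) == subsample_average(mean, x, 2))
print(sample_variance(x) == subsample_average(sample_variance, x, 2))
print(mean_squared_deviation(x) == subsample_average(mean_squared_deviation, x, 2))
print(k3(y) == subsample_average(k3, y, 3))
```

Exact fractions are used so that equality and inequality of the two sides are unambiguous.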
