SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3...

78
SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) Dr Haziq Jamil FOS M1.09 Universiti Brunei Darussalam Semester II 2019/20

Transcript of SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3...

Page 1: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

SM-4331 Advanced StatisticsChapter 3 (Important Univariate and Multivariate

Distributions)

Dr Haziq JamilFOS M1.09

Universiti Brunei Darussalam

Semester II 2019/20

Page 2: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Outline

1 χ2-distribution

2 Student’s t-distribution

3 F -distribution

4 Multivariate distributionsBivariate distributionsMultivariate distributions

5 Multinomial and categorical distributionMultinomial distributionCategorical distribution

6 Multivariate normal distribution

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 0 / 70

Page 3: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

χ2-distribution »

χ2-distribution

The χ2-distribution is an important distribution in statistics. It is closelylinked with the normal, Student’s t and F distributions. Inference for thevariance parameter σ2 relies on χ2-distributions. More importantly, mostgoodness-of-fit tests are based on χ2-distributions.

Definition 1 (χ2-distribution)

Let Z1, . . . ,Zkiid∼ N(0, 1), i.e. each Zi has pdf f (zi ) = (2π)−1/2e−z

2i /2 for

i = 1, . . . , k . Then,

X = Z 21 + · · ·+ Z 2

k =k∑

i=1

Z 2i

follows a χ2-distribution with k degrees of freedom. We write X ∼ χ2k .

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 1 / 70

Page 4: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

χ2-distribution »

χ2-distribution (cont.)

RemarkOut of curiosity, the pdf of a χ2

k distribution is f (x) = Cxk/2−1e−x/2,where the normalising constant C is equal to 2−k/2Γ−1(k/2) (Γ(·) is thegamma function). The form of the pdf is less important to know than thedefinition of χ2

k distribution given in Definition 1.

Here are some important properties of the χ2k distribution.

1. X has support over [0,∞).2. E(X ) = k .3. Var(X ) = 2k .4. If X1 ∼ χ2

k1and X2 ∼ χ2

k2, and X1 ⊥ X2, then X1 + X2 ∼ χ2

k1+k2.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 2 / 70

Page 5: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

χ2-distribution »

χ2-distribution (cont.)

Proof.Prove properties 2–4 as an exercise.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 3 / 70

Page 6: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

χ2-distribution »

χ2-distribution (cont.)

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 4 / 70

Page 7: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

χ2-distribution »

Probabilities tables for the χ2-distribution

Probabilities such as

P(χ2k ≤ x) =

∫ x

0fX (x) dx

where fX is the pdf of χ2k cannot be found in closed form. It is calculated

using computer approximations for the integral above. In R, use pchisq().

Alternatively, statistical tables are used. You will find tables for percentilesof the χ2 distribution. That is, you are able to find the value of x := χ2

k(A)such that

P(χ2k ≤ x) =

∫ x

0fX (x) dx = A

for various values of A and k .

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 5 / 70

Page 8: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

χ2-distribution »

Example

Example 2

Let Y1, . . . ,Yniid∼ N(µ, σ2). Then,

Zi =Yi − µσ

∼ N(0, 1).

Hence,1σ2

n∑i=1

(Yi − µ)2 =n∑

i=1

Z 2i ∼ χ2

n.

Note that

1σ2

n∑i=1

(Yi − µ)2 =1σ2

n∑i=1

(Yi − Yn)2 +n

σ2 (Yn − µ)2. (1)

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 6 / 70

Page 9: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

χ2-distribution »

Example (cont.)

Example 2Since Yn ∼ N(µ, σ2/n), it must be that

n

σ2 (Yn − µ)2 ∼ χ21.

It can also be proved (see Exercise Sheet 3) that

1σ2

n∑i=1

(Yi − Yn)2 ∼ χ2n−1.

Thus, the decomposition in (1) may formally be written as

χ2n = χ2

n−1 + χ21

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 7 / 70

Page 10: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

1 χ2-distribution

2 Student’s t-distribution

3 F -distribution

4 Multivariate distributions

5 Multinomial and categorical distribution

6 Multivariate normal distribution

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 7 / 70

Page 11: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

Student’s t-distribution

This is another important distribution in statistics, because:• The t-test is perhaps the most frequently used statistical test in

application.• Confidence intervals for normal mean with unknown variance may be

accurately constructed based on the t-distribution.

Historical note: The t-distribution was first studied by the EnglishmanWilliam Sealy Gosset (1876-1937), who worked as a statistician forGuinness, writing under the pen-name “Student”.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 8 / 70

Page 12: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

Student’s t-distribution (cont.)

Definition 3Suppose• Z ∼ N(0, 1),• X ∼ χ2

k , and• X ⊥ Z , i.e. X and Z are independent.

Then, the distribution of the random variable

T =Z√X/k

is called the t-distribution with k degrees of freedom. We write T ∼ tk .

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 9 / 70

Page 13: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

Student’s t-distribution (cont.)

RemarkThe pdf for T ∼ tk is given by

f (t) ∝(1 +

t2

k

)− k+12

,

but once again the actual form of the pdf is not as important as thedefinition of the t-distribution.

Some important properties of the t-distribution:1. T is continuous and symmetric over (−∞,∞).2. E(T ) = 0, provided E(|T |) <∞ (k > 1).3. Var(T ) = k

k−2 .4. Technically, k ∈ R, but we will usually deal with k ∈ Z > 0.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 10 / 70

Page 14: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

Student’s t-distribution (cont.)

5. tkD−→ N(0, 1) as k →∞.

Proof.

If X ∼ χ2k , then by definition X = Z 2

1 + · · ·+ Z 2k , where Zi

iid∼ N(0, 1). Bythe LLN,

X

k=

Z 21 + · · ·+ Z 2

k

kP−→ E(Z 2

1 ) = 1.

as k →∞. Therefore,√X/k

P−→ 1, and in particular,

T =Z√X/k

D−→ N(0, 1)

following Slutzky’s theorem.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 11 / 70

Page 15: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

Student’s t-distribution (cont.)

6. The t-distribution has heavy tails. That is, if T ∼ tk , E(|T |k) =∞.Comparing this to the normal distribution: X ∼ N(µ, σ2),E(|X |k) <∞ for any k > 0.

RemarkThis ‘heavy-tails’ property is a useful property in modelling abnormalphenomena or outliers (e.g. in financial or insurance data). C.f. “robuststatistics”.

Explore the t-distribution vs normal distribution here:https://eripoll12.shinyapps.io/t_Student/

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 12 / 70

Page 16: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

Student’s t-distribution (cont.)

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 13 / 70

Page 17: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

An important property of normal samples

Theorem 4 (An important property of normal samples)

Let {X1, . . . ,Xn} be a sample from N(µ, σ2). Let

X =1n

n∑i=1

Xi , s2 =1

n − 1

n∑i=1

(Xi − X )2, and SE(X ) = s/√n.

Then,i. X ∼ N(µ, σ2/n)

ii. (n − 1)s2/σ2 ∼ χ2n−1

iii. X ⊥ s2

iv.√n(X−µ)

s = X−µSE(X )

∼ tn−1

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 14 / 70

Page 18: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

An important property of normal samples (cont.)

Proof.i. follows directly from properties of normal distributions, and earlier we“proved” 1

σ2

∑ni=1(Xi − X )2 ∼ χ2

n−1 which implies ii.

Consider any Xj , j ∈ {1, . . . , n} and Cov(Xj − X , X ):

Cov(Xj − X , X ) = Cov(Xj , X )− Cov(X , X )

= Cov

(Xj ,

1n

n∑i=1

Xi

)− Var(X )

=1n

n∑i=1

Cov(Xj ,Xi )− σ2/n

= σ2/n − σ2/n = 0

Since the covariance is zero and they are normal, they are independent.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 15 / 70

Page 19: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

An important property of normal samples (cont.)

Proof.Following this, if X is independent of Xj − X for any j , it stands to reasonthat X is also independent of X = (X1 − X , . . . ,Xn − X )>, and also of

X>X =(X1 − X · · · Xn − X

)X1 − X...

Xn − X

=n∑

i=1

(Xi − X )2 = (n − 1)s2,

and thus also of s2.

Remark: we used the fact that if X ⊥ Yi , then g(X ) ⊥ g(Yi ), and alsog(X ) ⊥ {g(Y1) + · · ·+ g(Yn)}.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 16 / 70

Page 20: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Student’s t-distribution »

An important property of normal samples (cont.)

Proof.Finally, putting everything together,

N(0,1)︷ ︸︸ ︷√n(X − µ)/σ√√√√√ χ2n−1︷ ︸︸ ︷(n − 1)s2/σ2

n−1

=X − µs/√n

=X − µSE(X )

∼ tn−1.

RemarkThis is why for normal distributions where σ2 is unknown, and is estimatedby the unbiased sample variance s2, the standardised sample mean followsa t-distribution! This gives rise to the t-test.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 17 / 70

Page 21: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

1 χ2-distribution

2 Student’s t-distribution

3 F -distribution

4 Multivariate distributions

5 Multinomial and categorical distribution

6 Multivariate normal distribution

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 17 / 70

Page 22: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

F -distribution

The F -distribution is another notable distribution in statistics. It commonlyarises as the null distribution of a test statistic, particularly in the analysisof variance (ANOVA).

Definition 5Let X1 ∼ χ2

k1and X2 ∼ χ2

k2. Then, the distribution of

Y =X1/k1

X2/k2

is called the F -distribution with (k1, k2) degrees of freedom. We writeY ∼ Fk1,k2 .

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 18 / 70

Page 23: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

F -distribution (cont.)

RemarkNot even going to bother writing down the pdf! See for yourself:https://en.wikipedia.org/wiki/F-distribution. Remember thedefinition, though.

Some important properties of the F -distribution:1. Y is continuous and has support over [0,∞), provided k1 > 1.2. E(Y ) = k2

k2−2 , provided k2 > 2.

3. Var(Y ) =2k22 (k1+k2−2)

k1(k2−2)2(k2−4), provided k2 > 4.

4. Technically, k1, k2 ∈ R>0, but we will usually deal with k1, k2 ∈ Z > 0.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 19 / 70

Page 24: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

F -distribution (cont.)

5. If Y ∼ Fk1,k2 , then Y−1 ∼ Fk2,k1 .6. If T ∼ tk , then T 2 ∼ F1,k .

Proof.Exercise: Prove properties 5 and 6.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 20 / 70

Page 25: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

F -distribution (cont.)

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 21 / 70

Page 26: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

The analysis of variance

The ANOVA, despite its name, is a (collection of) methods used to analysedifferences among group means in a sample. The ANOVA was developedby Sir Ronald Fisher.

Example 6Let Xij ∼ N(µj , σ

2), i = 1, . . . , nj and j = 1, . . . ,m with both µj and σ2

unknown. Let n =∑m

j=1 nj be the total sample size. Define the grandmean and the respective group means to be X = n−1∑

i ,j Xij andXj = n−1

j

∑nji=1 Xij respectively. The “total sum of squares” is

S =∑i ,j

(Xij − X )2

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 22 / 70

Page 27: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

The analysis of variance

Example 6The total sum of squares can be decomposed into

S =

S1︷ ︸︸ ︷∑i ,j

(Xij − Xj)2 +

S2︷ ︸︸ ︷∑j

nj(Xj − X )2

where S1 is called the “within sum of squares” (how much variation amongindividuals in each group) and S2 the “ between sum of squares” (howmuch variation in the mean among groups).

There is the concept of “degrees of freedom”: n − 1 in the TSS, m − 1 inthe BSS, and therefore n −m in the WSS.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 23 / 70

Page 28: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

The analysis of variance

Example 6This gives rise to the ANOVA table:

Source SS d.f. MSS F-statistic

Between∑

j nj(Xj − X )2 m − 1∑

j nj (Xj−X )2

m−1

∑j nj (Xj−X )2/(m−1)∑i,j (Xij−Xj )2/(n−m)

Within∑

i ,j(Xij − Xj)2 n −m

∑i,j (Xij−Xj )

2

n−m

Total∑

i ,j(Xij − X )2 n − 1

What is the distribution of F?

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 24 / 70

Page 29: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

The analysis of variance

Example 6We have seen that

1σ2

∑i ,j

(Xij − X )2 ∼ χ2n−1.

In fact, we can also show similarly that

1σ2

∑i ,j

(Xij − Xj)2 ∼ χ2

n−m.

Using these two facts, we deduce that

1σ2

∑j

nj(Xj − X )2 ∼ χ2m−1

from the property of χ2-distributions.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 25 / 70

Page 30: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

The analysis of variance

Example 6So now,

F =

���1/σ2

χ2m−1︷ ︸︸ ︷∑j

nj(Xj − X )2/(m − 1)

���1/σ2

χ2n−m︷ ︸︸ ︷∑i ,j

(Xij − X )2/(n −m)

is a ratio of two χ2-distributions, which means that F follows anF -distribution with (m − 1, n −m) degrees of freedom.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 26 / 70

Page 31: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

F -distribution »

The analysis of variance

Stop to thinkWhat happens when there are only two groups (m = 2)?• What is the distribution of χ2

1 equal to?• What is the distribution of the test statistic F?• What is the distribution of

√F?

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 27 / 70

Page 32: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

1 χ2-distribution

2 Student’s t-distribution

3 F -distribution

4 Multivariate distributions

5 Multinomial and categorical distribution

6 Multivariate normal distribution

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 27 / 70

Page 33: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Bivariate distributions

For a pair of r.v. (X ,Y ), the joint pdf fX ,Y (x , y) for (X ,Y ) is a probabilitydistribution that gives the probability that each of X and Y falls in anyparticular range or set of values specified for these variables.

The marginal distributions may be obtained by

fX (x) =

{∑y fX ,Y (x , y) if Y is discrete∫

y fX ,Y (x , y) dy if Y is continuous

fY (y) =

{∑x fX ,Y (x , y) if X is discrete∫

x fX ,Y (x , y) dx if X is continuous

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 28 / 70

Page 34: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Bivariate distributions (cont.)

The joint cdf is defined as

FX ,Y (x , y) = P(X ≤ x ,Y ≤ y).

From this, the marginal distributions (cdf) are given by

FX (x) = P(X ≤ x ,Y ≤ ∞) = FX ,Y (x ,∞)

andFY (y) = P(Y ≤ ∞,Y ≤ y) = FX ,Y (∞, y)

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 29 / 70

Page 35: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Bivariate distributions (cont.)

To be even more specific, we define the joint pdf as follows:

Definition 7 (Joint bivariate pdf)In the discrete case, i.e. X and Y are two discrete r.v., the jointprobability mass function (pmf) is

pX ,Y (x , y) = P(X = x ,Y = y)

In the continuous case, i.e. X and Y are two continuous r.v., the jointprobability density function (pdf) is

fX ,Y (x , y) =∂2

∂x∂yFX ,Y (x , y).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 30 / 70

Page 36: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Properties of bivariate distributions

1. Since the joint pdf is a probability distribution, it must satisfy∑x

∑y

P(X = x ,Y = y) = 1

in the discrete case, and∫x

∫yfX ,Y (x , y) dx dy = 1

in the continuous case.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 31 / 70

Page 37: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Properties of bivariate distributions (cont.)

2. We can write the pmf/pdf in terms of conditional distributions. Forthe discrete case, this is

pX ,Y (x , y) = P(Y = y |X = x) P(X = x)

= P(X = x |Y = y) P(Y = y).

For the continuous case, this is

fX ,Y (x , y) = fY |X (y |x)fX (x)

= fX |Y (x |y)fY (y).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 32 / 70

Page 38: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Properties of bivariate distributions (cont.)

3. Though we won’t be looking at these, it is possible to have a “mixed”joint pdf/pmf, i.e. X is continuous and Y is discrete, or the other wayaround. The joint pdf may be written

fX ,Y (x , y) = fX |Y (x |y) P(Y = y)

= P(Y = y |X = x)fX (x)

and its joint cdf by

FX ,Y (x , y) =∑y≤y

∫ x

−∞fX ,Y (x , y) dx

One common example is when using logistic regression in predictingprobability of a binary outcome, conditional on the value of acontinuously distributed outcome.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 33 / 70

Page 39: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Properties of bivariate distributions (cont.)

4. The covariance between X and Y , denoted Cov(X ,Y ), is

Cov(X ,Y ) = E [(X − EX )(Y − EY )]

= E(XY )− EX · EY

5. The correlation between X and Y , denoted ρXY , is

ρXY =Cov(X ,Y )√VarX · VarY

Stop to think• What is the covariance between a r.v. X and itself?• What possible values can ρ take?

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 34 / 70

Page 40: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Properties of bivariate distributions (cont.)

6. Two r.v. X and Y are independent if and only if the joint cdfsatisfies

FX ,Y (x , y) = FX (x) · FY (y).

As for their pmf/pdf,

pX ,Y (x , y) = P(X = x ,Y = y) = P(X = x) · P(Y = y)

in the discrete case, and

fX ,Y (x , y) = fX (x) · fY (y)

in the continuous case.

RemarkX ⊥ Y ⇒ Cov(X ,Y ) = 0, but not necessarily the other way around.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 35 / 70

Page 41: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Discrete bivariate distributions

If X takes discrete values x1, . . . , xm and Y takes discrete values y1, . . . , yn,their joint pmf may be presented in table:

y1 y2 · · · yn

x1 p11 p12 · · · p1n p1·x2 p21 p22 · · · p2n p2·...

......

. . ....

...xm pm1 pm2 · · · pmn pm·

p·1 p·2 · · · p·n

where pij = P(X = xi ,Y = yj), and pi· and p·j are the marginalprobabilities of X and Y respectively.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 36 / 70

Page 42: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Discrete bivariate distributions (cont.)

So from the previous slides we know that

p·j =n∑

i=1

pij

pi· =m∑j=1

pij

and of course,∑n

i=1∑m

j=1 pij = 1.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 37 / 70

Page 43: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Examples

Example 8Flip a fair coin two times. Let X = 1 if it is heads in the first flip, andX = 0 if it is tails. Let Y = 1 if the outcomes in the two flips are thesame, and Y = 0 if the two outcomes are different. The joint probabilityfunction is

Y = 1 Y = 0

X = 1 1/4 1/4X = 0 1/4 1/4

It is easy to see that X and Y are independent (although we had notassumed this).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 38 / 70

Page 44: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Examples (cont.)

Example 9Consider a uniform distribution on the unit square [0, 1]× [0, 1]. It has pdfgiven by

f (x , y) =

{1 0 ≤ x ≤ 1, 0 ≤ y ≤ 10 otherwise

This is a well-defined pdf, as f ≥ 0 and∫ ∫

f (x , y) dx dy = 1. It is alsoeasy to see that X and Y are independent.

Find P(X < 1/2,Y < 1/2) and P(X + Y < 1).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 39 / 70

Page 45: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Examples (cont.)

Example 9

P(X < 1/2,Y < 1/2) =

∫ 1/2

0

∫ 1/2

0dx dy

=[[xy ]

1/20

]1/20

= 1/4.

Note that the set {x + y < 1} corresponds to {0 < y < 1, 0 < x < 1− y}.

P(X + Y < 1) =

∫ 1

0dy

∫ 1−y

0dx

=

∫ 1

0dy [x ]1−y0

=

∫ 1

0(1− y) dy =

[y − y2/2

]10 = 1/2.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 40 / 70

Page 46: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Conditional distributions

If X and Y are not independent, knowing X should be helpful indetermining Y , as X may carry some information on Y . Therefore itmakes sense to define the distribution of Y given, say X = x . This is theconcept of conditional distributions.

If X and Y are discrete, then you have probably seen that

P(Y = y |X = x) =P(Y = y ,X = x)

P(X = x)=

P(X = x |Y = y) P(Y = y)

P(X = x)

However, this definition does not extend to continuous random variables,because the probability of a single point has mass zero.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 41 / 70

Page 47: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Conditional distributions (cont.)

Definition 10For continuous r.v. X and Y , the conditional pdf of Y given X = x is

fY |X (y |x) =fX ,Y (x , y)

fX (x)

• As a function of y , fY |X (y |x) is a pdf, keeping the value of Xconstant at x :

P(Y ∈ A|X = x) =

∫AfY |X (y |x) dy

• Both the conditional mean E(Y |X = x) and conditional varianceVar(Y |X = x) are functions of x :

E(Y |X = x) =

∫yfY |X (y |x) dy

Var(Y |X = x) =

∫ (y E(Y |X = x)

)2fY |X (y |x) dy

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 42 / 70

Page 48: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Conditional distributions (cont.)• If X and Y are independent, then fY |X (y |x) = fY (y)• Note that

fY |X (y |x)fX (x) = fX ,Y (x , y) = fX |Y (x |y)fY (y)

which offers alternative ways to determine the joint pdf.• For any r.v. X and Y ,

EX

(E(Y |X )

)= E(Y )

Proof.

E(

E(Y |X ))

=

∫ {∫y fY |X (y |x) dy

}fX (x) dx

=

∫ ∫y fX ,Y (x , y) dx dy

=

∫y f (y) dy = E(Y )

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 43 / 70

Page 49: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Conditional distributions (cont.)

Example 11Let fX ,Y (x , y) = e−y for 0 < x < y <∞, and 0 otherwise. Find fY |X (y |x)and fX |Y (x |y).

We need to find the marginals first:

fX (x) =

∫ ∞x

e−y dy = e−x , 0 < x <∞

fY (y) =

∫ y

0e−y dx = ye−y , 0 < y <∞

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 44 / 70

Page 50: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Bivariate distributions

Conditional distributions (cont.)

Example 11Thus,

fY |X (y |x) =fX ,Y (x , y)

fX (x)=

e−y

e−x= ex−y , x < y <∞

fX |Y (x |y) =fX ,Y (x , y)

fY (y)=

e−y

ye−y= y−1, 0 < x < y

Note that given Y = y , X |(Y = y) ∼ Unif(0, y).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 45 / 70

Page 51: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Multivariate distributions

Multivariate distributions

The bivariate results we saw earlier can be extended to more than twovariables, resulting in multivariate distributions.

Let X = (X1, . . . ,Xn)> be a random vector consisting of n r.v.. The jointcdf is defined as

F (x1, . . . , xn) = P(X1 ≤ x1, . . . ,Xn ≤ xn)

If X is continuous, the joint pdf satisfies

fX ,Y (x , y) =∂n

∂x1 · · · ∂xnF (x1, . . . , xn).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 46 / 70

Page 52: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Multivariate distributions

Multivariate distributions (cont.)

• In general, the pdf admits the factorisation

f (x1, . . . , xn) = f (x1)f (x2, . . . , xn|x1)

= f (x1)f (x2|x1)f (x3, . . . , xn|x1, x2)

...= f (x1)f (x2|x1)f (x3|x1, x2) · · · f (xn|x1, . . . , xn−1)

where f (xj |x1, . . . , xj−1) denotes the conditional pdf of Xj givenX1 = x1, . . . ,Xj−1 = xj−1.

• On the other hand, X1, . . . ,Xn are independent if and only if

f (x1, . . . , xn) = f (x1)f (x2) · · · f (xn)

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 47 / 70

Page 53: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate distributions » Multivariate distributions

Random samples

• We will often say “X1, . . . ,Xn is a random sample from a distributionwith pdf f ”. Without specifying independence, we should not assumethat it is an iid sample.• Thus, when doing inference, we should work with the joint pdf

f (x1, . . . , xn). Several multivariate distributions will be discussed inthe next section.• On the other hand, if X1, . . . ,Xn

iid∼ f (x), then and only then

f (x) = f (x1, . . . , xn) =n∏

i=1

f (xi )

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 48 / 70

Page 54: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

1 χ2-distribution

2 Student’s t-distribution

3 F -distribution

4 Multivariate distributions

5 Multinomial and categorical distribution

6 Multivariate normal distribution

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 48 / 70

Page 55: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multinomial and categorical distribution » Multinomial distribution

Multinomial distribution

The multinomial distribution is an extension of the binomial distribution.Suppose we threw an k-sided die n times, and we recorded Xi , the numberof times the ith side turned up (i = 1, . . . , k). Then X = (X1, . . . ,Xk)>

follows a multinomial distribution.

Definition 12 (Multinomial distribution)

Let X = (X1, . . . ,Xk)> ∼ Mult(n, p1, . . . , pk). Then, the pmf of X is givenby

f (x1, . . . , xk) =n!

x1!x2! · · · xk !px11 px22 · · · p

xkk

Here, pj is the probability of success associated with event Xj .

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 49 / 70

Page 56: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multinomial and categorical distribution » Multinomial distribution

Multinomial distribution (cont.)

• Obviously, each pj ≥ 0, and that∑k

j=1 pj = 1.

• We also observe that if out of the n trials, Xj is the number of“success” associated with the jth outcome, then

I X1 + · · ·+ Xk = n, and therefore, the Xjs are not independent.I Xj ∼ Bin(n, pj), and hence E(Xj) = npj and Var(Xj) = npj(1− pj).

• We can show that Cov(Xj ,Xj ′) = −npjpj ′ (see Exercise 3).

• Therefore,

E(X) = np ∈ Rk

Var(X) = n[diag(p)− pp>

]∈ Rk×k

where p = (p1, . . . , pk)>.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 50 / 70

Page 57: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multinomial and categorical distribution » Multinomial distribution

Multinomial distribution (cont.)

Expanding the expectation vector and variance-covariance matrix in full:

E(X) =

np1...

npk

Var(X) =

np1(1− p1) −np1p2 · · · −np1pk−np1p2 np2(1− p2) · · · −np2pk

......

. . ....

−np1pk −np2pk · · · npk(1− pk)

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 51 / 70

Page 58: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multinomial and categorical distribution » Multinomial distribution

Multinomial distribution (cont.)

• Marginal distributions after collapsing categories are also multinomial.E.g. (X1, . . . ,X5)> ∼ Mult(n, p1, . . . , p5), then(X1 + X2,X3 + X4,X5)> ∼ Mult(n, p1 + p2, p3 + p4, p5).

• Conditional distributions are also multinomial.E.g.(X1, . . . ,X5)> ∼ Mult(n, p1, . . . , p5), then

(X1,X2)>|(X3 = x3,X4 = x4,X5 = x5)

∼ Mult

(n − x3 − x4 − x5,

p1

p1 + p2,

p2

p1 + p2

)

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 52 / 70

Page 59: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multinomial and categorical distribution » Multinomial distribution

Multinomial distribution (cont.)

Example 13Suppose that in a three-way election for a large country, candidate Areceived 20% of the votes, candidate B received 30% of the votes, andcandidate C received 50% of the votes. If six voters are selected randomly,what is the probability that there will be exactly one supporter forcandidate A, two supporters for candidate B and three supporters forcandidate C in the sample?

Let (XA,XB ,XC ) represent the number of voters for candidates A, B and Crespectively. Then (XA,XB ,XC ) Mult(6, 0.2, 0.3, 0.5). SoP(XA = 1,XB = 2,XC = 3) = 6!

1!2!3!0.210.320.53 = 0.135.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 53 / 70

Page 60: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multinomial and categorical distribution » Multinomial distribution

Parameter estimation

QuestionSuppose that you observe a random sample X = (Xi1, . . . ,Xik) from aMult(n, p1, . . . , pk) distribution, where each Xij tells you the “number ofsuccesses” for category j . What is an appropriate estimator for each of thepj? Work this out as an exercise. Hint: Each component of X isBern(pj).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 54 / 70

Page 61: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multinomial and categorical distribution » Categorical distribution

Categorical distribution

When we have only one trial (n = 1), we have a special case of themultinomial distribution. There is exactly one entry in X = (X1, . . . ,Xk)that is equal to one, while the rest is zero.

Definition 14 (Categorical distribution)

Let X = (X1, . . . ,Xk)> ∼ Cat(p1, . . . , pk). Then, the pmf of X is given by

f (x1, . . . , xk) = px11 px22 · · · pxkk

Here, pj is the probability of success associated with event Xj .

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 55 / 70

Page 62: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multinomial and categorical distribution » Categorical distribution

Categorical distribution (cont.)

• Since each Xj ∼ Bin(1, pj) ≡ Bern(pj), the categorical distribution iseffectively a generalisation of the Bernoulli distribution.

• Interestingly, we can also define

Y = arg maxj

Xj

so that Y is a random variable taking on one distinct valuej ∈ {1, . . . , k} (category) with probability pj . This formulation has alot of importance in choice modelling in statistics and econometrics.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 56 / 70

Page 63: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

1 χ2-distribution

2 Student’s t-distribution

3 F -distribution

4 Multivariate distributions

5 Multinomial and categorical distribution

6 Multivariate normal distribution

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 56 / 70

Page 64: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Multivariate normal distribution

This is the multivariate extension of the regular normal distribution. It isundoubtedly the most commonly used distribution in statistics, and it has alot of interesting properties.

Definition 15 (Multivariate normal distribution)

Let X = (X1, . . . ,Xk)> be distributed according to a multivariate normaldistribution with mean vector µ ∈ Rk and variance-covariance matrixΣ ∈ Rk×k . It has pdf

f (x) = (2π)−k/2|Σ|−1/2 exp

{−12

(x− µ)>Σ−1(x− µ)

}and we write X ∼ Nk(µ,Σ).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 57 / 70

Page 65: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Multivariate normal distribution (cont.)

Here are some properties of the multivariate normal distribution:1. µ is a vector of length k , and Σ is a square k × k , symmetric,

positive-definite matrix.2. The first and second moments of X are

E(X) = µ

E(XX>) = Σ + µµ>

so therefore,

Var(X) = E((X− µ)(X− µ)>

)= E(XX>)− µµ> = Σ

3. Let Σ = (σij). Then

σij =

{Var(Xi ) if i = j

Cov(Xi ,Xj) if i 6= j

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 58 / 70

Page 66: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Multivariate normal distribution (cont.)

4. When σij = 0 for all i 6= j , i.e. the components of X are uncorrelated,we have Σ = diag(σ11, . . . , σkk). Also, |Σ| =

∏ki=1 σii . Hence, the

pdf admits a simpler form

f (x) =k∏

i=1

{1√2πσii

e− 1

2σii(xi−µi )2

}and therefore, X1, . . . ,Xk are independent, and each Xi ∼ N(µi , σii ).

RemarkIn general, two r.v. X and Y which are independent imply thatCov(X ,Y ) = 0, but not necessarily the other way around. But if X and Yare two normal r.v., then X and Y are independent if and only ifCov(X ,Y ) = 0.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 59 / 70

Page 67: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Multivariate normal distribution (cont.)

5. Suppose that we can partition X into X = (Xa,Xb)>, and similarly,

µ =

(µa

µb

)Σ =

(Σa Σab

Σ>ab Σb

).

Then,I Marginal distributions. Xa ∼ Ndim Xa(µa,Σa) and

Xb ∼ Ndim Xb(µb,Σb).

I Conditional distributions. Xa|Xb ∼ Ndim Xa(µa, Σa) andXb ∼ Ndim Xb

(µb, Σb), where

µa = µa + ΣabΣ−1b (Xb − µb) µb = µb + Σ>abΣ

−1a (Xa − µa)

Σa = Σa −ΣabΣ−1b Σ>ab Σb = Σb −Σ>abΣ

−1a Σab

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 60 / 70

Page 68: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Multivariate normal distribution (cont.)

6. Let A be a Rq×k constant matrix, and b ∈ Rk a constant vector.Then Y := AX + b ∈ Rq has distribution

Y ∼ Nq(Aµ,AΣA>)

This is simply the linearity property of normal distributions for themultivariate case.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 61 / 70

Page 69: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Standard multivariate normal

A special case of the multivariate normal is when µ = (0, . . . , 0)>, andΣ = Ik . We would then have the standard multivariate normal distributionZ ∼ Nk(0, Ik).

The pdf of the standard normal is

φ(z) = (2π)−k/2 exp

{−12z>z

}

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 62 / 70

Page 70: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Standard multivariate normal (cont.)

Suppose that L is a (non-singular) matrix such that LL> = Σ. Note that,by the linearity property of the multivariate normal, X = µ + LZ is alsonormally distributed, with mean

E(X) = µ + LE(Z) = µ

and variance

Var(X) = 0 + LVar(Z)L> = LIkL> = Σ.

Therefore X ∼ Nk(µ,Σ).

This property is often used for simulating from multivariate normals: 1)Generate samples from k iid N(0, 1) distributions; then 2) apply thelinearity transformation above.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 63 / 70

Page 71: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Standard multivariate normal (cont.)

Of course, the other way around works too. If X ∼ Nk(µ,LL>), then

Z = L−1(X− µ)

has meanE(Z) = L−1(EX− µ) = 0

and variance

Var(Z) = L−1(VarX)(L−1)> = L−1LL>(L−1)> = Ik .

So it is possible to “standardise” a multivariate normal distribution. This isuseful when we want to calculate multivariate normal probabilities(although admittedly, this is a very computer-intensive problem involvingnumerical approximations).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 64 / 70

Page 72: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Standard multivariate normal (cont.)

Strategies for decomposing the variance-covariance matrix Σ:• Eigendecomposition. For a positive-definite matrix, we have that

Σ = ΓΛΓ>. The matrix Γ is a matrix of eigenvectors of Σ, andΛ = diag(λ1, . . . , λk) is a diagonal matrix of eigenvalues. Additionallyfor positive matrices, ΓΓ> = Ik (orthogonal).• Cholesky decomposition. The Cholesky decomposition of a

positive-definite matrix is a decomposition of the form Σ = LL>,where L is a lower-triangular matrix with real and positive diagonalentries.• LDL decomposition. Closely related to the Cholesky decomposition,

this is a decomposition of the form Σ = LDL>, where D is a diagonalmatrix, and L is a lower-unitriangular matrix (all diagonal elements are1).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 65 / 70

Page 73: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Example

Example 16Suppose that X1 ∼ N(µ1, σ

21), and X2 ∼ N(µ2, σ

22). The covariance

between X1 and X2 is Cov(X1,X2) = σ12.

Then X = (X1,X2)> is a bivariate normal distribution with meanµ = (µ1, µ2)>, and variance

Σ =

(σ2

1 σ12σ12 σ2

2

).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 66 / 70

Page 74: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Example (cont.)

Example 16The pdf is

f

(x1x2

)=(2π)−1 det

(σ2

1 σ12σ12 σ2

2

)−1

× exp

{−12(x1 − µ1 x2 − µ2

)( σ21 σ12

σ12 σ22

)−1(x1 − µ1x2 − µ2

)}

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 67 / 70

Page 75: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Example (cont.)

Example 16

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 68 / 70

Page 76: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Parameter estimation

Very briefly, suppose that you had n samples from a k-variate normaldistribution. That is, you observe X1, . . . ,Xn, where each Xi ∈ Rk is ak-dimensional vector.

The unconstrained maximum likelihood estimators for µ and Σ are

µ =1n

n∑i=1

xi = (X1, . . . , Xk)> ∈ Rk

and

Σ =1n

n∑i=1

(xi − µ)(xi − µ)> ∈ Rk×k .

It turns out also that µ D−→ N(µ,Σ/n), as n→∞ (multivariate CLT).

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 69 / 70

Page 77: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

Multivariate normal distribution »

Parameter estimation

QuestionHow many total unknown parameters are there in a k-variate normaldistribution? Work this out as an exercise.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 70 / 70

Page 78: SM-4331 Advanced Statistics Chapter 3 (Important ... · SM-4331 Advanced Statistics Chapter 3 (Important Univariate and Multivariate Distributions) DrHaziqJamil FOS M1.09 Universiti

References I

Casella, G. and R. L. Berger (2002). Statistical Inference. 2nd ed. PacificGrove, CA: Duxbury. ISBN: 978-0-534-24312-8.

Pawitan, Y. (2001). In All Likelihood. Statistical Modelling and InferenceUsing Likelihood. Oxford University Press. ISBN: 978-0-19-850765-9.

Jamil, H. (Oct. 2018). “Regression modelling using priors depending onFisher information covariance kernels (I-priors)”. PhD thesis. LondonSchool of Economics and Political Science.

Wasserman, L. (2013). All of statistics: a concise course in statisticalinference. Springer Science & Business Media.

Haziq Jamil - https://haziqj.ml SM-4331 Chapter 3 Sem II 2019/20 70 / 70