Module 4: Introduction to the Normal Gamma Model
Rebecca C. Steorts and Lei Qian
Created Date: 2/15/2017 9:42:48 PM

Module 4: Introduction to the Normal Gamma Model

Rebecca C. Steorts and Lei Qian


Agenda

- NormalGamma distribution
- posterior derivation
- application to IQ scores


NormalGamma distribution

Let X_1, . . . , X_n iid∼ N(µ, λ^{-1}) and assume both

- the mean µ and
- the precision λ = 1/σ^2

are unknown.

The NormalGamma(m, c, a, b) distribution, with m ∈ R and c, a, b > 0, is a joint distribution on (µ, λ) obtained by letting

µ | λ ∼ N(m, (cλ)^{-1})
λ ∼ Gamma(a, b)

In other words, the joint p.d.f. is

p(µ, λ) = p(µ | λ) p(λ) = N(µ | m, (cλ)^{-1}) Gamma(λ | a, b)

which we will denote by NormalGamma(µ, λ | m, c, a, b).
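The two-stage definition translates directly into a sampling routine. Below is a minimal sketch in Python (the slides use R; Python's standard library is used here so the example is self-contained, and the function name sample_normal_gamma is ours, not from the slides). Note that random.gammavariate takes a shape and a scale, so the rate b enters as 1/b.

```python
import random

def sample_normal_gamma(m, c, a, b):
    """Draw one (mu, lam) pair from NormalGamma(m, c, a, b).

    lam ~ Gamma(a, b), with b a rate, so the scale passed to
    random.gammavariate is 1/b; mu | lam ~ N(m, (c*lam)^{-1}).
    """
    lam = random.gammavariate(a, 1.0 / b)            # precision lambda
    mu = random.normalvariate(m, (c * lam) ** -0.5)  # sd = (c*lam)^{-1/2}
    return mu, lam

# Example: a few draws from a NormalGamma(0, 1, 1/2, 50) prior
random.seed(1)
samples = [sample_normal_gamma(0.0, 1.0, 0.5, 50.0) for _ in range(5)]
```

Sampling (µ, λ) this way first, then data X ∼ N(µ, λ^{-1}), is also how one would simulate from the full model.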


NormalGamma distribution (continued)

It turns out that this provides a conjugate prior for (µ, λ).

One can show the posterior is

µ, λ | x1:n ∼ NormalGamma(M, C, A, B)    (1)

i.e., p(µ, λ | x1:n) = NormalGamma(µ, λ | M, C, A, B), where

M = (cm + ∑_{i=1}^n x_i) / (c + n)
C = c + n
A = a + n/2
B = b + (1/2)(cm^2 − CM^2 + ∑_{i=1}^n x_i^2).
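These update rules are easy to implement. A short sketch in Python (the function name ng_posterior is ours, not from the slides):

```python
def ng_posterior(m, c, a, b, data):
    """Posterior NormalGamma parameters (M, C, A, B) given iid data."""
    n = len(data)
    s = sum(data)                      # sum of x_i
    s2 = sum(x * x for x in data)      # sum of x_i^2
    C = c + n
    M = (c * m + s) / C
    A = a + n / 2
    B = b + 0.5 * (c * m * m - C * M * M + s2)
    return M, C, A, B
```

For instance, ng_posterior(0.0, 1.0, 0.5, 50.0, data) gives the posterior parameters under the prior used later in these slides.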


NormalGamma distribution (continued)

For interpretation, B can also be written (by rearranging terms) as

B = b + (1/2) ∑_{i=1}^n (x_i − x̄)^2 + (1/2) · cn/(c + n) · (x̄ − m)^2.    (2)
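The two expressions for B (the definition on the previous slide and the rearranged Equation 2) can be checked against each other numerically; a quick sketch (helper names are ours):

```python
def B_direct(m, c, b, data):
    """B = b + (1/2)(c m^2 - C M^2 + sum x_i^2)."""
    n = len(data)
    C = c + n
    M = (c * m + sum(data)) / C
    return b + 0.5 * (c * m * m - C * M * M + sum(x * x for x in data))

def B_decomposed(m, c, b, data):
    """B = b + (1/2) sum (x_i - xbar)^2 + (1/2) cn/(c+n) (xbar - m)^2."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)
    return b + 0.5 * ss + 0.5 * (c * n / (c + n)) * (xbar - m) ** 2
```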


NormalGamma distribution (continued)

- M: Posterior mean for µ. It is a weighted average (convex combination) of the prior mean and the sample mean:

  M = (c/(c + n)) m + (n/(c + n)) x̄.

- C: "Sample size" for estimating µ. (The standard deviation of µ | λ is λ^{-1/2}/√C.)
- A: Shape for posterior on λ. Grows linearly with sample size.
- B: Rate (1/scale) for posterior on λ. Equation 2 decomposes B into the prior variation, observed variation (sample variance), and variation between the prior mean and sample mean:

  B = (prior variation) + (1/2) n (observed variation) + (1/2) · cn/(c + n) · (variation between means).
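Similarly, the convex-combination form of M agrees with the direct formula, which is quick to verify; a small sketch (helper names are ours):

```python
def M_direct(m, c, data):
    """M = (c m + sum x_i) / (c + n)."""
    return (c * m + sum(data)) / (c + len(data))

def M_weighted(m, c, data):
    """M = c/(c+n) * m + n/(c+n) * xbar."""
    n = len(data)
    xbar = sum(data) / n
    w = c / (c + n)          # weight on the prior mean
    return w * m + (1 - w) * xbar
```

As n grows, the prior weight c/(c + n) shrinks toward 0, so M approaches the sample mean x̄.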


Posterior derivation

First, consider the NormalGamma density. Dropping constants of proportionality, multiplying out (µ − m)^2 = µ^2 − 2µm + m^2, and collecting terms, we have

NormalGamma(µ, λ | m, c, a, b)    (3)
= N(µ | m, (cλ)^{-1}) Gamma(λ | a, b)
= √(cλ/(2π)) exp(−(1/2) cλ (µ − m)^2) · (b^a/Γ(a)) λ^{a−1} exp(−bλ)
∝_{µ,λ} λ^{a−1/2} exp(−(λ/2)(cµ^2 − 2cmµ + cm^2 + 2b)).    (4)


Posterior derivation (continued)

Similarly, for any x,

N(x | µ, λ^{-1}) = √(λ/(2π)) exp(−(λ/2)(x − µ)^2)
∝_{µ,λ} λ^{1/2} exp(−(λ/2)(µ^2 − 2xµ + x^2)).    (5)


Posterior derivation (continued)

p(µ, λ | x1:n)
∝_{µ,λ} p(µ, λ) p(x1:n | µ, λ)
∝_{µ,λ} λ^{a−1/2} exp(−(λ/2)(cµ^2 − 2cmµ + cm^2 + 2b))
  × λ^{n/2} exp(−(λ/2)(nµ^2 − 2(∑ x_i)µ + ∑ x_i^2))
= λ^{a+n/2−1/2} exp(−(λ/2)((c + n)µ^2 − 2(cm + ∑ x_i)µ + cm^2 + 2b + ∑ x_i^2))
(a)= λ^{A−1/2} exp(−(λ/2)(Cµ^2 − 2CMµ + CM^2 + 2B))
(b)∝ NormalGamma(µ, λ | M, C, A, B)

where step (b) is by Equation 4, and step (a) holds if A = a + n/2, C = c + n, CM = cm + ∑ x_i, and CM^2 + 2B = cm^2 + 2b + ∑ x_i^2.


Posterior derivation (continued)

This choice of A and C matches the claimed form of the posterior, and solving for M and B, we get

M = (cm + ∑ x_i)/(c + n)
B = b + (1/2)(cm^2 − CM^2 + ∑ x_i^2),

as claimed.

Alternative derivation: complete the square all the way (longer and much more tedious).
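Once M and B are defined this way, the matching conditions in step (a) hold exactly, which can be confirmed numerically; a sketch (the helper name check_matching is ours):

```python
def check_matching(m, c, a, b, data):
    """Verify step (a): CM = cm + sum x_i and CM^2 + 2B = cm^2 + 2b + sum x_i^2."""
    n = len(data)
    s = sum(data)
    s2 = sum(x * x for x in data)
    C = c + n
    M = (c * m + s) / C
    B = b + 0.5 * (c * m * m - C * M * M + s2)
    ok1 = abs(C * M - (c * m + s)) < 1e-9
    ok2 = abs((C * M * M + 2 * B) - (c * m * m + 2 * b + s2)) < 1e-9
    return ok1 and ok2
```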


Do a teacher’s expectations influence student achievement?

Do a teacher’s expectations influence student achievement? In a famous study, Rosenthal and Jacobson (1968) performed an experiment in a California elementary school to try to answer this question. At the beginning of the year, all students were given an IQ test. For each class, the researchers randomly selected around 20% of the students, and told the teacher that these students were “spurters” that could be expected to perform particularly well that year. (This was not based on the test; the spurters were randomly chosen.) At the end of the year, all students were given another IQ test. The change in IQ score for the first-grade students was:¹

¹The original data is not available. This data is from the ex1321 dataset of the R package Sleuth3, which was constructed to match the summary statistics and conclusions of the original study.




Spurters/Control Data

# spurters
x <- c(18, 40, 15, 17, 20, 44, 38)
# controls
y <- c(-4, 0, -19, 24, 19, 10, 5, 10,
       29, 13, -9, -8, 20, -1, 12, 21,
       -7, 14, 13, 20, 11, 16, 15, 27,
       23, 36, -33, 34, 13, 11, -19, 21,
       6, 25, 30, 22, -28, 15, 26, -1, -2,
       43, 23, 22, 25, 16, 10, 29)

iqData <- data.frame(Treatment = c(rep("Spurters", length(x)),
                                   rep("Controls", length(y))),
                     Gain = c(x, y))


An initial exploratory analysis

Plot the number of students versus the change in IQ score for the two groups. How strongly does this data support the hypothesis that the teachers’ expectations caused the spurters to perform better than their classmates?


Histogram of Change in IQ Scores

xLimits = seq(min(iqData$Gain) - (min(iqData$Gain) %% 5),
              max(iqData$Gain) + (max(iqData$Gain) %% 5),
              by = 5)

ggplot(data = iqData, aes(x = Gain,
                          fill = Treatment,
                          colour = I("black"))) +
  geom_histogram(position = "dodge", alpha = 0.5,
                 breaks = xLimits, closed = "left") +
  scale_x_continuous(breaks = xLimits, expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0), breaks = seq(0, 10, by = 1)) +
  ggtitle("Histogram of Change in IQ Scores") +
  labs(x = "Change in IQ Score", fill = "Group") +
  theme(plot.title = element_text(hjust = 0.5))

[Figure: “Histogram of Change in IQ Scores”; count (0 to 10) vs. change in IQ score (−35 to 45, bins of 5), grouped as Controls vs. Spurters.]


Histogram of Change in IQ Scores

[Figure: the same histogram, shown full-slide; count vs. change in IQ score, grouped as Controls vs. Spurters.]


IQ Tests and Modeling

IQ tests are purposefully calibrated to make the scores normally distributed, so it makes sense to use a normal model here:

spurters: X_1, . . . , X_{nS} iid∼ N(µS, λS^{-1})
controls: Y_1, . . . , Y_{nC} iid∼ N(µC, λC^{-1}).

- We are interested in the difference between the means; in particular, is µS > µC?
- We don’t know the standard deviations σS = λS^{-1/2} and σC = λC^{-1/2}, and the sample seems too small to estimate them very well.


IQ Tests and Modeling

On the other hand, it is easy using a Bayesian approach: we just need to compute the posterior probability that µS > µC:

P(µS > µC | x1:nS, y1:nC).

Let’s use independent NormalGamma priors:

spurters: (µS, λS) ∼ NormalGamma(m, c, a, b)
controls: (µC, λC) ∼ NormalGamma(m, c, a, b)


Hyperparameter settings

- m = 0 (Don’t know whether students will improve or not, on average.)
- c = 1 (Unsure about how big the mean change will be; prior certainty in our choice of m assessed to be equivalent to one datapoint.)
- a = 1/2 (Unsure about how big the standard deviation of the changes will be.)
- b = 10^2 a (Standard deviation of the changes expected to be around 10 = √(b/a) = E(λ)^{-1/2}.)
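As a quick check of the last bullet: with a = 1/2 and b = 10^2 a = 50, the prior mean of λ is a/b = 1/100, so E(λ)^{-1/2} comes out to 10, the anticipated standard deviation of the changes. In Python:

```python
# Hyperparameters from this slide (m, c, a, b)
m, c, a, b = 0.0, 1.0, 0.5, (10.0 ** 2) * 0.5   # b = 10^2 * a = 50

prior_mean_lambda = a / b                # E(lambda) = a/b = 1/100
typical_sd = prior_mean_lambda ** -0.5   # E(lambda)^{-1/2}, approximately 10
```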


Prior Samples

[Figure: scatter of samples from the prior; µ (mean change in IQ score, −50 to 50) vs. λ^{-1/2} (std. dev. of change, 0 to 40).]


Original question

“What is the posterior probability that µS > µC?”

- The easiest way to do this is to take a bunch of samples from each of the posteriors, and see what fraction of times we have µS > µC.
- This is an example of a Monte Carlo approximation (much more to come on this in the future).
- To do this, we draw N = 10^6 samples from each posterior:

(µS^(1), λS^(1)), . . . , (µS^(N), λS^(N)) ∼ NormalGamma(A1, B1, C1, D1)
(µC^(1), λC^(1)), . . . , (µC^(N), λC^(N)) ∼ NormalGamma(A2, B2, C2, D2)


Original question (continued)

What are the updated posterior values?

and obtain the approximation

P(µS > µC | x1:nS, y1:nC) ≈ (1/N) ∑_{i=1}^N I(µS^(i) > µC^(i)) = ??.

In lab this week, you will explore this data set more and analyze this in your homework assignment.
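Putting the pieces together, the Monte Carlo recipe from the last two slides can be sketched end to end. This is a Python illustration using only the standard library (the slides use R); the posterior parameters come from the update rules derived earlier, a smaller N is used here for speed, and the resulting fraction is left for you to compute and interpret in lab.

```python
import random

def ng_update(m, c, a, b, data):
    """Posterior NormalGamma(M, C, A, B) parameters given iid data."""
    n, s, s2 = len(data), sum(data), sum(v * v for v in data)
    C = c + n
    M = (c * m + s) / C
    A = a + n / 2
    B = b + 0.5 * (c * m * m - C * M * M + s2)
    return M, C, A, B

def ng_sample(M, C, A, B):
    """One (mu, lam) draw: lam ~ Gamma(A, rate B), mu | lam ~ N(M, 1/(C*lam))."""
    lam = random.gammavariate(A, 1.0 / B)
    mu = random.normalvariate(M, (C * lam) ** -0.5)
    return mu, lam

x = [18, 40, 15, 17, 20, 44, 38]                                      # spurters
y = [-4, 0, -19, 24, 19, 10, 5, 10, 29, 13, -9, -8, 20, -1, 12, 21,
     -7, 14, 13, 20, 11, 16, 15, 27, 23, 36, -33, 34, 13, 11, -19, 21,
     6, 25, 30, 22, -28, 15, 26, -1, -2, 43, 23, 22, 25, 16, 10, 29]  # controls

random.seed(0)
N = 100_000  # fewer than the 10^6 on the slide, for speed
post_S = ng_update(0.0, 1.0, 0.5, 50.0, x)
post_C = ng_update(0.0, 1.0, 0.5, 50.0, y)
hits = sum(ng_sample(*post_S)[0] > ng_sample(*post_C)[0] for _ in range(N))
prob = hits / N   # Monte Carlo estimate of P(mu_S > mu_C | data)
```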