A 250-Year Argument - Stanford University

Post on 18-Apr-2022

A 250-Year Argument

Belief, Behavior, and the Bootstrap

Bradley Efron

Stanford University

The Greater World of Mathematics and Science

[Diagram labels: Mathematics, Statistics, A.I., Applied Sciences, Terra Incognita]

The Physicist’s Twins

• Sonogram “Twin boys on the way!”

• Physicist “What’s the probability my twins will be identical

rather than fraternal?”

• Doctor “One third of twins are identical.”

Bayes Rule for the Twins

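• Prior odds: $\dfrac{\Pr\{\text{identical}\}}{\Pr\{\text{fraternal}\}} = \dfrac{1/3}{2/3} = \dfrac{1}{2}$ (past experience)

• Likelihood ratio: $\dfrac{\Pr\{\text{same sex} \mid \text{identical}\}}{\Pr\{\text{same sex} \mid \text{fraternal}\}} = \dfrac{1}{1/2} = 2$ (current evidence)

• Posterior odds: $\dfrac{\Pr\{\text{identical} \mid \text{same sex}\}}{\Pr\{\text{fraternal} \mid \text{same sex}\}} = \;?$ (updated beliefs)

• Bayes rule:

Posterior odds = Prior odds · Likelihood ratio $= \tfrac{1}{2} \cdot 2 = 1$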

• My answer : “50/50”
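The odds calculation on this slide can be checked with exact fractions. This is a minimal sketch; the variable names are illustrative, and the only inputs are the doctor's one-third and the assumption that fraternal twins are same-sex half the time.

```python
from fractions import Fraction

# Prior odds of identical vs. fraternal, from the doctor's "one third".
prior_identical = Fraction(1, 3)
prior_odds = prior_identical / (1 - prior_identical)       # 1/2

# Likelihood ratio for the evidence "same sex".
# Assumption: identical twins are always same-sex, fraternal twins half the time.
likelihood_ratio = Fraction(1) / Fraction(1, 2)             # 2

posterior_odds = prior_odds * likelihood_ratio              # 1
posterior_prob = posterior_odds / (1 + posterior_odds)      # 1/2

print(posterior_odds, posterior_prob)
```

Posterior odds of 1 correspond to a probability of 1/2, which is the "50/50" answer.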

If All Twins Were Sonogrammed:

                 Sonogram shows:
Twins are:       Same sex (Physicist)   Different   Total (Doctor)
Identical        1/3                    0           1/3
Fraternal        1/3                    1/3         2/3

(The four cells are labeled a, b, c, d on the slide.)

Belief and Inference

• θ: unknown state of nature (identical or fraternal?)

• π(θ): prior beliefs for θ (1/3, 2/3)

• x: current evidence (sonogram)

• fθ(x): probability model for x given θ

• Question: What is π(θ|x)? (posterior beliefs given x)

Bayes Rule (1763)

• π(θ|x) = c π(θ) · fθ(x)

(posterior beliefs = c × prior beliefs × likelihood function)

• “c” makes π(θ|x) sum to 1

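• Likelihood function fθ(x) with x fixed, θ varying, e.g.,

$f_\theta(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(\theta - x)^2}$

[Figure: the likelihood function plotted against θ, peaking at θ = x. Horizontal axis: θ from 2 to 8; vertical axis: 0.0 to 0.4.]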
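To make the role of the constant c concrete, here is a small sketch that applies Bayes rule on a grid of θ values with the normal likelihood above. The grid, the flat prior, and the observation x = 5 are hypothetical choices for illustration, not taken from the slides.

```python
import numpy as np

def grid_posterior(theta, prior, x):
    """Posterior on a grid: c * prior * likelihood, with c chosen so it sums to 1."""
    likelihood = np.exp(-0.5 * (theta - x) ** 2) / np.sqrt(2 * np.pi)
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

theta = np.linspace(0.0, 10.0, 1001)          # hypothetical grid of candidate theta values
prior = np.ones_like(theta) / theta.size      # hypothetical flat prior on the grid
posterior = grid_posterior(theta, prior, x=5.0)

# The posterior mode and the total mass (which should be 1 after normalizing).
print(theta[np.argmax(posterior)], posterior.sum())
```

With a flat prior the posterior is proportional to the likelihood, so its mode sits at θ = x.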

Bayes Inference without Prior Experience

“Objective Bayes”

• “p”: population proportion of identical twins [Doctor: p = 1/3]

• Principle of insufficient reason (Laplace, Bernoulli): “In the

absence of prior experience, assume p equally likely to have

any value between 0 and 1.” [opposed by Venn, Keynes, Fisher]

• Invariant prior (Harold Jeffreys, 1930s):

$\pi(p) = c\, p^{-1/2} (1 - p)^{-1/2}$

[Figure: Possible prior densities for p, the population proportion of identical twins, and the corresponding predictions for the Physicist. Horizontal axis: p; vertical axis: prior density π(p). Doctor (dot at p = 1/3): Prob = .5; Jeffreys: Prob = .67; Laplace: Prob = .67.]
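One way to reproduce the three predictions in the figure is sketched below. It assumes that identical twins are always same-sex and fraternal twins are same-sex half the time, in which case only the prior mean of p matters. The function name and this route to the numbers are illustrative assumptions, not necessarily how the figure was computed.

```python
from fractions import Fraction

def prob_identical_given_same_sex(prior_mean_p):
    """Pr{identical | same sex} when the prior for p has mean prior_mean_p.

    Assumes Pr{same sex | identical} = 1 and Pr{same sex | fraternal} = 1/2.
    """
    joint_identical = prior_mean_p                           # Pr{identical, same sex}
    joint_fraternal = (1 - prior_mean_p) * Fraction(1, 2)    # Pr{fraternal, same sex}
    return joint_identical / (joint_identical + joint_fraternal)

prior_means = {
    "Doctor (p = 1/3)": Fraction(1, 3),
    "Laplace (uniform)": Fraction(1, 2),            # mean of Uniform(0, 1)
    "Jeffreys (Beta(1/2, 1/2))": Fraction(1, 2),    # mean of Beta(1/2, 1/2)
}
predictions = {name: prob_identical_given_same_sex(m) for name, m in prior_means.items()}
for name, prob in predictions.items():
    print(f"{name}: {float(prob):.2f}")
```

Both objective priors are symmetric about 1/2, which is why they give the same prediction of 2/3.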

Frequentist Statistics (Behaviorism)

• θ = unknown parameter, x = observed data,

fθ(x) probability model (but no prior beliefs π(θ))

• “t(x)” some statistical procedure

(test, estimate, confidence interval, . . . )

• Inference based on behavior of t(x) in repeated use

• Optimality: find the best t(x) (R. A. Fisher, 1920s; J. Neyman, 1930s)

[Figure: Scores of 22 students on two tests, 'mechanics' and 'vectors'; sample correlation coefficient is .498 ± ??. Horizontal axis: mechanics score; vertical axis: vectors score. Means: mec 38.9, vec 50.6.]

Student Score Data

• n = 22 students’ scores on two tests: mechanics, vectors

• Data $y = (y_1, y_2, \ldots, y_{22})$ with $y_i = (\text{mec}_i, \text{vec}_i)$

• Parameter of interest θ = correlation (mec, vec)

• Sample correlation coefficient $\hat\theta = 0.498 \pm\ ??$

R. A. Fisher

• 1915: probability density $f_\theta(\hat\theta)$

(hypergeometric series)

• 1922–30: $\hat\theta$ is the maximum likelihood estimate (MLE)

• Frequentist optimality of the MLE: minimizes expected squared

error $E\{(\hat\theta - \theta)^2\}$

• Bivariate normal models

Jerzy Neyman (1930s)

• Optimal frequentist tests and confidence intervals

• 90% confidence interval for θ:

θ ∈ [0.164, 0.717]

• Neyman’s construction covers true θ 90% of the time, in

repeated use
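The phrase "covers true θ 90% of the time, in repeated use" can be illustrated by simulation. The sketch below does not implement Neyman's exact construction; it swaps in the familiar Fisher z-transform approximation for a correlation interval, and the function names, trial count, and true θ = 0.5 are hypothetical choices.

```python
import numpy as np

def fisher_z_interval(r, n, z_crit=1.645):
    """Approximate 90% interval for a correlation via Fisher's z-transform."""
    z = np.arctanh(r)
    half_width = z_crit / np.sqrt(n - 3)
    return np.tanh(z - half_width), np.tanh(z + half_width)

def coverage(true_theta, n=22, trials=4000, seed=0):
    """Fraction of simulated bivariate normal samples whose interval contains true_theta."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, true_theta], [true_theta, 1.0]])
    hits = 0
    for _ in range(trials):
        sample = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        r = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]
        lo, hi = fisher_z_interval(r, n)
        hits += int(lo <= true_theta <= hi)
    return hits / trials

print(coverage(true_theta=0.5))
```

Applied to the observed value, `fisher_z_interval(0.498, 22)` gives roughly [0.17, 0.73], close to but not identical with the interval on the slide, as expected from an approximation.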

[Figure: Neyman's 90% confidence interval for the student score correlation: .164 < θ < .717. Horizontal axis: thetahat*; vertical axis: Fisher's density f(thetahat* | theta). Two density curves, for theta = .164 and theta = .717, each with tail area .05 beyond the observed value .498.]

Jeffreys’ Invariant Prior

• Jeffreys’ objective (or “uninformative”) prior for correlation:

$\pi(\theta) = 1/(1 - \theta^2)$

• General formula: one over the square root of Fisher’s information

bound for the variance of the MLE (transforms correctly under

change of variables)
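As a check on the general formula, here are two worked instances. They rely on standard large-sample variance results (the binomial bound and the variance of the sample correlation under bivariate normality), which are not stated on the slides and are assumed here.

```latex
\begin{align*}
\text{Binomial proportion } p:&\quad
  \operatorname{Var}(\hat p) \ge \frac{p(1-p)}{n}
  \;\Longrightarrow\;
  \pi(p) \propto \frac{1}{\sqrt{p(1-p)}} = p^{-1/2}(1-p)^{-1/2}, \\
\text{Correlation } \theta:&\quad
  \operatorname{Var}(\hat\theta) \approx \frac{(1-\theta^2)^2}{n}
  \;\Longrightarrow\;
  \pi(\theta) \propto \frac{1}{1-\theta^2}.
\end{align*}
```

The first line recovers the invariant prior given earlier for the twins proportion p; the second recovers the prior for the correlation.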

[Figure: Bayes posterior density pi(theta | thetahat) for the 22 students; 90% credible limits = [.164, .718]; Neyman limits [.164, .717]. Horizontal axis: theta; vertical axis: posterior density. Areas: 5% below .164, 90% between, 5% above .718.]

More Students

n     θ̂
22    .498
44    .663
66    .621
88    .553
∞     [.415, .662]

[Figure: Galton's 1886 distribution of child's height vs parents'; ellipses are contours of the best-fit bivariate normal density; red dot at the bivariate average (68.3, 68.1). Horizontal axis: parents' height; vertical axis: child's height.]

Bivariate Normal Distribution

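• “$y \sim N_2(\mu, \Sigma)$” ($y, \mu \in \mathbb{R}^2$, $\Sigma$ $2 \times 2$ positive definite):

$f_{\mu,\Sigma}(y) = \dfrac{1}{2\pi}\, |\Sigma|^{-1/2}\, e^{-\frac{1}{2}(y-\mu)^t \Sigma^{-1} (y-\mu)}$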

• µ: center of the ellipses; Σ: their shape

• 5 parameters: 2 means, 2 variances, 1 correlation
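The density formula translates directly into a few lines of NumPy. This is a sketch: the function name is illustrative, the center is taken from Galton's averages, and the covariance matrix is made up purely for the example.

```python
import numpy as np

def bivariate_normal_density(y, mu, sigma):
    """Density of N2(mu, sigma) at the point y (all arguments array-like)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    diff = y - mu
    quad_form = diff @ np.linalg.solve(sigma, diff)     # (y-mu)^t Sigma^{-1} (y-mu)
    return np.exp(-0.5 * quad_form) / (2 * np.pi * np.sqrt(np.linalg.det(sigma)))

# Hypothetical parameters: centered at Galton's averages, with a made-up covariance.
mu = [68.3, 68.1]
sigma = [[3.0, 2.0], [2.0, 6.0]]
peak = bivariate_normal_density(mu, mu, sigma)              # value at the center
off_center = bivariate_normal_density([70.0, 66.0], mu, sigma)
print(peak, off_center)
```

Points with equal quadratic form `quad_form` have equal density, which is why the contours are ellipses centered at µ.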

A More Difficult Problem

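• θ = “eigenratio” $= \dfrac{\lambda_1}{\lambda_1 + \lambda_2}$ ($\lambda_1 > \lambda_2$ eigenvalues of $\Sigma$)

• Student score data y (22 × 2) gives MLEs $\hat\mu$, $\hat\Sigma$, and

$\hat\theta = 0.793 \pm\ ?$

• Not true: $f_{\mu,\Sigma}(\hat\theta)$ depends only on θ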

• There are 4 “nuisance parameters”

[Figure: Posterior density of the eigenratio, Jeffreys prior, bivariate normal model; 90% credible limits [.68, .89]; bootstrap CI [.63, .88]. Red dots are the bootstrap 90% confidence limits. Horizontal axis: eigenratio; vertical axis: posterior density.]

Bootstrap Methods (Automatic Frequentist Inference)

• Original data $y_i \sim N_2(\mu, \Sigma)$, $i = 1, 2, \ldots, 22$

– gives MLEs $\hat\mu$, $\hat\Sigma$, and $\hat\theta = 0.793$

• Bootstrap data $y_i^* \sim N_2(\hat\mu, \hat\Sigma)$, $i = 1, 2, \ldots, 22$

– gives $\hat\theta^*$ = bootstrap eigenratio

• 10,000 $\hat\theta^*$s

• 58% exceed $\hat\theta$ (upward bias)

• Reweighting formula puts bigger weights on smaller $\hat\theta^*$s

• Confidence limits are the weighted bootstrap percentiles
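The resampling half of this recipe can be sketched as follows. The student scores are not reproduced in the slides, so the 22 × 2 data matrix below is synthetic, and the function names are illustrative. The reweighting formula is not specified here, so the sketch stops at the raw bootstrap replications and the fraction exceeding the MLE.

```python
import numpy as np

def eigenratio(data):
    """lambda1 / (lambda1 + lambda2) for the MLE covariance of an n x 2 array."""
    sigma_hat = np.cov(data, rowvar=False, bias=True)   # bias=True divides by n (MLE)
    eigenvalues = np.linalg.eigvalsh(sigma_hat)         # returned in ascending order
    return eigenvalues[-1] / eigenvalues.sum()

def parametric_bootstrap(data, n_boot=2000, seed=0):
    """Draw y* ~ N2(mu_hat, Sigma_hat) and recompute the eigenratio each time."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    mu_hat = data.mean(axis=0)
    sigma_hat = np.cov(data, rowvar=False, bias=True)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        y_star = rng.multivariate_normal(mu_hat, sigma_hat, size=n)
        replicates[b] = eigenratio(y_star)
    return replicates

# Synthetic stand-in for the 22 x 2 score matrix (NOT the real student data).
rng = np.random.default_rng(1)
scores = rng.multivariate_normal([39.0, 50.6], [[170.0, 100.0], [100.0, 150.0]], size=22)

theta_hat = eigenratio(scores)
theta_star = parametric_bootstrap(scores)
print(theta_hat, np.mean(theta_star > theta_hat))       # fraction of replicates above the MLE
```

A fraction well above one half would indicate the kind of upward bias the slide reports, which is what the reweighting is meant to correct.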

[Figure: Histogram of 10,000 bootstrap eigenratio values from the student score data (bivariate normal model); red line shows the confidence weights; 58% of the bootstrap values exceed the MLE, .793. Horizontal axis: bootstrap eigenratios; vertical axis: frequency.]

Gibbs Sampling (Automatic Bayes Inference)

• Given: prior π(θ), data x, model fθ(x)

• Approximates: π(θ|x) by Markov chain random walk

• “MCMC”, “Metropolis-Hastings”, . . . (A-Bomb?)

• Most often used with convenient “uninformative” priors
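To show what "approximating π(θ|x) by a Markov chain random walk" looks like in practice, here is a minimal random-walk Metropolis sampler (one member of the MCMC family, not Gibbs sampling itself). The observation x = 5, the flat prior, the N(θ, 1) model, and all tuning constants are hypothetical choices for illustration.

```python
import numpy as np

def metropolis(log_posterior, start, n_steps=20000, step_size=1.0, seed=0):
    """Random-walk Metropolis: a Markov chain whose long-run draws follow the posterior."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    current, current_lp = start, log_posterior(start)
    for t in range(n_steps):
        proposal = current + step_size * rng.normal()
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior(proposal) / posterior(current)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[t] = current
    return chain

x_obs = 5.0   # hypothetical observation

def log_posterior(theta):
    # Flat ("uninformative") prior, so log posterior = log N(theta, 1) likelihood + constant.
    return -0.5 * (theta - x_obs) ** 2

draws = metropolis(log_posterior, start=0.0)[2000:]     # discard the first 2000 as burn-in
print(draws.mean(), draws.std())
```

In this toy case the exact posterior is N(5, 1), so the sample mean and standard deviation of the draws give a direct check on the sampler.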

Prostate Cancer Study

(Singh et al 2002)

• 102 men: 52 prostate cancer, 50 healthy controls

• Each man assessed for activity of 6033 genes

• Statistic $x_i$ measures differences in activity, patients minus

controls, for gene $i$, $i = 1, 2, \ldots, 6033$.

• Probability model

$x_i \sim N(\delta_i, 1)$ (normal, mean $\delta_i$, variance 1)

$\delta_i$: the true difference or effect size

[Figure: Prostate Study (Singh et al 2002): histogram of difference estimates x[i] comparing cancer patients with normal controls, 6033 genes; hash marks show the 10 largest x values. Horizontal axis: difference estimates x[i]; vertical axis: frequency. Annotations: curve labeled "if all x[i]=0"; gene 610, x = 5.29.]

Bayesian Analysis (for one gene)

• Assume δ has prior density π(δ)

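• Prob model $f_\delta(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-\delta)^2}$

• Marginal density $m(x) = \displaystyle\int_{-\infty}^{\infty} f_\delta(x)\, \pi(\delta)\, d\delta$

(overall density of x taking account of randomness in δ)

• Bayes posterior expectation (“Tweedie’s formula”)

$E\{\delta \mid x\} = x + \dfrac{d}{dx} \log m(x)$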
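A worked instance may help make Tweedie's formula concrete. It assumes a normal prior δ ∼ N(0, A), a choice made here only for illustration; the symbol A does not appear in the slides.

```latex
\begin{align*}
\delta \sim N(0, A),\quad x \mid \delta \sim N(\delta, 1)
  &\;\Longrightarrow\;
  m(x) = \frac{1}{\sqrt{2\pi(A+1)}}\, e^{-\frac{x^2}{2(A+1)}}, \\
\frac{d}{dx} \log m(x) &= -\frac{x}{A+1}, \\
E\{\delta \mid x\} &= x - \frac{x}{A+1} = \frac{A}{A+1}\, x .
\end{align*}
```

The correction term pulls x toward the prior mean 0, and the pull is stronger when the prior variance A is small.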

Empirical Bayes Analysis

• We don’t know prior π(δ), but histogram provides a smooth

estimate $\hat m(x)$ for $m(x)$

• Empirical Bayes estimate:

$\hat E\{\delta_i \mid x_i\} = x_i + \dfrac{d}{dx} \log \hat m(x) \Big|_{x_i}$

• Frequentist estimation of a Bayesian inference
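The empirical Bayes recipe can be sketched on simulated data. The slides do not say how the smooth estimate of m(x) is obtained; the version below assumes a weighted quadratic fit to the log histogram counts, which is only one simple possibility. The simulated effects δ_i ∼ N(0, 1) are a stand-in for the real gene data, and the function name is illustrative.

```python
import numpy as np

def empirical_bayes_estimates(x, n_bins=40, degree=2):
    """Estimates x + d/dx log m_hat(x), with log m_hat fitted to a histogram of x."""
    counts, edges = np.histogram(x, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0                                   # log is undefined for empty bins
    # Assumption: smooth log m_hat with a weighted polynomial fit to the log counts.
    coefs = np.polyfit(centers[keep], np.log(counts[keep]), degree,
                       w=np.sqrt(counts[keep]))
    slope = np.polyval(np.polyder(coefs), x)            # d/dx log m_hat at each x_i
    return x + slope

# Simulated stand-in for the 6033 gene statistics: delta_i ~ N(0, 1), x_i ~ N(delta_i, 1).
rng = np.random.default_rng(0)
delta = rng.normal(0.0, 1.0, size=6033)
x = rng.normal(delta, 1.0)

estimates = empirical_bayes_estimates(x)
shrinkage = np.polyfit(x, estimates, 1)[0]              # slope of the estimates against x
print(shrinkage)   # the exact Bayes rule for this simulation is E{delta | x} = x / 2
```

Log counts differ from log m(x) only by an additive constant, so the fitted derivative is unaffected by the histogram's scaling. No prior is supplied anywhere: the shrinkage is learned from the ensemble of x values.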

[Figure: Empirical Bayes estimates of E{delta|x}, the expected true difference delta[i] given the observed difference x[i]; estimates are near 0 for the 93% of genes in [−2, 2]. Horizontal axis: difference value x[i]; vertical axis: E{delta[i] | x[i]}. Annotation: x[610] = 5.29, estimate = 4.07.]

Score Sheet

     Bayes                         Frequentist
1.   Belief (prior)                Behavior (method)
2.   Principled                    Opportunistic
3.   One distribution              Many distributions (bootstrap?)
4.   Dynamic                       Static
5.   Individual (subjective)       Community (objective)
6.   Aggressive                    Defensive
