A 250-Year Argument

Belief, Behavior, and the Bootstrap

Bradley Efron

Stanford University

[Figure: The Greater World of Mathematics and Science, a diagram with regions labeled Mathematics, Statistics, A.I., Terra Incognita, and Sciences]


The Physicist’s Twins

• Sonogram: “Twin boys on the way!”

• Physicist: “What’s the probability my twins will be identical rather than fraternal?”

• Doctor: “One third of twins are identical.”

Bayes Rule for the Twins

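• Prior odds: $\frac{\Pr\{\text{identical}\}}{\Pr\{\text{fraternal}\}} = \frac{1/3}{2/3} = \frac{1}{2}$ (past experience)

• Likelihood ratio: $\frac{\Pr\{\text{same sex}\mid\text{identical}\}}{\Pr\{\text{same sex}\mid\text{fraternal}\}} = \frac{1}{1/2} = 2$ (current evidence)

• Posterior odds: $\frac{\Pr\{\text{identical}\mid\text{same sex}\}}{\Pr\{\text{fraternal}\mid\text{same sex}\}} = \;?$ (updated beliefs)

• Bayes rule: Posterior odds = Prior odds · Likelihood ratio = $\frac{1}{2}\cdot 2 = 1$

• My answer: “50/50”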
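The odds-form arithmetic above can be checked with exact fractions. A minimal Python sketch; the function name `posterior_odds` is illustrative, not something from the talk.

```python
from fractions import Fraction

def posterior_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# Doctor's prior: one third of twins are identical.
p_identical = Fraction(1, 3)
prior = p_identical / (1 - p_identical)      # (1/3) / (2/3)

# Identical twins are always the same sex; fraternal twins are the same sex half the time.
lr = Fraction(1) / Fraction(1, 2)

post = posterior_odds(prior, lr)
prob_identical = post / (1 + post)           # convert odds back to a probability
print(prior, lr, post, prob_identical)
```

Converting the posterior odds of 1 back to a probability gives the Physicist's “50/50”.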

If All Twins Were Sonogrammed:

[Figure: twins classified by what they are (Identical, Fraternal) and by what the sonogram shows (Same sex, Different), with the Physicist's and the Doctor's answers marked]
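To make the frequency reading of this picture concrete, the sketch below simulates a large hypothetical population of sonogrammed twin pairs, assuming the Doctor's 1/3 and that fraternal pairs are same-sex half the time. The helper `simulate_sonograms`, the population size, and the seed are illustrative choices.

```python
import random

def simulate_sonograms(n_pairs: int, p_identical: float = 1 / 3, seed: int = 0):
    """Simulate twin pairs; return (share identical among all pairs,
    share identical among same-sex pairs)."""
    rng = random.Random(seed)
    identical = same_sex = identical_same_sex = 0
    for _ in range(n_pairs):
        is_identical = rng.random() < p_identical
        # Identical pairs are always same-sex; fraternal pairs are same-sex half the time.
        is_same_sex = is_identical or rng.random() < 0.5
        identical += is_identical
        same_sex += is_same_sex
        identical_same_sex += is_identical and is_same_sex
    return identical / n_pairs, identical_same_sex / same_sex

doctor, physicist = simulate_sonograms(200_000)
print(f"all twins: {doctor:.3f}  same-sex twins: {physicist:.3f}")
```

The first share should sit near 1/3 (the Doctor's answer, over all twins) and the second near 1/2 (the Physicist's, over same-sex twins only).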

Belief and Inference

• θ: unknown state of nature (identical or fraternal?)

• π(θ): prior beliefs for θ (1/3, 2/3)

• x: current evidence (sonogram)

• fθ(x): probability model for x given θ

• Question: What is π(θ|x)? (posterior beliefs given x)

Bayes Rule (1763)

• $\pi(\theta\mid x) = c\,\pi(\theta)\cdot f_\theta(x)$, that is, posterior beliefs = c · prior beliefs · likelihood function

• “c” makes π(θ|x) sum to 1

• Likelihood function $f_\theta(x)$ with x fixed, θ varying, e.g., $f_\theta(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(\theta - x)^2}$

[Figure: the normal likelihood function plotted against θ, for a fixed observed x]
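Bayes rule can be applied numerically by evaluating prior × likelihood on a grid of θ values and choosing c so the result sums to 1. A sketch, assuming a flat prior, a grid from 0 to 10, and an observed x = 5; all three are illustrative choices, not taken from the talk.

```python
import math

def normal_likelihood(theta: float, x: float) -> float:
    """f_theta(x) for x ~ N(theta, 1), read as a function of theta with x held fixed."""
    return math.exp(-0.5 * (theta - x) ** 2) / math.sqrt(2 * math.pi)

def grid_posterior(prior, x, thetas):
    """Bayes rule on a grid: posterior = c * prior * likelihood, with c making it sum to 1."""
    unnormalized = [prior(t) * normal_likelihood(t, x) for t in thetas]
    c = 1.0 / sum(unnormalized)
    return [c * u for u in unnormalized]

thetas = [i / 100 for i in range(1001)]   # hypothetical grid 0.00, 0.01, ..., 10.00

def flat_prior(theta: float) -> float:
    """Assumed flat prior over the grid."""
    return 1.0

post = grid_posterior(flat_prior, x=5.0, thetas=thetas)
post_mean = sum(t * p for t, p in zip(thetas, post))
print(f"posterior sums to {sum(post):.6f}; posterior mean {post_mean:.3f}")
```

With a flat prior the posterior is just the normalized likelihood, so its mean sits at the observed x; swapping in a non-flat `prior` function shows how prior beliefs pull it away.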

Bayes Inference without Prior Experience

“Objective Bayes”

• “p”: population proportion of identical twins [Doctor: p = 1/3]

• Principle of insufficient reason (Laplace, Bernoulli): “In the absence of prior experience, assume p equally likely to have any value between 0 and 1.” [opposed by Venn, Keynes, Fisher]

• Invariant prior (Harold Jeffreys, 1930s): $\pi(p) = c\,p^{-1/2}(1-p)^{-1/2}$

[Figure: possible prior densities for p, the population proportion of identical twins, on 0 ≤ p ≤ 1 with p = 1/3 marked, and the corresponding predictions for the Physicist: Doctor Prob = .5, Jeffreys Prob = .67, Laplace Prob = .67]
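The three predictions in the figure follow from averaging over the prior for p before conditioning: Pr{identical, same sex} = E[p] and Pr{fraternal, same sex} = E[1 − p]/2, so only the prior mean of p matters. A sketch with exact fractions, reading Laplace's uniform prior as Beta(1, 1) and Jeffreys' prior above as Beta(1/2, 1/2); the function name is illustrative.

```python
from fractions import Fraction

def prob_identical_given_same_sex(mean_p: Fraction) -> Fraction:
    """Physicist's probability after averaging over a prior for p.

    Pr{identical, same sex} = E[p]; Pr{fraternal, same sex} = E[1 - p] / 2.
    """
    return mean_p / (mean_p + (1 - mean_p) / 2)

# Prior means: Doctor = point mass at 1/3; Laplace = Beta(1, 1); Jeffreys = Beta(1/2, 1/2).
# A Beta(a, b) prior has mean a / (a + b).
priors = {
    "Doctor": Fraction(1, 3),
    "Laplace": Fraction(1) / (1 + 1),
    "Jeffreys": Fraction(1, 2) / (Fraction(1, 2) + Fraction(1, 2)),
}
for name, mean_p in priors.items():
    print(name, prob_identical_given_same_sex(mean_p))
```

The Doctor's point mass reproduces 1/2, while both "uninformative" priors have mean 1/2 and so both give 2/3, matching the .5, .67, .67 in the figure.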

Frequentist Statistics (Behaviorism)

• θ = unknown parameter, x = observed data, fθ(x) probability model (but no prior beliefs π(θ))

• “t(x)”: some statistical procedure (test, estimate, confidence interval, . . . )

• Inference based on behavior of t(x) in repeated use

• Optimality: find the best t(x) (R. A. Fisher, 1920s; J. Neyman, 1930s)

[Figure: scores of 22 students on two tests, 'mechanics' and 'vectors' (means: mechanics 38.9, vectors 50.6); sample correlation coefficient is .498 ± ??]

Student Score Data

• n = 22 students’ scores on two tests: mechanics, vectors

• Data y = (y1, y2, . . . , y22) with yi = (meci, veci)

• Parameter of interest θ = correlation (mec, vec)

• Sample correlation coefficient $\hat\theta = 0.498 \pm$ ??

R. A. Fisher

• 1915: probability density $f_\theta(\hat\theta)$ (hypergeometric series)

• 1922–30: $\hat\theta$ is the maximum likelihood estimate (MLE)

• Frequentist optimality of the MLE: minimize expected squared error $E\{(\hat\theta - \theta)^2\}$

• Bivariate normal models

Jerzy Neyman (1930s)

• Optimal frequentist tests and confidence intervals

• 90% confidence interval for θ: θ ∈ [0.164, 0.717]

• Neyman’s construction covers the true θ 90% of the time, in repeated use

[Figure: Neyman's 90% confidence interval for the student score correlation, .164 < θ < .717: the densities of $\hat\theta^*$ for θ = .164 and θ = .717 each put probability .05 beyond the observed $\hat\theta$ = .498]
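Neyman's exact interval needs the full density of $\hat\theta$. A common quick approximation, swapped in here and not the construction used in the talk, is Fisher's z-transform: atanh($\hat\theta$) is roughly normal with standard error 1/√(n − 3) under bivariate normality. A standard-library sketch; `fisher_z_interval` is an illustrative name.

```python
import math
from statistics import NormalDist

def fisher_z_interval(r: float, n: int, level: float = 0.90):
    """Approximate confidence interval for a correlation via Fisher's z-transform.

    Build a normal interval for atanh(r) with standard error 1/sqrt(n - 3),
    then map both ends back with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)   # 1.645 for a 90% interval
    return math.tanh(z - q * se), math.tanh(z + q * se)

lo, hi = fisher_z_interval(0.498, 22)
print(f"[{lo:.3f}, {hi:.3f}]")
```

For $\hat\theta$ = 0.498 and n = 22 this gives roughly [0.168, 0.728], close to Neyman's [0.164, 0.717] but not identical, since it rests on a normal approximation rather than the exact density.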

Jeffreys’ Invariant Prior

• Jeffreys’ objective (or “uninformative”) prior for correlation: $\pi(\theta) = 1/(1-\theta^2)$

• General formula: one over the square root of Fisher’s information bound for the variance of the MLE (transforms correctly under change of variables)
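As a check on the general formula, the sketch below applies it to a single Bernoulli(p) observation, the model behind the twins' p: Fisher's information is I(p) = 1/(p(1 − p)), so one over the square root of the variance bound 1/I(p) recovers the invariant prior $c\,p^{-1/2}(1-p)^{-1/2}$ given earlier. The correlation prior 1/(1 − θ²) is not derived here; function names are illustrative.

```python
import math

def bernoulli_fisher_information(p: float) -> float:
    """Expected squared score E[(d/dp log f_p(X))^2] for one Bernoulli(p) trial."""
    score_if_1 = 1.0 / p             # d/dp log p
    score_if_0 = -1.0 / (1.0 - p)    # d/dp log(1 - p)
    return p * score_if_1 ** 2 + (1.0 - p) * score_if_0 ** 2

def jeffreys_unnormalized(p: float) -> float:
    """Jeffreys prior up to the constant c: sqrt(I(p)) = 1 / sqrt(variance bound 1 / I(p))."""
    return math.sqrt(bernoulli_fisher_information(p))

for p in (0.1, 1 / 3, 0.5):
    # Second column: p^(-1/2) (1 - p)^(-1/2), the invariant prior from the formula.
    print(p, jeffreys_unnormalized(p), (p * (1 - p)) ** -0.5)
```

The two columns agree, which is the sense in which the general formula produces the invariant prior for p.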

[Figure: Bayes posterior density $\pi(\theta\mid\hat\theta)$ for the 22 students, split 5% | 90% | 5%; 90% credible limits = [.164, .718]; Neyman limits [.164, .717]]

More Students

Students   Correlation
22         .498
44         .663
66         .621
88         .553
∞          [.415, .662]

[Figure: Galton's 1886 distribution of child's height vs parents' height (parents' height axis from 64 to 74); ellipses are contours of the best-fit bivariate normal density; red dot at the bivariate average (68.3, 68.1)]

Bivariate Normal Distribution

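• “$y \sim N_2(\mu, \Sigma)$” ($y, \mu \in \mathbb{R}^2$, Σ a 2 × 2 positive definite matrix):

$f_{\mu,\Sigma}(y) = \frac{1}{2\pi}\,|\Sigma|^{-1/2}\,e^{-\frac{1}{2}(y-\mu)^\top \Sigma^{-1}(y-\mu)}$

• µ is the center of the ellipses, Σ their shape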

• 5 parameters: 2 means, 2 variances, 1 correlation
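A direct transcription of the density above for the 2 × 2 case, with the determinant and inverse written out explicitly. A sketch assuming Σ is passed as nested tuples ((s11, s12), (s12, s22)); the function names are illustrative. The fifth parameter, the correlation, is read off Σ.

```python
import math

def bivariate_normal_pdf(y, mu, sigma):
    """Density of N2(mu, Sigma) at y; sigma = ((s11, s12), (s12, s22)), positive definite."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12                 # |Sigma|
    if s11 <= 0 or det <= 0:
        raise ValueError("Sigma must be positive definite")
    d1, d2 = y[0] - mu[0], y[1] - mu[1]
    # Quadratic form (y - mu)^T Sigma^{-1} (y - mu) using the explicit 2x2 inverse.
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def correlation(sigma):
    """The fifth parameter: rho = s12 / sqrt(s11 * s22)."""
    (s11, s12), (_, s22) = sigma
    return s12 / math.sqrt(s11 * s22)

sigma = ((1.0, 0.3), (0.3, 2.0))   # hypothetical covariance matrix
print(bivariate_normal_pdf((1.0, 0.5), (0.0, 0.0), sigma), correlation(sigma))
```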

A More Difficult Problem

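• θ = “eigenratio” = $\lambda_1/(\lambda_1 + \lambda_2)$ ($\lambda_1 > \lambda_2$ the eigenvalues of Σ)

• Student score data y (22 × 2) gives MLEs $\hat\mu$, $\hat\Sigma$, and $\hat\theta = 0.793 \pm$ ?

• Not true: $f_{\mu,\Sigma}(\hat\theta)$ depends only on θ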

• There are 4 “nuisance parameters”
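For a 2 × 2 covariance matrix the eigenvalues have a closed form: λ1 + λ2 is the trace and λ1 − λ2 = √((s11 − s22)² + 4 s12²). Plugging in the MLE $\hat\Sigma$ (which divides by n) gives $\hat\theta$ for the eigenratio. A sketch; the helper names and the toy data are illustrative, since the student scores are not reproduced here.

```python
import math

def mle_covariance(data):
    """MLE of Sigma for bivariate normal data (divide by n, not n - 1); returns (s11, s12, s22)."""
    n = len(data)
    m1 = sum(a for a, _ in data) / n
    m2 = sum(b for _, b in data) / n
    s11 = sum((a - m1) ** 2 for a, _ in data) / n
    s22 = sum((b - m2) ** 2 for _, b in data) / n
    s12 = sum((a - m1) * (b - m2) for a, b in data) / n
    return s11, s12, s22

def eigenratio(s11, s12, s22):
    """lambda1 / (lambda1 + lambda2) for the symmetric matrix [[s11, s12], [s12, s22]]."""
    trace = s11 + s22                                  # lambda1 + lambda2
    gap = math.sqrt((s11 - s22) ** 2 + 4 * s12 ** 2)   # lambda1 - lambda2
    return (trace + gap) / (2 * trace)

# Hypothetical toy data: the four corners of a 2-by-1 rectangle.
toy = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
print(eigenratio(*mle_covariance(toy)))
```

Here $\hat\Sigma$ is diagonal with entries 1 and 1/4, so the eigenratio is 1/1.25 = 0.8.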

[Figure: posterior density of the eigenratio under the Jeffreys prior for the bivariate normal; 90% credible limits [.68, .89]; red dots mark the bootstrap 90% confidence limits [.63, .88]]

Bootstrap Methods (Automatic Frequentist Inference)

• Original data $y_i \sim N_2(\mu, \Sigma)$, i = 1, 2, . . . , 22

– gives MLEs $\hat\mu$, $\hat\Sigma$, and $\hat\theta = 0.793$

• Bootstrap data $y^*_i \sim N_2(\hat\mu, \hat\Sigma)$, i = 1, 2, . . . , 22

– gives $\hat\theta^*$ = bootstrap eigenratio

• 10,000 $\hat\theta^*$s

• 58% exceed $\hat\theta$ (upward bias)

• Reweighting formula puts bigger weights on smaller $\hat\theta^*$s

• Confidence limits are the weighted bootstrap percentiles
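The parametric bootstrap steps above can be sketched in plain Python. Assumptions, all hypothetical: the $\hat\mu$ and $\hat\Sigma$ below are made-up stand-ins (the student-score MLEs are not given here), B = 2000 rather than 10,000, and the interval is the plain, unweighted percentile interval; the reweighting formula that favors smaller $\hat\theta^*$ values is not implemented.

```python
import math
import random

def eigenratio_of_sample(data):
    """MLE eigenratio lambda1 / (lambda1 + lambda2) of the divide-by-n covariance of 2-D data."""
    n = len(data)
    m1 = sum(a for a, _ in data) / n
    m2 = sum(b for _, b in data) / n
    s11 = sum((a - m1) ** 2 for a, _ in data) / n
    s22 = sum((b - m2) ** 2 for _, b in data) / n
    s12 = sum((a - m1) * (b - m2) for a, b in data) / n
    trace = s11 + s22                                  # lambda1 + lambda2
    gap = math.sqrt((s11 - s22) ** 2 + 4 * s12 ** 2)   # lambda1 - lambda2
    return (trace + gap) / (2 * trace)

def draw_bivariate_normal(rng, mu, s11, s12, s22):
    """One draw from N2(mu, Sigma) using the 2x2 Cholesky factor of Sigma."""
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2

def parametric_bootstrap(mu_hat, sigma_hat, n, B=2000, seed=1):
    """B bootstrap eigenratios, each computed from n draws y*_i ~ N2(mu_hat, Sigma_hat)."""
    rng = random.Random(seed)
    s11, s12, s22 = sigma_hat
    replicates = []
    for _ in range(B):
        sample = [draw_bivariate_normal(rng, mu_hat, s11, s12, s22) for _ in range(n)]
        replicates.append(eigenratio_of_sample(sample))
    return replicates

def percentile_interval(values, level=0.90):
    """Plain (unweighted) bootstrap percentile interval, nearest-rank style."""
    v = sorted(values)
    last = len(v) - 1
    return v[round((1 - level) / 2 * last)], v[round((1 + level) / 2 * last)]

# HYPOTHETICAL fitted values, not the student-score MLEs.
mu_hat = (40.0, 50.0)
sigma_hat = (300.0, 100.0, 150.0)   # (s11, s12, s22)
theta_hat = 350.0 / 450.0           # eigenvalues of sigma_hat are 350 and 100

boot = parametric_bootstrap(mu_hat, sigma_hat, n=22)
share_above = sum(b > theta_hat for b in boot) / len(boot)
lo, hi = percentile_interval(boot)
print(f"share above theta_hat: {share_above:.2f}; 90% percentile interval: [{lo:.3f}, {hi:.3f}]")
```

The `share_above` line is the analogue of the slide's 58% figure; a share above one half signals the upward bias that the reweighting is meant to correct.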

[Figure: 10,000 bootstrap eigenratio values from the student score data (bivariate normal model), with the MLE .793 marked; 58% of the bootstrap values exceed .793; red line shows the confidence weights]

Gibbs Sampling (Automatic Bayes Inference)

• Given: prior π(θ), data x, model fθ(x)

• Approximates: π(θ|x) by Markov chain random walk

• “MCMC”, “Metropolis-Hastings”, . . . (A-Bomb?)

• Most often used with convenient “uninformative” priors
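Gibbs sampling proper updates one coordinate at a time from its conditional distribution; with a single parameter, the closest minimal example is its relative, random-walk Metropolis, sketched below. Assumptions: one observation x ~ N(θ, 1), a flat prior, and x = 2.0 chosen for illustration, so the exact posterior is N(2.0, 1) and the chain can be checked against it. Names and tuning values are illustrative.

```python
import math
import random
import statistics

def metropolis(log_post, start, n_steps=20_000, step=1.0, seed=0):
    """Random-walk Metropolis: propose theta' = theta + N(0, step^2) and accept with
    probability min(1, post(theta') / post(theta)); returns every state visited."""
    rng = random.Random(seed)
    theta, lp = start, log_post(start)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        chain.append(theta)
    return chain

x_obs = 2.0  # hypothetical single observation, x ~ N(theta, 1)

def log_posterior(theta):
    """Flat prior times normal likelihood, up to an additive constant (c drops out)."""
    return -0.5 * (theta - x_obs) ** 2

draws = metropolis(log_posterior, start=0.0)[1000:]   # discard burn-in
print(statistics.mean(draws), statistics.stdev(draws))  # exact posterior: mean 2.0, sd 1.0
```

Only ratios of posterior values enter the acceptance step, which is why the normalizing constant c never has to be computed.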

Prostate Cancer Study

(Singh et al., 2002)

• 102 men: 52 prostate cancer, 50 healthy controls

• Each man assessed for activity of 6033 genes

• Statistic $x_i$ measures differences in activity, patients minus controls, for gene i, i = 1, 2, . . . , 6033

• Probability model: $x_i \sim N(\delta_i, 1)$ (normal, mean $\delta_i$, variance 1), with $\delta_i$ the true difference or effect size

[Figure: Prostate Study (Singh et al 2002): histogram of the difference estimates x[i] comparing cancer patients with normal controls, 6033 genes, with a reference curve labeled “if all x[i]=0”; hash marks show the 10 largest x values, and gene 610 (x = 5.29) is marked]

Bayesian Analysis (for one gene)

• Assume δ has prior density π(δ)

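• Probability model: $f_\delta(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-\delta)^2}$

• Marginal density: $m(x) = \int_{-\infty}^{\infty} f_\delta(x)\,\pi(\delta)\,d\delta$ (overall density of x taking account of randomness in δ)

• Bayes posterior expectation (“Tweedie’s formula”): $E\{\delta\mid x\} = x + \frac{d}{dx}\log m(x)$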
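Tweedie's formula can be checked numerically in a case with a known answer. Assuming a normal prior δ ~ N(0, A), the marginal is m(x) = N(0, 1 + A), so d/dx log m(x) = −x/(1 + A) and E{δ|x} = x·A/(1 + A). The sketch below computes m(x) by the trapezoid rule and differentiates log m(x) numerically; A = 1, the integration range, and the function names are illustrative choices.

```python
import math

def normal_pdf(z, mean=0.0, var=1.0):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def marginal_density(x, prior, lo=-12.0, hi=12.0, steps=2400):
    """m(x) = integral of f_delta(x) * pi(delta) d delta, by the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        d = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_pdf(x, mean=d) * prior(d)   # f_delta(x) = N(x; delta, 1)
    return total * h

def tweedie(x, prior, eps=1e-4):
    """E{delta | x} = x + d/dx log m(x), with the derivative by central differences."""
    up = math.log(marginal_density(x + eps, prior))
    down = math.log(marginal_density(x - eps, prior))
    return x + (up - down) / (2 * eps)

A = 1.0  # hypothetical prior variance: delta ~ N(0, A)

def prior(d):
    return normal_pdf(d, var=A)

for x in (0.5, 2.0, 5.29):
    print(x, tweedie(x, prior), x * A / (1 + A))   # numeric vs closed form
```

The formula needs only the marginal m(x), never π(δ) directly, which is what opens the door to the empirical Bayes step that follows.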

Empirical Bayes Analysis

• We don’t know the prior π(δ), but the histogram provides a smooth estimate $\hat m(x)$ for m(x)

• Empirical Bayes estimate: $\hat E\{\delta_i\mid x_i\} = x_i + \frac{d}{dx}\log \hat m(x)\Big|_{x = x_i}$

• Frequentist estimation of a Bayesian inference
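The empirical Bayes step replaces m(x) by an estimate $\hat m(x)$ fitted to all the $x_i$ at once. A sketch on simulated data, not the prostate data: 95% of 6033 hypothetical genes have δ = 0 and 5% have δ = 3. The smoother is a Gaussian kernel density estimate with bandwidth 0.5, an assumption swapped in for the talk's histogram-based estimate; for it, d/dx log $\hat m(x)$ has a closed form because the kernel normalizing constants cancel.

```python
import math
import random

def log_density_slope(x, data, bandwidth):
    """d/dx log m_hat(x) for a Gaussian kernel density estimate m_hat built from data.

    The kernel normalizing constants cancel in m_hat'(x) / m_hat(x)."""
    num = den = 0.0
    for xi in data:
        u = (x - xi) / bandwidth
        k = math.exp(-0.5 * u * u)
        den += k
        num += k * (-u / bandwidth)   # derivative of the kernel with respect to x
    if den == 0.0:
        raise ValueError("x is too far from the data for this bandwidth")
    return num / den

def empirical_bayes_estimate(x, data, bandwidth=0.5):
    """Tweedie's formula with m replaced by the estimate m_hat."""
    return x + log_density_slope(x, data, bandwidth)

# HYPOTHETICAL simulated data, not the prostate study.
rng = random.Random(2002)
n_genes = 6033
deltas = [3.0 if rng.random() < 0.05 else 0.0 for _ in range(n_genes)]
xs = [rng.gauss(d, 1.0) for d in deltas]

for x in (0.0, 2.0, 4.0):
    print(x, round(empirical_bayes_estimate(x, xs), 2))
```

Values of x near 0 are estimated near 0, while a large x is shrunk toward the non-null effect, the same qualitative pattern as the prostate figure.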

[Figure: empirical Bayes estimates of E{δ|x}, the expected true difference δ[i] given the observed difference x[i]; estimates are near 0 for the 93% of genes in [−2, 2]; for x[610] = 5.29 the estimate is 4.07]

Score Sheet

   Bayes                      Frequentist
1. Belief (prior)             Behavior (method)
2. Principled                 Opportunistic
3. One distribution           Many distributions (bootstrap?)
4. Dynamic                    Static
5. Individual (subjective)    Community (objective)
6. Aggressive                 Defensive
