Bayesian and Frequentist Issues in Modern Inference

32
Bayesian and Frequentist Issues in Modern Inference Bradley Efron Stanford University

Transcript of Bayesian and Frequentist Issues in Modern Inference

Page 1: Bayesian and Frequentist Issues in Modern Inference

Bayesian and Frequentist Issues

in Modern Inference

Bradley Efron

Stanford University

Page 2: Bayesian and Frequentist Issues in Modern Inference

Small Data

Classical statistics Direct tests and estimates of individual

parameters within well-defined models (MLE, Neyman–Pearson)

Not much:

I data-based model selection

I Bayesian combination of related problems

Today Methodology (not Philosophy)

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 2 / 32

Page 3: Bayesian and Frequentist Issues in Modern Inference

Bayesian Inference

Parameter: µ ∈ Ω

Observed data: x

Prior: g(µ)

Probability distributions:fµ(x), µ ∈ Ω

Parameter of interest: θ = t(µ)

Eθ|x =

∫Ω

t(µ)fµ(x)g(µ) dµ/∫

Ω

fµ(x)g(µ) dµ

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 3 / 32

Page 4: Bayesian and Frequentist Issues in Modern Inference

Jeffreysonian Bayes Inference“Uninformative Priors”

What if we don’t know prior g?

Jeffreys: g(µ) =∣∣∣I(µ)

∣∣∣1/2 where I(µ) = cov∇µ log fµ(x)

(the Fisher information matrix)

Can still use Bayes theorem but how accurate are the estimates?

Frequentist variability of Et(µ)|x

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 4 / 32

Page 5: Bayesian and Frequentist Issues in Modern Inference

General Accuracy Formula

µ and x ∈ Rp

Vµ = covµ(x)

αx(µ) = ∇x log fµ(x) =(. . . ,

∂ log fµ(x)

∂xi, . . .

)T

LemmaE = E

t(µ)|x

has gradient ∇xE = cov

t(µ), αx(µ)|x

.

TheoremThe delta-method standard deviation of E is

sd(E) =[cov

t(µ), αx(µ)|x

T Vx covt(µ), αx(µ)|x

]1/2.

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 5 / 32

Page 6: Bayesian and Frequentist Issues in Modern Inference

Implementation

Posterior sample from µ|x

µ1, µ2, . . . , µB (MCMC?)

Each µi gives ti = t(µi) and

αi = αx(µi)

E =∑

ti/B Et(µ)|x

cov =∑B

i=1 (αi − α)(ti − t

) /B

sd =[covT Vx cov

]1/2

No additional sampling for cov

−4 −2 0 2

02

46

810

12

(mu[i], alpha[i], t[i])

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 6 / 32

Page 7: Bayesian and Frequentist Issues in Modern Inference

Diabetes DataEfron et al. (2004), “LARS”

n = 442 subjects

p = 10 predictors: age, sex, bmi, glu,. . .

Response: y = disease progression at one year

Model: yn×1

= Xn×p

βp×1

+ en×1

[e ∼ Nn(0, I)]

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 7 / 32

Page 8: Bayesian and Frequentist Issues in Modern Inference

Bayesian LassoPark and Casella (2008)

Model: y ∼ Nn(Xβ, I)

Prior: g(β) = e−γL1(β) [γ = 0.37]

Then posterior mode at Lasso βγ

Subject 125: θ125 = xT125β

How accurate are Bayes posterior inferences for θ125?

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 8 / 32

Page 9: Bayesian and Frequentist Issues in Modern Inference

Bayesian Analysis

MCMC: posterior sampleβi for i = 1,2, . . . ,10,000

Gives

θ125,i = xT

125βi , i = 1,2, . . . ,10,000

θ125,i ∼ 0.248 ± 0.072

General accuracy formula frequentist sd 0.071 for E = 0.248[αx(µ) = XT Xβ

]

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 9 / 32

Page 10: Bayesian and Frequentist Issues in Modern Inference

Posterior CDF for Subject 125

cdfy(c) = Prθ125 ≤ c |y

si =

1 as ti ≤ c

0 as ti > c

cdfy(c) =∑B

1 si/B

For c = 0.3:

cdfy(c)= 0.762 ± 0.304

Bayes frequentistestimate sd

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 10 / 32

Page 11: Bayesian and Frequentist Issues in Modern Inference

0.1 0.2 0.3 0.4

0.0

0.2

0.4

0.6

0.8

1.0

Posterior cdf for mu125, Diabetes data, 10000 MCMC draws,Prior exp−.37*L1(beta); verts are +− One Frequentist Standard Dev

Upper 95% credible limit is .342 +− .069c value

Pro

bm

u125

< c

| da

ta

][

.342

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 11 / 32

Page 12: Bayesian and Frequentist Issues in Modern Inference

Exponential Families

fα(β)

= eαT β−ψ(α)f0

(β), with α, β in Rp

Natural parameter α, sufficient statistic β, expectation β = Eαβ[

Poisson : fµ(x) = e−µµx/x! : x = β, µ = β, α = log(µ)]

General accuracy formula For E = Et(β)|β

,

sd(E) =cov

(t , α

∣∣∣β)TVα cov

(t , α

∣∣∣β)1/2

with Vα = covα=α

(β).

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 12 / 32

Page 13: Bayesian and Frequentist Issues in Modern Inference

Better Frequentist Inferences

For E = Et(β)

∣∣∣β in exfam fα(β)

= eαT β−ψ(α)f0

(β)

Parametric bootstrap fα(·)→[β∗1, β

2, . . . , β∗

j , . . . , β∗

J

]−→

[· · ·E∗j = E

t(β)

∣∣∣∣β∗j · · · ] −→ bootstrap conf int for E

Trouble Need new MCMC sample for each β∗j

Shortcut Reweight original MCMC sample (importance

sampling)

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 13 / 32

Page 14: Bayesian and Frequentist Issues in Modern Inference

Digression: Posterior Exponential Family

fα(β)

= eαT β−ψ(α)f0

(β): natural param α, suff stat β, “carrier” f0

(β)

Posterior exponential family

g(α|suff stat b) = e(b−β)Tα−φ(b)g

(α∣∣∣β)

natural param b, suff stat α, carrier g(α∣∣∣β)

Importance sampling Reweight the original MCMC realizations:

Eti |b =

B∑i=1

tiWi(b)

/ B∑i=1

Wi(b)[Wi(b) = e(b−β)

Tαi

]

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 14 / 32

Page 15: Bayesian and Frequentist Issues in Modern Inference

Bootstrap Intervals Without BootstrappingDiCiccio and Efron (1992)

“abc” Investigate Et |b for b near β

Requires p + 2 numerical 2nd derivatives of E function

Next: Applied to posterior cdf for mu125

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 15 / 32

Page 16: Bayesian and Frequentist Issues in Modern Inference

0.1 0.2 0.3 0.4

0.0

0.2

0.4

0.6

0.8

1.0

Heavy curve is posterior cdf for mu125, diabetes data.Vertical bars frequentist central 68% abc confidence intervals.

(Light lines show +− one frequentist standard error)

c value

Pro

bm

u125

< c

| da

ta

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 16 / 32

Page 17: Bayesian and Frequentist Issues in Modern Inference

Estimation After Model Selection

Usually:

(a) look at data

(b) choose model (linear, quad, cubic . . . ?)

(c) fit estimates using chosen model

(d) analyze as if pre-chosen

Today Include model selection process in the analysis

Question Effects on standard errors, confidence intervals, etc.?

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 17 / 32

Page 18: Bayesian and Frequentist Issues in Modern Inference

Cholesterol Data

n = 164 men took Cholestyramine for ∼ 7 years

x = compliance measure (adjusted: x ∼ N(0,1))

y = cholesterol decrease

Wish to estimate regression values

µj = Ey |x = xj for j = 1,2, . . . ,164

µ = (µ1, µ2, . . . , µ164)T

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 18 / 32

Page 19: Bayesian and Frequentist Issues in Modern Inference

−2 −1 0 1 2

050

100

Cholesterol data, n=164 subjects: cholesterol decrease plottedversus adjusted compliance; Green curve is OLS cubic regression;

Red points indicate 5 featured subjects

compliance

chol

este

rol d

ecre

ase

1

2

3

4

5

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 19 / 32

Page 20: Bayesian and Frequentist Issues in Modern Inference

Cp Selection Criterion

Regression model yn×1

= Xn×m

βm×1

+ en×1

[ei ∼ (0, σ2)

]Cp criterion

∥∥∥y − X β∥∥∥2

+ 2mσ2

β = OLS estimate, m = “degrees of freedom”

Model selection From possible models X1,X2,X3, . . .

choose the one minimizing Cp.

Then use OLS estimate from chosen model.

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 20 / 32

Page 21: Bayesian and Frequentist Issues in Modern Inference

Cp for Cholesterol Data

Model df Cp − 80000 (Boot %)

M1 (linear) 2 1132 (19%)

M2 (quad) 3 1412 (12%)

M3 (cubic) 4 667 (34%)

M4 (quartic) 5 1591 (8%)

M5 (quintic) 6 1811 (21%)

M6 (sextic) 7 2758 (6%)

(σ = 22 from “full model”M6)

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 21 / 32

Page 22: Bayesian and Frequentist Issues in Modern Inference

Nonparametric Bootstrap Analysis

data = (xi , yi), i = 1,2, . . . ,n = 164 gave original estimate

µ = X3β3

Bootstrap data set: data∗ =(xj , yj)

∗, j = 1,2, . . . ,n

where

(xj , yj)∗ drawn randomly and with replacement from data:

data∗ −→Cp

m∗ −→OLS

β∗m∗ −→ µ∗ = Xm∗ β∗

m∗

I did this all B = 4000 times.

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 22 / 32

Page 23: Bayesian and Frequentist Issues in Modern Inference

B=4000 nonparametric bootstrap replications for the model−selectedregression estimate of Subject 1; boot (m,stdev)=(−2.63,8.02);

76% of the replications less than original estimate 2.71

Red triangles are 2.5th and 97.5th boot percentilesbootstrap estimates for subject 1

Fre

quen

cy

−30 −20 −10 0 10 20

050

100

150

200

250

^ ^2.71

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 23 / 32

Page 24: Bayesian and Frequentist Issues in Modern Inference

−3 −2 −1 0 1 2 3

−3

−2

−1

01

23

Smooth Estimation Model: ' y ' is observed data; Ellipses indicate bootstrap distribution for ' y* ';

Red curves level surfaces of equal estimation for thetahat=t(y)

thetahat=t(y)

y

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 24 / 32

Page 25: Bayesian and Frequentist Issues in Modern Inference

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 25 / 32

Page 26: Bayesian and Frequentist Issues in Modern Inference

1 2 3 4 5 6

−40

−30

−20

−10

010

2030

Boxplot of Cp boot estimates for Subject 1; B=4000 bootreps;Red bars indicate selection proportions for Models 1−6

only 1/3 of the bootstrap replications chose Model 3selected model

subj

ect 1

est

imat

es

Model3

*********

MODEL 3

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 26 / 32

Page 27: Bayesian and Frequentist Issues in Modern Inference

Bootstrap Smoothing

Idea Replace original estimator t(y) with bootstrap average

s(y) =

B∑i=1

t(y∗i

) /B

Model averaging

Same as bagging (“bootstrap aggregation,” Breiman)

Removes discontinuities, reduces variance

Approximate confidence interval: s(y) ± 1.96 · sd

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 27 / 32

Page 28: Bayesian and Frequentist Issues in Modern Inference

Accuracy Theorem

Notation s0 = s(y), t ∗i = t(y∗i ), i = 1,2, . . .B

Y ∗ij = # of times jth data point appears in ith boot sample

covj =∑B

i=1 Y ∗ij ·(t ∗i − s0

) /B

[covariance Y ∗ij with t ∗i

]TheoremThe delta method standard deviation estimate for s0 is

sd =

n∑j=1

cov2j

1/2

,

always ≤

B∑i=1

(t ∗i − s0

)2 /B

1/2

, the boot stdev for t(y).

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 28 / 32

Page 29: Bayesian and Frequentist Issues in Modern Inference

Projection Interpretation

0

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 29 / 32

Page 30: Bayesian and Frequentist Issues in Modern Inference

1 2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

1.2

Standard Deviation of smoothed estimate relative to original (Red)for five subjects; green line is stdev Naive Cubic Model

bottom numbers show original standard deviationsSubject number

Rel

ativ

e st

dev

7.9 3.9 4.1 4.7 6.8

*

* *

*

*

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 30 / 32

Page 31: Bayesian and Frequentist Issues in Modern Inference

Model Probability Estimates

34% of the 4000 bootreps chose the cubic model

Poor man’s Bayes posterior prob for “cubic”

How accurate is that 34%?

Apply accuracy theorem to indicator function for choosing “cubic”

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 31 / 32

Page 32: Bayesian and Frequentist Issues in Modern Inference

Model Boot % ± Standard Error

M1 (linear) 19% ±24

M2 (quad) 12% ±18

M3 (cubic) 34% ±24

M4 (quartic) 8% ±14

M5 (quintic) 21% ±27

M6 (sextic) 6% ±6

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 32 / 32