Bayesian and Frequentist Issues in Modern Inference

Post on 12-Sep-2021

Bayesian and Frequentist Issues

in Modern Inference

Bradley Efron

Stanford University

Small Data

Classical statistics: Direct tests and estimates of individual parameters within well-defined models (MLE, Neyman–Pearson)

Not much:

• data-based model selection

• Bayesian combination of related problems

Today: Methodology (not Philosophy)

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 2 / 32

Bayesian Inference

Parameter: µ ∈ Ω

Observed data: x

Prior: g(µ)

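Probability distributions: $\{f_\mu(x),\ \mu \in \Omega\}$

Parameter of interest: $\theta = t(\mu)$

$$E\{\theta \mid x\} = \frac{\int_\Omega t(\mu)\, f_\mu(x)\, g(\mu)\, d\mu}{\int_\Omega f_\mu(x)\, g(\mu)\, d\mu}$$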
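The ratio of integrals can be checked numerically in a case where the answer is known. The sketch below assumes a scalar µ, a likelihood x ~ N(µ, 1), and a prior µ ~ N(0, A); the model, the grid, and the function name `posterior_expectation` are illustrative choices, not part of the talk.

```python
import numpy as np

def posterior_expectation(t, likelihood, prior, grid):
    """Approximate E{t(mu) | x} as a ratio of two sums on an evenly spaced grid."""
    w = likelihood(grid) * prior(grid)       # f_mu(x) g(mu), the unnormalized posterior
    return np.sum(t(grid) * w) / np.sum(w)   # the grid spacing cancels in the ratio

# Illustrative assumption: x ~ N(mu, 1) with prior mu ~ N(0, A)
x, A = 1.5, 2.0
grid = np.linspace(-15.0, 15.0, 20001)
likelihood = lambda mu: np.exp(-0.5 * (x - mu) ** 2)
prior = lambda mu: np.exp(-0.5 * mu ** 2 / A)

post_mean = posterior_expectation(lambda mu: mu, likelihood, prior, grid)
exact_mean = A / (A + 1.0) * x   # conjugate-normal posterior mean, for comparison
```

Normalizing constants of the likelihood and the prior are omitted on purpose, since they cancel between numerator and denominator.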


Jeffreysonian Bayes Inference: “Uninformative Priors”

What if we don’t know prior g?

Jeffreys: $g(\mu) = |I(\mu)|^{1/2}$, where $I(\mu) = \operatorname{cov}\{\nabla_\mu \log f_\mu(x)\}$ (the Fisher information matrix)

Can still use Bayes theorem but how accurate are the estimates?

Frequentist variability of $E\{t(\mu) \mid x\}$
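As a worked instance of the Jeffreys formula, consider the one-parameter Poisson family. The choice of family is an illustration added here; the slide itself does not work an example.

```latex
\begin{align*}
\log f_\mu(x) &= -\mu + x \log \mu - \log x!, \\
\frac{\partial}{\partial \mu} \log f_\mu(x) &= \frac{x}{\mu} - 1, \\
I(\mu) &= \operatorname{var}_\mu\!\left(\frac{x}{\mu} - 1\right) = \frac{\mu}{\mu^2} = \frac{1}{\mu}, \\
g(\mu) &= |I(\mu)|^{1/2} = \mu^{-1/2}.
\end{align*}
```

This prior is improper, but the resulting posterior, proportional to $\mu^{x - 1/2} e^{-\mu}$, is a proper Gamma distribution with shape $x + 1/2$ and rate 1.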


General Accuracy Formula

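$\mu$ and $x \in \mathbb{R}^p$

$V_\mu = \operatorname{cov}_\mu(x)$

$\alpha_x(\mu) = \nabla_x \log f_\mu(x) = \left(\ldots, \frac{\partial \log f_\mu(x)}{\partial x_i}, \ldots\right)^T$

Lemma: $\hat{E} = E\{t(\mu) \mid x\}$ has gradient $\nabla_x \hat{E} = \operatorname{cov}\{t(\mu), \alpha_x(\mu) \mid x\}$.

Theorem: The delta-method standard deviation of $\hat{E}$ is

$$\operatorname{sd}(\hat{E}) = \left[\operatorname{cov}\{t(\mu), \alpha_x(\mu) \mid x\}^T\, V_x\, \operatorname{cov}\{t(\mu), \alpha_x(\mu) \mid x\}\right]^{1/2}.$$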


Implementation

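Posterior sample from $\mu \mid x$: $\mu_1, \mu_2, \ldots, \mu_B$ (MCMC?)

Each $\mu_i$ gives $t_i = t(\mu_i)$ and $\alpha_i = \alpha_x(\mu_i)$

$\hat{E} = \sum_{i=1}^{B} t_i / B \approx E\{t(\mu) \mid x\}$

$\widehat{\operatorname{cov}} = \sum_{i=1}^{B} (\alpha_i - \bar{\alpha})(t_i - \bar{t}) / B$

$\widehat{\operatorname{sd}} = \left[\widehat{\operatorname{cov}}^T\, V_x\, \widehat{\operatorname{cov}}\right]^{1/2}$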

No additional sampling for cov
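The recipe on this slide can be sketched in a few lines. To make the result checkable, the example assumes a conjugate model, x ~ N_p(µ, I) with prior µ ~ N_p(0, A·I), in which posterior draws are available directly and the frequentist sd of the posterior mean of c'µ is known exactly. The model, the vector `c`, and the function name `general_accuracy_sd` are assumptions made for illustration.

```python
import numpy as np

def general_accuracy_sd(t_vals, alpha_vals, V):
    """Delta-method frequentist sd of the posterior mean of t, from posterior draws.

    t_vals: (B,) values t(mu_i); alpha_vals: (B, p) values alpha_x(mu_i);
    V: (p, p) covariance matrix of x.
    """
    t_c = t_vals - t_vals.mean()
    a_c = alpha_vals - alpha_vals.mean(axis=0)
    cov_hat = a_c.T @ t_c / len(t_vals)           # posterior cov of t with each alpha component
    return float(np.sqrt(cov_hat @ V @ cov_hat))  # [cov' V cov]^(1/2)

rng = np.random.default_rng(0)
p, A = 3, 2.0
shrink = A / (A + 1.0)                  # posterior is N(shrink * x, shrink * I)
x = np.array([1.0, -0.5, 2.0])
c = np.array([1.0, 2.0, -1.0])          # parameter of interest t(mu) = c' mu
B = 200_000

mu = shrink * x + np.sqrt(shrink) * rng.standard_normal((B, p))  # exact posterior draws
t_vals = mu @ c
alpha_vals = mu - x                     # gradient of log f_mu(x) with respect to x

post_mean = t_vals.mean()
sd_hat = general_accuracy_sd(t_vals, alpha_vals, np.eye(p))
sd_exact = shrink * np.linalg.norm(c)   # sd of shrink * c'x when cov(x) = I
```

In a real application the rows of `mu` would come from an MCMC sampler; nothing beyond the existing draws is needed to compute the covariance vector.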

[Figure: plot of the posterior draws, labeled (mu[i], alpha[i], t[i]).]


Diabetes Data: Efron et al. (2004), “LARS”

n = 442 subjects

p = 10 predictors: age, sex, bmi, glu, …

Response: y = disease progression at one year

Model: $y_{n \times 1} = X_{n \times p}\, \beta_{p \times 1} + e_{n \times 1}$, with $e \sim N_n(0, I)$


Bayesian Lasso: Park and Casella (2008)

Model: $y \sim N_n(X\beta, I)$

Prior: $g(\beta) = e^{-\gamma L_1(\beta)}$ [$\gamma = 0.37$]

Then posterior mode at Lasso $\hat{\beta}_\gamma$

Subject 125: $\theta_{125} = x_{125}^T \beta$

How accurate are Bayes posterior inferences for θ125?


Bayesian Analysis

MCMC: posterior sample $\{\beta_i,\ i = 1, 2, \ldots, 10{,}000\}$

Gives $\{\theta_{125,i} = x_{125}^T \beta_i,\ i = 1, 2, \ldots, 10{,}000\}$

$\theta_{125,i} \sim 0.248 \pm 0.072$

General accuracy formula: frequentist sd 0.071 for $\hat{E} = 0.248$ [$\alpha_x(\mu) = X^T X \beta$]


Posterior CDF for Subject 125

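$\operatorname{cdf}_y(c) = \Pr\{\theta_{125} \le c \mid y\}$

$s_i = 1$ if $t_i \le c$, and $s_i = 0$ if $t_i > c$

$\widehat{\operatorname{cdf}}_y(c) = \sum_{i=1}^{B} s_i / B$

For c = 0.3: $\widehat{\operatorname{cdf}}_y(c) = 0.762 \pm 0.304$ (Bayes estimate ± frequentist sd)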


[Figure: Posterior cdf for mu125, diabetes data, 10,000 MCMC draws, prior exp{−0.37·L1(beta)}; vertical bars are ± one frequentist standard deviation. Upper 95% credible limit is 0.342 ± 0.069. Horizontal axis: c value; vertical axis: Prob{mu125 < c | data}.]


Exponential Families

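$f_\alpha(\hat{\beta}) = e^{\alpha^T \hat{\beta} - \psi(\alpha)} f_0(\hat{\beta})$, with $\alpha, \hat{\beta}$ in $\mathbb{R}^p$

Natural parameter $\alpha$, sufficient statistic $\hat{\beta}$, expectation $\beta = E_\alpha\{\hat{\beta}\}$

[Poisson: $f_\mu(x) = e^{-\mu} \mu^x / x!$, with $x = \hat{\beta}$, $\mu = \beta$, $\alpha = \log(\mu)$]

General accuracy formula: For $\hat{E} = E\{t(\beta) \mid \hat{\beta}\}$,

$$\operatorname{sd}(\hat{E}) = \left[\operatorname{cov}(t, \alpha \mid \hat{\beta})^T\, V_{\hat{\alpha}}\, \operatorname{cov}(t, \alpha \mid \hat{\beta})\right]^{1/2}$$

with $V_{\hat{\alpha}} = \operatorname{cov}_{\alpha = \hat{\alpha}}(\hat{\beta})$.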


Better Frequentist Inferences

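For $\hat{E} = E\{t(\beta) \mid \hat{\beta}\}$ in exfam $f_\alpha(\hat{\beta}) = e^{\alpha^T \hat{\beta} - \psi(\alpha)} f_0(\hat{\beta})$

Parametric bootstrap: $f_{\hat{\alpha}}(\cdot) \to [\hat{\beta}^*_1, \hat{\beta}^*_2, \ldots, \hat{\beta}^*_j, \ldots, \hat{\beta}^*_J] \to [\cdots \hat{E}^*_j = E\{t(\beta) \mid \hat{\beta}^*_j\} \cdots] \to$ bootstrap conf int for $\hat{E}$

Trouble: Need new MCMC sample for each $\hat{\beta}^*_j$

Shortcut: Reweight original MCMC sample (importance sampling)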


Digression: Posterior Exponential Family

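$f_\alpha(\hat{\beta}) = e^{\alpha^T \hat{\beta} - \psi(\alpha)} f_0(\hat{\beta})$: natural param $\alpha$, suff stat $\hat{\beta}$, “carrier” $f_0(\hat{\beta})$

Posterior exponential family:

$$g(\alpha \mid \text{suff stat } b) = e^{(b - \hat{\beta})^T \alpha - \phi(b)}\, g(\alpha \mid \hat{\beta})$$

natural param $b$, suff stat $\alpha$, carrier $g(\alpha \mid \hat{\beta})$

Importance sampling: Reweight the original MCMC realizations:

$$\hat{E}\{t \mid b\} = \sum_{i=1}^{B} t_i W_i(b) \Big/ \sum_{i=1}^{B} W_i(b), \qquad W_i(b) = e^{(b - \hat{\beta})^T \alpha_i}$$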
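The reweighting step can be sketched as follows. The example assumes a one-dimensional normal model, β̂ ~ N(α, 1) with prior α ~ N(0, A), chosen only because the posterior at any b is then known in closed form; the function name `reweighted_expectation` and all numerical values are hypothetical.

```python
import numpy as np

def reweighted_expectation(t_vals, alpha_vals, b, beta_hat):
    """Estimate E{t | b} by reweighting draws from the posterior given beta_hat.

    Weights are W_i(b) = exp((b - beta_hat)' alpha_i); subtracting the maximum
    log-weight only rescales them and guards against overflow.
    """
    log_w = alpha_vals @ (b - beta_hat)
    w = np.exp(log_w - log_w.max())
    return float(np.sum(w * t_vals) / np.sum(w))

rng = np.random.default_rng(1)
A = 2.0
shrink = A / (A + 1.0)
beta_hat = np.array([1.0])

# Stand-in for an MCMC sample: exact draws from alpha | beta_hat ~ N(shrink * beta_hat, shrink)
alpha = shrink * beta_hat + np.sqrt(shrink) * rng.standard_normal((100_000, 1))
t_vals = alpha[:, 0]                     # parameter of interest t = alpha

b = np.array([1.6])                      # a hypothetical bootstrap replication of beta_hat
est = reweighted_expectation(t_vals, alpha, b, beta_hat)
exact = shrink * b[0]                    # posterior mean had b been observed
```

The weights become very uneven when b is far from β̂, so in practice the reweighted estimate is most trustworthy for b near the observed value.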


Bootstrap Intervals Without Bootstrapping: DiCiccio and Efron (1992)

“abc”: Investigate $\hat{E}\{t \mid b\}$ for $b$ near $\hat{\beta}$

Requires p + 2 numerical 2nd derivatives of E function

Next: Applied to posterior cdf for mu125


[Figure: Heavy curve is the posterior cdf for mu125, diabetes data. Vertical bars are frequentist central 68% abc confidence intervals. (Light lines show ± one frequentist standard error.) Horizontal axis: c value; vertical axis: Prob{mu125 < c | data}.]


Estimation After Model Selection

Usually:

(a) look at data

(b) choose model (linear, quad, cubic . . . ?)

(c) fit estimates using chosen model

(d) analyze as if pre-chosen

Today: Include model selection process in the analysis

Question: Effects on standard errors, confidence intervals, etc.?


Cholesterol Data

n = 164 men took Cholestyramine for ∼ 7 years

x = compliance measure (adjusted: x ∼ N(0,1))

y = cholesterol decrease

Wish to estimate regression values

$\mu_j = E\{y \mid x = x_j\}$ for $j = 1, 2, \ldots, 164$

$\mu = (\mu_1, \mu_2, \ldots, \mu_{164})^T$


[Figure: Cholesterol data, n = 164 subjects: cholesterol decrease plotted versus adjusted compliance. Green curve is OLS cubic regression; red points indicate 5 featured subjects, numbered 1 to 5. Horizontal axis: compliance; vertical axis: cholesterol decrease.]


Cp Selection Criterion

Regression model: $y_{n \times 1} = X_{n \times m}\, \beta_{m \times 1} + e_{n \times 1}$, with $e_i \sim (0, \sigma^2)$

Cp criterion: $\|y - X\hat{\beta}\|^2 + 2m\sigma^2$

$\hat{\beta}$ = OLS estimate, m = “degrees of freedom”

Model selection: From possible models $X_1, X_2, X_3, \ldots$ choose the one minimizing Cp.

Then use OLS estimate from chosen model.
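A minimal sketch of Cp selection among polynomial regressions is given below. It uses synthetic data, not the cholesterol data, and it estimates σ² from the largest model, which is an assumption modeled on the later note that σ came from the “full model”. The function name `cp_select` is hypothetical.

```python
import numpy as np

def cp_select(x, y, max_degree=6):
    """Choose a polynomial degree by minimizing Cp = RSS + 2 * m * sigma^2.

    Returns (chosen degree, Cp values for degrees 1..max_degree, fitted values
    from the chosen model).
    """
    n = len(y)
    fits = []
    for d in range(1, max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)      # columns 1, x, ..., x^d, so m = d + 1
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimate
        fitted = X @ beta
        fits.append((X.shape[1], float(np.sum((y - fitted) ** 2)), fitted))
    m_full, rss_full, _ = fits[-1]
    sigma2 = rss_full / (n - m_full)                  # noise variance from the full model
    cp = np.array([rss + 2 * m * sigma2 for m, rss, _ in fits])
    best = int(np.argmin(cp))
    return best + 1, cp, fits[best][2]

# Synthetic example: a cubic signal with small noise
rng = np.random.default_rng(2)
x = np.linspace(-2.0, 2.0, 100)
y = x ** 3 - 2.0 * x + 0.1 * rng.standard_normal(100)
degree, cp, fitted = cp_select(x, y)
```

Because the penalty is only 2σ² per extra coefficient, Cp rules out the underfitting linear and quadratic models here but may still pick a degree above 3.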


Cp for Cholesterol Data

Model           df    Cp − 80000    (Boot %)
M1 (linear)      2       1132        (19%)
M2 (quad)        3       1412        (12%)
M3 (cubic)       4        667        (34%)
M4 (quartic)     5       1591         (8%)
M5 (quintic)     6       1811        (21%)
M6 (sextic)      7       2758         (6%)

(σ = 22 from “full model” M6)


Nonparametric Bootstrap Analysis

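data = $\{(x_i, y_i),\ i = 1, 2, \ldots, n = 164\}$ gave original estimate $\hat{\mu} = X_3 \hat{\beta}_3$

Bootstrap data set: data* = $\{(x_j, y_j)^*,\ j = 1, 2, \ldots, n\}$, where $(x_j, y_j)^*$ are drawn randomly and with replacement from data:

$$\text{data}^* \xrightarrow{\ C_p\ } m^* \xrightarrow{\ \text{OLS}\ } \hat{\beta}^*_{m^*} \longrightarrow \hat{\mu}^* = X_{m^*} \hat{\beta}^*_{m^*}$$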

I did this all B = 4000 times.
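The bootstrap pipeline above (resample pairs, select by Cp, refit by OLS, predict) might be sketched as follows. The data are synthetic stand-ins for the cholesterol data, B is reduced for speed, and holding σ² fixed at its original full-model value across replications is an assumption of this sketch. The function names are hypothetical.

```python
import numpy as np

def cp_fit_predict(x, y, x_eval, sigma2, max_degree=6):
    """Pick the polynomial degree minimizing Cp, then return (degree, OLS predictions at x_eval)."""
    best_cp, best_d, best_pred = np.inf, None, None
    for d in range(1, max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        cp = np.sum((y - X @ beta) ** 2) + 2 * (d + 1) * sigma2
        if cp < best_cp:
            best_cp, best_d = cp, d
            best_pred = np.vander(x_eval, d + 1, increasing=True) @ beta
    return best_d, best_pred

def bootstrap_model_selection(x, y, B=400, max_degree=6, seed=0):
    """Nonparametric bootstrap of the Cp-selected regression estimate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X_full = np.vander(x, max_degree + 1, increasing=True)
    beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    sigma2 = np.sum((y - X_full @ beta_full) ** 2) / (n - max_degree - 1)
    degrees = np.empty(B, dtype=int)
    mu_star = np.empty((B, n))
    for i in range(B):
        idx = rng.integers(0, n, size=n)   # draw n (x, y) pairs with replacement
        degrees[i], mu_star[i] = cp_fit_predict(x[idx], y[idx], x, sigma2, max_degree)
    proportions = np.bincount(degrees, minlength=max_degree + 1)[1:] / B
    return degrees, mu_star, proportions

# Synthetic stand-in data with n = 164
rng = np.random.default_rng(3)
n = 164
x = rng.standard_normal(n)
y = 30.0 + 25.0 * x + 3.0 * x ** 3 + 22.0 * rng.standard_normal(n)
degrees, mu_star, proportions = bootstrap_model_selection(x, y)
```

Column j of `mu_star` holds the bootstrap replications of the model-selected estimate for subject j, and `proportions` plays the role of the “Boot %” column in the Cp table.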


[Figure: Histogram of B = 4000 nonparametric bootstrap replications for the model-selected regression estimate of Subject 1; boot (m, stdev) = (−2.63, 8.02); 76% of the replications are less than the original estimate 2.71. Red triangles are the 2.5th and 97.5th boot percentiles. Horizontal axis: bootstrap estimates for subject 1; vertical axis: Frequency.]


[Figure: Smooth Estimation Model: 'y' is observed data; ellipses indicate the bootstrap distribution for 'y*'; red curves are level surfaces of equal estimation for thetahat = t(y).]


[Figure: Boxplot of Cp boot estimates for Subject 1; B = 4000 bootreps. Red bars indicate selection proportions for Models 1−6; only 1/3 of the bootstrap replications chose Model 3. Horizontal axis: selected model; vertical axis: subject 1 estimates.]


Bootstrap Smoothing

Idea: Replace original estimator t(y) with bootstrap average

$$s(y) = \sum_{i=1}^{B} t(y^*_i) / B$$

Model averaging

Same as bagging (“bootstrap aggregation,” Breiman)

Removes discontinuities, reduces variance

Approximate confidence interval: s(y) ± 1.96 · sd


Accuracy Theorem

Notation: $s_0 = s(y)$, $t^*_i = t(y^*_i)$, $i = 1, 2, \ldots, B$

$Y^*_{ij}$ = # of times jth data point appears in ith boot sample

$\operatorname{cov}_j = \sum_{i=1}^{B} Y^*_{ij} \cdot (t^*_i - s_0) / B$ [covariance of $Y^*_{ij}$ with $t^*_i$]

Theorem: The delta method standard deviation estimate for $s_0$ is

$$\operatorname{sd} = \left[\sum_{j=1}^{n} \operatorname{cov}_j^2\right]^{1/2},$$

always $\le \left[\sum_{i=1}^{B} (t^*_i - s_0)^2 / B\right]^{1/2}$, the boot stdev for t(y).
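The smoothed estimate and its delta-method standard deviation can both be computed from a single set of bootstrap replications. In the sketch below the estimator `pretest_mean` is a hypothetical discontinuous rule invented for illustration, the data are synthetic, and the function name `smoothed_estimate` is an assumption.

```python
import numpy as np

def smoothed_estimate(data, t, B=2000, seed=0):
    """Return (s0, delta-method sd of s0, ordinary bootstrap sd of t) from B replications."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # counts[i, j] = Y*_ij, the number of times data point j appears in bootstrap sample i
    counts = rng.multinomial(n, np.full(n, 1.0 / n), size=B)
    t_star = np.array([t(np.repeat(data, c, axis=0)) for c in counts])
    s0 = t_star.mean()                                  # bootstrap-smoothed estimate
    cov_j = counts.T @ (t_star - s0) / B                # covariance of Y*_ij with t*_i
    sd_smooth = float(np.sqrt(np.sum(cov_j ** 2)))      # accuracy theorem
    sd_boot = float(np.sqrt(np.mean((t_star - s0) ** 2)))
    return float(s0), sd_smooth, sd_boot

def pretest_mean(sample):
    """Hypothetical discontinuous estimator: report the mean only if it looks nonzero."""
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    return m if abs(m) > 1.96 * se else 0.0

rng = np.random.default_rng(4)
data = 0.3 + rng.standard_normal(40)
s0, sd_smooth, sd_boot = smoothed_estimate(data, pretest_mean)
```

With a finite number of replications, Monte Carlo noise adds a small upward bias to `sd_smooth`, so the inequality in the theorem is best read as a statement about large B. Passing an estimator that returns 0 or 1, such as an indicator that a particular model was selected, gives a standard error for a bootstrap selection proportion in the same way.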


Projection Interpretation


[Figure: Standard deviation of the smoothed estimate relative to the original (red) for five subjects; green line is the stdev of the naive cubic model. Bottom numbers show the original standard deviations: 7.9, 3.9, 4.1, 4.7, 6.8 for subjects 1 to 5. Horizontal axis: Subject number; vertical axis: Relative stdev.]


Model Probability Estimates

34% of the 4000 bootreps chose the cubic model

Poor man’s Bayes posterior prob for “cubic”

How accurate is that 34%?

Apply accuracy theorem to indicator function for choosing “cubic”


Model           Boot %    ± Standard Error
M1 (linear)      19%        ±24
M2 (quad)        12%        ±18
M3 (cubic)       34%        ±24
M4 (quartic)      8%        ±14
M5 (quintic)     21%        ±27
M6 (sextic)       6%        ±6
