Bayesian and Frequentist Issues in Modern Inference
Transcript of Bayesian and Frequentist Issues in Modern Inference
Bayesian and Frequentist Issues
in Modern Inference
Bradley Efron
Stanford University
Small Data
Classical statistics Direct tests and estimates of individual
parameters within well-defined models (MLE, Neyman–Pearson)
Not much:
I data-based model selection
I Bayesian combination of related problems
Today Methodology (not Philosophy)
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 2 / 32
Bayesian Inference
Parameter: µ ∈ Ω
Observed data: x
Prior: g(µ)
Probability distributions:fµ(x), µ ∈ Ω
Parameter of interest: θ = t(µ)
Eθ|x =
∫Ω
t(µ)fµ(x)g(µ) dµ/∫
Ω
fµ(x)g(µ) dµ
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 3 / 32
Jeffreysonian Bayes Inference“Uninformative Priors”
What if we don’t know prior g?
Jeffreys: g(µ) =∣∣∣I(µ)
∣∣∣1/2 where I(µ) = cov∇µ log fµ(x)
(the Fisher information matrix)
Can still use Bayes theorem but how accurate are the estimates?
Frequentist variability of Et(µ)|x
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 4 / 32
General Accuracy Formula
µ and x ∈ Rp
Vµ = covµ(x)
αx(µ) = ∇x log fµ(x) =(. . . ,
∂ log fµ(x)
∂xi, . . .
)T
LemmaE = E
t(µ)|x
has gradient ∇xE = cov
t(µ), αx(µ)|x
.
TheoremThe delta-method standard deviation of E is
sd(E) =[cov
t(µ), αx(µ)|x
T Vx covt(µ), αx(µ)|x
]1/2.
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 5 / 32
Implementation
Posterior sample from µ|x
µ1, µ2, . . . , µB (MCMC?)
Each µi gives ti = t(µi) and
αi = αx(µi)
E =∑
ti/B Et(µ)|x
cov =∑B
i=1 (αi − α)(ti − t
) /B
sd =[covT Vx cov
]1/2
No additional sampling for cov
−4 −2 0 2
02
46
810
12
(mu[i], alpha[i], t[i])
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 6 / 32
Diabetes DataEfron et al. (2004), “LARS”
n = 442 subjects
p = 10 predictors: age, sex, bmi, glu,. . .
Response: y = disease progression at one year
Model: yn×1
= Xn×p
βp×1
+ en×1
[e ∼ Nn(0, I)]
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 7 / 32
Bayesian LassoPark and Casella (2008)
Model: y ∼ Nn(Xβ, I)
Prior: g(β) = e−γL1(β) [γ = 0.37]
Then posterior mode at Lasso βγ
Subject 125: θ125 = xT125β
How accurate are Bayes posterior inferences for θ125?
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 8 / 32
Bayesian Analysis
MCMC: posterior sampleβi for i = 1,2, . . . ,10,000
Gives
θ125,i = xT
125βi , i = 1,2, . . . ,10,000
θ125,i ∼ 0.248 ± 0.072
General accuracy formula frequentist sd 0.071 for E = 0.248[αx(µ) = XT Xβ
]
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 9 / 32
Posterior CDF for Subject 125
cdfy(c) = Prθ125 ≤ c |y
si =
1 as ti ≤ c
0 as ti > c
cdfy(c) =∑B
1 si/B
For c = 0.3:
cdfy(c)= 0.762 ± 0.304
Bayes frequentistestimate sd
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 10 / 32
0.1 0.2 0.3 0.4
0.0
0.2
0.4
0.6
0.8
1.0
Posterior cdf for mu125, Diabetes data, 10000 MCMC draws,Prior exp−.37*L1(beta); verts are +− One Frequentist Standard Dev
Upper 95% credible limit is .342 +− .069c value
Pro
bm
u125
< c
| da
ta
][
.342
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 11 / 32
Exponential Families
fα(β)
= eαT β−ψ(α)f0
(β), with α, β in Rp
Natural parameter α, sufficient statistic β, expectation β = Eαβ[
Poisson : fµ(x) = e−µµx/x! : x = β, µ = β, α = log(µ)]
General accuracy formula For E = Et(β)|β
,
sd(E) =cov
(t , α
∣∣∣β)TVα cov
(t , α
∣∣∣β)1/2
with Vα = covα=α
(β).
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 12 / 32
Better Frequentist Inferences
For E = Et(β)
∣∣∣β in exfam fα(β)
= eαT β−ψ(α)f0
(β)
Parametric bootstrap fα(·)→[β∗1, β
∗
2, . . . , β∗
j , . . . , β∗
J
]−→
[· · ·E∗j = E
t(β)
∣∣∣∣β∗j · · · ] −→ bootstrap conf int for E
Trouble Need new MCMC sample for each β∗j
Shortcut Reweight original MCMC sample (importance
sampling)
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 13 / 32
Digression: Posterior Exponential Family
fα(β)
= eαT β−ψ(α)f0
(β): natural param α, suff stat β, “carrier” f0
(β)
Posterior exponential family
g(α|suff stat b) = e(b−β)Tα−φ(b)g
(α∣∣∣β)
natural param b, suff stat α, carrier g(α∣∣∣β)
Importance sampling Reweight the original MCMC realizations:
Eti |b =
B∑i=1
tiWi(b)
/ B∑i=1
Wi(b)[Wi(b) = e(b−β)
Tαi
]
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 14 / 32
Bootstrap Intervals Without BootstrappingDiCiccio and Efron (1992)
“abc” Investigate Et |b for b near β
Requires p + 2 numerical 2nd derivatives of E function
Next: Applied to posterior cdf for mu125
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 15 / 32
0.1 0.2 0.3 0.4
0.0
0.2
0.4
0.6
0.8
1.0
Heavy curve is posterior cdf for mu125, diabetes data.Vertical bars frequentist central 68% abc confidence intervals.
(Light lines show +− one frequentist standard error)
c value
Pro
bm
u125
< c
| da
ta
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 16 / 32
Estimation After Model Selection
Usually:
(a) look at data
(b) choose model (linear, quad, cubic . . . ?)
(c) fit estimates using chosen model
(d) analyze as if pre-chosen
Today Include model selection process in the analysis
Question Effects on standard errors, confidence intervals, etc.?
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 17 / 32
Cholesterol Data
n = 164 men took Cholestyramine for ∼ 7 years
x = compliance measure (adjusted: x ∼ N(0,1))
y = cholesterol decrease
Wish to estimate regression values
µj = Ey |x = xj for j = 1,2, . . . ,164
µ = (µ1, µ2, . . . , µ164)T
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 18 / 32
−2 −1 0 1 2
050
100
Cholesterol data, n=164 subjects: cholesterol decrease plottedversus adjusted compliance; Green curve is OLS cubic regression;
Red points indicate 5 featured subjects
compliance
chol
este
rol d
ecre
ase
1
2
3
4
5
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 19 / 32
Cp Selection Criterion
Regression model yn×1
= Xn×m
βm×1
+ en×1
[ei ∼ (0, σ2)
]Cp criterion
∥∥∥y − X β∥∥∥2
+ 2mσ2
β = OLS estimate, m = “degrees of freedom”
Model selection From possible models X1,X2,X3, . . .
choose the one minimizing Cp.
Then use OLS estimate from chosen model.
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 20 / 32
Cp for Cholesterol Data
Model df Cp − 80000 (Boot %)
M1 (linear) 2 1132 (19%)
M2 (quad) 3 1412 (12%)
M3 (cubic) 4 667 (34%)
M4 (quartic) 5 1591 (8%)
M5 (quintic) 6 1811 (21%)
M6 (sextic) 7 2758 (6%)
(σ = 22 from “full model”M6)
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 21 / 32
Nonparametric Bootstrap Analysis
data = (xi , yi), i = 1,2, . . . ,n = 164 gave original estimate
µ = X3β3
Bootstrap data set: data∗ =(xj , yj)
∗, j = 1,2, . . . ,n
where
(xj , yj)∗ drawn randomly and with replacement from data:
data∗ −→Cp
m∗ −→OLS
β∗m∗ −→ µ∗ = Xm∗ β∗
m∗
I did this all B = 4000 times.
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 22 / 32
B=4000 nonparametric bootstrap replications for the model−selectedregression estimate of Subject 1; boot (m,stdev)=(−2.63,8.02);
76% of the replications less than original estimate 2.71
Red triangles are 2.5th and 97.5th boot percentilesbootstrap estimates for subject 1
Fre
quen
cy
−30 −20 −10 0 10 20
050
100
150
200
250
^ ^2.71
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 23 / 32
−3 −2 −1 0 1 2 3
−3
−2
−1
01
23
Smooth Estimation Model: ' y ' is observed data; Ellipses indicate bootstrap distribution for ' y* ';
Red curves level surfaces of equal estimation for thetahat=t(y)
thetahat=t(y)
y
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 24 / 32
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 25 / 32
1 2 3 4 5 6
−40
−30
−20
−10
010
2030
Boxplot of Cp boot estimates for Subject 1; B=4000 bootreps;Red bars indicate selection proportions for Models 1−6
only 1/3 of the bootstrap replications chose Model 3selected model
subj
ect 1
est
imat
es
Model3
*********
MODEL 3
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 26 / 32
Bootstrap Smoothing
Idea Replace original estimator t(y) with bootstrap average
s(y) =
B∑i=1
t(y∗i
) /B
Model averaging
Same as bagging (“bootstrap aggregation,” Breiman)
Removes discontinuities, reduces variance
Approximate confidence interval: s(y) ± 1.96 · sd
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 27 / 32
Accuracy Theorem
Notation s0 = s(y), t ∗i = t(y∗i ), i = 1,2, . . .B
Y ∗ij = # of times jth data point appears in ith boot sample
covj =∑B
i=1 Y ∗ij ·(t ∗i − s0
) /B
[covariance Y ∗ij with t ∗i
]TheoremThe delta method standard deviation estimate for s0 is
sd =
n∑j=1
cov2j
1/2
,
always ≤
B∑i=1
(t ∗i − s0
)2 /B
1/2
, the boot stdev for t(y).
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 28 / 32
Projection Interpretation
0
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 29 / 32
1 2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
1.2
Standard Deviation of smoothed estimate relative to original (Red)for five subjects; green line is stdev Naive Cubic Model
bottom numbers show original standard deviationsSubject number
Rel
ativ
e st
dev
7.9 3.9 4.1 4.7 6.8
*
* *
*
*
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 30 / 32
Model Probability Estimates
34% of the 4000 bootreps chose the cubic model
Poor man’s Bayes posterior prob for “cubic”
How accurate is that 34%?
Apply accuracy theorem to indicator function for choosing “cubic”
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 31 / 32
Model Boot % ± Standard Error
M1 (linear) 19% ±24
M2 (quad) 12% ±18
M3 (cubic) 34% ±24
M4 (quartic) 8% ±14
M5 (quintic) 21% ±27
M6 (sextic) 6% ±6
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 32 / 32