
Source: finmath.stanford.edu/~ckirby/brad/talks/2015Three2ndThoughts.pdf

Three 2nd Thoughts on Empirical Bayes Inference

Conversations with Carl Morris

Bradley Efron

Stanford University


Empirical Bayes (Robbins, 1950s)

Unknown prior density g(θ) gives θ1, θ2, . . . , θN (unobserved)

Observations x1, x2, . . . , xN, independent with xi ∼ p(xi | θi) (e.g., xi ∼ Poi(θi))

The Goal: to estimate the parameters θi

Amazing Fact: For large N we can nearly achieve Bayes risk

“Large parallel studies contain their own Bayesian information.”

Bradley Efron Stanford University

Three 2nd Thoughts on EB Inference 2 / 32


“Normal Normal Case”:

θi ∼ N(M,A) and xi ∼ N(θi,1)

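Bayes Rule: $\hat\theta_i^{\mathrm{Bayes}} = E\{\theta_i \mid x_i\} = M + (1 - B)(x_i - M)$

$B = 1/(A + 1)$ [shrinkage factor $A/(A + 1)$]

Bayes Risk: $R^{\mathrm{Bayes}} = E\left\{\sum_{i=1}^{N} (\theta_i - \hat\theta_i^{\mathrm{Bayes}})^2\right\} = N(1 - B)$

MLE: $\hat\theta_i^{\mathrm{MLE}} = x_i$, with $R^{\mathrm{MLE}} = N$

$R^{\mathrm{Bayes}}/R^{\mathrm{MLE}} = 1 - B$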

Bayes saves proportion B of the risk
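The normal-normal rule above can be sketched in a few lines. This is a minimal illustration, assuming M and A are known and the observation has unit variance; the function names are hypothetical and not from the talk.

```python
def bayes_estimate(x, M, A):
    """Posterior mean M + (1 - B)(x - M), with B = 1 / (A + 1)."""
    B = 1.0 / (A + 1.0)
    return M + (1.0 - B) * (x - M)


def risk_ratio(A):
    """R_Bayes / R_MLE = 1 - B = A / (A + 1)."""
    return 1.0 - 1.0 / (A + 1.0)


# Illustrative values only: prior N(0, 1), observation x = 2.
estimate = bayes_estimate(2.0, 0.0, 1.0)
ratio = risk_ratio(1.0)
```

With A = 1 the shrinkage factor is B = 1/2, so the estimate sits halfway between the prior mean and the observation.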


The James–Stein Estimator (1961)

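James–Stein: $\hat\theta_i^{\mathrm{JS}} = \hat M + (1 - \hat B)(x_i - \hat M)$, where $\hat M = \bar x$ and $\hat B = (N - 3) / \sum_{i=1}^{N} (x_i - \hat M)^2$ (unbiased estimates)

Risk: $R^{\mathrm{JS}}/R^{\mathrm{Bayes}} = 1 + 3/(N \cdot A)$

For N = 18, A = 1, JS loses 1/6 of Bayes savings

“Shrinkage estimation”

Theorem: $E_\theta\left\{\sum (\theta_i - \hat\theta_i^{\mathrm{JS}})^2\right\} < E_\theta\left\{\sum (\theta_i - \hat\theta_i^{\mathrm{MLE}})^2\right\}$ always!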
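The James–Stein rule on this slide can be sketched as follows. This is a minimal illustration that assumes each x_i has unit variance, as in the normal-normal case, and that N is at least 4 with the x_i not all equal; the function names are hypothetical.

```python
def james_stein(xs):
    """Shrink each x_i toward the grand mean using B_hat = (N - 3) / S."""
    N = len(xs)
    M_hat = sum(xs) / N
    S = sum((x - M_hat) ** 2 for x in xs)  # sum of squared deviations
    B_hat = (N - 3) / S
    return [M_hat + (1.0 - B_hat) * (x - M_hat) for x in xs]


def js_over_bayes_risk(N, A):
    """R_JS / R_Bayes = 1 + 3 / (N * A), as stated on the slide."""
    return 1.0 + 3.0 / (N * A)


# Made-up observations, only to exercise the formula.
shrunk = james_stein([-3.0, -1.0, 0.0, 1.0, 3.0])
ratio_18 = js_over_bayes_risk(18, 1.0)
```

For N = 18 and A = 1 the ratio is 1 + 3/18, which is the 1/6 figure quoted on the slide.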


The 18 baseball players

                 MLE    Truth   JS
Clemente        .400    .346   .294
F. Robinson     .378    .298   .289
F. Howard       .356    .276   .285
Johnstone       .333    .222   .280
...             ...     ...    ...
E. Rodriguez    .222    .226   .256
Campaneris      .200    .285   .252
Munson          .178    .316   .247
Alvis           .156    .200   .242

Squared error: .075 (MLE), .021 (JS)


Learning From the Experience of Others

Why does Clemente’s good performance increase Munson’s estimate?

Parallel data sets let you “learn from the experience of others”

Which others?


Relevance (Efron–Morris 1972)

$\hat\theta_i^{\mathrm{rel}} = \hat M + [1 - \rho(x_i)\hat B](x_i - \hat M)$

Relevance function ρ(·) decreases with $|x_i - \hat M|$

Limited Translation: “Never shrink more than one unit away from MLE” → if $x_i \sim N(\theta_i, \sigma^2)$ then “unit” = σ

Clemente: $\hat\theta_i^{\mathrm{rel}} = 0.334$ (cf. $\hat\theta_i^{\mathrm{JS}} = 0.294$)

Loses about 10% of $\hat\theta^{\mathrm{JS}}$ savings
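The limited-translation rule can be read literally as clipping the James–Stein value so it never moves more than one unit from the MLE. The sketch below takes that literal reading; it is not necessarily the exact estimator of Efron and Morris (1972). The function name and the value sigma = 0.066 are assumptions for illustration, since the slide does not state sigma.

```python
def limited_translation(x, js, sigma, d=1.0):
    """Clip the James-Stein value js to lie within d * sigma of the MLE x."""
    lower = x - d * sigma
    upper = x + d * sigma
    return max(lower, min(js, upper))


# Clemente: MLE 0.400 and JS 0.294 are from the slide; sigma is assumed.
clemente = limited_translation(0.400, 0.294, 0.066)
```

Under that assumed sigma the clipped value is 0.400 - 0.066 = 0.334, in line with the figure quoted for Clemente.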


False Discovery Rates (Benjamini–Hochberg 1995)

EB Testing: Which of N cases are “significant”?

DTI Study: 6 dyslexics vs 6 controls, N = 15,443 voxels

$z_i \sim N(0, 1)$ under the null, for the ith voxel

Fdr: All z’s determine each zi’s significance

“locfdr”: 174 voxels with zi ≥ 3.10 have Fdr ≤ 0.10


[Figure: histogram of DTI z-scores for 15,443 voxels. x-axis: z-score; y-axis: frequency. The 174 voxels with z ≥ 3.10 have Fdr ≤ .10.]


[Figure: DTI z-values versus distance from back of brain. x-axis: distance x; y-axis: z-values; the 16th percentile, median, and 84th percentile are marked.]


Relevance (Efron 2010, §10.3)

xi = distance of voxel i from back of brain

Target voxel i0

Voxel i counts amount ρ(xi − xi0) toward Fdri

Example: ρ = 1 if |xi − xi0| ≤ 10, ρ = 0 otherwise

or ρ = exp(−|xi − xi0|/10)

Kicking the can down the street . . .
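One plausible reading of "counts amount ρ toward Fdr" is that each voxel enters the empirical tail proportion with weight ρ. The sketch below implements only that reading; it is an assumption, not the construction in Efron (2010, §10.3), and all function names and data are hypothetical.

```python
import math


def rho_window(dist, width=10.0):
    """Weight 1 within `width` units of the target, 0 otherwise."""
    return 1.0 if abs(dist) <= width else 0.0


def rho_exponential(dist, scale=10.0):
    """Weight that decays exponentially with distance from the target."""
    return math.exp(-abs(dist) / scale)


def weighted_tail_fraction(z0, zs, xs, x_target, rho):
    """Relevance-weighted share of cases with z_i >= z0."""
    weights = [rho(x - x_target) for x in xs]
    hits = sum(w for w, z in zip(weights, zs) if z >= z0)
    return hits / sum(weights)


# Made-up distances and z-values, for illustration only.
xs = [10.0, 15.0, 50.0, 80.0]
zs = [3.5, 0.2, 3.2, 3.3]
local = weighted_tail_fraction(3.0, zs, xs, 12.0, rho_window)
overall = weighted_tail_fraction(3.0, zs, xs, 12.0, lambda d: 1.0)
```

The choice of ρ still has to come from somewhere, which is the point of the slide's closing remark.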


Second 2nd Thought: Ridge Regression

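(Hoerl–Kennard 1970)

OLS: $y = X\beta + \epsilon$, with $X$ of dimension $n \times p$ and $\epsilon_i \stackrel{\mathrm{ind}}{\sim} (0, 1)$

$\hat\beta = S^{-1}X'y$ [$S = X'X$]

Ridge Estimate

$\hat\beta(\lambda) = (S + \lambda I)^{-1} S \hat\beta$

Shrinks estimate $\hat\beta$ toward zero

Empirical Bayes: data-based choice of λ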

Example Diabetes Study: n = 442, p = 10 predictors
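The ridge formula can be sketched in plain Python. Since S β̂ = X'y, the code solves (S + λI) β = X'y directly. This is a minimal illustration on made-up toy data, not the diabetes study; the helper names are hypothetical and the small linear solver assumes S + λI is nonsingular.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(X, y, lam):
    """Return (S + lam * I)^{-1} X'y, where S = X'X."""
    p = len(X[0])
    S = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    A = [[S[j][k] + (lam if j == k else 0.0) for k in range(p)] for j in range(p)]
    return solve(A, Xty)


# Toy data with two nearly collinear columns (made up for illustration).
X = [[1.0, 0.9], [1.0, 1.1], [-1.0, -1.0], [-1.0, -1.0]]
y = [2.0, 2.2, -1.9, -2.1]
beta_ols = ridge(X, y, 0.0)    # lam = 0 gives the OLS estimate
beta_ridge = ridge(X, y, 1.0)  # lam > 0 shrinks toward zero
```

Setting λ = 0 recovers OLS, and increasing λ reduces the length of the coefficient vector.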


Diabetes study: First 7 of n = 442 patients

(predict prog from the 10 covariates)

age  sex   bmi  map   tc    ldl  hdl  tch   ltg  glu  prog
 59    1  32.1  101  157   93.2   38    4  2.11   87   151
 48    0  21.6   87  183  103.2   70    3  1.69   69    75
 72    1  30.5   93  156   93.6   41    4  2.03   85   141
 24    0  25.3   84  198  131.4   40    5  2.12   89   206
 50    0  23.0  101  192  125.4   52    4  1.86   80   135
 23    0  22.6   89  139   64.8   61    2  1.82   68    97
 36    1  22.0   90  160   99.6   50    3  1.72   82   138


[Figure: Ridge trace for standardized diabetes data. x-axis: lambda (0.00 to 0.25); y-axis: beta coefficients; one curve per covariate: age, sex, bmi, map, tc, ldl, hdl, tch, ltg, glu.]


Comparison with James–Stein

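$\hat\beta^{\mathrm{JS}} = \left[1 - \frac{p - 2}{\hat\beta' S \hat\beta}\right] \hat\beta = 0.97\,\hat\beta$ for diabetes

$\hat\mu^{\mathrm{JS}} = X\hat\beta^{\mathrm{JS}}$ guaranteed to beat $\hat\mu = X\hat\beta$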

No such result for ridge regression and no automatic choice of λ

But. . . there’s more than one way to shrink a cat


Estimates and standard deviations for components of β

       betamle   betaridge   sdmle   sdridge
age      −10.0         1.3    59.7      52.7
sex     −239.8      −207.2    61.2      53.2
bmi      519.8       489.7    66.5      56.3
map      324.4       301.8    65.3      55.7
tc      −792.2       −83.5   416.2      43.6
ldl      476.7       −70.8   338.6      52.4
hdl      101.0      −188.7   212.3      58.4
tch      177.1       115.7   161.3      70.8
ltg      751.3       443.8   171.7      58.4
glu       67.6        86.7    65.9      56.6


Three Uses of Regression Rules

Given any vector v of covariates, $\hat y = r(v)$

e.g., $\hat y = v'\hat\beta^{\mathrm{JS}}$ or $v'\hat\beta(\lambda)$

1. Prediction [JS]

Want small prediction error $E(y - \hat y)^2$ (y the response at v)

2. Response Surface Estimation

Want $\hat y$ accurate for $E\{y \mid v\}$

3. Explanation [Ridge]

Form of r(v) shows importance of the individual covariates


Ridge Regression and Regularization

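Alternate Form: $\hat\beta(\lambda) = \arg\min_\beta \left\{\|y - X\beta\|^2 + \lambda\|\beta\|^2\right\}$, where $\|y - X\beta\|^2$ is the OLS criterion and $\lambda\|\beta\|^2$ is the penalty

“Regularizes” OLS by penalizing large values of ‖β‖

Equivalently: Bayes vs prior $N_p(0, I/\lambda)$

Lasso

$\hat\beta(\lambda) = \arg\min_\beta \left\{\|y - X\beta\|^2 + \lambda \sum_{j=1}^{p} |\beta_j|\right\}$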

Now sometimes shrinkage all the way to zero (“Sparsity”)
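The contrast between the two penalties is easiest to see in the special case of an orthonormal design, X'X = I, which is an assumption made here for illustration and not a property of the diabetes data. In that case each coefficient is handled separately: ridge divides the OLS value b by 1 + λ, while the lasso criterion above soft-thresholds b at λ/2. The function names are hypothetical.

```python
def ridge_orthonormal(b, lam):
    """Ridge solution when X'X = I: proportional shrinkage of the OLS value b."""
    return b / (1.0 + lam)


def lasso_orthonormal(b, lam):
    """Lasso solution when X'X = I: soft-thresholding of b at lam / 2."""
    t = lam / 2.0
    if b > t:
        return b - t
    if b < -t:
        return b + t
    return 0.0  # small coefficients are set exactly to zero


# Illustrative OLS coefficients, made up for this example.
ols = [2.0, 0.3, -2.0]
ridge_fit = [ridge_orthonormal(b, 1.0) for b in ols]
lasso_fit = [lasso_orthonormal(b, 1.0) for b in ols]
```

Ridge never returns an exact zero for a nonzero b, whereas the lasso zeroes any coefficient with |b| ≤ λ/2, which is the sparsity effect noted on the slide.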


[Figure: Lasso coefficients as a function of the regularizer lambda. x-axis: step (0 to 12), with lambda = Inf, 889, 453, 316, 130, 89, 69, 20, 5.5, 5.1, 2.2, 1.3, OLS; y-axis: beta values; one curve per covariate: age, sex, bmi, map, tc, ldl, hdl, tch, ltg, glu. Cp calculations say lambda = 20 is best.]


Comparison of β estimates

          mle    ridge    lasso       JS
age     −10.0      1.3      0.0     −9.7
sex    −239.8   −207.2   −197.8   −232.6
bmi     519.8    489.7    522.3    504.2
map     324.4    301.8    297.2    314.7
tc     −792.2    −83.5   −103.9   −768.4
ldl     476.7    −70.8      0.0    462.4
hdl     101.0   −188.7   −223.9     98.0
tch     177.1    115.7      0.0    171.8
ltg     751.3    443.8    514.7    728.7
glu      67.6     86.7     54.8     65.6


Robbins versus Stein

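1956: $\theta_i \sim g(\theta)$ and $x_i \sim \mathrm{Poi}(\theta_i)$, independently for $i = 1, \ldots, N$

Marginal Density: $f(x) = \int_0^\infty g(\theta)\, e^{-\theta}\theta^x / x!\; d\theta$ for $x = 0, 1, 2, \ldots$

Robbins’ Formula

$E\{\theta_i \mid x_i = x\} = (x + 1) f(x + 1)/f(x)$

Emp Bayes Estimate: $\hat E\{\theta_i \mid x_i = x\} = (x + 1)\hat f(x + 1)/\hat f(x)$

$\hat f(x) = \#\{x_i = x\}/N$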
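Robbins' formula follows from Bayes' rule and the Poisson marginal density in a few lines. The derivation below is a standard one, written out here as a reading aid rather than taken from the slide.

```latex
\begin{align*}
E\{\theta \mid x\}
  &= \frac{\int_0^\infty \theta\, g(\theta)\, e^{-\theta}\theta^{x}/x!\; d\theta}{f(x)} \\
  &= \frac{(x+1)\int_0^\infty g(\theta)\, e^{-\theta}\theta^{x+1}/(x+1)!\; d\theta}{f(x)} \\
  &= \frac{(x+1)\, f(x+1)}{f(x)} .
\end{align*}
```

The prior g appears only through the marginal f, which is why the observed frequencies can stand in for it.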


Comparison (biased)

James–Stein

- Elegant airtight theorem
- Works with or without prior g(θ)
- Small N

Robbins-type Empirical Bayes

- Asymptotic justification
- Nonparametric (slow)
- “Need N in the thousands!”


Auto insurance example: N = 9461

Claims x              0      1      2      3      4      5      6      7
#{xi = x}          7840   1317    239     42     14      4      4      1
E{θi | xi = x}     .168   .363   .527   1.33   1.43   6.00   1.25
Gamma∗             .164   .398   .633    .87   1.10   1.34   1.57

∗ Assumes g(θ) a Gamma density (“parametric EB”)
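The nonparametric row of the table can be recomputed from the counts with Robbins' formula, (x + 1) f(x + 1)/f(x) with f(x) = #{xi = x}/N. The sketch below is a minimal illustration; the function name is hypothetical and the counts are the ones listed on the slide.

```python
def robbins(counts):
    """Robbins estimates (x + 1) * fhat(x + 1) / fhat(x) for x = 0, 1, ..."""
    # fhat(x) = counts[x] / N, so N cancels in the ratio.
    return [(x + 1) * counts[x + 1] / counts[x] for x in range(len(counts) - 1)]


counts = [7840, 1317, 239, 42, 14, 4, 4, 1]  # claims x = 0..7, from the slide
estimates = robbins(counts)
```

The values for x = 0 through 5 round to the table entries. For x = 6 the formula gives 7 · 1/4 = 1.75, which differs from the 1.25 listed in the table; the slide does not explain that entry. The jump to 6.00 at x = 5 shows how unstable the ratio becomes once the counts are small.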


Tweedie’s Formula (1956?)

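$\theta_i \sim g(\theta)$ and $x_i \sim N(\theta_i, 1)$, independent, $i = 1, 2, \ldots, N$

Marginal Density: $f(x) = \int_{-\infty}^{\infty} g(\theta)\, \frac{e^{-(x-\theta)^2/2}}{\sqrt{2\pi}}\, d\theta$

$E\{\theta_i \mid x_i = x\} = x + \frac{d}{dx} \log f(x)$

Empirical Bayes: fit smooth $\hat f(x)$ to histogram of $(x_1, x_2, \ldots, x_N)$;

$\hat E\{\theta_i \mid x_i = x\} = x + \frac{d}{dx} \log \hat f(x)$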

(gives James–Stein if assume log g(x) quadratic, e.g., normal)
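The link between Tweedie's formula and linear shrinkage can be checked numerically. If the prior is N(M, A), the marginal f is N(M, A + 1), log f is quadratic, and x + d/dx log f(x) should equal M + (1 − B)(x − M) with B = 1/(A + 1). The sketch below assumes that normal prior, uses a central-difference derivative, and picks hypothetical values of M, A, and x.

```python
import math


def log_marginal_normal(x, M, A):
    """log f(x) when g = N(M, A) and x | theta ~ N(theta, 1), so f = N(M, A + 1)."""
    v = A + 1.0
    return -0.5 * math.log(2.0 * math.pi * v) - (x - M) ** 2 / (2.0 * v)


def tweedie(x, log_f, h=1e-5):
    """x + d/dx log f(x), with the derivative taken by central difference."""
    return x + (log_f(x + h) - log_f(x - h)) / (2.0 * h)


M, A = 0.5, 2.0  # hypothetical prior mean and variance
B = 1.0 / (A + 1.0)
x = 3.0
tweedie_value = tweedie(x, lambda t: log_marginal_normal(t, M, A))
bayes_value = M + (1.0 - B) * (x - M)
```

The two values agree up to numerical error. In practice `log_f` would be replaced by a smooth fit to the histogram of the observations, such as the polynomial fit to log f used for the prostate data.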


Prostate Cancer Study

102 men, 52 prostate cancer and 50 healthy controls

Each assessed on N = 6033 genes

zi = two-sample test statistic, patients vs controls (z = “x”)

zi ∼ N(θi ,1) i = 1,2, . . . ,N

θi is “effect size” for ith gene

Null genes: θi = 0


[Figure: histogram of N = 6033 z-values, prostate cancer study. x-axis: z values; y-axis: frequency; curve: 5th-degree log-polynomial fit.]


[Figure: Tweedie's estimate of E{theta|z} for the prostate cancer study, using a 5th-degree polynomial model for f(z). x-axis: z value; y-axis: E{theta|z}. Curve nearly 0 for −2 < z < 2; E{theta|x=3} = 1.05.]


Empirical Bayes Hypothesis Testing (Fdr)

Prior g(θ)→ θi → zi ∼ N(θi ,1), i = 1,2, . . . ,N

g(θ) has atom π0 at θ = 0 (the null cases)

False Discovery Rates

Fdr(z0) = Pr{θi = 0|zi ≥ z0} = π0Φ(z0)/F(z0)

Φ(z0) = Pr{N(0,1) ≥ z0} and F(z0) = Prg{zi ≥ z0}


Fdr Control Algorithm (1995)

Estimate Fdr(z) with

$\widehat{\mathrm{Fdr}}(z) = \Phi(z)/\hat F(z)$, where $\hat F(z) = \#\{z_i \ge z\}/N$

Reject θi = 0 if $\widehat{\mathrm{Fdr}}(z_i) \le q$

Frequentist Theorem

Expected proportion false discoveries ≤ q

Empirical Bayes: $\widehat{\mathrm{Fdr}}(z_i)$ decreases if more of the other zj’s exceed zi
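The estimate on this slide can be sketched directly: the null tail probability Pr{N(0,1) ≥ z} divided by the observed fraction of z-values at or above z, taking π0 = 1. The code implements the slide's rejection rule read literally; the step-up procedure in Benjamini and Hochberg (1995) may differ when the estimate is not monotone in z. The function names and data are hypothetical.

```python
import math


def normal_upper_tail(z):
    """Pr{N(0,1) >= z}, written Phi(z) on the slides."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fdr_hat(z, zs):
    """Phi(z) / Fhat(z), with Fhat(z) = #{z_i >= z} / N and pi0 taken as 1."""
    tail_fraction = sum(1 for zi in zs if zi >= z) / len(zs)
    if tail_fraction == 0.0:
        raise ValueError("z exceeds every observed z-value")
    return normal_upper_tail(z) / tail_fraction


def rejections(zs, q):
    """Indices i with fdr_hat(z_i) <= q."""
    return [i for i, zi in enumerate(zs) if fdr_hat(zi, zs) <= q]


# Made-up z-values: eight near the null and two large ones.
zs = [-1.0, -0.5, 0.0, 0.3, 0.8, 1.2, -0.2, 0.5, 4.0, 4.5]
flagged = rejections(zs, 0.10)
```

Because the denominator counts how many of the other z-values are at least as large, a given z_i looks more significant when it has company in the tail.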


[Figure: Estimated false discovery rates for the prostate cancer data. x-axis: z value; y-axis: Fdr(z). Fdr(3.57) = .10; thirteen genes with z > 3.57.]


“There’s nobody less Bayesian than an Empirical Bayesian” (Dennis Lindley, 1969)

Early skepticism of empirical Bayes (relevance)

Bayes: prior experience influences current inference

Emp Bayes: current experience (of others) influences inference

Crucial idea is not “estimating prior g(θ)” but applying an estimated prior to individual cases


References

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate . . . . JRSS-B 57: 289–300, the original Fdr paper.

Copas, J. B. (1969). Compound decisions and empirical Bayes (with discussion). JRSS-B 31: 397–425, Lindley and other skeptics.

Efron, B. (2010). IMS Monographs 1. Fdr as empirical Bayes.

Efron, B. (2011). JASA 106: 1602–1614, Tweedie’s estimate.

Efron, B. and Morris, C. (1972). JASA 67: 130–139, limited translation and relevance.

Efron, B. and Morris, C. (1973). JASA 68: 117–130, the baseball players and other examples.

Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression . . . . Technometrics 12: 69–82.

James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the 4th Berkeley Symposium, 361–379, JS estimates.

Morris, C. N. (1983). Parametric empirical Bayes inference . . . . JASA 78: 47–65, parametric EB and EB confidence intervals.

Robbins, H. (1956). An empirical Bayes approach to statistics. In Proceedings of the 3rd Berkeley Symposium, 157–163.