Bayesian and frequentist inference


Transcript of Bayesian and frequentist inference

Page 1: Bayesian and frequentist inference

Overview Methods of inference Asymptotic theory Examples Constructing priors Conclusions

Bayesian and frequentist inference

Nancy Reid

March 26, 2007

Don Fraser, Ana-Maria Staicu

Page 2: Bayesian and frequentist inference


Overview

Methods of inference

Asymptotic theory: approximate posteriors; matching priors

Examples: logistic regression; normal circle

Constructing priors

Conclusions

Page 3: Bayesian and frequentist inference


A Basic Structure

• data y = (y1, . . . , yn) ∈ R^n
• model f(y; θ), a probability density
• parameters θ = (θ1, . . . , θk) ∈ R^k
• inference about θ (or components of θ):
  • an interval of values of θ consistent with the data
  • a probability distribution for θ, conditional on the data

Page 4: Bayesian and frequentist inference


Page 5: Bayesian and frequentist inference


• “The average 12 month weight loss on the Atkins diet was 10.3 lbs, compared to 3.5, 5.7 and 4.8 lbs on the Zone, Learn and Ornish diets, respectively.”¹

• “mean 12 month weight loss on the Atkins diet was 4.7 kg (95% confidence interval 3.1 to 6.3 kg)”

• “mean 12 month weight loss was significantly different between the Atkins and the Zone diets (p < 0.05)”

• The probability that the average weight loss on the Atkins diet is greater than that on the Zone diet is 0.983

¹JAMA, Mar 7

Page 6: Bayesian and frequentist inference
Page 7: Bayesian and frequentist inference
Page 8: Bayesian and frequentist inference


Bayesian inference

• π(θ | y) ∝ f(y; θ) π(θ)

• π(θ | y) = f(y; θ) π(θ) / ∫ f(y; θ) π(θ) dθ

• parameter of interest ψ = ψ(θ)

• πm(ψ | y) = ∫_{ψ(θ)=ψ} π(θ | y) dθ

• integrals often computed using simulation methods
• marginal posterior density summarizes all information about ψ available from the data
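In low dimensions the posterior and its marginal can be computed by direct numerical integration rather than simulation. A minimal sketch, not from the talk, using a made-up two-parameter normal model with a flat prior on (μ, log σ) and a grid approximation to πm(ψ | y) for ψ(θ) = μ:

```python
import numpy as np

# Hypothetical example: y_i ~ N(mu, sigma^2), theta = (mu, log sigma),
# flat prior on (mu, log sigma); marginal posterior of psi = mu by grid integration.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 2.0, size=20)

mu = np.linspace(-2.0, 4.0, 301)
logs = np.linspace(-1.0, 2.0, 301)
M, S = np.meshgrid(mu, np.exp(logs), indexing="ij")   # S = sigma on the grid

# log-likelihood l(theta; y) on the grid
loglik = -len(y) * np.log(S) - ((y[:, None, None] - M) ** 2).sum(0) / (2 * S**2)

post = np.exp(loglik - loglik.max())   # unnormalized pi(theta | y), flat prior
post /= post.sum()

marg = post.sum(axis=1)                # marginalize over the nuisance log sigma
mean_mu = (mu * marg).sum()            # posterior mean of psi = mu (close to ybar)
```

With a flat prior the marginal posterior of μ is a scaled t centred at the sample mean, so the grid-based posterior mean should reproduce ȳ up to discretization error.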

Page 9: Bayesian and frequentist inference


nonBayesian inference

• find a function of the data, t(y) say, with
  1. distribution depending only on ψ
  2. distribution that is known

• usually based on the log-likelihood function ℓ(θ; y) = log f(y; θ), with
  θ̂ ≡ arg sup_θ ℓ(θ; y),  θ̂ψ ≡ arg sup_{ψ(θ)=ψ} ℓ(θ; y)

• likelihood ratio statistic: 2{ℓ(θ̂; y) − ℓ(θ̂ψ; y)} →d χ²_{d1}

• Wald statistic: Σ̂^{−1/2}(ψ̂ − ψ) →d N(0, I)

• if ψ is scalar: r*(ψ) ∼ N(0, 1), approximately

• the statistic is derived from the model (via the likelihood function)
• its distribution comes from asymptotic theory
• the likelihood is not (usually) integrated but is (usually) maximized
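The likelihood ratio and Wald statistics above can be computed directly for a simple model. A sketch with my own hypothetical example (an exponential-rate model, not from the slides), comparing the two statistics and their reference distributions:

```python
import numpy as np
from scipy.stats import chi2, norm

# Hypothetical illustration: y_i ~ Exponential(rate theta), testing theta = theta0.
rng = np.random.default_rng(1)
n, theta0 = 50, 1.0
y = rng.exponential(1 / theta0, size=n)

def loglik(th):                       # l(theta; y) = n log(theta) - theta * sum(y)
    return n * np.log(th) - th * y.sum()

theta_hat = n / y.sum()               # maximum likelihood estimate
w = 2 * (loglik(theta_hat) - loglik(theta0))     # LR statistic, ~ chi^2_1 under H0
j_hat = n / theta_hat**2                         # observed information at theta_hat
wald = (theta_hat - theta0) * np.sqrt(j_hat)     # Wald statistic, ~ N(0, 1) under H0

p_lrt = chi2.sf(w, df=1)              # p-values from the asymptotic distributions
p_wald = 2 * norm.sf(abs(wald))
```

To first order w ≈ wald², so the two p-values usually agree closely; they differ in higher-order terms, which is exactly where the r*-type refinements discussed later operate.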

Page 10: Bayesian and frequentist inference


... Bayesian inference

• well defined basis for inference
• internally consistent
• leads to optimal results from one point of view
• requires a probability distribution for θ (a prior distribution)
• not necessarily calibrated
• not always clear how much answers depend on the choice of prior
• requires (high-)dimensional integration, or simulation, or approximation
• useful in applications

Page 11: Bayesian and frequentist inference


... nonBayesian inference

• properties well understood
• calibrated: properties correspond to long-run frequency
• choice of function t(y) may be non-optimal
• approximation to distribution of t(Y) may be poor
• approximation to distribution of t(Y) may be excellent
• useful in applications

Page 12: Bayesian and frequentist inference


• Fisher (1920s, frequentist): “I know only one case in mathematics of a doctrine which has been accepted and developed by the most eminent men of their time, and is now perhaps accepted by men now living, which at the same time has appeared to a succession of sound writers to be fundamentally false and devoid of foundation”

• Lindley (1950s, Bayesian): “Every statistician would be a Bayesian if he took the trouble to read the literature thoroughly and was honest enough to admit that he might have been wrong”

Page 13: Bayesian and frequentist inference


Approximate posteriors

∫^ψ πm(ψ | y) dψ = Φ(r*B) {1 + O(n^{−3/2})}

r*B = r + (1/r) log(qB/r)

r = [2{ℓp(ψ̂) − ℓp(ψ)}]^{1/2}

qB = −ℓ′p(ψ) jp^{−1/2}(ψ̂) · {|jλλ(θ̂)|^{1/2} / |jλλ(θ̂ψ)|^{1/2}} · {π(θ̂ψ) / π(θ̂)}

ℓp(ψ) = ℓ(ψ, λ̂ψ) = ℓ(θ̂ψ)
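For a scalar parameter with no nuisance part, Φ(r*B) can be checked against an exact posterior. A small numerical sketch, not from the slides: an exponential-rate model with a flat prior, so the exact posterior is Gamma(n + 1, rate s). The signed-root sign convention is my assumption, chosen so that Φ(r*B) approximates the posterior CDF:

```python
import numpy as np
from scipy.stats import norm, gamma

# Hypothetical check: y_i ~ Exponential(rate theta), flat prior,
# so theta | y ~ Gamma(n + 1, rate s) exactly.
n, s = 10, 12.0                      # fixed data summary: n observations, s = sum(y)
theta_hat = n / s                    # maximizes l(theta) = n log theta - theta s
j_hat = n / theta_hat**2             # observed information at theta_hat

def loglik(th):
    return n * np.log(th) - th * s

def rstar_B(th):
    # signed likelihood root (CDF sign convention, an assumption here)
    r = np.sign(th - theta_hat) * np.sqrt(2 * (loglik(theta_hat) - loglik(th)))
    dl = n / th - s                  # l'(theta)
    qB = -dl / np.sqrt(j_hat)        # flat prior: the pi-ratio factor is 1
    return r + np.log(qB / r) / r

errs = []
for th in [0.5, 0.8, 1.2]:
    approx = norm.cdf(rstar_B(th))
    exact = gamma.cdf(th, a=n + 1, scale=1 / s)
    errs.append(abs(approx - exact))
```

Even at n = 10 the agreement is to three decimal places or so, illustrating the O(n^{−3/2}) error claimed above.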

Page 14: Bayesian and frequentist inference

Approximate posteriors

marginal posterior density:

πm(ψ | y) = ∫_{ψ(θ)=ψ} exp{ℓ(θ; y)} π(θ) dθ / ∫ exp{ℓ(θ; y)} π(θ) dθ

= c exp{ℓ(θ̂ψ)} |jλλ(θ̂ψ)|^{−1/2} π(θ̂ψ) / [exp{ℓ(θ̂)} |j(θ̂)|^{−1/2} π(θ̂)] · {1 + O(n^{−3/2})}

≐ c exp{ℓ(θ̂ψ) − ℓ(θ̂)} · [|jλλ(θ̂ψ)|^{−1/2} / {jp^{−1/2}(ψ̂) |jλλ(θ̂)|^{−1/2}}] · π(θ̂ψ)/π(θ̂)

and, transforming to r,

πm(r | y) = c exp(−½ r²) (r/qB) = c exp{−½ r² − log(qB/r)}

Page 15: Bayesian and frequentist inference


... approximation to the posterior is really good

• Example: yi ∼ N(θ, I), yi = (yi1, . . . , yik), θ = (θ1, . . . , θk)

• ψ = √(θ1² + · · · + θk²)

• flat prior: π(θ) dθ ∝ dθ
• posterior is proper, in fact is χ′²_k(n‖y‖²)

• r*B = √n(ψ̂ − ψ) + {1/√n(ψ̂ − ψ)} log{(ψ/ψ̂)^{(k−1)/2}}
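The "really good" claim can be checked numerically against the exact noncentral chi-square posterior. A sketch, not from the slides, for n = 1 with a hypothetical ψ̂ = ‖y‖; sign conventions for the signed root vary between sources, and the convention below is my assumption, chosen so that Φ(r*B) approximates the posterior CDF:

```python
import numpy as np
from scipy.stats import norm, ncx2

# Hypothetical check (n = 1): flat prior gives theta | y ~ N(y, I_k),
# so psi^2 = ||theta||^2 has posterior ncx2(df = k, nc = ||y||^2) exactly.
k, psi_hat = 10, 5.0                 # psi_hat = ||y||, made-up values

def rstar_B(psi):
    r = psi - psi_hat                # signed likelihood root (CDF convention)
    return r + np.log((psi_hat / psi) ** ((k - 1) / 2)) / r

errs = []
for psi in [3.5, 4.0, 4.5, 5.5, 6.0]:
    approx = norm.cdf(rstar_B(psi))                  # Phi(r*_B)
    exact = ncx2.cdf(psi**2, df=k, nc=psi_hat**2)    # exact posterior CDF of psi
    errs.append(abs(approx - exact))
```

Even with k = 10 and a single observation the approximation error stays at the level of a few thousandths, consistent with the "difference" panels in the figures that follow.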

Page 16: Bayesian and frequentist inference


... normal circle, k=2

2 3 4 5 6 7 8

0.0

0.2

0.4

0.6

0.8

1.0

normal circle, n=1

psi

post

erio

r m

argi

nal

2 3 4 5 6 7 8−0.

0003

0−

0.00

020

−0.

0001

00.

0000

0

normal circle

psi

diffe

renc

e

Page 17: Bayesian and frequentist inference


... normal circle, increasing k

[Figure: changing k — left panel: posterior marginal of ψ; right panel: difference between exact and approximate posteriors for k = 5, 10, 20, growing to about 0.05.]

Page 18: Bayesian and frequentist inference


Are Bayesian and nonBayesian solutions really that different?

• posterior probability limit: Pr_{θ|Y}{θ ≤ θ^{1−α} | y} = 1 − α

• upper confidence bound: Pr_{Y|θ}{θ ≤ θ^{1−α}(π, Y)} = 1 − α

• the first depends on the prior, the second is evaluated under the model

• is there a prior for which both limits are equal?
• no², but there might be a prior for which they are approximately equal:
  Pr_{Y|θ}{θ ≤ θ^{1−α}(π, Y)} = 1 − α + O(1/n)

• a matching prior

²except in a location model

Page 19: Bayesian and frequentist inference


... matching priors

• if θ ∈ R, then π(θ) ∝ i1/2(θ) (Welch & Peers, 1963)• if θ = (ψ, λ) then

π(θ) ∝ i1/2ψψ (θ)g(λ)

(Peers, 1965; Tibshirani, 1989)• where we reparametrize if necessary so that iψλ(θ) = 0• matching priors are one version of objective priors• they depend on the model• reference priors of Berger and Bernardo are a different

version of objective priors• with a slightly different goal: maximize the ‘distance’ from

prior to posterior• good news: Bayesian inference with matching priors is

calibrated• bad news: there is a whole family of such priors, even in

relatively simple situations
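The matching property can be checked by simulation in a simple scale model, my own example rather than one from the talk: for an exponential rate the Welch–Peers/Jeffreys prior π(θ) ∝ 1/θ is exactly matching (a log transform makes this a location model, in line with the footnote above), while a flat prior overcovers:

```python
import numpy as np
from scipy.stats import gamma

# Sketch: y_i ~ Exponential(rate theta), s = sum(y); sampling: theta*s ~ Gamma(n, 1).
# Jeffreys/Welch-Peers prior pi ∝ 1/theta gives posterior theta|y ~ Gamma(n, rate s);
# a flat prior gives Gamma(n + 1, rate s) -- a different shape, hence miscalibration.
n, alpha, theta_true = 5, 0.05, 2.0
rng = np.random.default_rng(5)
s = rng.gamma(shape=n, scale=1 / theta_true, size=100_000)   # draws of s | theta_true

upper_jp = gamma.ppf(1 - alpha, n) / s        # 95% upper posterior limit, Jeffreys
upper_flat = gamma.ppf(1 - alpha, n + 1) / s  # same, flat prior

cov_jp = np.mean(theta_true <= upper_jp)      # ~ 0.95: calibrated
cov_flat = np.mean(theta_true <= upper_flat)  # ~ 0.98: overcovers at n = 5
```

The Jeffreys limit attains the nominal 95% frequency coverage (here exactly, since θ·s is a pivot with the same Gamma(n) shape as the posterior), while the flat-prior limit covers about 98% of the time at n = 5.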

Page 20: Bayesian and frequentist inference


... good news: matching priors are unique

when used in the r∗ approximation(DiCiccio & Martin 93, Staicu 07)∫ ψ

πm(ψ | y)dψ = Φ(r∗B){1 + O(n−3/2)}

r∗B = r +1r

logqB

rr = [2{`p(ψ̂)− `p(ψ)}]1/2

qB = −`′p(ψ)j−1/2p (ψ̂)

|jλλ(θ̂)|1/2

|jλλ(θ̂ψ)|1/2

π(ψ, λ̂ψ)

π(ψ̂, λ̂)

= ”i1/2ψψ (ψ, λ̂ψ)

i1/2ψψ (ψ̂, λ̂)

g(λ̂ψ)

g(λ̂)

when ψ orthogonal to λ, λ̂ψ = λ̂{1 + Op(n−1)}

Page 21: Bayesian and frequentist inference


Logistic regression

The first ten out of 79 sets of observations on the physical characteristics of urine. Presence/absence of calcium oxalate crystals is indicated by 1/0. Two cases with missing values.³

Case  Crystals  Specific gravity    pH   Osmolarity  Conductivity  Urea  Calcium
  1      0          1.021          4.91      725          —         443    2.45
  2      0          1.017          5.74      577         20.0       296    4.49
  3      0          1.008          7.20      321         14.9       101    2.36
  4      0          1.011          5.51      408         12.6       224    2.15
  5      0          1.005          6.52      187          7.5        91    1.16
  6      0          1.020          5.27      668         25.3       252    3.34
  7      0          1.012          5.62      461         17.4       195    1.40
  8      0          1.029          5.67     1107         35.9       550    8.48
  9      0          1.015          5.41      543         21.9       170    1.16
 10      0          1.021          6.13      779         25.7       382    2.21
  ⋮

³Andrews & Herzberg, 1985

Page 22: Bayesian and frequentist inference


... logistic regression

Model: independent binary responses Y1, . . . , Yn with

Pr(Yi = 1) = exp(xiᵀβ) / {1 + exp(xiᵀβ)}

linear exponential family:

f(y; β) = exp{β0 Σyi + β1 Σx1i yi + · · · + β6 Σx6i yi − c(β) − d(y)}

parameter of interest: ψ = β6, say
orthogonal nuisance parameters: λj = E(y xj)

λ̂ψ ≡ λ̂ (special to exponential families)
π(ψ, λ) ∝ i_{ψψ}^{1/2}(ψ, λ), unique
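The first-order fit underlying the comparison that follows can be sketched directly: maximum likelihood by Newton–Raphson and a Wald (first-order) interval for one coefficient. This is a minimal sketch on synthetic data, not a reproduction of the urine-data analysis:

```python
import numpy as np

# Sketch: ML fit of a logistic regression by Newton-Raphson, plus a first-order
# (Wald) 95% interval for the slope. Synthetic data; made-up coefficients.
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
beta_true = np.array([-0.5, 1.0])
p = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

beta = np.zeros(2)
for _ in range(25):                  # Newton-Raphson on the log-likelihood
    mu = 1 / (1 + np.exp(-X @ beta))
    score = X.T @ (y - mu)                            # l'(beta)
    info = X.T @ (X * (mu * (1 - mu))[:, None])       # observed information j(beta)
    beta = beta + np.linalg.solve(info, score)

se = np.sqrt(np.linalg.inv(info)[1, 1])
wald_interval = (beta[1] - 1.96 * se, beta[1] + 1.96 * se)
```

This reproduces the "1st order" row of the later table in outline; the higher-order Φ(r*) and Φ(r*B) columns require the profile likelihood and the adjustments defined earlier.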

Page 23: Bayesian and frequentist inference


[Figure: BN and Laplace approximations to the marginal posterior, plotted against beta_p over (−0.08, 0.02).]

Page 24: Bayesian and frequentist inference


... logistic regression

Confidence/posterior interval for parameter of interest

method      lower bound   upper bound   p-value for 0
Φ(r*)         −0.058        0.00029        0.052
Φ(r*B)        −0.058        0.00028        0.052
1st order     −0.063       −0.00062        0.047

The first line uses a (very accurate) approximation to the conditional distribution of Σyi x6i, given sufficient statistics for the nuisance parameters (exponential family). Fitted using the cond library of the hoa package.

Page 25: Bayesian and frequentist inference

... logistic regression

Note: n = 77, d = 7, but the information in 77 binary observations seems to be comparable to the information in about 10 continuous observations.

Page 26: Bayesian and frequentist inference


Matching priors need to be targeted on the parameter of interest

• as do all objective priors
• ‘flat’ priors for vector parameters don’t lead to calibrated inferences
• Example: yi ∼ N(θi, 1), i = 1, . . . , k
• exact and approximate posterior for Σθi² nearly identical

Page 27: Bayesian and frequentist inference


[Figure (as on Page 17): changing k — posterior marginal of ψ and differences for k = 5, 10, 20.]

Page 28: Bayesian and frequentist inference


Matching priors need to be targeted on the parameter of interest

• as do all objective priors
• ‘flat’ priors for vector parameters don’t lead to calibrated inferences
• Example: yi ∼ N(θi, 1), i = 1, . . . , k
• exact and approximate posterior for Σθi² nearly identical
• posterior used prior π(θ) ∝ 1, marginalized to ψ = Σθi²
• posterior limits are not probability limits
• there is an exact nonBayesian solution, based on the marginal density of Σyi² (noncentral χ²) (Dawid, Stone & Zidek, 1974)
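The failure of the flat prior here is easy to see by simulation, a sketch of my own: with a flat prior, ψ = Σθi² has posterior ncx2(k, ‖y‖²), and the lower posterior limit almost never falls below the true ψ when ‖θ‖ is small, so its frequency coverage collapses:

```python
import numpy as np
from scipy.stats import ncx2

# Sketch: flat prior on theta gives posterior psi = ||theta||^2 | y ~ ncx2(k, ||y||^2).
# Check the frequency coverage of the lower 5% posterior limit (nominally 0.95).
rng = np.random.default_rng(3)
k = 10
theta = np.zeros(k)
theta[0] = 1.0                                # true psi = ||theta||^2 = 1
psi_true = 1.0

reps, hits = 1000, 0
for _ in range(reps):
    y = theta + rng.normal(size=k)            # y_i ~ N(theta_i, 1)
    lower = ncx2.ppf(0.05, df=k, nc=y @ y)    # lower posterior limit for psi
    hits += (lower <= psi_true)
coverage = hits / reps                        # nominal 0.95; actual near 0
```

The posterior for ψ is biased upward by roughly k (since E‖y‖² = ψ + k), which is exactly the miscalibration the marginal density of Σyi² corrects.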

Page 29: Bayesian and frequentist inference


[Figure: dependence on dimension — difference plotted against ψ − y1 for dim = 2, 5, 10.]

Page 30: Bayesian and frequentist inference


[Figure: dependence on n — difference plotted against y1 − ψ for n = 1, 2, 3, 4, 5, 10, 20, 30.]

Page 31: Bayesian and frequentist inference


nonBayesian asymptotics

• find a function of the data with known distribution depending on ψ

• our candidate: r*F = r*F(ψ; y) with N(0, 1) distribution

• r*F = r + (1/r) log(qF/r)

• qF = {χ(θ̂) − χ(θ̂ψ)} / σ̂χ
• a maximum likelihood type departure
• χ(θ) = scalar function of ϕ(θ)

• θ → ϕ(θ) = ϕ(θ; y⁰): a re-parametrization of the model

Page 32: Bayesian and frequentist inference


... nonBayesian asymptotics

• where does this come from and why does it work• using ϕ we can construct an approximate exponential

family model• this is the route to accurate density approximation using

saddlepoint-type arguments• to get ϕ(θ) differentiate the log-likelihood on the sample

space

• ϕj(θ) = ddt `(θ; y0 + tv j)

∣∣∣t=0

• jiggle the observed data in directions vj , and record thechange in the log likelihood

• gives a way to implement approximate conditioning• leads to quite accurate frequentist solutions
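The sample-space derivative defining ϕ can be computed numerically. A sketch with my own toy model (exponential rates, v = (1, . . . , 1)): for an exponential family the construction should recover the canonical parameter up to an affine map, which here means ϕ(θ) = −nθ:

```python
import numpy as np

# Sketch: phi_j(theta) = d/dt l(theta; y0 + t v_j) |_{t=0}, by finite differences.
# For y_i ~ Exponential(rate theta), l(theta; y) = n log theta - theta * sum(y),
# so with v = (1, ..., 1) the sample-space derivative is exactly -n * theta.
rng = np.random.default_rng(4)
y0 = rng.exponential(1.0, size=20)   # the observed data point y^0
n = len(y0)
v = np.ones(n)                       # a single direction v for illustration

def loglik(theta, y):
    return n * np.log(theta) - theta * y.sum()

def phi(theta, eps=1e-6):
    # central finite difference in the sample-space direction v
    return (loglik(theta, y0 + eps * v) - loglik(theta, y0 - eps * v)) / (2 * eps)

thetas = np.array([0.5, 1.0, 2.0])
phis = np.array([phi(t) for t in thetas])   # should equal -n * theta
```

In a genuinely curved model ϕ would be a nonlinear function of θ, and that is where the construction carries real information; here it simply confirms the exponential-family case.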

Page 33: Bayesian and frequentist inference


... does this help with finding priors?

1. strong matching priors (Fraser & Reid, 2002)
• posterior marginal distribution s(ψ) = Φ(r*B)
• nonBayesian p-value function p(ψ) = Φ(r*F)

r*F = r + (1/r) log(qF/r)
r*B = r + (1/r) log(qB/r)

• set qF = qB, and solve for the prior π (if possible)
• guarantees matching to 3rd order
• leads to a data dependent prior
• automatically targeted on the parameter of interest
• only a function of the parameter of interest

Page 34: Bayesian and frequentist inference

... does this help with finding priors?

Note: the prior appears only in qB (shown in blue on the slide), not in qF:

qB = −ℓ′p(ψ) jp^{−1/2}(ψ̂) · {|jλλ(θ̂)|^{1/2} / |jλλ(θ̂ψ)|^{1/2}} · {π(θ̂ψ)/π(θ̂)}

qF = {χ(θ̂) − χ(θ̂ψ)} / σ̂χ

Page 35: Bayesian and frequentist inference


... does this help

2. approximate location models
• every model f(y; θ) can be approximated (locally) by a location model
• location models have an automatic prior for vector θ: flat
• change at θ that generates an increment dθ̂ at the observed data: W(θ), say

• flat prior is π(θ) ∝ |W(θ)| = |A V(θ)| = |dθ̂/dθ|

• V is an n × k matrix with ith row Vi(θ) = −F_{;θ′}(y⁰i; θ) / F_y(y⁰i; θ)

• gives a natural default prior for vector parameters
• still need to deal with marginalization to target parameters of interest

Page 36: Bayesian and frequentist inference

... does this help

Note:
• approximate location model f(y − β(θ)), say
• factors as f1(a) f2(θ̂ − β(θ))
• if f′2(·) = 0, then dθ̂ = βθ(θ) dθ
• flat prior is |βθ(θ)|

Page 37: Bayesian and frequentist inference


Summarizing

• asymptotic theory is useful for obtaining (very) good approximations to distributions
• can be used in Bayesian or nonBayesian contexts
• can be used to derive posterior intervals that are calibrated
• can be used to simplify the choice of matching priors
• finding priors is not straightforward
• likelihood plus first derivative change in the sample space is ‘intrinsic’ to inference, at least asymptotically to O(n^{−3/2})
• flat priors for a location parameter will not lead to calibrated posteriors for curved parameters

Page 38: Bayesian and frequentist inference


We’re all Bayesians really

The Economist, Jan 14, 2006

Page 39: Bayesian and frequentist inference


... Bayesian

Griffiths and Tenenbaum, Psychological Science
• asked subjects to make predictions based on limited evidence
• concluded that the predictions were consistent with Bayesian methods of inference
• “the results of our experiment reveal a far closer correspondence between optimal statistical inference and everyday cognition than that suggested by previous research”

Page 40: Bayesian and frequentist inference


... Bayesian

• Goldstein (2006): “we argue first that the subjectivist Bayes approach is the only feasible method for tackling many scientific problems”

• Neal (1998): “Using ”technological” or ”reference” priors chosen solely for convenience, or out of a mis-guided desire for pseudo-objectivity ... [or] using priors that vary with the amount of data that you have collected ... have no real Bayesian justification, and since they are usually offered with no other justification either, I consider them to be highly dubious.”

• Wasserman (2006): “I think it would be hard to defend a sequence of subjective analyses that yield poor frequency coverage”

Page 41: Bayesian and frequentist inference

References

1. Bayesian Analysis (ba.stat.cmu.edu), Volume 2, Number 3: Berger 2006; Goldstein 2006; Wasserman 2006
2. DiCiccio and Martin, 1993
3. Fraser and Reid, 2002
4. Little, 2006
5. Tibshirani, 1989
6. Welch and Peers, 1963