Approximate inference for vector parameters

40
Models and inference Motivation Directional tests Tangent exponential model Conclusion Approximate inference for vector parameters Nancy Reid April 20, 2011 with Don Fraser, Anthony Davison, Nicola Sartori 1 / 44

Transcript of Approximate inference for vector parameters

Page 1: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Approximate inference for vector parameters

Nancy Reid

April 20, 2011

with Don Fraser, Anthony Davison, Nicola Sartori

1 / 44

Page 2: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Models and inference

Motivation

Directional tests

Contingency tableContinuous example

Tangent exponential model

Conclusion

2 / 44

Page 3: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Parametric models and likelihoodI model f (y ; θ), θ ∈ Rd

I data y = (y1, . . . , yn) independent observationsI log-likelihood function `(θ; y) = log f (y ; θ)

I parameter of interest θ = (ψ, λ), ψ ∈ Rd0

I max. likelihood estimate θ = (ψ, λ); θψ = (ψ, λψ)

I log-likelihood ratio w(ψ) = 2{`(ψ, λ)− `(ψ, λψ)}= 2{`p(ψ)− `p(ψ)}

I profile log-likelihood function `p(ψ) = `(ψ, λψ)

3 / 44

Page 4: Approximate inference for vector parameters

16 17 18 19 20 21 22 23

−4

−3

−2

−1

0

log−likelihood function

θθ

log−

likel

ihoo

d

θθθθ

θθ −− θθ

1.92 w/2

Page 5: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Inference using likelihood ratioI θ = (ψ, λ), ψ ∈ Rd0 H0 : ψ = ψ0I likelihood ratio test:

w(ψ0) = 2{`(ψ, λ)− `(ψ0, λψ0)} .∼ χ2d0

I Bartlett correction:

w(ψ0) =w(ψ0)

1 + B(ψ0)/n.∼ χ2

d0{1 + O(n−2)}

(a)M(a)M

(a)

m-4-4M-4M

-4

-3-3M-3M

-3

-2-2M-2M

-2

-1-1M-1M

-1

00M0M

0

11M1M

1

22M2M

2

m-4-4M-4M

-4

-3-3M-3M

-3

-2-2M-2M

-2

-1-1M-1M

-1

00M0M

0

11M1M

1

22M2M

2

(c)M(c)M

(c)

m-4-4M-4M

-4

-3-3M-3M

-3

-2-2M-2M

-2

-1-1M-1M

-1

00M0M

0

11M1M

1

22M2M

2

m-4-4M-4M

-4

-3-3M-3M

-3

-2-2M-2M

-2

-1-1M-1M

-1

00M0M

0

11M1M

1

22M2M

2

5 / 44

Page 6: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Targetted inference: single parametersI θ = (ψ, λ), λ ∈ Rd−1

I likelihood ratio test w(ψ) = 2{`p(ψ)− `p(ψ)} .∼ χ21

I likelihood rootr(ψ) = sign(ψ − ψ)

√w(ψ)

.∼ N(0,1) Op(n−1/2)I modified likelihood root

r∗(ψ) = r(ψ)+1

r(ψ)log{

Q(ψ)

r(ψ)

}Op(n−3/2) or Op(n−1)

I

Q(ψ) =χ(θ)− χ(θψ)

σχ︸ ︷︷ ︸scalar lined up with ψ

|jλλ(θ)|1/2

|jλλ(θψ)|1/2︸ ︷︷ ︸correction re λ

jλλ(θ) = −∂2`(θ)/∂λ∂λT

I r∗(ψ) gives the ‘right’ inferential summary re ψI and is well approximated by standard normal

6 / 44

Page 7: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Right inferential summaryI Example: linear regression with non-normal errors1

yi = x ′i β + σεi ; β7×1, i = 1, . . . ,32

Normal errors t4 errorsEst (SE) z Est (SE) z

Constant −13.26 (3.140) −4.22 −11.86 (3.70) −3.21date 0.212 (0.043) 4.91 0.196 (0.049) 4.02log(cap) 0.723 (0.119) 6.09 0.682 (0.129) 5.31NE 0.249 (0.074) 3.36 0.239 (0.080) 2.97CT 0.140 (0.060) 2.32 0.143 (0.063) 2.26log(N) −0.088 (0.042) −2.11 −0.072 (0.048) −1.51PT −0.226 (0.114) −1.99 −0.265 (0.110) −2.42

95% confidence interval for β5(−0.173,−0.002), ε ∼ Normal r∗ : (−0.161,−0.026), ε ∼ t4

1BDR, Ch.5.27 / 44

Page 8: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... right inferential summary

Eθ(Y | t , x ,T ) = T eλ1 tλ2{1 + eλ3xψ}T man-yrs. at risk x # cigarettes t Years smoking ψ = ψ0 = 1

8 / 44

Page 9: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... inferential summaryI Eθ(Y | t , x ,T ) = T eλ1 tλ2{1 + eλ3xψ}

I H0 : ψ = 1 =⇒ linear increase in death rate with ‘dose’

log-likelihood root r = 1.506 p = 0.066

modified likelihood root r∗ = 1.491 p = 0.068

9 / 44

Page 10: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Accurate approximation

16 17 18 19 20 21 22 23

0.0

0.2

0.4

0.6

0.8

1.0

Pvalue functions

θθ

p−va

lue

10 / 44

Page 11: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Accurate approximation

16 17 18 19 20 21 22 23

0.0

0.2

0.4

0.6

0.8

1.0

Pvalue functions

θθ

p−va

lue

lik. root rmle. qscore srstar

11 / 44

Page 12: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Accurate approximation

1 5 50 500

0.0

01

0.0

05

0.0

50

0.5

00

Poisson mean

Perc

ent re

lative e

rror

+

++

++

+++++++++++++++++++++++++++++++++++

c

c

c

cc

cc

ccccccccccccccccccccc

+

++

++++++++++++++++++++++++++++++++++++

1 5 50 500

0.0

20.1

00.5

02.0

010.0

0

Total

Perc

ent re

lative e

rror

cc

c

cc

ccccccccccccccc

DFR, 2006scalar ψ vector ψ

continuous response r∗ : O(n−3/2) directional: O(n−1)

discrete response r∗ : O(n−1) directional: O(n−1)12 / 44

Page 13: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Accurate approximation

1 5 50 500

0.0

01

0.0

05

0.0

50

0.5

00

Poisson mean

Perc

ent re

lative e

rror

+

++

++

+++++++++++++++++++++++++++++++++++

c

c

c

cc

cc

ccccccccccccccccccccc

+

++

++++++++++++++++++++++++++++++++++++

1 5 50 500

0.0

20.1

00.5

02.0

010.0

0

Total

Perc

ent re

lative e

rror

cc

c

cc

ccccccccccccccc

DFR, 2006scalar ψ vector ψ

continuous response r∗ : O(n−3/2) directional: O(n−1)

discrete response r∗ : O(n−1) directional: O(n−1)13 / 44

Page 14: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Directional testsI given a vector of ‘departures’ from ψ0

e.g. (ψ1 − ψ01, . . . , ψd0 − ψ0d0)

I compute a directional departure based on the magnitudeof the vector, conditional on its direction

I m.l.e. not parameterization invariant; use instead adeparture measure based on score variable s

I Propose: directed departure on profile sample space Sψ0

I all sample points that give the same estimate for thenuisance parameter λψ

I

Sψ = {s : `λ(θ0ψ; s) = 0} = {s : θψ = θ0

ψ}

a surface of dimension d0, passing through data point y0

14 / 44

Page 15: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... directional testsI Directed departure on Sψ0 λ = λψ0

I observed value s0, corresponding to m.l.e. θ0

I expected value s0 under H0, corresponding to constrainedm.l.e. θ = θ0

ψ0s0 = −`θ(θψ0)

I Distribution of magnitude of |s − s0|; given the direction(s − s0)/|s − s0|

psi1

psi2

0 5 10 15

−4−2

02

46

8

−3 −2 −1 0 1

−0.2

0.0

0.2

0.4

0.6

0.8

s1

s2

Sψ0

....... line through s0, s0

15 / 44

Page 16: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Directional p-valueI line s(t) from hypothesis, s0, to data, s0 : s0 + t(s0 − s0)

I f (s;ψ0): distribution of s under H0.I Observed value s0 (t = 1)I Expected value s0 (t = 0)I along the line s(t) we have

f (s;ψ0)ds = f{s(t);ψ0}dt = f{s0 + t(s0 − s0);ψ0}dt .I directional p-value:

p(ψ0) =

∫∞1 td0−1f{s(t);ψ0}dt∫∞0 td0−1f{s(t);ψ0}dt

I one-dimensional integrals computed numerically

16 / 44

Page 17: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Example: 2× 3 contingency tableI activity amongst psychiatric patients 2

Affective disorders Schizophrenics NeuroticsRetarded 12 13 5

Not retarded 18 17 25

I model: log-linear y ∼ Poisson, log{E(y)} = Xθ ∈ R6

I nuisance parameter λ ∈ R4 main effectsI parameter of interest (ψ1, ψ2) interactionI H0 : ψ = ψ0 = (0,0) independenceI line s(t) from hypothesis, s0, to data, s0 : s0 + t(s0 − s0)

I Here s ≡ y `(θ; y) = exp Σ(yi log θi − θi)

2Everitt, 1992 CH17 / 44

Page 18: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... 2× 3 contingency tableI expected frequencies under the null hypothesis t = 0

Affective disorders Schizophrenics NeuroticsRetarded 10 10 10

Not retarded 20 20 20

I need to stop at t = tmax = 2.I the expected frequencies corresponding to tmax = 2

Affective disorders Schizophrenics NeuroticsRetarded 14 16 0

Not retarded 16 14 30

I All tables along the line s(t) have the same margins.

18 / 44

Page 19: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Log-likelihood along the line s(t)

psi1

psi2

−3 −1 1 2 3

−20

24

6

t=0

psi1

psi2

−3 −1 1 2 3−2

02

46

t=0.5

psi1

psi2

−3 −1 1 2 3

−20

24

6

t=1

psi1

psi2

−3 −1 1 2 3

−20

24

6

t=1.33

psi1

psi2

−3 −1 1 2 3

−20

24

6

t=1.67

psi1

psi2

−3 −1 1 2 3

−20

24

6

t=2

19 / 44

Page 20: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Directional p-valueI The directional p-value is equal to 0.050

0.0 0.5 1.0 1.5 2.0

t

first order χ22 approximation W (ψ0) 0.047

Skovgaard (2001 SJS) modified version W ∗(ψ0) 0.048simulated conditional 0.051

20 / 44

Page 21: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Another 2× 3 tableI party identification by race 3

Democrat Independent RepublicanBlack 103 15 11White 341 105 405

I H0: independence nested in saturated modelI first order likelihood ratio p-value: 2.43× 10−20

I directional p-value: 3.14× 10−20.

3Agresti, 2002 Wiley21 / 44

Page 22: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... party identification exampleI Expected (t = 0)

Democrat Independent RepublicanBlack 58.44 15.80 54.76White 385.56 104.20 361.24

I Observed (t = 1)

Democrat Independent RepublicanBlack 103 15 11White 341 105 405

I Boundary (tmax = 1.251)

Democrat Independent RepublicanBlack 114.20 14.80 0.00White 329.80 105.20 416.00

22 / 44

Page 23: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Example: Infant survival data4

Infant SurvivalAge Smoking Gestation No Yes

< 30 < 5 ≤ 260 50 315> 260 24 4012

5+ ≤ 260 9 40> 260 6 459

30+ < 5 ≤ 260 41 147> 260 14 1594

5+ ≤ 260 4 11> 260 1 124

4Agresti, 199223 / 44

Page 24: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

...infant survival dataI response variables: length of gestation and infant survivalI explanatory variables: age of mother (A), number of

cigarettes smoked per day (S)I null model: all main effects and three first order

interactions (IG, IA and SA) λ ∈ R8

I full model: two additional first order interaction parametersψ: for IS and GA

I first order likelihood ratio p-value = 0.052.I directional p-value = 0.056.

24 / 44

Page 25: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Example: suicide behavior data

age m o ycause sex

drug f 450 154 259m 399 93 398

gas f 13 5 15m 82 6 121

gun f 26 7 14m 168 33 155

hang f 450 185 95m 797 316 455

jump f 71 38 40m 51 26 55

other f 60 10 38m 82 14 124

25 / 44

Page 26: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... suicide behavior dataI suicide classified by sex, age and cause5

I H0: independence, nested in saturated modelI nuisance parameter λ: intercept, main effects, first order

interaction parameters ∈ R26

I saturated model: add parameter of interest ψ ∈ R10;second order interaction between the three variables.

I The first order likelihood ratio p-value is 0.136.I The directional p-value is 0.141.

5Everitt, 1992 CH26 / 44

Page 27: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Score variable?I exponential family model

f (y ; θ) = exp{θ′s(y)− c(θ)− d(y)}I score s carries all the information about θI θ = (ψ, λ), ψ ∈ Rd0

I f (s1 | s2;ψ) free of λI available from saddlepoint approximationI we use this on line s(t) = s0 + t(s0 − s0)I s0: hypothesis (e.g. independence table)I s0: observed value (e.g. observed table)

I psi1

psi2

0 5 10 15

−4−2

02

46

8

−3 −2 −1 0 1

−0.2

0.0

0.2

0.4

0.6

0.8

s1

s2

0.0 0.5 1.0 1.5 2.0

t

27 / 44

Page 28: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Example: continuous dataI Yij ∼ N(µi , σ

2i ), j = 1, . . . ,ni ; i = 1, . . . ,g

I H0 : σ21 = σ2

2 = · · · = σ2g

I ψ = σ2i /σ

2g , λ = (σ2

g , µ1, . . . , µg)

I score s = (n1y1., . . . ,ng yg.,Σy21j , . . . ,Σy2

gj)

I maximum likelihood estimates: v2i = Σj(yij − yi.)

2/ni

I under the hypothesis: v2 = Σij(yij − yi.)2/Σni

I observed s0 = 0I expected s0 = (0, . . . ,0,n1(v2 − v2

1 ), . . . ,ng(v2g − v2))

I need density along the line f{s0 + t(s0 − s0);ψ0}dt

28 / 44

Page 29: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... continuous dataI Yij ∼ N(µi , σ

2i ), j = 1, . . . ,ni ; i = 1, . . . ,g

I H0 : σ21 = σ2

2 = · · · = σ2g

I density along the line f{s0 + t(s0 − s0);ψ0}dt = h(t), sayI

h(t) ∝g∏

i=1

{tv2i + (1− t)v2}(ni−3)/2

I

p(ψ0) =

∫∞1 td0−1h(t)dt∫∞0 td0−1h(t)dt

29 / 44

Page 30: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... continuous data

30 / 44

Page 31: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... continuous data

31 / 44

Page 32: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Tangent exponential modelI Every model f (y ; θ) on Rn can be approximated by an

exponential family model:

fTEM(s; θ)ds = exp{ϕ(θ)′s + `(θ)}h(s)ds (1)

I s is a score variable on Rd s(y) = −`ϕ(θ0; y)

I `(θ) = `(θ; y0) is the observed log-likelihood functionI ϕ(θ) = ϕ(θ; y0) is the canonical parameter ∈ Rd

to be describedI has the same observed log likelihood function

as the original modelI has same first derivative on the sample space, at y0, as

the original model by definition of ϕI (1) approximates original model to O(n−1)

32 / 44

Page 33: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Exponential family modelsI linear exponential family:

f (y ; θ) = exp{ϕ(θ)′s(y)− c(θ)− d(y)}

I canonical parameter obtained as

∂`(θ; y)

∂s(y)= ϕ(θ)

I Example N(µ, σ2):

`(θ; y) =µ

σ2

∑yi −

12σ2

∑y2

i −nµ2

2σ2 − n logσ

I Example Bin(n,p):

`(θ; y) = logp

1− py + n log(1− p)

33 / 44

Page 34: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Inference with TEMfTEM(s; θ) = exp{ϕ(θ)′s + `(θ)}h(s)

ϕ(θ) = ϕ(θ; y0), `(θ) = `(θ; y0)

1. marginalize to eliminate nuisance parameter using tangentexponential model

I likelihood root r∗(ψ) = r(ψ) +1

r(ψ)log{Q(ψ)

r(ψ)}

I r(ψ) = ±√

[2{`(θ)− `(θψ)}] likelihood root

I Q(ψ) =|ϕ(θ)− ϕ(θψ) ϕλ(θψ)|

|ϕθ(θ)||j(θ)|1/2

|jλλ(θψ)|1/2

2. saddlepoint approximation gives distribution approximationI r∗ .∼ N(0,1) O(n−3/2)

34 / 44

Page 35: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Canonical parameter ϕ(θ)I if f (y ; θ) is an exponential family, ϕ is sitting in the modelI if notI if y is continuous, define

V =dydθ

∣∣∣∣y=y0,θ=θ0

y = (y1, . . . , yn)

I ??I zi = zi(yi ; θ) with a fixed distribution, e.g. (yi − µ)/σ

I V = −(∂z∂y

)−1 ∂z∂θ

∣∣∣∣∣y=y0,θ=θ0

n × p

I

ϕ(θ) = ϕ(θ; y0) =∂`(θ; y)

∂V

∣∣∣∣y=y0

=n∑

i=1

∂`(θ; y0)

∂yiVi

35 / 44

Page 36: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... canonical parameter ϕ(θ)I ϕ is a data-derivative of log-likelihood, `;V (θ; y0)

I doesn’t work if if the data is discreteI y −→ s score variable

Idydθ−→ dE(s; θ)

dθDFR, 2006

I

Vi =∂

∂θE(si ; θ)

∣∣∣∣θ=θ0

I

ϕ(θ) =n∑

i=1

∂`(θ; y0)

∂si

∣∣∣∣θ=θ0

Vi

36 / 44

Page 37: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

Vector parameter inferenceI use `(θ; y0) to construct generalized likelihood ratio w(ψ)

I use ϕ(θ; y0) to get a modified versionI combine with nuisance parameter adjustmentI saddlepoint approximationI evaluation p-value conditionally

37 / 44

Page 38: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

SummaryI exponential family, parameter of interest a single canonical

parameterI saddlepoint approximationI tangent exponential model, with locally defined canonical

parameterI gives both inference summary and accurate approximation

for arbitrary scalar parameter of interestI extending to discrete data −→ requires new definition of

canonical parameter

38 / 44

Page 39: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

... summaryI exponential family, parameter of interest a vector of

canonical parametersI directional p-value based on saddlepoint approximationI tangent exponential model, for either discrete or

continuous data, gives joint densityI eliminating nuisance parameter seems to be more difficult

for non-canonical parameters – wipI hoping for insight from the structure of the solution – also

wipI Thanks!! Nicola, Don, Anthony

39 / 44

Page 40: Approximate inference for vector parameters

Models and inference Motivation Directional tests Tangent exponential model Conclusion

ReferencesFraser, D.A.S. and Massam, H. (1985). Conical tests. Stat. Hefte

Skovgaard, I.M. (1988). Saddlepoint expansions for directional testprobabilities. JRSS B

Skovgaard, I.M. (2001). Likelihood asymptotics. SJS

Cheah, P.K., Fraser, D.A.S., Reid, N. (1994). Multiparameter testing inexponential models. Biometrika

Davison, A.C., Fraser, D.A.S., Reid, N. (2006). Improved likelihood inferencefor discrete data. JRSS B

Davison, A.C., Fraser, D.A.S., Reid, N., Sartori, N. (2011). On assessingvector valued parameters. in progress

\beamertemplatenavigationsymbolsempty

40 / 44