
TEORIA BAYESIANA

Ralph S. Silva

Departamento de Métodos Estatísticos
Instituto de Matemática

Universidade Federal do Rio de Janeiro



Contents

Tests and Confidence Regions


Tests and Confidence Regions

No model is right, but some models are less wrong than others.

Consider a statistical model f(x|θ) with θ ∈ Θ. Given a subset of interest Θ0 ⊂ Θ, the question to be answered is

H0 : θ ∈ Θ0,

usually called the null hypothesis.

If additional information such as θ ∈ Θ0 ∪ Θ1 ≠ Θ is available, then we define the alternative hypothesis as

H1 : θ ∈ Θ1.


Under this formalization, every test procedure ϕ appears as an estimator of I_{Θ0}(θ), and we only need a loss function L(θ, ϕ) to derive the Bayes estimators.

The loss function proposed by Neyman and Pearson is the 0-1 loss
\[
L(\theta, \varphi) =
\begin{cases}
1, & \text{if } \varphi \neq I_{\Theta_0}(\theta),\\
0, & \text{otherwise.}
\end{cases}
\]

For this loss, the Bayesian solution is
\[
\varphi^{\pi}(x) =
\begin{cases}
1, & \text{if } P^{\pi}(\theta \in \Theta_0 \mid x) > P^{\pi}(\theta \in \Theta_0^c \mid x),\\
0, & \text{otherwise.}
\end{cases}
\]

This estimator is easily justified on an intuitive basis since it chooses the hypothesis with the largest posterior probability.


A generalization of the above loss is to penalize errors differently when the null hypothesis is true or false. The weighted 0-1 losses
\[
L(\theta, \varphi) =
\begin{cases}
0, & \text{if } \varphi = I_{\Theta_0}(\theta),\\
a_0, & \text{if } \theta \in \Theta_0 \text{ and } \varphi = 0,\\
a_1, & \text{if } \theta \notin \Theta_0 \text{ and } \varphi = 1,
\end{cases}
\]
are called “a0-a1” losses for obvious reasons.

Proposition. Under the “a0-a1” loss, the Bayes estimator associated with a prior distribution π is
\[
\varphi^{\pi}(x) =
\begin{cases}
1, & \text{if } P^{\pi}(\theta \in \Theta_0 \mid x) > \dfrac{a_1}{a_0 + a_1},\\
0, & \text{otherwise.}
\end{cases}
\]

Proof: Since the posterior loss is
\[
L(\pi, \varphi \mid x) = \int_{\Theta} L(\theta, \varphi)\,\pi(\theta \mid x)\,d\theta
= a_0\,P^{\pi}(\theta \in \Theta_0 \mid x)\,I_{\{0\}}(\varphi) + a_1\,P^{\pi}(\theta \notin \Theta_0 \mid x)\,I_{\{1\}}(\varphi),
\]
the Bayes estimator can be derived directly.
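As a quick illustration of this rule (not taken from the slides), the sketch below applies the “a0-a1” test when the posterior probability of Θ0 is available in closed form; the N(1.2, 0.8²) posterior and the hypothesis H0 : θ ≤ 0 are hypothetical choices.

```python
# Minimal sketch of the "a0-a1" Bayes test; the posterior used here
# (theta | x ~ N(1.2, 0.8^2)) and the hypothesis H0: theta <= 0 are
# purely illustrative assumptions.
from scipy.stats import norm

def bayes_test(post_prob_H0, a0=1.0, a1=1.0):
    """Accept H0 (return 1) iff P(theta in Theta0 | x) > a1 / (a0 + a1).

    With a0 = a1 this is the 0-1 loss rule: pick the hypothesis with
    the larger posterior probability.
    """
    return 1 if post_prob_H0 > a1 / (a0 + a1) else 0

post_prob_H0 = norm.cdf(0.0, loc=1.2, scale=0.8)  # P(theta <= 0 | x), about 0.067
print(bayes_test(post_prob_H0))                   # 0: H0 is rejected
print(bayes_test(post_prob_H0, a0=20.0, a1=1.0))  # 1: rejecting a true H0 is now 20x costlier
```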


The Bayes Factor

Definition. The Bayes factor is the ratio of the posterior probabilities of the null and the alternative hypotheses over the ratio of the prior probabilities of the null and the alternative hypotheses, i.e.,
\[
B^{\pi}_{01}(x) = \frac{P(\theta \in \Theta_0 \mid x)}{P(\theta \in \Theta_1 \mid x)} \Big/ \frac{\pi(\theta \in \Theta_0)}{\pi(\theta \in \Theta_1)}.
\]

In the particular case where Θ0 = {θ0} and Θ1 = {θ1}, the Bayes factor simplifies to the usual likelihood ratio
\[
B^{\pi}_{01}(x) = \frac{f(x \mid \theta_0)}{f(x \mid \theta_1)}.
\]


The Bayes factor can be perceived as a Bayesian likelihood ratio since, if π0 is the prior distribution under H0 and π1 the prior distribution under H1, then
\[
B^{\pi}_{01}(x) = \frac{\int_{\Theta_0} f(x \mid \theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_1} f(x \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1} = \frac{m_0(x)}{m_1(x)}.
\]

As indicated above, the Bayes factor is, from a decision-theoretic point of view, completely equivalent to the posterior probability of the null hypothesis since, under the “a0-a1” loss, H0 is accepted when
\[
B^{\pi}_{01}(x) > \frac{a_1}{a_0} \Big/ \frac{\eta_0}{\eta_1} = \frac{a_1 \eta_1}{a_0 \eta_0},
\]
where η0 = π(θ ∈ Θ0) and η1 = π(θ ∈ Θ1) = 1 − η0.

This alternative thus provides an illustration of the duality existing between loss and prior distribution.
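To make the ratio m0(x)/m1(x) concrete, here is a small numerical sketch; the model, the priors, and the observation are assumptions chosen for illustration, not taken from the slides: a single x ~ N(θ, 1), H0 : θ ≤ 0 versus H1 : θ > 0, with a N(0, 2²) prior renormalized on each region.

```python
# Minimal sketch: the Bayes factor as a ratio of marginal likelihoods,
# computed by numerical integration.  Model and priors are assumptions
# made for illustration only.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

x = 0.9  # hypothetical observation, x | theta ~ N(theta, 1)

def marginal(lower, upper):
    """Integral of f(x|theta) pi_i(theta) dtheta over [lower, upper],
    with pi_i the N(0, 2^2) prior renormalized to that region."""
    prior_mass = norm.cdf(upper, 0, 2) - norm.cdf(lower, 0, 2)
    integrand = lambda t: norm.pdf(x, t, 1) * norm.pdf(t, 0, 2) / prior_mass
    return quad(integrand, lower, upper)[0]

m0 = marginal(-np.inf, 0.0)  # marginal likelihood under H0: theta <= 0
m1 = marginal(0.0, np.inf)   # marginal likelihood under H1: theta > 0
print(m0 / m1)               # B01(x): values below 1 favor H1
```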


It shows that it is equivalent to weight both hypotheses equally, η0 = η1 = 1/2, and to modify the error penalties into a'_i = a_i η_i (i = 0, 1), or to penalize both types of error similarly (a0 = a1 = 1) when the prior distribution incorporates the actual weights through the weighted prior probabilities
\[
\eta'_0 = \frac{a_0 \eta_0}{a_0 \eta_0 + a_1 \eta_1} \quad\text{and}\quad \eta'_1 = \frac{a_1 \eta_1}{a_0 \eta_0 + a_1 \eta_1}.
\]

A scale to judge the evidence in favor of or against H0 brought by the data, outside a true decision-theoretic setting:

- if log10(B^π_10) is between 0 and 0.5, the evidence against H0 is poor,
- if it is between 0.5 and 1, it is substantial,
- if it is between 1 and 2, it is strong, and
- if it is above 2, it is decisive.
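A direct transcription of this scale into a small helper; the function name and the reading of B10 < 1 as evidence in favor of H0 are my own conventions.

```python
# Minimal sketch mapping log10(B10) to the qualitative scale above.
import math

def evidence_against_H0(B10):
    """Qualitative strength of the evidence against H0 carried by B10."""
    b = math.log10(B10)
    if b < 0:
        return "evidence in favor of H0"   # not covered by the scale above
    if b <= 0.5:
        return "poor"
    if b <= 1:
        return "substantial"
    if b <= 2:
        return "strong"
    return "decisive"

print(evidence_against_H0(3.2))   # log10(3.2) is about 0.51 -> "substantial"
```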


Point-null hypotheses

Consider the point-null hypothesis H0 : θ = θ0 and the prior π(θ) = η0 I_{Θ0}(θ) + (1 − η0) g1(θ). The posterior probability of H0 is given by
\[
\pi(\Theta_0 \mid x) = \frac{f(x \mid \theta_0)\,\eta_0}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta}
= \frac{f(x \mid \theta_0)\,\eta_0}{f(x \mid \theta_0)\,\eta_0 + (1 - \eta_0)\,m_1(x)},
\]
where
\[
g_1(\theta) \propto \pi(\theta)\,I_{\Theta_1}(\theta) \quad\text{and}\quad m_1(x) = \int_{\Theta_1} f(x \mid \theta)\,g_1(\theta)\,d\theta.
\]
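A classical numerical illustration, with the normal model, g1 = N(0, τ²), and the particular numbers being assumptions for the sketch rather than part of the slides: for a single x ~ N(θ, σ²) and H0 : θ = 0, the marginal under H1 is available in closed form as m1(x) = N(x; 0, σ² + τ²).

```python
# Minimal sketch: posterior probability of a point null H0: theta = 0
# for one observation x ~ N(theta, sigma^2), with g1 = N(0, tau^2) under H1.
from scipy.stats import norm

def posterior_prob_H0(x, eta0=0.5, sigma=1.0, tau=1.0):
    f0 = norm.pdf(x, 0.0, sigma)                        # f(x | theta_0)
    m1 = norm.pdf(x, 0.0, (sigma**2 + tau**2) ** 0.5)   # marginal under H1
    return eta0 * f0 / (eta0 * f0 + (1 - eta0) * m1)

# At the "borderline" value x = 1.96 the posterior probability of H0 is
# still about 0.35, far less conclusive than the 0.05 p-value suggests.
print(posterior_prob_H0(1.96))
```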


Pseudo-Bayes factors

Definition. Given an improper prior π, a sample (x1, . . . , xn) is a training sample if the corresponding posterior π(·|x1, . . . , xn) is proper, and is a minimal training sample if no subsample is a training sample.

When facing a hypothesis H0 with a prior distribution π0 against a broader alternative H1 with a prior distribution π1, if x(ℓ) is a minimal training sample under H1 such that π0(·|x(ℓ)) is also proper, the pseudo-Bayes factor
\[
B^{(\ell)}_{10} = \frac{\int_{\Theta_1} f_1(x_{(-\ell)} \mid \theta_1)\,\pi_1(\theta_1 \mid x_{(\ell)})\,d\theta_1}{\int_{\Theta_0} f_0(x_{(-\ell)} \mid \theta_0)\,\pi_0(\theta_0 \mid x_{(\ell)})\,d\theta_0}
\]
is then independent of the normalizing constants used in both π0 and π1.


Lemma. In the case of independent observations, the pseudo-Bayes factor can be written as
\[
B^{(\ell)}_{10} = B_{10}(x) \times B_{01}(x_{(\ell)}), \quad\text{with}
\]
\[
B_{10}(x) = \frac{\int_{\Theta_1} f_1(x \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1}{\int_{\Theta_0} f_0(x \mid \theta_0)\,\pi_0(\theta_0)\,d\theta_0}
\quad\text{and}\quad
B_{01}(x_{(\ell)}) = \frac{\int_{\Theta_0} f_0(x_{(\ell)} \mid \theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_1} f_1(x_{(\ell)} \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1}.
\]


A way to remove the dependence on the training sample is to average the different pseudo-Bayes factors over all the possible training samples x(ℓ). The next difficulty is to decide what kind of averaging should be used here; the usual choices are the following (a small computational sketch follows the list).

- The arithmetic intrinsic Bayes factor
\[
B^{A}_{10} = \frac{1}{L} \sum_{x_{(\ell)}} B^{(\ell)}_{10} = B_{10}(x)\,\frac{1}{L} \sum_{x_{(\ell)}} B_{01}(x_{(\ell)}),
\]
where L is the number of different training samples;
- the geometric intrinsic Bayes factor
\[
B^{G}_{10} = \exp\left\{\frac{1}{L} \sum_{x_{(\ell)}} \log B^{(\ell)}_{10}\right\} = B_{10}(x)\,\exp\left\{\frac{1}{L} \sum_{x_{(\ell)}} \log B_{01}(x_{(\ell)})\right\};
\]
- and the median intrinsic Bayes factor
\[
B^{M}_{10} = \operatorname{med}\big(B^{(\ell)}_{10}\big) = B_{10}(x)\,\operatorname{med}\big(B_{01}(x_{(\ell)})\big).
\]
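Assuming B10(x) and the correction terms B01(x(ℓ)) have already been computed for every minimal training sample (how to compute them is model-specific), the three intrinsic Bayes factors are simple summaries of those terms.

```python
# Minimal sketch: arithmetic, geometric and median intrinsic Bayes factors
# from B10(x) and the per-training-sample corrections B01(x_(l)).
# The numeric inputs below are hypothetical.
import numpy as np

def intrinsic_bayes_factors(B10_full, B01_training):
    B01_training = np.asarray(B01_training, dtype=float)
    arithmetic = B10_full * B01_training.mean()
    geometric = B10_full * np.exp(np.log(B01_training).mean())
    median = B10_full * np.median(B01_training)
    return arithmetic, geometric, median

print(intrinsic_bayes_factors(4.0, [0.6, 0.9, 1.1, 0.7, 2.5]))  # L = 5 training samples
```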


Another idea is to use a fraction 0 < b < 1 of the likelihood to “properize” the prior:
\[
\int_{\Theta_0} [f_0(x \mid \theta_0)]^{b}\,\pi_0(\theta_0)\,d\theta_0 < \infty
\quad\text{and}\quad
\int_{\Theta_1} [f_1(x \mid \theta_1)]^{b}\,\pi_1(\theta_1)\,d\theta_1 < \infty.
\]

The remaining fraction (1 − b) of the likelihood is then used to run the test:
\[
B^{F}_{10} = \frac{\int_{\Theta_1} [f_1(x \mid \theta_1)]^{1-b}\,\pi^{b}_1(\theta_1 \mid x)\,d\theta_1}{\int_{\Theta_0} [f_0(x \mid \theta_0)]^{1-b}\,\pi^{b}_0(\theta_0 \mid x)\,d\theta_0}
= B_{10}(x)\,\frac{\int_{\Theta_0} [f_0(x \mid \theta_0)]^{b}\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_1} [f_1(x \mid \theta_1)]^{b}\,\pi_1(\theta_1)\,d\theta_1},
\]
where π^b_0(θ0|x) and π^b_1(θ1|x) denote the pseudo-posteriors associated with [f0(x|θ0)]^b and [f1(x|θ1)]^b, respectively.

For exponential families, the fraction b clearly corresponds to a training sample size since, for n observations from an exponential family with sufficient statistic T,
\[
\big[\exp\{\theta \cdot n T(x) - n \Psi(\theta)\}\big]^{b} = \exp\{\theta \cdot [bn] T(x) - [bn] \Psi(\theta)\}.
\]
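A small numerical sketch of the fractional Bayes factor; the model, the improper flat prior under H1, the fraction b = 1/n, and the data are all assumptions made for illustration.

```python
# Minimal sketch: fractional Bayes factor for H0: theta = 0 against
# H1: theta in R with the improper prior pi1(theta) proportional to 1,
# for n N(theta, 1) observations and fraction b = 1/n (one observation's
# worth of likelihood is used to properize pi1).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

x = np.array([0.3, -0.1, 0.8, 1.2, 0.5])  # hypothetical sample
n, b = len(x), 1.0 / len(x)

def loglik(theta):
    return norm.logpdf(x, theta, 1.0).sum()

def flat_prior_integral(power):
    """Integral of [f1(x|theta)]^power dtheta under the flat prior."""
    return quad(lambda t: np.exp(power * loglik(t)), -np.inf, np.inf)[0]

f0 = np.exp(loglik(0.0))                        # f0(x | theta_0) at theta_0 = 0
B10 = flat_prior_integral(1.0) / f0             # B10(x) with the unnormalized pi1
BF10 = B10 * f0**b / flat_prior_integral(b)     # fractional Bayes factor B^F_10
print(BF10)
```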


There are enough difficulties with these pseudo-Bayes factors to make us question their use in testing and model choice problems:

- Bayes factors, when associated with proper priors, do satisfy some coherence properties such as
\[
B_{12} = B_{10} B_{02} \quad\text{and}\quad B_{01} = 1/B_{10}.
\]
Most pseudo-Bayes factors do not, even though the fractional Bayes factor satisfies B^F_01 = 1/B^F_10.
- When the pseudo-Bayes factors can be expressed as true Bayes factors, the corresponding intrinsic priors are not necessarily appealing, and these priors do depend on the choice of the improper reference priors π0 and π1, hence are hardly intrinsic.
- The pseudo-Bayes factors may also exhibit a bias in favor of one of the hypotheses, in the sense that they can be expressed as a true Bayes factor multiplied by a certain factor.
- Most often, the pseudo-Bayes factors do not correspond to any true Bayes factor, and they may give strongly biased solutions. For instance, the arithmetic intrinsic Bayes factors do not have intrinsic priors for most one-sided testing problems.


- Pseudo-Bayes factors may simply not exist for a whole class of models (e.g., mixtures of normals).
- There are many ways of defining pseudo-Bayes factors and, while most are arguably logical, there is no coherent way of ordering them. Pseudo-Bayes factors, as defined here, do agree with the Likelihood Principle, but the multiplication of possible answers, even if those are close, is not a good signal to users. Similarly, there is no clear-cut procedure for the choice of the fraction b in the fractional Bayes factors, since the minimal training sample size is not always clearly defined.
- The issue of computing pseudo-Bayes factors has not been mentioned so far. But each Bayes factor B^(ℓ)_10 may be a complex integral, and the derivation of the averaged intrinsic Bayes factor may involve $\binom{n}{m}$ integrals of this kind, if m is the minimal training sample size. Fractional Bayes factors are easier to compute in exponential settings, but other distributions are much more difficult to handle.


A second decision-theoretic approach

The testing problem formalized by Neyman and Pearson can be expressed as estimating the indicator function I_{Θ0}(θ) under the 0-1 loss or, equivalently, the absolute error loss
\[
L_1(\theta, \varphi) = |\varphi - I_{\Theta_0}(\theta)|.
\]

We turn to a less restrictive theory, according to which estimators take values in D = [0, 1] and can be considered as indicators of the degree of evidence in favor of H0.

The L1(θ, ϕ) loss is too similar to the 0-1 loss function, as it provides the same Bayes procedures
\[
\varphi^{\pi}(x) =
\begin{cases}
1, & \text{if } P^{\pi}(\theta \in \Theta_0 \mid x) > P^{\pi}(\theta \notin \Theta_0 \mid x),\\
0, & \text{otherwise.}
\end{cases}
\]


By contrast, strictly convex losses, such as the quadratic loss
\[
L_2(\theta, \varphi) = (\varphi - I_{\Theta_0}(\theta))^2,
\]
lead to more adaptive estimators.

Proposition. Under the loss L2(θ, ϕ), the Bayes estimator associated with π is the posterior probability
\[
\varphi^{\pi}(x) = P^{\pi}(\theta \in \Theta_0 \mid x).
\]

Indeed, the posterior expectation of I_{Θ0}(θ) is nothing but the posterior probability of Θ0.

This (proper) loss thus provides a decision-theoretic foundation for the use of posterior probabilities as Bayesian answers.
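Since the Bayes answer under L2 is the posterior probability of Θ0 itself, it can be approximated directly from posterior draws; the N(0.4, 0.5²) posterior and Θ0 = (−∞, 0] below are hypothetical stand-ins.

```python
# Minimal sketch: the L2 Bayes answer P(theta in Theta0 | x) estimated by
# Monte Carlo from posterior draws (here a hypothetical N(0.4, 0.5^2)
# posterior and Theta0 = (-inf, 0]).
import numpy as np

rng = np.random.default_rng(0)
theta_draws = rng.normal(0.4, 0.5, size=100_000)  # stand-in posterior sample
phi = np.mean(theta_draws <= 0.0)                 # estimate of P(theta in Theta0 | x)
print(phi)                                        # close to Phi(-0.8), about 0.21
```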


Definition. For a one-sided test, i.e., for hypotheses of the form H0 : θ ≤ θ0 versus H1 : θ > θ0, an interval [t1, t2] is said to be a truncation set for the estimator ϕ if ϕ(t) = 1 when t < t1 and ϕ(t) = 0 when t > t2. For a two-sided test of H0 : θ ∈ [θ1, θ2] versus H1 : θ ∉ [θ1, θ2], the interval [t1, t2] is said to be a truncation set for the estimator ϕ if ϕ(t) = 0 when t ∉ [t1, t2].

Theorem. For the two-sided problem H0 : θ ∈ [θ1, θ2] versus H1 : θ ∉ [θ1, θ2], an estimator ϕ with truncation set [t1, t2] is admissible if there exist a probability measure π0 on [θ1, θ2] and a σ-finite measure π1 on [t1, t2]^c such that
\[
\varphi(x) = \frac{\int f(x \mid \theta)\,\pi_0(\theta)\,d\theta}{\int f(x \mid \theta)\,\pi_0(\theta)\,d\theta + \int f(x \mid \theta)\,\pi_1(\theta)\,d\theta},
\]
for x ∈ [t1, t2]. Conversely, if ϕ is admissible, there exist [t1, t2], π0, and π1 such that the above equality (for ϕ(x)) holds.


In the one-sided case, we can only propose a necessary condition for admissibility, but it implies that the generalized Bayes estimators form a complete class.

Theorem. For the one-sided problem H0 : θ ≤ θ0 versus H1 : θ > θ0, if ϕ is admissible, there exists an increasing procedure ϕ′ such that ϕ′ is (risk) equivalent to ϕ. If ϕ is an increasing admissible procedure and [t1, t2] is a truncation set such that 0 < ϕ(x) < 1 on [t1, t2], there exist two σ-finite measures π0 and π1 on (−∞, θ0] and [θ0, +∞), respectively, such that
\[
1 = \int \exp\{t_0 \theta - \psi(\theta)\}\,\big(\pi_0(\theta) + \pi_1(\theta)\big)\,d\theta
\]
for t1 < t0 < t2, with ϕ(x) given as in the previous theorem.

These two complete class theorems show that it is sufficient to consider the generalized Bayes estimators to obtain admissible estimators under quadratic loss.


Theorem. For the test of H0 : θ ∈ [θ1, θ2] versus H1 : θ ∉ [θ1, θ2], when the sampling distribution is continuous with respect to the Lebesgue measure, the p-value is inadmissible for the loss L2(θ, ϕ) = (ϕ − I_{Θ0}(θ))².

This result shows that p-values do not belong to the range of Bayesiananswers. It justifies the rejection of p-values for two-sided hypotheses.

The inadmissibility of p-values can be extended to most bounded properlosses.


Credible intervals

Definition. For a prior distribution π, a set Cx is said to be an α-credible set if
\[
P^{\pi}(\theta \in C_x \mid x) \geq 1 - \alpha.
\]

This region is called an HPD α-credible region (for highest posterior density) if it can be written in the form
\[
\{\theta : \pi(\theta \mid x) > k_\alpha\} \subset C^{\pi}_x \subset \{\theta : \pi(\theta \mid x) \geq k_\alpha\},
\]
where k_\alpha is the largest bound such that
\[
P^{\pi}(\theta \in C^{\alpha}_x \mid x) \geq 1 - \alpha.
\]
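A minimal computational sketch, assuming a unimodal posterior from which draws are available (the Beta(3, 9) posterior is a hypothetical example): for a unimodal density the HPD region is the shortest interval carrying posterior mass 1 − α.

```python
# Minimal sketch: HPD alpha-credible interval from posterior draws, found as
# the shortest interval containing a fraction 1 - alpha of the sorted draws
# (valid for a unimodal posterior).
import numpy as np

def hpd_interval(draws, alpha=0.05):
    draws = np.sort(draws)
    m = int(np.ceil((1 - alpha) * len(draws)))      # number of draws inside the region
    widths = draws[m - 1:] - draws[: len(draws) - m + 1]
    i = np.argmin(widths)                           # index of the shortest interval
    return draws[i], draws[i + m - 1]

rng = np.random.default_rng(1)
print(hpd_interval(rng.beta(3, 9, size=50_000)))    # roughly (0.04, 0.47) for Beta(3, 9)
```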