Bayes’ Theorem

Let A and B_1, ..., B_k be events in a sample space Ω.

Inversion problem: given P(A | B_j) (and P(B_j)), find P(B_j | A).

Bayes’ Theorem:

$$P(B_j \mid A) \;=\; \frac{P(A \mid B_j)\, P(B_j)}{P(A)} \;=\; \frac{P(A \mid B_j)\, P(B_j)}{\sum_{i=1}^{k} P(A \mid B_i)\, P(B_i)}$$
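As a quick illustration, here is a minimal sketch of this inversion in Python; the priors and likelihoods are made-up numbers, chosen only to show the computation:

```python
# Minimal sketch: Bayes' theorem over a finite partition B_1, ..., B_k.
# The numbers below are hypothetical, for illustration only.

def bayes_posterior(priors, likelihoods):
    """Return P(B_j | A) for all j, given P(B_j) and P(A | B_j)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(A | B_i) P(B_i)
    total = sum(joint)                                    # P(A), by total probability
    return [j / total for j in joint]

priors = [0.5, 0.3, 0.2]        # P(B_1), P(B_2), P(B_3)
likelihoods = [0.9, 0.5, 0.1]   # P(A | B_1), P(A | B_2), P(A | B_3)
print(bayes_posterior(priors, likelihoods))  # ≈ [0.726, 0.242, 0.032], sums to 1
```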

For continuous random variables X and Y, Bayes' Theorem is formulated in terms of densities:

$$f_{X|Y}(x \mid y) \;=\; \frac{f_{Y|X}(y \mid x)\, f_X(x)}{f_Y(y)} \;=\; \frac{f_{Y|X}(y \mid x)\, f_X(x)}{\int f_{Y|X}(y \mid x)\, f_X(x)\, dx}$$

Application to statistical inference:

Probabilistic model: f(y | θ), the distribution of Y for fixed θ

Statistical problem: given data y, make statements about θ

Likelihood: l(θ | y) = f(y | θ) (reflects the inversion problem)

Bayesian approach:

A Bayesian statistical (parametric) model consists of

f(y|θ), a parametric statistical model (likelihood function), and

π(θ), a prior distribution on the parameters.

The posterior distribution of the parameter θ is

$$\pi(\theta \mid y) \;=\; \frac{f(y \mid \theta)\, \pi(\theta)}{\int_\Theta f(y \mid \theta)\, \pi(\theta)\, d\theta} \;\propto\; f(y \mid \theta)\, \pi(\theta)$$

The Bayesian modelling approach can be summarized by

posterior ∝ likelihood × prior.
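This proportionality is all one needs to compute a posterior numerically. A minimal sketch, assuming a hypothetical binomial model with a uniform prior evaluated on a grid (all numbers invented):

```python
import numpy as np
from scipy.stats import binom

# Sketch of "posterior ∝ likelihood × prior" on a parameter grid.
# Hypothetical model: Y | θ ~ Bin(n, θ), prior θ ~ U(0, 1).
n, y = 10, 7
theta = np.linspace(0.001, 0.999, 999)   # grid over Θ = (0, 1)
dtheta = theta[1] - theta[0]

prior = np.ones_like(theta)              # π(θ), uniform
likelihood = binom.pmf(y, n, theta)      # f(y | θ)
unnormalized = likelihood * prior        # f(y | θ) π(θ)
# Normalize by a Riemann-sum approximation of ∫ f(y|θ) π(θ) dθ:
posterior = unnormalized / (unnormalized.sum() * dtheta)
```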

Bayesian interpretation of probability

probability = (subjective) uncertainty


Bayesian Inference

Example: Binomial distribution

Likelihood function: Y | θ ∼ Bin(n, θ)

Prior distribution: θ ∼ U(0, 1) = Beta(1, 1)

Posterior distribution: θ | Y ∼ Beta(1 + Y, 1 + n − Y)
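The Beta form of the posterior follows directly from posterior ∝ likelihood × prior:

$$\pi(\theta \mid y) \;\propto\; \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y} \cdot 1 \;\propto\; \theta^{(1+y)-1}(1-\theta)^{(1+n-y)-1},$$

which is the kernel of a Beta(1 + y, 1 + n − y) density.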

Uncertainty about the parameter can be updated repeatedly when new data become available (see the sketch after the figure):

take the current posterior distribution as the prior,

compute the new posterior distribution conditional on the new data.

[Figure: five panels showing the posterior density of θ after successive updates; x-axis θ ∈ [0, 1], y-axis posterior density.]
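A minimal sketch of this updating recipe for the Beta-Binomial model; the data batches are invented for illustration:

```python
# Repeated conjugate updating: the current Beta posterior becomes the
# prior for the next batch of binomial data. Batches are hypothetical.

def update(alpha, beta, y, n):
    """Beta(alpha, beta) prior + y successes in n trials -> Beta posterior."""
    return alpha + y, beta + n - y

alpha, beta = 1, 1                    # Beta(1, 1) = U(0, 1) prior
batches = [(3, 5), (6, 10), (2, 4)]   # hypothetical (y, n) per batch
for y, n in batches:
    alpha, beta = update(alpha, beta, y, n)   # posterior becomes next prior
print(alpha, beta)                    # Beta(12, 9): same as one update on all data
```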

The posterior distribution is used for inference about θ:

posterior mean: E(θ | Y)

posterior variance: var(θ | Y) = E((θ − E(θ | Y))² | Y)

posterior confidence interval (credibility interval): an interval (θ_l, θ_r) with

$$\int_{\theta_l}^{\theta_r} \pi(\theta \mid Y)\, d\theta = 1 - \alpha$$
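For the Beta posterior of the binomial example these summaries are available in closed form; a sketch using scipy.stats (y and n are hypothetical):

```python
from scipy.stats import beta

# Posterior summaries for θ | Y ~ Beta(1 + y, 1 + n - y); y, n invented.
y, n = 7, 10
post = beta(1 + y, 1 + n - y)

print(post.mean())           # posterior mean E(θ | Y)
print(post.var())            # posterior variance var(θ | Y)
print(post.interval(0.95))   # central 95% credibility interval
```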


Conjugate Priors

A mathematically convenient choice is a conjugate prior: the posterior distribution belongs to the same parametric family as the prior distribution, with different parameters:

Likelihood f(y|θ)            Prior π(θ)               Posterior π(θ|y)
Normal      N(θ, σ²)         Normal    N(µ, τ²)       Normal    N((σ²µ + τ²y)/(σ² + τ²), σ²τ²/(σ² + τ²))
Poisson     Poisson(θ)       Gamma     Γ(α, β)        Gamma     Γ(α + y, β + 1)
Gamma       Γ(ν, θ)          Gamma     Γ(α, β)        Gamma     Γ(α + ν, β + y)
Binomial    Bin(n, θ)        Beta      Beta(α, β)     Beta      Beta(α + y, β + n − y)
Multinomial M_k(θ_1,…,θ_k)   Dirichlet D(α_1,…,α_k)   Dirichlet D(α_1 + y_1, …, α_k + y_k)
Normal      N(µ, 1/θ)        Gamma     Γ(α, β)        Gamma     Γ(α + 1/2, β + (µ − y)²/2)
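As a concrete check of the first row, a small sketch of the Normal-Normal update for a single observation (all numbers invented):

```python
# Normal-Normal conjugate update for one observation y ~ N(θ, σ²) with
# known σ² and prior θ ~ N(µ, τ²); matches the first row of the table.

def normal_update(y, sigma2, mu, tau2):
    """Posterior mean and variance of θ given one observation y."""
    post_mean = (sigma2 * mu + tau2 * y) / (sigma2 + tau2)
    post_var = sigma2 * tau2 / (sigma2 + tau2)
    return post_mean, post_var

print(normal_update(y=2.0, sigma2=1.0, mu=0.0, tau2=4.0))  # (1.6, 0.8)
```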

Problems in the choice of prior:

Conjugate priors might not reflect our uncertainty about θ correctly.

In general, for non-conjugate priors the posterior distribution is not available in analytic form.

It can be difficult to describe uncertainty about θ in the form of a particular distribution. In particular, we might be uncertain about the parameters of the prior distribution (→ hierarchical modelling, empirical Bayes methods).


Bayesian Analysis with Missing Data

Bayesian statistical model:

Data model:

f(Y | θ)     complete-data likelihood

f(R | Y, ξ)  missing-data mechanism

Prior distribution:

π(θ, ξ)

The posterior distribution of θ and ξ is

$$\begin{aligned}
\pi(\theta, \xi \mid Y_{\mathrm{obs}}, R) &\propto f(Y_{\mathrm{obs}}, R \mid \theta, \xi)\, \pi(\theta, \xi) \\
&= \int f(Y_{\mathrm{obs}}, y_{\mathrm{mis}}, R \mid \theta, \xi)\, \pi(\theta, \xi)\, dy_{\mathrm{mis}} \\
&= \int f(Y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid \theta)\, f(R \mid Y_{\mathrm{obs}}, y_{\mathrm{mis}}, \xi)\, \pi(\theta, \xi)\, dy_{\mathrm{mis}}
\end{aligned}$$

If the data are missing at random (MAR), then

$$\begin{aligned}
\pi(\theta, \xi \mid Y_{\mathrm{obs}}, R) &\propto \int f(Y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid \theta)\, f(R \mid Y_{\mathrm{obs}}, \xi)\, \pi(\theta, \xi)\, dy_{\mathrm{mis}} \\
&= \int f(Y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid \theta)\, dy_{\mathrm{mis}} \; f(R \mid Y_{\mathrm{obs}}, \xi)\, \pi(\theta, \xi) \\
&= f(Y_{\mathrm{obs}} \mid \theta)\, f(R \mid Y_{\mathrm{obs}}, \xi)\, \pi(\theta, \xi)
\end{aligned}$$


Bayesian Analysis with Missing Data

For inference on θ, we consider the marginal posterior distribution of θ:

$$\pi(\theta \mid Y_{\mathrm{obs}}, R) = \int_{\Xi} \pi(\theta, \xi \mid Y_{\mathrm{obs}}, R)\, d\xi \;\propto\; \int_{\Xi} f(Y_{\mathrm{obs}} \mid \theta)\, f(R \mid Y_{\mathrm{obs}}, \xi)\, \pi(\theta, \xi)\, d\xi$$

If the parameters are distinct in the sense that

π(θ, ξ) = π(θ) π(ξ)

then the marginal posterior distribution of θ satisfies

$$\pi(\theta \mid Y_{\mathrm{obs}}, R) \;\propto\; f(Y_{\mathrm{obs}} \mid \theta)\, \pi(\theta) \cdot \int_{\Xi} f(R \mid Y_{\mathrm{obs}}, \xi)\, \pi(\xi)\, d\xi$$

It follows that

$$\pi(\theta \mid Y_{\mathrm{obs}}, R) = \frac{f(Y_{\mathrm{obs}} \mid \theta)\, \pi(\theta)}{\int_{\Theta} f(Y_{\mathrm{obs}} \mid \theta)\, \pi(\theta)\, d\theta}$$

and hence π(θ | Y_obs, R) = π(θ | Y_obs).

Result:

The missing-data mechanism is ignorable for posterior inference about the parameter θ if

the data are missing at random (MAR), and

the parameters θ and ξ are distinct, that is, π(θ, ξ) = π(θ) π(ξ).
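A small numerical check of this result, assuming a hypothetical Bernoulli model with an MCAR mechanism (a special case of MAR) and invented data:

```python
import numpy as np
from itertools import product

# Numerical check of ignorability for a hypothetical Bernoulli model:
# y_i ~ Bernoulli(θ); each y_i is observed with probability ξ,
# independently of its value (MCAR); priors θ ~ U(0,1), ξ ~ U(0,1), distinct.

y_obs = [1, 1, 0, 1]   # invented observed values
n_mis = 2              # invented number of missing observations

theta = np.linspace(0.01, 0.99, 99)
xi = np.linspace(0.01, 0.99, 99)

# Full posterior of θ: sum over all completions y_mis, integrate ξ out.
post_full = np.zeros_like(theta)
for y_mis in product([0, 1], repeat=n_mis):
    y = y_obs + list(y_mis)
    lik_y = theta ** sum(y) * (1 - theta) ** (len(y) - sum(y))   # f(Y | θ)
    lik_r = (xi ** len(y_obs) * (1 - xi) ** n_mis).sum()         # ∫ f(R | ξ) π(ξ) dξ
    post_full += lik_y * lik_r
post_full /= post_full.sum()

# Ignorable posterior: observed-data likelihood times prior only.
post_ign = theta ** sum(y_obs) * (1 - theta) ** (len(y_obs) - sum(y_obs))
post_ign /= post_ign.sum()

print(np.allclose(post_full, post_ign))   # True: the mechanism is ignorable
```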
