Bayes’ Theorem
Let A and B1, . . . , Bk be events in a sample space Ω.
Inversion problem: given P(A|Bj) (and P(Bj)), find P(Bj|A)
Bayes’ Theorem:
P(Bj|A) = P(A|Bj) P(Bj) / P(A) = P(A|Bj) P(Bj) / ∑_{i=1}^k P(A|Bi) P(Bi)
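The inversion can be sketched numerically. The probabilities below are made up for illustration; the Bi partition the sample space, so the denominator is the law of total probability.

```python
# Discrete Bayes' theorem: recover P(Bj | A) from P(A | Bj) and P(Bj).
# The numbers are illustrative, not from the text.
p_B = [0.3, 0.5, 0.2]          # prior probabilities P(Bi), summing to 1
p_A_given_B = [0.9, 0.4, 0.1]  # conditional probabilities P(A | Bi)

# Law of total probability: P(A) = sum_i P(A | Bi) P(Bi)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))

# Bayes' theorem for each j
p_B_given_A = [pa * pb / p_A for pa, pb in zip(p_A_given_B, p_B)]
print(p_A)          # 0.49
print(p_B_given_A)  # posterior probabilities, summing to 1
```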
For continuous random variables X and Y, Bayes' Theorem is formulated
in terms of densities:
fX|Y(x|y) = fY|X(y|x) fX(x) / fY(y) = fY|X(y|x) fX(x) / ∫ fY|X(y|x) fX(x) dx
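The density version can be checked on a grid. This is a sketch with an assumed normal prior X ~ N(0, 1) and likelihood Y | X = x ~ N(x, 1) (not from the text); by conjugacy the exact posterior is X | Y = y ~ N(y/2, 1/2), which validates the grid computation.

```python
import numpy as np

# Continuous Bayes' theorem on a grid (illustrative sketch):
# prior X ~ N(0, 1), likelihood Y | X = x ~ N(x, 1).
y = 1.2
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
f_X = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
f_Y_given_X = np.exp(-(y - x)**2 / 2) / np.sqrt(2 * np.pi)

# denominator: fY(y) = ∫ fY|X(y|x) fX(x) dx, as a Riemann sum on the grid
f_Y = np.sum(f_Y_given_X * f_X) * dx
f_X_given_Y = f_Y_given_X * f_X / f_Y   # Bayes' theorem for densities

post_mean = np.sum(x * f_X_given_Y) * dx
print(post_mean)  # close to the exact posterior mean y/2 = 0.6
```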
Application to statistical inference:
Probabilistic model: f(y|θ), the distribution of Y for fixed θ
Statistical problem: given data y, make statements about θ
Likelihood: l(θ|y) = f(y|θ) (reflects the inversion problem)
Bayesian approach:
A Bayesian statistical (parametric) model consists of
f(y|θ), a parametric statistical model (likelihood function), and
π(θ), a prior distribution on the parameters.
The posterior distribution of the parameter θ is
π(θ|y) = fY|θ(y|θ) π(θ) / ∫_Θ fY|θ(y|θ) π(θ) dθ ∝ fY|θ(y|θ) π(θ)
The Bayesian modelling approach can be summarized by
posterior ∝ likelihood × prior.
Bayesian interpretation of probability
probability = (subjective) uncertainty
Bayesian Inference, Apr 20, 2004 - 1 -
Bayesian Inference
Example: Binomial distribution
Likelihood function
Y |θ ∼ Bin(n, θ)
Prior distribution
θ ∼ U(0, 1) = Beta(1, 1)
Posterior distribution
θ|Y ∼ Beta(1 + Y, 1 + n− Y )
Uncertainty about the parameter can be updated repeatedly when new data become available:
- take the current posterior distribution as the prior
- compute the new posterior distribution conditional on the new data
[Figure: five panels showing the posterior density of θ (x-axis θ ∈ [0, 1], y-axis posterior density) after successive updates with new data]
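The two updating steps can be sketched with the Beta-Binomial conjugacy from the example. The batch data below are made up; the point is that updating sequentially gives the same posterior as updating once with the pooled data.

```python
# Sequential Beta-Binomial updating (sketch; data are illustrative).
# Prior: theta ~ Beta(1, 1).  Each batch of Bernoulli trials updates
# (a, b) -> (a + successes, b + failures); the order does not matter.
a, b = 1.0, 1.0
batches = [(3, 10), (7, 10), (4, 10)]  # (successes, trials) per batch

for y, n in batches:
    a, b = a + y, b + (n - y)  # current posterior becomes the new prior

# Same result as one update with the pooled data: 14 successes in 30 trials
print(a, b)  # Beta(1 + 14, 1 + 16) = Beta(15, 17)
```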
The posterior distribution is used for inference about θ:
- posterior mean: E(θ|Y)
- posterior variance: var(θ|Y) = E((θ − E(θ|Y))² | Y)
- posterior confidence interval (credibility interval): θl, θr with ∫_{θl}^{θr} π(θ|Y) dθ = 1 − α
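For the Beta posterior of the binomial example these quantities have closed forms; the equal-tailed credibility interval can be read off the posterior quantile function. A sketch with illustrative data (Y = 14 successes in n = 30 trials), assuming SciPy is available:

```python
from scipy.stats import beta

# Posterior summaries for the binomial example: theta | Y ~ Beta(a, b)
# with a = 1 + Y, b = 1 + n - Y (illustrative data: Y = 14, n = 30).
a, b = 1 + 14, 1 + 30 - 14

post_mean = a / (a + b)                        # E(theta | Y)
post_var = a * b / ((a + b)**2 * (a + b + 1))  # var(theta | Y)

# Equal-tailed 95% credibility interval: P(theta_l < theta < theta_r | Y) = 0.95
alpha = 0.05
theta_l = beta.ppf(alpha / 2, a, b)
theta_r = beta.ppf(1 - alpha / 2, a, b)
print(post_mean, post_var)
print(theta_l, theta_r)
```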
Conjugate Priors
A mathematically convenient choice is a conjugate prior: the posterior
distribution belongs to the same parametric family as the prior distribution,
with different parameters:
Likelihood        Prior            Posterior
f(y|θ)            π(θ)             π(θ|y)

Normal            Normal           Normal
N(θ, σ²)          N(µ, τ²)         N((σ²µ + τ²y)/(σ² + τ²), σ²τ²/(σ² + τ²))

Poisson           Gamma            Gamma
Poisson(θ)        Γ(α, β)          Γ(α + y, β + 1)

Gamma             Gamma            Gamma
Γ(ν, θ)           Γ(α, β)          Γ(α + ν, β + y)

Binomial          Beta             Beta
Bin(n, θ)         Beta(α, β)       Beta(α + y, β + n − y)

Multinomial       Dirichlet        Dirichlet
Mk(θ1, …, θk)     D(α1, …, αk)     D(α1 + y1, …, αk + yk)

Normal            Gamma            Gamma
N(µ, 1/θ)         Γ(α, β)          Γ(α + 1/2, β + (µ − y)²/2)
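A table row can be verified numerically: the normalized product of likelihood and prior on a grid should match the tabulated posterior density. Below this is sketched for the Poisson-Gamma row with made-up hyperparameters; the Γ(α, β) entries use the rate parameterization, so SciPy's `scale` argument is 1/β.

```python
import numpy as np
from scipy.stats import gamma, poisson

# Check the Poisson-Gamma row: likelihood Poisson(theta), prior
# Gamma(alpha, beta) (rate parameterization), posterior Gamma(alpha + y, beta + 1).
alpha, beta_, y = 2.0, 1.5, 4

theta = np.linspace(1e-6, 30.0, 200001)
dtheta = theta[1] - theta[0]
prior = gamma.pdf(theta, alpha, scale=1 / beta_)
lik = poisson.pmf(y, theta)

unnorm = lik * prior
post_grid = unnorm / (unnorm.sum() * dtheta)   # normalize on the grid
post_exact = gamma.pdf(theta, alpha + y, scale=1 / (beta_ + 1))

err = np.max(np.abs(post_grid - post_exact))
print(err)  # small: the grid posterior matches the tabulated Gamma posterior
```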
Problems in the choice of prior:
- The conjugate priors might not reflect our uncertainty about θ correctly.
- In general, for non-conjugate priors the posterior distribution is not available in analytic form.
- It is difficult to describe uncertainty about θ in the form of a particular distribution. In particular, we might be uncertain about the parameters of the prior distribution (→ hierarchical modelling, empirical Bayesian methods).
Bayesian Analysis with Missing Data
Bayesian statistical model:
Data model:
f(Y |θ) complete-data likelihood
f(R|Y, ξ) missing-data mechanism
Prior distribution:
π(θ, ξ)
The posterior distribution of θ and ξ is
π(θ, ξ|Yobs, R) ∝ f(Yobs, R|θ, ξ) π(θ, ξ)
= ∫ f(Yobs, ymis, R|θ, ξ) π(θ, ξ) dymis
= ∫ f(Yobs, ymis|θ) f(R|Yobs, ymis, ξ) π(θ, ξ) dymis
If the data are missing at random (MAR), then
π(θ, ξ|Yobs, R) ∝ ∫ f(Yobs, ymis|θ) f(R|Yobs, ξ) π(θ, ξ) dymis
= ∫ f(Yobs, ymis|θ) dymis · f(R|Yobs, ξ) π(θ, ξ)
= f(Yobs|θ) f(R|Yobs, ξ) π(θ, ξ)
For inference on θ, we consider the marginal posterior distribution of θ
π(θ|Yobs, R) = ∫_Ξ π(θ, ξ|Yobs, R) dξ ∝ ∫_Ξ f(Yobs|θ) f(R|Yobs, ξ) π(θ, ξ) dξ
If the parameters are distinct in the sense that
π(θ, ξ) = π(θ) π(ξ)
then the marginal posterior distribution of θ satisfies
π(θ|Yobs, R) ∝ f(Yobs|θ) π(θ) · ∫_Ξ f(R|Yobs, ξ) π(ξ) dξ
It follows that
π(θ|Yobs, R) = f(Yobs|θ) π(θ) / ∫_Θ f(Yobs|θ) π(θ) dθ
and hence π(θ|Yobs, R) = π(θ|Yobs).
Result:
The missing data mechanism is ignorable for posterior inference about
the parameter θ if
- the data are missing at random (MAR), and
- the parameters θ and ξ are distinct, that is, π(θ, ξ) = π(θ) π(ξ).
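The ignorability result can be checked numerically on a grid. The example below is a sketch with made-up data: Bernoulli(θ) observations with 7 successes in 10 observed trials and 3 values missing completely at random with probability ξ, and flat priors so that π(θ, ξ) = π(θ) π(ξ). The marginal posterior of θ then coincides with the posterior that ignores R entirely.

```python
import numpy as np

# Numerical check of ignorability (illustrative sketch; data are made up).
theta = np.linspace(1e-3, 1 - 1e-3, 999)  # success probability of the data model
xi = np.linspace(1e-3, 1 - 1e-3, 999)     # probability that a value is missing

# Observed data: 7 successes in 10 observed Bernoulli trials, 3 values missing.
f_Yobs = theta**7 * (1 - theta)**3        # f(Yobs | theta), up to a constant
f_R = xi**3 * (1 - xi)**10                # f(R | Yobs, xi), up to a constant (MAR)

# Joint posterior on the grid under flat priors: it factorizes over (theta, xi)
joint = np.outer(f_Yobs, f_R)
marg_theta = joint.sum(axis=1)
marg_theta /= marg_theta.sum()            # marginal posterior pi(theta | Yobs, R)

ignores_R = f_Yobs / f_Yobs.sum()         # posterior pi(theta | Yobs), R ignored
err = np.max(np.abs(marg_theta - ignores_R))
print(err)  # zero up to floating-point error
```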