     • date post

15-Feb-2020
• Category

Documents

• view

4

0

Embed Size (px)

Transcript of (5) Multi-parameter models - Summarizing the reich/ABA/notes/ (5) Multi-parameter models -...

• (5) Multi-parameter models - Summarizing the posterior

ST440/550: Applied Bayesian Analysis

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Models with more than one parameter

I Thus far we have studied single-parameter models, but most analyses have several parameters

I For example, consider the normal model: Yi ∼ N(µ, σ2) with priors µ ∼ N(µ0, σ20) and σ2 ∼ InvGamma(a,b)

I We want to study the joint posterior distribution p(µ, σ2|Y)

I As another example, consider the simple linear regression model

Yi ∼ N(β0 + X1iβ1, σ2)

I We want to study the joint posterior f (β0, β1, σ2|Y)

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Models with more than one parameter

I How to compute high-dimensional (many parameters) posterior distributions?

I How to visualize the posterior?

I How to summarize them concisely?

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Bayesian one-sample t-test

I In this section we will study the one-sample t-test in depth

I Likelihood: Yi |µ, σ ∼ N(µ, σ2) independent over i = 1, ...,n

I Priors: µ ∼ N(µ0, σ20) independent of σ2 ∼ InvGamma(a,b)

I The joint (bivariate PDF) of (µ, σ2) is proportional to{ σn exp

[ − ∑n

i=1(Yi − µ)2

2σ2

]} exp

[ − (µ− µ0)

2

2σ20

] (σ2)a−1 exp(− b

σ2 )

I How to summarize this complicated function?

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Plotting the posterior on a grid

I For models with only a few parameters we could simply plot the posterior on a grid

I That is, we compute p(µ, σ2|Y1, ...,Yn) for all combinations of m values of µ and m values of σ2

I The number of grid points is mp where p is the number of parameters in the model

I See http: //www4.stat.ncsu.edu/~reich/ABA/code/NN

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

http://www4.stat.ncsu.edu/~reich/ABA/code/NN http://www4.stat.ncsu.edu/~reich/ABA/code/NN

• Summarizing the results in a table

I Typically we are interested in the marginal posterior

f (µ|Y) = ∫ ∞

0 p(µ, σ2|Y)dσ2

where Y = (Y1, ...,Yn)

I This accounts for our uncertainty about σ2

I We could also report the marginal posterior of σ2

I Results are usually given in a table with marginal mean, SD, and 95% interval for all parameters of interest

I The marginal posteriors can be computed using numerical integration

I See http: //www4.stat.ncsu.edu/~reich/ABA/code/NN

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

http://www4.stat.ncsu.edu/~reich/ABA/code/NN http://www4.stat.ncsu.edu/~reich/ABA/code/NN

• Frequentist analysis of a normal mean

I In frequentist statistics the estimate of the mean is Ȳ

I If σ is known the 95% interval is

Ȳ ± z0.975 σ√ n

where z is the quantile of a normal distribution

I If σ is unknown the 95% interval is

Ȳ ± t0.975,n−1 s√ n

where t is the quantile of a t-distribution

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Bayesian analysis of a normal mean

I The Bayesian estimate of µ is its marginal posterior mean

I The interval estimate is the 95% posterior interval

I If σ is known the posterior of µ|Y is Gaussian and the 95% interval is

E(µ|Y)± z0.975SD(µ|Y)

I If σ is unknown the marginal (over σ2) posterior of µ is t with ν = n + 2a degrees of freedom.

I Therefore the 95% interval is

E(µ|Y)± t0.975,νSD(µ|Y)

I See “Marginal posterior of µ” on http://www4.stat. ncsu.edu/~reich/ABA/derivations5.pdf

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

http://www4.stat.ncsu.edu/~reich/ABA/derivations5.pdf http://www4.stat.ncsu.edu/~reich/ABA/derivations5.pdf

• Bayesian analysis of a normal mean

I The following two slides give the posterior of µ for a data set with sample mean 10 and sample variance 4

I The Gaussian analysis assumes σ2 = 4 is known

I The t analysis integrates over uncertainty in σ2

I As expected, the latter interval is a bit wider

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Bayesian analysis of a normal mean

6 8 10 12 14

0. 00

0. 01

0. 02

0. 03

0. 04

n = 5

µ

D en

si ty

Gaussian Student's t

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Bayesian analysis of a normal mean

8 9 10 11 12

0. 00

0 0.

00 2

0. 00

4 0.

00 6

0. 00

8 0.

01 0

n = 25

µ

D en

si ty

Gaussian Student's t

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Bayesian one sample t-test

I The one-sided test of H1 : µ ≤ 0 versus H2 : µ > 0 is conducted by computing the posterior probability of each hypothesis

I This is done with the pt function in R

I The two-sided test of H1 : µ = 0 versus H2 : µ 6= 0 is conducted by either

I Determining if 0 is in the 95% posterior interval I Bayes factor (later)

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Methods for dealing with multiple parameters

I In this case, we were able to compute the marginal posterior in closed form (a t distribution)

I We were also able to compute the posterior on a grid

I For most analyses the marginal posteriors will not be a nice distributions, and a grid is impossible if there are many parameters

I We need new tools!

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Methods for dealing with multiple parameters

Some approaches to dealing with complicated joint posteriors:

I Just use a point estimate, ignore uncertainty

I Approximate the posterior as normal

I Numerical integration

I Monte Carlo sampling

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• MAP estimation

I Summarizing an entire joint distribution is challenging

I Sometimes you don’t need an entire posterior distribution and a single point estimate will do

I Example: prediction in machine learning

I The Maximum a Posteriori (MAP) estimate is the posterior mode

θ̂MAP = argmin θ

p(θ|Y)

I This is similar to the maximum likelihood estimation but includes the prior

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Univariate example

Say Y |θ ∼ Binomial(n, θ) and θ ∼ Beta(0.5,0.5), find θ̂MAP

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Bayesian central limit theorem

I Another simplification is to approximate the posterior as Gaussian

I Berstein-Von Mises Theorem: As the sample size grows the posterior doesn’t depend on the prior

I Frequentist result: As the sample size grows the likelihood function is approximately normal

I Bayesian CLT: For large n and some other conditions θ|Y ≈ Normal

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Bayesian central limit theorem

I Bayesian CLT: For large n and some other conditions

θ ∼ Normal[θ̂MAP , I(θ̂MAP)−1]

I I is Fisher’s information matrix

I The (j , k) element of I is

− ∂ 2

∂θj∂θk log[p(θ|Y)]

evaluated at θ̂MAP

I We have marginal and conditional means, standard deviations and intervals for the normal distribution

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

• Univariate example Say Y |θ ∼ Binomial(n, θ) and θ ∼ Beta(0.5,0.5), find the Gaussian approximation for p(θ|Y)

http: //www4.stat.ncsu.edu/~reich/ABA/code/Bayes_CLT

ST440/550: Applied Bayesian Analysis (5) Multi-parameter models - Summarizing the posterior

http://www4.stat.ncsu.edu/~reich/ABA/code/Bayes_CLT http://www4.stat.ncsu.edu/~reich/ABA/code/Bayes_CLT

• Numerical integration

I Many posterior summaries of interest are integrals over the posterior

I Ex: E(θj |Y) = ∫ θjp(θ)dθ

I Ex: V(θj |Y) = ∫

[θj − E(θ|Y)]2p(θ)dθ

I These are p dimensional integrals that we usually can’t solve analytically

I A grid approximation is a crude approach