
  • (5) Multi-parameter models - Summarizing the posterior

    ST440/550: Applied Bayesian Analysis


  • Models with more than one parameter

    I Thus far we have studied single-parameter models, but most analyses have several parameters

    I For example, consider the normal model: Yi ∼ N(µ, σ²) with priors µ ∼ N(µ0, σ0²) and σ² ∼ InvGamma(a, b)

    I We want to study the joint posterior distribution p(µ, σ2|Y)

    I As another example, consider the simple linear regression model

    Yi ∼ N(β0 + X1iβ1, σ2)

    I We want to study the joint posterior f (β0, β1, σ2|Y)

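    To make the several-parameters point concrete, below is a minimal R sketch of the (unnormalized) log posterior for the simple linear regression model above. The priors and hyperparameters (independent N(0, 100) priors on β0 and β1, InvGamma(0.1, 0.1) on σ²) are illustrative assumptions, not values from these notes.

      # Unnormalized log posterior for Y_i ~ N(beta0 + beta1 * x_i, sig2)
      # Priors (illustrative): beta0, beta1 ~ N(0, 100); sig2 ~ InvGamma(0.1, 0.1)
      log_post_reg <- function(beta0, beta1, sig2, y, x) {
        log_lik   <- sum(dnorm(y, mean = beta0 + beta1 * x, sd = sqrt(sig2), log = TRUE))
        log_prior <- dnorm(beta0, mean = 0, sd = 10, log = TRUE) +
                     dnorm(beta1, mean = 0, sd = 10, log = TRUE) +
                     (-0.1 - 1) * log(sig2) - 0.1 / sig2   # InvGamma(0.1, 0.1) kernel
        log_lik + log_prior
      }

    Even this simple model has three unknowns, so its posterior is a three-dimensional function of (β0, β1, σ²).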

  • Models with more than one parameter

    I How to compute high-dimensional (many parameters) posterior distributions?

    I How to visualize the posterior?

    I How to summarize them concisely?


  • Bayesian one-sample t-test

    I In this section we will study the one-sample t-test in depth

    I Likelihood: Yi |µ, σ ∼ N(µ, σ2) independent over i = 1, ...,n

    I Priors: µ ∼ N(µ0, σ0²) independent of σ² ∼ InvGamma(a, b)

    I The joint (bivariate PDF) of (µ, σ²) is proportional to

      { σ^(−n) exp[ −∑_{i=1}^n (Yi − µ)² / (2σ²) ] } × exp[ −(µ − µ0)² / (2σ0²) ] × (σ²)^(−a−1) exp(−b/σ²)

    I How to summarize this complicated function?

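    This kernel is easiest to work with on the log scale. Below is a minimal R sketch that codes it term by term; the default hyperparameters (µ0 = 0, σ0² = 100, a = b = 0.1) are illustrative placeholders, not values from these notes.

      # Log of the joint posterior kernel p(mu, sig2 | Y) given above
      log_kernel <- function(mu, sig2, y, mu0 = 0, s20 = 100, a = 0.1, b = 0.1) {
        n        <- length(y)
        log_lik  <- -n / 2 * log(sig2) - sum((y - mu)^2) / (2 * sig2)  # sigma^(-n) exp(...) term
        log_pri1 <- -(mu - mu0)^2 / (2 * s20)                          # N(mu0, s20) prior on mu
        log_pri2 <- (-a - 1) * log(sig2) - b / sig2                    # InvGamma(a, b) prior on sig2
        log_lik + log_pri1 + log_pri2
      }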

  • Plotting the posterior on a grid

    I For models with only a few parameters we could simply plot the posterior on a grid

    I That is, we compute p(µ, σ2|Y1, ...,Yn) for all combinations of m values of µ and m values of σ2

    I The number of grid points is m^p, where p is the number of parameters in the model

    I See http://www4.stat.ncsu.edu/~reich/ABA/code/NN

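    A minimal grid evaluation in the same spirit as the NN example linked above (the toy data, grid ranges, and hyperparameters are illustrative and not taken from that code), using the log_kernel function sketched earlier:

      y    <- c(9.1, 11.3, 10.2, 8.7, 10.9)          # toy data
      mu_g <- seq(5, 15, length = 100)               # grid of mu values (m = 100)
      s2_g <- seq(0.1, 20, length = 100)             # grid of sigma^2 values

      # Evaluate the log kernel at all m^2 grid points, then exponentiate safely
      lp   <- outer(mu_g, s2_g, Vectorize(function(m, s2) log_kernel(m, s2, y)))
      post <- exp(lp - max(lp))                      # subtract the max to avoid underflow
      post <- post / sum(post)                       # normalize over the grid

      image(mu_g, s2_g, post, xlab = expression(mu), ylab = expression(sigma^2))
      contour(mu_g, s2_g, post, add = TRUE)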

  • Summarizing the results in a table

    I Typically we are interested in the marginal posterior

    f(µ|Y) = ∫_0^∞ p(µ, σ²|Y) dσ²,  where Y = (Y1, ..., Yn)

    I This accounts for our uncertainty about σ2

    I We could also report the marginal posterior of σ2

    I Results are usually given in a table with marginal mean, SD, and 95% interval for all parameters of interest

    I The marginal posteriors can be computed using numerical integration

    I See http://www4.stat.ncsu.edu/~reich/ABA/code/NN

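    Continuing the grid sketch above (not the linked NN code), the marginal posterior of µ can be approximated by summing the normalized joint grid over σ², and the usual table entries follow from that discrete approximation:

      # Marginal posterior of mu: sum the joint grid over the sigma^2 dimension
      marg_mu <- rowSums(post)
      marg_mu <- marg_mu / sum(marg_mu)

      post_mean <- sum(mu_g * marg_mu)                          # marginal posterior mean
      post_sd   <- sqrt(sum((mu_g - post_mean)^2 * marg_mu))    # marginal posterior SD

      # Approximate 95% equal-tailed interval from the discrete CDF
      cdf <- cumsum(marg_mu)
      ci  <- c(mu_g[which.min(abs(cdf - 0.025))], mu_g[which.min(abs(cdf - 0.975))])
      round(c(mean = post_mean, sd = post_sd, lower = ci[1], upper = ci[2]), 2)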

  • Frequentist analysis of a normal mean

    I In frequentist statistics the estimate of the mean is Ȳ

    I If σ is known the 95% interval is

    Ȳ ± z_{0.975} σ/√n

    where z_{0.975} is the 0.975 quantile of the standard normal distribution

    I If σ is unknown the 95% interval is

    Ȳ ± t_{0.975,n−1} s/√n

    where t_{0.975,n−1} is the 0.975 quantile of the t-distribution with n − 1 degrees of freedom

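    In R, for the toy data used in the earlier sketches, this interval is a direct transcription of the formula (and matches t.test(y)$conf.int):

      ybar <- mean(y);  s <- sd(y);  n <- length(y)
      ybar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)   # frequentist 95% CI for mu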

  • Bayesian analysis of a normal mean

    I The Bayesian estimate of µ is its marginal posterior mean

    I The interval estimate is the 95% posterior interval

    I If σ is known the posterior of µ|Y is Gaussian and the 95% interval is

    E(µ|Y) ± z_{0.975} SD(µ|Y)

    I If σ is unknown the marginal (over σ2) posterior of µ is t with ν = n + 2a degrees of freedom.

    I Therefore the 95% interval is

    E(µ|Y) ± t_{0.975,ν} SD(µ|Y)

    I See “Marginal posterior of µ” on http://www4.stat.ncsu.edu/~reich/ABA/derivations5.pdf

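    A hedged sketch of this interval, transcribing the formula above and plugging in the grid-based approximations of E(µ|Y) and SD(µ|Y) from earlier; a = 0.1 is the illustrative hyperparameter used in those sketches:

      a  <- 0.1
      nu <- length(y) + 2 * a                                # df of the marginal t posterior
      post_mean + c(-1, 1) * qt(0.975, df = nu) * post_sd    # Bayesian 95% interval for mu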

  • Bayesian analysis of a normal mean

    I The following two slides give the posterior of µ for a data set with sample mean 10 and sample variance 4

    I The Gaussian analysis assumes σ2 = 4 is known

    I The t analysis integrates over uncertainty in σ2

    I As expected, the latter interval is a bit wider


  • Bayesian analysis of a normal mean

    [Figure: marginal posterior density of µ for n = 5, comparing the Gaussian analysis (σ² known) with the Student's t analysis (σ² unknown)]

  • Bayesian analysis of a normal mean

    [Figure: marginal posterior density of µ for n = 25, comparing the Gaussian and Student's t analyses]

  • Bayesian one-sample t-test

    I The one-sided test of H1 : µ ≤ 0 versus H2 : µ > 0 is conducted by computing the posterior probability of each hypothesis

    I This is done with the pt function in R

    I The two-sided test of H1 : µ = 0 versus H2 : µ ≠ 0 is conducted by either

    I Determining if 0 is in the 95% posterior interval

    I Bayes factor (later)

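    A sketch of the one-sided test with pt, standardizing by the posterior SD as in the interval above and reusing post_mean, post_sd, and nu from the earlier sketches:

      # Posterior probabilities of H1: mu <= 0 and H2: mu > 0
      t_stat  <- (0 - post_mean) / post_sd
      prob_H1 <- pt(t_stat, df = nu)        # P(mu <= 0 | Y)
      prob_H2 <- 1 - prob_H1                # P(mu > 0 | Y)
      c(H1 = prob_H1, H2 = prob_H2)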

  • Methods for dealing with multiple parameters

    I In this case, we were able to compute the marginal posterior in closed form (a t distribution)

    I We were also able to compute the posterior on a grid

    I For most analyses the marginal posteriors will not be nice distributions, and a grid is impossible if there are many parameters

    I We need new tools!


  • Methods for dealing with multiple parameters

    Some approaches to dealing with complicated joint posteriors:

    I Just use a point estimate, ignore uncertainty

    I Approximate the posterior as normal

    I Numerical integration

    I Monte Carlo sampling


  • MAP estimation

    I Summarizing an entire joint distribution is challenging

    I Sometimes you don’t need an entire posterior distribution and a single point estimate will do

    I Example: prediction in machine learning

    I The Maximum a Posteriori (MAP) estimate is the posterior mode

    θ̂MAP = argmax_θ p(θ|Y)

    I This is similar to maximum likelihood estimation but includes the prior


  • Univariate example

    Say Y|θ ∼ Binomial(n, θ) and θ ∼ Beta(0.5, 0.5); find θ̂MAP (see the sketch below)

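    A minimal R sketch for this example: the posterior is Beta(Y + 0.5, n − Y + 0.5), so θ̂MAP is available in closed form from the Beta mode and can also be found numerically with optimize. The values Y = 6 and n = 10 are illustrative.

      Y <- 6;  n <- 10                                  # illustrative data

      # Log posterior kernel: theta^(Y - 0.5) * (1 - theta)^(n - Y - 0.5)
      log_post_theta <- function(theta) (Y - 0.5) * log(theta) + (n - Y - 0.5) * log(1 - theta)

      map_num   <- optimize(log_post_theta, interval = c(0, 1), maximum = TRUE)$maximum
      map_exact <- (Y - 0.5) / (n - 1)                  # mode of Beta(Y + 0.5, n - Y + 0.5)
      c(numerical = map_num, exact = map_exact)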

  • Bayesian central limit theorem

    I Another simplification is to approximate the posterior as Gaussian

    I Bernstein-von Mises theorem: as the sample size grows, the posterior depends less and less on the prior

    I Frequentist result: As the sample size grows the likelihood function is approximately normal

    I Bayesian CLT: For large n and some other conditions θ|Y ≈ Normal


  • Bayesian central limit theorem

    I Bayesian CLT: For large n and some other conditions

    θ|Y ≈ Normal[θ̂MAP, I(θ̂MAP)^(−1)]

    I I is Fisher’s information matrix

    I The (j, k) element of I is

      −∂²/(∂θj ∂θk) log[p(θ|Y)]

      evaluated at θ̂MAP

    I For the normal distribution, marginal and conditional means, standard deviations, and intervals are all available in closed form


  • Univariate example

    Say Y|θ ∼ Binomial(n, θ) and θ ∼ Beta(0.5, 0.5); find the Gaussian approximation for p(θ|Y)

    http://www4.stat.ncsu.edu/~reich/ABA/code/Bayes_CLT

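    A sketch of this approximation in R (in the spirit of the linked Bayes_CLT code, but not taken from it), reusing Y and n from the MAP sketch: center the normal at θ̂MAP and use the inverse of the negative second derivative of the log posterior as the variance.

      # Laplace / Bayesian CLT approximation: theta | Y ~ approx Normal(map, 1 / info)
      map  <- (Y - 0.5) / (n - 1)                               # MAP estimate
      info <- (Y - 0.5) / map^2 + (n - Y - 0.5) / (1 - map)^2   # -d^2/dtheta^2 log p(theta | Y)

      theta <- seq(0.01, 0.99, length = 200)
      plot(theta, dbeta(theta, Y + 0.5, n - Y + 0.5), type = "l",
           xlab = expression(theta), ylab = "Posterior density")            # exact Beta posterior
      lines(theta, dnorm(theta, mean = map, sd = 1 / sqrt(info)), lty = 2)  # Gaussian approximation
      legend("topleft", c("Exact", "Gaussian approximation"), lty = 1:2)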

  • Numerical integration

    I Many posterior summaries of interest are integrals over the posterior

    I Ex: E(θj|Y) = ∫ θj p(θ|Y) dθ

    I Ex: V(θj|Y) = ∫ [θj − E(θj|Y)]² p(θ|Y) dθ

    I These are p-dimensional integrals that we usually can't solve analytically

    I A grid approximation is a crude approach

    I Gaussian quadrature is better

    I The Integrated Nested Laplace Approximation (INLA) is an even more sophisticated method

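    For a one-parameter illustration, R's integrate function handles these integrals directly. The sketch below computes E(θ|Y) and V(θ|Y) for the beta-binomial example used earlier; their exact values are the Beta posterior moments, so this only demonstrates the numerical approach.

      # Unnormalized posterior kernel for the beta-binomial example (Y and n as before)
      kern <- function(theta) theta^(Y - 0.5) * (1 - theta)^(n - Y - 0.5)

      nc         <- integrate(kern, 0, 1)$value                              # normalizing constant
      theta_mean <- integrate(function(t) t * kern(t) / nc, 0, 1)$value      # E(theta | Y)
      theta_var  <- integrate(function(t) (t - theta_mean)^2 * kern(t) / nc, 0, 1)$value  # V(theta | Y)
      c(mean = theta_mean, var = theta_var)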