Bayesian Analysis of Binary and Polychotomous Response Data
Presentation on Bayesian Analysis of Binary and Polychotomous Response Data
Author(s): James H. Albert and Siddhartha Chib
By: Mohit Shukla
11435
Course: ECO543A
Introduction
Suppose Y_i ~ Bernoulli(p_i)
p_i = H(x_i^T β)
If H is the standard Gaussian cdf => probit; if H is the logistic cdf => logit
Problem: Given a prior π(β), proper or improper, the posterior π(β | data) is hard to deal with.
Introduction
Possible solutions: for small models, numerical integration; for large models, Monte Carlo integration.
Proposed solution: a simulation-based approach for computing the exact posterior.
The Concept
Introduce N latent variables Z_i ~ N(x_i^T β, 1)
Define Y_i = 1 if Z_i > 0 and Y_i = 0 if Z_i ≤ 0
Given the data Y_i, the Z_i's follow a truncated normal distribution.
This observation, combined with Gibbs sampling, allows us to simulate from the exact posterior distribution of β.
The Concept
This approach connects the probit binary regression model on Y_i to a normal linear regression model on Z_i.
The sampling approach allows us to compute the posteriors of many parameters.
The Bayesian residual is continuous, so it provides more information about outliers.
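The truncated-normal draw described above is the workhorse of the method. A minimal sketch of that draw via the inverse CDF, using only the Python standard library (the function name is my own):

```python
import random
from statistics import NormalDist

def draw_truncated_normal(mean, low, high, rng=random):
    """Draw Z ~ N(mean, 1) conditioned on low < Z <= high via inverse-CDF sampling."""
    nd = NormalDist(mu=mean, sigma=1.0)
    u = rng.uniform(nd.cdf(low), nd.cdf(high))   # uniform on the truncated CDF range
    u = min(max(u, 1e-12), 1.0 - 1e-12)          # guard against inv_cdf(0) / inv_cdf(1)
    return nd.inv_cdf(u)

# Z_i | Y_i = 1 is N(x_i' beta, 1) truncated to (0, inf);
# Z_i | Y_i = 0 is N(x_i' beta, 1) truncated to (-inf, 0].
z_pos = draw_truncated_normal(0.5, 0.0, float("inf"))
z_neg = draw_truncated_normal(0.5, float("-inf"), 0.0)
```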
The Gibbs Sampler
Interest: to simulate from the posterior of Θ = (Θ_1, ..., Θ_p). It is easier to simulate from the full conditional distributions π(Θ_k | {Θ_j, j ≠ k}).
Gibbs sampler: start from initial guesses Θ_1^(0), Θ_2^(0), Θ_3^(0), ..., Θ_p^(0)
First cycle:
Θ_1^(1) from π(Θ_1 | Θ_2^(0), Θ_3^(0), ..., Θ_p^(0))
Θ_2^(1) from π(Θ_2 | Θ_1^(1), Θ_3^(0), ..., Θ_p^(0))
...
Θ_p^(1) from π(Θ_p | Θ_1^(1), Θ_2^(1), ..., Θ_{p-1}^(1))
The Gibbs Sampler
The cycle is iterated t times to get Θ^(t).
As t → ∞, the distribution of Θ^(t) converges to the joint posterior distribution of Θ.
Replicating this process m times gives draws Θ_1j^(t), Θ_2j^(t), ..., Θ_pj^(t) (j = 1, ..., m), which can be used to estimate posterior moments and densities.
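The cycling scheme can be made concrete on a toy target where the full conditionals are known exactly: a standard bivariate normal with correlation ρ, where X | Y = y ~ N(ρy, 1 − ρ²) and symmetrically for Y. A sketch (all names my own):

```python
import random

def gibbs_bivariate_normal(rho, t, rng=random):
    """Run t Gibbs cycles targeting a standard bivariate normal with correlation rho.
    Full conditionals: X | Y = y ~ N(rho*y, 1 - rho^2), and symmetrically for Y."""
    sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0              # initial guesses Theta^(0)
    draws = []
    for _ in range(t):           # one cycle updates each coordinate in turn
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        draws.append((x, y))
    return draws

random.seed(1)
draws = gibbs_bivariate_normal(0.8, 5000)
# For large t the pairs behave like samples from the joint target distribution.
```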
The Gibbs Sampler
Drawbacks: samples for t < t* are discarded, and after initiation it may be necessary to repeat the simulation with a larger number of replications for accuracy.
"One run" solution: only one replication is used, and the cycle is repeated a large number of times.
The Gibbs Sampler
Collect the values starting at the cycle t where Θ^(t) is approximately a simulated value from the posterior of Θ. The main objective is to collect a large number of values from the joint posterior of Θ. Since Θ^(t) and Θ^(t+1) are strongly correlated, collect values at cycles t, t+n, t+2n, ...
The Gibbs Sampler
Another goal is to estimate the densities of individual functions g(Θ_k).
Use a kernel density estimate over the simulated values of g(Θ_k).
E[g(Θ_k)] can also be estimated either by the sample mean of the g(Θ_k^(i)) or by the sample mean of E[g(Θ_k) | {Θ_r^(i), r ≠ k}].
Standard error = sd of batch means / √(no. of batches), used when the lag-1 autocorrelation of the batch means is small (≤ 0.05).
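The batch-means estimate can be sketched directly: split the chain into batches, then divide the standard deviation of the batch means by the square root of the number of batches (all names mine):

```python
import random

def batch_means_se(draws, n_batches):
    """Standard error of the chain mean: sd of batch means / sqrt(no. of batches)."""
    size = len(draws) // n_batches
    means = [sum(draws[b * size:(b + 1) * size]) / size for b in range(n_batches)]
    grand = sum(means) / n_batches
    var = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return (var ** 0.5) / n_batches ** 0.5

random.seed(0)
chain = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # stand-in for Gibbs output
se = batch_means_se(chain, 50)  # for iid N(0,1) draws this is near 1/sqrt(10000) = 0.01
```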
Data Augmentation and Gibbs Sampling for Binary Data
Introduction
Let H = Φ, i.e. the probit model. Introduce Z_i's and Y_i's such that p_i = Φ(x_i^T β) = P(Y_i = 1).
The joint posterior density of β and Z is complicated; hence Gibbs sampling is used. The marginal posterior distribution of β is easier to obtain this way.
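Putting the two blocks together, here is a compact sketch of the sampler for the probit model under a flat prior on β: β | Z is normal around the least-squares estimate, and each Z_i | β, Y_i is a truncated normal. numpy is assumed, the data are simulated, and all names are mine:

```python
import random
from statistics import NormalDist

import numpy as np

def probit_gibbs(X, y, n_iter=300, seed=1):
    """Gibbs sampler for probit regression with a flat prior on beta.
    Block 1: Z_i | beta, y_i ~ N(x_i'beta, 1) truncated to (0, inf) if y_i = 1,
             and to (-inf, 0] if y_i = 0.
    Block 2: beta | Z ~ N((X'X)^{-1} X'Z, (X'X)^{-1})."""
    rng = random.Random(seed)
    nd = NormalDist()
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    z = np.zeros(n)
    draws = []
    for _ in range(n_iter):
        mu = X @ beta
        for i in range(n):
            # inverse-CDF draw of the truncated standard-normal error e = Z_i - mu_i
            cut = nd.cdf(-mu[i])
            lo, hi = (cut, 1.0) if y[i] == 1 else (0.0, cut)
            u = min(max(rng.uniform(lo, hi), 1e-12), 1.0 - 1e-12)
            z[i] = mu[i] + nd.inv_cdf(u)
        beta_hat = XtX_inv @ (X.T @ z)
        noise = np.array([rng.gauss(0.0, 1.0) for _ in range(p)])
        beta = beta_hat + chol @ noise
        draws.append(beta.copy())
    return np.array(draws)

# Simulated example: intercept 0.5, slope -1.0.
data_rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), data_rng.normal(size=200)])
y = (X @ np.array([0.5, -1.0]) + data_rng.normal(size=200) > 0).astype(int)
draws = probit_gibbs(X, y)
post_mean = draws[100:].mean(axis=0)  # discard the first cycles as burn-in
```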
The t Link
Generalize the probit link by choosing H as a t-distribution cdf. This helps in investigating the sensitivity of the fitted probabilities to the choice of link function. The most popular link function for binary data is the logit, and the logistic distribution is approximately a member of the t family with about 8 degrees of freedom.
The t Link
To implement the Gibbs sampler:
Set β = the LSE under the probit model
Set λ_i = 1 for all i
Cycle through the full conditional equations in that order
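Under the scale-mixture representation of the t link (Z_i | λ_i ~ N(x_i^T β, 1/λ_i) with λ_i ~ Gamma with shape ν/2 and rate ν/2), the full conditional of each λ_i is again a Gamma distribution. A sketch of that update (function name mine):

```python
import random

def draw_lambda(z_i, mu_i, nu, rng=random):
    """lambda_i | z_i, beta ~ Gamma(shape=(nu+1)/2, rate=(nu + (z_i - mu_i)^2)/2)."""
    shape = (nu + 1.0) / 2.0
    rate = (nu + (z_i - mu_i) ** 2) / 2.0
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes (shape, scale)

# With a zero residual the conditional mean is (nu + 1) / nu, e.g. 9/8 for nu = 8.
random.seed(0)
lam_draws = [draw_lambda(0.0, 0.0, 8) for _ in range(20_000)]
```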
Hierarchical Analysis
The normal regression structure on Z also motivates the normal hierarchical model:
Z ~ N(Xβ, I)
β ~ N(Aβ_0, σ²I)
(β_0, σ²) distributed according to a prior π(β_0, σ²)
Generalization to a Multinomial Model
Ordered Categories
Let p_ij = P[Y_i = j]
Regression model: η_ij = Φ(γ_j − x_i^T β), i = 1, ..., N and j = 1, ..., J−1
Motivation: latent Z_i ~ N(x_i^T β, 1)
Y_i = j if γ_{j−1} < Z_i ≤ γ_j (with γ_0 = −∞ and γ_J = ∞)
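The cutpoint rule above maps the latent Z_i to an ordinal category; it can be sketched with a binary search over the cutpoints (all names and values mine):

```python
import bisect

def category_from_latent(z, cuts):
    """Return Y = j such that gamma_{j-1} < z <= gamma_j, where
    cuts = [gamma_1, ..., gamma_{J-1}], gamma_0 = -inf and gamma_J = +inf."""
    return bisect.bisect_left(cuts, z) + 1  # bisect_left keeps z == gamma_j in category j

cuts = [0.0, 1.5]                            # hypothetical cutpoints for J = 3 categories
cat_low = category_from_latent(-0.3, cuts)   # Z below gamma_1
cat_mid = category_from_latent(0.7, cuts)    # gamma_1 < Z <= gamma_2
cat_hi = category_from_latent(2.0, cuts)     # Z above gamma_2
```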
Unordered Categories with Latent Multinomial distribution
Independent unobserved latent variables Z_i = (Z_i1, Z_i2, Z_i3, ..., Z_iJ) for J > 2
Z_ij = x_ij^T β + ε_ij, i = 1, ..., N and j = 1, ..., J
where ε_i = (ε_i1, ε_i2, ε_i3, ..., ε_iJ)^T ~ N_J(0, Σ)
Σ is a J×J matrix
Unordered Categories with Latent Multinomial distribution
Here i indexes the experimental unit and j the category. One of the J possible outcomes is observed: category j is observed if Z_ij > Z_ik for all k ≠ j.
The Gibbs sampling approach can then be used for multinomial probabilities.
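The observation rule — category j is seen when Z_ij exceeds all other latent components — amounts to taking an argmax of the latent utilities. A small simulation sketch (Σ, the means, and all names are hypothetical):

```python
import numpy as np

def observed_category(z_row):
    """Category j is observed when Z_ij > Z_ik for all k != j (1-based argmax)."""
    return int(np.argmax(z_row)) + 1

# Draw one latent vector Z_i = mean + eps with eps ~ N_J(0, Sigma), J = 3.
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
mean = np.array([0.2, 0.0, -0.2])            # stand-ins for the x_ij' beta terms
z_i = mean + np.linalg.cholesky(Sigma) @ rng.normal(size=3)
cat = observed_category(z_i)
```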
Unordered Categories with Latent Multinomial distribution
Or, in matrix form: Z = Xβ + ε
Unordered Categories with Latent Multinomial distribution
For Gibbs sampling we require samples from:
β | Y, Z_1, ..., Z_N, Θ
Z_1, ..., Z_N | β, Y, Θ
Θ | Y, Z_1, ..., Z_N, β
The first two distributions are normal, but the last one is not. Hence, to sample from it, we can draw from a normal distribution with matching mode and curvature.
Examples
Finney Data
Φ⁻¹(p_i) = β_0 + β_1 x_1i + β_2 x_2i
A uniform prior is placed on β. Density estimates of β_0, β_1 and β_2 are plotted, and the Gibbs sampling approximations are found to be nearly equal to the exact posterior distributions for a large number of simulated values.
Election Data
The goal is to predict the Carter/Ford vote based on six socioeconomic factors. The fitted probabilities for the two models can be significantly different. On the graph, the probit fitted probability was plotted against logit(t(4) fitted probability) − logit(probit fitted probability). For smaller p, the t(4) fitted probabilities are significantly higher.
Election Data
To confirm the connection between the probit and logistic models, their posterior analyses are compared. A t(8) random variable is approximately .634 times a logistic random variable.
So a logistic parameter with a N(0, 1) prior corresponds approximately to a t(8) parameter with a N(0, 0.4) prior.
Posterior densities of the t(8) and logistic models have been compared using Gibbs sampling.
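The scaling claim can be checked numerically: if T ~ t(8) behaves like .634 times a logistic variable, the density of T/.634 should track the logistic density closely. A quick check using only the standard library (function names mine):

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def logistic_pdf(x):
    e = math.exp(-abs(x))
    return e / (1 + e) ** 2

# If T ~ t(8) and L is logistic with T ≈ 0.634 * L, then the density of T/0.634,
# namely 0.634 * t_pdf(0.634 * x, 8), should be close to logistic_pdf(x) everywhere.
sigma = 0.634
diffs = [abs(sigma * t_pdf(sigma * x, 8) - logistic_pdf(x))
         for x in (i / 10 for i in range(-60, 61))]
```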
Election Data
Using a t link with unknown df ν ∈ {4, 8, 16, 32}:
The Gibbs sampler (10,000 cycles) gave posterior probabilities of .52, .21, .14 and .11 for these df values.
The posterior distribution of the probability p_i using the t link was noticeably different from that using the probit link.
Election Data
In the classical model, estimating the size of the residual y_i − p̂_i = y_i − H(x_i^T β̂) is difficult.
In the Bayesian case the residual y_i − H(x_i^T β) takes values in the interval (y_i − 1, y_i).
One can check informally for outliers by looking for residual distributions concentrated away from 0.
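The Bayesian residual y_i − Φ(x_i^T β) can be tabulated directly over the posterior draws of β, and its values are automatically confined to (y_i − 1, y_i). A sketch (all names and draws hypothetical):

```python
from statistics import NormalDist

def residual_draws(y_i, x_i, beta_draws):
    """Posterior draws of the Bayesian residual y_i - Phi(x_i' beta)."""
    nd = NormalDist()
    return [y_i - nd.cdf(sum(x * b for x, b in zip(x_i, beta))) for beta in beta_draws]

# Hypothetical posterior draws for a 2-coefficient model, observation with y_i = 1.
res = residual_draws(1, [1.0, 0.5], [[0.2, -0.1], [1.0, 0.3], [-0.4, 0.6]])
# Each residual lies in (0, 1); a distribution piled up near 1 flags a potential outlier.
```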
A Trivial Probit Example
Z_ik is the perceived attractiveness to subject i of the kth mode of transportation.
X_ik denotes the travel time by mode k.
The attractiveness of travel time is represented by β. Modes 1 and 2 are believed to be correlated, with covariance ρ.
Define W_ik = Z_ik − X_ik β for k = 1, 2.
A Trivial Probit Example
Daganzo analysed a hypothetical data set of 50 observations: MLE (β, ρ) = (.238, .475) with SE = (.144, .316).
The Gibbs sampler gave (β, ρ) = (.234, .291) with SE = (.0475, .340).
Conclusion
The main point of the article is to introduce latent data into the problem.
The probit model for a binary response can be converted to a normal linear model on the Z_i data.
This allows exact inference for the binomial model and is easier to use in the multinomial setup.
Applying this method with Gibbs sampling involves only standard distributions.
Watch out for the extra randomness introduced by the simulation and for convergence.