Bayesian Analysis of Binary and Polychotomous Response Data
Presentation on Bayesian Analysis of Binary and Polychotomous Response Data
Author(s): James H. Albert and Siddhartha Chib
By: Mohit Shukla
11435
Course: ECO543A
Introduction
Suppose Y_i ~ Bernoulli(p_i)
p_i = H(x_i^T β)
If H is the standard Gaussian cdf => probit; if H is the logistic cdf => logit
Problem: Given a prior π(β), proper or improper, the posterior π(β | data) is hard to deal with.
Introduction
Possible solutions: for small models, numerical integration; for large models, Monte Carlo integration.
Proposed solution: a simulation-based approach for computing the exact posterior.
The Concept
Introduce N latent variables Z_i ~ N(x_i^T β, 1)
Define Y_i = 1 if Z_i > 0 and Y_i = 0 if Z_i ≤ 0
Given the data Y_i, the Z_i's follow a truncated normal distribution.
This observation, combined with Gibbs sampling, allows us to simulate from the exact posterior distribution of β.
The Concept
This approach connects the probit binary regression model on Y_i to a normal linear regression model on Z_i.
The sampling approach allows us to compute the posteriors of many parameters.
The Bayesian residual is continuous, so it provides more information about outliers.
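The truncated-normal draw described above is the workhorse of the method. A minimal sketch of that draw via the inverse CDF, using only the Python standard library (the function name is my own):

```python
import random
from statistics import NormalDist

def draw_truncated_normal(mean, low, high, rng=random):
    """Draw Z ~ N(mean, 1) conditioned on low < Z <= high via inverse-CDF sampling."""
    nd = NormalDist(mu=mean, sigma=1.0)
    u = rng.uniform(nd.cdf(low), nd.cdf(high))   # uniform on the truncated CDF range
    u = min(max(u, 1e-12), 1.0 - 1e-12)          # guard against inv_cdf(0) / inv_cdf(1)
    return nd.inv_cdf(u)

# Z_i | Y_i = 1 is N(x_i' beta, 1) truncated to (0, inf);
# Z_i | Y_i = 0 is N(x_i' beta, 1) truncated to (-inf, 0].
z_pos = draw_truncated_normal(0.5, 0.0, float("inf"))
z_neg = draw_truncated_normal(0.5, float("-inf"), 0.0)
```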
The Gibbs Sampler
Interest: to simulate from the posterior of Θ = (Θ_1, ..., Θ_p). It is easier to simulate from the full conditional distributions π(Θ_k | {Θ_j, j ≠ k}).
Gibbs sampler: start from initial guesses Θ_1^(0), Θ_2^(0), Θ_3^(0), ..., Θ_p^(0)
First cycle:
Θ_1^(1) from π(Θ_1 | Θ_2^(0), Θ_3^(0), ..., Θ_p^(0))
Θ_2^(1) from π(Θ_2 | Θ_1^(1), Θ_3^(0), ..., Θ_p^(0))
...
Θ_p^(1) from π(Θ_p | Θ_1^(1), Θ_2^(1), ..., Θ_{p-1}^(1))
The Gibbs Sampler
The cycle is iterated t times to get Θ^(t).
As t → ∞, the distribution of Θ^(t) converges to the joint posterior distribution of Θ.
Replicating this process m times gives draws Θ_1j^(t), Θ_2j^(t), ..., Θ_pj^(t) (j = 1, ..., m), which can be used to estimate posterior moments and densities.
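The cycling scheme can be made concrete on a toy target where the full conditionals are known exactly: a standard bivariate normal with correlation ρ, where X | Y = y ~ N(ρy, 1 − ρ²) and symmetrically for Y. A sketch (all names my own):

```python
import random

def gibbs_bivariate_normal(rho, t, rng=random):
    """Run t Gibbs cycles targeting a standard bivariate normal with correlation rho.
    Full conditionals: X | Y = y ~ N(rho*y, 1 - rho^2), and symmetrically for Y."""
    sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0              # initial guesses Theta^(0)
    draws = []
    for _ in range(t):           # one cycle updates each coordinate in turn
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        draws.append((x, y))
    return draws

random.seed(1)
draws = gibbs_bivariate_normal(0.8, 5000)
# For large t the pairs behave like samples from the joint target distribution.
```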
The Gibbs Sampler
Drawbacks: samples for t < t* are discarded, and after initiation it may be necessary to repeat the simulation with a larger number of replications for accuracy.
"One run" solution: only one replication is used, and the cycle is repeated a large number of times.
The Gibbs Sampler
Collect the values starting at the cycle t where Θ^(t) is approximately a simulated value from the posterior of Θ. The main objective is to collect a large number of values from the joint posterior of Θ. Since Θ^(t) and Θ^(t+1) are strongly correlated, collect values at cycles t, t+n, t+2n, ...
The Gibbs Sampler
Another goal is to estimate the densities of individual functions g(Θ_k).
Use a kernel density estimate over the simulated values of g(Θ_k).
E[g(Θ_k)] can also be estimated either by the sample mean of the g(Θ_k^(i)) or by the sample mean of E[g(Θ_k) | {Θ_r^(i), r ≠ k}].
Standard error = sd of batch means / √(no. of batches), used when the lag-1 autocorrelation of the batch means is small (≤ 0.05).
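The batch-means estimate can be sketched directly: split the chain into batches, then divide the standard deviation of the batch means by the square root of the number of batches (all names mine):

```python
import random

def batch_means_se(draws, n_batches):
    """Standard error of the chain mean: sd of batch means / sqrt(no. of batches)."""
    size = len(draws) // n_batches
    means = [sum(draws[b * size:(b + 1) * size]) / size for b in range(n_batches)]
    grand = sum(means) / n_batches
    var = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return (var ** 0.5) / n_batches ** 0.5

random.seed(0)
chain = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # stand-in for Gibbs output
se = batch_means_se(chain, 50)  # for iid N(0,1) draws this is near 1/sqrt(10000) = 0.01
```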
Data Augmentation and Gibbs Sampling for Binary Data
Introduction
Let H = Φ, i.e. the probit model. Introduce Z_i's and Y_i's such that p_i = Φ(x_i^T β) = P(Y_i = 1).
The joint posterior density of β and Z is complicated; hence Gibbs sampling is used. The marginal posterior distribution of β is easier to obtain this way.
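Putting the two blocks together, here is a compact sketch of the sampler for the probit model under a flat prior on β: β | Z is normal around the least-squares estimate, and each Z_i | β, Y_i is a truncated normal. numpy is assumed, the data are simulated, and all names are mine:

```python
import random
from statistics import NormalDist

import numpy as np

def probit_gibbs(X, y, n_iter=300, seed=1):
    """Gibbs sampler for probit regression with a flat prior on beta.
    Block 1: Z_i | beta, y_i ~ N(x_i'beta, 1) truncated to (0, inf) if y_i = 1,
             and to (-inf, 0] if y_i = 0.
    Block 2: beta | Z ~ N((X'X)^{-1} X'Z, (X'X)^{-1})."""
    rng = random.Random(seed)
    nd = NormalDist()
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    z = np.zeros(n)
    draws = []
    for _ in range(n_iter):
        mu = X @ beta
        for i in range(n):
            # inverse-CDF draw of the truncated standard-normal error e = Z_i - mu_i
            cut = nd.cdf(-mu[i])
            lo, hi = (cut, 1.0) if y[i] == 1 else (0.0, cut)
            u = min(max(rng.uniform(lo, hi), 1e-12), 1.0 - 1e-12)
            z[i] = mu[i] + nd.inv_cdf(u)
        beta_hat = XtX_inv @ (X.T @ z)
        noise = np.array([rng.gauss(0.0, 1.0) for _ in range(p)])
        beta = beta_hat + chol @ noise
        draws.append(beta.copy())
    return np.array(draws)

# Simulated example: intercept 0.5, slope -1.0.
data_rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), data_rng.normal(size=200)])
y = (X @ np.array([0.5, -1.0]) + data_rng.normal(size=200) > 0).astype(int)
draws = probit_gibbs(X, y)
post_mean = draws[100:].mean(axis=0)  # discard the first cycles as burn-in
```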
The t Link
Generalize the probit link by choosing H as a t-distribution cdf. This helps in investigating the sensitivity of the fitted probabilities to the choice of link function. The most popular link function for binary data is the logit, and the logistic distribution is approximately a member of the t family with about 8 degrees of freedom.
The t Link
To implement the Gibbs sampler:
Set β = the LSE under the probit model
Set λ_i = 1 for all i
Cycle through the full conditional equations in that order
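Under the scale-mixture representation of the t link (Z_i | λ_i ~ N(x_i^T β, 1/λ_i) with λ_i ~ Gamma with shape ν/2 and rate ν/2), the full conditional of each λ_i is again a Gamma distribution. A sketch of that update (function name mine):

```python
import random

def draw_lambda(z_i, mu_i, nu, rng=random):
    """lambda_i | z_i, beta ~ Gamma(shape=(nu+1)/2, rate=(nu + (z_i - mu_i)^2)/2)."""
    shape = (nu + 1.0) / 2.0
    rate = (nu + (z_i - mu_i) ** 2) / 2.0
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes (shape, scale)

# With a zero residual the conditional mean is (nu + 1) / nu, e.g. 9/8 for nu = 8.
random.seed(0)
lam_draws = [draw_lambda(0.0, 0.0, 8) for _ in range(20_000)]
```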
Hierarchical Analysis
The normal regression structure on Z also motivates the normal hierarchical model:
Z ~ N(Xβ, I)
β ~ N(Aβ_0, σ²I)
(β_0, σ²) distributed according to a prior π(β_0, σ²)
Generalization to a Multinomial Model
Ordered Categories
Let p_ij = P[Y_i = j]
Regression model: η_ij = Φ(γ_j − x_i^T β), i = 1, ..., N and j = 1, ..., J−1
Motivation: latent Z_i ~ N(x_i^T β, 1)
Y_i = j if γ_{j−1} < Z_i ≤ γ_j (with γ_0 = −∞ and γ_J = ∞)
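The cutpoint rule above maps the latent Z_i to an ordinal category; it can be sketched with a binary search over the cutpoints (all names and values mine):

```python
import bisect

def category_from_latent(z, cuts):
    """Return Y = j such that gamma_{j-1} < z <= gamma_j, where
    cuts = [gamma_1, ..., gamma_{J-1}], gamma_0 = -inf and gamma_J = +inf."""
    return bisect.bisect_left(cuts, z) + 1  # bisect_left keeps z == gamma_j in category j

cuts = [0.0, 1.5]                            # hypothetical cutpoints for J = 3 categories
cat_low = category_from_latent(-0.3, cuts)   # Z below gamma_1
cat_mid = category_from_latent(0.7, cuts)    # gamma_1 < Z <= gamma_2
cat_hi = category_from_latent(2.0, cuts)     # Z above gamma_2
```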
Unordered Categories with Latent Multinomial distribution
Independent unobserved latent variables Z_i = (Z_i1, Z_i2, Z_i3, ..., Z_iJ) for J > 2
Z_ij = x_ij^T β + ε_ij, i = 1, ..., N and j = 1, ..., J
where ε_i = (ε_i1, ε_i2, ε_i3, ..., ε_iJ)^T ~ N_J(0, Σ)
Σ is a J×J matrix
Unordered Categories with Latent Multinomial distribution
Here i indexes the experimental unit and j the category. One of the J possible outcomes is observed: category j is observed if Z_ij > Z_ik for all k ≠ j.
The Gibbs sampling approach can then be used for multinomial probabilities.
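The observation rule — category j is seen when Z_ij exceeds all other latent components — amounts to taking an argmax of the latent utilities. A small simulation sketch (Σ, the means, and all names are hypothetical):

```python
import numpy as np

def observed_category(z_row):
    """Category j is observed when Z_ij > Z_ik for all k != j (1-based argmax)."""
    return int(np.argmax(z_row)) + 1

# Draw one latent vector Z_i = mean + eps with eps ~ N_J(0, Sigma), J = 3.
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
mean = np.array([0.2, 0.0, -0.2])            # stand-ins for the x_ij' beta terms
z_i = mean + np.linalg.cholesky(Sigma) @ rng.normal(size=3)
cat = observed_category(z_i)
```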
Unordered Categories with Latent Multinomial distribution
Or, in matrix form: Z = Xβ + ε
Unordered Categories with Latent Multinomial distribution
For Gibbs sampling we require samples from:
β | Y, Z_1, ..., Z_N, Θ
Z_1, ..., Z_N | β, Y, Θ
Θ | Y, Z_1, ..., Z_N, β
The first two distributions are normal, but the last one is not. Hence, to sample from it, we can draw from a normal distribution with matching mode and curvature.
Examples
Finney Data
Φ⁻¹(p_i) = β_0 + β_1 x_1i + β_2 x_2i
A uniform prior is placed on β. Density estimates of β_0, β_1 and β_2 are plotted, and the Gibbs sampling approximations are found to be nearly equal to the exact posterior distributions for a large number of simulated values.
Election Data
The goal is to predict the Carter/Ford vote based on six socioeconomic factors. The fitted probabilities for the two models can be significantly different. On the graph, the probit fitted probability was plotted against logit(t(4) fitted probability) − logit(probit fitted probability). For smaller p, the t(4) fitted probabilities are significantly higher.
Election Data
To confirm the connection between the probit and logistic models, their posterior analyses are compared. A t(8) random variable is approximately .634 times a logistic random variable.
So a logistic parameter with a N(0, 1) prior corresponds approximately to a t(8) parameter with a N(0, 0.4) prior.
Posterior densities of the t(8) and logistic models have been compared using Gibbs sampling.
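The scaling claim can be checked numerically: if T ~ t(8) behaves like .634 times a logistic variable, the density of T/.634 should track the logistic density closely. A quick check using only the standard library (function names mine):

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def logistic_pdf(x):
    e = math.exp(-abs(x))
    return e / (1 + e) ** 2

# If T ~ t(8) and L is logistic with T ≈ 0.634 * L, then the density of T/0.634,
# namely 0.634 * t_pdf(0.634 * x, 8), should be close to logistic_pdf(x) everywhere.
sigma = 0.634
diffs = [abs(sigma * t_pdf(sigma * x, 8) - logistic_pdf(x))
         for x in (i / 10 for i in range(-60, 61))]
```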
Election Data
Using a t link with unknown df ν ∈ {4, 8, 16, 32}:
The Gibbs sampler (10,000 cycles) gave posterior probabilities of .52, .21, .14 and .11 for these df values.
The posterior distribution of the probability p_i using the t link was noticeably different from that using the probit link.
Election Data
In the classical model, estimating the size of the residual y_i − p̂_i = y_i − H(x_i^T β̂) is difficult.
In the Bayesian case the residual y_i − H(x_i^T β) takes values in the interval (y_i − 1, y_i).
One can check informally for outliers by looking for residual distributions concentrated away from 0.
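The Bayesian residual y_i − Φ(x_i^T β) can be tabulated directly over the posterior draws of β, and its values are automatically confined to (y_i − 1, y_i). A sketch (all names and draws hypothetical):

```python
from statistics import NormalDist

def residual_draws(y_i, x_i, beta_draws):
    """Posterior draws of the Bayesian residual y_i - Phi(x_i' beta)."""
    nd = NormalDist()
    return [y_i - nd.cdf(sum(x * b for x, b in zip(x_i, beta))) for beta in beta_draws]

# Hypothetical posterior draws for a 2-coefficient model, observation with y_i = 1.
res = residual_draws(1, [1.0, 0.5], [[0.2, -0.1], [1.0, 0.3], [-0.4, 0.6]])
# Each residual lies in (0, 1); a distribution piled up near 1 flags a potential outlier.
```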
A Trivial Probit Example
Z_ik is the perceived attractiveness to subject i of the kth mode of transportation.
X_ik denotes the travel time by mode k.
The attractiveness of travel time is represented by β. Modes 1 and 2 are believed to be correlated, with covariance ρ.
Define W_ik = Z_ik − X_ik β for k = 1, 2.
A Trivial Probit Example
Daganzo analysed a hypothetical data set of 50 observations: MLE (β, ρ) = (.238, .475) with SE = (.144, .316).
The Gibbs sampler gave (β, ρ) = (.234, .291) with SE = (.0475, .340).
Conclusion
The main point of the article is to introduce latent data into the problem.
The probit model for a binary response can be converted to a normal linear model on the Z_i data.
This allows exact inference for the binomial model and is easier to use in the multinomial setup.
Applying this method with Gibbs sampling involves only standard distributions.
Watch out for the extra randomness introduced by the simulation and for convergence.