Bayesian Analysis of Binary and Polychotomous Response Data


  1. Presentation on Bayesian Analysis of Binary and Polychotomous Response Data. Author(s): James H. Albert and Siddhartha Chib. By: Mohit Shukla (11435), Course: ECO543A
  2. Introduction. Suppose Yi ~ Bernoulli(pi) with pi = H(xiᵀβ). If H is the standard Gaussian cdf, we get the probit model; if H is the logistic cdf, the logit model. Problem: given a prior π(β), proper or improper, the posterior π(β | Data) is hard to deal with.
  3. Introduction. Possible solutions: for small models, numerical integration; for large models, Monte Carlo integration. Proposed solution: a simulation-based approach for computing the exact posterior.
  4. The Concept. Introduce N latent variables Zi ~ N(xiᵀβ, 1). Define Yi = 1 if Zi > 0 and Yi = 0 if Zi ≤ 0. Given the data Yi, the Zi follow a truncated normal distribution. This observation, combined with Gibbs sampling, allows us to simulate from the exact posterior distribution of β (see the sketch below).
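A minimal Python sketch of this augmentation step, assuming a design matrix X, a 0/1 response vector y, and a current draw beta (the function name draw_latent and the use of scipy.stats.truncnorm are my choices, not from the paper):

```python
import numpy as np
from scipy.stats import truncnorm

def draw_latent(X, y, beta, rng):
    """Draw Z_i ~ N(x_i' beta, 1) truncated to (0, inf) if y_i = 1
    and to (-inf, 0] if y_i = 0."""
    mean = X @ beta
    # truncnorm takes bounds standardized to the N(0, 1) scale
    lower = np.where(y == 1, -mean, -np.inf)
    upper = np.where(y == 1, np.inf, -mean)
    return mean + truncnorm.rvs(lower, upper, random_state=rng)
```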
  5. The Concept. This approach connects the probit binary regression model on Yi to a normal linear regression model on Zi. The sampling approach allows us to compute the posterior of many parameters of interest. The Bayesian residual is continuous-valued, so it provides more information about outliers.
  6. The Gibbs Sampler. Interest: to simulate from the posterior of θ = (θ1, ..., θp). It is easier to simulate from the full conditional distributions π(θk | {θj, j ≠ k}). Gibbs sampler: start from initial guesses θ1^(0), θ2^(0), ..., θp^(0). First cycle: draw θ1^(1) from π(θ1 | θ2^(0), θ3^(0), ..., θp^(0)); draw θ2^(1) from π(θ2 | θ1^(1), θ3^(0), ..., θp^(0)); ...; draw θp^(1) from π(θp | θ1^(1), θ2^(1), ..., θ(p-1)^(1)). A toy illustration follows below.
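As a generic illustration of the cycle above, here is a toy Gibbs sampler for a standard bivariate normal target with correlation rho (this toy target and all names are illustrative assumptions, not from the paper):

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Alternately draw each coordinate from its full conditional:
    theta1 | theta2 ~ N(rho * theta2, 1 - rho^2), and symmetrically."""
    rng = np.random.default_rng(seed)
    theta1 = theta2 = 0.0                      # initial guesses theta^(0)
    sd = np.sqrt(1.0 - rho ** 2)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        theta1 = rng.normal(rho * theta2, sd)  # theta1^(t) | theta2^(t-1)
        theta2 = rng.normal(rho * theta1, sd)  # theta2^(t) | theta1^(t)
        draws[t] = theta1, theta2
    return draws
```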
  7. The Gibbs Sampler. The cycle is iterated t times to get θ^(t). As t → ∞, θ^(t) converges in distribution to a draw from the joint distribution of θ. Replicating this process m times gives θ1j^(t), θ2j^(t), ..., θpj^(t), j = 1, ..., m, which can be used to estimate posterior moments and densities.
  8. The Gibbs Sampler. Drawbacks: samples for t < t* are discarded after initiation, and it may be necessary to repeat the simulation with a larger number of replications for accuracy. One-run solution: only one replication is used and the cycle is repeated a large number of times.
  9. The Gibbs Sampler. Collect the values starting at the cycle t* where θ^(t) is approximately a simulated value from the posterior of θ. The main objective is to collect a large number of values from the joint posterior of θ. There is strong correlation between θ^(t) and θ^(t+1), so collect values at cycles t, t+n, t+2n, ...
  10. The Gibbs Sampler. Another goal is to estimate the densities of individual functions g(θk). Use a kernel density estimate over the simulated values of g(θk). E[g(θk)] can be estimated either by the sample mean of the g(θk^(i)) or by the sample mean of E[g(θk) | {θr^(i), r ≠ k}]. Standard error ≈ sd of the batch means / √(number of batches), provided the lag-1 autocorrelation of the batch means is below 0.5 (see the sketch below).
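A minimal sketch of the batch-means standard error for a chain of simulated values (the array name and default batch count are my assumptions):

```python
import numpy as np

def batch_means_se(draws, n_batches=50):
    """Standard error of the chain mean: sd of the batch means
    divided by the square root of the number of batches."""
    draws = np.asarray(draws)
    batch_size = len(draws) // n_batches
    batches = draws[:n_batches * batch_size].reshape(n_batches, batch_size)
    means = batches.mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)
```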
  11. Data Augmentation and Gibbs Sampling for Binary Data
  12. Introduction. Let H = Φ, the probit model. Introduce latent Zi ~ N(xiᵀβ, 1) with Yi = 1 exactly when Zi > 0, so that pi = Φ(xiᵀβ) = P(Yi = 1). The joint posterior density of β and Z is complicated; hence Gibbs sampling. The marginal posterior distribution of β is easier to obtain this way (a full sampler is sketched below).
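Putting the two full conditionals together, here is a minimal sketch of the whole data-augmentation Gibbs sampler under a flat prior on β (function names, burn-in, and iteration counts are my choices; draw_latent is the truncated-normal step sketched earlier):

```python
import numpy as np

def probit_gibbs(X, y, n_iter=5000, burn_in=500, seed=0):
    """Alternate Z | beta, y (truncated normals) and
    beta | Z ~ N((X'X)^{-1} X'Z, (X'X)^{-1}) under a flat prior."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)    # for correlated normal draws
    beta = np.zeros(p)
    draws = np.empty((n_iter - burn_in, p))
    for t in range(n_iter):
        z = draw_latent(X, y, beta, rng)  # step 1: Z | beta, y
        beta_hat = XtX_inv @ (X.T @ z)    # step 2: beta | Z
        beta = beta_hat + chol @ rng.standard_normal(p)
        if t >= burn_in:
            draws[t - burn_in] = beta
    return draws
```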
  13. Introduction (figure/equation slide; content not captured in the transcript)
  14. The t Link. Generalize the probit link by choosing H to be a t-distribution cdf. This helps in investigating the sensitivity of the fitted probabilities to the choice of link function. The most popular link function for binary data is the logit, and the logistic distribution is closely approximated by a scaled t distribution with approximately 9 degrees of freedom.
  15. The t Link. To implement the Gibbs sampler: set β to the least-squares estimate (LSE) under the probit model, set λi = 1 for all i, and cycle through the full-conditional equations in that order (the extra λ-step is sketched below).
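A minimal sketch of the extra mixing step this introduces, using the usual scale-mixture-of-normals representation of the t(ν) distribution, under which Zi | β, λi ~ N(xiᵀβ, 1/λi) (the function name and parameterization details are my assumptions):

```python
import numpy as np

def draw_lambda(X, z, beta, nu, rng):
    """lambda_i | z_i, beta ~ Gamma((nu + 1)/2, rate = (nu + (z_i - x_i'beta)^2)/2)."""
    resid2 = (z - X @ beta) ** 2
    shape = (nu + 1.0) / 2.0
    rate = (nu + resid2) / 2.0
    return rng.gamma(shape, 1.0 / rate)  # numpy's gamma is parameterized by scale
```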
  16. The t Link (figure slide; content not captured in the transcript)
  17. Hierarchical Analysis. The normal regression structure on Z also motivates the normal hierarchical model: Z ~ N(Xβ, I), β ~ N(Aθ, σ²I), with (θ, σ²) distributed according to a prior π(θ, σ²).
  18. Generalization to a Multinomial Model
  19. Ordered Categories. Let pij = P[Yi = j]. Regression model: pij = Φ(γj − xiᵀβ) − Φ(γj−1 − xiᵀβ) for i = 1 to N and j = 1 to J, with cut-points γ1 < ... < γJ−1. Motivation: latent Zi ~ N(xiᵀβ, 1) with Yi = j if γj−1 < Zi ≤ γj (γ0 = −∞ and γJ = ∞). See the sketch below.
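A minimal sketch of these cell probabilities (the helper name and the passed-in cut-point values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def cell_probs(X, beta, cuts):
    """p_ij = Phi(gamma_j - x_i'beta) - Phi(gamma_{j-1} - x_i'beta),
    with gamma_0 = -inf and gamma_J = +inf."""
    gamma = np.concatenate(([-np.inf], cuts, [np.inf]))
    cdf = norm.cdf(gamma[None, :] - (X @ beta)[:, None])  # N x (J+1)
    return np.diff(cdf, axis=1)                           # N x J; rows sum to 1
```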
  20. Unordered Categories with a Latent Multinomial Distribution. Introduce independent unobserved latent vectors Zi = (Zi1, Zi2, Zi3, ..., ZiJ) for J > 2, with Zij = xijᵀβ + εij for i = 1 to N and j = 1 to J, where εi = (εi1, εi2, εi3, ..., εiJ)ᵀ ~ NJ(0, Σ) and Σ is a J×J covariance matrix.
  21. Unordered Categories with a Latent Multinomial Distribution. Here i is the experimental-unit index and j the category index. One of the J possible outcomes is observed: category j is observed if Zij > Zik for all k ≠ j (illustrated below). The Gibbs sampling approach can then be used for multinomial probabilities.
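A minimal sketch of this observation rule, simulating categories from the latent utilities (all names and inputs are illustrative assumptions):

```python
import numpy as np

def simulate_categories(means, Sigma, seed=0):
    """means: N x J matrix of x_ij' beta. Draw Z_i = means_i + eps_i with
    eps_i ~ N_J(0, Sigma); record y_i = j iff Z_ij > Z_ik for all k != j."""
    rng = np.random.default_rng(seed)
    eps = rng.multivariate_normal(np.zeros(means.shape[1]), Sigma,
                                  size=means.shape[0])
    return (means + eps).argmax(axis=1)
```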
  22. Unordered Categories with a Latent Multinomial Distribution. In matrix form, Z = Xβ + ε.
  23. Unordered Categories with a Latent Multinomial Distribution. For Gibbs sampling we require samples from β | Y, Z1, ..., ZN, Σ; from Zi | β, Y, Σ; and from Σ | Y, Z1, ..., ZN, β. The first two distributions are normal, but the last one is not; to sample from it we can draw from a normal distribution with matching mode and curvature.
  24. Examples
  25. Finney Data. Model: Φ⁻¹(pi) = β0 + β1x1i + β2x2i, with a uniform prior placed on β. Density estimates of β0, β1 and β2 are plotted, and the Gibbs sampling approximations are found to be nearly equal to the exact posterior distributions for a large simulated sample. A hypothetical usage sketch follows.
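A hypothetical usage sketch, with synthetic data standing in for the actual Finney measurements (probit_gibbs is the sampler sketched earlier; all values here are made up for illustration):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
N = 39
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
y = (X @ np.array([-1.0, 2.0, 1.5]) + rng.normal(size=N) > 0).astype(int)

draws = probit_gibbs(X, y, n_iter=10000, burn_in=1000)
kde_beta1 = gaussian_kde(draws[:, 1])   # kernel density estimate of beta_1
```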
  26. Finney Data (figure slide; content not captured in the transcript)
  27. Finney Data (figure slide; content not captured in the transcript)
  28. Election Data. Goal: predict the Carter/Ford vote from six socioeconomic variables. The fitted probabilities from the two models can be significantly different. On the graph, the probit fitted probability is plotted against logit(t(4) fitted probability) − logit(probit fitted probability); for smaller fitted probabilities, the t(4) fitted probabilities are significantly higher.
  29. Election Data (figure slide; content not captured in the transcript)
  30. Election Data. To confirm the connection between the probit-family and logistic models, the posterior analyses are compared. A t(8) random variable is approximately .634 times a logistic one, so a N(0, 1) prior on a logistic parameter corresponds to a N(0, 0.4) prior on the t(8) parameter (0.634² ≈ 0.4). The posterior densities of the t(8) and logistic models are compared using Gibbs sampling. A quick numerical check of the .634 scaling follows.
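A quick numerical check of the .634 scaling (the grid is my choice): after rescaling, the t(8) and logistic cdfs nearly coincide.

```python
import numpy as np
from scipy.stats import t, logistic

z = np.linspace(-6, 6, 1001)
diff = t.cdf(z, df=8) - logistic.cdf(z / 0.634)
print(np.abs(diff).max())  # small; the two cdfs agree to a few thousandths
```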
  31. Election Data (figure slide; content not captured in the transcript)
  32. Election Data. Using a t link with unknown degrees of freedom ν ∈ {4, 8, 16, 32}, the Gibbs sampler (10,000 cycles) gave posterior probabilities .52, .21, .14, .11 for these values. The posterior distributions of the fitted probability pi under the t link were noticeably different from those under the probit link.
  33. Election Data (figure slide; content not captured in the transcript)
  34. Election Data. In the classical model, estimating the size of the residual yi − p̂i = yi − H(xiᵀβ̂) is difficult. In the Bayesian case the residual yi − Φ(xiᵀβ) has a posterior distribution taking values in (yi − 1, yi). One can check informally for outliers by looking for residual distributions concentrated away from 0 (a sketch follows).
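A minimal sketch of how posterior draws of this residual can be formed for one observation, reusing the Gibbs draws of β (the names here are mine):

```python
import numpy as np
from scipy.stats import norm

def residual_draws(x_i, y_i, beta_draws):
    """Posterior draws of r_i = y_i - Phi(x_i'beta); values lie in (y_i - 1, y_i).
    Distributions concentrated away from 0 flag potential outliers."""
    return y_i - norm.cdf(beta_draws @ x_i)
```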
  35. Election Data (figure slide; content not captured in the transcript)
  36. A Trivial Probit Example. Zik is the perceived attractiveness to subject i of the kth mode of transportation, and Xik denotes the travel time by mode k. The attractiveness of travel time is represented by a regression coefficient β, and the latent values for modes 1 and 2 are believed to be correlated through a covariance term. Define Wik = Zik − Xikβ for k = 1, 2.
  37. A Trivial Probit Example. Daganzo analysed a hypothetical data set of 50 observations. The MLE of the two parameters was (.238, .475) with SE = (.144, .316); the Gibbs sampler gave (.234, .291) with SE = (.0475, .340).
  38. Conclusion. The main point of the article is to introduce latent data into the problem. The probit model for binary responses can be converted to a normal linear model on the latent Zi. This allows exact inference for the binomial model and is easier to use in the multinomial setup. Applying this method with Gibbs sampling involves only standard distributions. Watch out for the extra randomness introduced by simulation and for convergence issues.