Fractional hot deck imputation - Jae Kim

  1. Fractional hot deck imputation for multivariate missing data in survey sampling. Jae Kwang Kim, Iowa State University, March 18, 2015. Joint work with Jongho Im and Wayne Fuller.
  2. Introduction: basic setup. Assume simple random sampling, for simplicity. Under complete response, suppose that $\hat{\theta}_{n,g} = n^{-1} \sum_{i=1}^{n} g(y_i)$ is an unbiased estimator of $\theta_g = E\{g(Y)\}$ for known $g(\cdot)$. Let $\delta_i = 1$ if $y_i$ is observed and $\delta_i = 0$ otherwise, and let $y_i^{*}$ denote the imputed value for $y_i$ for a unit $i$ with $\delta_i = 0$. The imputed estimator of $\theta_g$ is $\hat{\theta}_{I,g} = n^{-1} \sum_{i=1}^{n} \{ \delta_i g(y_i) + (1 - \delta_i) g(y_i^{*}) \}$. We need $E\{ g(y_i^{*}) \mid \delta_i = 0 \} = E\{ g(y_i) \mid \delta_i = 0 \}$.
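To fix ideas, here is a minimal sketch of the imputed point estimator; the function name and the default choice $g(y) = y$ are illustrative assumptions, not part of the slides.

```python
import numpy as np

# Minimal sketch of the imputed estimator theta_{I,g}: observed values are
# used where delta_i = 1, imputed values y_i* where delta_i = 0.
def imputed_estimator(y, delta, y_star, g=lambda v: v):
    y, delta, y_star = map(np.asarray, (y, delta, y_star))
    terms = np.where(delta == 1, g(y), g(y_star))
    return terms.mean()

# Example with g(y) = y (mean estimation); NaN marks an unobserved y.
theta_hat = imputed_estimator(y=[1.0, np.nan, 3.0], delta=[1, 0, 1],
                              y_star=[1.0, 2.5, 3.0])
```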
  3. Introduction: ML estimation under the missing data setup. Often, we can find $x$ (always observed) such that missing at random (MAR) holds: $f(y \mid x, \delta = 0) = f(y \mid x)$. Imputed values are created from $f(y \mid x)$. Computing the conditional expectation can be a challenging problem: (1) we do not know the true parameter $\theta$ in $f(y \mid x) = f(y \mid x; \theta)$, so $E\{ g(y_i) \mid x_i \} = E\{ g(y_i) \mid x_i; \theta \}$ depends on $\theta$; (2) even if we know $\theta$, computing the conditional expectation can be numerically difficult.
  4. Introduction: imputation. Imputation is a Monte Carlo approximation of the conditional expectation (given the observed data): $E\{ g(y_i) \mid x_i \} \approx m^{-1} \sum_{j=1}^{m} g( y_i^{*(j)} )$. (1) Bayesian approach: generate $y_i^{*}$ from the posterior predictive distribution $f(y_i \mid x_i, y_{obs}) = \int f(y_i \mid x_i, \theta) \, p(\theta \mid x_i, y_{obs}) \, d\theta$. (2) Frequentist approach: generate $y_i^{*}$ from $f( y_i \mid x_i; \hat{\theta} )$, where $\hat{\theta}$ is a consistent estimator.
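As an illustration of the frequentist approach, the sketch below draws $m$ imputed values from a fitted model and averages; the normal working model and the parameter names are assumptions for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the frequentist Monte Carlo approximation under an assumed
# normal working model y | x ~ N(beta0 + beta1 * x, sigma^2).
def mc_conditional_expectation(x_i, theta_hat, g, m=1000):
    beta0, beta1, sigma = theta_hat        # assumed parameterization
    y_draws = rng.normal(beta0 + beta1 * x_i, sigma, size=m)
    return g(y_draws).mean()               # (1/m) sum_j g(y_i^{*(j)})

# Example: approximate E{Y | x = 2} at theta_hat = (1.0, 0.5, 1.0).
mc_conditional_expectation(2.0, (1.0, 0.5, 1.0), g=lambda y: y)
```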
  5. Introduction: basic setup (cont'd). Thus, imputation is a computational tool for computing the conditional expectation $E\{ g(y_i) \mid x_i \}$ for each missing unit $i$. To compute this expectation, we need to specify a model $f(y \mid x; \theta)$ evaluated at $\theta = \hat{\theta}$; thus, we can write $\hat{\theta}_{I,g} = \hat{\theta}_{I,g}( \hat{\theta} )$. To estimate the variance of $\hat{\theta}_{I,g}$, we need to take into account the sampling variability of $\hat{\theta}$ in $\hat{\theta}_{I,g} = \hat{\theta}_{I,g}( \hat{\theta} )$.
  6. Introduction: basic setup (cont'd). Three approaches: (i) Bayesian approach: multiple imputation, by Rubin (1978, 1987), Rubin and Schenker (1986), etc.; (ii) resampling approach: Rao and Shao (1992), Efron (1994), Rao and Sitter (1995), Shao and Sitter (1996), Kim and Fuller (2004), Fuller and Kim (2005); (iii) linearization approach: Clayton et al. (1998), Shao and Steel (1999), Robins and Wang (2000), Kim and Rao (2009).
  7. Comparison.
                             Bayesian                      Frequentist
     Model                   Posterior distribution        Prediction model
                             f(latent, theta | data)       f(latent | data, theta)
     Computation             Data augmentation             EM algorithm
     Prediction              I-step                        E-step
     Parameter update        P-step                        M-step
     Parameter estimation    Posterior mode                ML estimation
     Imputation              Multiple imputation           Fractional imputation
     Variance estimation     Rubin's formula               Linearization or bootstrap
  8. Multiple imputation. The multiple imputation estimator of $\theta$, denoted by $\hat{\theta}_{MI}$, is $\hat{\theta}_{MI} = m^{-1} \sum_{j=1}^{m} \hat{\theta}_{I}^{(j)}$. Rubin's variance estimator is $\hat{V}_{MI}( \hat{\theta}_{MI} ) = W_m + ( 1 + m^{-1} ) B_m$, where $W_m = m^{-1} \sum_{j=1}^{m} \hat{V}^{(j)}$ and $B_m = (m-1)^{-1} \sum_{j=1}^{m} ( \hat{\theta}_{I}^{(j)} - \hat{\theta}_{MI} )^2$.
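A minimal sketch of these combining rules, assuming the $m$ completed-data point estimates and their complete-data variance estimates are already available:

```python
import numpy as np

# Sketch of Rubin's combining rules: theta_j holds the m completed-data
# point estimates theta_I^{(j)}, v_j their variance estimates V^{(j)}.
def rubin_combine(theta_j, v_j):
    theta_j, v_j = np.asarray(theta_j), np.asarray(v_j)
    m = theta_j.size
    theta_mi = theta_j.mean()                  # MI point estimator
    w_m = v_j.mean()                           # within-imputation variance W_m
    b_m = theta_j.var(ddof=1)                  # between-imputation variance B_m
    v_mi = w_m + (1 + 1 / m) * b_m             # Rubin's variance estimator
    return theta_mi, v_mi
```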
  9. Multiple imputation. Rubin's variance estimator is based on the decomposition
     $var( \hat{\theta}_{MI} ) = var( \hat{\theta}_n ) + var( \hat{\theta}_{MI,\infty} - \hat{\theta}_n ) + var( \hat{\theta}_{MI} - \hat{\theta}_{MI,\infty} )$,  (1)
     where $\hat{\theta}_n$ is the complete-sample estimator of $\theta$ and $\hat{\theta}_{MI,\infty}$ is the probability limit of $\hat{\theta}_{MI}$ as $m \to \infty$. Under some regularity conditions, the $W_m$ term estimates the first term, the $B_m$ term estimates the second term, and the $m^{-1} B_m$ term estimates the last term of (1), respectively.
 10. Multiple imputation. In particular, Kim et al. (2006, JRSSB) proved that the bias of Rubin's variance estimator is
     $Bias( \hat{V}_{MI} ) = 2 \, cov( \hat{\theta}_{MI} - \hat{\theta}_n, \hat{\theta}_n )$.  (2)
     The decomposition (1) is therefore equivalent to assuming that $cov( \hat{\theta}_{MI} - \hat{\theta}_n, \hat{\theta}_n ) = 0$, which is called the congeniality condition by Meng (1994). The congeniality condition holds when $\hat{\theta}_n$ is the MLE of $\theta$; in such cases, Rubin's variance estimator is asymptotically unbiased.
 11. Multiple imputation. Theorem (Yang and Kim, 2015). Let $\hat{\theta}_n = n^{-1} \sum_{i=1}^{n} g(y_i)$ be used to estimate $\theta = E\{ g(Y) \}$ under complete response. Then, under some regularity conditions, the bias of Rubin's variance estimator is
     $Bias( \hat{V}_{MI} ) = 2 n^{-1} (1-p) \, E_0 \left[ var\{ g(Y) - B_g(X)^{T} S(\theta) \mid X \} \right] \ge 0$,
     with equality if and only if $g(Y)$ is a linear function of $S(\theta)$, where $p = E(\delta)$, $S(\theta)$ is the score function of $\theta$ in $f(y \mid x; \theta)$, $B_g(X) = [ var\{ S(\theta) \mid X \} ]^{-1} cov\{ S(\theta), g(Y) \mid X \}$, and $E_0( \cdot ) = E( \cdot \mid \delta = 0 )$.
 12. Multiple imputation. Example: suppose that you are interested in estimating $\theta = P(Y \le 3)$, and assume a normal model for $f(y \mid x; \theta)$ for multiple imputation. Two choices for $\hat{\theta}_n$: (1) the method-of-moments estimator $\hat{\theta}_{n1} = n^{-1} \sum_{i=1}^{n} I( y_i \le 3 )$; (2) the maximum-likelihood estimator $\hat{\theta}_{n2} = n^{-1} \sum_{i=1}^{n} P( Y \le 3 \mid x_i; \hat{\theta} )$, where $P( Y \le 3 \mid x_i; \hat{\theta} ) = \int_{-\infty}^{3} f( y \mid x_i; \hat{\theta} ) \, dy$. Rubin's variance estimator is nearly unbiased for $\hat{\theta}_{n2}$, but provides conservative variance estimation for $\hat{\theta}_{n1}$ (30-50% overestimation of the variance in most cases).
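The contrast between the two estimators can be made concrete; the sketch below assumes a normal working model $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ with fitted parameters supplied by the caller.

```python
import numpy as np
from scipy.stats import norm

# Two complete-sample estimators of theta = P(Y <= 3) from the example:
# a method-of-moments estimator and a model-based (MLE-type) estimator.
def theta_mom(y):
    return np.mean(np.asarray(y) <= 3)           # n^{-1} sum_i I(y_i <= 3)

def theta_mle(x, beta0, beta1, sigma):
    x = np.asarray(x)
    # n^{-1} sum_i P(Y <= 3 | x_i; theta_hat) under the normal working model
    return norm.cdf((3 - (beta0 + beta1 * x)) / sigma).mean()
```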
 13. Fractional imputation: idea (parametric model approach). Approximate $E\{ g(y_i) \mid x_i \}$ by $\hat{E}\{ g(y_i) \mid x_i \} = \sum_{j=1}^{M_i} w_{ij}^{*} \, g( y_i^{*(j)} )$, where $w_{ij}^{*}$ is the fractional weight assigned to the $j$-th imputed value of $y_i$. If $y_i$ is a categorical variable, we can use $y_i^{*(j)} =$ the $j$-th possible value of $y_i$ and $w_{ij}^{*} = P( y_i = y_i^{*(j)} \mid x_i; \hat{\theta} )$, where $\hat{\theta}$ is the (pseudo) MLE of $\theta$.
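In the categorical case, the fractional weights are just estimated conditional probabilities over the support, as in this sketch; `cond_prob` is an assumed model interface, not something defined in the slides.

```python
import numpy as np

# Sketch: the fractional weights for a categorical y_i are the estimated
# conditional probabilities of each possible value given x_i.
def fractional_weights(x_i, support, cond_prob, theta_hat):
    p = np.array([cond_prob(v, x_i, theta_hat) for v in support])
    return p / p.sum()   # w*_{ij} over j = 1, ..., M_i; sums to one
```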
 14. Fractional imputation: features. Split each record with a missing item into $m (> 1)$ imputed values; assign fractional weights. The final product is a single data file of size $nm$. For variance estimation, the fractional weights are replicated.
 15. Fractional imputation: example ($n = 10$).
     ID   Weight   y1        y2
     1    w_1      y_{1,1}   y_{1,2}
     2    w_2      y_{2,1}   ?
     3    w_3      ?         y_{3,2}
     4    w_4      y_{4,1}   y_{4,2}
     5    w_5      y_{5,1}   y_{5,2}
     6    w_6      y_{6,1}   y_{6,2}
     7    w_7      ?         y_{7,2}
     8    w_8      ?         ?
     9    w_9      y_{9,1}   y_{9,2}
     10   w_10     y_{10,1}  y_{10,2}
     (? = missing)
 16. Fractional imputation (categorical case): idea. If both y1 and y2 are categorical, then fractional imputation is easy to apply: there is only a finite number of possible values, so the imputed values are the possible values, and the fractional weights are the conditional probabilities of the possible values given the observations. The EM-by-weighting method of Ibrahim (1990) can be used to compute the fractional weights, as sketched after the tables below.
 17. Fractional imputation (categorical case): example (y1, y2 dichotomous, taking 0 or 1).
     ID   Weight           y1        y2
     1    w_1              y_{1,1}   y_{1,2}
     2    w_2 w*_{2,1}     y_{2,1}   0
          w_2 w*_{2,2}     y_{2,1}   1
     3    w_3 w*_{3,1}     0         y_{3,2}
          w_3 w*_{3,2}     1         y_{3,2}
     4    w_4              y_{4,1}   y_{4,2}
     5    w_5              y_{5,1}   y_{5,2}
 18. Fractional imputation (categorical case): example (cont'd).
     ID   Weight           y1        y2
     6    w_6              y_{6,1}   y_{6,2}
     7    w_7 w*_{7,1}     0         y_{7,2}
          w_7 w*_{7,2}     1         y_{7,2}
     8    w_8 w*_{8,1}     0         0
          w_8 w*_{8,2}     0         1
          w_8 w*_{8,3}     1         0
          w_8 w*_{8,4}     1         1
     9    w_9              y_{9,1}   y_{9,2}
     10   w_10             y_{10,1}  y_{10,2}
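The expansion in the two tables above can be generated mechanically; here is a small sketch (the `None` convention for a missing cell is an assumption of the example).

```python
from itertools import product

# Sketch: split a record with missing cells into one row per possible
# completion; None marks a missing (to-be-imputed) cell.
def expand_record(record, support=(0, 1)):
    cells = [(v,) if v is not None else support for v in record]
    return list(product(*cells))

expand_record((None, None))  # ID 8: [(0, 0), (0, 1), (1, 0), (1, 1)]
expand_record((None, 1))     # ID 7 with y_{7,2} = 1: [(0, 1), (1, 1)]
```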
 19. Fractional imputation (categorical case): example (cont'd). E-step: the fractional weights are the conditional probabilities of the imputed values given the observations,
     $w_{ij}^{*} = \hat{P}( y_{i,mis}^{*(j)} \mid y_{i,obs} ) = \hat{\pi}( y_{i,obs}, y_{i,mis}^{*(j)} ) / \sum_{l=1}^{M_i} \hat{\pi}( y_{i,obs}, y_{i,mis}^{*(l)} )$,
     where $( y_{i,obs}, y_{i,mis} )$ is the (observed, missing) part of $y_i = ( y_{i1}, \ldots, y_{ip} )$. M-step: update the joint probability using the fractional weights,
     $\hat{\pi}_{ab} = N^{-1} \sum_{i=1}^{n} \sum_{j=1}^{M_i} w_i w_{ij}^{*} \, I( y_{i,1}^{*(j)} = a, \, y_{i,2}^{*(j)} = b )$, with $N = \sum_{i=1}^{n} w_i$.
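A minimal sketch of this EM-by-weighting loop for the bivariate binary example; the record layout and the uniform starting value are assumptions for illustration.

```python
import numpy as np
from itertools import product

# Sketch of EM by weighting (Ibrahim, 1990) for the bivariate binary case.
# Each record is (w_i, y1, y2), with None marking a missing cell;
# pi[a, b] estimates the joint probability pi_{ab}.
def em_by_weighting(records, n_iter=100):
    pi = np.full((2, 2), 0.25)                    # assumed starting value
    N = sum(w for w, _, _ in records)
    for _ in range(n_iter):
        counts = np.zeros((2, 2))
        for w, y1, y2 in records:
            cells = [(y1,) if y1 is not None else (0, 1),
                     (y2,) if y2 is not None else (0, 1)]
            combos = list(product(*cells))        # M_i possible completions
            p = np.array([pi[a, b] for a, b in combos])
            w_star = p / p.sum()                  # E-step: fractional weights
            for (a, b), ws in zip(combos, w_star):
                counts[a, b] += w * ws            # M-step accumulation
        pi = counts / N                           # updated pi_{ab}
    return pi
```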
 20. Fractional imputation (categorical case): example (cont'd). Variance estimation: recompute the fractional weights for each replication, applying the same EM algorithm with the replicated weights. E-step:
     $w_{ij}^{*(k)} = \hat{\pi}^{(k)}( y_{i,obs}, y_{i,mis}^{*(j)} ) / \sum_{l=1}^{M_i} \hat{\pi}^{(k)}( y_{i,obs}, y_{i,mis}^{*(l)} )$.
     M-step: update the joint probability using the fractional weights,
     $\hat{\pi}_{ab}^{(k)} = ( N^{(k)} )^{-1} \sum_{i=1}^{n} \sum_{j=1}^{M_i} w_i^{(k)} w_{ij}^{*(k)} \, I( y_{i,1}^{*(j)} = a, \, y_{i,2}^{*(j)} = b )$, where $N^{(k)} = \sum_{i=1}^{n} w_i^{(k)}$.
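Reusing the EM sketch above, the replication step just swaps in the replicate weights $w_i^{(k)}$; the `replicate_weights` structure and the downstream jackknife combination are assumed, not specified in the slides.

```python
# Sketch: rerun the same EM for each replicate k with weights w_i^{(k)},
# yielding pi^{(k)} and hence the replicated fractional weights w*_{ij}^{(k)}.
def replicated_pis(records, replicate_weights):
    pis = []
    for w_k in replicate_weights:                 # w_k[i] = w_i^{(k)}
        rec_k = [(w_k[i], y1, y2) for i, (_, y1, y2) in enumerate(records)]
        pis.append(em_by_weighting(rec_k))        # defined in the sketch above
    return pis  # feed into a jackknife-type variance formula
```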
 21. Fractional imputation (categorical case): final product (records 1-6).
     Weight          x    y1        y2        Rep 1                  Rep 2                  ...  Rep L
     w_1             x_1  y_{1,1}   y_{1,2}   w_1^(1)                w_1^(2)                ...  w_1^(L)
     w_2 w*_{2,1}    x_2  y_{2,1}   0         w_2^(1) w*^(1)_{2,1}   w_2^(2) w*^(2)_{2,1}   ...  w_2^(L) w*^(L)_{2,1}
     w_2 w*_{2,2}    x_2  y_{2,1}   1         w_2^(1) w*^(1)_{2,2}   w_2^(2) w*^(2)_{2,2}   ...  w_2^(L) w*^(L)_{2,2}
     w_3 w*_{3,1}    x_3  0         y_{3,2}   w_3^(1) w*^(1)_{3,1}   w_3^(2) w*^(2)_{3,1}   ...  w_3^(L) w*^(L)_{3,1}
     w_3 w*_{3,2}    x_3  1         y_{3,2}   w_3^(1) w*^(1)_{3,2}   w_3^(2) w*^(2)_{3,2}   ...  w_3^(L) w*^(L)_{3,2}
     w_4             x_4  y_{4,1}   y_{4,2}   w_4^(1)                w_4^(2)                ...  w_4^(L)
     w_5             x_5  y_{5,1}   y_{5,2}   w_5^(1)                w_5^(2)                ...  w_5^(L)
     w_6             x_6  y_{6,1}   y_{6,2}   w_6^(1)                w_6^(2)                ...  w_6^(L)
 22. Fractional imputation (categorical case): final product (records 7-10).
     Weight          x    y1        y2        Rep 1                  Rep 2                  ...  Rep L
     w_7 w*_{7,1}    x_7  0         y_{7,2}   w_7^(1) w*^(1)_{7,1}   w_7^(2) w*^(2)_{7,1}   ...  w_7^(L) w*^(L)_{7,1}
     w_7 w*_{7,2}    x_7  1         y_{7,2}   w_7^(1) w*^(1)_{7,2}   w_7^(2) w*^(2)_{7,2}   ...  w_7^(L) w*^(L)_{7,2}
     w_8 w*_{8,1}    x_8  0         0         w_8^(1) w*^(1)_{8,1}   w_8^(2) w*^(2)_{8,1}   ...  w_8^(L) w*^(L)_{8,1}
     w_8 w*_{8,2}    x_8  0         1         w_8^(1) w*^(1)_{8,2}   w_8^(2) w*^(2)_{8,2}   ...  w_8^(L) w*^(L)_{8,2}
     w_8 w*_{8,3}    x_8  1         0         w_8^(1) w*^(1)_{8,3}   w_8^(2) w*^(2)_{8,3}   ...  w_8^(L) w*^(L)_{8,3}
     w_8 w*_{8,4}    x_8  1         1         w_8^(1) w*^(1)_{8,4}   w_8^(2) w*^(2)_{8,4}   ...  w_8^(L) w*^(L)_{8,4}
     w_9             x_9  y_{9,1}   y_{9,2}   w_9^(1)                w_9^(2)                ...  w_9^(L)
     w_10            x_10 y_{10,1}  y_{10,2}  w_10^(1)               w_10^(2)               ...  w_10^(L)
 23. Fractional hot deck imputation (general case)