An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

16
Brief Solutions to Some Odd-Numbered Problems CHAPTER 1 1. Response variables: a. Attitude toward gun control, b. Heart disease, c. Vote for President, d. Quality of life. 3. a. Binomial, n = 100, π = 0.25. b. μ = = 25 and σ = [nπ(1 π)]= 4.33. 50 correct responses is surprising, since 50 is z = (50 25)/4.33 = 5.8 standard deviations above mean. 7. a. (5/6) 6 . b. Note Y = y when y 1 successes and then a failure. 9. a. Let π = population proportion obtaining greater relief with new analgesic. For H 0 : π = 0.50, z = 2.00, P -value = 0.046. b. Wald CI is (0.504, 0.696), score CI is (0.502, 0.691). 11. 0.86 ± 1.96(0.0102), or (0.84, 0.88). 13. a. (1 π 0 ) 25 is binomial probability of y = 0 in n = 25 trials. b. The maximum of (1 π) 25 occurs at π = 0.0. c. 2 log( 0 / 1 ) =−2 log[(0.50) 25 /1.0]= 34.7, P -value < 0.0001. An Introduction to Categorical Data Analysis, Second Edition. By Alan Agresti Copyright © 2007 John Wiley & Sons, Inc. 357

Transcript of An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

Page 1: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 357 — #1�

Brief Solutions to SomeOdd-Numbered Problems

CHAPTER 1

1. Response variables:

a. Attitude toward gun control,

b. Heart disease,

c. Vote for President,

d. Quality of life.

3. a. Binomial, n = 100, π = 0.25.

b. μ = nπ = 25 and σ = √[nπ(1 − π)] = 4.33. 50 correct responses issurprising, since 50 is z = (50 − 25)/4.33 = 5.8 standard deviationsabove mean.

7. a. (5/6)6.

b. Note Y = y when y − 1 successes and then a failure.

9. a. Let π = population proportion obtaining greater relief with new analgesic.For H0: π = 0.50, z = 2.00, P -value = 0.046.

b. Wald CI is (0.504, 0.696), score CI is (0.502, 0.691).

11. 0.86 ± 1.96(0.0102), or (0.84, 0.88).

13. a. (1 − π0)25 is binomial probability of y = 0 in n = 25 trials.

b. The maximum of (1 − π)25 occurs at π = 0.0.

c. −2 log(�0/�1) = −2 log[(0.50)25/1.0] = 34.7, P -value < 0.0001.

An Introduction to Categorical Data Analysis, Second Edition. By Alan AgrestiCopyright © 2007 John Wiley & Sons, Inc.

357

Page 2: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 358 — #2�

358 BRIEF SOLUTIONS TO SOME ODD-NUMBERED PROBLEMS

d. −2 log(�0/�1) = −2 log[(0.926)25/1.0] = 3.84. With df = 1, chi-squaredP -value = 0.05.

15. a. σ(p) equals binomial standard deviation√

nπ(1 − π) divided by samplesize n.

b. σ(p) takes maximum at π = 0.50 and minimum at π = 0 and 1.

17. a. Smallest possible P -value is 0.08, so never reject H0 and therefore nevercommit a type I error.

b. If T = 9, mid-P value = 0.08/2 = 0.04, so reject H0. Probability of thishappening is P(T = 9) = 0.08 = P (type I error).

c. (a) P(type I error) = 0.04, (b) P(type I error) = 0.04. Mid-P test can haveP (type I error) either below 0.05 (conservative) or above 0.05 (liberal).

CHAPTER 2

1. a. P(−|C) = 1/4, P(C̄|+) = 2/3.

b. Sensitivity = P(+|C) = 1 − P(−|C) = 3/4.

c. P(C, +) = 0.0075, P(C, −) = 0.0025, P(C̄, +) = 0.0150, P(C̄, −) =0.9750.

d. P(+) = 0.0225, P (−) = 0.9775.

e. 1/3.

3. a. (i) 0.000061, (ii) 62.4/1.3 = 48. (b) Relative risk.

5. a. Relative risk.

b. (i) π1 = 0.55π2, so π1/π2 = 0.55. (ii) 1/0.55 = 1.82.

7. a. Quoted interpretation is that of relative risk.

b. Proportion = 0.744 for females, 0.203 for males.

c. R = 0.744/0.203 = 3.7.

9. (Odds for high smokers)/(Odds for low smokers) = 26.1/11.7.

11. a. Relative risk: Lung cancer, 14.00; Heart disease, 1.62.Difference of proportions: Lung cancer, 0.00130; Heart disease, 0.00256.Odds ratio: Lung cancer, 14.02; Heart disease, 1.62.

b. Difference of proportions describes excess deaths due to smoking. If N =number of smokers in population, predict 0.00130N fewer deaths per yearfrom lung cancer if they had never smoked, and 0.00256N fewer deaths peryear from heart disease.

13. a. π̂1 = 0.814, π̂2 = 0.793. CI is 0.0216 ± 1.645(0.024), or (−0.018, 0.061).

Page 3: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 359 — #3�

CHAPTER 2 359

b. CI for log odds ratio 0.137 ± 1.645(0.1507), so CI for odds ratio is(0.89, 1.47).

c. X2 = 0.8, df = 1, P -value = 0.36.

15. log(0.0171/0.0094) ± 1.96√

(0.0052 + 0.0095) is (0.360, 0.835), which trans-lates to (1.43, 2.30) for relative risk.

17. a. X2 = 25.0, df = 1, P < 0.0001.

b. G2 = 25.4, df = 1.

19. a. G2 = 187.6, X2 = 167.8, df = 2 (P < 0.0001).

b. Standardized residuals of −11.85 for white Democrats and −11.77 for blackRepublicans show extremely strong evidence of fewer people in these cellsthan if party ID were independent of race. Standardized residuals of 11.85for black Democrats and 11.77 for white Republicans show extremely strongevidence of more people in these cells than expected.

c. G2 = 24.1 for comparing races on (Democrat, Independent) choice, andG2 = 163.5 for comparing races on (Dem. + Indep., Republican) choice.

21. a. No, samples in different columns are dependent, because subjects can selectas many columns as they wish.

b.

AGender Yes No

Men 60 40Women 75 25

23. Extremely strong evidence of association. Strong evidence of tendency for thosewith less than high school education to be fundamentalist, and those with bachelordegree or higher to be liberal in religious beliefs.

25. a. Total of estimated expected frequencies in row i equals

∑j (ni+n+j /n) = (ni+/n)

∑j n+j = ni+

b. Odds ratio = (n1+n+1/n)(n2+n+2/n)/(n1+n+2/n)(n2+n+1/n) = 1.

27. a. X2 = 8.9, df = 6, P = 0.18; nominal test with ordinal data.

b. Aspirations tend to be higher when family income is higher.

c. Ordinal test gives M2 = 4.75, df = 1, P = 0.03.

29. Table has entries (7, 8) in row 1 and (0, 15) in row 2. P = 0.003.

Page 4: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 360 — #4�

360 BRIEF SOLUTIONS TO SOME ODD-NUMBERED PROBLEMS

31. a. P -value = 0.638.

b. 0.243.

33. b. 0.67 for white victims and 0.79 for black victims.

c. 1.18; yes.

35. Age distribution is relatively higher in Maine.

37. a. 0.18 for males and 0.32 for females.

b. 0.21.

39. (a) T, (b) T, (c) F, (d) T, (e) F.

CHAPTER 3

3. a. π̂ = 0.00255 + 0.00109(alcohol).

b. Estimated probability of malformation increases from 0.00255 at x = 0 to0.01018 at x = 7. Relative risk = 0.01018/0.00255 = 4.0.

5. Fit of linear probability model is (i) 0.018 + 0.018(snoring), (ii) 0.018 +0.036(snoring), (iii) −0.019 + 0.036(snoring). Slope depends on distancebetween scores; doubling distance halves slope estimate. Fitted values areidentical for any linear transformation.

7. a. π̂ = −0.145 + 0.323(weight); at weight = 5.2, π̂ = 1.53, much higher thanupper bound of 1.0 for a probability.

c. logit(π̂) = −3.695 + 1.815(weight); at 5.2 kg, predicted logit = 5.74, andlog(0.9968/0.0032) = 5.74.

9. a. logit(π̂) = −3.556 + 0.0532x.

11. b. log(μ̂) = 1.609 + 0.588x. exp(β̂) = μ̂B/μ̂A = 1.80.

c. Wald z = 3.33, z2 = 11.1 (df = 1), P < 0.001. LR statistic = 11.6 withdf = 1, P < 0.001; higher defect rate for B.

d. Exponentiate 95% CI for β of 0.588 ± 1.96(0.176) to get (1.27, 2.54).

13. a. log(μ̂) = −0.428 + 0.589(weight).

b. 2.74.

c. 0.589 ± 1.96(0.065) = (0.462, 0.717); CI for multiplicative effect on meanis (1.59, 2.05).

d. z2 = (0.589/0.065)2 = 82.2.

e. LR statistic = 71.9, df = 1.

Page 5: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 361 — #5�

CHAPTER 4 361

15. a. exp(−2.38 + 1.733) = 0.522 for blacks and exp(−2.38) = 0.092 for whites.

b. Exponentiate endpoints of 1.733 ± 1.96(0.147), which gives (e1.44, e2.02).

c. CI based on negative binomial model, because overdispersion for Poissonmodel.

d. Poisson is a special case of negative binomial with dispersion parameter = 0.Here, there is strong evidence that dispersion parameter > 0, because theestimated dispersion parameter is almost 5 standard errors above 0.

17. CI for log rate is 2.549 ± 1.96(0.04495), so CI for rate is (11.7, 14.0).

19. a. Difference between deviances = 11.6, with df = 1, gives strong evidencePoisson model with constant rate inadequate.

b. z = β̂/SE = −0.0337/0.0130 = −2.6 (or z2 = 6.7 with df = 1).

c. [exp(−0.060), exp(−0.008)], or (0.94, 0.99), quite narrow around pointestimate of e−0.0337 = 0.967.

21. μ = αt + β(tx), form of GLM with identity link, predictors t and tx, no interceptterm.

CHAPTER 4

1. a. π̂ = 0.068.

b. π̂ = 0.50 at −α̂/β̂ = 3.7771/0.1449 = 26.

c. At LI = 8, π̂ = 0.068, rate of change = 0.1449(0.068)(0.932) = 0.009.

d. eβ̂ = e0.1449 = 1.16.

3. a. Proportion of complete games estimated to decrease by 0.07 per decade.

b. At x = 12, π̂ = −0.075, an impossible value.

c. At x = 12, logit(π̂) = −2.636, and π̂ = 0.067.

5. a. logit(π̂) = 15.043 − 0.232x.

b. At temperature = 31, π̂ = 0.9996.

c. π̂ = 0.50 at x = 64.8 and π̂ > 0.50 at x < 64.8. At x = 64.8, π̂ decreasesat rate 0.058.

d. Estimated odds of thermal distress multiply by exp(−0.232) = 0.79 for each1◦ increase in temperature.

e. Wald statistic z2 = 4.6 (P = 0.03) and LR statistic = 7.95 (df = 1, P =0.005).

7. a. logit(π̂) = −0.573 + 0.0043(age). LR statistic = 0.55, Wald statistic =0.54, df = 1; no evidence of age effect.

Page 6: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 362 — #6�

362 BRIEF SOLUTIONS TO SOME ODD-NUMBERED PROBLEMS

b. Age values more disperse when kyphosis absent.

c. logit(π̂) = −3.035 + 0.0558(age) − 0.0003(age)2. LR statistic for (age)2

term equals 6.3 (df = 1), showing strong evidence of effect.

9. a. logit(π̂) = −0.76 + 1.86c1 + 1.74c2 + 1.13c3. The estimated odds amedium-light crab has a satellite are e1.86 = 6.4 times estimated odds a darkcrab has a satellite.

b. LR statistic = 13.7, df = 3, P -value = 0.003.

c. For color scores 1,2,3,4, logit(π̂) = 2.36 − 0.71c.

d. LR statistic = 12.5, df = 1, P -value = 0.0004.

e. Power advantage of focusing test on df = 1. But, may not be linear trend forcolor effect.

11. Odds ratio for spouse vs others = 2.02/1.71 = 1.18; odds ratio for $10,000 −24,999 vs $25,000 + equal 0.72/0.41 = 1.76.

13. a. Chi-squared with df = 1, so P -value = 0.008.

b. Observed count = 0, expected count = 1.1.

15. a. CMH statistic = 7.815, P -value = 0.005.

b. Test β = 0 in model, logit(π) = α + βx + βDi , where x = race. ML fit

(when x = 1 for white and 0 for black) has β̂ = 0.791, with SE = 0.285.Wald statistic = 7.69, P -value = 0.006.

c. Model gives information about size of effect. Estimated odds ratio betweenpromotion and race, controlling for district, is exp(0.791) = 2.2.

17. a. e−2.83/(1 + e−2.83) = 0.056.

b. e0.5805 = 1.79.

c. (e0.159, e1.008) = (1.17, 2.74).

d. 1/1.79 = 0.56, CI is (1/2.74, 1/1.17) = (0.36, 0.85).

e. H0: β1 = 0, Ha : β1 �= 0, LR statistic = 7.28, df = 1, P -value = 0.007.

19. a. exp(β̂G1 − β̂G

2 ) = 1.17.

b. (i) 0.27, (ii) 0.88.

c. β̂G1 = 0.16, estimated odds ratio = exp(0.16) = 1.17.

d. β̂G1 = 0.08, β̂G

2 = −0.08.

21. a. Odds of obtaining condoms for educated group estimated to be 4.04 timesodds for noneducated group.

b. logit(π̂) = α̂ + 1.40x1 + 0.32x2 + 1.76x3 + 1.17x4, where x1 = 1 for edu-cated and 0 for noneducated, x2 = 1 for males and 0 for females, x3 = 1 forhigh SES and 0 for low SES, and x4 = lifetime number of partners. Log odds

Page 7: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 363 — #7�

CHAPTER 5 363

ratio = 1.40 has CI (0.16, 2.63). CI is 1.40 ± 1.96(SE), so CI has width3.92(SE), and SE = 0.63.

c. CI corresponds to one for log odds ratio of (0.207, 2.556); 1.38 is the mid-point of CI, suggesting it may be estimated log odds ratio, in which caseexp(1.38) = 3.98 = estimated odds ratio.

23. a. R = 1: logit(π̂) = −6.7 + 0.1A + 1.4S. R = 0: logit(π̂) = −7.0 + 0.1A+1.2S. YS conditional odds ratio = exp(1.4) = 4.1 for blacks and exp(1.2) =3.3 for whites. Coefficient of cross-product term, 0.22, is difference betweenlog odds ratios 1.4 and 1.2.

b. The coefficient of S of 1.2 is log odds ratio between Y and S when R = 0(whites), in which case RS interaction does not enter equation. P -value ofP < 0.01 for smoking represents result of test that log odds ratio between Y

and S for whites = 0.

25. a. Derive the four equations from overall equation

logit(π̂) = −5.854 + 4.101c1 − 4.186c2 − 15.66c3 + 0.200x

− 0.094(c1 × x) + 0.218(c2 × x) + 0.658(c3 × x)

b. LR statistic = 4.4 (df = 3), P = 0.22.

27. a. −0.41 and 0.97 are coefficients for standardized versions of predictors forwhich standard deviation is 1.0.

b. For c = 4 (dark crabs), logit(π̂) = −12.11 + 0.458x. Estimated probabilitychanges from 0.33 to 0.64 when x changes from 24.9 to 27.7.

29. For main effects model, estimated conditional odds ratios = 3.7 for race and 1.9for gender.

31. Model with main effects has estimated conditional odds ratios 17.3 betweenmarijuana use and cigarette use and 19.8 between marijuana use and alcohol use.

35. a. Exponential term maximized when exponent equals 0, which is whenx = −α/β.

b. 24.8.

c. 0.40(0.302) = 0.12.

37. (a) T, (b) F, (c) T, (d) F, (e) T.

CHAPTER 5

1. a. logit(π̂) = −9.35 + 0.834(weight) + 0.307(width).

Page 8: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 364 — #8�

364 BRIEF SOLUTIONS TO SOME ODD-NUMBERED PROBLEMS

b. LR statistic = 32.9 (df = 2), P < 0.0001.

c. Wald statistics = 1.55 and 2.85 (df = 1), for P -values 0.21 and 0.09.Predictors are highly correlated (Pearson correlation = 0.887), so problemof multicollinearity.

3. a. Test statistic = 3.2 (df = 3). Yes, can remove it.

b. Change in deviance is smallest, 0.0 on df = 2, when remove S∗W term.

c. Take out C∗W term, as model W + C∗S has larger P -value.

d. Yes, change in deviance = 9.0 (df = 6), which has P -value = 0.17.

e. Model C + S + W has smallest AIC.

5. Model with only four main effect terms has smallest AIC.

7. a. One intercept term, four main effect terms, six two-factor interaction terms,and four three-factor interaction terms, so numbers of parameters in modelsare 1, 1 + 4 = 5, 1 + 4 + 6 = 11, 1 + 4 + 6 + 4 = 15.

b. AIC values are 1130.23 + 2(1) = 1132.23, 1124.86 + 2(5) = 1134.86,1119.87 + 2(11) = 1141.87, 1116.47 + 2(15) = 1146.47. Best model hasintercept only.

c. No; for example, expect c around 0.50 just by chance.

9. a. No, deviance can check fit only for categorical predictors.

b. LR statistic for testing that parameter for quadratic term is zero equals 3.9,with df = 1. P -value is about 0.05.

c. Derivative of linear predictor with respect to LI is 0.9625 − 2(0.016)LI,which is >0 when LI < 0.9625/0.032 = 30.1. So, π̂ increases as LI increasesup to about 30.

d. Simpler model with linear effect on logit seems adequate.

11. Model seems adequate. A reference for this type of approach is the article byA. Tsiatis (Biometrika, 67: 250–251, 1980).

15. Logit model with additive factor effects has G2 = 0.1 and X2 = 0.1, df = 2.Estimated odds of females still being missing are exp(0.38) = 1.46 times thosefor males, given age. Estimated odds considerably higher for those aged at least19 than for other age groups, given gender.

17. a. For death penalty response with main effect for predictors, G2 = 0.38, df = 1,P = 0.54. Model fits adequately.

b. Each standardized residual is 0.44 in absolute value, showing no lack of fit.

c. Estimated conditional odds ratio = exp(−0.868) = 0.42 for defendant’s raceand exp(2.404) = 11.1 for victims’ race.

Page 9: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 365 — #9�

CHAPTER 6 365

19. a. logit(π) = α + β1d1 + · · · + β6d6, where di = 1 for department i and di = 0otherwise.

b. Model fits poorly.

c. Only lack of fit in Department 1, where more females were admitted thanexpected if the model lacking gender effect truly holds.

d. −4.15, so fewer males admitted than expected if model lacking gender effecttruly holds.

e. Males apply in relatively greater numbers to departments that have relativelyhigher proportions of acceptances.

27. zα/2 = 2.576, zβ = 1.645, and n1 = n2 = 214.

29. logit(π̂) = −12.351 + 0.497x. Probability at x = 26.3 is 0.674; probability atx = 28.4 (i.e., one standard deviation above mean) is 0.854. Odds ratio is 2.83,so λ = 1.04, δ = 5.1. Then n = 75.

CHAPTER 6

1. a. log(π̂R/π̂D) = −2.3 + 0.5x. Estimated odds of preferring Republicans overDemocrats increase by 65% for every $10,000 increase.

b. π̂R > π̂D when annual income >$46,000.

c. π̂I = 1/[1 + exp(3.3 − 0.2x) + exp(1 + 0.3x)].

3. a. SE values in parentheses

Logit Intercept Size ≤ 2.3 Hancock Oklawaha Trafford

log(πI /πF ) −1.55 1.46(0.40) −1.66(0.61) 0.94(0.47) 1.12(0.49)log(πR/πF ) −3.31 −0.35(0.58) 1.24(1.19) 2.46(1.12) 2.94(1.12)log(πB/πF ) −2.09 −0.63(0.64) 0.70(0.78) −0.65(1.20) 1.09(0.84)log(πO/πF ) −1.90 0.33(0.45) 0.83(0.56) 0.01(0.78) 1.52(0.62)

5. a. Job satisfaction tends to increase at higher x1 and lower x2 and x3.

b. x1 = 4 and x2 = x3 = 1.

7. a. Two cumulative probabilities to model and hence 2 intercept parameters. Pro-portional odds have same predictor effects for each cumulative probability,so only one effect reported for income.

b. Estimated odds of being at low end of scale (less happy) decrease as incomeincreases.

c. LR statistic = 0.89 with df = 1, and P -value = 0.35. It is plausible thatincome has no effect on marital happiness.

Page 10: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 366 — #10�

366 BRIEF SOLUTIONS TO SOME ODD-NUMBERED PROBLEMS

d. Deviance = 3.25, df = 3, P -value = 0.36, so model fits adequately.

e. 1 − cumulative probability for category 2, which is 0.61.

9. a. There are four nondegenerate cumulative probabilities. When all predictorvalues equal 0, cumulative probabilities increase across categories, so logitsincrease, as do parameters that specify logits.

b. (i) Religion = none, (ii) Religion = Protestant.

c. For Protestant, 0.09. For None, 0.26.

d. (i) e−1.27 = 0.28; that is, estimated odds that Protestant falls in relativelymore liberal categories (rather than more conservative categories) is 0.28 timesestimated odds for someone with no religious preference. (ii) Estimated oddsratio comparing Protestants to Catholics is 0.95.

11. a. β̂ = −0.0444 (SE = 0.0190) suggests probability of having relatively lesssatisfaction decreases as income increases.

b. β̂ = −0.0435, very little change. If model holds for underlying logistic latentvariable, model holds with same effect value for every way of definingoutcome categories.

c. Gender estimate of −0.0256 has SE = 0.4344 and Wald statistic = 0.003(df = 1), so can be dropped.

13. a. Income effect of 0.389 (SE = 0.155) indicates estimated odds of higher ofany two adjacent job satisfaction categories increases as income increases.

b. Estimated income effects are −1.56 for outcome categories 1 and 4, −0.64for outcome categories 2 and 4, and −0.40 for categories 3 and 4.

c. (a) Treats job satisfaction as ordinal whereas (b) treats job satisfaction asnominal. Ordinal model is more parsimonious and simpler to interpret,because it has one income effect rather than three.

17. Cumulative logit model with main effects of gender, location, and seat-belthas estimates 0.545, −0.773, and 0.824; for example, for those wearing a seatbelt, estimated odds that the response is below any particular level of injury aree0.824 = 2.3 times the estimated odds for those not wearing seat belts.

21. For cumulative logit model of proportional odds form with Y = happiness andx = marital status (1 = married, 0 = divorced), β̂ = −1.076 (SE = 0.116). Themodel fits well (e.g., deviance = 0.29 with df = 1).

CHAPTER 7

1. a. G2 = 0.82, X2 = 0.82, df = 1.

b. λ̂Y1 = 1.416, λ̂Y

2 = 0. Given gender, estimated odds of belief in afterlife equale1.416 = 4.1.

Page 11: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 367 — #11�

CHAPTER 7 367

3. a. G2 = 0.48, df = 1, fit is adequate.

b. 2.06 for PB association, 4.72 for PH association, 1.60 for BH association.

c. H0 model is (PH, BH). Test statistic = 4.64, df = 1, P -value = 0.03.

d. exp[0.721 ± 1.96(0.354)] = (1.03, 4.1).

5. a. 0.42.

b. 1.45.

c. G2 = 0.38, df = 1, P = 0.54, model fits adequately.

d. Logit model with main effects of defendant race and victim race, usingindicator variable for each.

7. a. Difference in deviances = 2.21, df = 2; simpler model adequate.

b. exp(−1.507, −0.938) = (0.22, 0.39).

c. e1.220 = 3.39, exp(0.938, 1.507) is (1/0.39, 1/0.22), which is (2.55, 4.51).

9. a. Estimated odds ratios are 0.9 for conditional and 1.8 for marginal. Men applyin greater numbers to departments (1, 2) having relatively high admissionsrates and women apply in greater numbers to departments (3, 4, 5, 6) havingrelatively low admissions rates.

b. Deviance G2 = 20.2 (df = 5), poor fit. Standardized residuals show lack offit only for Department 1.

c. G2 = 2.56, df = 4, good fit.

d. Logit model with main effects for department and gender has estimated con-ditional odds ratio = 1.03 between gender and admissions. Model deletinggender term fits essentially as well, with G2 = 2.68 (df = 5); plausible thatadmissions and gender are conditionally independent for these departments.

11. a. Injury has estimated conditional odds ratios 0.58 with gender, 2.13 with loca-tion, and 0.44 with seat-belt use. Since no interaction, overall most likely casefor injury is females not wearing seat belts in rural locations.

13. a. G2 = 31.7, df = 48.

b. log(μ11clμ33cl/μ13clμ31cl)

= log(μ11cl) + log(μ33cl) − log(μ13cl) − log(μ31cl)

Substitute model formula, and simplify. Estimated odds ratio = exp(2.142) =8.5. The 95% CI is exp[2.142 ± 1.96(0.523)], or (3.1, 24.4).

c. 2.4 for C and L, 6.5 for H and L, 0.8 for C and H , 0.9 for E and L, 3.3 for C

and E. Associations seem strongest between E and H and between H and L.

17. Logistic model more appropriate when one variable a response and othersare explanatory. Loglinear model may be more appropriate when at least twovariables are response variables.

Page 12: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 368 — #12�

368 BRIEF SOLUTIONS TO SOME ODD-NUMBERED PROBLEMS

19. b. The λXY term is not in the model, so X and Y are conditionally independent.All terms in the saturated model that are not in model (WXZ, WYZ) involveX and Y , and so permit XY conditional association.

21. a. G2 = 31.7, df = 48. The model with three-factor terms has G2 = 8.5, df =16; the change is 23.1, df = 32, not a significant improvement.

b. (ii) For the result at the beginning of Section 7.4.4, identify set B = {E, L}and sets A and C each to be one of other variables.

23. (a) No.

(b) Yes; in the result in Section 7.4.4, take A = {Y }, B = {X1, X2}, C = {X3}.

25. a. Take β = 0.

b. LR statistic comparing this to model (XZ, YZ).

d. No, this is a heterogeneous linear-by-linear association model. The XYodds ratios vary according to the level of Z, and there is no longer homoge-neous association. For scores {ui = i} and {vj = j}, local odds ratio equalsexp(βk).

27. (a) T, (b) F, (c) T.

CHAPTER 8

1. z = 2.88, two-sided P -value = 0.004; there is strong evidence that MI cases aremore likely than MI controls to have diabetes.

3. a. Population odds of belief in heaven estimated to be 2.02 times populationodds of belief in hell.

b. For each subject, odds of belief in heaven estimated to equal 62.5 times oddsof belief in hell.

5. a. This is probability, under H0, of observed or more extreme result, with moreextreme defined in direction specified by Ha .

b. Mid P -value includes only half observed probability, added to probability ofmore extreme results.

c. When binomial parameter = 0.50, binomial is symmetric, so two-sidedP -value = 2(one-sided P -value) in (a) and (b).

7. 0.022 ± 0.038, or (−0.016, 0.060), wider than for dependent samples.

9. β̂ = log(132/107) = 0.21.

Page 13: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 369 — #13�

CHAPTER 8 369

11. 95% CI for β is log(132/107) ± 1.96√

1/132 + 1/107, which is (−0.045,0.465). The corresponding CI for odds ratio is (0.96, 1.59).

13. a. More moves from (2) to (1), (1) to (4), (2) to (4) than if symmetry truly held.

b. Quasi-symmetry model fits well.

c. Difference between deviances = 148.3, with df = 3. P -value < 0.0001 forH0: marginal homogeneity.

15. a. Subjects tend to respond more in always wrong direction for extramarital sex.

b. z = −4.91/0.45 = −10.9, extremely strong evidence against H0.

c. Symmetry fits very poorly but quasi symmetry fits well. The differenceof deviances = 400.8, df = 3, gives extremely strong evidence againstH0: marginal homogeneity (P -value < 0.0001).

d. Also fits well, not significantly worse than ordinary quasi symmetry. Thedifference of deviances = 400.1, df = 1, gives extremely strong evidenceagainst marginal homogeneity.

e. From model formula in Section 8.4.5, for each pair of categories, amore favorable response is much more likely for premarital sex thanextramarital sex.

19. G2 = 4167.6 for independence model (df = 9), G2 = 9.7 for quasi-independence (df = 5). QI model fits cells on main diagonal perfectly.

21. G2 = 13.8, df = 11; fitted odds ratio = 1.0. Conditional on change in brand,new brand plausibly independent of old brand.

23. a. G2 = 4.3, df = 3; prestige ranking: 1. JRSS-B, 2. Biometrika, 3. JASA, 4.Commun. Statist.

25. a. e1.45 − 0.19/(1 + e1.45 − 0.19) = 0.78.

b. Extremely strong evidence (P -value < 0.0001) of at least one differenceamong {βi}. Players do not all have same probability of winning.

27. a. log(πij /πji) = log(μij /μji) = (λXi − λY

i ) − (λXj − λY

j ). Take βi = (λXi −

λYi ).

b. Under this constraint, μij = μji .

c. Under this constraint, model adds to independence model a term for each cellon main diagonal.

Page 14: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 370 — #14�

370 BRIEF SOLUTIONS TO SOME ODD-NUMBERED PROBLEMS

CHAPTER 9

1. a. Sample proportion yes = 0.86 for A, 0.66 for C, and 0.42 for M.

b. logit[P(Yt = 1)] = β1z1 + β2z2 + β3z3, where t = 1, 2, 3 refers to A, C,M, and z1 = 1 if t = 1, z2 = 1 if t = 2, z3 = 1 if t = 3 (0 otherwise);for example, eβ1 is the odds of using alcohol. Marginal homogeneity isβ1 = β2 = β3.

3. a. Marijuana: For S1 = S2 = 0, the linear predictor takes greatest value whenR = 1 and G = 0 (white males). For alcohol, S1 = 1, S2 = 0, the linearpredictor takes greatest value when R = 1 and G = 1 (white females).

b. Estimated odds for white subjects exp(0.38) = 1.46 times estimated odds forblack subjects.

c. For alcohol, estimated odds ratio = exp(−0.20 + 0.37) = 1.19; forcigarettes, exp(−0.20 + 0.22) = 1.02; for marijuana, exp(−0.20) = 0.82.

d. Estimated odds ratio = exp(1.93 + 0.37) = 9.97.

e. Estimated odds ratio = exp(1.93) = 6.89.

7. a. Subjects can select any number of sources, so a given subject could haveanywhere from zero to five observations in the table. Multinomial distributiondoes not apply to these 40 cells.

b. Estimated correlation is weak, so results not much different from treatingfive responses by a subject as if from five independent subjects. For source Athe estimated size effect is 1.08 and highly significant (Wald statistic = 6.46,df = 1, P < 0.0001). For sources C, D, and E size effect estimates are allroughly −0.2.

11. With constraint β4 = 0, ML estimates of item parameters {βj } are (−0.551,−0.603, −0.486, 0). The first three estimates have absolute values greater thanfive standard errors, providing strong evidence of greater support for increasedgovernment spending on education than other items.

13. logit[P̂ (Yt = 1)] = 1.37 + 1.148yt−1 + 1.945yt−2 + 0.174s − 0.437t . So, yt−2does have predictive power.

b. Given previous responses and child’s age, estimated effect of maternal smok-ing weaker than when use only previous response as predictor, but stillpositive. LR statistic for testing maternal smoking effect is 0.72 (df = 1,P = 0.40).

17. Independent conditional on Yt−1, but not independent marginally.

Page 15: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 371 — #15�

CHAPTER 10 371

CHAPTER 10

1. a. Using PROC NLMIXED in SAS, (β̂, SE, σ̂ , SE) = (4.135, 0.713, 10.199,

1.792) for 1000 quadrature points.

b. For given subject, estimated odds of belief in heaven are exp(4.135) = 62.5times estimated odds of belief in hell.

c. β̂ = log(125/2) = 4.135 with SE = √(1/125) + (1/2) = 0.713.

3. a. (i) 0.038, (ii) 0.020, (iii) 0.070.

b. Sample size may be small in each county, and GLMM analysis borrows fromwhole.

5. a. 0.4, 0.8, 0.2, 0.6, 0.6, 1.0, 0.8, 0.4, 0.6, 0.2.

b. logit(πi) = ui + α. ML estimates α̂ = 0.259 and σ̂ = 0.557. For averagecoin, estimated probability of head = 0.56.

c. Using PROC NLMIXED in SAS, predicted values are 0.52, 0.63, 0.46, 0.57,0.57, 0.68, 0.63, 0.52, 0.57, 0.46.

7. a. Strong associations between responses inflates GLMM estimates relative tomarginal model estimates.

b. Loglinear model focuses on strength of association between use of one sub-stance and use of another, given whether or not one used remaining substance.The focus is not on the odds of having used one substance compared with theodds of using another.

c. If σ̂ = 0, GLMM has the same fit as loglinear model (A, C, M), since condi-tional independence of responses given random effect translates to conditionalindependence marginally also.

9. For β̂A = 0, β̂B = 1.99 (SE = 0.35), β̂C = 2.51 (SE = 0.37), with σ̂ = 0.

11. a. For given department, estimated odds of admission for female are exp(0.173)

= 1.19 times estimated odds of admission for male.

b. For given department, estimated odds of admission for female are exp(0.163)

= 1.18 times estimated odds of admission for male.

c. The estimated mean log odds ratio between gender and admissions, givendepartment, is 0.176, corresponding to odds ratio = 1.19. Because of extravariance component, estimate of β is not as precise.

d. Marginal odds ratio of exp(−0.07) = 0.93 in different direction, correspond-ing to odds of being admitted lower for females than males.

13. a. e2.51 = 12.3, so estimated odds of response in category ≤ j (i.e., toward“always wrong” end of scale) on extramarital sex for a randomly selected

Page 16: An Introduction to Categorical Data Analysis || Brief Solutions to Some Odd-Numbered Problems

“bansw” — 2007/1/29 — 15:39 — page 372 — #16�

372 BRIEF SOLUTIONS TO SOME ODD-NUMBERED PROBLEMS

subject are 12.3 times estimated odds of response in those categories forpremarital sex for another randomly selected subject.

b. Estimate of β much larger for GLMM, since a subject-specific estimate andvariance component is large (recall Section 10.1.4).

17. a. At initial time, treatment effect = 0.058 (odds ratio = 1.06), so two groupshave similar response distributions. At follow-up time, the treatment effect =0.058 + 1.081 = 1.139 (odds ratio = 3.1).

b. LR statistic = −2[−593.0 − (−621.0)] = 56. Null distribution is equalmixture of degenerate at 0 and X2

1, and P -value is half that of X21 variate,

and is 0 to many decimal places.

23. From Section 10.1.4, the effects in marginal models are smaller in absolute valuethan effects in GLMMs, with greater difference when σ̂ is larger. Here, the effectfor GLMM is the same for each age group, but diminishes more for the olderage group in the marginal model because the older age group has much largerσ̂ in GLMM.