Heart Disease Example


6.2.2

Male residents age 40-59

• Two models are examined:
• A) Independence: $\operatorname{logit}(\pi_i) = \alpha$
• B) Linear logit: $\operatorname{logit}(\pi_i) = \alpha + \beta x_i$

Independence
1) The response variable is whether the men developed coronary heart disease during a six-year follow-up period.

2) $\pi_i$ is the probability of heart disease for blood pressure category $i$.
3) This model treats the response as independent of blood pressure.
4) The model fits poorly: G² = 30.0 and X² = 33.4 with df = 7, giving p-value < .001.
5) Some residuals in Table 6.5 are large, which reflects the poor fit.
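To make these fit statistics concrete, here is a minimal sketch that fits the independence model to grouped binomial counts. It returns G², X², df, and the standardized Pearson residuals $r_i = (y_i - n_i\hat{\pi}_i)/\sqrt{n_i\hat{\pi}_i(1-\hat{\pi}_i)(1-h_i)}$. It assumes the data are stored as per-category counts `yes` and `n`, names chosen here for illustration. For this intercept-only model the fitted probability is the overall proportion, and the leverage $h_i$ reduces to $n_i/N$. The usage counts are hypothetical, not the heart disease data.

```python
import math

def independence_model(yes, n):
    """Fit logit(pi_i) = alpha to grouped binomial data.

    yes[i]: number of 'successes' (e.g., men developing heart disease) in category i
    n[i]:   number of subjects in category i
    Returns (G2, X2, df, standardized Pearson residuals).
    """
    total = sum(n)
    p_hat = sum(yes) / total          # ML fit: one common probability for all categories
    g2 = x2 = 0.0
    residuals = []
    for y, m in zip(yes, n):
        # Both the 'yes' and the 'no' cell of each category contribute to G2 and X2
        for obs, exp in ((y, m * p_hat), (m - y, m * (1 - p_hat))):
            if obs > 0:               # empty cells contribute 0 to G2
                g2 += 2 * obs * math.log(obs / exp)
            x2 += (obs - exp) ** 2 / exp
        # In the intercept-only model the leverage of category i is n_i / N
        h = m / total
        residuals.append((y - m * p_hat) / math.sqrt(m * p_hat * (1 - p_hat) * (1 - h)))
    df = len(n) - 1                   # I binomial observations minus 1 parameter
    return g2, x2, df, residuals

g2, x2, df, r = independence_model([2, 4, 6, 8], [50, 50, 50, 50])
```

With these hypothetical counts the pooled proportion is 0.1, X² = 40/9 ≈ 4.44 on df = 3, and the residuals increase steadily across categories. An increasing pattern like this is what points toward a linear logit model.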

Linear Logit Model
• A residual plot for the independence model reveals an increasing trend.
• This suggests the linear logit model with scores $\{x_i\}$ for blood pressure level.
• Fig. 6.2 shows the fitted model, using the midpoints of the blood pressure intervals as the scores.
• This model reduces the pronounced trend in the residuals.
• The fit is good, except for a slight deviation in the second category.
• Table 6.6 reports residuals for this linear logit model. Notice the large value for observation 2. One residual of this size among eight categories may be explained by chance and is not surprising.

• The overall fit is good: G² = 5.9 and X² = 6.3 with df = 6, giving p-value > .25.

• Studying residuals helps us understand either why a model fits poorly or where there is lack of fit in a generally good fitting model.
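The linear logit fit can be sketched with Newton-Raphson, which for the logit link coincides with Fisher scoring. The code below is a minimal pure-Python version for grouped binomial data with scores `x`. The function names are hypothetical. The usage data are also hypothetical: they are chosen to lie exactly on a logistic curve so that the answer is known. The deviance G² has df = I − 2, because the model has two parameters.

```python
import math

def _score_and_information(x, yes, n, alpha, beta):
    """Score vector and Fisher information for logit(pi_i) = alpha + beta * x_i."""
    u_a = u_b = i_aa = i_ab = i_bb = 0.0
    for xi, yi, ni in zip(x, yes, n):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
        w = ni * p * (1.0 - p)            # binomial variance weight
        u_a += yi - ni * p
        u_b += (yi - ni * p) * xi
        i_aa += w
        i_ab += w * xi
        i_bb += w * xi * xi
    return (u_a, u_b), (i_aa, i_ab, i_bb)


def fit_linear_logit(x, yes, n, tol=1e-10, max_iter=50):
    """ML fit by Newton-Raphson; returns alpha, beta and their 2x2 covariance matrix."""
    alpha = beta = 0.0
    for _ in range(max_iter):
        (u_a, u_b), (i_aa, i_ab, i_bb) = _score_and_information(x, yes, n, alpha, beta)
        det = i_aa * i_bb - i_ab ** 2
        step_a = (i_bb * u_a - i_ab * u_b) / det  # first entry of I^{-1} u
        step_b = (i_aa * u_b - i_ab * u_a) / det  # second entry of I^{-1} u
        alpha += step_a
        beta += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    # The inverse information at the final estimates estimates the covariance matrix
    _, (i_aa, i_ab, i_bb) = _score_and_information(x, yes, n, alpha, beta)
    det = i_aa * i_bb - i_ab ** 2
    cov = [[i_bb / det, -i_ab / det], [-i_ab / det, i_aa / det]]
    return alpha, beta, cov


def deviance(x, yes, n, alpha, beta):
    """G^2 for the fitted model; its df is the number of categories minus 2."""
    g2 = 0.0
    for xi, yi, ni in zip(x, yes, n):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
        for obs, exp in ((yi, ni * p), (ni - yi, ni * (1.0 - p))):
            if obs > 0:
                g2 += 2.0 * obs * math.log(obs / exp)
    return g2

x, yes, n = [-1, 0, 1], [25, 50, 75], [100, 100, 100]
alpha, beta, cov = fit_linear_logit(x, yes, n)
```

With these hypothetical counts the sample proportions are 0.25, 0.5 and 0.75. The fit therefore recovers α̂ ≈ 0 and β̂ ≈ ln 3 ≈ 1.10, with G² ≈ 0.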

Graduate Admissions Example

• Table 6.7 contains data on graduate school applications to the 23 departments in the College of Liberal Arts and Sciences at the University of Florida during the 1997-1998 academic year.

• It cross-classifies applicants by gender, admission, and the department to which they applied.

Graduate Admission Example cont.

• We consider logit models with admission as the response variable.

• $\pi_{ik}$ denotes the probability of admission for gender $i$ in department $k$.

• We would hope that admission does not depend on gender. However, the model with no gender effect, given department, fits poorly: $\operatorname{logit}(\pi_{ik}) = \alpha + \beta_k^D$.

• G² = 44.7, X² = 40.9, df = 23, giving p-value < .025.
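Why df = 23 here: the data contain two binomials per department (female and male), and the model has one parameter per department, so df = 2K − K = K. Below is a minimal sketch that computes G² and X² for this model. Under the model, the fitted admission probability in department k is simply that department's pooled admission rate. The input layout (one `((female_admitted, female_applicants), (male_admitted, male_applicants))` tuple per department) is an assumption made for illustration, and the example department is hypothetical.

```python
import math

def dept_only_fit(depts):
    """Fit logit(pi_ik) = alpha + beta_k^D, which has no gender effect.

    depts: list of ((female_admitted, female_applicants),
                    (male_admitted, male_applicants)), one tuple per department.
    Returns (G2, X2, df).
    """
    g2 = x2 = 0.0
    for groups in depts:
        admitted = sum(a for a, _ in groups)
        applied = sum(m for _, m in groups)
        p = admitted / applied            # ML fitted probability, same for both genders
        for a, m in groups:
            for obs, exp in ((a, m * p), (m - a, m * (1 - p))):
                if obs > 0:
                    g2 += 2 * obs * math.log(obs / exp)
                x2 += (obs - exp) ** 2 / exp
    # 2 binomials per department minus 1 parameter per department
    df = 2 * len(depts) - len(depts)
    return g2, x2, df

g2, x2, df = dept_only_fit([((20, 40), (10, 40))])
```

For a hypothetical department where 20 of 40 females and 10 of 40 males were admitted, the function gives X² = 16/3 ≈ 5.33 on df = 1. Departments where the genders are admitted at equal rates contribute zero to both statistics.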

Graduate Admission cont.
• Table 6.7 reports standardized Pearson residuals for the number of females admitted for this model.

• Departments with large standardized Pearson residuals reveal the source of the lack of fit. Significantly more females were admitted than the model predicts in the astronomy and geography departments, and fewer in the psychology department.

• Without these three departments, the model fits well (G² = 24.4, X² = 22.8, df = 20, p-value > .25).
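A residual of the kind reported in Table 6.7 can be computed one department at a time. Under the no-gender-effect model, the fitted probability in a department is its pooled admission rate. The female binomial's leverage is its share of that department's applicants. The function below is a minimal sketch; its name and argument order are hypothetical. In a 2×2 table the square of this residual equals the department's Pearson X² for independence, which gives a quick check.

```python
import math

def female_admitted_residual(f_admit, f_apply, m_admit, m_apply):
    """Standardized Pearson residual for the number of females admitted in one
    department, under logit(pi_ik) = alpha + beta_k^D (no gender effect).

    A positive value means more females were admitted than the model predicts.
    """
    total = f_apply + m_apply
    p = (f_admit + m_admit) / total      # fitted admission probability in this department
    expected = f_apply * p               # fitted number of females admitted
    leverage = f_apply / total           # leverage of the female binomial
    return (f_admit - expected) / math.sqrt(f_apply * p * (1 - p) * (1 - leverage))

r = female_admitted_residual(20, 40, 10, 40)
```

For the hypothetical department with 20 of 40 females and 10 of 40 males admitted, r ≈ 2.31. That is a large positive residual of the type seen for astronomy and geography, and its square is 16/3.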

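Graduate Admission cont.
• For the complete data, adding gender to the model does not substantially improve the fit (G² = 42.4, X² = 39.0, df = 22, p-value < .01). The reason is the magnitude and direction of the residuals in the three departments described earlier: astronomy and geography deviate in one direction and psychology in the other, so a single gender effect cannot account for all three.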

• For this model, the ML estimate of the conditional gender-admission odds ratio is 1.19. Given the department, the estimated odds of admission for females are 19% higher than for males.

• By contrast, the marginal gender-admission sample odds ratio is 0.94. Ignoring department, the odds of admission for females are 6% lower than for males.

• This illustrates Simpson’s paradox: the conditional association has a different direction from the marginal association.
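The reversal can be reproduced with a toy example. The sketch below uses two hypothetical departments, not the Table 6.7 data. Females mostly apply to the department with the low admission rate, and within each department their odds of admission are higher. To keep the code short it estimates the common conditional odds ratio with the Mantel-Haenszel estimator. This is a different, closely related estimator, not the ML estimate from the logit model quoted above.

```python
def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel estimate of a common conditional odds ratio.

    Each table is (a, b, c, d) =
    (female admitted, female rejected, male admitted, male rejected).
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def marginal_or(tables):
    """Odds ratio after collapsing the tables over departments."""
    a, b, c, d = (sum(t[j] for t in tables) for j in range(4))
    return odds_ratio(a, b, c, d)

# Hypothetical departments: one easy (mostly male applicants), one hard (mostly female)
tables = [(9, 1, 80, 20), (20, 80, 1, 9)]
conditional = mantel_haenszel_or(tables)
marginal = marginal_or(tables)
```

In each hypothetical department the female-versus-male odds ratio is 2.25, so the conditional estimate is 2.25. After collapsing over departments, the odds ratio drops to 841/6561 ≈ 0.13, reversing direction.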

Influence Diagnostics for Logistic Regression

6.2.4


• Other regression diagnostic tools are also helpful in assessing fit. These include plots of ordered residuals against normal percentiles and analyses that describe an observation’s influence on parameter estimates and fit statistics.

• Whenever a residual indicates that the model fits an observation poorly, it can be informative to delete that observation and refit the model to the remaining observations. This is equivalent to adding a parameter to the model for that observation, forcing a perfect fit there.

Influence Diagnostics for Logistic Regression cont.
• In logistic regression, the observation could be a single binary response or a binomial response for a set of subjects all having the same predictor values.

• Influence measures for each observation include:
1. For each model parameter, the change in the parameter estimate when the observation is deleted. This change, divided by its standard error, is called Dfbeta.
2. A measure of the change in a joint confidence interval for the parameters produced by deleting the observation. This confidence interval displacement diagnostic is denoted by c.
3. The change in the X² or G² goodness-of-fit statistic when the observation is deleted.

• For each measure, the larger the value, the greater the influence.
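Measures 1 and 3 can be sketched by literally deleting each grouped observation and refitting the linear logit model. This is an exact-deletion version; statistical software often uses one-step approximations instead. The sketch makes two assumptions. Dfbeta is taken as (full-data β̂ minus deleted-data β̂) divided by the full-data standard error of β̂, and only the slope is reported. Function names and data are hypothetical.

```python
import math

def _fit(x, yes, n, tol=1e-10, max_iter=50):
    """Newton-Raphson ML fit of logit(pi_i) = alpha + beta * x_i (grouped binomial).
    Returns alpha, beta, se(beta) (se from the information at the final iteration)."""
    a = b = 0.0
    for _ in range(max_iter):
        ua = ub = iaa = iab = ibb = 0.0
        for xi, yi, ni in zip(x, yes, n):
            p = 1 / (1 + math.exp(-(a + b * xi)))
            w = ni * p * (1 - p)
            ua += yi - ni * p
            ub += (yi - ni * p) * xi
            iaa += w
            iab += w * xi
            ibb += w * xi * xi
        det = iaa * ibb - iab ** 2
        da = (ibb * ua - iab * ub) / det
        db = (iaa * ub - iab * ua) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b, math.sqrt(iaa / det)

def _g2(x, yes, n, a, b):
    """Deviance G^2 of the fitted linear logit model."""
    g2 = 0.0
    for xi, yi, ni in zip(x, yes, n):
        p = 1 / (1 + math.exp(-(a + b * xi)))
        for obs, exp in ((yi, ni * p), (ni - yi, ni * (1 - p))):
            if obs > 0:
                g2 += 2 * obs * math.log(obs / exp)
    return g2

def deletion_diagnostics(x, yes, n):
    """For each observation i, refit without it and return (Dfbeta for beta, change in G^2)."""
    a, b, se_b = _fit(x, yes, n)
    g2_full = _g2(x, yes, n, a, b)
    results = []
    for i in range(len(x)):
        keep = [j for j in range(len(x)) if j != i]
        xs = [x[j] for j in keep]
        ys = [yes[j] for j in keep]
        ns = [n[j] for j in keep]
        a_i, b_i, _ = _fit(xs, ys, ns)
        dfbeta = (b - b_i) / se_b                       # assumed sign/scale convention
        delta_g2 = g2_full - _g2(xs, ys, ns, a_i, b_i)  # drop in G^2 after deletion
        results.append((dfbeta, delta_g2))
    return g2_full, results

g2_full, diagnostics = deletion_diagnostics([-1, 0, 1], [20, 60, 70], [100, 100, 100])
```

With only three hypothetical categories, deleting any one leaves a two-point fit that is perfect. Each change in G² therefore equals the full-data G², an instance of "deletion forces a perfect fit". With more categories the values differ, and the largest ones flag the most influential observations.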

Summarizing Predictive Power: Classification Tables and ROC Curves
• A classification table cross-classifies the binary response with a prediction of whether y = 0 or 1.

• The prediction is $\hat{y} = 1$ when $\hat{\pi}_i > \pi_0$ and $\hat{y} = 0$ when $\hat{\pi}_i \le \pi_0$, for some cutoff $\pi_0$.

• Most classification tables use $\pi_0 = 0.5$ and summarize predictive power by sensitivity $= P(\hat{y} = 1 \mid y = 1)$ and specificity $= P(\hat{y} = 0 \mid y = 0)$.
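The rule above translates directly into code. This minimal sketch assumes the fitted probabilities are already available as a list `p_hat` aligned with the 0/1 responses `y`. The function name and the data are hypothetical.

```python
def classification_table(y, p_hat, cutoff=0.5):
    """Classify y_hat = 1 if p_hat > cutoff, else 0.

    Returns (table, sensitivity, specificity), where table rows are observed
    y = 0, 1 and columns are predicted y_hat = 0, 1.
    """
    tn = sum(1 for yi, pi in zip(y, p_hat) if yi == 0 and pi <= cutoff)
    fp = sum(1 for yi, pi in zip(y, p_hat) if yi == 0 and pi > cutoff)
    fn = sum(1 for yi, pi in zip(y, p_hat) if yi == 1 and pi <= cutoff)
    tp = sum(1 for yi, pi in zip(y, p_hat) if yi == 1 and pi > cutoff)
    sensitivity = tp / (tp + fn)   # P(y_hat = 1 | y = 1)
    specificity = tn / (tn + fp)   # P(y_hat = 0 | y = 0)
    return [[tn, fp], [fn, tp]], sensitivity, specificity

table, sens, spec = classification_table([0, 0, 1, 1, 1], [0.2, 0.6, 0.4, 0.7, 0.9])
```

With these five hypothetical observations and $\pi_0 = 0.5$, sensitivity is 2/3 and specificity is 1/2.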

Classification Tables and ROC Curves cont.

• A receiver operating characteristic (ROC) curve is a plot of sensitivity as a function of (1 – specificity) for the possible cutoffs $\pi_0$.

• This curve usually has a concave shape connecting the points (0,0) and (1,1).

• The greater the area under the curve, the better the predictions.

• The ROC curve is more informative than a classification table, since it summarizes predictive power for all possible $\pi_0$.
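An ROC curve can be traced by sweeping the cutoff over every fitted value and recording (1 − specificity, sensitivity) at each step. The sketch below does this and measures the area with the trapezoidal rule. It assumes 0/1 responses `y` and fitted probabilities `p_hat`. The names and data are hypothetical.

```python
def roc_points(y, p_hat):
    """(1 - specificity, sensitivity) for the rule y_hat = 1 if p_hat > cutoff,
    over every cutoff at which the classification changes."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    # A cutoff below every probability predicts all 1s -> point (1, 1);
    # the largest fitted value as cutoff predicts all 0s -> point (0, 0).
    cutoffs = [-1.0] + sorted(set(p_hat))
    points = []
    for c in cutoffs:
        tp = sum(1 for yi, pi in zip(y, p_hat) if yi == 1 and pi > c)
        fp = sum(1 for yi, pi in zip(y, p_hat) if yi == 0 and pi > c)
        points.append((fp / n_neg, tp / n_pos))
    return sorted(points)

def area_under_curve(points):
    """Trapezoidal area under the ROC curve through the sorted points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = roc_points([0, 0, 1, 1, 1], [0.2, 0.6, 0.4, 0.7, 0.9])
auc = area_under_curve(pts)
```

The curve runs from (0, 0) to (1, 1). For these five hypothetical observations the area is 5/6.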

Classification Tables and ROC Curves cont.

• The area under a ROC curve is identical to the value of another measure of predictive power, the concordance index.

• The concordance index c estimates the probability that the predictions and the outcomes are concordant, that is, that the observation with the larger y also has the larger $\hat{\pi}$.

• A value c = 0.5 means predictions are no better than a random guess.
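The concordance index can be computed directly from all pairs made of one y = 1 and one y = 0 observation. This sketch counts tied fitted values as 1/2, a common convention that makes c match the trapezoidal ROC area. The function name and data are hypothetical.

```python
def concordance_index(y, p_hat):
    """Proportion of (y = 1, y = 0) pairs in which the y = 1 observation has the
    larger fitted probability; tied pairs count 1/2."""
    pos = [p for yi, p in zip(y, p_hat) if yi == 1]
    neg = [p for yi, p in zip(y, p_hat) if yi == 0]
    score = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                score += 1.0      # concordant pair
            elif pp == pn:
                score += 0.5      # tie
    return score / (len(pos) * len(neg))

c = concordance_index([0, 0, 1, 1, 1], [0.2, 0.6, 0.4, 0.7, 0.9])
```

For the same five hypothetical observations used for the ROC curve, 5 of the 6 pairs are concordant, so c = 5/6, the same as the area under that curve. If every fitted value were identical, all pairs would tie and c = 0.5, the "random guess" value.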
