STAT 525 SPRING 2018
Chapter 14: Logistic and Poisson Regressions
Professor Min Zhang

Logistic Regression

Background

• In many situations, the response variable has only two possible outcomes

  – Diseased (Y = 1) vs not diseased (Y = 0)
  – Employed (Y = 1) vs unemployed (Y = 0)

• Response is binary or dichotomous

• Can model the response using a Bernoulli distribution

  Pr(Yi = 1) = πi,   Pr(Yi = 0) = 1 − πi   →   E(Yi) = πi

• Goal: link πi with covariates Xi

14-1

Standard Linear Model?

• Consider the simple linear model (single predictor X)

  Yi = β0 + β1Xi + εi   →   πi = E[Yi] = β0 + β1Xi

• Assumption violations:

  – Non-normal error terms
    ∗ When Yi = 0: εi = −β0 − β1Xi
    ∗ When Yi = 1: εi = 1 − β0 − β1Xi

  – Non-constant variance

    Var(Yi) = πi(1 − πi) = (β0 + β1Xi)(1 − β0 − β1Xi)

  – Response function constraint: 0 ≤ πi ≤ 1

14-2

Logistic Response Function

• Consider a sigmoidal response function

  πi = E[Yi] = exp(β0 + β1Xi) / (1 + exp(β0 + β1Xi)) = {1 + exp(−β0 − β1Xi)}^(−1)

• Example of a nonlinear model

14-3

Properties

• Monotonic increasing/decreasing function

• Restricts 0 ≤ πi = E[Yi] ≤ 1

• Can be linearized through a transformation

  log( πi / (1 − πi) ) = β0 + β1Xi

  – Known as the logit transformation
  – Thus, use the logit link to relate πi with Xi

• Other links possible (e.g., probit and complementary log-log, described on pp. 559-563)
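As a sanity check on the logit transformation (not part of the original slides), here is a minimal pure-Python sketch showing that the logistic response function and the logit are inverses; the coefficients β0 = −25, β1 = 0.5 and X = 52 are made up for illustration:

```python
import math

def expit(eta):
    """Logistic response function: pi = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Logit transformation: log odds of p."""
    return math.log(p / (1.0 - p))

# Made-up coefficients: linear predictor eta = -25 + 0.5 * 52 = 1.0
eta = -25 + 0.5 * 52
pi = expit(eta)        # a probability strictly between 0 and 1
recovered = logit(pi)  # applying the logit recovers the linear predictor
```

Whatever the coefficients, expit keeps πi inside (0, 1), which is exactly the constraint the standard linear model violates.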

14-4

Estimation

• Given the distribution of Yi, we can formulate the log-likelihood function

  log(L) = log ∏_{i=1}^{n} πi^{Yi} (1 − πi)^{1−Yi}
         = Σ Yi log(πi) + Σ (1 − Yi) log(1 − πi)
         = Σ Yi log( πi / (1 − πi) ) + Σ log(1 − πi)
         = Σ Yi(β0 + β1Xi) − Σ log(1 + exp(β0 + β1Xi))

• MLEs do not have closed forms

• SAS performs iteratively reweighted least squares (IRWLS)

• Given b0 and b1, can calculate π̂i
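IRWLS is equivalent to Newton-Raphson applied to the log-likelihood above. A self-contained sketch (made-up data, not the textbook example) that maximizes log(L) for a single predictor; at convergence the score equations Σ(Yi − π̂i) = 0 and Σ(Yi − π̂i)Xi = 0 hold:

```python
import math

def fit_simple_logistic(x, y, iters=25):
    """Newton-Raphson for log L = sum Yi(b0 + b1 Xi) - sum log(1 + exp(b0 + b1 Xi))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)            # the IRWLS weight pi(1 - pi)
            g0 += yi - p                 # gradient components
            g1 += (yi - p) * xi
            h00 += w                     # negative Hessian (information matrix)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += ( h11 * g0 - h01 * g1) / det   # Newton step: solve 2x2 system
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# made-up binary data (not separated, so the MLE is finite)
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_simple_logistic(x, y)
phat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
```

The 2×2 solve is exactly one IRWLS update written out by hand; SAS iterates the same computation in matrix form.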

14-5

Interpretation

• π̂i is the estimated probability that individual i has response Yi = 1

• b1 is no longer the slope of a linear relationship but rather the “slope” of the logit relationship

  logit(π̂(Xi + 1)) − logit(π̂(Xi)) = b1

• The logit transformation is also the log of the odds

• Thus, exp(b1) is the odds ratio for a one-unit increase in X

• A popular summary in epidemiologic studies
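For instance, with the slope b1 = −0.1251 reported for the membership-renewal example later in this chapter, exp(b1) ≈ 0.882: each extra dollar of dues multiplies the estimated odds of renewal by about 0.88. For a c-unit change, exponentiate c·b1:

```python
import math

b1 = -0.1251                       # slope from the PROC LOGISTIC output in the example below

odds_ratio = math.exp(b1)          # odds multiplier for a one-unit ($1) increase in X
odds_ratio_10 = math.exp(10 * b1)  # for a $10 increase, exponentiate 10 * b1
```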

14-6

Example (Page 625)

• A board of directors is interested in the effect of a dues increase

• Randomly surveyed n = 30 members

• Xi is the dues increase posited to member i

• Xi varied between $30 and $50

• Yi is whether membership would continue

14-7

data a1;
  infile 'u:\.www\datasets525\CH14PR07.txt';
  input norenew increase;
  renew = 1 - norenew;

proc print data=a1;
  var renew increase;
run; quit;

Obs  renew  increase
  1    1       30
  2    0       30
  3    1       30
  4    1       31
  5    1       32
  6    1       33
  7    0       34
  8    1       35
  9    1       35
  . . .
 22    1       45
 23    0       45
 24    0       45
 25    1       46
 26    0       47
 27    0       48
 28    1       49
 29    0       50
 30    0       50

14-8

/*------ Scatterplot ------*/
goptions colors=('none');
symbol1 v=circle i=sm70;
proc gplot data=a1;
  plot renew*increase;
run; quit;

14-9

/* DESCENDING: SAS will predict the event of 1, otherwise of 0 */
proc logistic data=a1 descending;
  model renew = increase;
  output out=a2 p=pred;
run; quit;

Model Information
Model                     binary logit
Optimization Technique    Fisher's scoring

Response Profile
Ordered              Total
Value    renew   Frequency
  1        1          14
  2        0          16

Probability modeled is renew=1.

Model Fit Statistics
              Intercept    Intercept and
Criterion          Only       Covariates
AIC              43.455           41.465
SC               44.857           44.267
-2 Log L         41.455           37.465

14-10

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio       3.9906    1       0.0458
Score                  3.8265    1       0.0504
Wald                   3.5104    1       0.0610

Analysis of Maximum Likelihood Estimates
                        Standard      Wald
Parameter  DF  Estimate    Error   Chi-Square  Pr > ChiSq
Intercept   1    4.8075   2.6558       3.2769      0.0703
increase    1   -0.1251   0.0668       3.5104      0.0610

Odds Ratio Estimates
            Point       95% Wald
Effect   Estimate   Confidence Limits
increase    0.882      0.774    1.006

Association of Predicted Probabilities and Observed Responses
Percent Concordant   68.3    Somers' D   0.402
Percent Discordant   28.1    Gamma       0.417
Percent Tied          3.6    Tau-a       0.207
Pairs                 224    c           0.701

14-11

/*------ Scatterplot/Fit ------*/
symbol1 v=circle i=none;
symbol2 v=star i=sm30;
proc gplot data=a2;
  plot renew*increase pred*increase / overlay;
run; quit;

14-12

Multiple Logistic Regression

• Easily extended using matrix formulations

• Similar model-building strategies (options of the MODEL statement in SAS)

  – Stepwise: SELECTION=STEPWISE
  – Forward: SELECTION=FORWARD
  – Backward: SELECTION=BACKWARD
  – Best subset: SELECTION=SCORE

• SELECTION=SCORE

  – PROC LOGISTIC uses the branch-and-bound algorithm of Furnival and Wilson (1974) to find a specified number of models (e.g., BEST=1) with the highest score (chi-square) test statistic for all possible model sizes.

  – The score test statistic tests whether all the coefficients are zero.

14-13

Example (Page 573)

• Investigating an epidemic outbreak of a disease spread by mosquitoes

• Randomly sampled individuals within two sectors of a city

• Assessed whether each individual had symptoms of the disease and obtained other info

  – Xi1 is age
  – Xi2 is an indicator of middle class
  – Xi3 is an indicator of lower class
  – Xi4 is the sector
  – Yi is whether they had symptoms

14-14

data a3;
  infile 'u:\.www\datasets525\APC3.DAT';
  input case age socioecon sector Y jnk;
  X2=0; if socioecon=2 then X2=1;
  X3=0; if socioecon=3 then X3=1;
  drop jnk;

proc logistic data=a3 descending;
  model Y = age X2 X3 sector / selection=score best=1;
run; quit;

Regression Models Selected by Score Criterion

Number of       Score
Variables  Chi-Square   Variables Included in Model
    1         14.8687   sector
    2         24.6315   age sector
    3         24.9862   age X3 sector
    4         24.9977   age X2 X3 sector

• Which model should be selected?

• X2 and X3 should be included/excluded together!

• Note: SELECTION=SCORE does not support CLASS variables!

14-15

proc logistic data=a3 descending;
  class socioecon;
  model Y = age socioecon sector / selection=stepwise;
run; quit;

Summary of Stepwise Selection

      Effect             Number       Score       Wald
Step  Entered  Removed   DF   In  Chi-Square  Chi-Square  Pr > ChiSq
  1   sector              1    1     14.8687                  0.0001
  2   age                 1    2     10.2014                  0.0014

proc logistic data=a3 descending;
  model Y = age X2 X3 sector;
  test X2=0, X3=0;
run; quit;

Linear Hypotheses Testing Results

              Wald
Label   Chi-Square   DF   Pr > ChiSq
Test 1      0.4195    2       0.8108

14-16

/*------ Remove X2 & X3 ------*/
proc logistic data=a3 descending;
  model Y = age sector;
run; quit;

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio      24.6901    2       <.0001
Score                 24.6315    2       <.0001
Wald                  21.6714    2       <.0001

Analysis of Maximum Likelihood Estimates

                        Standard      Wald
Parameter  DF  Estimate    Error   Chi-Square  Pr > ChiSq
Intercept   1   -3.3413   0.5921      31.8471      <.0001
age         1    0.0268   0.00865      9.6082      0.0019
sector      1    1.1817   0.3370      12.2981      0.0005

Odds Ratio Estimates

           Point      95% Wald
Effect  Estimate   Confidence Limits
age        1.027      1.010    1.045
sector     3.260      1.684    6.310

14-17

• PROC GENMOD fits a generalized linear model by ML.

  – Available distributions: normal, binomial, Poisson, gamma, inverse Gaussian, negative binomial, multinomial.

proc genmod data=a3 descending;
  model Y = age X2 X3 sector / link=logit noscale dist=bin;
  contrast 'socioeconomic' X2 1, X3 1;
  contrast 'sector' sector 1;
run; quit;

Analysis Of Parameter Estimates

                        Standard   Wald 95% Confidence     Chi-
Parameter  DF  Estimate    Error         Limits           Square  Pr > ChiSq
Intercept   1   -3.5376   0.6892    -4.8884   -2.1867      26.35      <.0001
age         1    0.0270   0.0087     0.0100    0.0440       9.68      0.0019
X2          1    0.0446   0.4325    -0.8031    0.8923       0.01      0.9179
X3          1    0.2534   0.4056    -0.5414    1.0483       0.39      0.5320
sector      1    1.2436   0.3523     0.5532    1.9341      12.46      0.0004
Scale       0    1.0000   0.0000     1.0000    1.0000

Contrast Results

Contrast        DF   Chi-Square   Pr > ChiSq   Type
socioeconomic    2         0.42       0.8109   LR
sector           1        13.00       0.0003   LR

14-18

Hypothesis Testing

• Hypothesis testing and formulation of test statistics is done using a likelihood ratio test, which is similar to the general linear test (i.e., comparing the full and reduced models).

• The deviance of a fitted model is the difference between the log-likelihood of the fitted model and that of a saturated model with a parameter (πi) for each observation Yi (i.e., one that uses all of the degrees of freedom so that the residuals are zero).

• For logistic regression

  DEV(X1, X2, ..., Xp−1) = −2 log(L(b0, b1, ..., bp−1))

14-19

• Can use deviance to compare models

• Models must be hierarchical (Full/Reduced)

  DEV(Xq, ..., Xp−1 | X0, ..., Xq−1) = DEV(X0, ..., Xq−1) − DEV(X0, ..., Xp−1)

• Partial deviance is approximately χ² with p − q df

• Must compute by hand or with PROC GENMOD

• For example: testing H0: Socioeconomic = 0

  – Model Fit Statistics from PROC LOGISTIC
    ∗ Output of model Y = age X2 X3 sector;   -2 Log L = 211.220
    ∗ Output of model Y = age sector;         -2 Log L = 211.639

  – DEV(Socioecon | Age, Sector) = 211.639 − 211.220 = 0.419
    ∗ p-value = 0.8109
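The p-value in the last step can be checked by hand. For even degrees of freedom the chi-square survival function has a closed form (for 2 df it is simply exp(−x/2)); a sketch:

```python
import math

dev_reduced = 211.639   # -2 Log L for model Y = age sector
dev_full    = 211.220   # -2 Log L for model Y = age X2 X3 sector
partial_dev = dev_reduced - dev_full   # 0.419, on 2 df (X2 and X3 dropped)

def chisq_sf_even_df(x, df):
    """P(chi-square_df > x) for even df, via the finite series for the regularized gamma."""
    k = df // 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= (x / 2.0) / j
        total += term
    return math.exp(-x / 2.0) * total

p_value = chisq_sf_even_df(partial_dev, 2)   # agrees with the reported 0.8109 up to rounding
```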

14-20

• Can also use Wald's test for H0: L′β = 0

• Described on page 578

• Test is similar to a squared z test

  S = (L′β̂)′ (L′ΣL)^(−1) (L′β̂)

• Under H0, S ∼ χ²_r, where r is the rank of L

• Available in PROC GENMOD and PROC LOGISTIC via the ESTIMATE statement
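When L′β picks out a single coefficient, S reduces to the squared z statistic (estimate/standard error)². Reproducing the Wald chi-square for increase in the membership-renewal output (agreement with the printed 3.5104 is only up to rounding of the displayed estimate and SE):

```python
estimate = -0.1251   # b1 for increase, from the PROC LOGISTIC output
std_err  = 0.0668    # its printed standard error

S = (estimate / std_err) ** 2   # Wald chi-square with 1 df
```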

14-21

Diagnostics

• Goodness of fit: overall measure of fit

  – Consider both replicated and unreplicated binary data

  – Replicated (c unique classes of predictors)
    a. Deviance goodness of fit: compare the deviance to χ²_{c−p}
    b. Pearson goodness of fit: compare ΣΣ (Ojk − Ejk)²/Ejk to χ²_{c−p}

  – Unreplicated
    a. Hosmer-Lemeshow goodness of fit: group observations into classes (usually around 10) according to fitted logit values. Assess overall fit to each class using a Pearson goodness-of-fit approach. (pp. 589-590)

14-22

Hosmer-Lemeshow Goodness-of-Fit Test

• Divide observations into ≈ 10 groups of equal size based on percentiles of the estimated probabilities

• Expected # of 1's in a group is Σ π̂i

• Expected # of 0's is ni − Σ π̂i

• Compare expected with observed through

  χ² = ΣΣ (Oij − Eij)² / Eij

Example (Page 625)

/* LACKFIT: Hosmer and Lemeshow goodness-of-fit test */
proc logistic data=a1 descending;
  model renew = increase / lackfit;
  output out=a2 p=pred;
proc print; run; quit;

14-23

Obs  norenew  increase  renew  _LEVEL_    pred
  1     1        50       0       1     0.19056
  2     1        50       0       1     0.19056
  3     0        49       1       1     0.21060
  4     1        48       0       1     0.23214
  5     1        47       0       1     0.25518
  6     0        46       1       1     0.27967
  7     0        45       1       1     0.30555
  8     1        45       0       1     0.30555
  9     1        45       0       1     0.30555
 10     1        44       0       1     0.33272
 11     1        43       0       1     0.36104
 12     1        42       0       1     0.39037
 13     0        41       1       1     0.42051
 14     0        40       1       1     0.45125
 15     1        40       0       1     0.45125
 16     1        40       0       1     0.45125
 17     1        39       0       1     0.48237
 18     0        38       1       1     0.51363
 19     0        37       1       1     0.54478
 24     1        34       0       1     0.63526
 25     0        33       1       1     0.66372
 26     0        32       1       1     0.69104
 27     0        31       1       1     0.71709
 28     0        30       1       1     0.74177
 29     1        30       0       1     0.74177
 30     0        30       1       1     0.74177

14-24

• In this example

  – E11 = (0.19056 + 0.19056 + 0.21060) = 0.59  →  E10 = 3 − 0.59 = 2.41
  – E21 = (0.23214 + 0.25518 + 0.27967) = 0.77  →  E20 = 3 − 0.77 = 2.23

Partition for the Hosmer and Lemeshow Test

                    renew = 1              renew = 0
Group   Total   Observed  Expected    Observed  Expected
  1       3        1        0.59         2        2.41
  2       3        1        0.77         2        2.23
  3       3        1        0.92         2        2.08
  4       3        0        1.08         3        1.92
  5       4        2        1.77         2        2.23
  6       3        2        1.54         1        1.46
  7       4        2        2.39         2        1.61
  8       3        2        1.99         1        1.01
  9       4        3        2.94         1        1.06

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square   DF   Pr > ChiSq
    2.6526    7       0.9152
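The statistic can be reproduced from the partition table by summing (O − E)²/E over all 18 cells; the small discrepancy from the printed 2.6526 comes from the rounding of the displayed expected counts:

```python
# (group total, observed renew=1, expected renew=1) from the partition table above
groups = [(3, 1, 0.59), (3, 1, 0.77), (3, 1, 0.92), (3, 0, 1.08), (4, 2, 1.77),
          (3, 2, 1.54), (4, 2, 2.39), (3, 2, 1.99), (4, 3, 2.94)]

chi2 = 0.0
for total, obs1, exp1 in groups:
    obs0, exp0 = total - obs1, total - exp1   # the renew = 0 cell of each group
    chi2 += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
# close to the reported 2.6526 on 9 - 2 = 7 df
```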

14-25

Measures of Agreement

• Want to measure the association of the pairs (Yi, π̂i), i = 1, 2, ..., N

  – Desire a close (positive) association in some sense.

• Compare predicted probabilities of each pair (Yi = 1, Yj = 0)

  – Concordant if π̂i > π̂j (# concordant = nc)
  – Discordant if π̂i < π̂j (# discordant = nd)
  – Tied if π̂i = π̂j

• Consider all pairs of distinct responses

  – In this example t = 16 × 14 = 224

14-26

• Measures of agreement

  – Somers' D = (nc − nd)/t: ranges from −1 (all pairs disagree) to 1 (all pairs agree);

  – Goodman-Kruskal Gamma = (nc − nd)/(nc + nd): ranges from −1.0 (no association) to 1.0 (perfect association). Does not penalize for ties and is generally greater than Somers' D;

  – Kendall's Tau-a = (nc − nd)/(N(N − 1)/2): modification of Somers' D considering all possible paired observations. Usually much smaller than Somers' D;

  – c = (nc + (t − nc − nd)/2)/t: equivalent to the well-known AUC (area under the ROC curve), ranging from 0.5 (randomly predicting the response) to 1 (perfectly discriminating the response).

Association of Predicted Probabilities and Observed Responses
Percent Concordant   68.3    Somers' D   0.402
Percent Discordant   28.1    Gamma       0.417
Percent Tied          3.6    Tau-a       0.207
Pairs                 224    c           0.701
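A minimal sketch of these computations on made-up fitted probabilities (not the textbook output); with 3 ones and 2 zeros there are t = 6 pairs of distinct responses, of which 5 are concordant and 1 is tied:

```python
def agreement_measures(y, phat):
    """Concordance-based association measures for binary y and fitted probabilities."""
    ones  = [p for yi, p in zip(y, phat) if yi == 1]
    zeros = [p for yi, p in zip(y, phat) if yi == 0]
    t = len(ones) * len(zeros)                          # pairs with distinct responses
    nc = sum(pi > pj for pi in ones for pj in zeros)    # concordant pairs
    nd = sum(pi < pj for pi in ones for pj in zeros)    # discordant pairs
    n = len(y)
    return {"SomersD": (nc - nd) / t,
            "Gamma":   (nc - nd) / (nc + nd),
            "Tau_a":   (nc - nd) / (n * (n - 1) / 2),
            "c":       (nc + (t - nc - nd) / 2) / t}

# made-up example: 3 responses of 1, 2 responses of 0
m = agreement_measures([1, 1, 1, 0, 0], [0.7, 0.6, 0.5, 0.5, 0.4])
```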

14-27

Diagnostics

• “Residual” analysis

  – The distribution of the residuals under the correct model is unknown, so the common residual plot is uninformative.

  – Pearson residual is the ordinary residual divided by the standard error of Yi; the sum of the squared Pearson residuals equals the Pearson X²

    r^P_i = (Yi − π̂i) / √(π̂i(1 − π̂i))

  – Studentized Pearson residual is similar but divided by its standard error so that it has unit variance

  – Deviance residual is the signed square root of the contribution to the model deviance

    Σi (r^D_i)² = DEV(X0, ..., Xp−1)

  – Sign depends on whether π̂i > 0.5
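These residuals can be checked against the SAS diagnostics on the next slide. For case 2 of the membership-renewal example (renew = 0 at increase = 30, with π̂ = 0.74177), a pure-Python sketch; the deviance contribution for a Bernoulli response is −2[Y log π̂ + (1 − Y) log(1 − π̂)]:

```python
import math

y, pihat = 0, 0.74177   # case 2 of the dues example: renew = 0 at increase = 30

# Pearson residual: ordinary residual over the estimated SD of Yi
pearson = (y - pihat) / math.sqrt(pihat * (1 - pihat))

# Deviance residual: signed square root of this case's deviance contribution
contribution = -2 * (y * math.log(pihat) + (1 - y) * math.log(1 - pihat))
deviance = math.copysign(math.sqrt(contribution), y - pihat)
```

Both values match the printed diagnostics (−1.6948 and −1.6455) for that case.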

14-28

• “Residual” analysis (continued)

  – Can plot residuals against predicted values: a flat lowess smooth in this plot suggests the model is correct

  – Can generate a probability plot with envelope

• DFFITS, DFBETAS

• SAS provides the INFLUENCE and IPLOTS options

  – Example (Page 625)

ods html;
ods graphics on;
proc logistic data=a1 descending;
  model renew = increase / iplots influence lackfit;
ods graphics off;
ods html close;
run; quit;

14-29

Regression Diagnostics (index plots of the Pearson and deviance residuals; the ASCII plot columns are not reproduced)

Case  increase   Pearson Residual   Deviance Residual
  1    30.0000        0.5900             0.7729
  2    30.0000       -1.6948            -1.6455
  3    30.0000        0.5900             0.7729
  4    31.0000        0.6281             0.8155
  5    32.0000        0.6686             0.8597
  6    33.0000        0.7118             0.9054
  7    34.0000       -1.3197            -1.4203
  8    35.0000        0.8066             1.0012
  9    35.0000        0.8066             1.0012
 10    35.0000       -1.2397            -1.3645
 11    36.0000       -1.1646            -1.3092
 12    37.0000        0.9141             1.1021
 13    38.0000        0.9731             1.1543
 14    39.0000       -0.9653            -1.1476
 15    40.0000        1.1028             1.2615
 16    40.0000       -0.9068            -1.0955
 17    40.0000       -0.9068            -1.0955
 18    41.0000        1.1739             1.3163
 19    42.0000       -0.8002            -0.9949
 20    43.0000       -0.7517            -0.9465
 21    44.0000       -0.7061            -0.8995
 22    45.0000        1.5076             1.5399
 23    45.0000       -0.6633            -0.8540
 24    45.0000       -0.6633            -0.8540
 25    46.0000        1.6049             1.5963
 26    47.0000       -0.5853            -0.7676
 27    48.0000       -0.5498            -0.7268
 28    49.0000        1.9361             1.7651
 29    50.0000       -0.4852            -0.6502
 30    50.0000       -0.4852            -0.6502

14-30

Hat Matrix Diagonal and Intercept DfBeta (index plot columns not reproduced)

Case   Hat Diagonal   Intercept DfBeta
  1      0.1040           0.1945
  2      0.1040          -0.5587
  3      0.1040           0.1945
  4      0.0941           0.1902
  5      0.0841           0.1831
  6      0.0743           0.1732
  7      0.0651          -0.2792
  8      0.0568           0.1441
  9      0.0568           0.1441
 10      0.0568          -0.2215
 11      0.0497          -0.1689
 12      0.0442           0.1013
 13      0.0404           0.0744
 14      0.0385          -0.0405
 15      0.0385           0.00837
 16      0.0385          -0.00688
 17      0.0385          -0.00688
 18      0.0404          -0.0310
 19      0.0440           0.0479
 20      0.0491           0.0696
 21      0.0555           0.0879
 22      0.0628          -0.2338
 23      0.0628           0.1029
 24      0.0628           0.1029
 25      0.0707          -0.2957
 26      0.0788           0.1240
 27      0.0869           0.1306
 28      0.0946          -0.5053
 29      0.1017           0.1370
 30      0.1017           0.1370

14-31

Influence Plots (not reproduced)

14-32

Residual Plots (not reproduced)

14-33

Ordinal Logistic Regression

• Have more than two possible outcomes on an ordered scale (e.g., a Likert scale), say J outcomes

• Often called the proportional odds model

  log( Pr(Yi ≤ j) / (1 − Pr(Yi ≤ j)) ) = β0j + β1Xi1 + · · · + βp−1Xi,p−1,   j = 1, 2, ..., J − 1

  – Shared slopes β1, ..., βp−1
  – J − 1 intercepts: β01, β02, ..., β0,J−1
  – Proportional odds (the ratio does not depend on the values of the predictors)

    [ Pr(Yi ≤ j|Xi) / (1 − Pr(Yi ≤ j|Xi)) ] / [ Pr(Yi ≤ k|Xi) / (1 − Pr(Yi ≤ k|Xi)) ] = e^(β0j − β0k)

14-34

Dose-Symptoms Example

data dsymp;
  input dose symptoms $ n @@;
  ldose = log10(dose);
cards;
10 None 33   10 Mild 7     10 Severe 10   20 None 17
20 Mild 13   20 Severe 17  30 None 14     30 Mild 3
30 Severe 28 40 None 9     40 Mild 8      40 Severe 32
;

/* ORDER: specifies sorting order for classification variables */
/* ORDER=DATA: order of appearance in the input data set */
proc logistic data=dsymp order=data;
  class symptoms;
  model symptoms = ldose;
  weight n;
run; quit;

Response Profile

Ordered               Total        Total
Value    symptoms   Frequency     Weight
  1      None           4       73.000000
  2      Mild           4       31.000000
  3      Severe         4       87.000000

Probabilities modeled are cumulated over the lower Ordered Values.

14-35

Score Test for the Proportional Odds Assumption

Chi-Square   DF   Pr > ChiSq
    0.1674    1       0.6825

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio      31.3281    1       <.0001
Score                 29.6576    1       <.0001
Wald                  28.5177    1       <.0001

Analysis of Maximum Likelihood Estimates

                             Standard      Wald
Parameter        DF  Estimate   Error   Chi-Square  Pr > ChiSq
Intercept None    1    4.1734   0.8862     22.1759      <.0001
Intercept Mild    1    4.9372   0.9083     29.5473      <.0001
ldose             1   -3.5207   0.6593     28.5177      <.0001

Odds Ratio Estimates

          Point      95% Wald
Effect  Estimate  Confidence Limits
ldose      0.030     0.008    0.108

14-36

Polytomous Logistic Regression

• For a polytomous or multicategory response, consider J response categories

• Select one as the baseline or reference category,

  log( P(Yi = j) / P(Yi = J) ) = β0j + β1jXi1 + · · · + βp−1,jXi,p−1

  ⇔ log( πij / πiJ ) = X′i βj,   j = 1, ..., J − 1

• Results in J − 1 parameter vectors, β1, ..., βJ−1

14-37

Dose-Symptoms Example

/* CATMOD: performs categorical data modeling */
/* DIRECT: lists numeric predictors */
proc catmod data=dsymp order=data;
  direct ldose;
  weight n;
  model symptoms = ldose;
run; quit;

Maximum Likelihood Analysis of Variance

Source             DF   Chi-Square   Pr > ChiSq
Intercept           2        24.83       <.0001
ldose               2        26.47       <.0001
Likelihood Ratio    4         6.91       0.1410

Analysis of Maximum Likelihood Estimates

            Function              Standard    Chi-
Parameter    Number   Estimate      Error    Square   Pr > ChiSq
Intercept      1        5.3897     1.0998     24.02       <.0001
               2        2.3076     1.3681      2.84       0.0917
ldose          1       -4.1489     0.8065     26.46       <.0001
               2       -2.4128     0.9902      5.94       0.0148
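Here function 1 is the generalized logit log(πNone/πSevere) and function 2 is log(πMild/πSevere), with Severe as the baseline (the last category under ORDER=DATA). Category probabilities follow by inverting these logits; a sketch at dose = 10 (ldose = 1), where the observed proportions are 0.66, 0.14, and 0.20:

```python
import math

# function 1: log(pi_None/pi_Severe); function 2: log(pi_Mild/pi_Severe)
eta = {"None":   5.3897 - 4.1489 * 1.0,   # ldose = log10(10) = 1
       "Mild":   2.3076 - 2.4128 * 1.0,
       "Severe": 0.0}                      # baseline category

# softmax-style inversion of the baseline-category logits
denom = sum(math.exp(v) for v in eta.values())
probs = {k: math.exp(v) / denom for k, v in eta.items()}
```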

14-38

Overdispersion

• Binomial/Bernoulli model

  E(Yi) = πi,   Var(Yi) = πi(1 − πi)

• Beta-binomial model

  E(Yi) = πi,   Var(Yi) = σ²πi(1 − πi)

• Mean unaffected but Cov(β̂) ≈ σ²(X′WX)^(−1)

• Can show E(χ²/(N − p)) ≈ σ²

14-39

Example (Page 625)

proc genmod data=a1 descending;
  model renew = increase / dist=binomial noscale link=logit;
run; quit;

Criteria For Assessing Goodness Of Fit

Criterion            DF      Value    Value/DF
Deviance             28    37.4648      1.3380
Scaled Deviance      28    37.4648      1.3380
Pearson Chi-Square   28    30.1028      1.0751
Scaled Pearson X2    28    30.1028      1.0751
Log Likelihood            -18.7324

---------------------------------------------------------------

proc genmod data=a1 descending;
  model renew = increase / dist=binomial scale=2 link=logit;
run; quit;

Criteria For Assessing Goodness Of Fit

Criterion            DF      Value    Value/DF
Deviance             28    37.4648      1.3380
Scaled Deviance      28     9.3662      0.3345
Pearson Chi-Square   28    30.1028      1.0751
Scaled Pearson X2    28     7.5257      0.2688
Log Likelihood             -4.6831

14-40

proc genmod data=a1 descending;
  model renew = increase / dist=binomial noscale link=logit;
run; quit;

Analysis Of Parameter Estimates

                       Standard    Chi-
Parameter  DF  Estimate   Error   Square   Pr > ChiSq
Intercept   1    4.8075   2.6558    3.28       0.0703
increase    1   -0.1251   0.0668    3.51       0.0610
Scale       0    1.0000   0.0000

---------------------------------------------------------------

proc genmod data=a1 descending;
  model renew = increase / dist=binomial scale=2 link=logit;
run; quit;

Analysis Of Parameter Estimates

                       Standard    Chi-
Parameter  DF  Estimate   Error   Square   Pr > ChiSq
Intercept   1    4.8075   5.3115    0.82       0.3654
increase    1   -0.1251   0.1335    0.88       0.3489
Scale       0    2.0000   0.0000
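The effect of SCALE=2 is visible by comparing the two outputs: the point estimates are unchanged, each standard error is multiplied by the scale, and each Wald chi-square is divided by its square. A sketch (small discrepancies are just rounding of the printed values):

```python
scale = 2.0
se_unscaled   = {"Intercept": 2.6558, "increase": 0.0668}   # from the NOSCALE fit
chi2_unscaled = {"Intercept": 3.28,   "increase": 3.51}

se_scaled   = {k: v * scale for k, v in se_unscaled.items()}        # ~5.3115, ~0.1335
chi2_scaled = {k: v / scale ** 2 for k, v in chi2_unscaled.items()} # ~0.82, ~0.88
```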

14-41

INGOTS Example

• Taken from Cox and Snell (1989, pp. 10-11)

• The data consist of the number, r, of ingots not ready for rolling, out of n tested, for a number of combinations of heating time (heat) and soaking time (soak).

data ingots;
  input heat soak r n @@;
cards;
 7 1.0 0 10   14 1.0 0 31   27 1.0 1 56   51 1.0 3 13
 7 1.7 0 17   14 1.7 0 43   27 1.7 4 44   51 1.7 0  1
 7 2.2 0  7   14 2.2 2 33   27 2.2 0 21   51 2.2 0  1
 7 2.8 0 12   14 2.8 0 31   27 2.8 1 22   51 4.0 0  1
 7 4.0 0  9   14 4.0 0 19   27 4.0 1 16
;

14-42

proc logistic data=ingots;
  model r/n = heat soak;
run; quit;

Model Fit Statistics

              Intercept    Intercept and
Criterion          Only       Covariates
AIC             108.988          101.346
SC              112.947          113.221
-2 Log L        106.988           95.346

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio      11.6428    2       0.0030
Score                 15.1091    2       0.0005
Wald                  13.0315    2       0.0015

Analysis of Maximum Likelihood Estimates

                        Standard      Wald
Parameter  DF  Estimate    Error   Chi-Square  Pr > ChiSq
Intercept   1   -5.5592   1.1197      24.6503      <.0001
heat        1    0.0820   0.0237      11.9454      0.0005
soak        1    0.0568   0.3312       0.0294      0.8639

14-43

proc genmod data=ingots;
  model r/n = heat soak / dist=binomial link=logit scale=p;
run; quit;

Criteria For Assessing Goodness Of Fit

Criterion            DF      Value    Value/DF
Deviance             16    13.7526      0.8595
Scaled Deviance      16    16.2476      1.0155
Pearson Chi-Square   16    13.5431      0.8464
Scaled Pearson X2    16    16.0000      1.0000
Log Likelihood            -56.3214

Analysis Of Parameter Estimates

                        Standard   Wald 95% Confidence    Chi-
Parameter  DF  Estimate    Error         Limits          Square  Pr > ChiSq
Intercept   1   -5.5592   1.0301    -7.5782   -3.5401     29.12      <.0001
heat        1    0.0820   0.0218     0.0392    0.1248     14.11      0.0002
soak        1    0.0568   0.3047    -0.5405    0.6540      0.03      0.8522
Scale       0    0.9200   0.0000     0.9200    0.9200

NOTE: The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF.

14-44

proc genmod data=ingots;
  model r/n = heat soak / dist=binomial link=logit scale=d;
run; quit;

Criteria For Assessing Goodness Of Fit

Criterion            DF      Value    Value/DF
Deviance             16    13.7526      0.8595
Scaled Deviance      16    16.0000      1.0000
Pearson Chi-Square   16    13.5431      0.8464
Scaled Pearson X2    16    15.7562      0.9848
Log Likelihood            -55.4632

Analysis Of Parameter Estimates

                        Standard   Wald 95% Confidence    Chi-
Parameter  DF  Estimate    Error         Limits          Square  Pr > ChiSq
Intercept   1   -5.5592   1.0381    -7.5938   -3.5246     28.68      <.0001
heat        1    0.0820   0.0220     0.0389    0.1252     13.90      0.0002
soak        1    0.0568   0.3071    -0.5451    0.6586      0.03      0.8533
Scale       0    0.9271   0.0000     0.9271    0.9271

NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.
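Both NOTEs can be verified by hand: SCALE=P sets the estimated scale to √(Pearson X²/df) and SCALE=D to √(Deviance/df), and the unscaled standard errors (e.g., 1.1197 for the intercept in the earlier PROC LOGISTIC fit) are multiplied by that scale:

```python
import math

pearson_x2, deviance, df = 13.5431, 13.7526, 16

scale_p = math.sqrt(pearson_x2 / df)   # ~0.9200, as in the SCALE=P output
scale_d = math.sqrt(deviance / df)     # ~0.9271, as in the SCALE=D output

se_intercept = 1.1197                  # unscaled SE from PROC LOGISTIC
se_scaled_p = se_intercept * scale_p   # ~1.0301, the SE printed under SCALE=P
```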

14-45

Poisson Regression

• Like logistic regression, this is a nonlinear regression model for discrete outcomes

• Used when the response variable is a count, for example,

  – Number of acute asthma attacks in a week
  – Number of visits to the mall during December

• Useful when a large count is a rare event

• Otherwise, a standard linear model with normal errors is appropriate

  – Poisson counts → use the square-root transformation

14-46

Regression Model

• Standard linear/nonlinear regression form:

  Yi = E(Yi) + εi

• Generalized linear model: exponential-family distribution for Y and a link between the mean and the covariates

• Common link g(·) such that µi = g^(−1)(X′iβ)

  – For example, the identity link gives µi = X′iβ
  – All must result in µi nonnegative

• Model is such that the Yi's are independent Poisson random variables with expected values µi = g^(−1)(X′iβ)

14-47

Estimation

• Given the distribution of Yi, we can formulate the log-likelihood function

  log(L) = Σ Yi log{µ(Xi, β)} − Σ µ(Xi, β) + C

• MLEs do not have closed forms

• Iteratively reweighted least squares or other numerical search procedures are used to solve for β
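The intercept-only model with a log link is one special case where the score equation Σ Yi − Σ µi = 0 does have a closed-form solution, b0 = log(Ȳ). A Newton-Raphson sketch on made-up counts that recovers it:

```python
import math

y = [2, 0, 3, 1, 4, 2]   # made-up counts (not from the textbook)

# Newton-Raphson on log(L) = sum Yi*b0 - sum exp(b0) + C  (intercept-only, log link)
b0 = 0.0
for _ in range(50):
    mu = math.exp(b0)                 # common fitted mean
    score = sum(y) - len(y) * mu      # d log(L) / d b0
    info = len(y) * mu                # Fisher information = -d2 log(L) / d b0^2
    b0 += score / info

b0_closed_form = math.log(sum(y) / len(y))   # the known solution in this special case
```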

14-48

Hypothesis Testing

• Hypothesis testing and formulation of test statistics is done in a manner similar to logistic regression.

• The deviance of a fitted model is the difference between the log-likelihood of the fitted model and that of a saturated model with a parameter (µi) for each observation Yi (i.e., one that uses all of the degrees of freedom so that the residuals are zero).

  – For Poisson regression

    DEV(X1, X2, ..., Xp−1) = 2 [ Σ Yi log(Yi/µ̂i) − Σ (Yi − µ̂i) ]

14-49

• Can use deviance to compare models

  – Models must be hierarchical (Full/Reduced)

    DEV(Xq, ..., Xp−1 | X0, ..., Xq−1) = DEV(X0, ..., Xq−1) − DEV(X0, ..., Xp−1)

  – Partial deviance is approximately χ² with p − q df

• Must compute by hand or with PROC GENMOD

• Prediction

  – Mean response for predictors X

  – Probability of a specific count (e.g., Y = 0)
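A sketch of both computations on made-up counts and fitted means: the deviance from the formula above (an observed count of 0 contributes only its −(Yi − µ̂i) term, since y log y → 0 as y → 0), and the predicted probability of a specific count, Pr(Y = 0) = exp(−µ̂):

```python
import math

y     = [2, 0, 3, 1]          # made-up observed counts
muhat = [1.8, 0.6, 2.9, 1.2]  # made-up fitted means

# DEV = 2 [ sum Yi log(Yi/muhat_i) - sum (Yi - muhat_i) ]
dev = 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
              for yi, mi in zip(y, muhat))

# Poisson probability of a specific count, e.g. Y = 0, at fitted mean mu:
p_zero = math.exp(-muhat[1])   # Pr(Y = 0) = exp(-mu)
```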

14-50

Example (Page 621)

• A lumber company is interested in the relationship between the number of customers from a census tract and

  – X1: number of housing units
  – X2: average income
  – X3: average housing unit age
  – X4: distance to nearest competitor
  – X5: distance to store

• Will use LINK=LOG → log(µi) = X′iβ

data ppoi1;
  infile 'U:\.www\datasets525\CH14TA08.DAT';
  input ncust x1 x2 x3 x4 x5;

proc genmod;
  model ncust = x1 x2 x3 x4 x5 / link=log dist=poi type3;
run; quit;

14-51

Criteria For Assessing Goodness Of Fit

Criterion            DF       Value    Value/DF
Deviance            104    114.9854      1.1056
Scaled Deviance     104    114.9854      1.1056
Pearson Chi-Square  104    101.8808      0.9796
Scaled Pearson X2   104    101.8808      0.9796
Log Likelihood            1898.0224

Analysis Of Parameter Estimates

                       Standard      Wald 95%          Chi-
Parameter  DF  Estimate   Error   Confidence Limits   Square   P>ChiSq
Intercept   1    2.9424   0.2072    2.5362   3.3486   201.57    <.0001
x1          1    0.0006   0.0001    0.0003   0.0009    18.17    <.0001
x2          1   -0.0000   0.0000   -0.0000  -0.0000    30.63    <.0001
x3          1   -0.0037   0.0018   -0.0072  -0.0002     4.37    0.0365
x4          1    0.1684   0.0258    0.1179   0.2189    42.70    <.0001
x5          1   -0.1288   0.0162   -0.1605  -0.0970    63.17    <.0001
Scale       0    1.0000   0.0000    1.0000   1.0000

NOTE: The scale parameter was held fixed.

LR Statistics For Type 3 Analysis

              Chi-
Source   DF  Square   Pr > ChiSq
x1        1   18.20       <.0001
x2        1   31.79       <.0001
x3        1    4.38       0.0364
x4        1   41.66       <.0001
x5        1   67.50       <.0001
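Because the link is log, each coefficient exponentiates to a multiplicative effect on the expected count (a rate ratio). For example, from the output above, each additional unit of distance to the nearest competitor (x4) multiplies the expected number of customers by exp(0.1684) ≈ 1.18, and each unit of distance to the store (x5) multiplies it by exp(−0.1288) ≈ 0.88:

```python
import math

b_x4, b_x5 = 0.1684, -0.1288   # estimates from the PROC GENMOD output above

rate_ratio_x4 = math.exp(b_x4)   # expected-count multiplier per unit of x4
rate_ratio_x5 = math.exp(b_x5)   # expected-count multiplier per unit of x5
```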

14-52

Diagnostics

• Goodness of fit

  – Deviance goodness of fit: compare the overall fit with χ²_{n−p}

• “Residual” analysis

  – The distribution of the residuals under the correct model is unknown, so the common residual plot is uninformative.

  – Deviance residual is the signed square root of the contribution to the model deviance

    Σi (r^D_i)² = DEV(X0, ..., Xp−1)

  – Sign depends on whether µ̂i > Yi

  – Can plot r^D_i by observation (index plot)

  – Can generate a probability plot with envelope

14-53

Chapter Review

• Logistic Regression

  – Background
  – Model
  – Inference
  – Diagnostics and remedies

• Ordinal Logistic Regression

• Polytomous Logistic Regression

• Poisson Regression

  – Background
  – Model
  – Inference

14-54