Multicategory Logit Models

Del Rosario, RP | Perez, JJ

Categorical Data Analysis


Page 1: Multicategory Logit Models

MULTICATEGORY LOGIT MODELS

Del Rosario, RP | Perez, JJ

Page 2: Multicategory Logit Models

Nominal Responses

•One response variable Y with J levels

•One or more explanatory or predictor variables • quantitative, qualitative or both

Logistic Regression

Page 3: Multicategory Logit Models

Forming Logits

•When J = 2, Y is dichotomous

• The logit is the log odds that an event (a "success") occurs rather than does not occur:

$\operatorname{logit}(\pi) = \log\frac{\pi}{1-\pi}$

•When J > 2, Y is a "multicategory" or "polytomous" response variable

• There are $J(J-1)/2$ logits that can be formed, but only (J − 1) are non-redundant

Page 4: Multicategory Logit Models

Categorical Logit Models

•Nominal response

• Multinomial logistic regression / Baseline-Category Logits

•Ordinal response

• Ordinal logistic regression

• Cumulative Logits / Proportional Odds Model

• Adjacent-Categories Logits

• Continuation-Ratio Logits

Page 5: Multicategory Logit Models

Multicategory Logits

• Model simultaneously all relationships between probabilities for pairs of categories (vs Binary Logistic Regression)

• Optimal efficiency

• Estimates of the model parameters have smaller SEs than the estimates obtained by fitting the equations separately.

• For simultaneous fitting, the same parameter estimates occur for a pair of categories no matter which category is baseline.

• They describe the odds of response in one category rather than another.

Page 6: Multicategory Logit Models

Baseline Category Logits

• Logit models for nominal responses pair each response category with a baseline category.

• The choice of baseline category is arbitrary.

• If the last category (J) is the baseline, the baseline-category logits are:

$\log\frac{\pi_j}{\pi_J}, \quad j = 1, \dots, J-1$

• Given that the response falls in category j or J, this is the log odds that the response is j.

• For J = 3, for instance, the logit model uses log(π1/π3) and log(π2/π3)

Page 7: Multicategory Logit Models

Baseline Category Logits

• The logit model using the baseline-category logits with a predictor x has the form

$\log\frac{\pi_j}{\pi_J} = \alpha_j + \beta_j x, \quad j = 1, \dots, J-1$

• Parameters in the (J − 1) equations determine parameters for logits using all other pairs of response categories.

• For instance, for an arbitrary pair of categories a and b

$\log\frac{\pi_a}{\pi_b} = \log\frac{\pi_a/\pi_J}{\pi_b/\pi_J} = \log\frac{\pi_a}{\pi_J} - \log\frac{\pi_b}{\pi_J}$

$= (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x) = (\alpha_a - \alpha_b) + (\beta_a - \beta_b)x$
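The derivation above can be sketched in a few lines of Python. This is only an illustration: the function name and the parameter values are made up for the example and do not come from the slides.

```python
def pair_logit_params(alpha, beta, a, b):
    """Intercept and slope of log(pi_a / pi_b) from baseline-category parameters.

    alpha and beta map each category label to its baseline-logit intercept
    and slope; the baseline category itself has alpha = beta = 0.
    """
    return alpha[a] - alpha[b], beta[a] - beta[b]


# Hypothetical parameters for J = 3 with category 3 as baseline.
alpha = {1: 0.5, 2: -1.0, 3: 0.0}
beta = {1: 0.2, 2: 0.7, 3: 0.0}

# Logit for categories 1 vs 2, obtained without refitting the model.
intercept, slope = pair_logit_params(alpha, beta, 1, 2)
print(intercept, slope)
```

Comparing any category with the baseline simply returns that category's own parameters, since the baseline's parameters are zero.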

Page 8: Multicategory Logit Models

Example 1: Alligator Food Choice

• A study looking into factors influencing the primary food choice of alligators in the wild

• 59 alligators were sampled; the data record each alligator's length (in meters) and the primary food type, by volume, found in its stomach

• Food type has three categories: Fish (1), Invertebrate (2), and Other (3)

Page 9: Multicategory Logit Models

Table 1. Alligator size (meter) and Primary food choice

Example 1: Alligator Food Choice

Page 10: Multicategory Logit Models

Using STATA

. mlogit food size, b(3)

Iteration 0: log likelihood = -57.570928
Iteration 1: log likelihood = -49.97414
Iteration 2: log likelihood = -49.186349
Iteration 3: log likelihood = -49.170647
Iteration 4: log likelihood = -49.170622
Iteration 5: log likelihood = -49.170622

Multinomial logistic regression          Number of obs = 59
                                         LR chi2(2)    = 16.80
                                         Prob > chi2   = 0.0002
Log likelihood = -49.170622              Pseudo R2     = 0.1459

        food |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
1            |
        size |   -.110109    .517082    -0.21   0.831    -1.123571    .9033531
       _cons |   1.617731   1.307275     1.24   0.216    -.9444801    4.179943
2            |
        size |  -2.465446   .8996503    -2.74   0.006    -4.228728   -.7021642
       _cons |   5.697444   1.793809     3.18   0.001     2.181644    9.213244
3            |  (base outcome)

. estat ic

       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
           . |     59   -57.57093   -49.17062      4   106.3412   114.6514
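The AIC and BIC printed by estat ic can be reproduced from the model log likelihood, the number of estimated parameters, and the number of observations. The sketch below is in Python rather than Stata and uses the standard definitions AIC = −2·ll + 2·df and BIC = −2·ll + df·ln(N); the helper name is an assumption made for illustration.

```python
import math


def aic_bic(ll_model, df, n_obs):
    """Information criteria from a fitted model's log likelihood."""
    aic = -2.0 * ll_model + 2.0 * df
    bic = -2.0 * ll_model + df * math.log(n_obs)
    return aic, bic


# Values taken from the Stata output for the alligator model.
aic, bic = aic_bic(ll_model=-49.170622, df=4, n_obs=59)
print(round(aic, 4), round(bic, 4))
```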

Page 11: Multicategory Logit Models

Using R

Page 12: Multicategory Logit Models

Example 1: Alligator Food Choice

• Y = "primary food choice"; X = "length of alligator"

• Estimated log odds that primary food choice of alligators is fish rather than other types:

$\log\frac{\pi_1}{\pi_3} = 1.618 - 0.110\,\text{length}$

• Estimated log odds that primary food choice of alligators is invertebrate rather than other types:

$\log\frac{\pi_2}{\pi_3} = 5.697 - 2.465\,\text{length}$

Page 13: Multicategory Logit Models

Example 1: Alligator Food Choice

• What about estimated log odds that primary food choice of alligators is fish rather than invertebrate?

$\log\frac{\pi_1}{\pi_2} = (1.618 - 5.697) + [-0.110 - (-2.465)]\,x$

$\log\frac{\pi_1}{\pi_2} = -4.080 + 2.355\,\text{length}$

• For every 1-meter increase in the length of the alligator, the estimated odds of choosing fish rather than invertebrates as primary food are multiplied by $e^{2.355} = 10.54$.
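The fish-versus-invertebrate equation can be checked numerically from the two fitted baseline-category equations. This is a minimal Python sketch using the coefficients reported in the Stata output; the variable names are illustrative.

```python
import math

# Baseline-category estimates from the Stata output (baseline: 3 = Other).
fish = {"cons": 1.617731, "size": -0.110109}    # log(pi_1 / pi_3)
invert = {"cons": 5.697444, "size": -2.465446}  # log(pi_2 / pi_3)

# log(pi_1 / pi_2) is the difference of the two baseline logits.
cons_12 = fish["cons"] - invert["cons"]
size_12 = fish["size"] - invert["size"]

# Multiplicative change in the odds of fish vs invertebrate per extra meter.
odds_ratio = math.exp(size_12)
print(round(cons_12, 3), round(size_12, 3), round(odds_ratio, 2))
```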

Page 14: Multicategory Logit Models

Example 1: Alligator Food Choice

• Hypothesis testing on the effect of length as predictor:

Ho: βj = 0 for j = 1, 2

• LR = 16.8 (df = 2), p = 0.0002

• Strong evidence that the length of the alligator affects food choice
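The LR statistic and its p-value follow directly from the null and fitted log likelihoods in the Stata output. The sketch below relies on the fact that the chi-square upper-tail probability with 2 degrees of freedom is exactly exp(−x/2), so no statistics library is needed; for other degrees of freedom a library routine would be required.

```python
import math

ll_null = -57.570928   # Iteration 0 (intercepts only)
ll_model = -49.170622  # final iteration (with size)

# Likelihood-ratio statistic: twice the gain in log likelihood.
lr = 2.0 * (ll_model - ll_null)

# Chi-square survival function for df = 2 (one beta per logit, two logits).
p_value = math.exp(-lr / 2.0)
print(round(lr, 2), round(p_value, 4))
```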

Page 15: Multicategory Logit Models

Estimated Probabilities

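$\pi_j = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{h=1}^{J} \exp(\alpha_h + \beta_h x)}$

• The denominator is the same for each probability

• The numerators for the various j sum to the denominator, so the probabilities sum to 1

• The parameters equal zero for whichever category is the baseline in the logit expressions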

Page 16: Multicategory Logit Models

Estimated Probabilities

$\pi_1 = \frac{\exp(1.62 - 0.11x)}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)}$

$\pi_2 = \frac{\exp(5.70 - 2.47x)}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)}$

$\pi_3 = \frac{1}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)}$
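These three expressions can be evaluated at any length. The following Python sketch uses the unrounded coefficients from the Stata output; the function name and the two example lengths are assumptions chosen for illustration.

```python
import math


def food_probs(length):
    """Estimated P(fish), P(invertebrate), P(other) for a given length in meters."""
    eta_fish = 1.617731 - 0.110109 * length  # log(pi_1 / pi_3)
    eta_inv = 5.697444 - 2.465446 * length   # log(pi_2 / pi_3)
    # The baseline category (other) contributes exp(0) = 1 to the denominator.
    denom = 1.0 + math.exp(eta_fish) + math.exp(eta_inv)
    return math.exp(eta_fish) / denom, math.exp(eta_inv) / denom, 1.0 / denom


small = food_probs(1.0)  # a 1-meter alligator
large = food_probs(3.0)  # a 3-meter alligator
print(small, large)
```

The estimated probability of invertebrates as primary food is much higher for the shorter alligator, in line with the negative size coefficient in the invertebrate equation.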

Page 17: Multicategory Logit Models

Example 2: Job Satisfaction and Income

• The researchers seek to find the relationship between Y = job satisfaction and X1 = income, stratified by X2= gender (1=F, 2=M), for black Americans

Page 18: Multicategory Logit Models

. mlogit satisfaction income gender [weight=count], b(1)
(frequency weights assumed)

Iteration 0: log likelihood = -107.39082
Iteration 1: log likelihood = -103.35145
Iteration 2: log likelihood = -102.92608
Iteration 3: log likelihood = -102.91365
Iteration 4: log likelihood = -102.91362
Iteration 5: log likelihood = -102.91362

Multinomial logistic regression          Number of obs = 104
                                         LR chi2(6)    = 8.95
                                         Prob > chi2   = 0.1762
Log likelihood = -102.91362              Pseudo R2     = 0.0417

satisfaction |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
1            |  (base outcome)
2            |
      income |   .9239423   .7752856     1.19   0.233    -.5955895    2.443474
      gender |   .1239678   1.317757     0.09   0.925    -2.458788    2.706724
       _cons |   -.583335   1.990687    -0.29   0.769    -4.485009    3.318339
3            |
      income |   1.157282   .7388206     1.57   0.117    -.2907792    2.605344
      gender |    .005601    1.22245     0.00   0.996    -2.390357    2.401559
       _cons |   .5385145   1.842591     0.29   0.770    -3.072897    4.149926
4            |
      income |   1.560782   .7659445     2.04   0.042     .0595581    3.062005
      gender |   .1884805   1.286052     0.15   0.883    -2.332134    2.709095
       _cons |   -1.81048   1.977129    -0.92   0.360    -5.685582    2.064621

. estat ic

       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
           . |    104   -107.3908   -102.9136      9   223.8272   247.6267

Note: N=Obs used in calculating BIC; see [R] BIC note


Page 19: Multicategory Logit Models

Example 2: Job Satisfaction and Income

$\log\frac{\pi_j}{\pi_1} = \alpha_j + \beta_j^I x_1 + \beta_j^G x_2, \quad j = 2, 3, 4$

• I = Income

• $\beta_j^I$ is the conditional log odds ratio between income and job satisfaction categories j and 1 (2 & 1, 3 & 1, 4 & 1), given gender

• G = Gender

• $\beta_j^G$ is the conditional log odds ratio between gender and job satisfaction categories j and 1 (2 & 1, 3 & 1, 4 & 1), given income
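Exponentiating the income coefficients turns these conditional log odds ratios into odds ratios. The Python sketch below uses the income estimates from the Stata output for Example 2; the dictionary layout is an assumption for illustration, and the interpretation is per one-unit increase in income as coded in the data.

```python
import math

# Income coefficients from the Stata output, keyed by satisfaction category j
# (each compares category j with the baseline category 1, given gender).
income_coef = {2: 0.9239423, 3: 1.157282, 4: 1.560782}

# Conditional odds ratios for category j vs category 1 per unit of income.
odds_ratios = {j: math.exp(b) for j, b in income_coef.items()}
for j, value in odds_ratios.items():
    print(j, round(value, 2))
```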

Page 20: Multicategory Logit Models

Models for Ordinal Responses

Page 21: Multicategory Logit Models

Cumulative Logit Models

• Logits can utilize ordered categories

• results in models with simpler interpretations and potentially greater power than baseline-category logit models.

• A cumulative probability for Y is the probability that Y is less than or equal to a certain value. In notation, for j = 1, 2, … , J,

$P(Y \le j) = P(Y = 1) + P(Y = 2) + \dots + P(Y = j) = \pi_1 + \pi_2 + \dots + \pi_j$

• The cumulative probabilities reflect the ordering:

$P(Y \le 1) \le P(Y \le 2) \le \dots \le P(Y \le J) = 1$

• Models for cumulative probabilities do not use P(Y ≤ J) because P(Y ≤ J) = 1

Page 22: Multicategory Logit Models

Cumulative Logit Models

• The logits of the first J-1 cumulative probabilities are

$\operatorname{logit}[P(Y \le j)] = \log\frac{P(Y \le j)}{1 - P(Y \le j)} = \log\frac{\pi_1 + \dots + \pi_j}{\pi_{j+1} + \dots + \pi_J}, \quad j = 1, \dots, J-1$

• These are called cumulative logits.

• Logit[P(Y ≤ j)] is like an ordinary logit for a binary response in which categories 1 to j combine to form the first category and categories j+1 to J form the second category.

• Each cumulative logit uses all response categories.

• For J = 3, both logit[P(Y ≤ 1)] = log[π1/(π2 + π3)] and logit[P(Y ≤ 2)] = log[(π1 + π2)/π3] are used
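As a small numerical illustration of cumulative probabilities and cumulative logits, the Python sketch below starts from a made-up set of category probabilities for J = 3. The probabilities and function names are assumptions, not values from the slides.

```python
import math
from itertools import accumulate


def cumulative_probs(pi):
    """P(Y <= j) for j = 1..J from category probabilities pi_1..pi_J."""
    return list(accumulate(pi))


def cumulative_logits(pi):
    """logit[P(Y <= j)] for j = 1..J-1; j = J is skipped since P(Y <= J) = 1."""
    cum = cumulative_probs(pi)
    return [math.log(c / (1.0 - c)) for c in cum[:-1]]


pi = [0.25, 0.5, 0.25]  # hypothetical category probabilities
cum = cumulative_probs(pi)
logits = cumulative_logits(pi)
print(cum, logits)
```

With these symmetric probabilities the two cumulative logits are log(1/3) and log(3), which are equal in size and opposite in sign.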

Page 23: Multicategory Logit Models

Proportional Odds Property

• For a predictor X, the cumulative logit model is given by:

$\operatorname{logit}[P(Y \le j)] = \alpha_j + \beta x, \quad j = 1, \dots, J-1$

• Notice that β does not have a subscript j, which implies that the effect of x is the same for all J-1 cumulative logits.

• When the model fits well, a single parameter instead of J-1 parameters is enough to describe the effect of x.

• The curves of each cumulative probability have the same shape/slope/rate of change but different start and end points depending on αj.

Page 24: Multicategory Logit Models

Proportional Odds Property

• At any fixed value of x, the ordering is retained, with P(Y ≤ 1) being the lowest.

• When β > 0, the curves are ascending.

• When β < 0, the curves are descending.

• When β = 0, the graph has a horizontal line for each cumulative probability.

• This implies X and Y are statistically independent

Page 25: Multicategory Logit Models

Proportional Odds Property

• P(Y = j) = P(Y ≤ j) – P(Y ≤ j-1)

• The category probabilities are graphed in the figure on the right.

• This graph is for β > 0.

• As x increases, the probability of falling in a lower category increases as well

• This runs against the usual interpretation that a positive slope implies a positive association

• When β < 0, the labels on the figure on the right are reversed.

Page 26: Multicategory Logit Models

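Proportional Odds Property

• Consider the odds ratio of cumulative probabilities at two values x1 and x2 of X:

$\frac{P(Y \le j \mid X = x_2)\,/\,P(Y > j \mid X = x_2)}{P(Y \le j \mid X = x_1)\,/\,P(Y > j \mid X = x_1)}$

• Take the logarithm and simplify:

$\log\frac{P(Y \le j \mid X = x_2)}{P(Y > j \mid X = x_2)} - \log\frac{P(Y \le j \mid X = x_1)}{P(Y > j \mid X = x_1)} = (\alpha_j + \beta x_2) - (\alpha_j + \beta x_1) = \beta(x_2 - x_1)$

• Thus, the log OR is the difference between the cumulative logits at those two values of x, and is equal to $\beta(x_2 - x_1)$

• This is the proportional odds property.

• The log OR is proportional to the distance between the x values and is the same for every j

• For x2 – x1 = 1, the odds of response at or below any given category multiply by exp(β) for every unit increase in x.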
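The proportional odds property can be verified numerically: under the model logit[P(Y ≤ j)] = αj + βx, the log odds ratio between two x values is β(x2 − x1) for every cutpoint j. The intercepts, slope, and x values in this Python sketch are made up for illustration.

```python
import math


def cum_prob(alpha_j, beta, x):
    """P(Y <= j | x) under the cumulative logit model."""
    return 1.0 / (1.0 + math.exp(-(alpha_j + beta * x)))


def cum_odds(alpha_j, beta, x):
    """Odds of Y <= j versus Y > j at x."""
    p = cum_prob(alpha_j, beta, x)
    return p / (1.0 - p)


alphas = [-1.0, 0.5]  # hypothetical intercepts for J = 3
beta = 0.8            # hypothetical common slope
x1, x2 = 1.0, 3.0

# One log odds ratio per cutpoint j; all should equal beta * (x2 - x1).
log_ors = [math.log(cum_odds(a, beta, x2) / cum_odds(a, beta, x1)) for a in alphas]
print(log_ors)
```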

Page 27: Multicategory Logit Models

Estimated Probabilities

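• The model expression for the cumulative probabilities is:

$P(Y \le j) = \frac{\exp(\alpha_j + \beta x)}{1 + \exp(\alpha_j + \beta x)}, \quad j = 1, \dots, J-1$

• To estimate the category probabilities,

$\pi_j = P(Y = j) = P(Y \le j) - P(Y \le j-1)$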

For example,
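As one illustration, the Python sketch below computes category probabilities by differencing cumulative probabilities from a cumulative logit model. The intercepts, slope, and x value are hypothetical; the intercepts must be increasing in j so that the differences are positive.

```python
import math


def category_probs(alphas, beta, x):
    """pi_1..pi_J from a cumulative logit model with J-1 increasing intercepts."""
    # Cumulative probabilities P(Y <= j) for j = 1..J-1, then P(Y <= J) = 1.
    cum = [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in alphas] + [1.0]
    # pi_1 = P(Y <= 1); pi_j = P(Y <= j) - P(Y <= j-1) for j > 1.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


probs = category_probs(alphas=[-1.0, 0.5], beta=0.8, x=1.0)
print(probs)
```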

Page 28: Multicategory Logit Models

Example: All explanatory variables are categorical

• A study looks at factors that influence the decision of whether college juniors will apply to graduate school. The response is ordinal, with "Very Likely" (VL) at the highest end of the scale.

• Because all variables are categorical, the data can be entered in a contingency table.

                                              Apply to Grad School
Parental     Undergrad       Very        Somewhat      Very
Education    Institution     Unlikely    Unlikely      Likely
Low          Private         175         98            20
Low          Public          25          12            7
High         Private         14          26            10
High         Public          6           4             3

Page 29: Multicategory Logit Models

Example: Cont.

• Ensure that dataset is in case or expanded form before using polr.
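Expanding a contingency table into case form means writing one row per respondent. The slides do this in R; the sketch below shows the same idea in Python with the counts from the table above. The field names inst and apply are assumptions (pared is the name used later in the slides).

```python
# Counts from the contingency table: (parental education, institution, counts
# for the three response levels in increasing order).
table = [
    ("Low", "Private", (175, 98, 20)),
    ("Low", "Public", (25, 12, 7)),
    ("High", "Private", (14, 26, 10)),
    ("High", "Public", (6, 4, 3)),
]
levels = ("Very Unlikely", "Somewhat Unlikely", "Very Likely")

# One dictionary per respondent: repeat each cell's row as many times as its count.
cases = [
    {"pared": pared, "inst": inst, "apply": level}
    for pared, inst, counts in table
    for level, n in zip(levels, counts)
    for _ in range(n)
]
print(len(cases))
```

The number of rows in case form equals the grand total of the table, which is a quick check that the expansion is complete.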

Page 30: Multicategory Logit Models

Example: Cont. (Using polr)

• The R command is polr (proportional odds logistic regression) from the MASS package. The dataset should be in case form.

Page 31: Multicategory Logit Models

Example: Cont. (using polr)

• The exponentiated coefficients in the last output are called proportional odds ratios.

• For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 3.07 times as high among students with high parental education as among those with low parental education, given that all the other variables in the model are held constant

• Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 3.07 times as high among students with high parental education, given that all of the other variables in the model are held constant

Page 32: Multicategory Logit Models

Example: Cont. (using vglm from VGAM package)

Page 33: Multicategory Logit Models

Example: Cont. (in Stata)

Page 34: Multicategory Logit Models

Example: w/ continuous predictor (Using polr)

Page 35: Multicategory Logit Models

Example: Cont.

• The exponentiated coefficients in the last output are called proportional odds ratios.

• For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 2.85 times as high among students with high parental education as among those with low parental education, given that all the other variables in the model are held constant

• Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 2.85 times as high among students with high parental education, given that all of the other variables in the model are held constant

• For gpa, for a 1-unit increase in a student's GPA, the odds of "somewhat likely" or "very likely" applying versus "unlikely" applying (and likewise of "very likely" applying versus the lower and middle categories) are multiplied by 1.85.

Page 36: Multicategory Logit Models

Example: Cont. (using vglm)

Page 37: Multicategory Logit Models

Example: Cont. (using stata)

Page 38: Multicategory Logit Models

Inference on Model Parameters

• Testing for independence (Ho: β = 0)

• The test statistic is the difference between the deviance of the independence model and the deviance of the model allowing an effect of the explanatory variable.

• If the p-value is below the level of significance, Ho is rejected and we conclude that an association exists

• Tests of independence on an ordinal scale consider the ordering of the response categories.

• When the model fits, the test is more powerful than tests of independence for nominal data, because

• it focuses on a restricted alternative, P(Y ≤ j)

• it has only a single degree of freedom (recall that β is the same for all J-1 cumulative logits)

Page 39: Multicategory Logit Models

Inference on Model Parameters

• Testing Ho: β = 0

Page 40: Multicategory Logit Models

Inference on Model Parameters

• Testing the proportional odds assumption

• Our model with a constant β holds only if the proportional odds assumption is not violated. If it is violated, it would be better to get individual estimates for each βj.

• Agresti suggested an LR test comparing the vglm model fitted with parallel = TRUE (simultaneous fitting, giving a single β for all cumulative logits) against the model fitted with parallel = FALSE (individual fitting, giving a separate βj for each cumulative logit).

• The assumption of proportionality is violated when the cumulative probability curves intersect (recall the graph earlier)

• This occurs, for example, when males tend to give the moderate responses of the ordinal scale, whereas females tend to give both extreme responses.

Page 41: Multicategory Logit Models

Inference on Model Parameters

• Ho: The model without the additional parameters βj is sufficient

• The p-value does not lead to rejection of the null hypothesis. There is no need to estimate individual βj's; the single β is enough.

• Alternatives if the proportional odds assumption is violated:

• Run the model with individual βj's. (Issues: increase in SE, decrease in power)

• Run the model using baseline-category logits and use the ordinality in an informal way to interpret the association. (Issue: increase in number of parameters, less parsimonious)

• Collapse multicategory responses to binary. (Issue: loss of efficiency, loss of data)

Page 42: Multicategory Logit Models

Invariance

• Invariance to choice of response categories

• Situation: Researcher A used a 5-point Likert scale (SD, D, N, A, SA). Researcher B conducted a similar study but used a 3-point Likert scale (D, N, A). If the proportional odds assumption is not violated, the parameter estimates for the effect of a predictor are roughly the same in both studies.

• This feature of the model makes it possible to compare estimates from studies using different response scales.

Page 43: Multicategory Logit Models

Paired-Category Ordinal Logits

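• ADJACENT-CATEGORIES LOGITS

• The adjacent-categories logits are:

$\log\frac{\pi_{j+1}}{\pi_j}, \quad j = 1, \dots, J-1$

• For J = 3, the logits are log(π2/π1) and log(π3/π2)

• The corresponding model is

$\log\frac{\pi_{j+1}}{\pi_j} = \alpha_j + \beta_j x, \quad j = 1, \dots, J-1$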

Page 44: Multicategory Logit Models

Paired-Category Ordinal Logits

• A simpler proportional odds version of the model is

$\log\frac{\pi_{j+1}}{\pi_j} = \alpha_j + \beta x, \quad j = 1, \dots, J-1$

• For it, the effects {βj = β} of x on the odds of making the higher instead of the lower response are identical for each pair of adjacent response categories.
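Adjacent-categories logits carry the same information as baseline-category logits: adding up the logits for consecutive pairs gives the logit for the end categories. The Python sketch below checks this with made-up category probabilities; the function name and values are assumptions for illustration.

```python
import math


def adjacent_logits(pi):
    """log(pi_{j+1} / pi_j) for j = 1..J-1."""
    return [math.log(pi[j + 1] / pi[j]) for j in range(len(pi) - 1)]


pi = [0.2, 0.3, 0.5]  # hypothetical category probabilities for J = 3
adj = adjacent_logits(pi)

# The adjacent logits telescope: their sum is log(pi_J / pi_1).
print(adj, sum(adj), math.log(pi[-1] / pi[0]))
```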

Page 45: Multicategory Logit Models

Example

• Stem Cell Research and Religious Fundamentalism

Page 46: Multicategory Logit Models

Example: Cont.

Page 47: Multicategory Logit Models

Paired-Category Ordinal Logits

• CONTINUATION-RATIO LOGITS

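• Another approach forms logits for ordered response categories in a sequential manner. The models apply simultaneously to:

$\log\frac{\pi_1}{\pi_2}, \quad \log\frac{\pi_1 + \pi_2}{\pi_3}, \quad \dots, \quad \log\frac{\pi_1 + \dots + \pi_{J-1}}{\pi_J}$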

• These are called continuation-ratio logits.

• They refer to a binary response that contrasts each category with a grouping of categories from lower levels of the response scale.

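A minimal Python sketch of continuation-ratio logits follows, assuming the form in which each category from the second onward is contrasted with the grouping of all lower categories, i.e. log[(π1 + … + πj)/πj+1]. The probabilities are made up for illustration.

```python
import math


def continuation_ratio_logits(pi):
    """log[(pi_1 + ... + pi_j) / pi_{j+1}] for j = 1..J-1."""
    logits = []
    lower = 0.0  # running total of the lower categories
    for j in range(len(pi) - 1):
        lower += pi[j]
        logits.append(math.log(lower / pi[j + 1]))
    return logits


pi = [0.25, 0.25, 0.5]  # hypothetical category probabilities for J = 3
cr = continuation_ratio_logits(pi)
print(cr)
```

In this example each category has the same probability as the lower categories combined, so both logits are zero.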

Page 48: Multicategory Logit Models

Example

• Tonsil Size and Streptococcus

Page 49: Multicategory Logit Models

Example: Cont.