Multicategory Logit Models

Del Rosario, RP | Perez, JJ

Nominal Responses

• One response variable Y with J levels

• One or more explanatory (predictor) variables

  • quantitative, qualitative, or both

Logistic Regression

Forming Logits

• When J = 2, Y is dichotomous

  • The logit is the log odds of "success", i.e., the log of the odds that an event occurs rather than does not occur:

    $\operatorname{logit}(\pi) = \log\frac{\pi}{1-\pi}$

• When J > 2, Y is a "multicategory" or "polytomous" response variable

  • There are $J(J-1)/2$ logits that can be formed, but only $J - 1$ are non-redundant
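The count of possible versus non-redundant logits can be checked with a small Python sketch. The function name `count_logits` is hypothetical and used only for illustration.

```python
from itertools import combinations

def count_logits(J):
    """Return (number of category pairs, number of non-redundant logits)."""
    pairs = list(combinations(range(1, J + 1), 2))  # every pair (a, b) with a < b
    return len(pairs), J - 1

for J in (2, 3, 4, 5):
    n_pairs, n_free = count_logits(J)
    print(f"J={J}: {n_pairs} possible logits, {n_free} non-redundant")
```

For J = 2 the two counts coincide, which is why binary logistic regression needs only one logit.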

Categorical Logit Models

• Nominal response

  • Multinomial logistic regression / Baseline-Category Logits

• Ordinal response

  • Ordinal logistic regression

  • Cumulative Logits / Proportional Odds Model

  • Adjacent-Categories Logits

  • Continuation-Ratio Logits

Multicategory Logits

• Model simultaneously all relationships between probabilities for pairs of categories (in contrast to fitting separate binary logistic regressions)

  • Optimal efficiency

  • Estimates of the model parameters have smaller standard errors than the estimates obtained by fitting the equations separately.

• For simultaneous fitting, the same parameter estimates occur for a pair of categories no matter which category is baseline.

• They describe the odds of response in one category rather than another.

Baseline Category Logits

• Logit models for nominal responses pair each response category with a baseline category.

• The choice of baseline category is arbitrary.

• If the last category (J) is the baseline, the baseline-category logits are:

  $\log\frac{\pi_j}{\pi_J}, \quad j = 1, \ldots, J-1$

• Given that the response falls in category j or J, this is the log odds that the response is j.

• For J = 3, for instance, the logit model uses $\log(\pi_1/\pi_3)$ and $\log(\pi_2/\pi_3)$.

Baseline Category Logits

• The logit model using the baseline-category logits with a predictor x is

  $\log\frac{\pi_j}{\pi_J} = \alpha_j + \beta_j x, \quad j = 1, \ldots, J-1$

• Parameters in the (J − 1) equations determine parameters for logits using all other pairs of response categories.

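• For instance, for an arbitrary pair of categories a and b,

  $\log\frac{\pi_a}{\pi_b} = \log\frac{\pi_a/\pi_J}{\pi_b/\pi_J} = \log\frac{\pi_a}{\pi_J} - \log\frac{\pi_b}{\pi_J}$

  $= (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x) = (\alpha_a - \alpha_b) + (\beta_a - \beta_b)x$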
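The identity for an arbitrary pair of categories can be verified numerically. The sketch below is in Python and uses made-up parameter values for J = 3; the names `probabilities` and `pairwise_logit` are hypothetical helpers, not part of any package.

```python
import math

# Hypothetical baseline-category parameters for J = 3. Category 3 is the
# baseline, so its intercept and slope are fixed at zero.
alpha = {1: 0.5, 2: -1.0, 3: 0.0}
beta = {1: 0.8, 2: 0.3, 3: 0.0}

def probabilities(x):
    """Category probabilities implied by the baseline-category logit model."""
    expo = {j: math.exp(alpha[j] + beta[j] * x) for j in alpha}
    total = sum(expo.values())
    return {j: v / total for j, v in expo.items()}

def pairwise_logit(a, b, x):
    """log(pi_a / pi_b) computed from the baseline parameters alone."""
    return (alpha[a] - alpha[b]) + (beta[a] - beta[b]) * x

x = 1.7
pi = probabilities(x)
direct = math.log(pi[1] / pi[2])       # from the probabilities
from_params = pairwise_logit(1, 2, x)  # from the parameter differences
print(direct, from_params)
```

Both routes give the same value, which is why the J − 1 baseline equations are enough to describe every pair of categories.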

Example 1: Alligator Food Choice

• A study looking into factors influencing the primary food choice of alligators in the wild

• 59 alligators were sampled, and the data shows the alligator length (in meters) and the primary food type, in volume, found in the alligator’s stomach

• Food type has three categories: Fish (1), Invertebrate (2), and Other (3)

Table 1. Alligator size (meter) and Primary food choice


Using STATA

. mlogit food size, b(3)

Iteration 0:   log likelihood = -57.570928
Iteration 1:   log likelihood =  -49.97414
Iteration 2:   log likelihood = -49.186349
Iteration 3:   log likelihood = -49.170647
Iteration 4:   log likelihood = -49.170622
Iteration 5:   log likelihood = -49.170622

Multinomial logistic regression          Number of obs  =      59
                                         LR chi2(2)     =   16.80
                                         Prob > chi2    =  0.0002
Log likelihood = -49.170622              Pseudo R2      =  0.1459

food               Coef.       Std. Err.   z       P>|z|   [95% Conf. Interval]
1
  size             -.110109    .517082     -0.21   0.831   -1.123571    .9033531
  _cons            1.617731    1.307275     1.24   0.216   -.9444801    4.179943
2
  size             -2.465446   .8996503    -2.74   0.006   -4.228728   -.7021642
  _cons            5.697444    1.793809     3.18   0.001    2.181644    9.213244
3 (base outcome)

. estat ic

Model   Obs   ll(null)    ll(model)   df   AIC        BIC
.       59    -57.57093   -49.17062   4    106.3412   114.6514
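The information criteria reported by `estat ic` can be reproduced from the log likelihood, the number of estimated parameters, and the sample size. This is a minimal Python sketch using the standard definitions AIC = −2·ll + 2k and BIC = −2·ll + k·ln(N); the variable names are illustrative.

```python
import math

ll_model = -49.170622  # log likelihood of the fitted model
k = 4                  # estimated parameters: 2 intercepts + 2 slopes
n = 59                 # number of alligators

aic = -2 * ll_model + 2 * k
bic = -2 * ll_model + k * math.log(n)
print(round(aic, 4), round(bic, 4))
```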

Using R


• Y = "primary food choice"; X = "length of alligator"

• Estimated log odds that the primary food choice of alligators is fish rather than other types:

  $\log\frac{\pi_1}{\pi_3} = 1.618 - 0.110\,\text{length}$

• Estimated log odds that the primary food choice of alligators is invertebrate rather than other types:

  $\log\frac{\pi_2}{\pi_3} = 5.697 - 2.465\,\text{length}$


• What about the estimated log odds that the primary food choice of alligators is fish rather than invertebrate?

  $\log\frac{\pi_1}{\pi_2} = (1.618 - 5.697) + [-0.110 - (-2.465)]\,\text{length}$

  $\log\frac{\pi_1}{\pi_2} = -4.080 + 2.355\,\text{length}$ (using the unrounded coefficients)

• For every 1-meter increase in the length of the alligator, the estimated odds of choosing fish rather than invertebrates as primary food are multiplied by $e^{2.355} \approx 10.54$.
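The fish-versus-invertebrate equation follows from subtracting the two fitted baseline equations. A short Python sketch of that arithmetic, using the unrounded coefficients from the Stata output (the dictionary names are illustrative):

```python
import math

# Coefficients from the Stata output (base outcome: 3 = Other).
fish = {"cons": 1.617731, "size": -0.110109}    # log(pi_1 / pi_3)
invert = {"cons": 5.697444, "size": -2.465446}  # log(pi_2 / pi_3)

intercept = fish["cons"] - invert["cons"]  # alpha_1 - alpha_2
slope = fish["size"] - invert["size"]      # beta_1 - beta_2
odds_ratio = math.exp(slope)               # multiplicative effect per meter

print(round(intercept, 3), round(slope, 3), round(odds_ratio, 2))
```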


• Hypothesis testing on the effect of length as predictor:

  $H_0: \beta_j = 0$ for j = 1, 2

• LR = 16.80 with df = 2, p = 0.0002

• Strong evidence that the length of the alligator affects food choice
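The likelihood-ratio statistic is twice the gain in log likelihood between the intercept-only model (iteration 0) and the fitted model. The Python sketch below reproduces it; it relies on the fact that a chi-square variable with 2 degrees of freedom has upper-tail probability exp(−x/2), so no statistics library is needed.

```python
import math

ll_null = -57.570928   # log likelihood at iteration 0 (intercepts only)
ll_model = -49.170622  # log likelihood of the model with length

lr = 2 * (ll_model - ll_null)
df = 2                       # one slope for each of the J - 1 = 2 logits
p_value = math.exp(-lr / 2)  # chi-square upper tail, valid for df = 2 only

print(round(lr, 2), df, round(p_value, 4))
```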

Estimated Probabilities

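  $\pi_j = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{h=1}^{J} \exp(\alpha_h + \beta_h x)}$

• The denominator is the same for each probability.

• The numerators for the various j sum to the denominator, so the probabilities sum to 1.

• The parameters equal zero for whichever category is the baseline in the logit expressions.

For the alligator example (baseline category 3, so $\alpha_3 = \beta_3 = 0$):

  $\pi_1 = \frac{\exp(1.62 - 0.11x)}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)}$

  $\pi_2 = \frac{\exp(5.70 - 2.47x)}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)}$

  $\pi_3 = \frac{1}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)}$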
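As an illustration, the estimated probabilities can be evaluated at a few lengths. This Python sketch uses the unrounded Stata coefficients; the function name `food_probabilities` and the chosen lengths are assumptions made for the example.

```python
import math

# (intercept, slope) for each category; "other" is the baseline, so both are 0.
params = {
    "fish": (1.617731, -0.110109),
    "invertebrate": (5.697444, -2.465446),
    "other": (0.0, 0.0),
}

def food_probabilities(length_m):
    """Estimated food-choice probabilities for an alligator of given length."""
    expo = {k: math.exp(a + b * length_m) for k, (a, b) in params.items()}
    total = sum(expo.values())  # common denominator
    return {k: v / total for k, v in expo.items()}

for length in (1.0, 2.0, 3.0):
    probs = food_probabilities(length)
    print(length, {k: round(v, 3) for k, v in probs.items()})
```

Under the fitted model, invertebrates are the most likely primary food for short alligators and fish for long ones, in line with the negative slope of the invertebrate logit.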

Example 2: Job Satisfaction and Income

• The researchers seek to find the relationship between Y = job satisfaction and X1 = income, stratified by X2= gender (1=F, 2=M), for black Americans

. mlogit satisfaction income gender [weight=count], b(1)
(frequency weights assumed)

satisfaction       Coef.       Std. Err.   z       P>|z|   [95% Conf. Interval]
1 (base outcome)
2
  income           .9239423    .7752856     1.19   0.233   -.5955895    2.443474
  gender           .1239678    1.317757     0.09   0.925   -2.458788    2.706724
  _cons            -.583335    1.990687    -0.29   0.769   -4.485009    3.318339
3
  income           1.157282    .7388206     1.57   0.117   -.2907792    2.605344
  gender           .005601     1.22245      0.00   0.996   -2.390357    2.401559
  _cons            .5385145    1.842591     0.29   0.770   -3.072897    4.149926
4
  income           1.560782    .7659445     2.04   0.042    .0595581    3.062005
  gender           .1884805    1.286052     0.15   0.883   -2.332134    2.709095
  _cons            -1.81048    1.977129    -0.92   0.360   -5.685582    2.064621

. estat ic

Model   Obs   ll(null)    ll(model)   df   AIC        BIC
.       104   -107.3908   -102.9136   9    223.8272   247.6267

Note: N=Obs used in calculating BIC; see [R] BIC note

Example 2: Job Satisfaction and Income

  $\log\frac{\pi_j}{\pi_1} = \alpha_j + \beta_j^I x_1 + \beta_j^G x_2, \quad j = 2, \ldots, J$ (here J = 4, with category 1 as baseline)

• I = Income

  • $\beta_j^I$ is the conditional log odds ratio between income and job satisfaction categories j and 1 (2 & 1, 3 & 1, 4 & 1), given gender

• G = Gender

  • $\beta_j^G$ is the conditional log odds ratio between gender and job satisfaction categories j and 1 (2 & 1, 3 & 1, 4 & 1), given income
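Exponentiating the fitted coefficients turns the conditional log odds ratios into odds ratios. The Python sketch below does this for the income and gender coefficients in the Stata output; the dictionary names are illustrative, and "one unit of income" means one unit on whatever coding the dataset uses, which the output does not show.

```python
import math

# Coefficients from the Stata output, keyed by satisfaction category j
# (category 1 is the base outcome).
income_coef = {2: 0.9239423, 3: 1.157282, 4: 1.560782}
gender_coef = {2: 0.1239678, 3: 0.005601, 4: 0.1884805}

# Conditional odds ratios for category j versus category 1.
income_or = {j: math.exp(b) for j, b in income_coef.items()}
gender_or = {j: math.exp(b) for j, b in gender_coef.items()}

for j in (2, 3, 4):
    print(j, round(income_or[j], 3), round(gender_or[j], 3))
```

The income odds ratios grow with j, although only the category 4 versus 1 income coefficient has a 95% confidence interval that excludes zero.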

Models for Ordinal Responses

Cumulative Logit Models

• Logits can make use of the ordering of the categories

  • This results in models with simpler interpretations and potentially greater power than baseline-category logit models.

• A cumulative probability for Y is the probability that Y falls at or below a particular category. In notation, for j = 1, 2, …, J,

  $P(Y \le j) = P(Y = 1) + P(Y = 2) + \cdots + P(Y = j) = \pi_1 + \pi_2 + \cdots + \pi_j$

• The cumulative probabilities reflect the ordering:

  $P(Y \le 1) \le P(Y \le 2) \le \cdots \le P(Y \le J) = 1$

• Models for cumulative probabilities do not use P(Y ≤ J) because P(Y ≤ J) = 1.

• The logits of the first J − 1 cumulative probabilities are

  $\operatorname{logit}[P(Y \le j)] = \log\frac{P(Y \le j)}{1 - P(Y \le j)} = \log\frac{\pi_1 + \cdots + \pi_j}{\pi_{j+1} + \cdots + \pi_J}, \quad j = 1, \ldots, J-1$

• These are called cumulative logits.

• logit[P(Y ≤ j)] is like an ordinary logit for a binary response in which categories 1 to j combine to form the first category and categories j + 1 to J form the second.

• Each cumulative logit uses all response categories.

  • For J = 3, both logit[P(Y ≤ 1)] = log[π1/(π2 + π3)] and logit[P(Y ≤ 2)] = log[(π1 + π2)/π3] are used.
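To make the J = 3 case concrete, the Python sketch below computes both cumulative logits from a made-up set of category probabilities. The probabilities and the helper name `cumulative_logits` are assumptions for illustration only.

```python
import math

pi = [0.2, 0.5, 0.3]  # hypothetical P(Y=1), P(Y=2), P(Y=3)

def cumulative_logits(probs):
    """Return logit[P(Y <= j)] for j = 1, ..., J-1."""
    logits = []
    cum = 0.0
    for p in probs[:-1]:  # P(Y <= J) = 1 is never used
        cum += p
        logits.append(math.log(cum / (1.0 - cum)))
    return logits

cum_logits = cumulative_logits(pi)
print([round(v, 3) for v in cum_logits])
```

Each logit collapses the three categories into two groups at a different cut point, so every logit uses all of the categories.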

Cumulative Logit Models

Proportional Odds Property

• For a predictor X, the cumulative logit model is given by:

  $\operatorname{logit}[P(Y \le j)] = \alpha_j + \beta x, \quad j = 1, \ldots, J-1$

• Notice that β does not have a subscript j, which implies that the value of β is the same for all J − 1 cumulative logits.

• When the model fits well, a single parameter instead of J-1 parameters is enough to describe the effect of x.

• The curves of each cumulative probability have the same shape/slope/rate of change but different start and end points depending on αj.

• At any fixed value, the ordering is retained, with P(Y ≤ 1) being the lowest.

• This is the case when β > 0

• When β < 0, the curves are descending.


• When β = 0, the graph has a horizontal line for each cumulative probability.

  • This implies that X and Y are statistically independent.

• P(Y = j) = P(Y ≤ j) − P(Y ≤ j − 1)

• The category probabilities are graphed in the accompanying figure.

  • This graph is for β > 0.

  • As x increases, the probability of falling in a lower category increases as well.

  • This runs against the usual interpretation that a positive slope implies a positive association.

• When β < 0, the labels on the figure are reversed.


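• Consider the odds ratio of cumulative probabilities at two values x1 and x2 of X:

  $\frac{P(Y \le j \mid X = x_2)\,/\,P(Y > j \mid X = x_2)}{P(Y \le j \mid X = x_1)\,/\,P(Y > j \mid X = x_1)}$

• Take the logarithm and simplify:

  $\log\frac{P(Y \le j \mid X = x_2)}{P(Y > j \mid X = x_2)} - \log\frac{P(Y \le j \mid X = x_1)}{P(Y > j \mid X = x_1)} = (\alpha_j + \beta x_2) - (\alpha_j + \beta x_1) = \beta(x_2 - x_1)$

• Thus, the log OR is the difference between the cumulative logits at those two values of x, and is equal to $\beta(x_2 - x_1)$ for every category j.

• This is the proportional odds assumption.

  • The log OR is proportional to the distance between the x values.

  • For x2 − x1 = 1, the odds of a response at or below any given category are multiplied by exp(β) for every unit increase in x.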
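The proportional odds property can be checked numerically: under the cumulative logit model, the log cumulative odds ratio between two x values is the same at every cut point j. The Python sketch below uses made-up cut points and slope for J = 4; all names and values are hypothetical.

```python
import math

alpha = [-1.0, 0.5, 2.0]  # hypothetical intercepts alpha_1..alpha_3 (J = 4)
beta = 0.7                # hypothetical common slope

def cum_prob(j, x):
    """P(Y <= j | x) for j = 1, ..., J-1 under the cumulative logit model."""
    eta = alpha[j - 1] + beta * x
    return 1.0 / (1.0 + math.exp(-eta))

def log_cum_odds_ratio(j, x1, x2):
    """Log odds ratio of {Y <= j} versus {Y > j} comparing x2 with x1."""
    odds = lambda x: cum_prob(j, x) / (1.0 - cum_prob(j, x))
    return math.log(odds(x2) / odds(x1))

x1, x2 = 1.0, 3.5
log_ors = [log_cum_odds_ratio(j, x1, x2) for j in (1, 2, 3)]
print([round(v, 6) for v in log_ors], beta * (x2 - x1))
```

All three values agree with β(x2 − x1), regardless of the intercepts.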



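• The model expression for the cumulative probabilities is:

  $P(Y \le j) = \frac{\exp(\alpha_j + \beta x)}{1 + \exp(\alpha_j + \beta x)}, \quad j = 1, \ldots, J-1$

• To estimate the category probabilities, take differences of successive cumulative probabilities:

  $\pi_j = P(Y = j) = P(Y \le j) - P(Y \le j - 1)$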

For example,
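A minimal Python sketch of this calculation follows, using made-up parameters for a four-category response rather than estimates from any dataset in these slides. The function name `category_probs` is hypothetical.

```python
import math

alpha = [-1.0, 0.5, 2.0]  # hypothetical intercepts for J = 4
beta = 0.7                # hypothetical common slope

def category_probs(x):
    """P(Y = j | x), j = 1..J, as differences of cumulative probabilities."""
    cum = [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in alpha]
    cum.append(1.0)  # P(Y <= J) = 1
    probs = [cum[0]]
    for j in range(1, len(cum)):
        probs.append(cum[j] - cum[j - 1])
    return probs

for x in (0.0, 2.0):
    print(x, [round(p, 3) for p in category_probs(x)])
```

With β > 0 the probability of the lowest category rises as x increases, which matches the remark that a positive β in this parameterization pushes responses toward lower categories.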

Example: All explanatory variables are categorical

• A study looks at factors that influence whether college juniors will apply to graduate school. The response is ordinal, with "Very Likely" (VL) at the highest end of the scale.

• Because all variables are categorical, data can be entered in a contingency table.

                                              Apply to Grad School
Parental Education   Undergrad Institution   Very Unlikely   Somewhat Unlikely   Very Likely
Low                  Private                 175             98                  20
Low                  Public                  25              12                  7
High                 Private                 14              26                  10
High                 Public                  6               4                   3

Example: Cont.

• The R function is polr (proportional odds logistic regression) from the MASS package.

• Ensure that the dataset is in case (expanded) form, with one row per student, before using polr.
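Converting a contingency table to case form means repeating each cell's combination of values as many times as its count. The idea is sketched here in Python rather than R, purely to show the data layout; the variable names are illustrative and the counts are copied from the table in this example.

```python
# Response levels in the order used by the table's columns.
levels = ("Very Unlikely", "Somewhat Unlikely", "Very Likely")

# Cell counts keyed by (parental education, undergraduate institution).
counts = {
    ("Low", "Private"): (175, 98, 20),
    ("Low", "Public"): (25, 12, 7),
    ("High", "Private"): (14, 26, 10),
    ("High", "Public"): (6, 4, 3),
}

# One row per student: (parental education, institution, response).
cases = [
    (pared, institution, level)
    for (pared, institution), row in counts.items()
    for level, n in zip(levels, row)
    for _ in range(n)
]
print(len(cases))
```

The expanded dataset has one row for each of the 400 students in the table.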

Example: Cont. (Using polr)

• The exponentiated coefficients in the last output are called proportional odds ratios.

• For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 3.07 times as high among students with high parental education as among those with low parental education, given that all the other variables in the model are held constant.

• Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 3.07 times as high among students with high parental education, given that all of the other variables in the model are held constant.

Example: Cont. (using polr)

Example: Cont. (using vglm from VGAM package)

Example: Cont. (in Stata)

Example: w/ continuous predictor (Using polr)

Example: Cont.

• The exponentiated coefficients in the last output are called proportional odds ratios.

• For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 2.85 times as high among students with high parental education as among those with low parental education, given that all the other variables in the model are held constant.

• Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 2.85 times as high among students with high parental education, given that all of the other variables in the model are held constant.

• For gpa, for every 1-unit increase in a student's gpa, the odds of moving from "unlikely" applying to "somewhat likely" or "very likely" applying (or from the lower and middle categories to the high category) are multiplied by 1.85.

Example: Cont. (using vglm)

Example: Cont. (using stata)

Inference on Model Parameters

• Testing for independence (Ho: β = 0)

  • The test statistic is the difference between the deviance of the independence model and the deviance of the model that includes the explanatory variable.

  • If the p-value is below the level of significance, Ho is rejected and we conclude that an association exists.

• Tests of independence on an ordinal scale take the ordering of the response categories into account.

  • When the model fits well, the test is more powerful than tests of independence for nominal data, because

    • it focuses on a restricted alternative, expressed through the cumulative probabilities P(Y ≤ j)

    • it has only a single degree of freedom (recall that β is the same for all J − 1 cumulative logits)

• Testing Ho: β = 0
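The deviance-difference test with one degree of freedom can be sketched in Python as follows. The deviance values are made up for illustration and the function name `lr_test_1df` is hypothetical; the p-value uses the identity that a chi-square variable with 1 degree of freedom has upper-tail probability erfc(sqrt(x/2)).

```python
import math

def lr_test_1df(deviance_null, deviance_model):
    """LR statistic and p-value for H0: beta = 0 with one degree of freedom.

    Assumes the models are nested, so the deviance difference is non-negative.
    """
    lr = deviance_null - deviance_model
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square(1) upper tail
    return lr, p_value

# Hypothetical deviances, not taken from any output in these slides.
lr, p_value = lr_test_1df(deviance_null=120.0, deviance_model=112.5)
print(lr, round(p_value, 4))
```

A p-value below the chosen level of significance would lead to rejecting independence.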


• Testing the proportional odds assumption

  • Our model with a common β holds only if the proportional odds assumption is not violated. If it is violated, it would be better to get a separate estimate for each βj.

  • Agresti suggested an LR test comparing the vglm model fitted with parallel = TRUE (a single β shared by all cumulative logits) against the model fitted with parallel = FALSE (a separate βj for each cumulative logit).

  • The proportionality assumption is violated when the cumulative probability curves intersect (recall the earlier graph).

    • This occurs, for example, when males tend to give moderate responses on the ordinal scale, whereas females tend to give responses at both extremes of the scale.


• Ho: The model without the additional parameters βj is sufficient

• The test does not reject the null hypothesis, so there is no need to estimate individual βj's. The single β is enough.

• Alternatives if the proportional odds assumption is violated:

• Run the model with individual βj’s. (Issues: increase in SE, decrease in power)

• Run the model using baseline-category logits and use the ordinality in an informal way to interpret the association. (Issue: Increase in number of parameters, less parsimonious)

• Collapse multicategory responses to binary. (Issue: loss of efficiency, loss of data)


Invariance

• Invariance to choice of response categories

• Situation: Researcher A used a 5-point Likert scale (SD, D, N, A, SA). Researcher B conducted a similar study but used a 3-point Likert scale (D, N, A). If the proportional odds assumption is not violated, the parameters for the effect of a predictor are roughly the same in both studies.

• This feature of the model makes it possible to compare estimates from studies using different response scales.

Paired-Category Ordinal Logits

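• ADJACENT-CATEGORIES LOGITS

• The adjacent-categories logits are:

  $\log\frac{\pi_{j+1}}{\pi_j}, \quad j = 1, \ldots, J-1$

• For J = 3, the logits are log(π2/π1) and log(π3/π2)

• The corresponding model is

  $\log\frac{\pi_{j+1}}{\pi_j} = \alpha_j + \beta_j x, \quad j = 1, \ldots, J-1$

• A simpler proportional odds version of the model is

  $\log\frac{\pi_{j+1}}{\pi_j} = \alpha_j + \beta x, \quad j = 1, \ldots, J-1$

• For it, the effect of x (βj = β for all j) on the odds of making the higher instead of the lower response is identical for each pair of adjacent response categories.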
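Adjacent-categories logits are closely tied to baseline-category logits: adding them up across the scale gives the logit of the last category against the first. The Python sketch below shows this with made-up probabilities for J = 3; the values are illustrative only.

```python
import math

pi = [0.2, 0.5, 0.3]  # hypothetical P(Y=1), P(Y=2), P(Y=3)

# log(pi_{j+1} / pi_j) for j = 1, ..., J-1
adjacent = [math.log(pi[j + 1] / pi[j]) for j in range(len(pi) - 1)]

# The adjacent logits telescope to the logit of category J versus category 1.
last_vs_first = math.log(pi[-1] / pi[0])
print([round(v, 3) for v in adjacent], round(last_vs_first, 3))
```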

Example

• Stem Cell Research and Religious Fundamentalism

Example: Cont.


• CONTINUATION-RATIO LOGITS

• Another approach forms logits for ordered response categories in a sequential manner. The models apply simultaneously to:

  $\log\frac{\pi_1}{\pi_2}, \quad \log\frac{\pi_1 + \pi_2}{\pi_3}, \quad \ldots, \quad \log\frac{\pi_1 + \cdots + \pi_{J-1}}{\pi_J}$

• These are called continuation-ratio logits.

• They refer to a binary response that contrasts each category with a grouping of categories from lower levels of the response scale.
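As a small illustration, the Python sketch below computes continuation-ratio logits in the form that contrasts each category with all lower categories combined. The probabilities and the helper name `continuation_ratio_logits` are assumptions made for the example.

```python
import math

pi = [0.2, 0.5, 0.3]  # hypothetical P(Y=1), P(Y=2), P(Y=3)

def continuation_ratio_logits(probs):
    """log[(pi_1 + ... + pi_j) / pi_{j+1}] for j = 1, ..., J-1."""
    logits = []
    lower = 0.0
    for j in range(len(probs) - 1):
        lower += probs[j]  # total probability of categories 1..j
        logits.append(math.log(lower / probs[j + 1]))
    return logits

cr_logits = continuation_ratio_logits(pi)
print([round(v, 3) for v in cr_logits])
```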

Example

• Tonsil Size and Streptococcus

Example: Cont.
