Modeling Binary Outcomes: Logit and Probit Models

Eric Zivot

December 5, 2009



Motivating Example: Women’s labor force participation

$$y_i = \begin{cases} 1 & \text{if married woman } i \text{ is in the labor force} \\ 0 & \text{otherwise} \end{cases}$$

$x_i$ ($k \times 1$) = observed covariates

Linear probability model formulation

$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \ldots, n$$

Note

$$y_i = 1 \Rightarrow \varepsilon_i = 1 - x_i'\beta, \qquad y_i = 0 \Rightarrow \varepsilon_i = -x_i'\beta$$

Interpretation of regression model

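$$E[y_i|x_i] = 1 \cdot \Pr(y_i = 1|x_i) + 0 \cdot \Pr(y_i = 0|x_i) = \Pr(y_i = 1|x_i) = x_i'\beta$$

and

$$\frac{\partial E[y_i|x_i]}{\partial x_{ki}} = \frac{\partial \Pr(y_i = 1|x_i)}{\partial x_{ki}} = \beta_k$$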

Note: $y_i|x_i$ is heteroskedastic:

$$\operatorname{var}(y_i|x_i) = \Pr(y_i = 1|x_i)\Pr(y_i = 0|x_i) = x_i'\beta(1 - x_i'\beta)$$

Problems with linear probability model

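1. $\varepsilon_i|x_i$ takes only the two values $1 - x_i'\beta$ and $-x_i'\beta$, so it cannot be normally distributed.

2. Predicted probabilities $x_i'\hat\beta$ can be less than zero or greater than one.

3. Constant marginal effects, $\partial \Pr(y_i = 1|x_i)/\partial x_i = \beta$, are often an unrealistic assumption.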
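A minimal sketch of problem 2, assuming NumPy and a hypothetical toy data set (not from the source) in which $y_i$ switches from 0 to 1 at $x_i = 0$. Ordinary least squares applied to the linear probability model then produces fitted "probabilities" below 0 and above 1 at the extremes of $x$.

```python
import numpy as np

# Hypothetical toy data: y switches from 0 to 1 at x = 0 (illustration only)
x = np.linspace(-3.0, 3.0, 100)
y = (x > 0).astype(float)

# OLS fit of the linear probability model y_i = b0 + b1 * x_i + e_i
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat  # fitted values x_i' beta_hat, read as Pr(y_i = 1 | x_i)

print("OLS estimates (b0, b1):", beta_hat)
print("smallest fitted value:", fitted.min())
print("largest fitted value:", fitted.max())
```

The straight line has to overshoot both bounds to follow the jump in $y_i$, which is the motivation for replacing $x_i'\beta$ with a bounded function of the index.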

Latent Variable Formulation

$y_i \in \{0, 1\}$: observed discrete response

$y_i^* \in \mathbb{R}$: unobserved (latent) continuous index

Idea: large values of $y_i^*$ generate $y_i = 1$, and small values generate $y_i = 0$.

Example: Labor force participation

$$y_i = \begin{cases} 1 & \text{if married woman } i \text{ is in the labor force} \\ 0 & \text{otherwise} \end{cases}$$

$y_i^*$ = unobserved propensity to work, based on the utility of the choice

$x_i$ = observed variables that influence utility

$y_i = 1$ if the utility of work is greater than the utility of leisure

Assume $y_i^*$ has a linear model representation

$$y_i^* = x_i'\beta + \varepsilon_i, \quad i = 1, \ldots, n$$

The relationship between $y_i$ and $y_i^*$ is

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \text{ (normalized threshold)} \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$

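Then

$$\begin{aligned} \Pr(y_i = 1|x_i) &= \Pr(y_i^* > 0|x_i) = \Pr(x_i'\beta + \varepsilon_i > 0|x_i) \\ &= \Pr(\varepsilon_i > -x_i'\beta|x_i) = \Pr(\varepsilon_i \le x_i'\beta|x_i) = F_\varepsilon(x_i'\beta) \end{aligned}$$

provided $\varepsilon_i$ has a distribution $F_\varepsilon$ that is symmetric about zero (the step $\Pr(\varepsilon_i > -x_i'\beta) = \Pr(\varepsilon_i \le x_i'\beta)$ uses this symmetry). Similarly,

$$\Pr(y_i = 0|x_i) = \Pr(y_i^* \le 0|x_i) = \Pr(\varepsilon_i \le -x_i'\beta|x_i) = 1 - F_\varepsilon(x_i'\beta)$$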

Remarks

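1. The latent variable formulation provides a non-linear probability model for $\Pr(y_i = 1|x_i)$.

2. By construction, $\Pr(y_i = 1|x_i) = F_\varepsilon(x_i'\beta) \in (0, 1)$ when $F_\varepsilon$ is a cdf with support on all of $\mathbb{R}$, such as the normal or logistic.

3. $\lim_{x_i'\beta \to \infty} \Pr(y_i = 1|x_i) = \lim_{x_i'\beta \to \infty} F_\varepsilon(x_i'\beta) = 1$ and $\lim_{x_i'\beta \to -\infty} \Pr(y_i = 1|x_i) = 0$.

4. Making the model operational requires specifying $F_\varepsilon$.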
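The latent variable mechanism can be checked by simulation. The sketch below assumes NumPy, a standard logistic $F_\varepsilon$, and hypothetical values for $x_i$ and $\beta$ (none of these come from the source); the share of draws with $y_i^* > 0$ should be close to $F_\varepsilon(x_i'\beta)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariate vector (leading 1 for the intercept) and coefficients
x_i = np.array([1.0, 0.8])
beta = np.array([-0.5, 1.0])
index = x_i @ beta  # x_i' beta

# Simulate y* = x_i' beta + eps with eps ~ standard logistic, then y = 1{y* > 0}
n_draws = 200_000
eps = rng.logistic(loc=0.0, scale=1.0, size=n_draws)
y = (index + eps > 0).astype(int)

simulated = y.mean()                   # Monte Carlo estimate of Pr(y = 1 | x_i)
model = 1.0 / (1.0 + np.exp(-index))   # logistic cdf F_eps(x_i' beta)
print(f"simulated: {simulated:.4f}   F(x'beta): {model:.4f}")
```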

Probit Model

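Assume $\varepsilon_i \sim N(0, 1)$. Then

$$\Pr(y_i = 1|x_i) = F_\varepsilon(x_i'\beta) = \Phi(x_i'\beta)$$

$$\Pr(y_i = 0|x_i) = 1 - F_\varepsilon(x_i'\beta) = 1 - \Phi(x_i'\beta)$$

$$\Phi(z) = \int_{-\infty}^{z} \phi(x)\,dx, \qquad \phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2\right)$$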

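Note: If $\varepsilon_i \sim N(0, \sigma^2)$, then $\beta$ and $\sigma$ are not separately identified; only the ratio $\beta/\sigma$ is identified, because $\Pr(y_i = 1|x_i) = \Phi(x_i'\beta/\sigma)$. Hence $\sigma^2 = 1$ is an identifying assumption for $\beta$.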
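A small sketch of the probit probability and of the identification point, assuming NumPy and computing $\Phi$ through Python's `math.erfc`; the design rows and coefficients are hypothetical. Multiplying both $\beta$ and $\sigma$ by the same constant leaves every probability unchanged.

```python
import math
import numpy as np

def Phi(z):
    """Standard normal cdf, Phi(z) = erfc(-z / sqrt(2)) / 2, applied elementwise."""
    return np.vectorize(lambda v: 0.5 * math.erfc(-v / math.sqrt(2.0)))(z)

def probit_prob(X, beta, sigma=1.0):
    """Pr(y=1|x) when eps ~ N(0, sigma^2): Pr(eps > -x'beta) = Phi(x'beta / sigma)."""
    return Phi(X @ beta / sigma)

# Hypothetical design rows (1, x_i) and coefficients
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 2.0]])
beta = np.array([0.2, 0.5])

p_unit = probit_prob(X, beta, sigma=1.0)
p_scaled = probit_prob(X, 3.0 * beta, sigma=3.0)  # multiply beta and sigma by 3
print(p_unit)
print(p_scaled)  # same up to rounding: only beta / sigma enters the probabilities
```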

Logit Model

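Assume $\varepsilon_i \sim$ Logistic. Then

$$\Pr(y_i = 1|x_i) = F_\varepsilon(x_i'\beta) = \Lambda(x_i'\beta)$$

$$\Pr(y_i = 0|x_i) = 1 - F_\varepsilon(x_i'\beta) = 1 - \Lambda(x_i'\beta)$$

$$\Lambda(z) = \int_{-\infty}^{z} \lambda(x)\,dx = \frac{\exp(z)}{1 + \exp(z)} = \frac{1}{\exp(-z) + 1}$$

$$\lambda(z) = \frac{d}{dz}\Lambda(z) = \Lambda(z)\bigl(1 - \Lambda(z)\bigr) = \frac{\exp(z)}{(1 + \exp(z))^2}$$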
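A numerical check of the logistic formulas, assuming NumPy: it confirms $\lambda(z) = \Lambda(z)(1 - \Lambda(z))$ on a grid and approximates $E[\varepsilon_i]$ and $\operatorname{var}(\varepsilon_i)$ by a Riemann sum over $[-30, 30]$. The truncation range and grid are choices of this sketch; the probability mass outside the range is negligible.

```python
import math
import numpy as np

def Lambda(z):
    """Logistic cdf exp(z) / (1 + exp(z)), computed as 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def lam(z):
    """Logistic pdf exp(z) / (1 + exp(z))^2."""
    return np.exp(z) / (1.0 + np.exp(z)) ** 2

z = np.linspace(-30.0, 30.0, 60_001)
dz = z[1] - z[0]

pdf_matches = np.allclose(lam(z), Lambda(z) * (1.0 - Lambda(z)))
mean = np.sum(z * lam(z)) * dz       # approximates E[eps] = 0
var = np.sum(z**2 * lam(z)) * dz     # approximates var(eps) = pi^2 / 3
print(pdf_matches, mean, var, math.pi**2 / 3)
```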

Remarks

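1. If $\varepsilon_i \sim$ Logistic, then

$$E[\varepsilon_i] = 0 \quad \text{and} \quad \operatorname{var}(\varepsilon_i) = \frac{\pi^2}{3} \approx 3.29$$

The logistic distribution is similar in shape to a Student's t distribution with 7 degrees of freedom.

2. Logit and probit probabilities are essentially the same in the middle of the distribution but differ slightly in the tails. Because the error variances differ (1 for probit, $\pi^2/3$ for logit), the two models' coefficient estimates differ in scale even when their fitted probabilities are close.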
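Remark 2 can be illustrated by putting the two cdfs on a common scale. This sketch compares $\Lambda(z)$ with a normal cdf rescaled to the same variance $\pi^2/3$; that rescaling is one possible choice made for this sketch, not one the source specifies. Absolute differences stay around 0.02 or less for $|z| \le 2$, while at $z = -6$ the logistic probability is roughly five times the normal one.

```python
import math

def Lambda(z):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Standard deviation of the standard logistic distribution, sqrt(pi^2 / 3)
sd_logistic = math.pi / math.sqrt(3.0)

rows = []
for z in [-6.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 6.0]:
    p_logit = Lambda(z)
    p_normal = Phi(z / sd_logistic)  # normal cdf with the same variance pi^2 / 3
    rows.append((z, p_logit, p_normal))
    print(f"z={z:+.0f}  logit={p_logit:.5f}  normal={p_normal:.5f}  ratio={p_logit / p_normal:.2f}")
```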

Marginal Effects in Latent Variable Formulation

In the latent variable formulation

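$$\Pr(y_i = 1|x_i) = F_\varepsilon(x_i'\beta)$$

Then

$$\frac{\partial \Pr(y_i = 1|x_i)}{\partial x_{ki}} = \frac{\partial F_\varepsilon(x_i'\beta)}{\partial x_{ki}} = f_\varepsilon(x_i'\beta)\,\beta_k$$

where $f_\varepsilon$ is the pdf of $\varepsilon$. For the probit and logit models

$$\text{Probit:} \quad \frac{\partial \Pr(y_i = 1|x_i)}{\partial x_{ki}} = \phi(x_i'\beta)\,\beta_k$$

$$\text{Logit:} \quad \frac{\partial \Pr(y_i = 1|x_i)}{\partial x_{ki}} = \lambda(x_i'\beta)\,\beta_k$$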

Remarks

1. Marginal effects are non-linear functions of $x_i$ and $\beta$.

2. The marginal effect of $x_{ki}$ depends on the value of $x_i = (x_{1i}, \ldots, x_{ki})'$, the value of $\beta = (\beta_1, \ldots, \beta_k)'$, and the value of $\beta_k$.

3. Because $f_\varepsilon(\cdot) > 0$, the sign of $\beta_k$ determines the sign of the marginal effect.

4. Standard errors for estimated marginal effects require the delta method.
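A sketch of the marginal-effect formulas, assuming NumPy and a hypothetical evaluation point and coefficient vector (the first component is the intercept, included only mechanically). The probit formula $\phi(x_i'\beta)\beta_k$ is checked against a central finite difference, and the logit effects share the signs of $\beta$ as remark 3 says. Delta-method standard errors are not computed here.

```python
import math
import numpy as np

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))                 # standard normal cdf

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)    # standard normal pdf

def lam(z):
    L = 1.0 / (1.0 + math.exp(-z))
    return L * (1.0 - L)                                        # logistic pdf Lambda(1 - Lambda)

def marginal_effects(x, beta, pdf):
    """Vector of dPr(y=1|x)/dx_k = f(x'beta) * beta_k for every k."""
    return pdf(float(x @ beta)) * beta

# Hypothetical evaluation point (leading 1 = intercept) and coefficients
x = np.array([1.0, 0.5, -1.2])
beta = np.array([0.3, -0.8, 0.4])

me_probit = marginal_effects(x, beta, phi)
me_logit = marginal_effects(x, beta, lam)

# Central finite differences of Phi(x'beta) in each coordinate, as a check
h = 1e-6
fd_probit = np.array([
    (Phi(float((x + h * e) @ beta)) - Phi(float((x - h * e) @ beta))) / (2.0 * h)
    for e in np.eye(len(x))
])
print(me_probit, fd_probit, me_logit)
```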

Maximum Likelihood Estimation

Observe a random sample $\{(y_1, x_1), \ldots, (y_n, x_n)\}$ and assume that it is generated from the latent variable formulation of the binary response model. Then $y_i|x_i$ is a Bernoulli random variable with conditional probabilities

$$\pi_i = \Pr(y_i = 1|x_i) = F_\varepsilon(x_i'\beta)$$

$$1 - \pi_i = \Pr(y_i = 0|x_i) = 1 - F_\varepsilon(x_i'\beta)$$

The likelihood and log-likelihood functions are

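$$L(\beta|y, X) = \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} = \prod_{i=1}^{n} F_\varepsilon(x_i'\beta)^{y_i}\bigl(1 - F_\varepsilon(x_i'\beta)\bigr)^{1 - y_i}$$

$$\ln L(\beta|y, X) = \sum_{i=1}^{n} \Bigl\{ y_i \ln F_\varepsilon(x_i'\beta) + (1 - y_i) \ln\bigl(1 - F_\varepsilon(x_i'\beta)\bigr) \Bigr\}$$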
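The log-likelihood translates directly into code. A minimal sketch, assuming NumPy, a small hypothetical data set, and a clipping guard against $\ln 0$ that is a numerical choice of this sketch. At $\beta = 0$ both models give $F_\varepsilon(0) = 1/2$, so $\ln L = n \ln(1/2)$ for logit and probit alike.

```python
import math
import numpy as np

def loglik(beta, y, X, cdf):
    """ln L(beta|y,X) = sum_i { y_i ln F(x_i'beta) + (1 - y_i) ln(1 - F(x_i'beta)) }."""
    p = np.array([cdf(z) for z in X @ beta])
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # numerical guard against log(0)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Small hypothetical data set: intercept column plus one covariate
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.5], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])

for name, F in [("logit", logistic_cdf), ("probit", normal_cdf)]:
    print(name, loglik(np.zeros(2), y, X, F), loglik(np.array([-0.5, 1.0]), y, X, F))
```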

The FOCs that define the MLE are

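$$\begin{aligned}
0 = \frac{\partial \ln L(\hat\beta_{mle}|y, X)}{\partial \beta} = S(\hat\beta_{mle}|y, X) &= \sum_{i=1}^{n} S_i(\hat\beta_{mle}|y_i, x_i) \\
&= \sum_{i=1}^{n} \left\{ y_i \frac{\partial \ln F_\varepsilon(x_i'\hat\beta_{mle})}{\partial \beta} + (1 - y_i) \frac{\partial \ln\bigl(1 - F_\varepsilon(x_i'\hat\beta_{mle})\bigr)}{\partial \beta} \right\} \\
&= \sum_{i=1}^{n} \left\{ y_i \frac{f_\varepsilon(x_i'\hat\beta_{mle})}{F_\varepsilon(x_i'\hat\beta_{mle})} x_i + (1 - y_i) \frac{-f_\varepsilon(x_i'\hat\beta_{mle})}{1 - F_\varepsilon(x_i'\hat\beta_{mle})} x_i \right\} \\
&= \sum_{i=1}^{n} \left\{ y_i \frac{f_\varepsilon(x_i'\hat\beta_{mle})}{F_\varepsilon(x_i'\hat\beta_{mle})} - (1 - y_i) \frac{f_\varepsilon(x_i'\hat\beta_{mle})}{1 - F_\varepsilon(x_i'\hat\beta_{mle})} \right\} x_i
\end{aligned}$$

These are $k$ non-linear equations in $k$ unknowns. No analytic solution exists.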
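The last line of the derivation can be verified numerically. This sketch takes the probit case ($F_\varepsilon = \Phi$, $f_\varepsilon = \phi$), assumes NumPy, and uses hypothetical data and a hypothetical trial $\beta$; the analytic score should match a central finite-difference gradient of $\ln L$.

```python
import math
import numpy as np

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def loglik(beta, y, X):
    F = np.array([Phi(z) for z in X @ beta])
    return float(np.sum(y * np.log(F) + (1.0 - y) * np.log(1.0 - F)))

def score(beta, y, X):
    """S(beta) = sum_i { y_i f/F - (1 - y_i) f/(1 - F) } x_i, with F, f evaluated at x_i'beta."""
    z = X @ beta
    F = np.array([Phi(v) for v in z])
    f = np.array([phi(v) for v in z])
    weights = y * f / F - (1.0 - y) * f / (1.0 - F)
    return X.T @ weights

# Hypothetical data and trial value of beta
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.5], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
beta = np.array([-0.3, 0.7])

h = 1e-5
numeric = np.array([(loglik(beta + h * e, y, X) - loglik(beta - h * e, y, X)) / (2.0 * h)
                    for e in np.eye(2)])
print(score(beta, y, X), numeric)
```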

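The Newton-Raphson iteration is

$$\beta_{j+1} = \beta_j - H(\beta_j|y, X)^{-1} S(\beta_j|y, X), \qquad H(\beta|y, X) = \frac{\partial^2 \ln L(\beta|y, X)}{\partial \beta\, \partial \beta'}$$

where $j$ counts iterations.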

Remarks

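1. For the logit model, analytic derivatives are easy to determine:

$$S(\beta|y, X) = \sum_{i=1}^{n} \bigl(y_i - \Lambda(x_i'\beta)\bigr) x_i$$

$$H(\beta|y, X) = -\sum_{i=1}^{n} \Lambda(x_i'\beta)\bigl(1 - \Lambda(x_i'\beta)\bigr) x_i x_i'$$

The Hessian does not depend on $y_i$ and is negative definite for all $\beta$ (provided $X$ has full column rank). Hence $\ln L(\beta|y, X)$ is globally concave, so its maximum is unique whenever it exists; it fails to exist, for example, when some linear combination of the covariates perfectly separates the observations with $y_i = 1$ from those with $y_i = 0$.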

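2. For the probit model, analytic derivatives are also available:

$$S(\beta|y, X) = \sum_{i=1}^{n} m_i x_i$$

$$H(\beta|y, X) = -\sum_{i=1}^{n} m_i\bigl(m_i + x_i'\beta\bigr) x_i x_i'$$

where

$$m_i = \frac{q_i\, \phi(q_i \cdot x_i'\beta)}{\Phi(q_i \cdot x_i'\beta)}, \qquad q_i = 2y_i - 1$$

It can be shown that $H(\beta|y, X)$ is negative definite for all $\beta$ (provided $X$ has full column rank), so that $\ln L(\beta|y, X)$ is globally concave and its maximum, if it exists, is unique.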
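A sketch of the Newton-Raphson iteration with the analytic logit and probit derivatives given in remarks 1 and 2, assuming NumPy, a starting value $\beta = 0$, and hypothetical simulated data generated from a logit model. It takes plain Newton steps with no step-size control; that is usually adequate for these globally concave problems but is an assumption of this sketch, not a guarantee.

```python
import math
import numpy as np

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logit_derivs(beta, y, X):
    """Logit score sum (y_i - Lambda_i) x_i and Hessian -sum Lambda_i (1 - Lambda_i) x_i x_i'."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    S = X.T @ (y - p)
    H = -(X * (p * (1.0 - p))[:, None]).T @ X
    return S, H

def probit_derivs(beta, y, X):
    """Probit score sum m_i x_i and Hessian -sum m_i (m_i + x_i'beta) x_i x_i'."""
    z = X @ beta
    q = 2.0 * y - 1.0
    m = np.array([qi * phi(qi * zi) / Phi(qi * zi) for qi, zi in zip(q, z)])
    S = X.T @ m
    H = -(X * (m * (m + z))[:, None]).T @ X
    return S, H

def newton_raphson(derivs, y, X, tol=1e-10, max_iter=50):
    """Iterate beta_{j+1} = beta_j - H^{-1} S, starting from beta = 0."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        S, H = derivs(beta, y, X)
        step = np.linalg.solve(H, S)  # H^{-1} S without forming the inverse explicitly
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical simulated data from a logit latent variable model
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (0.5 + 1.0 * x + rng.logistic(size=n) > 0).astype(float)

beta_logit = newton_raphson(logit_derivs, y, X)
beta_probit = newton_raphson(probit_derivs, y, X)
print("logit :", beta_logit)
print("probit:", beta_probit)
```

Solving $H\,\text{step} = S$ with `np.linalg.solve` is used instead of inverting $H$, which is the usual numerically safer way to apply $H^{-1}S$.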

Measuring Goodness-of-fit in Binary Response Models

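1. McFadden's Likelihood Ratio Index ($R^2$)

$$LRI = R^2_{McFadden} = 1 - \frac{\ln L(\hat\beta_{mle}|y, X)}{\ln L(\text{all slopes} = 0|y, X)}$$

where the denominator is the maximized log-likelihood of the model that contains only an intercept. $LRI \approx 0$ if the model with all slopes $= 0$ fits the data as well as the model with estimated slopes. Because both log-likelihoods are negative and the unrestricted one is at least as large, $0 \le LRI < 1$.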

2. Prediction Table

A 2 × 2 table of hits and misses of the prediction rule

classify $\hat y_i = 1$ if $\widehat{\Pr}(y_i = 1|x_i) >$ cutoff probability, and $\hat y_i = 0$ otherwise
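Both goodness-of-fit measures can be computed from a fitted logit. A sketch assuming NumPy, hypothetical simulated data, and a cutoff probability of 0.5 (the cutoff is an assumption; the source leaves it unspecified). The intercept-only log-likelihood uses the fact that its MLE sets every fitted probability to $\bar y$.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(y, X, n_iter=25):
    """Logit MLE by Newton-Raphson with the analytic score and Hessian."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = logistic(X @ beta)
        S = X.T @ (y - p)
        H = -(X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta - np.linalg.solve(H, S)
    return beta

def loglik_logit(beta, y, X):
    p = logistic(X @ beta)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Hypothetical simulated data
rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (0.3 + 1.2 * x + rng.logistic(size=n) > 0).astype(float)

beta_hat = fit_logit(y, X)

# McFadden's LRI: the intercept-only MLE sets every fitted probability to ybar
ln_l_full = loglik_logit(beta_hat, y, X)
ybar = y.mean()
ln_l_null = n * (ybar * np.log(ybar) + (1.0 - ybar) * np.log(1.0 - ybar))
lri = 1.0 - ln_l_full / ln_l_null

# Prediction table: rows = actual y (0, 1), columns = predicted y (0, 1)
cutoff = 0.5  # assumed cutoff probability
y_pred = (logistic(X @ beta_hat) > cutoff).astype(float)
table = np.array([[np.sum((y == a) & (y_pred == b)) for b in (0.0, 1.0)] for a in (0.0, 1.0)])
hit_rate = np.trace(table) / n

print(f"LRI = {lri:.3f}")
print(table)
print(f"share correctly classified = {hit_rate:.3f}")
```

The diagonal of `table` counts hits and the off-diagonal cells count misses; changing `cutoff` trades one kind of miss for the other.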