Modeling Binary Outcomes: Logit and Probit Models
Motivating Example: Women’s labor force participation
$y_i = 1$ if married woman is in labor force, $y_i = 0$ otherwise
$x_i$ ($k \times 1$) = observed covariates
Linear probability model formulation
$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \dots, n$$
Note
$$y_i = 1 \Rightarrow \varepsilon_i = 1 - x_i'\beta, \qquad y_i = 0 \Rightarrow \varepsilon_i = -x_i'\beta$$
Interpretation of regression model
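$$E[y_i \mid x_i] = 1 \cdot \Pr(y_i = 1 \mid x_i) + 0 \cdot \Pr(y_i = 0 \mid x_i) = \Pr(y_i = 1 \mid x_i) = x_i'\beta$$
and
$$\frac{\partial E[y_i \mid x_i]}{\partial x_{ki}} = \frac{\partial \Pr(y_i = 1 \mid x_i)}{\partial x_{ki}} = \beta_k$$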
Note: $y_i \mid x_i$ is heteroskedastic:
$$\operatorname{var}(y_i \mid x_i) = \Pr(y_i = 1 \mid x_i)\,\Pr(y_i = 0 \mid x_i) = x_i'\beta\,(1 - x_i'\beta)$$
Problems with linear probability model
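1. $\varepsilon_i \mid x_i$ cannot be normally distributed, since it takes only the two values $1 - x_i'\beta$ and $-x_i'\beta$
2. Predicted probabilities can be less than zero or greater than one
3. Constant marginal effects, $\partial \Pr(y_i = 1 \mid x_i)/\partial x_i = \beta$, are often an unrealistic assumption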
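The short pure-Python sketch below illustrates problem 2 and the variance formula above. It fits the linear probability model by OLS with one regressor on made-up toy data. The data, the helper `ols_simple`, and the evaluation points are illustrative assumptions and are not part of the original example.

```python
# Hypothetical toy data (made up for illustration): one covariate x, binary y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def ols_simple(x, y):
    """OLS intercept and slope for the linear probability model y = b0 + b1*x + e."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


b0, b1 = ols_simple(xs, ys)

# Fitted "probabilities" x'b at a few covariate values, one outside the sample range.
for x_new in (-2.0, 0.0, 4.5, 9.0):
    p = b0 + b1 * x_new
    # The implied variance p(1 - p) is negative whenever p lies outside [0, 1].
    print(f"x={x_new:5.1f}  fitted={p:6.3f}  implied var={p * (1 - p):6.3f}")
```

With these toy numbers the fitted line exceeds one at the largest $x$. It also drops below zero when extrapolated to $x = -2$. At both points $x_i'\beta(1 - x_i'\beta)$ is negative and so cannot be a variance.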
Latent Variable Formulation
$y_i = 0, 1$ : observed discrete response
$y_i^* \in \mathbb{R}$ : unobserved (latent) continuous index
Idea: large values of $y_i^*$ generate $y_i = 1$, and small values generate $y_i = 0$
Example: Labor force participation
$y_i = 1$ if married woman is in labor force, $y_i = 0$ otherwise
$y_i^*$ = unobserved propensity to work based on utility of choice
$x_i$ = observed variables that influence utility
$y_i = 1$ if utility of work is greater than utility of leisure
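Assume $y_i^*$ has a linear model representation
$$y_i^* = x_i'\beta + \varepsilon_i, \quad i = 1, \dots, n$$
The relationship between $y_i$ and $y_i^*$ is
$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \quad \text{(normalized threshold)} \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$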
Then
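$$\Pr(y_i = 1 \mid x_i) = \Pr(y_i^* > 0 \mid x_i) = \Pr(x_i'\beta + \varepsilon_i > 0 \mid x_i) = \Pr(\varepsilon_i > -x_i'\beta \mid x_i) = \Pr(\varepsilon_i \le x_i'\beta \mid x_i) = F_\varepsilon(x_i'\beta)$$
provided $\varepsilon_i$ is independent of $x_i$ and has a continuous distribution $F_\varepsilon$ that is symmetric about zero. Similarly,
$$\Pr(y_i = 0 \mid x_i) = \Pr(y_i^* \le 0 \mid x_i) = \Pr(\varepsilon_i \le -x_i'\beta \mid x_i) = 1 - F_\varepsilon(x_i'\beta)$$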
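As a numerical check of $\Pr(y_i = 1 \mid x_i) = F_\varepsilon(x_i'\beta)$, the sketch below simulates the latent variable model at a single hypothetical index value. It assumes standard logistic errors. Any symmetric continuous $F_\varepsilon$ would do, but the logistic is convenient because its CDF inverts in closed form. All function names are illustrative.

```python
import math
import random


def logistic_cdf(z):
    """Lambda(z) = 1 / (1 + exp(-z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def draw_logistic(rng):
    """One standard logistic draw by inverting Lambda: eps = ln(u / (1 - u))."""
    u = rng.random()
    while u == 0.0:  # random() returns values in [0, 1); exclude 0 to keep the log finite
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulated_prob_one(index, n_draws, seed=0):
    """Share of y = 1 when y* = index + eps and y = 1 exactly when y* > 0."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(n_draws) if index + draw_logistic(rng) > 0)
    return ones / n_draws


index = 0.8  # hypothetical value of x_i' beta
print(simulated_prob_one(index, 200_000), logistic_cdf(index))
```

The simulated share of ones should agree with $F_\varepsilon(x_i'\beta)$ up to Monte Carlo error. That error is of order $1/\sqrt{200{,}000}$ here.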
Remarks
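1. The latent variable formulation provides a non-linear probability model for $\Pr(y_i = 1 \mid x_i)$.
2. By construction, $\Pr(y_i = 1 \mid x_i) \in (0, 1)$ because it is based on $F_\varepsilon$
3. $\lim_{x_i'\beta \to \infty} \Pr(y_i = 1 \mid x_i) = \lim_{x_i'\beta \to \infty} F_\varepsilon(x_i'\beta) = 1$ and $\lim_{x_i'\beta \to -\infty} \Pr(y_i = 1 \mid x_i) = 0$
4. Making the model operational requires specifying $F_\varepsilon$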
Probit Model
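Assume $\varepsilon_i \sim N(0, 1)$. Then
$$\Pr(y_i = 1 \mid x_i) = F_\varepsilon(x_i'\beta) = \Phi(x_i'\beta)$$
$$\Pr(y_i = 0 \mid x_i) = 1 - F_\varepsilon(x_i'\beta) = 1 - \Phi(x_i'\beta)$$
$$\Phi(z) = \int_{-\infty}^{z} \phi(x)\,dx, \qquad \phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2\right)$$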
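Note: If $\varepsilon_i \sim N(0, \sigma^2)$ then $\Pr(y_i = 1 \mid x_i) = \Phi(x_i'\beta/\sigma)$, so $\beta$ and $\sigma$ are not separately identified. Only the ratio $\beta/\sigma$ is identified. Hence, $\sigma^2 = 1$ is an identifying assumption for $\beta$.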
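Here is a minimal sketch of the probit probability and of the identification point, using only Python's standard library. `math.erf` gives $\Phi$ via $\Phi(z) = \tfrac12[1 + \operatorname{erf}(z/\sqrt{2})]$. The covariate and coefficient values are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Phi(z) written with the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_prob(x, beta, sigma=1.0):
    """Pr(y = 1 | x) when eps ~ N(0, sigma^2): Pr(eps <= x'beta) = Phi(x'beta / sigma)."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    return std_normal_cdf(index / sigma)


x = [1.0, 0.5]      # hypothetical covariates: constant and one regressor
beta = [-0.2, 0.9]  # hypothetical coefficients

p_unit = probit_prob(x, beta, sigma=1.0)
# Doubling both beta and sigma leaves every probability unchanged, so the data
# cannot distinguish (beta, sigma) from (2*beta, 2*sigma): only beta/sigma is identified.
p_scaled = probit_prob(x, [2.0 * b for b in beta], sigma=2.0)
print(p_unit, p_scaled)
```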
Logit Model
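Assume $\varepsilon_i \sim$ Logistic. Then
$$\Pr(y_i = 1 \mid x_i) = F_\varepsilon(x_i'\beta) = \Lambda(x_i'\beta)$$
$$\Pr(y_i = 0 \mid x_i) = 1 - F_\varepsilon(x_i'\beta) = 1 - \Lambda(x_i'\beta)$$
$$\Lambda(z) = \int_{-\infty}^{z} \lambda(x)\,dx = \frac{\exp(z)}{1 + \exp(z)} = \frac{1}{\exp(-z) + 1}$$
$$\lambda(z) = \frac{d}{dz}\Lambda(z) = \Lambda(z)\bigl(1 - \Lambda(z)\bigr) = \frac{\exp(z)}{\bigl(1 + \exp(z)\bigr)^2}$$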
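The logistic functions translate directly into code. The sketch below uses hypothetical helper names. It evaluates $\Lambda$ in a form that avoids overflow in `math.exp` for large $|z|$, then checks $\lambda(z) = \Lambda(z)(1 - \Lambda(z))$ against a finite-difference derivative.

```python
import math


def logistic_cdf(z):
    """Lambda(z) = exp(z) / (1 + exp(z)); the branch avoids overflow in exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_pdf(z):
    """lambda(z) = Lambda(z) * (1 - Lambda(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


# Compare lambda(z) with a central-difference derivative of Lambda(z).
h = 1e-5
for z in (-3.0, -0.5, 0.0, 1.2, 4.0):
    numeric = (logistic_cdf(z + h) - logistic_cdf(z - h)) / (2.0 * h)
    print(f"z={z:5.1f}  lambda={logistic_pdf(z):.8f}  finite diff={numeric:.8f}")
```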
Remarks
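1. If $\varepsilon_i \sim$ Logistic then
$$E[\varepsilon_i] = 0 \quad \text{and} \quad \operatorname{var}(\varepsilon_i) = \frac{\pi^2}{3} \approx 3.29$$
and the shape of the logistic distribution is similar to that of a Student’s t distribution with 7 degrees of freedom.
2. Logit and probit probabilities are essentially the same in the middle of the distribution (after allowing for the different scales of $\beta$ implied by the different error variances) but differ slightly in the tails of the distribution.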
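To see remark 2 numerically, the sketch below compares $\Phi(z)$ with $\Lambda(1.6\,z)$. The factor 1.6 is an assumption of this sketch. It is a commonly quoted rule of thumb for the ratio of logit to probit coefficients, reflecting the larger variance of the logistic error. Matching standard deviations would instead give $\pi/\sqrt{3} \approx 1.81$.

```python
import math


def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Lambda(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Assumed rule-of-thumb rescaling: a logit index is roughly 1.6 times the probit index.
SCALE = 1.6

for z in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    p_probit = std_normal_cdf(z)
    p_logit = logistic_cdf(SCALE * z)
    print(f"z={z:5.1f}  probit={p_probit:.4f}  logit={p_logit:.4f}  ratio={p_logit / p_probit:.2f}")
```

The absolute differences are small throughout. In the far tails, however, the ratio of the two probabilities departs noticeably from one, and that is where the choice between logit and probit can matter.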
Marginal Effects in Latent Variable Formulation
In the latent variable formulation
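$$\Pr(y_i = 1 \mid x_i) = F_\varepsilon(x_i'\beta)$$
Then
$$\frac{\partial \Pr(y_i = 1 \mid x_i)}{\partial x_{ki}} = \frac{\partial F_\varepsilon(x_i'\beta)}{\partial x_{ki}} = f_\varepsilon(x_i'\beta)\,\beta_k$$
where $f_\varepsilon$ is the pdf of $\varepsilon$. For the probit and logit models
$$\text{Probit:} \quad \frac{\partial \Pr(y_i = 1 \mid x_i)}{\partial x_{ki}} = \phi(x_i'\beta)\,\beta_k$$
$$\text{Logit:} \quad \frac{\partial \Pr(y_i = 1 \mid x_i)}{\partial x_{ki}} = \lambda(x_i'\beta)\,\beta_k$$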
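The two marginal-effect formulas can be sketched as follows. The covariate vector and coefficients are hypothetical, and `index_of` / `marginal_effects` are illustrative names. The first element of `x` is treated as a constant, so its "marginal effect" is not meaningful.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    """phi(z) = exp(-z^2 / 2) / sqrt(2 pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)


def index_of(x, beta):
    """The linear index x'beta."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def marginal_effects(x, beta, pdf):
    """dPr(y = 1 | x)/dx_k = f(x'beta) * beta_k for every k (same density factor for all k)."""
    dens = pdf(index_of(x, beta))
    return [dens * bk for bk in beta]


x = [1.0, 2.0, -0.5]     # hypothetical covariates; x[0] is the constant
beta = [0.3, -0.4, 1.1]  # hypothetical coefficients
me_probit = marginal_effects(x, beta, std_normal_pdf)
me_logit = marginal_effects(x, beta, logistic_pdf)
print("probit:", me_probit)
print("logit: ", me_logit)
```

Because the density factor $f_\varepsilon(x_i'\beta)$ is common to every $k$, ratios of marginal effects equal ratios of coefficients, $\beta_j/\beta_k$, in both models.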
Remarks
1. Marginal effects are non-linear functions of $x_i$ and $\beta$
2. The marginal effect of $x_{ki}$ depends on the value of $x_i = (x_{1i}, \dots, x_{ki})'$ (through the index $x_i'\beta$), the value of $\beta = (\beta_1, \dots, \beta_k)'$, and the value of $\beta_k$.
3. Because $f_\varepsilon(\cdot) > 0$, the sign of $\beta_k$ determines the sign of the marginal effect
4. Estimated standard errors for marginal effects require the delta method.
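For the delta method in remark 4, write the logit marginal effect as $g(\beta) = \lambda(x'\beta)\beta_k$. Its standard error is approximated by $\sqrt{\nabla g' \hat V \nabla g}$, where $\hat V$ is the estimated covariance matrix of $\hat\beta$. The sketch below uses $\lambda'(z) = \lambda(z)(1 - 2\Lambda(z))$, which follows from differentiating $\Lambda(z)(1 - \Lambda(z))$. The point `x`, the estimates `beta_hat`, and the matrix `cov_hat` are placeholder numbers, not results from any fitted model.

```python
import math


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)


def logit_me_and_gradient(x, beta, k):
    """Marginal effect g(beta) = lambda(x'beta) * beta_k and its gradient with respect to beta.

    Uses lambda'(z) = lambda(z) * (1 - 2 * Lambda(z)), so
    dg/dbeta_j = lambda'(x'beta) * x_j * beta_k + lambda(x'beta) * 1{j == k}.
    """
    z = sum(xj * bj for xj, bj in zip(x, beta))
    lam = logistic_pdf(z)
    dlam = lam * (1.0 - 2.0 * logistic_cdf(z))
    me = lam * beta[k]
    grad = [dlam * xj * beta[k] + (lam if j == k else 0.0) for j, xj in enumerate(x)]
    return me, grad


def delta_method_se(grad, cov):
    """Delta-method standard error sqrt(g' V g) for gradient g and covariance matrix V."""
    k = len(grad)
    quad = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    return math.sqrt(quad)


x = [1.0, 1.5]                            # hypothetical evaluation point (constant, regressor)
beta_hat = [-0.5, 0.8]                    # hypothetical logit estimates
cov_hat = [[0.04, -0.01], [-0.01, 0.02]]  # hypothetical estimated covariance of beta_hat
me, grad = logit_me_and_gradient(x, beta_hat, k=1)
se = delta_method_se(grad, cov_hat)
print(f"marginal effect = {me:.4f}, delta-method s.e. = {se:.4f}")
```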
Maximum Likelihood Estimation
Observe a random sample $\{(y_1, x_1), \dots, (y_n, x_n)\}$ and assume that it is generated from the latent variable formulation of the binary response model. Then $y_i \mid x_i$ is a Bernoulli random variable with conditional probabilities
$$\pi_i = \Pr(y_i = 1 \mid x_i) = F_\varepsilon(x_i'\beta)$$
$$1 - \pi_i = \Pr(y_i = 0 \mid x_i) = 1 - F_\varepsilon(x_i'\beta)$$
The likelihood and log-likelihood functions are
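$$L(\beta \mid y, X) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} = \prod_{i=1}^{n} F_\varepsilon(x_i'\beta)^{y_i} \bigl(1 - F_\varepsilon(x_i'\beta)\bigr)^{1 - y_i}$$
$$\ln L(\beta \mid y, X) = \sum_{i=1}^{n} \Bigl\{ y_i \ln F_\varepsilon(x_i'\beta) + (1 - y_i) \ln\bigl(1 - F_\varepsilon(x_i'\beta)\bigr) \Bigr\}$$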
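Below is a direct transcription of the log-likelihood, written for any CDF $F_\varepsilon$ so that the same function serves probit and logit. The toy data and names are hypothetical. At $\beta = 0$ every probability is $1/2$, so both models give $n \ln(1/2)$, which makes a convenient check.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(beta, ys, xs, cdf):
    """ln L = sum_i { y_i ln F(x_i'beta) + (1 - y_i) ln(1 - F(x_i'beta)) }.

    Only the term switched on by y_i is evaluated, so log(0) is never taken for
    the inactive term. Extreme indices can still underflow F to 0 or 1.
    """
    total = 0.0
    for y, x in zip(ys, xs):
        p = cdf(sum(xj * bj for xj, bj in zip(x, beta)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical toy data: each row of xs is (constant, regressor).
xs = [[1.0, v] for v in (-1.5, -0.7, 0.0, 0.4, 1.1, 2.0)]
ys = [0, 0, 1, 0, 1, 1]

for name, cdf in (("probit", std_normal_cdf), ("logit", logistic_cdf)):
    print(name, log_likelihood([0.0, 0.0], ys, xs, cdf), log_likelihood([0.1, 0.8], ys, xs, cdf))
```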
The FOCs that define the MLE are
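$$0 = \frac{\partial \ln L(\hat\beta_{mle} \mid y, X)}{\partial \beta} = S(\hat\beta_{mle} \mid y, X) = \sum_{i=1}^{n} S_i(\hat\beta_{mle} \mid y_i, x_i)$$
$$= \sum_{i=1}^{n} \left\{ y_i \frac{\partial \ln F_\varepsilon(x_i'\hat\beta_{mle})}{\partial \beta} + (1 - y_i) \frac{\partial \ln\bigl(1 - F_\varepsilon(x_i'\hat\beta_{mle})\bigr)}{\partial \beta} \right\}$$
$$= \sum_{i=1}^{n} \left\{ y_i \frac{f_\varepsilon(x_i'\hat\beta_{mle})}{F_\varepsilon(x_i'\hat\beta_{mle})} x_i + (1 - y_i) \frac{-f_\varepsilon(x_i'\hat\beta_{mle})}{1 - F_\varepsilon(x_i'\hat\beta_{mle})} x_i \right\}$$
$$= \sum_{i=1}^{n} \left\{ y_i \frac{f_\varepsilon(x_i'\hat\beta_{mle})}{F_\varepsilon(x_i'\hat\beta_{mle})} - (1 - y_i) \frac{f_\varepsilon(x_i'\hat\beta_{mle})}{1 - F_\varepsilon(x_i'\hat\beta_{mle})} \right\} x_i$$
These are $k$ non-linear equations in $k$ unknowns. In general no analytic solution exists, so they must be solved numerically.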
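The last line of the FOC derivation expresses the score in terms of $F_\varepsilon$ and $f_\varepsilon$ alone. The sketch below codes it for the probit case and compares it with a central-difference gradient of the log-likelihood at an arbitrary trial $\beta$. The data, the trial value, and the function names are hypothetical.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def index_of(x, beta):
    return sum(xj * bj for xj, bj in zip(x, beta))


def log_likelihood(beta, ys, xs, cdf):
    total = 0.0
    for y, x in zip(ys, xs):
        p = cdf(index_of(x, beta))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def score(beta, ys, xs, cdf, pdf):
    """S(beta) = sum_i { y_i f/F - (1 - y_i) f/(1 - F) } x_i, evaluated at x_i'beta."""
    s = [0.0] * len(beta)
    for y, x in zip(ys, xs):
        z = index_of(x, beta)
        F, f = cdf(z), pdf(z)
        weight = y * f / F - (1 - y) * f / (1.0 - F)
        for j, xj in enumerate(x):
            s[j] += weight * xj
    return s


def numerical_gradient(func, beta, h=1e-6):
    """Central-difference gradient, used only to check the analytic score."""
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        up[j] += h
        dn = list(beta)
        dn[j] -= h
        grad.append((func(up) - func(dn)) / (2.0 * h))
    return grad


xs = [[1.0, v] for v in (-1.5, -0.7, 0.0, 0.4, 1.1, 2.0)]  # hypothetical data
ys = [0, 0, 1, 0, 1, 1]
beta = [0.1, 0.5]  # arbitrary trial value, not an estimate

analytic = score(beta, ys, xs, std_normal_cdf, std_normal_pdf)
numeric = numerical_gradient(lambda b: log_likelihood(b, ys, xs, std_normal_cdf), beta)
print(analytic, numeric)
```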
The Newton-Raphson iteration is
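$$\beta_{n+1} = \beta_n - H(\beta_n \mid y, X)^{-1} S(\beta_n \mid y, X)$$
where
$$H(\beta \mid y, X) = \frac{\partial^2 \ln L(\beta \mid y, X)}{\partial \beta\, \partial \beta'}$$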
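Here is a one-dimensional illustration of the Newton-Raphson iteration, assuming an intercept-only probit model ($x_i = 1$ for all $i$). Let $n_1$ and $n_0$ denote the numbers of ones and zeros. The FOC $n_1 f/F = n_0 f/(1 - F)$ then reduces to $\Phi(\hat\beta) = \bar y$, which gives an independent check of the result. To keep the sketch generic, $H$ is approximated by a central difference of the score rather than derived analytically. The outcomes and names are hypothetical.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_score_intercept_only(b, ys):
    """Score of the probit log-likelihood when every x_i = 1, so beta is the scalar b."""
    F, f = std_normal_cdf(b), std_normal_pdf(b)
    return sum(y * f / F - (1 - y) * f / (1.0 - F) for y in ys)


def newton_raphson_1d(score, b0, tol=1e-10, max_iter=50, h=1e-6):
    """b_{n+1} = b_n - S(b_n) / H(b_n), with H approximated by a central difference of S."""
    b = b0
    for _ in range(max_iter):
        hess = (score(b + h) - score(b - h)) / (2.0 * h)
        step = score(b) / hess
        b -= step
        if abs(step) < tol:
            return b
    raise RuntimeError("Newton-Raphson did not converge")


ys = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical outcomes; sample mean 0.7
b_hat = newton_raphson_1d(lambda b: probit_score_intercept_only(b, ys), b0=0.0)
print(b_hat, std_normal_cdf(b_hat))  # Phi(b_hat) should reproduce the sample mean
```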
Remarks
1. For the logit model, analytic derivatives are easy to determine:
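$$S(\beta \mid y, X) = \sum_{i=1}^{n} \bigl(y_i - \Lambda(x_i'\beta)\bigr) x_i$$
$$H(\beta \mid y, X) = -\sum_{i=1}^{n} \Lambda(x_i'\beta)\bigl(1 - \Lambda(x_i'\beta)\bigr) x_i x_i'$$
The Hessian is independent of $y_i$ and is negative definite whenever $X$ has full column rank. Hence, $\ln L(\beta \mid y, X)$ is globally concave, and its maximum, when it exists, is unique. No finite maximum exists if, for example, the data are perfectly separated.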
2. For the probit model, analytic derivatives are also available:
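$$S(\beta \mid y, X) = \sum_{i=1}^{n} m_i x_i$$
$$H(\beta \mid y, X) = -\sum_{i=1}^{n} m_i \bigl(m_i + x_i'\beta\bigr) x_i x_i'$$
where
$$m_i = \frac{q_i\,\phi(q_i\, x_i'\beta)}{\Phi(q_i\, x_i'\beta)}, \qquad q_i = 2y_i - 1$$
It can be shown that $H(\beta \mid y, X)$ is negative definite for all $\beta$ (when $X$ has full column rank), so $\ln L(\beta \mid y, X)$ is globally concave and its maximum, when it exists, is unique.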
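The analytic score and Hessian in remarks 1 and 2 share the form $S = \sum_i s_i x_i$ and $H = -\sum_i w_i x_i x_i'$. For logit, $(s_i, w_i) = (y_i - \Lambda_i, \Lambda_i(1 - \Lambda_i))$; for probit, $(s_i, w_i) = (m_i, m_i(m_i + x_i'\beta))$. The sketch below exploits that shared form to run Newton-Raphson for both models, starting from $\beta = 0$. It uses a hypothetical data set with an intercept and one slope. The explicit $2 \times 2$ solve and all names are simplifications for illustration.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def accumulate(beta, ys, xs, weights):
    """Return S = sum_i s_i x_i and H = -sum_i w_i x_i x_i', where (s_i, w_i) = weights(y_i, x_i'beta)."""
    k = len(beta)
    score = [0.0] * k
    hess = [[0.0] * k for _ in range(k)]
    for y, x in zip(ys, xs):
        s_i, w_i = weights(y, sum(xj * bj for xj, bj in zip(x, beta)))
        for a in range(k):
            score[a] += s_i * x[a]
            for b in range(k):
                hess[a][b] -= w_i * x[a] * x[b]
    return score, hess


def logit_weights(y, z):
    """Logit: s_i = y_i - Lambda(z), w_i = Lambda(z)(1 - Lambda(z))."""
    p = logistic_cdf(z)
    return y - p, p * (1.0 - p)


def probit_weights(y, z):
    """Probit: q_i = 2y_i - 1, m_i = q_i phi(q_i z) / Phi(q_i z); s_i = m_i, w_i = m_i (m_i + z)."""
    q = 2 * y - 1
    m = q * std_normal_pdf(q * z) / std_normal_cdf(q * z)
    return m, m * (m + z)


def newton_raphson_2d(ys, xs, weights, tol=1e-10, max_iter=100):
    """beta_{n+1} = beta_n - H^{-1} S for an intercept plus one slope (explicit 2x2 solve)."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        (s0, s1), ((h00, h01), (h10, h11)) = accumulate(beta, ys, xs, weights)
        det = h00 * h11 - h01 * h10
        step = [(h11 * s0 - h01 * s1) / det, (h00 * s1 - h10 * s0) / det]  # H^{-1} S
        beta = [beta[0] - step[0], beta[1] - step[1]]
        if max(abs(d) for d in step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical toy data; the outcomes overlap in x, so the data are not perfectly separated.
xs = [[1.0, v] for v in (-2.0, -1.2, -0.5, 0.0, 0.3, 0.9, 1.4, 2.2)]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

beta_logit = newton_raphson_2d(ys, xs, logit_weights)
beta_probit = newton_raphson_2d(ys, xs, probit_weights)
print("logit: ", beta_logit)
print("probit:", beta_probit)
```

For larger $k$ one would solve $H\,d = S$ with a linear-algebra routine instead of the explicit $2 \times 2$ formula. Very large $|x_i'\beta|$ can also make $\Phi(q_i x_i'\beta)$ underflow to zero, which this sketch does not guard against.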
Measuring Goodness-of-fit in Binary Response Models
1. McFadden’s Likelihood Ratio Index ($R^2$)
$$LRI = R^2_{McFadden} = 1 - \frac{\ln L(\hat\beta_{mle} \mid y, X)}{\ln L(\text{all slopes} = 0 \mid y, X)}$$
$LRI \approx 0$ if the model with all slopes $= 0$ fits the data as well as the model with estimated slopes.
2. Prediction Table
$2 \times 2$ table of hits and misses of the prediction rule:
classify $y_i = 1$ if $\widehat{\Pr}(y_i = 1 \mid x_i) >$ cutoff probability, and $y_i = 0$ otherwise
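Both goodness-of-fit measures are simple to compute once fitted probabilities are available. In the sketch below the fitted probabilities are hypothetical placeholders, not output from an estimated model. The restricted log-likelihood uses the fact that, with all slopes set to zero, the fitted probability is $\bar y$ for every observation in both logit and probit. The default cutoff of 0.5 is an assumption.

```python
import math


def null_log_likelihood(ys):
    """ln L with all slopes = 0: the intercept-only fit sets every probability to ybar
    (for logit and probit alike), giving n * [ybar ln ybar + (1 - ybar) ln(1 - ybar)].
    Requires 0 < ybar < 1."""
    n = len(ys)
    ybar = sum(ys) / n
    return n * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))


def mcfadden_r2(loglik_model, loglik_null):
    """LRI = 1 - ln L(beta_mle) / ln L(all slopes = 0)."""
    return 1.0 - loglik_model / loglik_null


def prediction_table(ys, probs, cutoff=0.5):
    """Counts keyed by (actual y, predicted y); predict y = 1 when the fitted probability exceeds cutoff."""
    table = {(actual, pred): 0 for actual in (0, 1) for pred in (0, 1)}
    for y, p in zip(ys, probs):
        table[(y, 1 if p > cutoff else 0)] += 1
    return table


def hit_rate(table):
    """Share of observations on the diagonal of the 2 x 2 table (correct predictions)."""
    return (table[(0, 0)] + table[(1, 1)]) / sum(table.values())


ys = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.81, 0.35, 0.47, 0.66, 0.58, 0.12, 0.90, 0.44]  # hypothetical fitted probabilities

ll_null = null_log_likelihood(ys)
ll_model = sum(math.log(p) if y == 1 else math.log(1.0 - p) for y, p in zip(ys, probs))
print("McFadden R2:", mcfadden_r2(ll_model, ll_null))
table = prediction_table(ys, probs)
print(table, "hit rate:", hit_rate(table))
```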