
Classical Discrete Choice Theory

Classical regression model:

y = xβ + ε,   E(ε|x) = 0,   ε ∼ N(0, σ²I)

The model is untenable if y is a vector of zeros and ones,

y = (1, 1, 0, ..., 1, 0, ..., 0, 0, 1)′,

unless we invoke the linear probability model, which is discussed below and has some unusual properties.


To model discrete choices, need to think of the ingredients that give rise to choices.

For example, suppose we want to forecast demand for a new good. We observe consumption data on old goods x1, ..., xI. (Each good could represent a transportation mode, for example, or an occupation choice.)

Assume people choose the good that yields the highest utility. When we have a new good, we need a way of putting it on a common basis with the old goods.

The earliest literature on discrete choice was developed in psychometrics, where researchers were concerned with modeling choice behavior (Thurstone).

These are also models of counterfactual utilities.


Two dominant modeling approaches

(i) Luce Model (1953) ⇐⇒ McFadden conditional logit model
(ii) Thurstone-Quandt Model (1929, 1930s) (multivariate probit/normal model)
Other approaches: GEV models

(i) Luce-McFadden Model
- widely used in economics
- easy to compute
- identifiability of parameters understood
- very restrictive substitution possibilities among goods
- restrictive heterogeneity
- imposes arbitrary preference shocks

(ii) Quandt-Thurstone Model
- very general substitution possibilities
- allows for more general forms of heterogeneity
- more difficult to compute
- identifiability less easily established
- does not necessarily rely on preference shocks


Luce Model / McFadden Conditional Logit Model

References: Manski and McFadden, Chapter 5; Yellott paper (Manski-McFadden can be downloaded from McFadden's website)

Notation:
X : universe of objects of choice
S : universe of attributes of persons
B : feasible choice set (x ∈ B ⊆ X)
Behavior rule mapping attributes into choices: h

h(B, S) = x

We might assume that there is a distribution of choice rules. h might be random because
(a) in observation we lose some information governing choices (unobserved characteristics of choice and person)
(b) there can be random variation in choices due to unmeasured psychological factors

Define P(x|S,B) = Pr{h ∈ H : h(S,B) = x},

the probability that an individual drawn randomly from the population with attributes S and alternative set B chooses x.


Luce Axioms - Maintain some restrictions on P(x|S,B) and derive implications for the functional form of P.

Axiom #1: Independence of Irrelevant Alternatives

For x, y ∈ B, s ∈ S,

P(x|s,{x,y}) / P(y|s,{x,y}) = P(x|s,B) / P(y|s,B),   B = larger choice set

Example: Suppose the choice is a career decision and the individual is choosing to be
- an economist (E)
- a fireman (F)
- a policeman (P)

Pr(E|s,{E,F}) / Pr(F|s,{E,F}) = Pr(E|s,{E,F,P}) / Pr(F|s,{E,F,P})

←− one would think that introducing a 3rd alternative might increase the ratio

Another example: (Red bus-Blue bus)
Choices:
take car   C
red bus    RB


blue bus   BB

Pr(C|s,{C,RB}) / Pr(RB|s,{C,RB}) = Pr(C|s,{C,RB,BB}) / Pr(RB|s,{C,RB,BB})

Axiom #2

Pr(y|s,B) > 0 ∀y ∈ B (i.e. eliminate 0 probability choices)


Implications of the above axioms

Define Pxy = P(x|s,{x,y}). Assume Pxx = 1/2.

P(y|s,B) = (Pyx / Pxy) P(x|s,B)   by the IIA axiom

Σ_{y∈B} P(y|s,B) = 1   =⇒   P(x|s,B) = 1 / Σ_{y∈B} (Pyx / Pxy)


Furthermore,

P(y|s,B) = (Pyz / Pzy) P(z|s,B)

P(x|s,B) = (Pxz / Pzx) P(z|s,B)

P(y|s,B) = (Pyx / Pxy) P(x|s,B)

=⇒   Pyx / Pxy = P(y|s,B) / P(x|s,B) = (Pyz / Pzy) / (Pxz / Pzx)

Define v(s, x, z) = ln(Pxz / Pzx) and v(s, y, z) = ln(Pyz / Pzy)

=⇒   Pyx / Pxy = e^{v(s,y,z)} / e^{v(s,x,z)}


Axiom #3: Separability Assumption

v(s, x, z) = v(s, x) − v(s, z)   ←− v(s, z) can be interpreted as a utility indicator of representative tastes

Then

P(x|s,B) = 1 / Σ_{y∈B} (Pyx / Pxy) = 1 / Σ_{y∈B} [ e^{v(s,y)−v(s,z)} / e^{v(s,x)−v(s,z)} ]

P(x|s,B) = e^{v(s,x)} / Σ_{y∈B} e^{v(s,y)}   ←− get the logistic form from the Luce Axioms

Now link the model to familiar models in economics. Marschak (1959) established the link between the Luce Model and random utility models (RUMs).
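As a quick numerical illustration of the formula P(x|s,B) = e^{v(s,x)} / Σ_{y∈B} e^{v(s,y)} and of the IIA property it implies, here is a minimal sketch in Python; the utility values for the career example are invented for illustration.

```python
import numpy as np

def luce_probs(v):
    """Choice probabilities P(x|B) = exp(v_x) / sum_y exp(v_y) over a choice set B."""
    v = np.asarray(v, dtype=float)
    ev = np.exp(v - v.max())          # subtract the max for numerical stability
    return ev / ev.sum()

# hypothetical utility indices for economist (E), fireman (F), policeman (P)
v = {"E": 1.0, "F": 0.5, "P": -0.2}

p_small = luce_probs([v["E"], v["F"]])            # choice set {E, F}
p_large = luce_probs([v["E"], v["F"], v["P"]])    # choice set {E, F, P}

# IIA: the ratio P(E)/P(F) is the same in both choice sets
print(p_small[0] / p_small[1])   # exp(1.0 - 0.5) ≈ 1.6487
print(p_large[0] / p_large[1])   # same ratio, unaffected by adding P
```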


Random Utility Models

Thurstone (1927, 1930s)

Assume utility from choosing alternative j is

uj = v(s, xj) + ε(s, xj)

where v(s, xj) is a nonstochastic function and ε(s, xj) is stochastic, reflecting idiosyncratic tastes.

Pr(j is maximal in set B) = Pr(u(s, xj) ≥ u(s, xl)  ∀ l ≠ j)
                          = Pr(v(s, xj) + ε(s, xj) ≥ v(s, xl) + ε(s, xl)  ∀ l ≠ j)
                          = Pr(v(s, xj) − v(s, xl) ≥ ε(s, xl) − ε(s, xj)  ∀ l ≠ j)

Specify a cdf F(ε1, ..., εN). Then

Pr(vj − vl ≥ εl − εj  ∀ l ≠ j) = Pr(vj − vl + εj ≥ εl  ∀ l ≠ j)

 = ∫_{−∞}^{∞} Fj(vj − v1 + εj, ..., vj − v_{j−1} + εj, εj, vj − v_{j+1} + εj, ..., vj − vJ + εj) dεj

where Fj denotes the partial derivative of F with respect to its jth argument.


If ε is iid, then

F(ε1, ..., εn) = Π_{i=1}^{n} Fi(εi)

so the above expression equals

∫_{−∞}^{∞} [ Π_{i=1, i≠j}^{n} Fi(vj − vi + εj) ] fj(εj) dεj


Binary Example (N = 2)

P(1 | s,B) = ∫_{−∞}^{∞} ∫_{−∞}^{v1−v2+ε1} f(ε1, ε2) dε2 dε1

If ε1, ε2 are normal, then ε1 − ε2 is normal, so Pr(v1 − v2 ≥ ε2 − ε1) is a normal cdf (probit).
If ε1, ε2 are Weibull, then ε1 − ε2 is logistic:

ε ∼ Weibull   =⇒   Pr(ε < c) = e^{−e^{−c+α}}

(Also called double exponential or Type I extreme value.)

For ε Weibull,

Pr(v1 + ε1 > v2 + ε2) = Pr(v1 − v2 > ε2 − ε1) = Ω(v1 − v2) = e^{v1−v2} / (1 + e^{v1−v2}) = e^{v1} / (e^{v1} + e^{v2})
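The claim that Type I extreme value (Gumbel/Weibull) errors deliver the logistic form can be checked by simulation. This is a small sketch (not from the notes): it draws ε1, ε2 as standard Gumbel variates and compares the simulated choice frequency with e^{v1}/(e^{v1}+e^{v2}); the systematic utilities are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
v1, v2 = 0.7, -0.3          # hypothetical systematic utilities
R = 200_000

# standard Type I extreme value (Gumbel) draws: F(c) = exp(-exp(-c))
eps1 = rng.gumbel(size=R)
eps2 = rng.gumbel(size=R)

sim_prob = np.mean(v1 + eps1 > v2 + eps2)          # frequency that 1 beats 2
logit_prob = np.exp(v1) / (np.exp(v1) + np.exp(v2))

# the two numbers agree up to Monte Carlo error, because eps2 - eps1 is logistic
print(sim_prob, logit_prob)
```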


Result: Assuming that the errors follow a Weibull distribution yields the same logit model derived from the Luce Axioms. (This link was established by Marschak (1959).)

It turns out that the Weibull is sufficient but not necessary; some other distributions for ε generate a logit. Yellott (1977) showed that if we require invariance under uniform expansions of the choice set, then only the double exponential gives the logit.

Example: Suppose the choice set is coffee, tea, milk. Invariance requires that the probabilities stay the same if we double the choice set (i.e. 2 coffees, 2 teas, 2 milks).


Some Important Properties of the Weibull

- Developed 1928 (Fisher & Tippett showed it is one of 3 possible limiting distributions for the maximum of a sequence of random variables)

(i) Closed under maximization (i.e. the max of n Weibulls is a Weibull):

Pr(max_i εi ≤ c) = Π_i e^{−e^{−c+αi}} = e^{−Σ_i e^{−c} e^{αi}} = e^{−e^{−c} Σ_i e^{αi}} = e^{−e^{−c + ln Σ_i e^{αi}}}

(ii) The difference between two Weibulls is logistic.

Under the Luce axioms (or a R.U.M. with the Weibull assumption),

Pr(j | s,B) = e^{v(s,xj)} / Σ_{l=1}^{N} e^{v(s,xl)}

Now reconsider the forecasting problem.
Let xj = set of characteristics associated with choice j.
Usually it is assumed that v(s, xj) = θ(s)′xj; the dependence of θ on s reflects the fact that individuals differ in their evaluation of characteristics.


Get

Pr(j | s,B) = e^{θ(s)′xj} / Σ_{l=1}^{N} e^{θ(s)′xl}

Solve by MLE:

max_{θ(s)}  Π_i  [ e^{θ(s)′x1} / Σ_{l=1}^{N} e^{θ(s)′xl} ]^{D1i} · · · [ e^{θ(s)′xN} / Σ_{l=1}^{N} e^{θ(s)′xl} ]^{DNi}

where Dji = 1 if person i chooses alternative j and 0 otherwise.

If a new good has the same characteristics, get probabilities by

B′ = B ∪ {N + 1}

P(N + 1 | B′, s) = e^{θ(s)′x_{N+1}} / Σ_{l=1}^{N+1} e^{θ(s)′xl}
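A minimal sketch of this MLE, assuming a common coefficient vector θ and alternative-level characteristics xj (all data below are simulated purely for illustration); the log-likelihood is Σ_i Σ_j Dji log P(j|s,B).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, J, K = 500, 4, 3                      # persons, alternatives, characteristics
X = rng.normal(size=(J, K))              # characteristics x_j of each alternative
theta_true = np.array([1.0, -0.5, 0.8])

# simulate choices from the conditional logit model (Gumbel errors)
u = X @ theta_true + rng.gumbel(size=(N, J))
choice = u.argmax(axis=1)                # D_ji = 1 for j = choice[i]

def neg_loglik(theta):
    v = X @ theta                        # systematic utilities v_j = theta'x_j
    a = v - v.max()                      # stabilized log-sum-exp
    logp = a - np.log(np.exp(a).sum())   # log P(j|B), identical across persons here
    return -logp[choice].sum()

res = minimize(neg_loglik, np.zeros(K), method="BFGS")
print(res.x)                             # should be close to theta_true
```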


Debreu (1960) criticism of the Luce Model (Red Bus-Blue Bus Problem)

Suppose the (N + 1)th alternative is identical to the first:

Pr(choose 1 or N + 1 | s, B′) = 2 e^{θ(s)′x_{N+1}} / Σ_{l=1}^{N+1} e^{θ(s)′xl}

=⇒ Introduction of an identical good changes the probability of riding a bus.
- not an attractive result
- comes from the need to make an iid assumption on the new alternative


Some Alternative Assumptions

(i) Could let vi = ln(θ(s)′xi). Then

Pr(j | s,B) = θ(s)′xj / Σ_{l=1}^{N} θ(s)′xl

If we also imposed Σ_{l=1}^{N} θ(s)′xl = 1, we would get the linear probability model, but this could violate IIA.

(ii) Could consider a model of the form

Pr(j | s,B) = e^{θj(s)′xj} / Σ_{l=1}^{N} e^{θl(s)′xl}

but here we have lost our forecasting ability (cannot predict demand for a new good).

(iii) Universal Logit Model

Pr(i | s, x1, ..., xN) = e^{ϕi(x1,...,xN)β(s)} / Σ_{l=1}^{N} e^{ϕl(x1,...,xN)β(s)}

Here we lose IIA and forecasting (Bernstein Polynomial Expansion).


Criteria for a Good PCS (Probabilistic Choice System)

Goal: We want a probabilistic choice model that
1. has a flexible functional form
2. is computationally practical
3. allows for flexibility in representing substitution patterns among choices
4. is consistent with a random utility model (RUM) =⇒ has a structural interpretation


How do you verify that a candidate PCS is consistent with a RUM?

Either (a) start with a R.U.M.

ui = v(s, xi) + ε(s, xi)

and solve the integral for

Pr(ui > ul, ∀ l ≠ i) = Pr(i = argmax_l (vl + εl))

or (b) start with a candidate PCS and verify that it is consistent with a R.U.M. (easier). (McFadden provides sufficient conditions; see the discussion of the Daly-Zachary-Williams theorem.)


Notes on McFadden Chapter / Integrating Discrete and Continuous Choice (See Heckman, 1974a, 1974b) (Change of Notation)

Notation:
I : enumeration of discrete alternatives
x : divisible goods
w : attributes of discrete choices
r : price of x
qi : price of discrete good i
y : income (budget constraint rx ≤ y − qi)
ũ : x̃ × ω × I → [0, 1], utility

Define the indirect utility function

v(y − qi, r, wi, i; ũ) = max_x ũ(x, wi, i | rx ≤ y − qi)

(maximize out over the continuous goods so we are left with the discrete choice)


Assumptions

(IU1) We assume v has the usual properties of an indirect utility function (continuous, twice differentiable, homogeneous of degree 0 in (y − q, r), quasi-concave in r, dv/d(y − q) > 0).

(IU2) Then we get

x(y − q, r, wi, i; ũ) = −(∂v/∂r) / (∂v/∂y)   (Roy's Identity)

For the discrete alternatives, we also get something like Roy's Identity:

δj = D(j | B, s; ũ) = −(∂v*/∂qj) / (∂v*/∂y)

where

v*(y − qB, r, wB, B; ũ) = max_{i∈B} v(y − qi, r, wi, i; ũ)

δj = 1 if j ∈ B and vj ≥ vk ∀ k, and 0 otherwise.


If the IU assumptions are satisfied, we can write the relationship between the probability of choosing j and the utility function as

P(j | B, s) = E_{u|s} D(j | B, s; ũ)

We seek sufficient conditions on preferences u such that we can integrate out over characteristics and come up with probabilities. McFadden shows that v takes the AIRUM form

v(y − qi, r, wi, i; ũ) = [ y − qi − α(r, wi, i, ũ) ] / β(r)

where y > qi + α, and α, β are homogeneous of degree one with respect to r.


Then

v̄ = E_{u|s} max_{i∈B} v(y − qi, r, wi, i; ũ)
  = (1/β(r)) [ y + E_{u|s} max_{i∈B} (−qi − α(r, wi, i; ũ)) ]

and

P(j) = E_{u|s} D(j | B, s) = −(∂v̄/∂qj) / (∂v̄/∂y)

v̄ is a utility function yielding the PCS.

=⇒ The demand distribution can be analyzed as if it were generated by a population with common tastes, with each representative consumer having fractional consumption rates for the discrete alternatives.


Let

G̃(qB, r, wB, B, s) = E_{u|s} max_{i∈B} [ −qi − α(r, wB, i; ũ) ]   (∗)   (social surplus function)

Then

P(j | B, s) = −∂G̃(qB, r, wB, B, s) / ∂qj   (∗∗)

under the SS conditions given in McFadden's chapter, i.e. the choice probabilities are given by the gradient of the SS function.


Daly-Zachary-Williams Theorem
Daly-Zachary (1976), Williams (1977)

- provide a set of conditions that makes it easy to derive a PCS from a RUM within a class of models (GEV - generalized extreme value models)

Define G : G(Y1, . . . , YJ).

If G satisfies the following:
(1) nonnegative, defined on Y1, . . . , YJ ≥ 0
(2) homogeneous of degree one in its arguments
(3) lim_{Yi→∞} G(Y1, . . . , Yi, . . . , YJ) → ∞, ∀ i = 1, . . . , J
(4) ∂^k G / (∂Y1 · · · ∂Yk) is nonnegative if k is odd and nonpositive if k is even


Then for a R.U.M. with ui = vi + εi and

F(ε1, . . . , εJ) = exp{ −G(e^{−ε1}, . . . , e^{−εJ}) }

(this cdf has Weibull marginals but allows for more dependence among the ε's), the PCS is given by

Pi = ∂ lnG/∂vi = e^{vi} Gi(e^{v1}, . . . , e^{vJ}) / G(e^{v1}, . . . , e^{vJ})

where Gi denotes the partial derivative of G with respect to its ith argument.

Note: McFadden shows that under certain conditions on the form of the indirect utility function (it satisfies the AIRUM form), the DZW result can be seen as a form of Roy's identity.


Let's apply this result.

(i) Multinomial logit model (MNL)

cdf: F(ε1, . . . , εJ) = e^{−e^{−ε1}} · · · e^{−e^{−εJ}}   ←− product of iid Weibulls
                      = e^{−Σ_{j=1}^{J} e^{−εj}}

Can verify that G(e^{v1}, . . . , e^{vJ}) = Σ_{j=1}^{J} e^{vj} satisfies the DZW conditions.

P(j) = ∂ lnG/∂vj = e^{vj} / Σ_{l=1}^{J} e^{vl} = MNL model


Another GEV model

(ii) Nested logit model (addresses to a limited extent the IIA criticism)

Let

G(e^{v1}, . . . , e^{vJ}) = Σ_{m=1}^{M} a_m [ Σ_{i∈Bm} e^{vi/(1−σm)} ]^{1−σm}

(σm is like an elasticity of substitution)

Idea - divide the goods into branches; first choose a branch, then a good within the branch.

Example tree: car vs. bus, with the bus branch splitting into red bus and blue bus.


(will allow for correlation between errors (this is the role of σ))

Bm ⊆ {1, . . . , J},   ∪_{m=1}^{M} Bm = B

Bm is a single branch; need not have all choices on all branches.


Note: if σ = 0, get usual MNL form

Calculate equation for

pi = ∂ lnG/∂vi
   = ∂ ln[ Σ_{m=1}^{M} a_m ( Σ_{j∈Bm} e^{vj/(1−σm)} )^{1−σm} ] / ∂vi
   = Σ_{m : i∈Bm} a_m e^{vi/(1−σm)} [ Σ_{j∈Bm} e^{vj/(1−σm)} ]^{−σm} / Σ_{m=1}^{M} a_m [ Σ_{j∈Bm} e^{vj/(1−σm)} ]^{1−σm}
   = Σ_{m=1}^{M} P(i | Bm) P(Bm)


where

P(i | Bm) = e^{vi/(1−σm)} / Σ_{j∈Bm} e^{vj/(1−σm)}   if i ∈ Bm, 0 otherwise

P(Bm) = a_m [ Σ_{j∈Bm} e^{vj/(1−σm)} ]^{1−σm} / Σ_{m′=1}^{M} a_{m′} [ Σ_{j∈Bm′} e^{vj/(1−σm′)} ]^{1−σm′}

Note: If P(Bm) = 1, we get the logit form.

The nested logit requires that the analyst make choices about the nesting structure.
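The conditional and marginal probabilities above are easy to evaluate directly. The following sketch (utilities, nests, and σ values invented for illustration) computes P(i|Bm), P(Bm), and their product for a two-nest example, and shows how the bus alternatives behave as near-perfect substitutes when σ is close to one.

```python
import numpy as np

def nested_logit_probs(v, nests, sigma, a=None):
    """P(i) = P(i|Bm) * P(Bm) for a one-level nested logit.
    v: dict alternative -> systematic utility; nests: list of lists of alternatives;
    sigma: dissimilarity parameters sigma_m; a: optional nest weights a_m."""
    a = a if a is not None else [1.0] * len(nests)
    inclusive = []                      # a_m * [sum_{i in Bm} exp(v_i/(1-sigma_m))]^(1-sigma_m)
    for Bm, s, am in zip(nests, sigma, a):
        inclusive.append(am * sum(np.exp(v[i] / (1 - s)) for i in Bm) ** (1 - s))
    denom = sum(inclusive)
    probs = {}
    for Bm, s, inc in zip(nests, sigma, inclusive):
        P_Bm = inc / denom              # marginal probability of the nest
        within = {i: np.exp(v[i] / (1 - s)) for i in Bm}
        tot = sum(within.values())
        for i in Bm:
            probs[i] = (within[i] / tot) * P_Bm   # P(i|Bm) * P(Bm)
    return probs

v = {"car": 0.0, "red bus": -0.1, "blue bus": -0.1}
# one nest for car, one for the two buses; sigma near 1 makes the buses near-identical
print(nested_logit_probs(v, [["car"], ["red bus", "blue bus"]], sigma=[0.0, 0.95]))
```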


How does the nested logit solve the red bus/blue bus problem?

Suppose

G = Y1 + [ Y2^{1/(1−σ)} + Y3^{1/(1−σ)} ]^{1−σ},   Yi = e^{vi}


P(1 | 123) = ∂ lnG/∂v1 = e^{v1} / ( e^{v1} + [ e^{v2/(1−σ)} + e^{v3/(1−σ)} ]^{1−σ} )

P(2 | 123) = ∂ lnG/∂v2 = e^{v2/(1−σ)} [ e^{v2/(1−σ)} + e^{v3/(1−σ)} ]^{−σ} / ( e^{v1} + [ e^{v2/(1−σ)} + e^{v3/(1−σ)} ]^{1−σ} )

As v3 → −∞:

P(1 | 123) = e^{v1} / (e^{v1} + e^{v2})   (get logistic)

As v1 → −∞:

P(2 | 123) = e^{v2/(1−σ)} / ( e^{v2/(1−σ)} + e^{v3/(1−σ)} )   (get logistic)
           = 1/2 when v2 = v3


What role does σ play? σ is the degree-of-substitutability parameter.

Recall

F(ε1, ε2, ε3) = exp{ −G(e^{−ε1}, e^{−ε2}, e^{−ε3}) }

Here

σ = cov(ε2, ε3) / √(var ε2 · var ε3) = correlation coefficient

Thus we require −1 ≤ σ ≤ 1, but it turns out we also need to require σ > 0 for the DZW conditions to be satisfied. This is unfortunate because it does not allow the ε's to be negatively correlated.

Can show that

lim_{σ→1} P(1 | 123) = e^{v1} / ( e^{v1} + max(e^{v2}, e^{v3}) )   (L'Hôpital's Rule)


If v2 = v3, then

P(2 | 123) = e^{v2/(1−σ)} [ 2 e^{v2/(1−σ)} ]^{−σ} / ( e^{v1} + [ 2 e^{v2/(1−σ)} ]^{1−σ} )
           = 2^{−σ} e^{v2} / ( e^{v1} + 2^{1−σ} e^{v2} )

lim_{σ→1} P(2 | 123) = 2^{−1} e^{v2} / ( e^{v1} + e^{v2} )   (= 1/4 when v1 = v2)

←− introducing a 3rd identical alternative cuts the probability of choosing 2 in half

=⇒ solves the red-bus/blue-bus problem: the probability is cut in half with two identical alternatives.


Tree: car vs. bus, with the bus branch splitting into red bus and blue bus.

σ is a measure of similarity between the red and blue bus. When σ is close to one, the conditional choice probability selects with high probability the alternative with the larger v.

Remark: We can expand the nested logit to accommodate multiple levels, e.g.

G = Σ_{q=1}^{Q} a_q { Σ_{m∈Qq} a_m [ Σ_{i∈Bm} Y_i^{1/(1−σm)} ]^{1−σm} }   (3 levels)


Example: Two Choices
1. Neighborhood (m)
2. Transportation mode (t)

P(m): choice of neighborhood
P(t | Bm): probability of choosing the tth mode, given neighborhood m. Not all modes are available in all neighborhoods.

P_{m,t} = e^{v(m,t)/(1−σm)} [ Σ_{t=1}^{Tm} e^{v(m,t)/(1−σm)} ]^{−σm} / Σ_{j=1}^{M} [ Σ_{t=1}^{Tj} e^{v(j,t)/(1−σj)} ]^{1−σj}

P_{t|m} = e^{v(m,t)/(1−σm)} / Σ_{t=1}^{Tm} e^{v(m,t)/(1−σm)}

P_m = [ Σ_{t=1}^{Tm} e^{v(m,t)/(1−σm)} ]^{1−σm} / Σ_{j=1}^{M} [ Σ_{t=1}^{Tj} e^{v(j,t)/(1−σj)} ]^{1−σj} = P(Bm)


A standard type of utility function that people might use:

v(m, t) = z′t γ + x′mt β + y′m α

where z_t is transportation mode characteristics, x_mt is interactions, and y_m is neighborhood characteristics. Then

P_{t|m} = e^{(z′tγ + x′mtβ)/(1−σm)} / Σ_{t=1}^{Tm} e^{(z′tγ + x′mtβ)/(1−σm)}

P_m = e^{y′mα} [ Σ_{t=1}^{Tm} e^{(z′tγ + x′mtβ)/(1−σm)} ]^{1−σm} / Σ_{j=1}^{M} e^{y′jα} [ Σ_{t=1}^{Tj} e^{(z′tγ + x′jtβ)/(1−σj)} ]^{1−σj}

Estimation (in two steps) (see Amemiya, Chapter 9): let

I_m = Σ_{t=1}^{Tm} e^{(z′tγ + x′mtβ)/(1−σm)}


1. Within each neighborhood, get estimates of γ/(1−σm) and β/(1−σm) by logit.

2. Form Î_m. Then estimate by MLE

e^{y′mα + (1−σm) ln Îm} / Σ_{j=1}^{M} e^{y′jα + (1−σj) ln Îj}

to get α̂, σ̂m.

Assume σm = σj ∀ j, m, or at least impose some restrictions across multiple neighborhoods.

Note: Î_m is an estimated regressor (Durbin problem).

Need to correct standard errors
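A schematic of the two-step procedure, with placeholder data and placeholder first-stage estimates (everything below is invented for illustration; in practice the first stage would itself be a within-neighborhood logit MLE, and the second-stage standard errors need the correction noted above):

```python
import numpy as np

rng = np.random.default_rng(2)
M, Tm, K = 3, 4, 2                         # neighborhoods, modes per nest, mode covariates

z = rng.normal(size=(M, Tm, K))            # mode characteristics (and interactions) by nest
y = rng.normal(size=(M,))                  # neighborhood characteristic y_m (scalar here)

# Step 1 (placeholder): within-nest logit delivers the scaled coefficients gamma/(1-sigma_m).
gamma_scaled = np.array([0.8, -0.4])       # pretend first-stage estimates

# Estimated inclusive values I_m = sum_t exp((z_t'gamma)/(1-sigma_m)),
# which with the scaled coefficients is just sum_t exp(z_t' gamma_scaled).
I_hat = np.exp(z @ gamma_scaled).sum(axis=1)

# Step 2: upper-level choice probabilities P_m for candidate (alpha, sigma);
# these would be plugged into an MLE over observed neighborhood choices.
def upper_probs(alpha, sigma):
    num = np.exp(y * alpha + (1 - sigma) * np.log(I_hat))
    return num / num.sum()

print(upper_probs(alpha=0.5, sigma=0.3))
```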


Multinomial Probit Models

Also known as:
1. Thurstone Model V
2. Thurstone-Quandt Model
3. Developed by Domencich-McFadden (1978) (on reading list)

ui = vi + ηi,   i = 1, ..., J
vi = Ziβ   (linear-in-parameters form)
ui = Ziβ + ηi

MNL: (i) β fixed; (ii) ηi iid.
MNP: (i) β a random coefficient, β ∼ N(β̄, Σβ); (ii) β independent of η, η ∼ (0, Ση) (allows general forms of correlation between errors).


ui = Ziβ̄ + Zi(β − β̄) + ηi

where (β − β̄) = ε and Zi(β − β̄) + ηi is a composite heteroskedastic error term. (β random = taste heterogeneity; ηi can be interpreted as unobserved attributes of goods.)

⇒ The main advantage of MNP over MNL is that it allows for a general error covariance structure.

Note: To make computation easier, users sometimes set Σβ = 0 (fixed coefficient version). Allowing for β random
- permits random taste variation
- allows for the possibility that different persons value characteristics differently


Parametric Identification of Multinomial Discrete Choice Models

Karsten T. Hansen and James J. Heckman

June 4, 2002

1 Introduction

We consider identification of parametric multinomial discrete choice models. This endeavor is – perhaps surprisingly – often analytically much harder than proving nonparametric identification of choice models.

For ease of exposition we begin by considering a multinomial probit model with three choices.

Let latent utility be described by

U1 = Z1β + ε1,
U2 = Z2β + ε2,
U3 = Z3β + ε3,   (1)

where Z ≡ (Z1, Z2, Z3) are observable covariates assumed independent of ε ≡ (ε1, ε2, ε3) and

ε ∼ N(0, Σ).   (2)

Consider first the probability of choosing alternative 1. Since only relative utilities


are relevant, rewrite (1) in difference form:

U1.2 = Z1.2β + ε1.2,
U1.3 = Z1.3β + ε1.3,   (3)

where the notation rule is Xi.j = Xi − Xj.

Note that

V(ε1.2, ε1.3) ≡ Σ1 = [ σ11 + σ22 − 2σ12          σ23 + σ11 − σ12 − σ13
                       σ23 + σ11 − σ12 − σ13     σ11 + σ33 − 2σ13 ].   (4)

Let D be the choice index equal to j if alternative j is chosen. Then

D = 1 ⇐⇒ Z1.2β + ε1.2 > 0, Z1.3β + ε1.3 > 0.

Since these inequalities are invariant to scale changes we can normalize by any positive constant. Dividing through by

ω12 ≡ √(σ11 + σ22 − 2σ12)

in the first inequality and

ω13 ≡ √(σ11 + σ33 − 2σ13)

in the second:

D = 1 ⇐⇒ Z1.2β/ω12 + ε1.2/ω12 > 0, Z1.3β/ω13 + ε1.3/ω13 > 0,

and

V(ε1.2/ω12, ε1.3/ω13) ≡ R1 = [ 1    ρ1
                               ρ1   1 ],   (5)

where

ρ1 ≡ (σ23 + σ11 − σ12 − σ13) / [ (σ11 + σ22 − 2σ12)^{1/2} (σ11 + σ33 − 2σ13)^{1/2} ].   (6)

Then we can write the probability of choosing alternative 1 as

Pr(D = 1|Z) = ∫_{−∞}^{Z1.2β/ω12} ∫_{−∞}^{Z1.3β/ω13} φ(τ|0, R1) dτ1 dτ2,   (7)


where φ(τ|0, R1) is the bivariate normal density with mean zero and covariance matrix R1.

By a parallel analysis we find

Pr(D = 2|Z) = ∫_{−∞}^{Z2.1β/ω12} ∫_{−∞}^{Z2.3β/ω23} φ(τ|0, R2) dτ1 dτ2,   (8)

where ω23 ≡ √(σ22 + σ33 − 2σ23) and

R2 = [ 1    ρ2
       ρ2   1 ],   (9)

and

ρ2 ≡ (σ22 − σ23 − σ12 + σ13) / [ (σ11 + σ22 − 2σ12)^{1/2} (σ22 + σ33 − 2σ23)^{1/2} ].   (10)
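Equations (4)–(10) are straightforward to evaluate numerically. As a check on the algebra, the following sketch (with an arbitrary Σ, β, and Z, not taken from the note) builds ω12, ω13, and ρ1 and computes Pr(D = 1|Z) from the bivariate normal cdf, comparing it with a brute-force Monte Carlo estimate from the level equations.

```python
import numpy as np
from scipy.stats import multivariate_normal

beta = 0.8
Z = np.array([0.4, -0.2, 0.1])                      # Z1, Z2, Z3 (scalar covariates here)
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.5, 0.4],
                  [0.2, 0.4, 2.0]])                 # arbitrary positive definite example

s = Sigma
w12 = np.sqrt(s[0, 0] + s[1, 1] - 2 * s[0, 1])      # sd of eps_1.2
w13 = np.sqrt(s[0, 0] + s[2, 2] - 2 * s[0, 2])      # sd of eps_1.3
rho1 = (s[1, 2] + s[0, 0] - s[0, 1] - s[0, 2]) / (w12 * w13)   # equation (6)

upper = [(Z[0] - Z[1]) * beta / w12, (Z[0] - Z[2]) * beta / w13]
R1 = [[1.0, rho1], [rho1, 1.0]]
p1 = multivariate_normal(mean=[0.0, 0.0], cov=R1).cdf(upper)   # equation (7)

# Monte Carlo check of the same probability from the level equations (1)
rng = np.random.default_rng(3)
eps = rng.multivariate_normal(np.zeros(3), Sigma, size=400_000)
U = Z * beta + eps
print(p1, np.mean(U.argmax(axis=1) == 0))           # the two numbers should agree closely
```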

2 Identification analysis

We now consider identification of the parameters of interest β and Σ. We know the functions Pr(D = 1|Z) and Pr(D = 2|Z). Since Pr(D = 3|Z) = 1 − Pr(D = 1|Z) − Pr(D = 2|Z) this is all the information we have.

2.1 “Identification at infinity”

We first consider identification using an identification at infinity type argument. Consider Pr(D = 1|Z). If we can condition on very small values of Z3β we can shut down alternative three as an option. In particular,

lim_{Z3β→−∞} Pr(D = 1|Z) = ∫_{−∞}^{Z1.2β/ω12} φ(τ|0, 1) dτ = Φ(Z1.2β/ω12).   (11)

Similarly, if we can condition on very small values of Z2β we can shut down alternative two:

lim_{Z2β→−∞} Pr(D = 1|Z) = ∫_{−∞}^{Z1.3β/ω13} φ(τ|0, 1) dτ = Φ(Z1.3β/ω13).   (12)


By a similar argument we find

lim_{Z1β→−∞} Pr(D = 2|Z) = ∫_{−∞}^{Z2.3β/ω23} φ(τ|0, 1) dτ = Φ(Z2.3β/ω23),   (13)

lim_{Z3β→−∞} Pr(D = 2|Z) = ∫_{−∞}^{Z2.1β/ω12} φ(τ|0, 1) dτ = Φ(Z2.1β/ω12).   (14)

Using the classical results for identification of binary probit models we can therefore identify

a = β/ω12,   b = β/ω13,   c = β/ω23.   (15)

We now know the functions

P1(ρ1, Z) = ∫_{−∞}^{Z1.2 a} ∫_{−∞}^{Z1.3 b} φ(τ|0, R(ρ1)) dτ,

P2(ρ2, Z) = ∫_{−∞}^{Z2.1 a} ∫_{−∞}^{Z2.3 c} φ(τ|0, R(ρ2)) dτ.   (16)

Identification of ρ1 and ρ2 follows by appeal to the following lemma:

Lemma 1. Fix some Z ∈ ΩZ. Then the mappings

Hj(ρj) ≡ Pj(ρj, Z) : [−1, 1] → [P̲j, P̄j] ⊆ [0, 1],   j = 1, 2,

are onto.

Proof. Consider the first mapping H1(ρ1). We have

∂H1/∂ρ1 = ∫_{−∞}^{Z1.2 a} ∫_{−∞}^{Z1.3 b} ∂φ(τ|0, R(ρ1))/∂ρ1 dτ.

From Plackett (1954) we have the result¹

∂φ(τ|0, R(ρ1))/∂ρ1 = ∂²φ/(∂τ1 ∂τ2).   (17)

¹ Plackett, R.L. (1954), “A Reduction Formula for Normal Multivariate Integrals,” Biometrika, Vol. 41, Issue 3/4, pp. 351–360.


Then

∂H1/∂ρ1 = ∫_{−∞}^{Z1.2 a} ∫_{−∞}^{Z1.3 b} ∂φ(τ|0, R(ρ1))/∂ρ1 dτ
        = ∫_{−∞}^{Z1.2 a} ∫_{−∞}^{Z1.3 b} ∂²φ/(∂τ1 ∂τ2) dτ
        = φ(Z1.2 a, Z1.3 b) > 0.

Hence ∂H1/∂ρ1 > 0 and in addition we can show that P̲1 ≤ H1(ρ1) ≤ P̄1 for some P̲1 < P̄1. So the mapping H1(ρ1) is onto. A similar argument can be given for H2(ρ2).

So the maps H1 and H2 are onto and we can solve hj = Hj(ρj) for a unique ρ1 and ρ2.

So we can identify a, b, c and ρ1, ρ2.

2.2 Specific normalizations

2.2.1 Normalization I

This normalization has β free and

Σ = [ 1    0     0
      0    σ22   0
      0    0     σ33 ].

We know the left hand side of the following equation system:

a = β/√(1 + σ22)
b = β/√(1 + σ33)
c = β/√(σ22 + σ33)
ρ1 = 1 / [ √(1 + σ22) √(1 + σ33) ]
ρ2 = σ22 / [ √(1 + σ22) √(σ22 + σ33) ]   (18)


Then since

b/a = [ (1 + σ22)/(1 + σ33) ]^{1/2},

we have

ρ1 = (b/a) · 1/(1 + σ22).

So we can solve for σ22,

σ22 = b/(a ρ1) − 1,   ρ1 ≠ 0.   (19)

Knowing σ22 we can solve immediately for β and σ33.

With only three free parameters (β, σ22, σ33) and five equations this normalization seems strongly overidentified. Yet removing the constraint σ11 = 1 one finds that (β, σ11, σ22, σ33) are not identified.

2.2.2 Normalization II

This normalization has β free and

Σ = [ 1     σ12   0
      σ12   σ22   0
      0     0     0 ].

We know the left hand side of the following equation system:

a = β/√(1 + σ22 − 2σ12)
b = β
c = β/√σ22
ρ1 = (1 − σ12)/√(1 + σ22 − 2σ12)
ρ2 = (σ22 − σ12) / [ √(1 + σ22 − 2σ12) √σ22 ]   (20)

Clearly β is immediately identified from b and we then get σ22 from c. Finally, we can solve for σ12 from the relation

(a/c)² = σ22 / (1 + σ22 − 2σ12).


2.2.3 Normalization III

This normalization has β free and

Σ = [ 1      α1α2   α1α3
      α1α2   1      α2α3
      α1α3   α2α3   1 ].

The restrictions on Σ are the restrictions implied by a one-factor model for ε:

εj = αj f + uj,

with f ∼ N(0, 1) and V[εj] = 1.

We know the left hand side of the following equation system:

a = β / [ (2)(1 − α1α2) ]^{1/2}
b = β / [ (2)(1 − α1α3) ]^{1/2}
c = β / [ (2)(1 − α2α3) ]^{1/2}
ρ1 = (1/2)(1 + α2α3 − α1α2 − α1α3) / [ (1 − α1α2)^{1/2} (1 − α1α3)^{1/2} ]
ρ2 = (1/2)(1 + α1α3 − α1α2 − α2α3) / [ (1 − α1α2)^{1/2} (1 − α2α3)^{1/2} ]   (21)

Then

(a/c)² = (1 − α2α3)/(1 − α1α2),

(b/c)² = (1 − α2α3)/(1 − α1α3).

Solving for α1α2 and α1α3 we get

α1α2 = [ (a/c)² + α2α3 − 1 ] / (a/c)² ≡ k1 + k2 α2α3,   (22)

α1α3 = [ (b/c)² + α2α3 − 1 ] / (b/c)² ≡ m1 + m2 α2α3,   (23)

where k1 = 1 − (c/a)², k2 = (c/a)², and m1 = 1 − (c/b)², m2 = (c/b)².


Substituting this in the expression for ρ2 we find

2ρ2 = [ 1 + m1 − k1 + (m2 − k2 − 1)α2α3 ] / [ (1 − k1 − k2α2α3)^{1/2} (1 − α2α3)^{1/2} ].

Everything but α2α3 is known in this equation. This can be rewritten as the quadratic equation

(α2α3)² + q1(α2α3) + q2 = 0,

with q1 and q2 appropriately defined as functions of k1, k2, m1, m2.

Assuming away complex solutions we can thus determine α2α3 up to sign. From (22) and (23) we can then determine α1α2 and α1α3. Identification of β follows immediately.


Problem of Identification and Normalization in the MNP Model

Reference: David Bunch (1979), "Estimability in the Multinomial Probit Model," Transportation Research; Domencich and McFadden.

Let

Zβ = (Z1β, ..., ZJβ)′,   η = (η1, ..., ηJ)′,   J alternatives, K characteristics
β random, β ∼ N(β̄, Σβ)

Pr(alternative j selected) = Pr(uj > ui ∀ i ≠ j)

= ∫_{uj=−∞}^{∞} ∫_{ui=−∞}^{uj} · · · ∫_{uJ=−∞}^{uj} Φ(u | Vµ, Σµ) duJ · · · dui duj

where Φ(u | Vµ, Σµ) is the pdf (the J-dimensional MVN density with mean Vµ and covariance Σµ).

Note: Unlike the MNL, there is no closed form expression for the integral. The integrals are often evaluated using simulation methods (we will work an example).


How many parameters are there?

β : K parameters
Σβ : K × K symmetric matrix, (K² − K)/2 + K = K(K + 1)/2 parameters
Ση : J(J + 1)/2 parameters

Note: When a person chooses j, all we know is relative utility, not absolute utility. This suggests that not all parameters in the model will be identified. We will require normalizations.


Digression on Identification

What does it mean to say a parameter is not identified in a model? ⇒ A model with one parameterization is observationally equivalent to another model with a different parameterization.

Example: Binary Probit Model (fixed β)

Pr(D = 1 | Z) = Pr(v1 + ε1 > v2 + ε2)
             = Pr(x1β + ε1 > x2β + ε2)
             = Pr((x1 − x2)β > ε2 − ε1)
             = Pr((x1 − x2)β/σ > (ε2 − ε1)/σ)
             = Φ(xβ/σ),   x = x1 − x2

Φ(xβ/σ) is observationally equivalent to Φ(xβ*/σ*) for β/σ = β*/σ*.


β is not separately identified relative to σ, but the ratio is identified:

Φ(xβ/σ) = Φ(xβ*/σ*)

Φ^{−1} · Φ(xβ/σ) = Φ^{−1} · Φ(xβ*/σ*)   ⇒   β/σ = β*/σ*

The set {b : b = β · δ, δ any positive scalar} is identified (we say β is identified up to scale, and the sign is identified).


Identification in the MNP model

Pr(j selected | Vµ, Σµ) = Pr(ui − uj < 0 ∀ i ≠ j)

Define the (J − 1) × J contrast matrix Δj, whose rows pick out the differences with alternative j (each row has a 1 in one of the other columns and a −1 in the jth column), so that

Δj u = (u1 − uj, ..., uJ − uj)′   (the jth difference omitted)

Pr(j selected | Vµ, Σµ) = Pr(Δj u < 0 | Vµ, Σµ) = Φ(0 | VZ, ΣZ)

where
(i) VZ is the mean of Δj u: VZ = Δj Z β̄
(ii) ΣZ is the variance of Δj u: ΣZ = Δj Z Σβ Z′ Δ′j + Δj Ση Δ′j
(iii) VZ is (J − 1) × 1
(iv) ΣZ is (J − 1) × (J − 1)

⇒ We reduce the dimension of the integral by one.


This says that all of the information exists in the contrasts. We can't identify all the components because we only observe the contrasts. Now define ∆̄j as ∆j with the Jth column removed, and choose J as the reference alternative with corresponding ∆J. Then one can verify that

∆j = ∆̄j · ∆J

For example, with three goods (j = 2, reference alternative J = 3):

[ 1 −1 ]   [ 1 0 −1 ]   [ 1 −1 0 ]
[ 0 −1 ] × [ 0 1 −1 ] = [ 0 −1 1 ]

  ∆̄2          ∆3           ∆2
(3rd column  (reference   (3rd column
 removed)     alt.)        included)

Therefore, we can write

VZ = ∆j Z β̄

ΣZ = ∆j Z Σβ Z′ ∆′j + ∆̄j ∆J Ση ∆′J ∆̄′j

where CJ = ∆J Ση ∆′J is (J − 1) × (J − 1) and has [(J − 1)² − (J − 1)]/2 + (J − 1) = J(J − 1)/2 parameters in total. Since the original model can always be expressed in terms of a model with (β̄, Σβ, CJ), it follows that some of the parameters in the original model are not identified.
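The decomposition ∆j = ∆̄j ∆J is easy to verify numerically; a small sketch for J = 3, j = 2, reference alternative J = 3 (the Ση below is an arbitrary example):

```python
import numpy as np

# Contrast matrix vs. alternative 2: rows give (u1 - u2, u3 - u2)
D2 = np.array([[1, -1, 0],
               [0, -1, 1]])
# Contrast matrix vs. the reference alternative 3: rows give (u1 - u3, u2 - u3)
D3 = np.array([[1, 0, -1],
               [0, 1, -1]])
# D2 with its 3rd column removed
D2_bar = np.array([[1, -1],
                   [0, -1]])

print(np.array_equal(D2, D2_bar @ D3))      # True: Delta_j = Delta_bar_j * Delta_J

# Implied covariance of the contrasts: C2 = D2 Sigma D2' = D2_bar (D3 Sigma D3') D2_bar'
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.3, 0.4],
                  [0.1, 0.4, 1.6]])
print(np.allclose(D2 @ Sigma @ D2.T, D2_bar @ (D3 @ Sigma @ D3.T) @ D2_bar.T))  # True
```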


How many parameters are not identified?

Original model:

K + K(K + 1)/2 + J(J + 1)/2

Now:

K + K(K + 1)/2 + J(J − 1)/2

[J² + J − (J² − J)]/2 = J parameters not identified

It turns out that one additional parameter is not identified. Total: J + 1.

Note: Evaluation of Φ(0 | k VZ, k² ΣZ), k > 0, gives the same result as evaluating Φ(0 | VZ, ΣZ); we can eliminate one more parameter by a suitable choice of k.


Illustration

J = 3,   Ση = [ σ11 σ12 σ13
                σ21 σ22 σ23
                σ31 σ32 σ33 ]

C2 = ∆2 Ση ∆′2 = [ 1 −1 0 ] Ση [ 1 −1 0 ]′
                 [ 0 −1 1 ]    [ 0 −1 1 ]

   = [ σ11 − 2σ21 + σ22          σ31 − σ21 − σ32 + σ22
       σ31 − σ21 − σ32 + σ22     σ33 − 2σ32 + σ22 ]

C2 = ∆̄2 ∆3 Ση ∆′3 ∆̄′2
   = [ 1 −1 ] [ σ11 − 2σ31 + σ33          σ21 − σ31 − σ32 + σ33 ] [  1  0 ]
     [ 0 −1 ] [ σ21 − σ31 − σ32 + σ33     σ22 − 2σ32 + σ33       ] [ −1 −1 ]


Normalization Approach of Albright, Lerman, and Manski (1978)

Note: Need J + 1 restrictions on the VCV matrix.

- Fix J parameters by setting the last row and last column of Ση to 0.
- Fix the scale by constraining the diagonal elements of Ση so that the trace of Ση equals the variance of a standard Weibull. (To compare estimates with MNL and independent probit.)


How do we solve the forecasting problem? Suppose that we have 2 goods and add a 3rd.

Pr(1 chosen) = Pr(u1 − u2 ≥ 0) = Pr((Z1 − Z2)β̄ ≥ ω2 − ω1)

where

ω1 = Z1(β − β̄) + η1,   ω2 = Z2(β − β̄) + η2

Pr(1 chosen) = ∫_{−∞}^{(Z1−Z2)β̄ / [σ11 + σ22 − 2σ12 + (Z2−Z1)Σβ(Z2−Z1)′]^{1/2}} (1/√(2π)) e^{−t²/2} dt

Now add a 3rd good:

u3 = Z3β̄ + Z3(β − β̄) + η3.


Problem: We don't know the correlation of η3 with the other errors.

Suppose that η3 = 0 (i.e. only preference heterogeneity). Then

Pr(1 chosen) = ∫_{−∞}^{a} ∫_{−∞}^{b} B.V.N. dt1 dt2

where

a = (Z1 − Z2)β̄ / [ σ11 + σ22 − 2σ12 + (Z2 − Z1)Σβ(Z2 − Z1)′ ]^{1/2}

and

b = (Z1 − Z3)β̄ / [ σ11 + (Z3 − Z1)Σβ(Z3 − Z1)′ ]^{1/2}

We could also solve the forecasting problem if we make an assumption like η2 = η3. We solve the red-bus/blue-bus problem if η2 = η3 = 0 and Z3 = Z2:

Pr(1 chosen) = Pr(u1 − u2 ≥ 0, u1 − u3 ≥ 0)

but u1 − u2 ≥ 0 and u1 − u3 ≥ 0 are then the same event. ∴ adding the third choice does not change the probability of choosing 1.


Estimation Methods for MNP Models

Models tend to be difficult to estimate because of high dimensional integrals. Integrals need to be evaluated at each stage of estimating the likelihood. Simulation provides a means of estimating Pij = Pr(i chooses j).


Variety of Simulation Methods
- simulated method of moments
- method of simulated scores
- simulated maximum likelihood

References:
- Lerman and Manski (1981), Structural Analysis (online at McFadden's website)
- McFadden (1989), Econometrica
- Ruud (1982), Journal of Econometrics
- Hajivassiliou and McFadden (1990)
- Hajivassiliou and Ruud (Ch. 20), Handbook of Econometrics
- Stern (1992), Econometrica
- Stern (1997), survey in JEL
- Bayesian MCMC (Chib et al., on reading list)


Early Simulation Method: Crude Frequency Method

Model:

uj = Zjβ + ηj with β fixed, ηj ∼ N(0, Ω), J choices
Pij = prob that i chooses j
Yij = 1 if i chooses j, 0 else

ℒ = Π_{i=1}^{N} Π_{j=1}^{J} (Pij)^{Yij}

log ℒ = Σ_{i=1}^{N} Σ_{j=1}^{J} Yij log Pij


Simulation Algorithm

(i) For given β, Ω generate Monte Carlo draws (R of them):

u_{rj},   j = 1...J,   r = 1...R

(ii) Let P̂k = (1/R) Σ_{r=1}^{R} 1(u_{rk} = max{u_{r1}, ..., u_{rJ}}), where P̂k is a frequency simulator of Pr(k chosen; β, Ω).

(iii) Maximize Σ_{i=1}^{N} log P̂_{ik_i} over alternative values of β, Ω, where k_i is the alternative chosen by person i.
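A minimal sketch of the crude frequency simulator for a single decision maker (the β, Ω, and Z below are arbitrary examples), which also shows why small probabilities are simulated badly with few draws:

```python
import numpy as np

rng = np.random.default_rng(4)
J, K, R = 3, 2, 200                       # alternatives, characteristics, simulation draws

Z = rng.normal(size=(J, K))               # characteristics of each alternative
beta = np.array([0.5, -1.0])
Omega = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])       # covariance of the eta's

# (i) draw eta^r and form utilities u^r_j = Z_j beta + eta^r_j
eta = rng.multivariate_normal(np.zeros(J), Omega, size=R)
u = Z @ beta + eta

# (ii) frequency simulator P_hat_k = (1/R) * #{r : u^r_k is the max}
P_hat = np.bincount(u.argmax(axis=1), minlength=J) / R
print(P_hat)                              # can easily contain zeros when R is small,
                                          # which breaks log P_hat in the likelihood
```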


Note: Lerman and Manski found that this procedure performs poorly and requires a large number of draws, particularly when P is close to 0 or 1:

var( (1/R) Σ_{r=1}^{R} 1(·) ) = (1/R²) Σ_{r=1}^{R} var 1(·),   with var 1(·) evaluated at the true values

McFadden (1989) provided some key insights into how to improve the simulation method. He showed that simulation is viable even for a small number of draws provided that:
(a) an unbiased simulator is used
(b) the functions to be simulated appear linearly in the conditions defining the estimator
(c) the same set of random draws is used to simulate the model at different parameter values

Note: Condition (b) is violated for the crude frequency method, which had log P̂ik.


Simulated Method of Moments (McFadden, Econometrica, 1989)

uij = Zijβ = Zij β̄ + Zij εi,   β = β̄ + εi   (earlier model with only preference heterogeneity)

Pij(γ) = P(i chooses j | wi, γ)   (wi are regressors)

Define Yij = 1 if i chooses j, Yij = 0 otherwise.

log ℒ = (1/N0) Σ_{i=1}^{N} Σ_{j=1}^{J} Yij ln Pij(γ),   N0 = NJ

∂ log ℒ/∂γ = (1/N0) Σ_{i=1}^{N} Σ_{j=1}^{J} Yij [ (∂Pij/∂γ) / Pij(γ) ] = 0   (1)

√N0 (γ̂_MLE − γ0) ∼ N(0, I_f^{−1})

I_f = (1/N0) Σ_{i=1}^{N} [ Σ_{j=1}^{J} Yij (∂Pij/∂γ)/Pij(γ) ] [ Σ_{j=1}^{J} Yij (∂Pij/∂γ)/Pij(γ) ]′   (outer product of the score vector)


Now use the fact that Σ_{j=1}^{J} Pij(γ) = 1:

⇒ Σ_{j=1}^{J} ∂Pij/∂γ = 0   ⇒ Σ_{j=1}^{J} [ (∂Pij/∂γ) / Pij ] Pij = 0

Rewrite (1) as

(1/N0) Σ_{i=1}^{N} Σ_{j=1}^{J} (Yij − Pij) (∂Pij/∂γ) / Pij = 0

Note: E(Yij) = Pij.

Letting εij = Yij − Pij and Z̃ij = (∂Pij/∂γ)/Pij, we have

(1/N0) Σ_{i=1}^{N} Σ_{j=1}^{J} εij Z̃ij = 0,

like a moment condition using Z̃ij as the instrument, but so far Pij is still a (J − 1)-dimensional integral.


Simulation Algorithm

Model: uij = Zij β̄ + Zij εi,   J choices, K characteristics
uij : 1 × 1,  Zij : 1 × K,  β̄ : K × 1,  εi : K × 1

Rewrite as

ui = Zi β̄ + Zi Γ ei,   where ΓΓ′ = Σε (Cholesky decomposition), ei ∼ N(0, I_K), εi = Γ ei
ui : J × 1,  Zi : J × K,  β̄ : K × 1,  Γ : K × K,  ei : K × 1

Step (i). Generate ei for each i such that the ei are iid across persons and distributed N(0, I_K). In total, generate N (sample size) · K (vector length) · R (number of Monte Carlo draws) random numbers.

Step (ii). Fix the matrix Γ and obtain ηij = Zij Γ ei, where Zij : 1 × K, Γ : K × K, ei : K × 1. Form the vector (Zi1Γei, Zi2Γei, ..., ZiJΓei)′ for each person.


Step (iii). Fix β̄ and generate

uij = Zij β̄ + ηij   ∀ i.

Step (iv). Find the relative frequency with which the ith person chooses alternative j across the Monte Carlo draws:

P̂ij(γ) = (1/R) Σ_{r=1}^{R} 1(u^r_{ij} > u^r_{im} ∀ m ≠ j)

where P̂ij(γ) is the simulator for Pij(γ). Stack to get P̂i(γ).


Step (v). To get P̂i(γ) for different values of γ, repeat steps (ii) through (iv) using the same random variables ei generated in step (i).

Step (vi). Define

wij = [ ∂P̂ij(γ)/∂γ ] / P̂ij

and ŵij as the corresponding stacked vector. A simulator for wij can be obtained by a numerical derivative,

∂P̂ij(γ)/∂γm = [ P̂ij(γ + h·lm) − P̂ij(γ − h·lm) ] / (2h)

where m indexes the elements of γ and lm is a vector with a 1 in the mth place.
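Steps (i)–(vi) can be sketched compactly. The key implementation detail is that the same draws ei are reused whenever γ changes (here γ stacks β̄ and the Cholesky factor Γ; all data are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
N, J, K, R = 200, 3, 2, 100
Z = rng.normal(size=(N, J, K))                 # Z_ij, person-by-alternative characteristics
e = rng.normal(size=(N, R, K))                 # step (i): common random numbers, drawn once

def simulate_P(beta, Gamma):
    """Steps (ii)-(iv): frequency simulator P_hat_ij using the fixed draws e."""
    eps = e @ Gamma.T                          # eps_i^r = Gamma e_i^r, shape (N, R, K)
    u = Z @ beta                               # systematic part, shape (N, J)
    u = u[:, None, :] + np.einsum('njk,nrk->nrj', Z, eps)   # u_ij^r, shape (N, R, J)
    counts = np.array([np.bincount(u[i].argmax(axis=1), minlength=J) for i in range(N)])
    return counts / R                          # P_hat, shape (N, J)

beta0 = np.array([0.5, -0.5])
Gamma0 = np.array([[1.0, 0.0], [0.3, 0.8]])

P0 = simulate_P(beta0, Gamma0)                 # step (v): re-call with the same e for new gamma

# step (vi): central numerical derivative of P_hat w.r.t. one element of beta;
# the indicator makes this derivative steppy - the non-smoothness discussed below.
h = 1e-2
dP_dbeta0 = (simulate_P(beta0 + np.array([h, 0]), Gamma0)
             - simulate_P(beta0 - np.array([h, 0]), Gamma0)) / (2 * h)
print(P0.shape, dP_dbeta0.shape)
```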


Solve the Moment Condition

Apply the Gauss-Newton method; iterate to convergence:

γ1 = γ0 + [ (1/N) Σ_i ŵi(γ0) (yi − P̂i(γ0)) (yi − P̂i(γ0))′ ŵi(γ0)′ ]^{−1} · (1/N) Σ_{i=1}^{N} ŵi(γ0) (yi − P̂i(γ0))

Digression on Gauss-Newton

Suppose the problem is

S = min_β (1/N) Σ_{i=1}^{N} [yi − fi(β)]²   (nonlinear least squares).

fi(β) = fi(β1) + (∂fi/∂β)|_{β1} (β − β1) + ...   by Taylor expansion around the initial guess β1 (higher-order terms ignored)


Substitution gives

min_β (1/N) Σ_{i=1}^{N} [ yi − fi(β1) − (∂fi/∂β)|_{β1} (β − β1) ]²

Solve for β2 to get

β2 = β1 + [ Σ_{i=1}^{N} (∂fi/∂β)|_{β1} (∂fi/∂β)′|_{β1} ]^{−1} (∂S/∂β)|_{β1}

Repeat until convergence (problem if matrix is singular).
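A compact sketch of the Gauss-Newton iteration for nonlinear least squares, using an exponential-regression example invented for illustration (the update below is the standard Gauss-Newton step, which coincides up to the scaling of S with the expression above):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 2, size=200)
beta_true = np.array([1.5, -0.7])
y = beta_true[0] * np.exp(beta_true[1] * x) + 0.05 * rng.normal(size=200)

def f(beta):                    # f_i(beta) = b0 * exp(b1 * x_i)
    return beta[0] * np.exp(beta[1] * x)

def jac(beta):                  # df_i/dbeta, shape (n, 2)
    return np.column_stack([np.exp(beta[1] * x), beta[0] * x * np.exp(beta[1] * x)])

beta = np.array([1.0, 0.0])     # initial guess beta_1
for _ in range(50):
    r = y - f(beta)             # residuals
    J = jac(beta)
    step = np.linalg.solve(J.T @ J, J.T @ r)    # [sum f_b f_b']^{-1} sum f_b (y - f)
    beta = beta + step
    if np.linalg.norm(step) < 1e-10:            # iterate to convergence
        break

print(beta)                     # close to beta_true
```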


Disadvantages of Simulation Methods

1. P̂_ij is not smooth in the parameters because of the indicator function (this causes difficulties in deriving the asymptotic distribution; one needs the methods developed by Pakes and Pollard (1989) for nondifferentiable functions). Smoothed SMOM methods were developed by Stern, Hajivassiliou, and Ruud.

2. P̂_ij cannot be 0 (this causes problems in the denominator when it is close to 0).

3. Simulating small P_ij may require a large number of draws.

Refinement: Smoothed Simulated Method of Moments (Stern (1992), Econometrica) replaces the indicator 1(u_ij > u_im ∀ m ≠ j) with a smooth function of the utility differences, e.g. terms of the form Φ(u_ij − u_im), instead of the frequency simulator P̂_ij(γ) = (1/R) Σ_{r=1}^{R} 1(·) used above.
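As a rough illustration of the idea (a sketch of a kernel-smoothed simulator, not necessarily Stern's exact construction), one can replace the indicator with a product of normal c.d.f. terms in the utility differences, scaled by a bandwidth h; the function name and bandwidth are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def smoothed_simulator(u, j, h=0.1):
    """Smooth substitute for 1(u_ij > u_im for all m != j), averaged
    over draws.  u : (R, J) simulated utilities for one person,
    j : alternative of interest, h : smoothing parameter (h -> 0
    approaches the crude frequency simulator)."""
    diffs = u[:, [j]] - np.delete(u, j, axis=1)      # u_ij - u_im, m != j
    return norm.cdf(diffs / h).prod(axis=1).mean()   # smooth P_ij
```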


How does simulation affect the asymptotic distribution? Without simulation we get

√N (γ̂_mme − γ_0) ∼ N( 0, [ plim_{N→∞} (1/N) Σ_i w_i (y_i − P_i)(y_i − P_i)′ w_i′ ]^{-1} )

With simulation, the variance is slightly higher because of simulation error:

√N (γ̂_msm − γ_0) ∼ N( 0, plim_{N→∞} C^{-1} (1 + 1/R) )

where

C = (1/N) Σ_{i=1}^{N} w_i (y_i − P_i)(y_i − P_i)′ w_i′

As R → ∞, √N (γ̂_msm − γ_0) ∼ N(0, C^{-1}).

Note: The method does not require that the number of draws go to infinity.


Choice-Based Sampling (see Heckman in the New Palgrave)
References:
- Chs. 1-2 of the Manski and McFadden volume
- Manski and Lerman (1978, Econometrica)
- Amemiya

Examples:
1. Suppose we gather data on transportation mode choice at the train station, the subway station, and car checkpoints (toll booths etc.). We observe characteristics of populations conditioned on the choice that they made (this type of sampling commonly arises).
2. Evaluating the effects of a social program: we have data on participants and non-participants, and usually participants are oversampled relative to their frequency in the population.

Distinguish between exogenous stratification and endogenous stratification, the latter of which is choice-based (a special type of endogenous stratification). Oversampling in high-population areas (as is commonly done to reduce sampling costs or to increase representation of some groups) could be exogenous stratification (depending on the phenomenon being studied).


Notation:
Let P_i = P(i | Z) in a random sample and let P*_i denote the corresponding probability in a choice-based sample (CBS).

Under CBS, sampling is assumed to be random within the i partitions of the data:

P(Z | i) = P*(Z | i)   but   P(Z) ≠ P*(Z)

Suppose that we want to recover P(i | Z) from choice-based data. We observe

P*(i | Z) (assume the Z are discrete conditioning cells), P*(Z), P*(i)

Frequency weights:

P*(Z | i) = P(Z | i)   (key assumption)


By Bayes' rule,

P(A | B) = P(A, B) / P(B) = P(B | A) · P(A) / P(B)

P*(i | Z) = P*(Z | i) · P*(i) / P*(Z)

P(i | Z) = P(Z | i) · P(i) / P(Z)

P(i | Z) = [ P*(i | Z) · P*(Z) / P*(i) ] · P(i) / P(Z)

P(Z) = Σ_j P(Z | j) P(j)

P(Z | j) = P*(Z | j)

P*(Z | j) = P*(j | Z) · P*(Z) / P*(j)

P(i | Z) = [ P*(i | Z) P*(Z) P(i) / P*(i) ] / [ Σ_j P*(j | Z) P*(Z) P(j) / P*(j) ]
         = P*(i | Z) [ P(i)/P*(i) ] / Σ_j P*(j | Z) [ P(j)/P*(j) ]


To recover P(i | Z) from choice-based sampled data, you need to know P(j) and P*(j) for all j. P*(j) can be estimated from the sample, but P(j) requires outside information. We need the weights P(i)/P*(i).

Note: The problem set asks you to consider how CBS biases the coefficients and intercept in a logit model. (One can show that the bias appears only in the constant.)
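A minimal sketch of the reweighting formula derived above, applied one conditioning cell at a time; the function name and the vectorized layout are illustrative assumptions.

```python
import numpy as np

def p_given_z_from_cbs(p_star_i_given_z, p_star_i, p_i):
    """Recover P(i|Z) for one conditioning cell Z from choice-based data.

    p_star_i_given_z : (J,) sample frequencies P*(i|Z) within the cell
    p_star_i         : (J,) choice shares in the sample, P*(i)
    p_i              : (J,) true population shares P(i) (outside information)
    Implements P(i|Z) = P*(i|Z)[P(i)/P*(i)] / sum_j P*(j|Z)[P(j)/P*(j)].
    """
    w = np.asarray(p_i) / np.asarray(p_star_i)       # weights P(i)/P*(i)
    num = np.asarray(p_star_i_given_z) * w
    return num / num.sum()
```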


Application and Extension: Berry, Levinsohn, and Pakes (1995)

- develop an equilibrium model and estimation techniques for analyzing demand and supply in differentiated product markets
- use it to study the automobile industry

The goal is to estimate the parameters of both the demand and cost functions, incorporating own- and cross-price elasticities and elasticities with respect to product attributes (car horsepower, MPG, air conditioning, size, ...), using only aggregate product-level data supplemented with data on consumer characteristic distributions (income distribution from the CPS). They want to allow for flexible substitution patterns.

Key assumptions:
(i) the joint distribution of observed and unobserved product and consumer characteristics
(ii) price taking by consumers, Nash equilibrium assumptions on producers in an oligopolistic, differentiated products market.


Notation:
ζ : individual characteristics
x (observed), ξ (unobserved), p (price) : product characteristics

u_ij = u(ζ_i, p_j, x_j, ξ_j; θ) : utility if person i chooses j (a Cobb-Douglas assumption is used here)

j = 0, 1, ..., J, where 0 denotes not purchasing any good.

Define

A_j = { ζ : u(ζ, p_j, x_j, ξ_j; θ) ≥ u(ζ, p_r, x_r, ξ_r; θ),  r = 0, ..., J },

the set of ζ that induces choice of good j. This is defined over individual characteristics, which may be observed or unobserved.


Market Share

s_j(p, x, ξ; θ) = ∫_{ζ ∈ A_j} f(ζ) dζ   (s is the vector of market shares)

Special functional form:

u_ij = u(ζ_i, p_j, x_j, ξ_j; θ) = x_j β − α p_j + ξ_j + ε_ij = δ_j + ε_ij

δ_j = x_j β − α p_j + ξ_j = mean utility from good j;
ξ_j is the mean across consumers of the unobserved component of utility;
the ε_ij are the only elements representing consumer characteristics.

Special case:

ξ_j = 0 (no unobserved characteristic), ε_ij iid over i, j and independent of x_j.

Then the share is

s_j = ∫_{−∞}^{∞} Π_{q≠j} F(δ_j − δ_q + ε) f(ε) dε

a one-dimensional integral, which has a closed-form solution under the extreme value distribution.


Why is the assumption that utility is additively separable and iid in consumer and product characteristics highly restrictive?
(a) It implies that all substitution effects depend only on the δ's (since there is a unique vector of market shares associated with each δ vector). Therefore, conditional on market shares, substitution patterns don't depend on the characteristics of the products. Example: if a Mercedes and a Yugo have the same market share, then they must have the same δ's and the same cross derivative with respect to any third car (say a BMW):

∂s_i/∂p_k = ∫ Π_{q≠k} F(δ_k − δ_q + ε) F′(δ_k − δ_q + ε) (∂δ_k/∂p_k) f(ε) dε   (the same if the δ's are the same)

(b) Two products with the same market share have the same own-price derivatives (not good, because you expect product markups to depend on more than market share).

(c) It also assumes that individuals value product characteristics in the same way (no preference heterogeneity).


Alternative Model (Random Coefficients Version)

u_ij = x_j β̄ − α p_j + ξ_j + Σ_k σ_k x_jk ν_ik + ε_ij

β_k = β̄_k + σ_k ν_k,   E(ν_ik) = 1

Could still assume ε_ij is iid extreme value.


Model Actually Used
They impose an alternative functional form because they want to incorporate prior information on the distribution of relevant consumer characteristics and on interactions between consumer and product characteristics:

u_ij = (y_i − p_j)^α G(x_j, ξ_j, ν_i) e^{ε_ij}

Assume G is log-linear, so

ũ_ij = log u_ij = α log(y_i − p_j) + x_j β + ξ_j + Σ_k σ_k x_jk ν_ik + ε_ij

Utility of not purchasing: ũ_i0 = α log y_i + ξ_0 + σ_0 ν_i0 + ε_i0

Note: Prices are likely to be correlated with the unobserved product attributes ξ, which leads to an endogeneity problem. (ξ may represent things like style, prestige, reputation, etc.)

Quantity demanded: q_j = M s_j(x, ξ, p; θ), where s_j is the market share.

ξ enters nonlinearly, so we need some transformation to be able to apply instrumental variables (the principle of the replacement function).


Cost Side
Multiproduct firms f = 1, ..., F. Each produces a subset τ_f of the J possible products. The marginal cost of producing a good is assumed to be independent of output levels and log-linear in a vector of cost characteristics (W_j, ω_j):

ln mc_j = W_j γ + ω_j   ⇒   Π_f = Σ_{j∈τ_f} (p_j − mc_j) M s_j(p, x, ξ; θ)

Nash Assumption
Each firm chooses the prices that maximize its profit, taking as given the attributes of its products and the prices and attributes of its competitors' products. p_j satisfies

s_j(p, x, ξ; θ) + Σ_{r∈τ_f} (p_r − mc_r) ∂s_r(p, x, ξ; θ)/∂p_j = 0

Define

Δ_jr = −∂s_r/∂p_j if r and j are produced by the same firm, and 0 otherwise

⇒ s(p, x, ξ; θ) − Δ(p, x, ξ; θ) [p − mc] = 0
⇒ p = mc + Δ(p, x, ξ; θ)^{-1} s(p, x, ξ; θ)   (markup equation)


The markup term Δ^{-1}s depends only on the parameters of the demand system and on the equilibrium price vector:

p = mc + Δ(p, x, ξ; θ)^{-1} s(p, x, ξ; θ)

Denote the markup by b(p, x, ξ; θ) = Δ(p, x, ξ; θ)^{-1} s(p, x, ξ; θ). Since p is a function of ω, b(p, x, ξ; θ) is also a function of ω (the unobserved cost determinants). Let

ln mc_j = W_j γ + ω_j

⇒ p = exp{Wγ + ω} + b(p, x, ξ; θ),  i.e.  ln( p − b(p, x, ξ; θ) ) = Wγ + ω   (pricing equation)


Estimation
We need instruments for both the demand and pricing equations, i.e. variables correlated with the endogenous (p, q) but uncorrelated with ξ and ω. Let

Z = (X, W)   (p, q not included in Z)

Assume

E(ξ_j | Z) = E(ω_j | Z) = 0

E( (ξ_j, ω_j)′ (ξ_j, ω_j) | Z ) = Ω(Z_j)   (finite for almost every Z_j)

Note that the demand for any product is a function of the characteristics of all products, so we don't have any exclusion restrictions.


Data

J vectors (x_j, w_j, p_j, q_j)
n : number of households sampled
s^n : vector of sampled market shares

Assume that there is a true θ_0 and that the population abides by the model.

Decision Rule
s^n converges to s^0 (multinomial sampling), with √n (s^n − s^0) = O_p(1).


Assume we could calculate

{ ξ_j(θ, s, P), ω_j(θ, s, P) }_{j=1}^{J}

for alternative values of θ.

They show that any choice of
(a) an observed vector of market shares, s,
(b) a distribution of consumer characteristics, P,
(c) a parameter value of the model, θ,

implies a unique sequence of values for the two unobserved characteristics,

{ ξ_j(θ, s, P), ω_j(θ, s, P) }.


Then any function of Z must be uncorrelated with the vector

{ ξ_j(θ, s^0, P_0), ω_j(θ, s^0, P_0) }_{j=1}^{J}

when θ = θ_0. Can use GMM.

Note: A conditional moment restriction implies an infinite number of unconditional restrictions. The sample objective interacts instruments with the structural errors, roughly

min_θ ‖ (1/J) Σ_j H_j(Z) ( ξ_j(θ, s^0, P_0), ω_j(θ, s^0, P_0) )′ ‖


Computation in the Logit Case

In the logit case ε_ij is extreme value (Weibull).

δ_j = x_j β − α p_j + ξ_j,   u_ij = x_j β − α p_j + ξ_j + ε_ij

s_j(p, x, ξ) = e^{δ_j} / ( 1 + Σ_{k=1}^{J} e^{δ_k} )

Inverting the share equations:

δ_j = ln s_j − ln s_0,   j = 1, ..., J

ξ_j = ln s_j − ln s_0 − x_j β + α p_j

See the paper for more details.
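A minimal sketch of the logit-case computation just described: invert the shares to get δ, then run linear IV of δ on (x, p) with instruments Z. The 2SLS implementation and the variable names are illustrative; the full random-coefficients model requires simulating the market shares rather than this closed-form inversion.

```python
import numpy as np

def blp_logit_iv(s, s0, X, p, Z):
    """Logit-case BLP demand side: invert shares, then linear IV.

    s  : (J,) observed market shares, s0 : outside-good share
    X  : (J, K) product characteristics, p : (J,) prices
    Z  : (J, L) instruments (functions of X and cost shifters W), L >= K+1
    Returns (beta_hat, alpha_hat) from 2SLS on delta_j = ln s_j - ln s_0.
    """
    delta = np.log(s) - np.log(s0)                 # share inversion
    Xp = np.column_stack([X, -p])                  # regressors: x_j beta - alpha p_j
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Xp)    # first-stage fitted values
    coef = np.linalg.solve(Pz.T @ Xp, Pz.T @ delta)
    return coef[:-1], coef[-1]                     # beta_hat, alpha_hat
```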


Generalized Method of Moments (GMM)
References:
Hansen (1982), Econometrica
Hansen and Singleton (1982, 1988)
Also known as minimum distance estimators.

Suppose that we have some data x_t, t = 1...T, and we want to test hypotheses about E(x_t) = μ. How do we proceed? By a CLT,

(1/√T) Σ_{t=1}^{T} (x_t − μ) ∼ N(0, V_0)

V_0 = E( (x_t − μ)(x_t − μ)′ )   if x_t is iid

    = lim_{T→∞} E( [ (1/√T) Σ_t (x_t − μ) ] [ (1/√T) Σ_t (x_t − μ) ]′ )   (general case)


We can decompose V_0 = Q D Q′, where Q Q′ = I, Q^{-1} = Q′, and D is the matrix of eigenvalues:

V_0 = Q D^{1/2} D^{1/2} Q′
Q′ V_0 Q = D^{1/2} D^{1/2}
D^{-1/2} Q′ V_0 Q D^{-1/2} = I

Under the null H_0: μ = μ_0,

[ (1/√T) Σ_t (x_t − μ_0) ]′ V_0^{-1} [ (1/√T) Σ_t (x_t − μ_0) ]
= [ (1/√T) Σ_t (x_t − μ_0) ]′ Q D^{-1/2} D^{-1/2} Q′ [ (1/√T) Σ_t (x_t − μ_0) ] ∼ χ²(n)

where n is the number of moment conditions.

How does the test statistic behave under the alternative (μ ≠ μ_0)? It should get large.
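A minimal sketch of the test just described, using the sample covariance matrix as the estimate of V_0 in the iid case; the function name is illustrative.

```python
import numpy as np
from scipy.stats import chi2

def test_mean(x, mu0):
    """Chi-square test of H0: E(x_t) = mu0 via the GMM quadratic form.

    x : (T, n) iid data, mu0 : (n,) hypothesized mean.
    Statistic: g' V^{-1} g with g = (1/sqrt(T)) sum_t (x_t - mu0) ~ chi2(n).
    """
    T, n = x.shape
    g = np.sqrt(T) * (x.mean(axis=0) - mu0)        # (1/sqrt(T)) sum_t (x_t - mu0)
    V = np.atleast_2d(np.cov(x, rowvar=False, bias=True))   # V_0 estimate (iid)
    stat = g @ np.linalg.solve(V, g)
    return stat, chi2.sf(stat, df=n)               # statistic and p-value
```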


To see this, write (with true mean μ ≠ μ_0)

[ (1/√T) Σ_t (x_t − μ_0) ]′ V_0^{-1} [ (1/√T) Σ_t (x_t − μ_0) ]
= [ (1/√T) Σ_t (x_t − μ) ]′ V_0^{-1} [ (1/√T) Σ_t (x_t − μ) ]
  + 2 √T (μ − μ_0)′ V_0^{-1} (1/√T) Σ_t (x_t − μ)
  + T (μ − μ_0)′ V_0^{-1} (μ − μ_0)   (*)

The last term (*) is O(T).

λ = T (μ − μ_0)′ V_0^{-1} (μ − μ_0) is the noncentrality parameter.


Problems:
(i) V_0 is not known a priori. Estimate it with some V_T → V_0: in the iid setting use the sample covariance matrix; in the general setting approximate the limit by a finite-T estimate.
(ii) μ is not known. Suppose we want to test μ = ϕ(β), with ϕ specified and β unknown. We can estimate β by minimum chi-square estimation:

min_{β∈B} [ (1/√T) Σ_t (x_t − ϕ(β)) ]′ V_T^{-1} [ (1/√T) Σ_t (x_t − ϕ(β)) ] ∼ χ²(n − k)

k = dimension of β, n = number of moments.

Note: Searching over the k dimensions of β, you lose one degree of freedom per dimension, k in all (will show next).


Find the distribution theory for β̂:
Q: This is an M-estimator, so how do you proceed?
First-order condition:

√T (∂ϕ/∂β)′ |_{β̂_T} V_T^{-1} (1/√T) Σ_t [ x_t − ϕ(β̂_T) ] = 0

Taylor expand ϕ(β̂_T) around ϕ(β_0):

ϕ(β̂_T) = ϕ(β_0) + ∂ϕ/∂β |_{β*} (β̂_T − β_0),   with β* between β_0 and β̂_T,

to get

√T (∂ϕ/∂β)′ |_{β̂_T} V_T^{-1} (1/√T) Σ_t [ x_t − ϕ(β_0) − ∂ϕ/∂β |_{β*} (β̂_T − β_0) ] = 0


Rearrange to solve for (β̂_T − β_0):

[ (∂ϕ/∂β)′|_{β̂_T} V_T^{-1} (∂ϕ/∂β)|_{β*} ] √T (β̂_T − β_0) = (∂ϕ/∂β)′|_{β̂_T} V_T^{-1} (1/√T) Σ_t [ x_t − ϕ(β_0) ]

If

(∂ϕ/∂β)|_{β̂_T} → D_0   (convergence of a random function),
V_T → V_0,
(∂ϕ/∂β)|_{β*} → D_0,

and we apply a CLT, (1/√T) Σ_t [ x_t − ϕ(β_0) ] ∼ N(0, V_0), then

√T (β̂_T − β_0) ∼ N( 0, (D_0′ V_0^{-1} D_0)^{-1} D_0′ V_0^{-1} V_0 V_0^{-1} D_0 (D_0′ V_0^{-1} D_0)^{-1} ) = N( 0, (D_0′ V_0^{-1} D_0)^{-1} )


Why is the limiting distribution χ²(n − k)?
Write

(1/√T) Σ_t [ x_t − ϕ(β̂_T) ] = (1/√T) Σ_t (x_t − μ_0) + (1/√T) Σ_t [ μ_0 − ϕ(β̂_T) ]

Recall that we had

ϕ(β̂_T) = ϕ(β_0) + ∂ϕ/∂β |_{β*} (β̂_T − β_0)   (**)

with

√T (β̂_T − β_0) = [ (∂ϕ/∂β)′|_{β̂_T} V_T^{-1} (∂ϕ/∂β)|_{β*} ]^{-1} (∂ϕ/∂β)′|_{β̂_T} V_T^{-1} (1/√T) Σ_t (x_t − μ_0)   (1)

from the expression for √T (β̂_T − β_0) derived earlier.


Note that the second term, (1), is a linear combination of the first. Therefore

(1/√T) Σ_t [ x_t − ϕ(β̂_T) ] = B_0 (1/√T) Σ_t (x_t − μ_0)

where, with D_0 = ∂ϕ/∂β |_{β_0},

B_0 = I − D_0 ( D_0′ V_0^{-1} D_0 )^{-1} D_0′ V_0^{-1}

Note that

D_0′ V_0^{-1} B_0 = 0


This tells us that certain linear combinations of B_0 give a degenerate distribution (along k dimensions). This needs to be taken into account when testing.
Recall that V_0 = Q D Q′, Q Q′ = I, and V_0^{-1} = Q D^{-1/2} D^{-1/2} Q′. Then

D^{-1/2} Q′ (1/√T) Σ_t [ x_t − ϕ(β̂_T) ] = D^{-1/2} Q′ B_0 (1/√T) Σ_t (x_t − μ_0)

where

D^{-1/2} Q′ B_0 = D^{-1/2} Q′ − D^{-1/2} Q′ D_0 [ D_0′ Q D^{-1/2} D^{-1/2} Q′ D_0 ]^{-1} D_0′ Q D^{-1/2} D^{-1/2} Q′
               = ( I − A (A′A)^{-1} A′ ) D^{-1/2} Q′,   with A = D^{-1/2} Q′ D_0   (an idempotent matrix M_A)

Thus

D^{-1/2} Q′ (1/√T) Σ_t [ x_t − ϕ(β̂_T) ] = M_A · D^{-1/2} Q′ · (1/√T) Σ_t (x_t − μ_0)

The matrix M_A accounts for the fact that we performed the minimization over β.


How is the distribution theory affected?
We have a quadratic form in normal r.v.s with an idempotent matrix, e.g.

ε̂′ε̂ = ε′ M_x ε,   M_x = I − x(x′x)^{-1}x′

Key facts:
(i) Theorem. Let Y ∼ N(θ, σ²I_n) and let P be a symmetric matrix of rank r. Then Q = (Y − θ)′ P (Y − θ) / σ² ∼ χ²_r iff P² = P (i.e. P is idempotent). (See Seber, p. 37.)
(ii) If Q_i ∼ χ²_{r_i}, i = 1, 2, with r_1 > r_2, and Q = Q_1 − Q_2 is independent of Q_2, then Q ∼ χ²_{r_1 − r_2}.

Apply these results to

ε̂′ε̂ / σ² = ε′ M_x ε / σ² ∼ χ²( rank M_x )


The rank of an idempotent matrix is equal to its trace, since

tr(A) = Σ_{i=1}^{n} λ_i,   λ_i the eigenvalues,

and for an idempotent matrix the eigenvalues are all 0 or 1 (so rank = number of non-zero eigenvalues = trace). Hence

rank( I − x(x′x)^{-1}x′ ) = rank(I) − rank( x(x′x)^{-1}x′ ),   where rank(I) = n

rank( x(x′x)^{-1}x′ ) = trace( x(x′x)^{-1}x′ ) = trace( x′x (x′x)^{-1} ) = trace(I_k) = k   (since tr(AB) = tr(BA))

rank( I − x(x′x)^{-1}x′ ) = n − k


Thus, by the same reasoning, the limiting distribution of

[ (1/√T) Σ_t ( x_t − ϕ(β̂_T) ) ]′ V_T^{-1} [ (1/√T) Σ_t ( x_t − ϕ(β̂_T) ) ]

is χ²(n − k), where n − k = rank(M_A). We preserve the χ² form but lose degrees of freedom from estimating β. In the just-identified case n = k we can estimate β but have no degrees of freedom left to perform the test.

Would GMM provide a method for estimating β if we used a weighting matrix other than V_T^{-1}? Why not replace V_0^{-1} by some W_0?

min_{β∈B} [ (1/√T) Σ_t ( x_t − ϕ(β) ) ]′ W_0 [ (1/√T) Σ_t ( x_t − ϕ(β) ) ]

We could choose W_0 = I (avoiding the need to estimate a weighting matrix).
Result: The asymptotic covariance is altered and the asymptotic distribution of the criterion is different, but (1/√T) Σ_t ( x_t − ϕ(β̂_T) ) will still be asymptotically normal.


What is the advantage of focusing on minimum chi-square estimation?
Choosing W_0 = V_0^{-1} gives the smallest covariance matrix: we get the most efficient estimator of β and the most powerful test of the restrictions.
To show this, show that

( D_0′ W_0 D_0 )^{-1} ( D_0′ W_0 V_0 W_0 D_0 ) ( D_0′ W_0 D_0 )^{-1} − ( D_0′ V_0^{-1} D_0 )^{-1}   is p.s.d.,

where ( D_0′ W_0 D_0 )^{-1} ( D_0′ W_0 V_0 W_0 D_0 ) ( D_0′ W_0 D_0 )^{-1} is the covariance matrix of √T (β̂_T − β_0) when a general weighting matrix W_0 is used. This is equivalent to showing that

( D_0′ V_0^{-1} D_0 ) − ( D_0′ W_0 D_0 )( D_0′ W_0 V_0 W_0 D_0 )^{-1}( D_0′ W_0 D_0 )   is p.s.d.

Show that it can be written as a quadratic form. Take any vector α:

α′ [ D_0′ V_0^{-1} D_0 − ( D_0′ W_0 D_0 )( D_0′ W_0 V_0 W_0 D_0 )^{-1}( D_0′ W_0 D_0 ) ] α
= α′ D_0′ V_0^{-1/2} [ I − V_0^{1/2} W_0 D_0 ( D_0′ W_0 V_0^{1/2} · V_0^{1/2} W_0 D_0 )^{-1} D_0′ W_0 V_0^{1/2} ] V_0^{-1/2} D_0 α
= α′ D_0′ V_0^{-1/2} [ I − V̄ ( V̄′ V̄ )^{-1} V̄′ ] V_0^{-1/2} D_0 α ≥ 0,   with V̄ ≡ V_0^{1/2} W_0 D_0   (= 0 if W_0 = V_0^{-1})

Therefore W_0 = V_0^{-1} is the optimal choice of weighting matrix.


Many standard estimators can be interpreted as GMM estimators. Some examples:

(1) OLS

y_t = x_t β + u_t,   E(u_t x_t) = 0  ⇒  E( (y_t − x_t β) x_t ) = 0

min_{β∈B} [ (1/√T) Σ_t (y_t − x_t β) x_t ]′ V_T^{-1} [ (1/√T) Σ_t (y_t − x_t β) x_t ]

where V_T is the estimator of E( x_t u_t u_t′ x_t′ ), which in the iid case equals σ² E(x_t x_t′) if the errors are homoskedastic.


(2) Instrumental Variables

y_t = x_t β + u_t,   E(u_t x_t) ≠ 0,   E(u_t z_t) = 0,   E(x_t z_t) ≠ 0

β̂_T = argmin_{β∈B} [ (1/√T) Σ_t (y_t − x_t β) z_t ]′ V_T^{-1} [ (1/√T) Σ_t (y_t − x_t β) z_t ]

where V_T estimates E( z_t u_t u_t′ z_t′ ) in the iid case.


The estimator of V_0 in the time series case is based on

lim_{T→∞} E( [ (1/√T) Σ_t z_t u_t ] [ (1/√T) Σ_s z_s u_s ]′ )

Suppose E( u_t u_t′ | z_t ) = σ² I. Then

W_0 = [ σ² E(z_t z_t′) ]^{-1}

One can verify that 2SLS and GMM give the same estimator in this case.
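A small numerical check of that claim, assuming homoskedasticity so that the GMM weight is proportional to (Z′Z)^{-1}; the function names are illustrative.

```python
import numpy as np

def beta_2sls(y, X, Z):
    """2SLS: (X' Pz X)^{-1} X' Pz y with Pz = Z (Z'Z)^{-1} Z'."""
    Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(Xhat.T @ X, Xhat.T @ y)

def beta_gmm_iv(y, X, Z):
    """Linear IV-GMM with weight W = (Z'Z)^{-1}, proportional to
    [sigma^2 E(z z')]^{-1} under homoskedasticity: identical to 2SLS."""
    W = np.linalg.inv(Z.T @ Z)
    A = X.T @ Z @ W @ Z.T @ X
    b = X.T @ Z @ W @ Z.T @ y
    return np.linalg.solve(A, b)

# For any conformable (y, X, Z), beta_2sls and beta_gmm_iv agree up to
# numerical precision, illustrating the equivalence claimed above.
```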


β̂_2SLS = [ x′z (z′z)^{-1} z′x ]^{-1} [ x′z (z′z)^{-1} z′y ]

Note: in the first stage we regress x on z,

x̂ = z (z′z)^{-1} z′x,   ŷ = z (z′z)^{-1} z′y

var( β̂_2SLS ) = [ E(x_i z_i′) E(z_i z_i′)^{-1} E(z_i x_i′) ]^{-1} E(x_i z_i′) E(z_i z_i′)^{-1} E(z_i u_i u_i′ z_i′) E(z_i z_i′)^{-1} E(z_i x_i′) [ E(x_i z_i′) E(z_i z_i′)^{-1} E(z_i x_i′) ]^{-1}

Under GMM,


var( β̂_GMM ) = ( D_0′ V_0^{-1} D_0 )^{-1}   (when W_0 = V_0^{-1})

D_0 = ∂ϕ/∂β |_{β_0} = plim (1/n) Σ_i x_i z_i′ = E(x_i z_i′)

W_0 = (σ²)^{-1} E(z_i z_i′)^{-1}

In the presence of heteroskedasticity, the weighting matrix would be different (and 2SLS and GMM are not the same):

W_0 = E( z′ u u′ z )^{-1} = E( z′ E(uu′ | z) z )^{-1} = E( z′ V z )^{-1}

With panel data we could have a block-diagonal structure,

E( uu′ | z ) = diag( V_1, V_2, ..., V_T ) = V,

allowing for correlation over time for a given individual, but iid across individuals.


(3) Nonlinear least squares

y_t = ϕ(x_t, β) + u_t,   E( u_t ϕ(x_t; β) ) = 0

min_{β∈B} [ (1/√T) Σ_t (y_t − ϕ(x_t; β)) ϕ(x_t; β) ]′ V_T^{-1} [ (1/√T) Σ_t (y_t − ϕ(x_t; β)) ϕ(x_t; β) ]

General Method of Moments

min_{β∈B} [ Σ_t f_t(β) ]′ V_0^{-1} [ Σ_t f_t(β) ]

where

(1/√T) Σ_t f_t(β_0) → N(0, V_0),   (1/T) Σ_t f_t(β) → E f_t(β)

In general, f_t is a random function.


Suppose we want to estimate β (5 × 1) and we have 6 potential instruments. Can we test the validity of the instruments? What if we have 5 instruments? If we assume E(ε_i | x_i) = 0 instead of E(ε_i x_i) = 0 (i.e. conditional instead of unconditional), then we have an infinite number of moment conditions:

E( ε_i f(x_i) ) = E( E(ε_i | x_i) f(x_i) ) = 0   for any f(x_i)

How to optimally choose which moment conditions to use is a current area of research. How might you use GMM to check whether a variable is normally distributed?
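One possible answer to that last question (a sketch, not the only way): impose the third- and fourth-moment restrictions implied by normality and test them with the usual quadratic form. The handling of the estimated mean and variance below is deliberately rough and ignores their effect on the weighting matrix.

```python
import numpy as np
from scipy.stats import chi2

def normality_moments_test(x):
    """GMM-style check of normality: under N(mu, s^2),
    E[(x - mu)^3] = 0 and E[(x - mu)^4 - 3 s^4] = 0."""
    T = x.size
    u = x - x.mean()
    s2 = u.var()
    m = np.column_stack([u**3, u**4 - 3 * s2**2])   # moment functions per obs
    g = np.sqrt(T) * m.mean(axis=0)
    V = np.atleast_2d(np.cov(m, rowvar=False, bias=True))
    stat = g @ np.linalg.solve(V, g)
    return stat, chi2.sf(stat, df=2)                # roughly chi2(2) under H0
```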


Classical Models for Estimating Models with Limited Dependent Variables
References: Amemiya, Ch. 10

Different types of sampling:
(a) random sampling
(b) censored sampling
(c) truncated sampling
(d) other non-random sampling (exogenous stratified, choice-based)


Standard Tobit Model (Tobin, 1958): Type I Tobit

y*_i = x_i β + u_i

Observe

y_i = y*_i if y*_i ≥ y_0,  i.e.  y_i = 1(y*_i ≥ y_0) · y*_i
y_i = 0 if y*_i < y_0

Tobin's example: expenditure on a durable good is only observed if the good is purchased.


[Figure: scatter plot of expenditure across individuals, censored at the threshold y_0.]

Note: Censored observations might have bought the good if the price had been lower.

Estimator. Assume u_i | x_i ∼ N(0, σ²_u), so y*_i | x_i ∼ N(x_i β, σ²_u).


The likelihood is

L = Π_0 Pr(y*_i < y_0) · Π_1 f(y*_i | y*_i ≥ y_0) Pr(y*_i ≥ y_0)

Pr(y*_i < y_0) = Pr(x_i β + u_i < y_0) = Pr( u_i/σ_u < (y_0 − x_i β)/σ_u ) = Φ( (y_0 − x_i β)/σ_u )

f(y*_i | y*_i ≥ y_0) = (1/σ_u) φ( (y*_i − x_i β)/σ_u ) / [ 1 − Φ( (y_0 − x_i β)/σ_u ) ]

Why? Pr(y* = y*_i | y_0 ≤ y*) = Pr(xβ + u = y*_i | y_0 ≤ xβ + u) = Pr( u/σ_u = (y*_i − xβ)/σ_u | u/σ_u ≥ (y_0 − xβ)/σ_u ).

Note that the likelihood can be written as

L = Π_0 Φ( (y_0 − x_i β)/σ_u ) Π_1 [ 1 − Φ( (y_0 − x_i β)/σ_u ) ] · Π_1 { (1/σ_u) φ( (y*_i − x_i β)/σ_u ) / [ 1 − Φ( (y_0 − x_i β)/σ_u ) ] }

The first two products are what you would get from a simple probit; the last factor is the additional information.

- You could estimate β up to scale using only the information on whether y_i ⋛ y_0, but you will get a more efficient estimate using the additional information.
- If you know y_0, you can estimate σ_u.
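A minimal sketch of the Type I Tobit log-likelihood just written down, with censoring threshold y_0 and parameters (β, log σ_u); the parameterization and names are illustrative, and the function can be handed to any numerical optimizer (e.g. maximizing by minimizing its negative).

```python
import numpy as np
from scipy.stats import norm

def tobit_loglik(theta, y, X, y0=0.0):
    """Type I Tobit log-likelihood, censoring from below at y0.

    theta = (beta, log sigma_u); y_i = y*_i if y*_i >= y0, else recorded as 0.
    Censored obs contribute log Phi((y0 - x b)/s); uncensored obs contribute
    log[(1/s) phi((y - x b)/s)].
    """
    beta, sigma = theta[:-1], np.exp(theta[-1])
    xb = X @ beta
    cens = y <= y0                                   # censored observations
    ll_cens = norm.logcdf((y0 - xb[cens]) / sigma)
    ll_obs = norm.logpdf((y[~cens] - xb[~cens]) / sigma) - np.log(sigma)
    return ll_cens.sum() + ll_obs.sum()
```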


Truncated Version of Type I Tobit
Observe y_i = y*_i only if y*_i > 0 (we observe nothing for the censored observations; example: we only observe wages for workers).

L = Π_1 [ (1/σ_u) φ( (y*_i − x_i β)/σ_u ) / Φ( x_i β/σ_u ) ]

Pr(y*_i > 0) = Pr(xβ + u > 0) = Pr( u/σ_u > −xβ/σ_u ) = Pr( u/σ_u < xβ/σ_u ) = Φ( xβ/σ_u )


Different Ways of Estimating the Tobit
(a) With censored data, you could obtain estimates of β/σ_u by a simple probit.
(b) Run OLS on the observations for which y*_i is observed (take y_0 = 0):

E(y_i | x_i β + u_i ≥ 0) = x_i β + σ_u E( u_i/σ_u | u_i/σ_u > −x_i β/σ_u )

which is the conditional mean of a truncated normal r.v., with

σ_u E( u_i/σ_u | u_i/σ_u > −x_i β/σ_u ) = σ_u λ( x_i β/σ_u ),   λ( x_i β/σ_u ) = φ( −x_i β/σ_u ) / Φ( x_i β/σ_u )

λ(·) is known as the (inverse) Mills ratio. The bias due to censoring can be viewed as an omitted variables problem.


Heckman Two-Step Procedure
Step 1: estimate β/σ_u by probit.
Step 2: form λ̂( x_i β̂/σ̂ ) and run the regression

y_i = x_i β + σ λ̂( x_i β̂/σ̂ ) + v_i + ε_i

where

v_i = σ { λ( x_i β/σ ) − λ̂( x_i β̂/σ̂ ) }
ε_i = u_i − E( u_i | u_i > −x_i β )
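A minimal sketch of the two steps, using statsmodels for the probit and OLS stages; the names and the assumption that X includes a constant and doubles as the selection-equation regressors (as in the Tobit case) are illustrative, and the reported OLS standard errors are not corrected for the estimated λ.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def heckman_two_step(y, X, d):
    """Heckman two-step sketch: y_i = x_i b + u_i observed only when d_i = 1.

    Step 1: probit of d on X gives b/sigma.  Step 2: OLS of y on (X, lambda_hat)
    over the selected sample, with lambda = phi(Xg)/Phi(Xg) the Mills ratio.
    X is assumed to include a constant; with a separate selection equation use Z.
    """
    probit = sm.Probit(d, X).fit(disp=0)        # step 1: estimate b/sigma
    xg = X @ probit.params
    lam = norm.pdf(xg) / norm.cdf(xg)           # inverse Mills ratio
    Xs = np.column_stack([X[d == 1], lam[d == 1]])
    ols = sm.OLS(y[d == 1], Xs).fit()           # step 2 on the selected sample
    return probit.params, ols.params            # last OLS coefficient estimates sigma_u
```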


Note: the errors (v + ε) will be heteroskedastic, and we need to account for the fact that λ is estimated (the Durbin problem). Ways of handling this include:
(a) the delta method
(b) GMM (Newey, Economics Letters, 1984)
(c) Alternatively, run OLS using all the data:

E(y_i) = Pr(y*_i ≤ 0) · 0 + Pr(y*_i > 0) [ x_i β + σ_u E( u_i/σ_u | u_i/σ_u > −x_i β/σ ) ]
       = Φ( x_i β/σ ) [ x_i β + σ_u λ( x_i β/σ ) ]

and estimate the model by replacing Φ with Φ̂ and λ with λ̂. For both (b) and (c) the errors are heteroskedastic, meaning that you could use weights to improve efficiency; you also need to adjust for the estimated regressor.
(d) Estimate the model by Tobit maximum likelihood directly.


Variations on the Standard Tobit Model

y*_1i = x_1i β_1 + u_1i
y*_2i = x_2i β_2 + u_2i

y_2i = y*_2i if y*_1i ≥ 0
     = 0 otherwise

Example:
y_2i = student test scores
y*_1i = an index representing parents' propensity to enroll students in school
Test scores are only observed for the proportion enrolled.


L = Π_1 Pr(y*_1i > 0) f(y_2i | y*_1i > 0) · Π_0 Pr(y*_1i ≤ 0)

f(y*_2i | y*_1i ≥ 0) = ∫_0^∞ f(y*_1i, y*_2i) dy*_1i / ∫_0^∞ f(y*_1i) dy*_1i
                     = f(y_2i) ∫_0^∞ f(y*_1i | y*_2i) dy*_1i / ∫_0^∞ f(y*_1i) dy*_1i
                     = (1/σ_2) φ( (y*_2i − x_2i β_2)/σ_2 ) · ∫_0^∞ f(y*_1i | y*_2i) dy*_1i / Pr(y*_1i > 0)

with

y*_1i ∼ N( x_1i β_1, σ_1² ),   y*_2i ∼ N( x_2i β_2, σ_2² )

y*_1i | y*_2i ∼ N( x_1i β_1 + (σ_12/σ_2²)(y_2i − x_2i β_2),  σ_1² − σ_12²/σ_2² )

E( y*_1i | u_2i = y*_2i − x_2i β_2 ) = x_1i β_1 + E( u_1i | u_2i = y*_2i − x_2i β_2 )


Estimation by MLE

L = Π_0 [ 1 − Φ( x_1i β_1/σ_1 ) ] · Π_1 (1/σ_2) φ( (y*_2i − x_2i β_2)/σ_2 ) · { 1 − Φ( −[ x_1i β_1 + (σ_12/σ_2²)(y_2i − x_2i β_2) ] / σ_{1·2} ) }

where σ_{1·2} = ( σ_1² − σ_12²/σ_2² )^{1/2} is the conditional standard deviation of y*_1i given y*_2i.

Estimation by a Two-Step Approach
Using the data on y_2i for which y_1i > 0:

E( y_2i | y_1i > 0 ) = x_2i β_2 + E( u_2i | x_1i β_1 + u_1i > 0 )
                     = x_2i β_2 + σ_2 E( u_2i/σ_2 | u_1i/σ_1 > −x_1i β_1/σ_1 )
                     = x_2i β_2 + (σ_12/σ_1) λ( x_1i β_1/σ_1 )


Example: Gronau's female labor supply model

max u(L, x)   s.t.   x = wH + v,   H = 1 − L

where H : hours worked, v : asset income, w : the wage (taken as given), P_x = 1, and L : time spent at home for child care.

(∂u/∂L) / (∂u/∂x) = w   when L < 1

The reservation wage is the MRS at H = 0: w_R = MRS|_{H=0}. We don't observe w_R directly.

Model:
w_0 = xβ + u   (the wage the person would earn if they worked)
w_R = zγ + v

w_i = w_0i if w_Ri < w_0i
    = 0 otherwise


This fits within the previous Tobit framework if we set

y*_1i = xβ − zγ + u − v = w_0 − w_R
y_2i = w_i

Note: Gronau does not develop a model to explain hours of work.


Heckman's Model (incorporates the choice of H)

w_0i = x_2i β_2 + u_2i   (offered wage, taken as given)

MRS = (∂u/∂L)/(∂u/∂x) = γ H_i + z_i α + v_i

(assume a functional form for the utility function that yields this)

w_R(H_i = 0) = z_i α + v_i

Work if w_0i = x_2i β_2 + u_2i > z_i α + v_i.
If working, then w_0i = MRS  ⇒  x_2i β_2 + u_2i = γ H_i + z_i α + v_i

⇒ H_i = ( x_2i β_2 − z_i α + u_2i − v_i ) / γ = x_1i β_1 + u_1i

where x_1i β_1 = ( x_2i β_2 − z_i α ) γ^{-1} and u_1i = ( u_2i − v_i ) γ^{-1}.


Type 3 Tobit Model

y*_1i = x_1i β_1 + u_1i   ← hours
y*_2i = x_2i β_2 + u_2i   ← wage

y_1i = y*_1i if y*_1i > 0,  = 0 if y*_1i ≤ 0
y_2i = y*_2i if y*_1i > 0,  = 0 if y*_1i ≤ 0


Here

H_i = H*_i if H*_i > 0,  = 0 if H*_i ≤ 0
w_i = w_0i if H*_i > 0,  = 0 if H*_i ≤ 0

Note: the Type 4 Tobit simply adds

y_3i = y*_3i if y*_1i > 0,  = 0 if y*_1i ≤ 0

Can estimate by
(1) maximum likelihood
(2) a two-step method:

E( w_0i | H_i > 0 ) = γ H_i + z_i α + E( v_i | H_i > 0 )

Type 5 Tobit Model of Heckman (1978)


y*_1i = γ_1 y_2i + x_1i β_1 + δ_1 w_i + u_1i
y_2i = γ_2 y*_1i + x_2i β_2 + δ_2 w_i + u_2i

Analysis of the effect of an antidiscrimination law on the average income of African Americans in the i-th state. We observe x_1i, x_2i, y_2i and w_i, with

w_i = 1 if y*_1i > 0,   w_i = 0 if y*_1i ≤ 0

y_2i = average income of African Americans in the state
y*_1i = unobservable sentiment towards African Americans
w_i = 1 if the law is in effect.


Adoption of the law is endogenous. Require the restriction γ_1 δ_2 + δ_1 = 0 so that we can solve for y*_1i as a function that does not depend on w_i. A similar set-up appears in Lee's (1978) model of the effect of union membership on incomes, where the decision to join the union is endogenous. This class of models is known as dummy endogenous variable models.


Relaxing Parametric Assumptions in the Selection Model
References:
Heckman (AER, 1990), "Varieties of Selection Bias"
Heckman (1980), Addendum to "Sample Selection Bias as Specification Error"
Heckman and Robb (1985, 1986)

y*_1 = xβ + u
y*_2 = zγ + v
y_1 = y*_1 if y*_2 > 0

E( y*_1 | observed ) = xβ + E( u | x, zγ + v > 0 ) + [ u − E( u | x, zγ + v > 0 ) ]

E( u | x, zγ + v > 0 ) = ∫_{−∞}^{∞} ∫_{−zγ}^{∞} u f(u, v | x, z) dv du / ∫_{−∞}^{∞} ∫_{−zγ}^{∞} f(u, v | x, z) dv du

Note: Pr( y*_2 > 0 | z ) = Pr( zγ + v > 0 | z ) = P(z) = 1 − F_v(−zγ)


⇒ F_v(−zγ) = 1 − P(z)  ⇒  −zγ = F_v^{-1}( 1 − P(z) ) if F_v is invertible.

We can replace −zγ in the integrals by F_v^{-1}( 1 − P(z) ) if, in addition, f(u, v | x, z) = f(u, v | zγ) (index sufficiency). Then

E( y*_1 | y*_2 > 0 ) = xβ + g( P(z) ) + ε,   where g( P(z) ) is the bias or control function.

Semiparametric selection model: approximate the bias function by a Taylor series in P(z), i.e. a truncated power series.
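A minimal sketch of that series idea, assuming a probit first stage for P(z) and a low-order polynomial for g(P); the library calls (statsmodels), the function name, and the polynomial order are illustrative choices, and in practice identification relies on an exclusion restriction (a z not in x) or on functional form.

```python
import numpy as np
import statsmodels.api as sm

def semiparametric_selection(y, X, Z, d, order=3):
    """Series approximation of the control function g(P(z)).

    Step 1: estimate P(z) = Pr(selected | z) by probit (illustrative choice).
    Step 2: OLS of y on (X, P_hat, ..., P_hat^order) on the selected sample,
    so the polynomial in P_hat absorbs the selection bias term g(P(z)).
    """
    P_hat = sm.Probit(d, Z).fit(disp=0).predict(Z)            # estimated P(z)
    poly = np.column_stack([P_hat**k for k in range(1, order + 1)])
    Xs = np.column_stack([X, poly])[d == 1]
    ols = sm.OLS(y[d == 1], Xs).fit()
    return ols.params[: X.shape[1]]      # beta, net of the g(P) series terms
```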
