Logit/Probit Models

66
1 Logit/Probit Models

description

Logit/Probit Models. Making sense of the decision rule. Suppose we have a kid with great scores, great grades, etc. For this kid, x i β is large. What will prevent admission? Only a large negative ε i What is the probability of observing a large negative ε i ? Very small. - PowerPoint PPT Presentation

Transcript of Logit/Probit Models

Page 1: Logit/Probit Models

1

Logit/Probit Models

Page 2: Logit/Probit Models

2

Making sense of the decision rule

• Suppose we have a kid with great scores, great grades, etc.

• For this kid, xi β is large. • What will prevent admission? Only a large

negative εi

• What is the probability of observing a large negative εi ? Very small.

• Most likely admitted. We estimate a large probability

Page 3: Logit/Probit Models

3

Distribution of Epsilon

0.00

0.10

0.20

0.30

0.40

0.50

-3 -2 -1 0 1 2 3

xβ-xβ

Values of εThat will prevent admission

Values of ε that would allow admission

Page 4: Logit/Probit Models

4

Another example

• Suppose we have a kid with bad scores. • For this kid, xi β is small (even negative). • What will allow admission? Only a large

positive εi

• What is the probability of observing a large positive εi ? Very small.

• Most likely, not admitted, so, we estimate a small probability

Page 5: Logit/Probit Models

5

Distribution of Epsilon

0.00

0.10

0.20

0.30

0.40

0.50

-3 -2 -1 0 1 2 3

xβ -xβ

Values of εthat would allow admission

Values ofε that wouldpreventadmission

Page 6: Logit/Probit Models

6

Normal (probit) Model

• ε is distributed as a standard normal – Mean zero– Variance 1

• Evaluate probability (y=1)– Pr(yi=1) = Pr(εi > - xi β) = 1 – Ф(-xi β)– Given symmetry: 1 – Ф(-xi β) = Ф(xi β)

• Evaluate probability (y=0)– Pr(yi=0) = Pr(εi ≤ - xi β) = Ф(-xi β)– Given symmetry: Ф(-xi β) = 1 - Ф(xi β)

Page 7: Logit/Probit Models

7

• Summary– Pr(yi=1) = Ф(xi β)

– Pr(yi=0) = 1 -Ф(xi β)

• Notice that Ф(a) is increasing a. Therefore, if the x’s increases the probability of observing y, we would expect the coefficient on that variable to be (+)

Page 8: Logit/Probit Models

8

• The standard normal assumption (variance=1) is not critical

• In practice, the variance may be not equal to 1, but given the math of the problem, we cannot separately identify the variance.

Page 9: Logit/Probit Models

9

Logit

• PDF: f(x) = exp(x)/[1+exp(x)]2

• CDF: F(a) = exp(a)/[1+exp(a)]– Symmetric, unimodal distribution– Looks a lot like the normal– Incredibly easy to evaluate the CDF and PDF– Mean of zero, variance > 1 (more variance

than normal)

Page 10: Logit/Probit Models

10

• Evaluate probability (y=1)– Pr(yi=1) = Pr(εi > - xi β) = 1 – F(-xi β)

– Given symmetry: 1 – F(-xi β) = F(xi β)

F(xi β) = exp(xi β)/(1+exp(xi β))

Page 11: Logit/Probit Models

11

• Evaluate probability (y=0)– Pr(yi=0) = Pr(εi ≤ - xi β) = F(-xi β)

– Given symmetry: F(-xi β) = 1 - F(xi β)

– 1 - F(xi β) = 1 /(1+exp(xi β))

• In summary, when εi is a logistic distribution– Pr(yi =1) = exp(xi β)/(1+exp(xi β))

– Pr(yi=0) = 1/(1+exp(xi β))

Page 12: Logit/Probit Models

12

STATA Resources Discrete Outcomes

• “Regression Models for Categorical Dependent Variables Using STATA”– J. Scott Long and Jeremy Freese

• Available for sale from STATA website for $52 (www.stata.com)

• Post-estimation subroutines that translate results– Do not need to buy the book to use the

subroutines

Page 13: Logit/Probit Models

13

• In STATA command line type•net search spost

• Will give you a list of available programs to download

• One is Spostado from http://www.indiana.edu/~jslsoc/stata

• Click on the link and install the files

Page 14: Logit/Probit Models

14

Example: Workplace smoking bans

• Smoking supplements to 1991 and 1993 National Health Interview Survey

• Asked all respondents whether they currently smoke

• Asked workers about workplace tobacco policies

• Sample: indoor workers• Key variables: current smoking and

whether they faced a workplace ban

Page 15: Logit/Probit Models

15

• Data: workplace1.dta

• Sample program: workplace1.doc

• Results: workplace1.log

Page 16: Logit/Probit Models

16

Description of variables in data• . desc;

• storage display value• variable name type format label variable label• ------------------------------------------------------------------------• > -• smoker byte %9.0g is current smoking• worka byte %9.0g has workplace smoking bans• age byte %9.0g age in years• male byte %9.0g male• black byte %9.0g black• hispanic byte %9.0g hispanic• incomel float %9.0g log income• hsgrad byte %9.0g is hs graduate• somecol byte %9.0g has some college• college float %9.0g • -----------------------------------------------------------------------

Page 17: Logit/Probit Models

17

Summary statistics• sum;

• Variable | Obs Mean Std. Dev. Min Max• -------------+--------------------------------------------------------• smoker | 16258 .25163 .433963 0 1• worka | 16258 .6851396 .4644745 0 1• age | 16258 38.54742 11.96189 18 87• male | 16258 .3947595 .488814 0 1• black | 16258 .1119449 .3153083 0 1• -------------+--------------------------------------------------------• hispanic | 16258 .0607086 .2388023 0 1• incomel | 16258 10.42097 .7624525 6.214608 11.22524• hsgrad | 16258 .3355271 .4721889 0 1• somecol | 16258 .2685447 .4432161 0 1• college | 16258 .3293763 .4700012 0 1

Page 18: Logit/Probit Models

18

. * run a linear probability model for comparison purposes;

. * estimate white standard errors to control for heteroskedasticity;

. reg smoker age incomel male black hispanic > hsgrad somecol college worka, robust; Regression with robust standard errors Number of obs = 16258 F( 9, 16248) = 99.26 Prob > F = 0.0000 R-squared = 0.0488 Root MSE = .42336 ------------------------------------------------------------------------------ | Robust smoker | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- age | -.0004776 .0002806 -1.70 0.089 -.0010276 .0000725 incomel | -.0287361 .0047823 -6.01 0.000 -.03811 -.0193621 male | .0168615 .0069542 2.42 0.015 .0032305 .0304926 black | -.0356723 .0110203 -3.24 0.001 -.0572732 -.0140714 hispanic | -.070582 .0136691 -5.16 0.000 -.097375 -.043789 hsgrad | -.0661429 .0162279 -4.08 0.000 -.0979514 -.0343345 somecol | -.1312175 .0164726 -7.97 0.000 -.1635056 -.0989293 college | -.2406109 .0162568 -14.80 0.000 -.272476 -.2087459 worka | -.066076 .0074879 -8.82 0.000 -.080753 -.051399 _cons | .7530714 .0494255 15.24 0.000 .6561919 .8499509 ------------------------------------------------------------------------------

Heteroskedastic consistentStandard errors

Very low R2, typical in LP models

Since OLSReport t-stats

Page 19: Logit/Probit Models

19

. * run probit model;

. probit smoker age incomel male black hispanic > hsgrad somecol college worka; Iteration 0: log likelihood = -9171.443 Iteration 1: log likelihood = -8764.068 Iteration 2: log likelihood = -8761.7211 Iteration 3: log likelihood = -8761.7208 Probit estimates Number of obs = 16258 LR chi2(9) = 819.44 Prob > chi2 = 0.0000 Log likelihood = -8761.7208 Pseudo R2 = 0.0447 ------------------------------------------------------------------------------ smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- age | -.0012684 .0009316 -1.36 0.173 -.0030943 .0005574 incomel | -.092812 .0151496 -6.13 0.000 -.1225047 -.0631193 male | .0533213 .0229297 2.33 0.020 .0083799 .0982627 black | -.1060518 .034918 -3.04 0.002 -.17449 -.0376137 hispanic | -.2281468 .0475128 -4.80 0.000 -.3212701 -.1350235 hsgrad | -.1748765 .0436392 -4.01 0.000 -.2604078 -.0893453 somecol | -.363869 .0451757 -8.05 0.000 -.4524118 -.2753262 college | -.7689528 .0466418 -16.49 0.000 -.860369 -.6775366 worka | -.2093287 .0231425 -9.05 0.000 -.2546873 -.1639702 _cons | .870543 .154056 5.65 0.000 .5685989 1.172487 ------------------------------------------------------------------------------

Same syntax as REG but with probit

Converges rapidly for mostproblems

Report z-statisticsInstead of t-stats

Test that all non-constantTerms are 0

Page 20: Logit/Probit Models

20

. dprobit smoker age incomel male black hispanic > hsgrad somecol college worka;

Probit regression, reporting marginal effects Number of obs = 16258 LR chi2(9) = 819.44 Prob > chi2 = 0.0000Log likelihood = -8761.7208 Pseudo R2 = 0.0447

------------------------------------------------------------------------------ smoker | dF/dx Std. Err. z P>|z| x-bar [ 95% C.I. ]---------+-------------------------------------------------------------------- age | -.0003951 .0002902 -1.36 0.173 38.5474 -.000964 .000174 incomel | -.0289139 .0047173 -6.13 0.000 10.421 -.03816 -.019668 male*| .0166757 .0071979 2.33 0.020 .39476 .002568 .030783 black*| -.0320621 .0102295 -3.04 0.002 .111945 -.052111 -.012013hispanic*| -.0658551 .0125926 -4.80 0.000 .060709 -.090536 -.041174 hsgrad*| -.053335 .013018 -4.01 0.000 .335527 -.07885 -.02782 somecol*| -.1062358 .0122819 -8.05 0.000 .268545 -.130308 -.082164 college*| -.2149199 .0114584 -16.49 0.000 .329376 -.237378 -.192462 worka*| -.0668959 .0075634 -9.05 0.000 .68514 -.08172 -.052072---------+-------------------------------------------------------------------- obs. P | .25163 pred. P | .2409344 (at x-bar)------------------------------------------------------------------------------(*) dF/dx is for discrete change of dummy variable from 0 to 1 z and P>|z| correspond to the test of the underlying coefficient being 0

Page 21: Logit/Probit Models

21

. mfx compute; Marginal effects after probit y = Pr(smoker) (predict) = .24093439 ------------------------------------------------------------------------------ variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X ---------+-------------------------------------------------------------------- age | -.0003951 .00029 -1.36 0.173 -.000964 .000174 38.5474 incomel | -.0289139 .00472 -6.13 0.000 -.03816 -.019668 10.421 male*| .0166757 .0072 2.32 0.021 .002568 .030783 .39476 black*| -.0320621 .01023 -3.13 0.002 -.052111 -.012013 .111945 hispanic*| -.0658551 .01259 -5.23 0.000 -.090536 -.041174 .060709 hsgrad*| -.053335 .01302 -4.10 0.000 -.07885 -.02782 .335527 somecol*| -.1062358 .01228 -8.65 0.000 -.130308 -.082164 .268545 college*| -.2149199 .01146 -18.76 0.000 -.237378 -.192462 .329376 worka*| -.0668959 .00756 -8.84 0.000 -.08172 -.052072 .68514 ------------------------------------------------------------------------------ (*) dy/dx is for discrete change of dummy variable from 0 to 1

Males are 1.7 percentage points more likely to smoke

Those w/ college degree 21.5 % pointsLess likely to smoke

10 years of age reduces smoking rates by 4 tenths of a percentage point10 percent increase in income will reduce smoking By .29 percentage points

Page 22: Logit/Probit Models

22

. * get marginal effect/treatment effects for specific person;

. * male, age 40, college educ, white, without workplace smoking ban;

. * if a variable is not specified, its value is assumed to be;

. * the sample mean. in this case, the only variable i am not;

. * listing is mean log income;

. prchange, x(male=1 age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);

probit: Changes in Predicted Probabilities for smoker

min->max 0->1 -+1/2 -+sd/2 MargEfct age -0.0327 -0.0005 -0.0005 -0.0057 -0.0005 incomel -0.1807 -0.0314 -0.0348 -0.0266 -0.0349 male 0.0198 0.0198 0.0200 0.0098 0.0200 black -0.0390 -0.0390 -0.0398 -0.0126 -0.0398hispanic -0.0817 -0.0817 -0.0855 -0.0205 -0.0857 hsgrad -0.0634 -0.0634 -0.0656 -0.0310 -0.0657 somecol -0.1257 -0.1257 -0.1360 -0.0605 -0.1367 college -0.2685 -0.2685 -0.2827 -0.1351 -0.2888 worka -0.0753 -0.0753 -0.0785 -0.0365 -0.0786

Page 23: Logit/Probit Models

23

• Min->Max: change in predicted probability as x changes from its minimum to its maximum

• 0->1: change in pred. prob. as x changes from 0 to 1• -+1/2: change in predicted probability as x changes from 1/2

unit below base value to 1/2 unit above• -+sd/2: change in predicted probability as x changes from 1/2

standard dev below base to 1/2 standard dev above• MargEfct: the partial derivative of the predicted

probability/rate with respect to a given independent variable

Page 24: Logit/Probit Models

24

. * using a wald test, test the null hypothesis that;

. * all the education coefficients are zero;

. test hsgrad somecol college; ( 1) hsgrad = 0 ( 2) somecol = 0 ( 3) college = 0 chi2( 3) = 504.78 Prob > chi2 = 0.0000

Page 25: Logit/Probit Models

25

. * how to run the same tets with a -2 log like test; . * estimate the unresticted model and save the estimates ; . * in urmodel; . probit smoker age incomel male black hispanic > hsgrad somecol college worka; Iteration 0: log likelihood = -9171.443 Iteration 1: log likelihood = -8764.068 Iteration 2: log likelihood = -8761.7211 Iteration 3: log likelihood = -8761.7208 Delete some results . estimates store urmodel; . * estimate the restricted model. save results in rmodel; . probit smoker age incomel male black hispanic > worka; Iteration 0: log likelihood = -9171.443 Iteration 1: log likelihood = -9022.2473 Iteration 2: log likelihood = -9022.1031 Delete some results . lrtest urmodel rmodel; likelihood-ratio test LR chi2(3) = 520.76 (Assumption: rmodel nested in urmodel) Prob > chi2 = 0.0000

Page 26: Logit/Probit Models

26

Comparing Marginal Effects

Variable LP Probit Logit

age -0.00040 -0.00048 -0.00048

incomel -0.0289 -0.0287 -0.0276

male 0.0167 0.0168 0.0172

Black -0.0321 -0.0357 -0.0342

hispanic -0.0658 -0.0706 -0.0602

hsgrad -0.0533 -0.0661 -0.0514

college -0.2149 -0.2406 -0.2121

worka -0.0669 -0.0661 -0.0658

Page 27: Logit/Probit Models

27

When will results differ?

• Normal and logit PDF/CDF look:– Similar in the mid point of the distribution– Different in the tails

• You obtain more observations in the tails of the distribution when – Samples sizes are large approaches 1 or 0

• These situations will more likely produce differences in estimates

Page 28: Logit/Probit Models

28

probit smoker worka age incomel male black hispanic hsgrad somecol college;matrix betat=e(b); * get beta from probit (1 x k);matrix beta=betat';matrix covp=e(V); * get v/c matric from probit (k x k);

* get means of x -- call it xbar (k x 1);* must be the same order as in the probit statement;matrix accum zz = worka age incomel male black hispanic hsgrad somecol college, means(xbart);matrix xbar=xbart'; * transpose beta; matrix xbeta=beta'*xbar; * get xbeta (scalar);matrix pdf=normalden(xbeta[1,1]); * evaluate std normal pdf at xbarbeta;matrix k=rowsof(beta); * get number of covariates;matrix Ik=I(k[1,1]); * construct I(k);matrix G=Ik-xbeta*beta*xbar'; * construct G;matrix v_c=(pdf*pdf)*G*covp*G'; * get v-c matrix of marginal effects;matrix me= beta*pdf; * get marginal effects;matrix se_me1=cholesky(diag(vecdiag(v_c))); * get square root of main diag;matrix se_me=vecdiag(se_me1)'; *take diagonal values;matrix z_score=vecdiag(diag(me)*inv(diag(se_me)))'; * get z score;matrix results=me,se_me,z_score; * construct results matrix;matrix colnames results=marg_eff std_err z_score; * define column names;matrix list results; * list results;

Page 29: Logit/Probit Models

29

results[10,3] marg_eff std_err z_score worka -.06521255 .00720374 -9.0525984 age -.00039515 .00029023 -1.3615156 incomel -.02891389 .00471728 -6.129356 male .01661127 .00714305 2.3255154 black -.03303852 .0108782 -3.0371321hispanic -.07107496 .01479806 -4.8029926 hsgrad -.05447959 .01359844 -4.0063111 somecol -.11335675 .01408096 -8.0503576 college -.23955322 .0144803 -16.543383 _cons .2712018 .04808183 5.6404217

------------------------------------------------------------------------------ smoker | dF/dx Std. Err. z P>|z| x-bar [ 95% C.I. ]---------+-------------------------------------------------------------------- age | -.0003951 .0002902 -1.36 0.173 38.5474 -.000964 .000174 incomel | -.0289139 .0047173 -6.13 0.000 10.421 -.03816 -.019668 male*| .0166757 .0071979 2.33 0.020 .39476 .002568 .030783 black*| -.0320621 .0102295 -3.04 0.002 .111945 -.052111 -.012013hispanic*| -.0658551 .0125926 -4.80 0.000 .060709 -.090536 -.041174 hsgrad*| -.053335 .013018 -4.01 0.000 .335527 -.07885 -.02782 somecol*| -.1062358 .0122819 -8.05 0.000 .268545 -.130308 -.082164 college*| -.2149199 .0114584 -16.49 0.000 .329376 -.237378 -.192462 worka*| -.0668959 .0075634 -9.05 0.000 .68514 -.08172 -.052072---------+--------------------------------------------------------------------

Page 30: Logit/Probit Models

30

* this is an example of a marginal effect for a dichotomous outcome;* in this case, set the 1st variable worka as 1 or 0;matrix x1=xbar;matrix x1[1,1]=1;matrix x0=xbar;matrix x0[1,1]=0;matrix xbeta1=beta'*x1;matrix xbeta0=beta'*x0;matrix prob1=normal(xbeta1[1,1]);matrix prob0=normal(xbeta0[1,1]);matrix me_1=prob1-prob0;matrix pdf1=normalden(xbeta1[1,1]);matrix pdf0=normalden(xbeta0[1,1]);matrix G1=pdf1*x1 - pdf0*x0;matrix v_c1=G1'*covp*G1;matrix se_me_1=sqrt(v_c1[1,1]);* marginal effect of workplace bans;matrix list me_1;* standard error of workplace a;matrix list se_me_1;

Page 31: Logit/Probit Models

31

symmetric me_1[1,1] c1r1 -.06689591

. * standard error of workplace a;

. matrix list se_me_1;

symmetric se_me_1[1,1] c1r1 .00756336

------------------------------------------------------------------------------ smoker | dF/dx Std. Err. z P>|z| x-bar [ 95% C.I. ]---------+-------------------------------------------------------------------- age | -.0003951 .0002902 -1.36 0.173 38.5474 -.000964 .000174 incomel | -.0289139 .0047173 -6.13 0.000 10.421 -.03816 -.019668 male*| .0166757 .0071979 2.33 0.020 .39476 .002568 .030783 black*| -.0320621 .0102295 -3.04 0.002 .111945 -.052111 -.012013hispanic*| -.0658551 .0125926 -4.80 0.000 .060709 -.090536 -.041174 hsgrad*| -.053335 .013018 -4.01 0.000 .335527 -.07885 -.02782 somecol*| -.1062358 .0122819 -8.05 0.000 .268545 -.130308 -.082164 college*| -.2149199 .0114584 -16.49 0.000 .329376 -.237378 -.192462 worka*| -.0668959 .0075634 -9.05 0.000 .68514 -.08172 -.052072---------+--------------------------------------------------------------------

Page 32: Logit/Probit Models

32

Logit and Standard Normal CDF

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.45

-7 -5 -3 -1 1 3 5 7

X

Y

Standard Normal

Logit

Page 33: Logit/Probit Models

33

Pseudo R2

• LLk log likelihood with all variables• LL1 log likelihood with only a constant• 0 > LLk > LL1 so | LLk | < |LL1|

• Pseudo R2 = 1 - |LL1/LLk| • Bounded between 0-1• Not anything like an R2 from a regression

Page 34: Logit/Probit Models

34

Predicting Y

• Let b be the estimated value of β

• For any candidate vector of xi , we can predict probabilities, Pi

• Pi = Ф(xib)

• Once you have Pi, pick a threshold value, T, so that you predict

• Yp = 1 if Pi > T

• Yp = 0 if Pi ≤ T

• Then compare, fraction correctly predicted

Page 35: Logit/Probit Models

35

• Question: what value to pick for T?

• Can pick .5 – what some textbooks suggest– Intuitive. More likely to engage in the activity

than to not engage in it– When is small (large), this criteria does a

poor job of predicting Yi=1 (Yi=0)

Page 36: Logit/Probit Models

36

• *predict probability of smoking;• predict pred_prob_smoke;• * get detailed descriptive data about predicted

prob;• sum pred_prob, detail;

• * predict binary outcome with 50% cutoff;• gen pred_smoke1=pred_prob_smoke>=.5;• label variable pred_smoke1 "predicted smoking, 50%

cutoff";

• * compare actual values;• tab smoker pred_smoke1, row col cell;

Page 37: Logit/Probit Models

37

. predict pred_prob_smoke; (option p assumed; Pr(smoker)) . * get detailed descriptive data about predicted prob; . sum pred_prob, detail; Pr(smoker) ------------------------------------------------------------- Percentiles Smallest 1% .0959301 .0615221 5% .1155022 .0622963 10% .1237434 .0633929 Obs 16258 25% .1620851 .0733495 Sum of Wgt. 16258 50% .2569962 Mean .2516653 Largest Std. Dev. .0960007 75% .3187975 .5619798 90% .3795704 .5655878 Variance .0092161 95% .4039573 .5684112 Skewness .1520254 99% .4672697 .6203823 Kurtosis 2.149247

Mean of predictedY is always close to actual mean(0.25163 in this case)

Predicted values closeTo sample mean of y

No one predicted to have a High probability of smokingBecause mean of Y closer to 0

Page 38: Logit/Probit Models

38

Some nice properties of the Logit

• Outcome, y=1 or 0• Treatment, x=1 or 0• Other covariates, x

• Context, – x = whether a baby is born with a low weight

birth– x = whether the mom smoked or not during

pregnancy

Page 39: Logit/Probit Models

39

• Risk ratio

RR = Prob(y=1|x=1)/Prob(y=1|x=0)

Differences in the probability of an event when x is and is not observed

How much does smoking elevate the chance your child will be a low weight birth

Page 40: Logit/Probit Models

40

• Let Yyx be the probability y=1 or 0 given x=1 or 0

• Think of the risk ratio the following way

• Y11 is the probability Y=1 when X=1• Y10 is the probability Y=1 when X=0

• Y11 = RR*Y10

Page 41: Logit/Probit Models

41

• Odds Ratio OR=A/B = [Y11/Y01]/[Y10/Y00]

A = [Pr(Y=1|X=1)/Pr(Y=0|X=1)]

= odds of Y occurring if you are a smoker

B = [Pr(Y=1|X=0)/Pr(Y=0|X=0)]

= odds of Y happening if you are not a smoker

What are the relative odds of Y happening if you do or do not experience X

Page 42: Logit/Probit Models

42

• Suppose Pr(Yi =1) = F(βo+ β1Xi + β2Z) and F is the logistic function

• Can show that

• OR = exp(β1) = e β1

• This number is typically reported by most statistical packages

Page 43: Logit/Probit Models

43

• Details• Y11 = exp(βo+ β1 + β2Z) /(1+ exp(βo+ β1+ β2Z) )

• Y10 = exp(βo+ β2Z)/(1+ exp(βo+β2Z))

• Y01 = 1 /(1+ exp(βo+ β1 + β2Z) )

• Y00 = 1/(1+ exp(βo+β2Z)

• [Y11/Y01] = exp(βo+ β1 + β2Z)

• [Y10/Y00] = exp(βo+ β2Z)

• OR=A/B = [Y11/Y01]/[Y10/Y00]

= exp(βo+ β1 + β2Z)/ exp(βo + β2Z)

= exp(β1)

Page 44: Logit/Probit Models

44

• Suppose Y is rare, mean is close to 0– Pr(Y=0|X=1) and Pr(Y=0|X=0) are both close

to 1, so they cancel

• Therefore, when mean is close to 0– Odds Ratio ≈ Risk Ratio

• Why is this nice?

Page 45: Logit/Probit Models

45

Population Attributable Risk

• PAR

• Fraction of outcome Y attributed to X

• Let xs be the fraction use of x

• PAR = (RR – 1)xs /[(1-xs) + RRxs]

• Derived on next 2 slides

Page 46: Logit/Probit Models

46

Population attributable risk

• Average outcome in the population

• yc = (1-xs) Y10 + xs Y11 = (1- xs)Y10 + xs (RR)Y10

• Average outcomes are a weighted average of outcomes for X=0 and X=1

• What would the average outcome be in the absence of X (e.g., reduce smoking rates to 0)?

• Ya = Y10

Page 47: Logit/Probit Models

47

• Therefore – yc = current outcome

– Ya = Y10 outcome with zero smoking

– PAR = (yc – Ya)/yc

– Substitute definition of Ya and yc

– Reduces to (RR – 1)xs /[(1-xs) + RRxs]

Page 48: Logit/Probit Models

48

Example: Maternal Smoking and Low Weight Births

• 6% births are low weight– < 2500 grams – Average birth is 3300 grams (5.5 lbs)

• Maternal smoking during pregnancy has been identified as a key cofactor– 13% of mothers smoke – This number was falling about 1 percentage

point per year during 1980s/90s– Doubles chance of low weight birth

Page 49: Logit/Probit Models

49

Natality detail data

• Census of all births (4 million/year)• Annual files starting in the 60s• Information about

– Baby (birth weight, length, date, sex, plurality, birth injuries)

– Demographics (age, race, marital, educ of mom)– Birth (who delivered, method of delivery)– Health of mom (smoke/drank during preg, weight

gain)

Page 50: Logit/Probit Models

50

• Smoking not available from CA or NY

• ~3 million usable observations

• I pulled .5% random sample from 1995

• About 12,500 obs

• Variables: birthweight (grams), smoked, married, 4-level race, 5 level education, mothers age at birth

Page 51: Logit/Probit Models

51

• Notice a few things– 13.7% of women smoke– 6% have low weight birth

• Pr(LBW | Smoke) =10.28%

• Pr(LBW |~ Smoke) = 5.36%

• RR

= Pr(LBW | Smoke)/ Pr(LBW |~ Smoke)

= 0.1028/0.0536 = 1.92

RawNumbers

Page 52: Logit/Probit Models

52

Asking for odds ratios

• Logistic y x1 x2;

• In this case

• xi: logistic lowbw smoked age married i.educ5 i.race4;

Page 53: Logit/Probit Models

53

PAR

• PAR = (RR – 1) xs /[(1- xs) + RR xs]

• xs= 0.137• RR = 1.96

• PAR = 0.116• 11.6% of low weight births attributed to

maternal smoking

Page 54: Logit/Probit Models
Page 55: Logit/Probit Models
Page 56: Logit/Probit Models

0 1

0

1 1

*Pr( 1) *

1 10.045 5.3 / 22.222 0.239

22.222

DY D

Page 57: Logit/Probit Models
Page 58: Logit/Probit Models

Endowment effect

• Ask group to fill out a survey• As a thank you, give them a coffee mug

– Have the mug when they fill out the survey

• After the survey, offer them a trade of a candy bar for a mug

• Reverse the experiment – offer candy bar, then trade for a mug

• Comparison sample – give them a choice of mug/candy after survey is complete

Page 59: Logit/Probit Models
Page 60: Logit/Probit Models

Contrary to simply consumer choice model

• Standard util. theory model assume MRS between two good is symmetric

• Lack of trading suggests an “endowment” effect– People value the good more once they own it– Generates large discrepancies between WTP

and WTA

Page 61: Logit/Probit Models

Policy implications

• Example:– A) How much are you willing to pay for clean

air?– B) How much do we have to pay you to allow

someone to pollute– Answer to B) orders of magnitude larger than A)– Prior – estimate WTP via A and assume equals

WTA

• Thought of as loss aversion –

Page 62: Logit/Probit Models

Problem

• Artificial situations

• Inexperienced may not know value of the item

• Solution: see how experienced actors behave when they are endowed with something they can easily value

• Two experiments: baseball card shows and collectible pins

Page 63: Logit/Probit Models

Baseball cards

• Two pieces of memorabilia– Game stub from game Cal Ripken Jr set the

record for consecutive games played (vs. KC, June 14, 1996)

– Certificate commemorating Nolan Ryans’ 300th win

• Ask people to fill out a 5 min survey. In return, they receive one of the pieces, then ask for a trade

Page 64: Logit/Probit Models
Page 65: Logit/Probit Models
Page 66: Logit/Probit Models