Lecture 15. Dummy variables, continued
Seasonal effects in time series
Consider the relation between electricity consumption $Y$ and electricity price $X$. The data are quarterly time series. First model:
$\ln Y_t = \alpha_1 + \alpha_2 \ln X_t + u_t$
What is the interpretation of $\alpha_2$?
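In a log-log model the slope is usually read as an elasticity, $\alpha_2 = \partial \ln Y_t / \partial \ln X_t$. The sketch below checks this on simulated data, assuming numpy is available; the sample size and the "true" values $\alpha_1 = 2.0$, $\alpha_2 = -0.4$ are hypothetical and not taken from the lecture.

```python
import numpy as np

# Simulated quarterly data; all parameter values are hypothetical.
rng = np.random.default_rng(0)
n = 200
alpha1, alpha2 = 2.0, -0.4            # assumed true intercept and slope
ln_x = rng.normal(0.0, 0.3, n)        # log price
ln_y = alpha1 + alpha2 * ln_x + rng.normal(0.0, 0.05, n)   # log consumption

# OLS of ln Y_t on a constant and ln X_t
X = np.column_stack([np.ones(n), ln_x])
a_hat, *_ = np.linalg.lstsq(X, ln_y, rcond=None)
print("estimated alpha2:", round(float(a_hat[1]), 3))

# Slope = d ln Y / d ln X: a 1% price rise changes predicted consumption by about alpha2 percent
pct_change = 100 * (np.exp(alpha2 * np.log(1.01)) - 1)
print("percent change in consumption after a 1% price rise:", round(float(pct_change), 3))
```

The estimated slope lands close to the assumed elasticity, and a 1% price increase moves predicted consumption by roughly $\alpha_2$ percent.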
Because electricity consumption depends on the weather and on special circumstances (Christmas, summer holidays), we expect it to differ between quarters, even if the price is constant. Solution: define
$D_{1t} = 1$ if $t$ is the first quarter of a year, $D_{1t} = 0$ if not.
Define $D_{2t}, D_{3t}, D_{4t}$ analogously for the other quarters.
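To allow for differences in average consumption between quarters we write
$\alpha_1 = \beta_1 + \beta_2 D_{2t} + \beta_3 D_{3t} + \beta_4 D_{4t}$
Substitution in the regression model gives
$\ln Y_t = \beta_1 + \beta_2 D_{2t} + \beta_3 D_{3t} + \beta_4 D_{4t} + \alpha_2 \ln X_t + u_t$
Why not include $D_{1t}$ in the model?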
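One numerical way to look at this question: with a constant in the model, $D_{1t} + D_{2t} + D_{3t} + D_{4t} = 1$ for every $t$, so the four dummies reproduce the constant column. The sketch below, assuming numpy and a hypothetical 20-year quarterly design (the price regressor is left out because it does not change the argument), compares the rank of the regressor matrix with its number of columns.

```python
import numpy as np

# Hypothetical design: 20 years of quarterly observations, quarter labels 1..4 repeating
quarter = np.tile([1, 2, 3, 4], 20)
D = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3, 4)])
const = np.ones(len(quarter))

# Constant + all four dummies: the dummy columns sum to the constant column,
# so X'X is singular and the OLS coefficients are not uniquely determined.
X_trap = np.column_stack([const, D])
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])

# Dropping D1 (quarter 1 becomes the reference quarter) restores full column rank.
X_ok = np.column_stack([const, D[:, 1:]])
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])
```

The first design has rank 4 with 5 columns; after dropping $D_{1t}$ the rank equals the number of columns.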
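Average electricity consumption in the four quarters (given price $X_t$):
Quarter 1: $E(\ln Y_t \mid X_t) = \beta_1 + \alpha_2 \ln X_t$
Quarter 2: $E(\ln Y_t \mid X_t) = \beta_1 + \beta_2 + \alpha_2 \ln X_t$
Quarter 3: $E(\ln Y_t \mid X_t) = \beta_1 + \beta_3 + \alpha_2 \ln X_t$
Quarter 4: $E(\ln Y_t \mid X_t) = \beta_1 + \beta_4 + \alpha_2 \ln X_t$
Interpretation of $\beta_2, \beta_3, \beta_4$: (approximate) relative change, relative to quarter 1, of electricity consumption in quarters 2, 3, 4 at the same price.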
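A minimal estimation sketch of the seasonal-dummy model, assuming numpy and simulated data: the values $\beta_1 = 3.0$, $\beta_2 = -0.10$, $\beta_3 = -0.25$, $\beta_4 = 0.15$, $\alpha_2 = -0.4$ are hypothetical. It recovers the four quarter intercepts from the OLS estimates in the way the table above describes.

```python
import numpy as np

# Simulated quarterly data; the "true" parameters below are hypothetical.
rng = np.random.default_rng(1)
quarter = np.tile([1, 2, 3, 4], 40)               # 40 years of quarters
n = quarter.size
beta1, alpha2 = 3.0, -0.4
shift = {1: 0.0, 2: -0.10, 3: -0.25, 4: 0.15}     # beta_q for q = 2, 3, 4 (0 for quarter 1)
ln_x = rng.normal(0.0, 0.3, n)
ln_y = (beta1 + np.array([shift[q] for q in quarter.tolist()])
        + alpha2 * ln_x + rng.normal(0.0, 0.05, n))

# Regressors: constant, D2, D3, D4, ln X (quarter 1 is the reference quarter)
D2, D3, D4 = [(quarter == q).astype(float) for q in (2, 3, 4)]
X = np.column_stack([np.ones(n), D2, D3, D4, ln_x])
b, *_ = np.linalg.lstsq(X, ln_y, rcond=None)

# Quarter-specific intercepts of E(ln Y_t | X_t)
intercepts = np.array([b[0], b[0] + b[1], b[0] + b[2], b[0] + b[3]])
print("intercepts by quarter:", np.round(intercepts, 3))
# b[1:4] are log differences w.r.t. quarter 1; exp(b) - 1 converts them to proportional changes
print("relative change vs quarter 1:", np.round(np.exp(b[1:4]) - 1, 3))
```

For small coefficients the log difference $\beta_j$ and $e^{\beta_j} - 1$ are close, which is why $\beta_j$ can be read as an approximate relative change.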
Change of reference quarter to quarter 2:
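$\alpha_1 = \gamma_1 + \gamma_2 D_{1t} + \gamma_3 D_{3t} + \gamma_4 D_{4t}$
Intercept in the four quarters (quarters 1, 2, 3, 4):
Reference quarter is quarter 1: $\beta_1,\ \beta_1 + \beta_2,\ \beta_1 + \beta_3,\ \beta_1 + \beta_4$
Reference quarter is quarter 2: $\gamma_1 + \gamma_2,\ \gamma_1,\ \gamma_1 + \gamma_3,\ \gamma_1 + \gamma_4$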
Hence
$\gamma_1 = \beta_1 + \beta_2, \quad \gamma_2 = -\beta_2, \quad \gamma_3 = \beta_3 - \beta_2, \quad \gamma_4 = \beta_4 - \beta_2$
The same relations hold for the OLS estimates, so a change of reference quarter does not require re-estimation. The same result holds for a change of reference category for any qualitative variable with more than two values (e.g. the earlier example with type of work).
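The claim that the relations hold for the OLS estimates can be checked directly. The sketch below, assuming numpy and hypothetical simulated data, estimates the model twice (reference quarter 1, then reference quarter 2) and compares the second set of estimates with the values implied by the first. Both regressions span the same column space, so the identity holds for any data set with full column rank.

```python
import numpy as np

rng = np.random.default_rng(2)
quarter = np.tile([1, 2, 3, 4], 30)
n = quarter.size
ln_x = rng.normal(0.0, 0.3, n)
ln_y = rng.normal(3.0, 0.1, n) - 0.4 * ln_x     # hypothetical data
D = {q: (quarter == q).astype(float) for q in (1, 2, 3, 4)}
one = np.ones(n)

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Reference quarter 1: (beta1, beta2, beta3, beta4, alpha2)
b = ols(np.column_stack([one, D[2], D[3], D[4], ln_x]), ln_y)
# Reference quarter 2: (gamma1, gamma2, gamma3, gamma4, alpha2); gamma2 multiplies D1
g = ols(np.column_stack([one, D[1], D[3], D[4], ln_x]), ln_y)

# gamma estimates implied by the beta estimates, without re-estimating
g_implied = np.array([b[0] + b[1], -b[1], b[2] - b[1], b[3] - b[1], b[4]])
print(np.allclose(g, g_implied))
```

The price coefficient is identical in both parameterizations; only the intercept and dummy coefficients are relabelled.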
If we want to investigate whether the price elasticity depends on the season, we write
$\alpha_2 = \beta_5 + \beta_6 D_{2t} + \beta_7 D_{3t} + \beta_8 D_{4t}$
Substitution gives
$\ln Y_t = \beta_1 + \beta_2 D_{2t} + \beta_3 D_{3t} + \beta_4 D_{4t} + \beta_5 \ln X_t + \beta_6 D_{2t} \ln X_t + \beta_7 D_{3t} \ln X_t + \beta_8 D_{4t} \ln X_t + u_t$
Price elasticities in the four quarters:
Quarter 1: $\beta_5$
Quarter 2: $\beta_5 + \beta_6$
Quarter 3: $\beta_5 + \beta_7$
Quarter 4: $\beta_5 + \beta_8$
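A sketch of the interaction model on simulated data, assuming numpy; the quarter-specific "true" elasticities $-0.20, -0.30, -0.50, -0.25$ and seasonal shifts are hypothetical. The regressor columns are ordered to match $\beta_1, \dots, \beta_8$, and the quarterly elasticities are formed as in the list above.

```python
import numpy as np

rng = np.random.default_rng(3)
quarter = np.tile([1, 2, 3, 4], 50)
n = quarter.size
ln_x = rng.normal(0.0, 0.3, n)

# Hypothetical "true" quarter-specific elasticities and intercept shifts
true_elast = {1: -0.20, 2: -0.30, 3: -0.50, 4: -0.25}
true_shift = {1: 0.00, 2: -0.10, 3: -0.25, 4: 0.15}
q_list = quarter.tolist()
ln_y = (3.0 + np.array([true_shift[q] for q in q_list])
        + np.array([true_elast[q] for q in q_list]) * ln_x
        + rng.normal(0.0, 0.05, n))

D2, D3, D4 = [(quarter == q).astype(float) for q in (2, 3, 4)]
# Columns in the order beta1, ..., beta8
X = np.column_stack([np.ones(n), D2, D3, D4, ln_x, D2 * ln_x, D3 * ln_x, D4 * ln_x])
b, *_ = np.linalg.lstsq(X, ln_y, rcond=None)

# Elasticity per quarter: beta5, beta5 + beta6, beta5 + beta7, beta5 + beta8
elasticities = [b[4], b[4] + b[5], b[4] + b[6], b[4] + b[7]]
print(np.round(elasticities, 3))
```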
Tests:
• Average demand does not change with the season: $H_0: \beta_2 = \beta_3 = \beta_4 = 0$
• Price elasticity constant over seasons: $H_0: \beta_6 = \beta_7 = \beta_8 = 0$
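Both tests restrict three coefficients to zero and can be carried out with an F test that compares residual sums of squares of the restricted and unrestricted regressions. A minimal sketch, assuming numpy and hypothetical simulated data with seasonal intercepts but a common elasticity; the first test is applied in the model without interaction terms, the second in the full interaction model. The helper names `ssr` and `f_stat` are illustrative, not from the lecture.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def f_stat(X_u, X_r, y):
    """F statistic for the exclusion restrictions that reduce X_u to X_r."""
    n, k = X_u.shape
    q = k - X_r.shape[1]                     # number of restrictions
    ssr_u, ssr_r = ssr(X_u, y), ssr(X_r, y)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k)), q, n - k

# Hypothetical quarterly data: seasonal intercepts, one common elasticity
rng = np.random.default_rng(6)
quarter = np.tile([1, 2, 3, 4], 40)
n = quarter.size
ln_x = rng.normal(0.0, 0.3, n)
shift_by_q = {1: 0.0, 2: -0.10, 3: -0.25, 4: 0.15}
shift = np.array([shift_by_q[q] for q in quarter.tolist()])
ln_y = 3.0 + shift - 0.4 * ln_x + rng.normal(0.0, 0.05, n)

one = np.ones(n)
D2, D3, D4 = [(quarter == q).astype(float) for q in (2, 3, 4)]
X_seasonal = np.column_stack([one, D2, D3, D4, ln_x])
X_full = np.column_stack([X_seasonal, D2 * ln_x, D3 * ln_x, D4 * ln_x])

# Test 1, H0: beta2 = beta3 = beta4 = 0 (no seasonal shift in average demand)
F1, q1, df1 = f_stat(X_seasonal, np.column_stack([one, ln_x]), ln_y)
# Test 2, H0: beta6 = beta7 = beta8 = 0 (same price elasticity in every quarter)
F2, q2, df2 = f_stat(X_full, X_seasonal, ln_y)
print(f"F1 = {F1:.1f} with ({q1}, {df1}) df; F2 = {F2:.2f} with ({q2}, {df2}) df")
```

Each statistic is compared with a critical value of the $F(q, n-k)$ distribution; with SciPy installed, `scipy.stats.f.sf(F, q, n - k)` gives the p-value.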
Derive price elasticities if we choose period 2 as reference period.
Structural change
Events may change the relation between economic variables.
• Consider time series data on a dependent variable $Y$ and an independent variable $X$ for, e.g., years $t = 1, \dots, n$.
• In year $t = n_0$ some event happens.
• This event induces a structural change if the regression coefficients change due to the event.
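Original model (no structural change):
$Y_t = \alpha + \beta X_t + u_t, \quad t = 1, \dots, n$
Model with structural change in $n_0$:
$Y_t = \alpha_1 + \beta_1 X_t + u_t, \quad t = 1, \dots, n_0$
$Y_t = (\alpha_1 + \alpha_2) + (\beta_1 + \beta_2) X_t + u_t, \quad t = n_0 + 1, \dots, n$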
This is equivalent to introducing the dummy variable
$D_t = 0$ for $t = 1, \dots, n_0$
$D_t = 1$ for $t = n_0 + 1, \dots, n$, with the model
$Y_t = \alpha_1 + \alpha_2 D_t + \beta_1 X_t + \beta_2 D_t X_t + u_t, \quad t = 1, \dots, n$
The test for structural change can be done in two ways:
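• Estimate separate models for $t = 1, \dots, n_0$ and $t = n_0 + 1, \dots, n$, and compare the sum of their residual sums of squares (ESS) with the ESS of the original model
• Estimate the model with the dummy and test $\alpha_2 = 0, \beta_2 = 0$
Both approaches give the same value of the test statistic.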
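Why the two approaches agree: the dummy model with $D_t$ and $D_t X_t$ fits each subperiod with its own intercept and slope, so its residual sum of squares equals the sum of the two separate regressions. The sketch below, assuming numpy and hypothetical simulated data with a break at $n_0 = 35$, computes the F statistic both ways (the first is commonly called the Chow test).

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

rng = np.random.default_rng(4)
n, n0 = 60, 35                       # hypothetical sample size and break year
t = np.arange(1, n + 1)
x = rng.normal(5.0, 1.0, n)
D = (t > n0).astype(float)           # D_t = 0 for t <= n0, 1 for t > n0
y = 1.0 + 0.5 * x + D * (0.8 + 0.3 * x) + rng.normal(0.0, 0.5, n)   # assumed break
one = np.ones(n)

# Way 1: pooled regression vs. separate regressions for the two subperiods
ssr_pooled = ssr(np.column_stack([one, x]), y)
ssr_1 = ssr(np.column_stack([one[:n0], x[:n0]]), y[:n0])
ssr_2 = ssr(np.column_stack([one[n0:], x[n0:]]), y[n0:])
k = 2                                # coefficients per regime (intercept, slope)
F_chow = ((ssr_pooled - (ssr_1 + ssr_2)) / k) / ((ssr_1 + ssr_2) / (n - 2 * k))

# Way 2: one regression with D_t and D_t * X_t, testing alpha2 = beta2 = 0
ssr_dummy = ssr(np.column_stack([one, D, x, D * x]), y)
F_dummy = ((ssr_pooled - ssr_dummy) / 2) / (ssr_dummy / (n - 4))

print(round(F_chow, 4), round(F_dummy, 4))
```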
Outliers
There may be individual observations that do not fit the relation (see output/graphs). Possible reasons:
• Omitted variables
• Error in the data
• Some unknown event/circumstance
How to check this?
Introduce dummy variable
$D_{23,i} = 1$ for observation 23 (and 0 otherwise). Include this dummy in the regression model and test whether its coefficient is 0. See output.
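A useful fact behind this check: the coefficient of a dummy that is 1 for a single observation $i$ equals the prediction error for $y_i$ from the regression estimated without observation $i$, which in turn equals the ordinary residual $e_i$ divided by $1 - h_{ii}$ (the leverage of observation $i$). The sketch below verifies this on hypothetical simulated data with a planted outlier, assuming numpy; the regressors and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, i = 49, 22                                   # 0-based index 22 is observation 23
# Hypothetical regressors: constant and two continuous variables
X = np.column_stack([np.ones(n), rng.normal(12, 2, n), rng.normal(10, 5, n)])
y = X @ np.array([6.8, 0.05, 0.02]) + rng.normal(0.0, 0.25, n)
y[i] -= 0.5                                     # plant a hypothetical outlier

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Regression including the observation-specific dummy
d = np.zeros(n)
d[i] = 1.0
b_dummy = ols(np.column_stack([X, d]), y)

# Same number two other ways: leave-one-out prediction error, and e_i / (1 - h_ii)
keep = np.arange(n) != i
pred_error = y[i] - X[i] @ ols(X[keep], y[keep])
e = y - X @ ols(X, y)
h_ii = X[i] @ np.linalg.solve(X.T @ X, X[i])
print(b_dummy[-1], pred_error, e[i] / (1 - h_ii))
```

Applied to the outputs that follow, the residual of observation 23 in the first regression (-0.45372) and the D23 coefficient in the second (-0.530696) are consistent with this identity for a leverage of about 0.15.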
Dependent Variable: LNWAGE
Method: Least Squares
Date: 11/01/01   Time: 08:42
Sample: 1 49
Included observations: 49
Variable   Coefficient  Std. Error t-Statistic       Prob.
C             6.864366    0.186127    36.88002      0.0000
EDUC          0.052987    0.017107    3.097432      0.0034
EXPER         0.020776    0.006321    3.286999      0.0020
AGE          -0.002250    0.003804   -0.591382      0.5574
RACE          0.071479    0.081543    0.876575      0.3856
GENDER        0.242610    0.071645    3.386300      0.0015
R-squared             0.470916    Mean dependent var       7.454952
Adjusted R-squared    0.409395    S.D. dependent var       0.312741
S.E. of regression    0.240344    Akaike info criterion    0.100786
Sum squared resid     2.483904    Schwarz criterion        0.332438
Log likelihood        3.530733    F-statistic              7.654508
Durbin-Watson stat    1.708658    Prob(F-statistic)        0.000032
[Graph: LNWAGE residuals against observation number, 1 to 49]
obs     Actual     Fitted    Residual
  1    7.20415    7.20983    -0.00568
  2    7.79770    7.64738     0.15032
  3    7.44717    7.47824    -0.03107
  4    7.28688    7.44899    -0.16212
  5    7.40184    7.57869    -0.17685
  6    7.20415    7.27025    -0.06610
  7    7.37901    7.36391     0.01509
  8    7.04229    7.06440    -0.02211
  9    7.35628    7.78801    -0.43173
 10    7.31055    7.61880    -0.30825
 11    7.11802    7.16206    -0.04404
 12    7.20415    7.19236     0.01179
 13    7.20415    7.36341    -0.15926
 14    8.12829    7.90676     0.22153
 15    7.51698    7.67094    -0.15396
 16    6.88857    7.34406    -0.45549
 17    7.20415    7.54399    -0.33984
 18    7.35628    7.14941     0.20687
 19    7.07918    7.21830    -0.13911
 20    7.20415    7.41777    -0.21362
 21    7.20415    7.35979    -0.15564
 22    7.68110    7.56638     0.11472
 23    7.24566    7.69937    -0.45372
 24    7.65681    7.51358     0.14323
 25    7.70436    7.68690     0.01746
 26    8.18172    7.69434     0.48738
 27    7.58680    7.32344     0.26337
 28    7.11802    7.09935     0.01866
 29    7.56320    7.43966     0.12354
 30    7.68018    7.43267     0.24751
 31    7.76853    7.35286     0.41568
 32    7.20415    7.41537    -0.21122
 33    7.51698    7.28394     0.23304
 34    7.86825    7.65101     0.21724
 35    7.83716    7.72690     0.11026
 36    7.37901    7.39163    -0.01262
 37    7.51698    7.69670    -0.17973
 38    7.70436    7.45990     0.24446
 39    7.33237    7.18733     0.14504
 40    7.28688    7.29798    -0.01110
 41    8.10380    8.01117     0.09263
 42    8.25140    7.75389     0.49751
 43    7.51698    7.48605     0.03093
 44    7.28688    7.29015    -0.00328
 45    7.26753    7.58319    -0.31567
 46    7.65681    7.35894     0.29787
 47    7.51698    7.51702    -4.2E-05
 48    7.16085    7.37542    -0.21458
 49    7.16085    7.20015    -0.03930
Dependent Variable: LNWAGE
Method: Least Squares
Date: 10/29/01   Time: 22:21
Sample: 1 49
Included observations: 49
Variable   Coefficient  Std. Error t-Statistic       Prob.
C             6.789626    0.182398    37.22432      0.0000
GENDER        0.261107    0.069438    3.760286      0.0005
AGE          -0.001271    0.003687   -0.344724      0.7320
EXPER         0.018787    0.006149    3.055107      0.0039
EDUC          0.061945    0.016981    3.647842      0.0007
RACE          0.065118    0.078464    0.829904      0.4113
D23          -0.530696    0.249938   -2.123314      0.0397
R-squared             0.522205    Mean dependent var       7.454952
Adjusted R-squared    0.453948    S.D. dependent var       0.312741
S.E. of regression    0.231101    Akaike info criterion    0.039638
Sum squared resid     2.243118    Schwarz criterion        0.309898
Log likelihood        6.028867    F-statistic              7.650623
Durbin-Watson stat    1.653329    Prob(F-statistic)        0.000014
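The t test on D23 can be cross-checked against the two outputs: for a single restriction, the F statistic built from the sums of squared residuals with and without D23 equals the square of the t statistic. The sketch below uses only numbers reported in the two regressions ($n = 49$, 7 coefficients in the model with D23).

```python
# Numbers copied from the two regression outputs (n = 49)
ssr_without = 2.483904   # Sum squared resid, model without D23 (6 coefficients)
ssr_with = 2.243118      # Sum squared resid, model with D23 (7 coefficients)
t_d23 = -2.123314        # t-Statistic of D23
n, k = 49, 7

# One restriction (coefficient of D23 = 0), so q = 1
F = (ssr_without - ssr_with) / (ssr_with / (n - k))
print(round(F, 3), round(t_d23 ** 2, 3))
```

Both numbers come out at about 4.508, so either statistic leads to the same conclusion at the reported p-value of 0.0397.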
Dependent Variable: LNWAGE
Method: Least Squares
Date: 11/01/01   Time: 08:47
Sample: 1 49
Included observations: 49
Variable   Coefficient  Std. Error t-Statistic       Prob.
C             7.401588    0.160656    46.07096      0.0000
GENDER        0.276639    0.074299    3.723311      0.0006
EXPER         0.017241    0.004698    3.669938      0.0007
EDUC          0.022429    0.013386    1.675632      0.1018
AGE          -0.002105    0.002676   -0.786674      0.4362
RACE          0.095861    0.060972    1.572208      0.1240
D23          -0.293381    0.187884   -1.561503      0.1265
CLERICAL     -0.419411    0.083845   -5.002245      0.0000
CRAFTS       -0.342397    0.081728   -4.189469      0.0002
MAINT        -0.525459    0.092669   -5.670253      0.0000
R-squared             0.778514    Mean dependent var        7.454952
Adjusted R-squared    0.727402    S.D. dependent var        0.312741
S.E. of regression    0.163285    Akaike info criterion    -0.606736
Sum squared resid     1.039816    Schwarz criterion        -0.220650
Log likelihood        24.86503    F-statistic               15.23148
Durbin-Watson stat    1.985725    Prob(F-statistic)         0.000000
Application: Election 2000 in Florida
Effect of the butterfly ballot in Palm Beach County on the Buchanan vote
Data for all Florida counties:
• Votes for the candidates
• Size and demographic composition of the counties (census). What is relevant?
Model
• Dependent variable?
• Independent variables?
• How do we check whether Palm Beach is different?
Election 2000 in Florida: Butterfly ballot in Palm Beach county
• Outcome of 2000 presidential election disputed.
• Claims of voting irregularities in Florida.
• One issue was a confusing ballot design in Palm Beach county, the butterfly ballot.
• Order of punch holes different from the order of the two main candidates, Bush and Gore.
• Claim: Many voters mistakenly voted for Buchanan, the candidate of the Reform Party.
Research question: Did Buchanan get an unusually large fraction of the votes in Palm Beach county?
Regression model
Dependent variable: log of the fraction of votes for Buchanan.
Independent variables
• Percentage of population Hispanic
• Percentage of population Black
• Percentage of population over 65
• Percentage of population with college degree
• Income (in 1000 dollars per year)
• Population (in 10,000s)
Descriptive statistics
Date: 04/06/05   Time: 22:14
Sample: 1 67
               FRACBUCHA    FRACGORE    FRACBUSH   PERCBLACK  PERCHISPAN   PERCOVER6   PERCCOLLE  INCOME1000  POPULATION
Mean            0.004697    0.428125    0.551544    15.89701    6.288060    16.80293    13.89701    26.18864    21.87156
Median          0.003976    0.430705    0.549881    14.40000    3.500000    14.60939    11.90000    25.71800    8.191900
Maximum         0.017452    0.676075    0.741084    61.80000    54.40000    33.43856    37.10000    38.13000    204.4600
Minimum         0.000897    0.241105    0.310129    2.300000    0.900000    6.974674    5.200000    17.09800    0.628900
Std. Dev.       0.003218    0.091383    0.092058    11.07191    8.186436    7.011421    6.588534    4.794646    36.05383
Skewness        1.912928    0.348309   -0.282254    1.926733    3.699223    0.846034    1.203425    0.446578    3.023152
Kurtosis        7.448237    3.331569    3.187830    7.627821    19.94750    2.718083    4.690722    2.364601    13.30769
Jarque-Bera     96.10031    1.661643    0.988110    101.2424    954.6232    8.214687    24.15201    3.354070    398.6674
Probability     0.000000    0.435691    0.610147    0.000000    0.000000    0.016451    0.000006    0.186927    0.000000
Observations          67          67          67          67          67          67          67          67          67
OLS results: baseline equation
• Signs of coefficients plausible?
• Interpretation of coefficients: dependent variable is log!
OLS residuals: Graph
[Graph: LNFRACBUCHANAN residuals against observation number, 1 to 67]
OLS residuals: Table
OLS results: Palm Beach dummy
To check whether Palm Beach is special, include a dummy $d$ that is 1 for Palm Beach (observation 50) and 0 otherwise.
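Interpretation of the Palm Beach dummy:
$\ln y = -2.521 + \dots + 1.794\, d$
Hence $\ln y_{\text{observed}} - \ln y_{\text{normal}} = 1.794$, so that
$\frac{y_{\text{observed}} - y_{\text{normal}}}{y_{\text{normal}}} = e^{1.794} - 1 \approx 5.01$,
i.e. the fraction is about 5 times higher than expected (about 6 times the expected fraction). The observed fraction is 0.00789.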
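The arithmetic behind this interpretation, using only the dummy coefficient (1.794) and the observed Palm Beach fraction (0.00789). The "normal" fraction here is what the model predicts for Palm Beach without the dummy effect, obtained by dividing the observed fraction by $e^{1.794}$; this back-calculation is an illustration added here, not a number reported in the lecture output.

```python
import math

coef_palm_beach = 1.794        # coefficient of the Palm Beach dummy d
observed_fraction = 0.00789    # observed Buchanan fraction in Palm Beach

ratio = math.exp(coef_palm_beach)            # y_observed / y_normal
excess = ratio - 1                           # (y_observed - y_normal) / y_normal
normal_fraction = observed_fraction / ratio  # implied fraction without the dummy effect
print(round(ratio, 3), round(excess, 3), round(normal_fraction, 5))
```

The ratio is about 6.01 and the relative excess about 5.01, so the model would have predicted a Buchanan fraction of roughly 0.0013 for Palm Beach.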
Sensitivity check: include the log of the fraction of Bush votes
Effect on Bush and Gore vote