Economics 105: Statistics
• Go over GH 21 due Wednesday
• GH 22 due Friday
Nonlinear Relationships
• The relationship between the outcome and the explanatory variable may not be linear
• Make a scatterplot to examine the relationship
• Example: Quadratic model
• Example: Log transformations
• Log always means natural log (ln) in economics
Quadratic Regression Model
Model form:
Yi = β0 + β1 X1i + β2 X1i² + εi
• where:
β0 = Y intercept
β1 = regression coefficient for linear effect of X on Y
β2 = regression coefficient for quadratic effect of X on Y
εi = random error in Y for observation i
Linear vs. Nonlinear Fit
• Linear fit does not give random residuals
• Nonlinear fit gives random residuals
[Figure: for each fit, a plot of Y against X and a plot of the residuals against X]
Quadratic Regression Model
Quadratic models may be considered when the scatter diagram takes on one of the following shapes:
[Figure: four plots of Y against X1, one for each combination of signs]
• β1 < 0, β2 > 0
• β1 > 0, β2 > 0
• β1 < 0, β2 < 0
• β1 > 0, β2 < 0
β1 = the coefficient of the linear term
β2 = the coefficient of the squared term
Testing the Overall Quadratic Model
• Estimate the quadratic model to obtain the regression equation:
Ŷi = b0 + b1 X1i + b2 X1i²
• Test for Overall Relationship
H0: β1 = β2 = 0 (X does not have a significant effect on Y)
H1: β1 and/or β2 ≠ 0 (X does have a significant effect on Y)
– F-test statistic = MSR / MSE
Testing for Significance: Quadratic Effect
• t-test
H0: β2 = 0
H1: β2 ≠ 0
Example: Quadratic Model
• Purity increases as filter time increases:

Purity   Filter Time
   3          1
   7          2
   8          3
  15          5
  22          7
  33          8
  40         10
  54         12
  67         13
  70         14
  78         15
  85         15
  87         16
  99         17
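The two fits discussed in this example can be reproduced from the purity data listed above. The following is a minimal sketch, assuming NumPy is available; the helper name `fit_polynomial` is illustrative and not part of the lecture. The printed coefficients can be compared with the regression output reported on the following slides.

```python
import numpy as np

# Filter time (X) and purity (Y) for the 14 observations in the example.
time = np.array([1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17], dtype=float)
purity = np.array([3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99], dtype=float)


def fit_polynomial(x, y, degree):
    """Least-squares fit of y on 1, x, ..., x**degree.

    Returns the coefficients ordered b0, b1, ..., as in the slide notation.
    (Hypothetical helper, written for this sketch.)
    """
    design = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


linear = fit_polynomial(time, purity, 1)     # b0, b1
quadratic = fit_polynomial(time, purity, 2)  # b0, b1, b2

print("linear:   ", np.round(linear, 3))
print("quadratic:", np.round(quadratic, 3))
```

Building the design matrix explicitly makes it visible that the quadratic model is still linear in the coefficients: Time² is simply one more column.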
Example: Quadratic Model (continued)
• Simple regression results:
Predicted Purity = -11.283 + 5.985 Time

            Coefficients   Standard Error   t Stat     P-value
Intercept   -11.28267      3.46805          -3.25332   0.00691
Time          5.98520      0.30966          19.32819   2.07E-10

Regression Statistics
R Square            0.96888
Adjusted R Square   0.96628
Standard Error      6.15997

F          Significance F
373.5790   2.0778E-10

t statistic, F statistic, and r² are all high, but the residuals are not random.
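One way to see the non-random residuals without a plot is to list the residuals from the straight-line fit in order of filter time. This is a sketch only, assuming NumPy; the grouping into low, middle, and high filter times is a choice made here for illustration.

```python
import numpy as np

time = np.array([1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17], dtype=float)
purity = np.array([3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99], dtype=float)

# Straight-line fit; np.polyfit returns the highest power first (slope, intercept).
slope, intercept = np.polyfit(time, purity, 1)
residuals = purity - (intercept + slope * time)

for x, e in zip(time, residuals):
    print(f"time={x:4.0f}  residual={e:7.2f}")

# Illustrative grouping: the three lowest, five middle, and three highest times.
low, middle, high = residuals[:3], residuals[3:8], residuals[-3:]
print("low all positive:   ", bool(np.all(low > 0)))
print("middle all negative:", bool(np.all(middle < 0)))
print("high all positive:  ", bool(np.all(high > 0)))
```

A run of positive residuals, then negative, then positive again is the curved pattern that suggests adding a squared term.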
Example: Quadratic Model (continued)
• Quadratic regression results:
Predicted Purity = 1.539 + 1.565 Time + 0.245 (Time)²

               Coefficients   Standard Error   t Stat    P-value
Intercept      1.53870        2.24465          0.68550   0.50722
Time           1.56496        0.60179          2.60052   0.02467
Time-squared   0.24516        0.03258          7.52406   1.165E-5

Regression Statistics
R Square            0.99494
Adjusted R Square   0.99402
Standard Error      2.59513

F          Significance F
1080.733   2.368E-13

The quadratic term is significant and improves the model: r² is higher and SYX is lower, and the residuals are now random.
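The statistics being compared here (r², SYX, the overall F statistic, and the t statistic on each coefficient) can all be computed from the same least-squares fit. The sketch below assumes NumPy and uses a hypothetical helper, `ols_summary`, written for this illustration; it inverts X'X directly, which is fine for a small example like this one but is not how statistical packages usually do it.

```python
import numpy as np

time = np.array([1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17], dtype=float)
purity = np.array([3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99], dtype=float)


def ols_summary(design, y):
    """Return basic regression statistics for y regressed on the design matrix.

    The design matrix must include a column of ones for the intercept.
    """
    n, p = design.shape                     # p counts the intercept column
    k = p - 1                               # number of X variables
    xtx_inv = np.linalg.inv(design.T @ design)
    coefs = xtx_inv @ design.T @ y
    resid = y - design @ coefs
    sse = resid @ resid                     # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)       # total variation
    mse = sse / (n - k - 1)
    return {
        "coefs": coefs,
        "r2": 1 - sse / sst,
        "s_yx": np.sqrt(mse),                          # standard error of the estimate
        "f": ((sst - sse) / k) / mse,                  # MSR / MSE
        "t": coefs / np.sqrt(mse * np.diag(xtx_inv)),  # coefficient / standard error
    }


ones = np.ones_like(time)
linear = ols_summary(np.column_stack([ones, time]), purity)
quadratic = ols_summary(np.column_stack([ones, time, time ** 2]), purity)

for name, fit in (("linear", linear), ("quadratic", quadratic)):
    print(f"{name:9s} r2={fit['r2']:.5f}  S_YX={fit['s_yx']:.5f}  F={fit['f']:.3f}")
print("t statistic on Time-squared:", round(float(quadratic["t"][2]), 3))
```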
Coefficient of Determination for Multiple Regression
• Reports the proportion of total variation in Y explained by all X variables taken together
• Consider this model
Multiple Coefficient of Determination (continued)

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA        df   SS          MS          F         Significance F
Regression    2   29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306    2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389         2.68285   0.01993    57.58835   555.46404
Price         -24.97509       10.83213        -2.30565   0.03979   -48.57626    -1.37392
Advertising    74.13096       25.96732         2.85478   0.01449    17.55303   130.70888

52.1% of the variation in pie sales is explained by the variation in price and advertising
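The R Square and F entries in the pie sales output follow directly from the ANOVA sums of squares. A short sketch of that arithmetic, using only the numbers in the table (variable names are chosen here for readability):

```python
# Sums of squares from the ANOVA table for the pie sales regression.
ssr = 29460.027   # regression (explained) sum of squares
sse = 27033.306   # residual (unexplained) sum of squares
n, k = 15, 2      # observations, number of X variables (price, advertising)

sst = ssr + sse                   # total sum of squares
r_squared = ssr / sst             # proportion of variation explained
msr = ssr / k                     # mean square, regression
mse = sse / (n - k - 1)           # mean square, residual
f_stat = msr / mse

print(f"SST = {sst:.3f}")
print(f"R Square = {r_squared:.5f}")
print(f"F = {f_stat:.5f}")
```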
Adjusted R²
• R² never decreases when a new X variable is added to the model
– disadvantage when comparing models
• What is the net effect of adding a new variable?
– We lose a degree of freedom when a new X variable is added
– Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted R² (continued)
• Penalizes excessive use of unimportant variables
• Smaller than r², and can increase, decrease, or stay the same when a new X variable is added
• Useful in comparing among models, but don't rely too heavily on it; use theory and statistical significance
Adjusted R² (continued)
From the pie sales regression output: R Square 0.52148, Adjusted R Square 0.44172, Observations 15
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables
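The adjusted figure can be recovered from R², the sample size n, and the number of X variables k using the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below applies it to the pie sales output; the second call is a hypothetical scenario, not from the lecture, in which a third variable is added that leaves R² unchanged.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n observations and k X variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Pie sales model: R Square 0.52148, 15 observations, 2 X variables.
adj_two_vars = adjusted_r_squared(0.52148, n=15, k=2)

# Hypothetical: a third X variable that adds no explanatory power at all.
adj_three_vars = adjusted_r_squared(0.52148, n=15, k=3)

print(f"k = 2: adjusted R2 = {adj_two_vars:.5f}")
print(f"k = 3: adjusted R2 = {adj_three_vars:.5f}")
```

With R² held fixed, the extra variable only costs a degree of freedom, so the adjusted value falls. That is the penalty described in the bullets above.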
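Average Effect on Y of a change in X in Nonlinear Models
• Consider a change in X1 of ΔX1
• X2 is held constant!
• Average effect on Y is the difference in the population regression models
ΔY = f(X1 + ΔX1, X2) − f(X1, X2), where f is the population regression function
• Estimate of this population difference is
ΔŶ = (predicted Y at X1 + ΔX1, X2) − (predicted Y at X1, X2), using the estimated regression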
Example
• What is the average effect of an increase in Age from 30 to 40 years? 40 to 50 years?
• 2.03*(40 - 30) - 0.02*(1600 - 900) = 20.3 - 14 = 6.3
• 2.03*(50 - 40) - 0.02*(2500 - 1600) = 20.3 - 18 = 2.3
• Units?!
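The arithmetic above is the difference between two predicted values from a quadratic in Age. A small sketch, assuming the estimated coefficients are 2.03 on Age and -0.02 on Age² (read off the calculation above; the intercept and any other regressors cancel in the difference, so they are not needed). The function name is illustrative.

```python
B_AGE = 2.03      # coefficient on Age, taken from the worked arithmetic
B_AGE_SQ = -0.02  # coefficient on Age squared, taken from the worked arithmetic


def predicted_change(age_from, age_to):
    """Change in predicted Y when Age moves from age_from to age_to,
    holding the other X variables constant."""
    return B_AGE * (age_to - age_from) + B_AGE_SQ * (age_to ** 2 - age_from ** 2)


for start, end in ((30, 40), (40, 50)):
    print(f"Age {start} -> {end}: {predicted_change(start, end):.1f}")
```

The same ten-year change in Age has a different effect depending on the starting age, which is the point of a nonlinear model: the effect of X on Y is not a single number.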
http://xkcd.com/985/