The Multiple Regression Model
Two Explanatory Variables
yt = β1 + β2xt2 + β3xt3 + εt
∂yt/∂xt2 = β2 and ∂yt/∂xt3 = β3: each coefficient measures how its variable affects yt separately, holding the other variable constant.
But least squares estimation of β2 now depends upon both xt2 and xt3.
Correlated Variables
yt = output    xt2 = capital    xt3 = labor
Always 5 workers per machine.
If number of workers per machine is never varied, it becomes impossible to tell if the machines or the workers are responsible for changes in output.
yt = β1 + β2xt2 + β3xt3 + εt
The General Model
yt = β1 + β2xt2 + β3xt3 + . . . + βKxtK + εt

The parameter β1 is the intercept (constant) term.
The variable attached to β1 is xt1 = 1.
Usually, the number of explanatory variables is said to be K − 1 (ignoring xt1 = 1), while the number of parameters is K. (Namely: β1 . . . βK.)
Statistical Properties of εt
1. E(εt) = 0
2. var(εt) = σ²
3. cov(εt, εs) = 0 for t ≠ s
4. εt ~ N(0, σ²)
Statistical Properties of yt
1. E(yt) = β1 + β2xt2 + . . . + βKxtK
2. var(yt) = var(εt) = σ²
3. cov(yt, ys) = cov(εt, εs) = 0 for t ≠ s
4. yt ~ N(β1 + β2xt2 + . . . + βKxtK, σ²)
Assumptions
1. yt = β1 + β2xt2 + . . . + βKxtK + εt
2. E(yt) = β1 + β2xt2 + . . . + βKxtK
3. var(yt) = var(εt) = σ²
4. cov(yt, ys) = cov(εt, εs) = 0 for t ≠ s
5. The values of xtk are not random.
6. yt ~ N(β1 + β2xt2 + . . . + βKxtK, σ²)
Least Squares Estimation
yt = β1 + β2xt2 + β3xt3 + εt
S(β1, β2, β3) = Σt=1..T (yt − β1 − β2xt2 − β3xt3)²

Define: yt* = yt − ȳ,  xt2* = xt2 − x̄2,  xt3* = xt3 − x̄3

b2 = [Σyt*xt2* · Σxt3*² − Σyt*xt3* · Σxt2*xt3*] / [Σxt2*² · Σxt3*² − (Σxt2*xt3*)²]

b3 = [Σyt*xt3* · Σxt2*² − Σyt*xt2* · Σxt2*xt3*] / [Σxt2*² · Σxt3*² − (Σxt2*xt3*)²]
Least Squares Estimators
b1 = ȳ − b2x̄2 − b3x̄3
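These formulas can be checked numerically. A minimal NumPy sketch with hypothetical data (the arrays below are made up for illustration), cross-checked against the matrix least squares solution:

```python
import numpy as np

# Hypothetical data for illustration (T = 5 observations)
y  = np.array([10.0, 12.0, 15.0, 18.0, 22.0])
x2 = np.array([ 1.0,  2.0,  3.0,  4.0,  5.0])
x3 = np.array([ 2.0,  1.0,  4.0,  3.0,  6.0])

# Deviations from sample means: y* = y - ybar, etc.
ys  = y  - y.mean()
x2s = x2 - x2.mean()
x3s = x3 - x3.mean()

# Denominator shared by b2 and b3
den = (x2s**2).sum() * (x3s**2).sum() - (x2s * x3s).sum()**2

b2 = ((ys * x2s).sum() * (x3s**2).sum()
      - (ys * x3s).sum() * (x2s * x3s).sum()) / den
b3 = ((ys * x3s).sum() * (x2s**2).sum()
      - (ys * x2s).sum() * (x2s * x3s).sum()) / den
b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()

# Cross-check against the matrix solution b = (X'X)^(-1) X'y
X = np.column_stack([np.ones_like(y), x2, x3])
b_check = np.linalg.lstsq(X, y, rcond=None)[0]
print(b1, b2, b3)
print(b_check)
```

The deviation-from-means formulas and the matrix solution give identical estimates.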
Dangers of Extrapolation
Statistical models generally are good only within the relevant range. This means that extending them to extreme data values outside the range of the original data often leads to poor and sometimes ridiculous results.
If height is normally distributed and the normal ranges from minus infinity to plus infinity, pity the man minus three feet tall.
Interpretation of Coefficients
bj represents an estimate of the mean change in y in response to a one-unit change in xj when all other independent variables are held constant. Hence, bj is called the partial coefficient.
Note that regression analysis cannot be interpreted as a procedure for establishing a cause-and-effect relationship between variables.
Universal Set
[Venn diagram: within the universal set B, the variation in x2 and x3 overlaps; only the non-overlapping parts (x2 given x3, and x3 given x2) identify β2 and β3 separately.]
Error Variance Estimation
Unbiased estimator of the error variance:

σ̂² = Σt=1..T ε̂t² / (T − K)

Transform to a chi-square distribution:

(T − K) σ̂² / σ² ~ χ²(T−K)
Gauss-Markov Theorem
Under the first five assumptions of the multiple regression model, the ordinary least squares estimators have the smallest variance of all linear and unbiased estimators. This means that the least squares estimators are the Best Linear Unbiased Estimators (BLUE).
Variances
yt = β1 + β2xt2 + β3xt3 + εt
var(b2) = σ² / [(1 − r23²) Σ(xt2 − x̄2)²]

var(b3) = σ² / [(1 − r23²) Σ(xt3 − x̄3)²]

where r23 = Σ(xt2 − x̄2)(xt3 − x̄3) / √[Σ(xt2 − x̄2)² Σ(xt3 − x̄3)²]

When r23 = 0, these reduce to the simple regression formulas.
Variance Decomposition
The variance of an estimator is smaller when:
1. The error variance, σ², is smaller: σ² → 0.
2. The sample size, T, is larger: Σt=1..T (xt2 − x̄2)² → ∞.
3. The variable values are more spread out: Σ(xt2 − x̄2)² → ∞.
4. The correlation is close to zero: r23² → 0.
Covariances
yt = β1 + β2xt2 + β3xt3 + εt
cov(b2, b3) = −r23 σ² / [(1 − r23²) √(Σ(xt2 − x̄2)² Σ(xt3 − x̄3)²)]

where r23 = Σ(xt2 − x̄2)(xt3 − x̄3) / √[Σ(xt2 − x̄2)² Σ(xt3 − x̄3)²]
Covariance Decomposition
The covariance between any two estimators is larger in absolute value when:
1. The error variance, σ², is larger.
2. The sample size, T, is smaller.
3. The values of the variables are less spread out.
4. The correlation, r23, is high.
Var-Cov Matrix
yt = β1 + β2xt2 + β3xt3 + εt
The least squares estimators b1, b2, and b3 have covariance matrix:

cov(b1, b2, b3) = [ var(b1)      cov(b1,b2)   cov(b1,b3)
                    cov(b1,b2)   var(b2)      cov(b2,b3)
                    cov(b1,b3)   cov(b2,b3)   var(b3)   ]
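In practice this matrix is estimated as σ̂²(X′X)⁻¹. A sketch with hypothetical data, which also confirms that the diagonal entry for b2 matches the var(b2) formula based on r23:

```python
import numpy as np

# Hypothetical data for illustration (T = 6 observations, K = 3 parameters)
y  = np.array([ 9.0, 11.0, 14.0, 16.0, 19.0, 23.0])
x2 = np.array([ 1.0,  2.0,  3.0,  4.0,  5.0,  6.0])
x3 = np.array([ 2.0,  1.0,  4.0,  3.0,  6.0,  5.0])
T, K = len(y), 3

X = np.column_stack([np.ones(T), x2, x3])
b = np.linalg.solve(X.T @ X, X.T @ y)        # least squares estimates
e = y - X @ b                                 # residuals
sigma2_hat = (e @ e) / (T - K)                # unbiased estimate of sigma^2

covb = sigma2_hat * np.linalg.inv(X.T @ X)    # estimated var-cov matrix of b

# Cross-check var(b2) against the r23-based formula
r23 = np.corrcoef(x2, x3)[0, 1]
var_b2 = sigma2_hat / ((1 - r23**2) * ((x2 - x2.mean())**2).sum())
print(covb[1, 1], var_b2)
```

For the two-regressor model with an intercept, the (2,2) element of σ̂²(X′X)⁻¹ and the r23-based formula agree exactly.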
Normal
yt = β1 + β2xt2 + β3xt3 + . . . + βKxtK + εt

yt ~ N(β1 + β2xt2 + . . . + βKxtK, σ²)

This implies and is implied by: εt ~ N(0, σ²)

Since bk is a linear function of the yt:

bk ~ N(βk, var(bk))

z = (bk − βk) / √var(bk) ~ N(0,1)  for k = 1, 2, ..., K
Student-t
t = (bk − βk) / √var̂(bk) = (bk − βk) / se(bk)

Since the population variance of bk, var(bk), is generally unknown, we estimate it with var̂(bk), which uses σ̂² instead of σ².

t has a Student-t distribution with df = (T − K).
Interval Estimation
P(−tc < (bk − βk)/se(bk) < tc) = 1 − α

tc is the critical value for (T − K) degrees of freedom such that P(t ≥ tc) = α/2.

P(bk − tc se(bk) < βk < bk + tc se(bk)) = 1 − α

Interval endpoints: [bk − tc se(bk), bk + tc se(bk)]
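A sketch of the interval computation with simulated data (SciPy is assumed available for the t critical value; the true coefficients below are made up):

```python
import numpy as np
from scipy import stats   # assumed available for the t critical value

rng = np.random.default_rng(4)
T, K = 30, 3
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 2.0 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x2, x3])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
sigma2_hat = (e @ e) / (T - K)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

alpha = 0.05
tc = stats.t.ppf(1 - alpha / 2, df=T - K)   # critical value: P(t > tc) = alpha/2
lower = b - tc * se                          # interval endpoints for each beta_k
upper = b + tc * se
print(list(zip(lower, upper)))
```

Each interval is centered on bk, so bk always lies strictly inside its own interval.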
Student - t Test
yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt
Student-t tests can be used to test any linear combination of the regression coefficients:

H0: β2 + β3 + β4 = 1        H0: β1 = 0
H0: 3β2 − 7β3 = 21          H0: β2 − β3 ≤ 5

Every such t-test has exactly T − K degrees of freedom, where K = # of coefficients estimated (including the intercept).
One Tail Test
yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt
H0: β3 ≤ 0
H1: β3 > 0

t = b3 / se(b3) ~ t(T−K),  df = T − K = T − 4

[Figure: t density with the right-tail rejection region beyond tc]
Two Tail Test
yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt
H0: β2 = 0
H1: β2 ≠ 0

t = b2 / se(b2) ~ t(T−K),  df = T − K = T − 4

[Figure: t density with rejection regions beyond −tc and tc]
Goodness - of - Fit
Coefficient of Determination
R² = SSR / SST = Σt=1..T (ŷt − ȳ)² / Σt=1..T (yt − ȳ)²

0 ≤ R² ≤ 1
Adjusted R-Squared
Adjusted Coefficient of Determination
Original:   R² = SSR / SST = 1 − SSE / SST

Adjusted:   R̄² = 1 − [SSE/(T − K)] / [SST/(T − 1)]
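Both versions can be computed directly from the sums of squares; a sketch with simulated data (true coefficients made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
T, K = 50, 3
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 1.0 + 2.0 * x2 - 1.0 * x3 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x2, x3])
b = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ b

SST = ((y - y.mean())**2).sum()      # total sum of squares
SSR = ((yhat - y.mean())**2).sum()   # regression (explained) sum of squares
SSE = ((y - yhat)**2).sum()          # error sum of squares

R2 = SSR / SST                                    # equals 1 - SSE/SST
R2_adj = 1 - (SSE / (T - K)) / (SST / (T - 1))    # adjusted R-squared
print(R2, R2_adj)
```

With an intercept in the model, SST = SSR + SSE, and the adjusted R² is never larger than the original R².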
Computer Output
Table 8.2 Summary of Least Squares Results

Variable      Coefficient   Std Error   t-value   p-value
constant        104.79        6.48       16.17     0.000
price            -6.642       3.191      -2.081    0.042
advertising       2.984       0.167      17.868    0.000

t = b2 / se(b2) = −6.642 / 3.191 = −2.081
Reporting Your Results
Reporting standard errors:

ŷt = 104.79 − 6.642 Xt2 + 2.984 Xt3
      (6.48)   (3.191)     (0.167)    (s.e.)

Reporting t-statistics:

ŷt = 104.79 − 6.642 Xt2 + 2.984 Xt3
     (16.17)  (−2.081)    (17.868)    (t)
H0: β2 = 0    H1: β2 ≠ 0

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: yt = β1 + β3Xt3 + β4Xt4 + εt
H1: yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: Restricted Model
H1: Unrestricted Model
Single Restriction F-Test
yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt
H0: β2 = 0
H1: β2 ≠ 0

dfn = J = 1    dfd = T − K = 49

F = [(SSER − SSEU)/J] / [SSEU/(T − K)]  ~  F(J, T−K) under H0

  = [(1964.758 − 1805.168)/1] / [1805.168/(52 − 3)] = 4.33

By definition this is the t-statistic squared: t = −2.081, F = t² = 4.33.
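The F = t² identity for a single restriction can be verified numerically; a sketch with simulated data (variable names and true coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 52
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
x4 = rng.normal(size=T)
y = 1.0 + 0.0 * x2 + 2.0 * x3 - 1.0 * x4 + rng.normal(size=T)

XU = np.column_stack([np.ones(T), x2, x3, x4])   # unrestricted model
XR = np.column_stack([np.ones(T), x3, x4])       # restricted: beta2 = 0

def sse(X, y):
    """Fit by least squares, return (SSE, estimates)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e, b

sse_u, bu = sse(XU, y)
sse_r, _  = sse(XR, y)
K = XU.shape[1]
J = 1

F = ((sse_r - sse_u) / J) / (sse_u / (T - K))

# t-statistic for b2 in the unrestricted model
covb = (sse_u / (T - K)) * np.linalg.inv(XU.T @ XU)
t = bu[1] / np.sqrt(covb[1, 1])
print(F, t**2)
```

For a single restriction the two statistics are algebraically identical, so the printed values agree to machine precision.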
yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: yt = β1 + β3Xt3 + εt
H1: yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: Restricted Model
H1: Unrestricted Model

H0: β2 = 0, β4 = 0
H1: H0 not true
Multiple Restriction F-Test
yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: β2 = 0, β4 = 0
H1: H0 not true

dfn = J = 2    (J: the number of restrictions)
dfd = T − K = 49

F = [(SSER − SSEU)/J] / [SSEU/(T − K)]  ~  F(J, T−K) under H0

First run the restricted regression by dropping Xt2 and Xt4 to get SSER. Next run the unrestricted regression to get SSEU.
F-Tests
F = [(SSER − SSEU)/J] / [SSEU/(T − K)]  ~  F(J, T−K)

F-tests of this type are always right-tailed, even for left-sided or two-sided hypotheses, because any deviation from the null makes the F-value larger (it moves rightward).

[Figure: f(F) density with the right-tail rejection region beyond the critical value Fc]
F-Test of Entire Equation
yt = β1 + β2Xt2 + β3Xt3 + εt
H0: β2 = β3 = 0
H1: H0 not true

dfn = J = 2    dfd = T − K = 49

F = [(SSER − SSEU)/J] / [SSEU/(T − K)]

  = [(13581.35 − 1805.168)/2] / [1805.168/(52 − 3)] = 159.828

We ignore β1. Why?

α = 0.05:  Fc = F(2, 49, 0.05) = 3.187

Since 159.828 > 3.187, reject H0!
ANOVA Table
Table 8.3 Analysis of Variance Table

Source        DF   Sum of Squares   Mean Square   F-Value
Regression     2      11776.18        5888.09     159.828
Error         49       1805.168         36.84
Total         51      13581.35

p-value: 0.0001

R² = SSR / SST = 11776.18 / 13581.35 = 0.867

σ̂² = MSE = 36.84
Nonsample Information
ln(yt) = β1 + β2 ln(Xt2) + β3 ln(Xt3) + β4 ln(Xt4) + εt
A certain production process is known to beCobb-Douglas with constant returns to scale.
β2 + β3 + β4 = 1, so that β4 = 1 − β2 − β3
ln(yt/Xt4) = β1 + β2 ln(Xt2/Xt4) + β3 ln(Xt3/Xt4) + εt
yt* = β1 + β2 Xt2* + β3 Xt3* + εt

Run least squares on the transformed model. Interpret the coefficients the same as in the original model.
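A sketch of the restricted estimation under constant returns to scale, with simulated Cobb-Douglas data (the true parameter values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 60
X2 = rng.uniform(1, 10, T)
X3 = rng.uniform(1, 10, T)
X4 = rng.uniform(1, 10, T)

# Simulated technology: beta2=0.3, beta3=0.5, beta4=0.2 (they sum to 1:
# constant returns to scale), with a small log-additive error
lny = (1.0 + 0.3 * np.log(X2) + 0.5 * np.log(X3) + 0.2 * np.log(X4)
       + 0.05 * rng.normal(size=T))

# Transformed (restricted) model: ln(y/X4) = b1 + b2 ln(X2/X4) + b3 ln(X3/X4) + e
ystar  = lny - np.log(X4)
x2star = np.log(X2) - np.log(X4)
x3star = np.log(X3) - np.log(X4)

X = np.column_stack([np.ones(T), x2star, x3star])
b = np.linalg.lstsq(X, ystar, rcond=None)[0]
b4 = 1.0 - b[1] - b[2]        # recovered from the restriction
print(b[1], b[2], b4)
```

The estimates of β2 and β3 come from the transformed regression; β4 is recovered from the restriction rather than estimated freely.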
Collinear Variables
The term independent variables means an explanatory variable is independent of the error term, but not necessarily independent of other explanatory variables.
Since economists typically have no controlover the implicit experimental design,explanatory variables tend to movetogether which often makes sorting outtheir separate influences rather problematic.
Effects of Collinearity
A high degree of collinearity will produce:

1. No least squares output when collinearity is exact.
2. Large standard errors and wide confidence intervals.
3. Insignificant t-values even with a high R² and a significant F-value.
4. Estimates sensitive to the deletion or addition of a few observations or insignificant variables.
5. The OLS estimators retain all their desired properties (BLUE and consistency), but the problem is that the inferential procedure may be uninformative.
Identifying Collinearity
Evidence of high collinearity includes:

1. A high pairwise correlation between two explanatory variables (greater than .8 or .9).

2. A high R-squared (call it Rj²) when regressing one explanatory variable (Xj) on the other explanatory variables. The variance inflation factor, VIF(bj) = 1 / (1 − Rj²), summarizes this; VIF > 10 is a common warning threshold.

3. A high R² and a statistically significant F-value when the individual t-values are statistically insignificant.
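The VIF diagnostic from check 2 can be sketched as follows, with simulated data in which x2 is deliberately built to be nearly collinear with x3 (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
x3 = rng.normal(size=T)
x2 = 0.95 * x3 + 0.1 * rng.normal(size=T)   # highly collinear with x3
x4 = rng.normal(size=T)                     # unrelated to the others

def vif(xj, others):
    """VIF(bj) = 1 / (1 - Rj^2), where Rj^2 comes from
    regressing xj on the other explanatory variables."""
    X = np.column_stack([np.ones(len(xj))] + list(others))
    b = np.linalg.lstsq(X, xj, rcond=None)[0]
    resid = xj - X @ b
    r2 = 1 - (resid @ resid) / ((xj - xj.mean())**2).sum()
    return 1.0 / (1.0 - r2)

print(vif(x2, [x3, x4]))   # large (well above 10): harmful collinearity
print(vif(x4, [x2, x3]))   # near 1: x4 is nearly uncorrelated with the rest
```

A VIF near 1 means the variable shares almost no variation with the other regressors; a VIF above 10 corresponds to Rj² above 0.9.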
Mitigating Collinearity

High collinearity is not a violation of any least squares assumption, but rather a lack of adequate information in the sample:

1. Collect more data with better information.
2. Impose economic restrictions as appropriate.
3. Impose statistical restrictions when justified.
4. Delete the variable which is highly collinear with other explanatory variables.
Prediction
Given a set of values for the explanatory variables, (1, X02, X03), the best linear unbiased predictor of y is given by:

yt = β1 + β2Xt2 + β3Xt3 + εt

ŷ0 = b1 + b2X02 + b3X03

This predictor is unbiased in the sense that the average value of the forecast error is zero.
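A sketch of the point prediction with simulated data (the true coefficients and the point (X02, X03) = (4, 7) are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 40
x2 = rng.uniform(0, 10, T)
x3 = rng.uniform(0, 10, T)
y = 5.0 + 1.5 * x2 - 0.5 * x3 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x2, x3])
b = np.linalg.lstsq(X, y, rcond=None)[0]

x0 = np.array([1.0, 4.0, 7.0])   # (1, X02, X03)
y0_hat = x0 @ b                  # y0_hat = b1 + b2*X02 + b3*X03
print(y0_hat)
```

Because the estimators are unbiased, the prediction is close to the true mean response 5 + 1.5·4 − 0.5·7 = 7.5, with the gap shrinking as T grows.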