
Transcript
Page 1:

The Multiple Regression Model

Page 2:

Two Explanatory Variables

yt = β1 + β2xt2 + β3xt3 + εt

The partial derivatives ∂yt/∂xt2 = β2 and ∂yt/∂xt3 = β3 show how xt2 and xt3 each affect yt separately.

But least squares estimation of β2 now depends upon both xt2 and xt3.

Page 3:

Correlated Variables

yt = output, xt2 = capital, xt3 = labor

Always 5 workers per machine.

If number of workers per machine is never varied, it becomes impossible to tell if the machines or the workers are responsible for changes in output.

yt = β1 + β2xt2 + β3xt3 + εt

Page 4:

The General Model

yt = β1 + β2xt2 + β3xt3 + . . . + βKxtK + εt

The parameter β1 is the intercept (constant) term.

The variable attached to β1 is xt1 = 1.

Usually, the number of explanatory variables is said to be K − 1 (ignoring xt1 = 1), while the number of parameters is K. (Namely: β1 . . . βK.)

Page 5:

Statistical Properties of εt

1. E(εt) = 0

2. var(εt) = σ²

3. cov(εt, εs) = 0 for t ≠ s

4. εt ~ N(0, σ²)

Page 6:

Statistical Properties of yt

1. E(yt) = β1 + β2xt2 + . . . + βKxtK

2. var(yt) = var(εt) = σ²

3. cov(yt, ys) = cov(εt, εs) = 0 for t ≠ s

4. yt ~ N(β1 + β2xt2 + . . . + βKxtK, σ²)

Page 7:

Assumptions

1. yt = β1 + β2xt2 + . . . + βKxtK + εt

2. E(yt) = β1 + β2xt2 + . . . + βKxtK

3. var(yt) = var(εt) = σ²

4. cov(yt, ys) = cov(εt, εs) = 0 for t ≠ s

5. The values of xtk are not random.

6. yt ~ N(β1 + β2xt2 + . . . + βKxtK, σ²)

Page 8:

Least Squares Estimation

yt = β1 + β2xt2 + β3xt3 + εt

S(β1, β2, β3) = Σt=1..T (yt − β1 − β2xt2 − β3xt3)²

Define the deviations from the sample means:

y*t = yt − ȳ
x*t2 = xt2 − x̄2
x*t3 = xt3 − x̄3

Page 9:

Least Squares Estimators

b2 = [Σ y*t x*t2 Σ x*t3² − Σ y*t x*t3 Σ x*t2 x*t3] / [Σ x*t2² Σ x*t3² − (Σ x*t2 x*t3)²]

b3 = [Σ y*t x*t3 Σ x*t2² − Σ y*t x*t2 Σ x*t2 x*t3] / [Σ x*t2² Σ x*t3² − (Σ x*t2 x*t3)²]

b1 = ȳ − b2x̄2 − b3x̄3
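As a concrete illustration, here is a minimal numpy sketch of these deviation-form estimators (the function and argument names are illustrative, not from the slides):

```python
import numpy as np

def ols_two_regressors(y, x2, x3):
    """Compute b1, b2, b3 from the deviation-from-means formulas above."""
    ys, x2s, x3s = y - y.mean(), x2 - x2.mean(), x3 - x3.mean()  # y*, x2*, x3*
    denom = (x2s**2).sum() * (x3s**2).sum() - ((x2s * x3s).sum())**2
    b2 = ((ys * x2s).sum() * (x3s**2).sum()
          - (ys * x3s).sum() * (x2s * x3s).sum()) / denom
    b3 = ((ys * x3s).sum() * (x2s**2).sum()
          - (ys * x2s).sum() * (x2s * x3s).sum()) / denom
    b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()
    return b1, b2, b3
```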

Page 10:

Dangers of Extrapolation

Statistical models generally are good only within the relevant range. This means that extending them to extreme data values outside the range of the original data often leads to poor and sometimes ridiculous results.

If height is normally distributed and the normal ranges from minus infinity to plus infinity, pity the man minus three feet tall.

Page 11:

Interpretation of Coefficients

bj represents an estimate of the mean change in y in response to a one-unit change in xj when all other independent variables are held constant. Hence, bj is called a partial coefficient.

Note that regression analysis cannot be interpreted as a procedure for establishing a cause-and-effect relationship between variables.

Page 12:

Universal Set

[Venn diagram: within the universal set B, the circles for x2 and x3 overlap; the non-overlapping regions x2 \ x3 and x3 \ x2 represent the independent variation in each variable.]

Page 13:

Error Variance Estimation

Unbiased estimator of the error variance:

σ̂² = Σ ε̂t² / (T − K)

Transform to a chi-square distribution:

(T − K) σ̂² / σ² ~ χ²(T−K)
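A one-function sketch of this estimator, assuming `resid` holds the least squares residuals ε̂t as a numpy array:

```python
import numpy as np

def sigma2_hat(resid, K):
    """Unbiased error-variance estimate: SSE divided by (T - K)."""
    return (resid**2).sum() / (resid.size - K)
```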

Page 14:

Gauss-Markov Theorem

Under the first five assumptions of the multiple regression model, the ordinary least squares estimators have the smallest variance of all linear and unbiased estimators. This means that the least squares estimators are the Best Linear Unbiased Estimators (BLUE).

Page 15:

Variances

yt = β1 + β2xt2 + β3xt3 + εt

var(b2) = σ² / [(1 − r23²) Σ(xt2 − x̄2)²]

var(b3) = σ² / [(1 − r23²) Σ(xt3 − x̄3)²]

where r23 = Σ(xt2 − x̄2)(xt3 − x̄3) / √[Σ(xt2 − x̄2)² Σ(xt3 − x̄3)²]

When r23 = 0, these reduce to the simple regression formulas.

Page 16:

Variance Decomposition

The variance of an estimator is smaller when:

1. The error variance, σ², is smaller: σ² → 0.

2. The sample size, T, is larger: Σt=1..T (xt2 − x̄2)² grows with T.

3. The variable values are more spread out: Σ(xt2 − x̄2)² is larger.

4. The correlation is close to zero: r23 → 0.

Page 17:

Covariances

yt = β1 + β2xt2 + β3xt3 + εt

cov(b2, b3) = −r23 σ² / [(1 − r23²) √(Σ(xt2 − x̄2)²) √(Σ(xt3 − x̄3)²)]

where r23 = Σ(xt2 − x̄2)(xt3 − x̄3) / √[Σ(xt2 − x̄2)² Σ(xt3 − x̄3)²]
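The variance and covariance formulas on the last two slides translate directly into numpy; in this sketch `sigma2` stands in for σ² (in practice the estimate σ̂²), and the names are illustrative:

```python
import numpy as np

def coef_var_cov(x2, x3, sigma2):
    """var(b2), var(b3), and cov(b2, b3) from the deviation-form formulas."""
    x2s, x3s = x2 - x2.mean(), x3 - x3.mean()
    s22, s33 = (x2s**2).sum(), (x3s**2).sum()
    r23 = (x2s * x3s).sum() / np.sqrt(s22 * s33)   # sample correlation of x2, x3
    var_b2 = sigma2 / ((1 - r23**2) * s22)
    var_b3 = sigma2 / ((1 - r23**2) * s33)
    cov_b2_b3 = -r23 * sigma2 / ((1 - r23**2) * np.sqrt(s22 * s33))
    return var_b2, var_b3, cov_b2_b3
```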

Page 18:

Covariance Decomposition

The covariance between any two estimators is larger in absolute value when:

1. The error variance, σ², is larger.

2. The sample size, T, is smaller.

3. The values of the variables are less spread out.

4. The correlation, r23, is high.

Page 19:

Var-Cov Matrix

yt = β1 + β2xt2 + β3xt3 + εt

The least squares estimators b1, b2, and b3 have covariance matrix:

                 [ var(b1)      cov(b1,b2)   cov(b1,b3) ]
cov(b1,b2,b3) =  [ cov(b1,b2)   var(b2)      cov(b2,b3) ]
                 [ cov(b1,b3)   cov(b2,b3)   var(b3)    ]

Page 20:

Normal

yt = β1 + β2xt2 + β3xt3 + . . . + βKxtK + εt

εt ~ N(0, σ²) implies and is implied by:

yt ~ N(β1 + β2xt2 + β3xt3 + . . . + βKxtK, σ²)

Since bk is a linear function of the yt:

bk ~ N(βk, var(bk))

z = (bk − βk) / √var(bk) ~ N(0, 1) for k = 1, 2, . . . , K

Page 21:

Student-t

t = (bk − βk) / √var̂(bk) = (bk − βk) / se(bk)

Since the population variance of bk, var(bk), is generally unknown, we estimate it with var̂(bk), which uses σ̂² instead of σ².

t has a Student-t distribution with df = (T − K).

Page 22:

Interval Estimation

P(−tc ≤ (bk − βk)/se(bk) ≤ tc) = 1 − α

tc is the critical value for (T − K) degrees of freedom such that P(t ≥ tc) = α/2.

P(bk − tc se(bk) ≤ βk ≤ bk + tc se(bk)) = 1 − α

Interval endpoints: [bk − tc se(bk), bk + tc se(bk)]
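A small sketch of these interval endpoints using scipy's Student-t quantiles (argument names are illustrative):

```python
from scipy import stats

def t_interval(bk, se_bk, T, K, alpha=0.05):
    """(1 - alpha) interval for beta_k: bk +/- tc * se(bk), with T - K df."""
    tc = stats.t.ppf(1 - alpha / 2, df=T - K)  # critical value tc
    return bk - tc * se_bk, bk + tc * se_bk
```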

Page 23:

Student-t Test

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

Student-t tests can be used to test any linear combination of the regression coefficients:

H0: β2 + β3 + β4 = 1        H0: β1 = 0
H0: 3β2 − 7β3 = 21          H0: β2 − β3 ≤ 5

Every such t-test has exactly T − K degrees of freedom, where K = # of coefficients estimated (including the intercept).

Page 24:

One Tail Test

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: β3 ≤ 0
H1: β3 > 0

t = b3 / se(b3) ~ t(T−K)

df = T − K = T − 4

[Right-tail rejection region: reject H0 when t exceeds the critical value tc.]
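A sketch of this right-tail decision rule (names are illustrative; scipy supplies the critical value):

```python
from scipy import stats

def one_tail_test(b3, se_b3, T, K, alpha=0.05):
    """Test H0: beta3 <= 0 against H1: beta3 > 0 at level alpha."""
    t_stat = b3 / se_b3
    tc = stats.t.ppf(1 - alpha, df=T - K)  # right-tail critical value
    return t_stat, t_stat > tc             # True means reject H0
```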

Page 25:

Two Tail Test

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: β2 = 0
H1: β2 ≠ 0

t = b2 / se(b2) ~ t(T−K)

df = T − K = T − 4

[Two-tail rejection region: reject H0 when t < −tc or t > tc.]

Page 26:

Goodness-of-Fit

Coefficient of Determination

R² = SSR/SST = Σt=1..T (ŷt − ȳ)² / Σt=1..T (yt − ȳ)²

0 ≤ R² ≤ 1

Page 27:

Adjusted R-Squared

Adjusted Coefficient of Determination

Original:

R² = SSR/SST = 1 − SSE/SST

Adjusted:

R̄² = 1 − [SSE/(T−K)] / [SST/(T−1)]
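Both versions follow directly from the residuals; a short numpy sketch with illustrative names:

```python
import numpy as np

def goodness_of_fit(y, y_hat, K):
    """Return R-squared and adjusted R-squared."""
    sse = ((y - y_hat)**2).sum()      # sum of squared errors
    sst = ((y - y.mean())**2).sum()   # total sum of squares
    T = y.size
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (T - K)) / (sst / (T - 1))
    return r2, r2_adj
```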

Page 28:

Computer Output

Table 8.2 Summary of Least Squares Results

Variable      Coefficient   Std Error   t-value   p-value
constant      104.79        6.48        16.17     0.000
price         −6.642        3.191       −2.081    0.042
advertising   2.984         0.167       17.868    0.000

t = b2 / se(b2) = −6.642 / 3.191 = −2.081

Page 29:

Reporting Your Results

Reporting standard errors:

ŷt = 104.79 − 6.642Xt2 + 2.984Xt3
     (6.48)   (3.191)    (0.167)    (s.e.)

Reporting t-statistics:

ŷt = 104.79 − 6.642Xt2 + 2.984Xt3
     (16.17)  (−2.081)   (17.868)   (t)

Page 30:

H0: β2 = 0
H1: β2 ≠ 0

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0 (Restricted Model): yt = β1 + β3Xt3 + β4Xt4 + εt

H1 (Unrestricted Model): yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

Page 31:

Single Restriction F-Test

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: β2 = 0
H1: β2 ≠ 0

F = [(SSER − SSEU)/J] / [SSEU/(T−K)] ~ F(J, T−K) under H0

dfn = J = 1, dfd = T − K = 49

F = [(1964.758 − 1805.168)/1] / [1805.168/(52 − 3)] = 4.33

By definition this is the t-statistic squared: t = −2.081, so F = t² = 4.33
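Once both regressions have been run, the F-statistic is simple arithmetic; this sketch reproduces the slide's numbers:

```python
def f_statistic(sse_r, sse_u, J, T, K):
    """F-test of J restrictions from restricted and unrestricted SSE."""
    return ((sse_r - sse_u) / J) / (sse_u / (T - K))

# Slide example: f_statistic(1964.758, 1805.168, J=1, T=52, K=3) -> about 4.33
```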

Page 32:

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: β2 = 0, β4 = 0
H1: H0 not true

H0 (Restricted Model): yt = β1 + β3Xt3 + εt

H1 (Unrestricted Model): yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

Page 33:

Multiple Restriction F-Test

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

H0: β2 = 0, β4 = 0
H1: H0 not true

dfn = J = 2 (J: the number of restrictions)
dfd = T − K = 49

F = [(SSER − SSEU)/J] / [SSEU/(T−K)] ~ F(J, T−K) under H0

First run the restricted regression by dropping Xt2 and Xt4 to get SSER. Next run the unrestricted regression to get SSEU.

Page 34:

F-Tests

F = [(SSER − SSEU)/J] / [SSEU/(T−K)] ~ F(J, T−K)

F-tests of this type are always right-tailed, even for left-sided or two-sided hypotheses, because any deviation from the null will make the F value bigger (move rightward).

[Density plot of f(F): the rejection region lies to the right of the critical value Fc.]

Page 35:

F-Test of Entire Equation

yt = β1 + β2Xt2 + β3Xt3 + εt

H0: β2 = β3 = 0
H1: H0 not true

dfn = J = 2, dfd = T − K = 49

F = [(SSER − SSEU)/J] / [SSEU/(T−K)]
  = [(13581.35 − 1805.168)/2] / [1805.168/(52 − 3)]
  = 159.828

We ignore β1. Why? (The test concerns only the explanatory variables; the intercept is left unrestricted.)

α = 0.05, Fc = F(2, 49, α = 0.05) = 3.187

Since 159.828 > 3.187, reject H0!

Page 36:

ANOVA Table

Table 8.3 Analysis of Variance Table

Source       DF   Sum of Squares   Mean Square   F-Value
Regression    2        11776.18        5888.09   159.828
Error        49        1805.168          36.84
Total        51        13581.35

p-value: 0.0001

R² = SSR/SST = 11776.18/13581.35 = 0.8671

σ̂² = MSE = 36.84

Page 37:

Nonsample Information

ln(yt) = β1 + β2 ln(Xt2) + β3 ln(Xt3) + β4 ln(Xt4) + εt

A certain production process is known to be Cobb-Douglas with constant returns to scale:

β2 + β3 + β4 = 1, so that β4 = 1 − β2 − β3

ln(yt/Xt4) = β1 + β2 ln(Xt2/Xt4) + β3 ln(Xt3/Xt4) + εt

y*t = β1 + β2X*t2 + β3X*t3 + εt

Run least squares on the transformed model. Interpret the coefficients the same as in the original model.
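A sketch of the data transformation that imposes the restriction (the array names y, X2, X3, X4 are illustrative):

```python
import numpy as np

def transform_crts(y, X2, X3, X4):
    """Variables for the restricted (constant-returns) model."""
    y_star  = np.log(y / X4)    # ln(y_t / X_t4)
    x2_star = np.log(X2 / X4)   # ln(X_t2 / X_t4)
    x3_star = np.log(X3 / X4)   # ln(X_t3 / X_t4)
    return y_star, x2_star, x3_star

# Regress y_star on x2_star and x3_star; recover b4 = 1 - b2 - b3.
```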

Page 38:

Collinear Variables

The term independent variables means an explanatory variable is independent of the error term, but not necessarily independent of other explanatory variables.

Since economists typically have no control over the implicit experimental design, explanatory variables tend to move together, which often makes sorting out their separate influences rather problematic.

Page 39:

Effects of Collinearity

A high degree of collinearity will produce:

1. No least squares output when collinearity is exact.

2. Large standard errors and wide confidence intervals.

3. Insignificant t-values even with a high R² and a significant F-value.

4. Estimates sensitive to the deletion or addition of a few observations or insignificant variables.

5. OLS estimators that retain all their desired properties (BLUE and consistency), but inferential procedures that may be uninformative.

Page 40:

Identifying Collinearity

Evidence of high collinearity includes:

1. A high pairwise correlation between two explanatory variables (greater than .8 or .9).

2. A high R-squared (called Rj²) when regressing one explanatory variable (Xj) on the other explanatory variables. Variance inflation factor (VIF): VIF(bj) = 1/(1 − Rj²); a value greater than 10 signals trouble (see the sketch after this list).

3. A high R² and a statistically significant F-value when the t-values are statistically insignificant.
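A minimal numpy sketch of the auxiliary-regression VIF computation (names are illustrative):

```python
import numpy as np

def vif(xj, X_others):
    """VIF(bj) = 1 / (1 - Rj^2), with Rj^2 from regressing xj on the others."""
    Z = np.column_stack([np.ones(len(xj)), X_others])  # intercept + other X's
    coef, *_ = np.linalg.lstsq(Z, xj, rcond=None)
    resid = xj - Z @ coef
    r2_j = 1 - (resid**2).sum() / ((xj - xj.mean())**2).sum()
    return 1.0 / (1.0 - r2_j)   # VIF above 10 signals serious collinearity
```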

Page 41:

Mitigating Collinearity

High collinearity is not a violation of any least squares assumption, but rather a lack of adequate information in the sample:

1. Collect more data with better information.

2. Impose economic restrictions as appropriate.

3. Impose statistical restrictions when justified.

4. Delete the variable which is highly collinear with other explanatory variables.

Page 42:

Prediction

yt = β1 + β2Xt2 + β3Xt3 + εt

Given a set of values for the explanatory variables, (1, X02, X03), the best linear unbiased predictor of y is given by:

ŷ0 = b1 + b2X02 + b3X03

This predictor is unbiased in the sense that the average value of the forecast error is zero.
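A trivial sketch of the point predictor (names are illustrative):

```python
def predict_y0(b1, b2, b3, x02, x03):
    """Best linear unbiased predictor at regressor values (1, x02, x03)."""
    return b1 + b2 * x02 + b3 * x03
```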