The Multiple Regression Model

Slide 1: The Multiple Regression Model

Slide 2: Two Explanatory Variables

$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$

The partial derivatives $\partial y_t / \partial x_{t2} = \beta_2$ and $\partial y_t / \partial x_{t3} = \beta_3$ show how $x_{t2}$ and $x_{t3}$ affect $y_t$ separately. But least squares estimation of $\beta_2$ now depends upon both $x_{t2}$ and $x_{t3}$.

Slide 3: Correlated Variables

$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$, where $y_t$ = output, $x_{t2}$ = capital, $x_{t3}$ = labor.

Suppose there are always 5 workers per machine. If the number of workers per machine is never varied, it becomes impossible to tell whether the machines or the workers are responsible for changes in output.

Slide 4: The General Model

$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \cdots + \beta_K x_{tK} + \varepsilon_t$

The parameter $\beta_1$ is the intercept (constant) term. The variable attached to $\beta_1$ is $x_{t1} = 1$. Usually the number of explanatory variables is said to be $K - 1$ (ignoring $x_{t1} = 1$), while the number of parameters is $K$ (namely $\beta_1, \ldots, \beta_K$).

Slide 5: Statistical Properties of $\varepsilon_t$

1. $E(\varepsilon_t) = 0$
2. $\mathrm{var}(\varepsilon_t) = \sigma^2$
3. $\mathrm{cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \neq s$
4. $\varepsilon_t \sim N(0, \sigma^2)$

Slide 6: Statistical Properties of $y_t$

1. $E(y_t) = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK}$
2. $\mathrm{var}(y_t) = \mathrm{var}(\varepsilon_t) = \sigma^2$
3. $\mathrm{cov}(y_t, y_s) = \mathrm{cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \neq s$
4. $y_t \sim N(\beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK},\ \sigma^2)$

Slide 7: Assumptions

1. $y_t = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK} + \varepsilon_t$
2. $E(y_t) = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK}$
3. $\mathrm{var}(y_t) = \mathrm{var}(\varepsilon_t) = \sigma^2$
4. $\mathrm{cov}(y_t, y_s) = \mathrm{cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \neq s$
5. The values of $x_{tk}$ are not random.
6. $y_t \sim N(\beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK},\ \sigma^2)$

Slide 8: Least Squares Estimation

$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$

$S(\beta_1, \beta_2, \beta_3) = \sum_{t=1}^{T} (y_t - \beta_1 - \beta_2 x_{t2} - \beta_3 x_{t3})^2$

Define the deviations from the sample means: $y_t^* = y_t - \bar{y}$, $x_{t2}^* = x_{t2} - \bar{x}_2$, $x_{t3}^* = x_{t3} - \bar{x}_3$.

Slide 9: Least Squares Estimators

$b_2 = \dfrac{\sum y_t^* x_{t2}^* \sum x_{t3}^{*2} - \sum y_t^* x_{t3}^* \sum x_{t2}^* x_{t3}^*}{\sum x_{t2}^{*2} \sum x_{t3}^{*2} - \left(\sum x_{t2}^* x_{t3}^*\right)^2}$

$b_3 = \dfrac{\sum y_t^* x_{t3}^* \sum x_{t2}^{*2} - \sum y_t^* x_{t2}^* \sum x_{t2}^* x_{t3}^*}{\sum x_{t2}^{*2} \sum x_{t3}^{*2} - \left(\sum x_{t2}^* x_{t3}^*\right)^2}$

$b_1 = \bar{y} - b_2 \bar{x}_2 - b_3 \bar{x}_3$

Slide 10: Dangers of Extrapolation

Statistical models generally are good only within the relevant range. Extending them to extreme data values outside the range of the original data often leads to poor and sometimes ridiculous results. If height is normally distributed and the normal distribution ranges from minus infinity to plus infinity, pity the man minus three feet tall.

Slide 11: Interpretation of Coefficients

$b_j$ is an estimate of the mean change in $y$ in response to a one-unit change in $x_j$ when all other independent variables are held constant; hence $b_j$ is called a partial coefficient. Note that regression analysis cannot be interpreted as a procedure for establishing a cause-and-effect relationship between variables.

Slide 12: Universal Set

[Venn diagram: within the universal set, the variation in $x_2$ and $x_3$ overlaps, showing the regions belonging to $x_2$ only, to $x_3$ only, and to both.]

Slide 13: Error Variance Estimation

Unbiased estimator of the error variance: $\hat{\sigma}^2 = \dfrac{\sum \hat{\varepsilon}_t^2}{T - K}$

Transform to a chi-square distribution: $\dfrac{(T - K)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{T-K}$

Slide 14: Gauss-Markov Theorem

Under the first five assumptions of the multiple regression model, the ordinary least squares estimators have the smallest variance of all linear and unbiased estimators: they are the Best Linear Unbiased Estimators (BLUE).

Slide 15: Variances

For $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$:

$\mathrm{var}(b_2) = \dfrac{\sigma^2}{(1 - r_{23}^2)\sum (x_{t2} - \bar{x}_2)^2}$, $\qquad \mathrm{var}(b_3) = \dfrac{\sigma^2}{(1 - r_{23}^2)\sum (x_{t3} - \bar{x}_3)^2}$

where $r_{23} = \dfrac{\sum (x_{t2} - \bar{x}_2)(x_{t3} - \bar{x}_3)}{\sqrt{\sum (x_{t2} - \bar{x}_2)^2 \sum (x_{t3} - \bar{x}_3)^2}}$

When $r_{23} = 0$ these reduce to the simple regression formulas.

Slide 16: Variance Decomposition

The variance of an estimator is smaller when:
1. The error variance $\sigma^2$ is smaller.
2. The sample size $T$ is larger, so that $\sum_{t=1}^{T} (x_{t2} - \bar{x}_2)^2$ is larger.
3. The variable values are more spread out, which also makes $\sum (x_{t2} - \bar{x}_2)^2$ larger.
4. The correlation $r_{23}$ is close to zero.
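The deviation-form estimators on Slides 8-9 are easy to verify numerically. The sketch below is not part of the slides: it simulates data from an assumed two-regressor model (the "true" coefficients and variable names are illustrative assumptions) and recovers the coefficients with the $b_2$, $b_3$, $b_1$ formulas.

```python
# A minimal sketch (not from the slides) of the deviation-form estimators
# on Slides 8-9. The data are simulated; the "true" coefficients are
# assumptions chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
T = 52
x2 = rng.uniform(4.0, 8.0, T)                      # e.g. price
x3 = rng.uniform(1.0, 5.0, T)                      # e.g. advertising
y = 104.79 - 6.642 * x2 + 2.984 * x3 + rng.normal(0.0, 6.0, T)

# Deviations from the sample means (the starred variables on Slide 8)
ys, x2s, x3s = y - y.mean(), x2 - x2.mean(), x3 - x3.mean()

# b2 and b3 from Slide 9, then b1 from the sample means
den = (x2s**2).sum() * (x3s**2).sum() - ((x2s * x3s).sum()) ** 2
b2 = ((ys * x2s).sum() * (x3s**2).sum() - (ys * x3s).sum() * (x2s * x3s).sum()) / den
b3 = ((ys * x3s).sum() * (x2s**2).sum() - (ys * x2s).sum() * (x2s * x3s).sum()) / den
b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()

print(b1, b2, b3)   # close to the assumed 104.79, -6.642, 2.984
```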
Slide 17: Covariances

For $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$:

$\mathrm{cov}(b_2, b_3) = \dfrac{-r_{23}\,\sigma^2}{(1 - r_{23}^2)\sqrt{\sum (x_{t2} - \bar{x}_2)^2 \sum (x_{t3} - \bar{x}_3)^2}}$

where $r_{23}$ is defined as on Slide 15.

Slide 18: Covariance Decomposition

The covariance between any two estimators is larger in absolute value when:
1. The error variance $\sigma^2$ is larger.
2. The sample size $T$ is smaller.
3. The values of the variables are less spread out.
4. The correlation $r_{23}$ is high.

Slide 19: Var-Cov Matrix

For $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$, the least squares estimators $b_1$, $b_2$, and $b_3$ have covariance matrix:

$\mathrm{cov}(b_1, b_2, b_3) = \begin{bmatrix} \mathrm{var}(b_1) & \mathrm{cov}(b_1, b_2) & \mathrm{cov}(b_1, b_3) \\ \mathrm{cov}(b_1, b_2) & \mathrm{var}(b_2) & \mathrm{cov}(b_2, b_3) \\ \mathrm{cov}(b_1, b_3) & \mathrm{cov}(b_2, b_3) & \mathrm{var}(b_3) \end{bmatrix}$

Slide 20: Normal

$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \cdots + \beta_K x_{tK} + \varepsilon_t$ with $\varepsilon_t \sim N(0, \sigma^2)$ implies, and is implied by, $y_t \sim N(\beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK},\ \sigma^2)$.

Since $b_k$ is a linear function of the $y_t$:

$b_k \sim N(\beta_k, \mathrm{var}(b_k))$ and $z = \dfrac{b_k - \beta_k}{\sqrt{\mathrm{var}(b_k)}} \sim N(0, 1)$ for $k = 1, 2, \ldots, K$.

Slide 21: Student-t

Since the population variance of $b_k$, $\mathrm{var}(b_k)$, is generally unknown, we estimate it with $\widehat{\mathrm{var}}(b_k)$, which uses $\hat{\sigma}^2$ instead of $\sigma^2$. Then

$t = \dfrac{b_k - \beta_k}{\sqrt{\widehat{\mathrm{var}}(b_k)}} = \dfrac{b_k - \beta_k}{\mathrm{se}(b_k)}$

has a Student-t distribution with $df = T - K$.

Slide 22: Interval Estimation

$P\!\left(-t_c < \dfrac{b_k - \beta_k}{\mathrm{se}(b_k)} < t_c\right) = 1 - \alpha$

where $t_c$ is the critical value for $T - K$ degrees of freedom such that $P(t > t_c) = \alpha/2$. Equivalently,

$P\big(b_k - t_c\,\mathrm{se}(b_k) < \beta_k < b_k + t_c\,\mathrm{se}(b_k)\big) = 1 - \alpha$

Interval endpoints: $\big[\,b_k - t_c\,\mathrm{se}(b_k),\ b_k + t_c\,\mathrm{se}(b_k)\,\big]$

Slide 23: Student-t Test

$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$

Student-t tests can be used to test any linear combination of the regression coefficients, for example:

$H_0: \beta_2 + \beta_3 + \beta_4 = 1$, $\quad H_0: \beta_1 = 0$, $\quad H_0: 3\beta_2 - 7\beta_3 = 21$, $\quad H_0: \beta_2 - \beta_3 \le 5$

Every such t-test has exactly $T - K$ degrees of freedom, where $K$ is the number of coefficients estimated (including the intercept).

Slide 24: One-Tail Test

$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$

$H_0: \beta_3 \le 0$ versus $H_1: \beta_3 > 0$

$t = \dfrac{b_3}{\mathrm{se}(b_3)} \sim t_{(T-K)}$, with $df = T - K = T - 4$. The rejection region lies to the right of $t_c$.

Slide 25: Two-Tail Test

$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$

$H_0: \beta_2 = 0$ versus $H_1: \beta_2 \neq 0$

$t = \dfrac{b_2}{\mathrm{se}(b_2)} \sim t_{(T-K)}$, with $df = T - K = T - 4$. The rejection region lies below $-t_c$ and above $t_c$.

Slide 26: Goodness-of-Fit

Coefficient of determination:

$R^2 = \dfrac{SSR}{SST} = \dfrac{\sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$, with $0 \le R^2 \le 1$.

Slide 27: Adjusted R-Squared

Original: $R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$

Adjusted: $\bar{R}^2 = 1 - \dfrac{SSE/(T - K)}{SST/(T - 1)}$

Slide 28: Computer Output

Table 8.2  Summary of Least Squares Results

Variable      Coefficient   Std Error   t-value   p-value
constant      104.79        6.48        16.17     0.000
price         -6.642        3.191       -2.081    0.042
advertising   2.984         0.167       17.868    0.000

$t = \dfrac{b_2}{\mathrm{se}(b_2)} = \dfrac{-6.642}{3.191} = -2.081$

Slide 29: Reporting Your Results

Reporting standard errors:

$\hat{y}_t = 104.79 - 6.642\,X_{t2} + 2.984\,X_{t3}$
(s.e.)  (6.48)  (3.191)  (0.167)

Reporting t-statistics:

$\hat{y}_t = 104.79 - 6.642\,X_{t2} + 2.984\,X_{t3}$
(t)  (16.17)  (-2.081)  (17.868)
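As a quick numerical check of Slides 21, 22, and 28, the sketch below reproduces the price t-value and a 95% interval estimate from the Table 8.2 numbers. It is an editor's illustration, not part of the slides, and assumes scipy is available for the critical value.

```python
# A minimal sketch of the t-statistic (Slide 21) and interval estimate
# (Slide 22) applied to the price coefficient in Table 8.2.
from scipy import stats

b2, se_b2 = -6.642, 3.191            # coefficient and standard error from Table 8.2
T, K = 52, 3                         # observations and estimated coefficients
df = T - K                           # 49 degrees of freedom

t_stat = b2 / se_b2                  # test of H0: beta2 = 0
t_c = stats.t.ppf(1 - 0.05 / 2, df)  # two-tail 5% critical value

lower = b2 - t_c * se_b2
upper = b2 + t_c * se_b2
print(round(t_stat, 3), round(t_c, 2), round(lower, 2), round(upper, 2))
# approximately: -2.081  2.01  -13.05  -0.23
```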
Slide 30

$H_0: \beta_2 = 0$ versus $H_1: \beta_2 \neq 0$ in $y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$

$H_0$ (restricted model): $y_t = \beta_1 + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$
$H_1$ (unrestricted model): $y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$

Slide 31: Single Restriction F-Test

$H_0: \beta_2 = 0$ versus $H_1: \beta_2 \neq 0$, with $df_n = J = 1$ and $df_d = T - K = 49$.

$F = \dfrac{(SSE_R - SSE_U)/J}{SSE_U/(T - K)} = \dfrac{(1964.758 - 1805.168)/1}{1805.168/(52 - 3)} = 4.33$

Under $H_0$, $F \sim F_{J,\,T-K}$. By definition this single-restriction F statistic is the t-statistic squared: $t = -2.081$, $F = t^2 = 4.33$.

Slide 32

$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$

$H_0: \beta_2 = 0,\ \beta_4 = 0$ versus $H_1$: $H_0$ not true

$H_0$ (restricted model): $y_t = \beta_1 + \beta_3 X_{t3} + \varepsilon_t$
$H_1$ (unrestricted model): $y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$

Slide 33: Multiple Restriction F-Test

$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$

$H_0: \beta_2 = 0,\ \beta_4 = 0$ versus $H_1$: $H_0$ not true, with $df_n = J = 2$ ($J$ is the number of restrictions) and $df_d = T - K = 49$.

$F = \dfrac{(SSE_R - SSE_U)/J}{SSE_U/(T - K)} \sim F_{J,\,T-K}$ under $H_0$

First run the restricted regression, dropping $X_{t2}$ and $X_{t4}$, to get $SSE_R$. Then run the unrestricted regression to get $SSE_U$.

Slide 34: F-Tests

$F = \dfrac{(SSE_R - SSE_U)/J}{SSE_U/(T - K)} \sim F_{J,\,T-K}$

F-tests of this type are always right-tailed, even for left-sided or two-sided hypotheses, because any deviation from the null makes the F value larger (moves it to the right of the $F_{J,\,T-K}$ density).

Slide 35: F-Test of the Entire Equation

$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \varepsilon_t$

$H_0: \beta_2 = \beta_3 = 0$ versus $H_1$: $H_0$ not true, with $df_n = J = 2$ and $df_d = T - K = 49$.

$F = \dfrac{(SSE_R - SSE_U)/J}{SSE_U/(T - K)} = \dfrac{(13581.35 - 1805.168)/2}{1805.168/(52 - 3)} = 159.828$

We ignore $\beta_1$; the restricted model keeps the intercept. At $\alpha = 0.05$, $F_{2,\,49,\,0.05} = 3.187$, so reject $H_0$.

Slide 36: ANOVA Table

Table 8.3  Analysis of Variance Table

Source       DF   Sum of Squares   Mean Square   F-Value
Regression   2    11776.18         5888.09       159.828
Error        49   1805.168         36.84
Total        51   13581.35

p-value: 0.0001

$R^2 = \dfrac{SSR}{SST} = \dfrac{11776.18}{13581.35} = 0.867$

Slide 37: Nonsample Information

A certain production process is known to be Cobb-Douglas with constant returns to scale:

$\ln(y_t) = \beta_1 + \beta_2 \ln(X_{t2}) + \beta_3 \ln(X_{t3}) + \beta_4 \ln(X_{t4}) + \varepsilon_t$, with $\beta_2 + \beta_3 + \beta_4 = 1$.

Substituting $\beta_4 = 1 - \beta_2 - \beta_3$ gives

$\ln(y_t / X_{t4}) = \beta_1 + \beta_2 \ln(X_{t2}/X_{t4}) + \beta_3 \ln(X_{t3}/X_{t4}) + \varepsilon_t$

that is, $y_t^* = \beta_1 + \beta_2 X_{t2}^* + \beta_3 X_{t3}^* + \varepsilon_t$. Run least squares on the transformed model and interpret the coefficients the same as in the original model.

Slide 38: Collinear Variables

The term "independent variables" means an explanatory variable is independent of the error term, but not necessarily independent of other explanatory variables. Since economists typically have no control over the implicit experimental design, explanatory variables tend to move together, which often makes sorting out their separate influences rather problematic.

Slide 39: Effects of Collinearity

A high degree of collinearity will produce:
1. No least squares output when collinearity is exact.
2. Large standard errors and wide confidence intervals.
3. Insignificant t-values even with a high $R^2$ and a significant F-value.
4. Estimates sensitive to the deletion or addition of a few observations or of insignificant variables.
5. The OLS estimators retain all their desired properties (BLUE and consistency), but the problem is that the inferential procedure may be uninformative.

Slide 40: Identifying Collinearity

Evidence of high collinearity includes:
1. A high pairwise correlation between two explanatory variables (greater than 0.8 or 0.9).
2. A high R-squared (called $R_j^2$) when regressing one explanatory variable ($X_j$) on the other explanatory variables. The corresponding variance inflation factor is $\mathrm{VIF}(b_j) = 1/(1 - R_j^2)$; values above 10 signal a problem.
3. A high $R^2$ and a statistically significant F-value when the t-values are statistically insignificant.
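The restricted-versus-unrestricted F statistic on Slides 31-35 can be computed directly from the two sums of squared errors. The sketch below is an editor's illustration using the SSE values quoted on those slides, with scipy assumed available for the p-value.

```python
# A minimal sketch of the F statistic on Slides 31-35, computed from the
# restricted and unrestricted sums of squared errors quoted there.
from scipy import stats

def f_test(sse_r, sse_u, J, T, K):
    """F = [(SSE_R - SSE_U)/J] / [SSE_U/(T-K)], with its right-tail p-value."""
    F = ((sse_r - sse_u) / J) / (sse_u / (T - K))
    p_value = stats.f.sf(F, J, T - K)
    return F, p_value

# Single restriction (Slide 31): H0: beta2 = 0
print(f_test(1964.758, 1805.168, J=1, T=52, K=3))   # F = 4.33 = (-2.081)**2

# Entire equation (Slide 35): H0: beta2 = beta3 = 0
print(f_test(13581.35, 1805.168, J=2, T=52, K=3))   # F = 159.8, p < 0.0001
```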
Slide 41: Mitigating Collinearity

High collinearity is not a violation of any least squares assumption, but rather a lack of adequate information in the sample. Remedies include:
1. Collect more data with better information.
2. Impose economic restrictions as appropriate.
3. Impose statistical restrictions when justified.
4. Delete the variable which is highly collinear with other explanatory variables.

Slide 42: Prediction

$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \varepsilon_t$

Given a set of values for the explanatory variables, $(1,\ X_{02},\ X_{03})$, the best linear unbiased predictor of $y$ is

$\hat{y}_0 = b_1 + b_2 X_{02} + b_3 X_{03}$

This predictor is unbiased in the sense that the average value of the forecast error is zero.
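As a final illustration, the sketch below evaluates the Slide 42 predictor using the coefficient estimates from Table 8.2; the values of $X_{02}$ and $X_{03}$ are hypothetical and chosen only for the example.

```python
# A minimal sketch of the predictor on Slide 42, using the coefficient
# estimates from Table 8.2. X02 and X03 are hypothetical values.
b1, b2, b3 = 104.79, -6.642, 2.984
X02, X03 = 6.50, 10.0            # assumed price and advertising levels

y0_hat = b1 + b2 * X02 + b3 * X03
print(round(y0_hat, 2))          # best linear unbiased point prediction of y0
```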