Transcript of Part 5: Regression Algebra and Fit. Econometrics I, Professor William Greene, Stern School of Business, Department of Economics.

  • Slide 1
  • Econometrics I, Part 5: Regression Algebra and Fit. Professor William Greene, Stern School of Business, Department of Economics
  • Slide 2
  • Gauss-Markov Theorem. A theorem of Gauss and Markov: Least squares is the minimum variance linear unbiased estimator (MVLUE). 1. Linear estimator. 2. Unbiased: E[b|X] = β. Theorem: Var[b*|X] - Var[b|X] is nonnegative definite for any other linear and unbiased estimator b* that is not equal to b. Definition: b is efficient in this class of estimators.
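A small Monte Carlo sketch (not from the slides; the design, weights, and seed are invented for illustration) can make the theorem concrete. It compares OLS with another linear unbiased estimator, a weighted least squares estimator whose weights are unrelated to the error variance; the OLS sampling variances should come out weakly smaller, coefficient by coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 100, 1.0, 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
beta = np.array([1.0, 0.5, -0.25])

# OLS: b = (X'X)^{-1} X'y.  Competitor: WLS with arbitrary weights w --
# still linear in y and unbiased, since A_wls @ X = I.
w = np.linspace(0.5, 1.5, n)
A_ols = np.linalg.inv(X.T @ X) @ X.T
A_wls = np.linalg.inv(X.T @ (w[:, None] * X)) @ (w[:, None] * X).T

b_draws = np.empty((reps, 3))
bstar_draws = np.empty((reps, 3))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    b_draws[r] = A_ols @ y
    bstar_draws[r] = A_wls @ y

print("OLS sampling variances:", b_draws.var(axis=0))
print("WLS sampling variances:", bstar_draws.var(axis=0))  # each weakly larger, up to simulation noise
```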
  • Slide 3
  • Implications of the Gauss-Markov Theorem: Var[b*|X] - Var[b|X] is nonnegative definite for any other linear and unbiased estimator b* that is not equal to b. This implies: bₖ = the kth particular element of b. Var[bₖ|X] = the kth diagonal element of Var[b|X]. Var[bₖ|X] ≤ Var[bₖ*|X] for each coefficient. c′b = any linear combination of the elements of b. Var[c′b|X] ≤ Var[c′b*|X] for any nonzero c and b* that is not equal to b.
  • Slide 4
  • Summary: Finite Sample Properties of b. Unbiased: E[b] = β. Variance: Var[b|X] = σ²(X′X)⁻¹. Efficiency: Gauss-Markov theorem, with all of its implications. Distribution: Under normality, b|X ~ N[β, σ²(X′X)⁻¹]. (Without normality, the distribution is generally unknown.)
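As a minimal sketch of these results (assuming homoskedastic errors; the function name is mine, not from the course materials):

```python
import numpy as np

def ols_with_variance(X, y):
    """Return b = (X'X)^{-1} X'y and the estimated Var[b|X] = s^2 (X'X)^{-1},
    where s^2 = e'e / (n - K) is the unbiased estimator of sigma^2."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b                    # least squares residuals
    s2 = (e @ e) / (n - K)
    return b, s2 * XtX_inv
```

The square roots of the diagonal of the returned matrix are the usual standard errors.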
  • Slide 5
  • Comparing Models. We can compare models without using statistical tests, relying instead on measures known as information criteria, which summarize a model's goodness of fit. For these indicators, the key quantity ends up being a measure of the absolute magnitude of the errors.
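For concreteness, here is a sketch of two widely used information criteria in one common per-observation form built from the sum of squared residuals; other normalizations appear in the literature, so treat the exact scaling as an assumption.

```python
import numpy as np

def information_criteria(e, K):
    """Akaike (AIC) and Schwarz (BIC) criteria from OLS residuals e with K
    estimated parameters; smaller values indicate a better trade-off of
    fit against model size."""
    e = np.asarray(e)
    n = len(e)
    s2 = (e @ e) / n                       # ML estimate of the error variance
    aic = np.log(s2) + 2.0 * K / n
    bic = np.log(s2) + K * np.log(n) / n   # heavier penalty than AIC for n > 7
    return aic, bic
```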
  • Slide 6
  • Measure of Fit. R² = b′X′M⁰Xb / y′M⁰y. (Very important result.) R² is bounded by zero and one only if: (a) there is a constant term in X, and (b) the line is computed by linear least squares.
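A minimal sketch of the computation, forming M⁰ = I - (1/n)ii′ explicitly (fine for small n; in practice one would simply demean):

```python
import numpy as np

def r_squared(X, y):
    """R^2 = b'X'M0Xb / y'M0y, where M0 = I - (1/n) i i' demeans a vector.
    Bounded in [0, 1] only if X contains a constant and b is least squares."""
    n = len(y)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    M0 = np.eye(n) - np.ones((n, n)) / n   # M0 @ z = z - mean(z)
    yhat = X @ b                           # b'X'M0Xb = yhat' M0 yhat
    return (yhat @ M0 @ yhat) / (y @ M0 @ y)
```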
  • Slide 7
  • Comparing fits of regressions. Make sure the denominator in R² is the same, i.e., the same left-hand-side variable. Example: linear vs. loglinear. Loglinear will almost always appear to fit better because taking logs reduces variation.
  • Slide 8
  • (no text transcribed for this slide)
  • Slide 9
  • Adjusted R Squared. Adjusted R² (for degrees of freedom) = 1 - [(n-1)/(n-K)](1 - R²). It includes a penalty for variables that don't add much fit, and it can fall when a variable is added to the equation.
  • Slide 10
  • Adjusted R². What is being adjusted? The penalty for using up degrees of freedom: adjusted R² = 1 - [e′e/(n-K)] / [y′M⁰y/(n-1)], which uses the ratio of two unbiased estimators. Is the ratio unbiased? Equivalently, adjusted R² = 1 - [(n-1)/(n-K)](1 - R²). Will it rise when a variable is added to the regression? The adjusted R² is higher with z than without z if and only if the t ratio on z, when z is added to the regression, is larger than one in absolute value.
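A small numerical sketch of both points, with invented magnitudes: the second call adds one weak regressor whose implied t ratio is below 1 in absolute value, so the adjusted R² falls even though R² rises.

```python
def adjusted_r2(r2, n, K):
    """R-bar^2 = 1 - [(n-1)/(n-K)](1 - R^2)."""
    return 1.0 - (n - 1) / (n - K) * (1.0 - r2)

# Invented example: n = 50, K = 4, R^2 = .300; adding one weak regressor
# (K -> 5) lifts R^2 only to .305, and the adjusted measure falls.
print(adjusted_r2(0.300, 50, 4))   # ~ 0.2543
print(adjusted_r2(0.305, 50, 5))   # ~ 0.2432  (lower: the penalty dominates)
```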
  • Slide 11
  • Full Regression (Without PD)
    Ordinary least squares regression
    LHS=G        Mean = 226.09444    Standard deviation = 50.59182
    Number of observs. = 36
    Model size:  Parameters = 9      Degrees of freedom = 27
    Residuals:   Sum of squares = 596.68995    Standard error of e = 4.70102
    Fit:         R-squared = .99334
  • Log Income Equation
    Ordinary least squares regression
    LHS=LOGY     Mean = -1.15746     Standard deviation = .49149
    Number of observs. = 27322
    Model size:  Parameters = 7      Degrees of freedom = 27315
    Residuals:   Sum of squares = 5462.03686   Standard error of e = .44717
    Fit:         R-squared = .17237
    --------+-------------------------------------------------------------
    Variable|  Coefficient   Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
    --------+-------------------------------------------------------------
    AGE     |    .06225***       .00213        29.189    .0000    43.5272
    AGESQ   |   -.00074***       .242482D-04  -30.576    .0000    2022.99
    Constant|  -3.19130***       .04567       -69.884    .0000
    MARRIED |    .32153***       .00703        45.767    .0000     .75869
    HHKIDS  |   -.11134***       .00655       -17.002    .0000     .40272
    FEMALE  |   -.00491          .00552         -.889    .3739     .47881
    EDUC    |    .05542***       .00120        46.050    .0000    11.3202
    --------+-------------------------------------------------------------
    Average Age = 43.5272. Estimated partial effect = .06225 + 2(-.00074)(43.5272) = .00018.
    Estimated variance = 4.54799e-6 + 4(43.5272)²(5.87973e-10) + 4(43.5272)(-5.1285e-8)
    = 7.4755086e-08, where -5.1285e-8 is the estimated Cov[b1,b2] (the AGE, AGESQ covariance).
    Estimated standard error = .00027341.
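The delta-method calculation at the bottom of the output can be reproduced directly (a sketch; the variance and covariance inputs are as printed on the slide, and with the coefficients rounded to the digits shown the point estimate will not match the slide's .00018 exactly, though the standard error does):

```python
import numpy as np

# Partial effect of AGE in the quadratic: d E[logY]/d AGE = b_age + 2 b_agesq AGE
b_age, b_agesq, age_bar = 0.06225, -0.00074, 43.5272
v_age  = 4.54799e-6     # Var[b_age],   ~ (.00213)^2
v_ag2  = 5.87973e-10    # Var[b_agesq], ~ (.242482e-4)^2
c_12   = -5.1285e-8     # estimated Cov[b_age, b_agesq], as annotated on the slide

pe = b_age + 2.0 * b_agesq * age_bar
# Delta method: Var[pe] = Var[b1] + 4 Age^2 Var[b2] + 4 Age Cov[b1, b2]
var_pe = v_age + 4.0 * age_bar**2 * v_ag2 + 4.0 * age_bar * c_12
print(pe, var_pe, np.sqrt(var_pe))   # s.e. ~ .00027341, matching the slide
```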
  • Slide 19
  • Specification and Functional Form: Interaction Effect
  • Slide 20
  • Interaction Effect
    Ordinary least squares regression
    LHS=LOGY     Mean = -1.15746     Standard deviation = .49149
    Number of observs. = 27322
    Model size:  Parameters = 4      Degrees of freedom = 27318
    Residuals:   Sum of squares = 6540.45988   Standard error of e = .48931
    Fit:         R-squared = .00896  Adjusted R-squared = .00885
    Model test:  F[3, 27318] (prob) = 82.4 (.0000)
    --------+-------------------------------------------------------------
    Variable|  Coefficient   Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
    --------+-------------------------------------------------------------
    Constant|  -1.22592***       .01605       -76.376    .0000
    AGE     |    .00227***       .00036         6.240    .0000    43.5272
    FEMALE  |    .21239***       .02363         8.987    .0000     .47881
    AGE_FEM |   -.00620***       .00052       -11.819    .0000    21.2960
    --------+-------------------------------------------------------------
    Do women earn more than men (in this sample)? The +.21239 coefficient on FEMALE
    would suggest so. But the female difference is +.21239 - .00620*Age. At the
    average Age, the effect is .21239 - .00620(43.5272) = -.05748.
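The closing computation is the female differential as a function of AGE; a one-line check:

```python
# Gender differential implied by the interaction model: b_FEMALE + b_AGE_FEM * Age
b_female, b_age_fem, age_bar = 0.21239, -0.00620, 43.5272
print(b_female + b_age_fem * age_bar)   # ~ -0.05748: negative at the average age
```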
  • Slide 21
  • (no text transcribed for this slide)
  • Slide 22
  • (no text transcribed for this slide)
  • Slide 23
  • Structural Break
  • Slide 24
  • Linear Restrictions. Context: how do linear restrictions affect the properties of the least squares estimator? Model: y = Xβ + ε. Theory (information): Rβ - q = 0. Restricted least squares estimator: b* = b - (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rb - q). Expected value: E[b*|X] = β - (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rβ - q). Variance: σ²(X′X)⁻¹ - σ²(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹ = Var[b] minus a nonnegative definite matrix ≤ Var[b]. Implication: (as before) nonsample information reduces the variance of the estimator.
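A direct sketch of the estimator as written above (the function and variable names are mine):

```python
import numpy as np

def restricted_ls(X, y, R, q):
    """b* = b - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (Rb - q)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    m = R @ b - q                                          # discrepancy vector
    C = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
    return b - C @ m                                       # equals b when m = 0
```

When the unrestricted b already satisfies the restrictions, m = 0 and b* = b, the point raised on the Aspects of Restricted LS slide below.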
  • Slide 25
  • Interpretation. Case 1: Theory is correct: Rβ - q = 0 (the restrictions do hold). b* is unbiased; Var[b*] is smaller than Var[b]. How do we know this? Case 2: Theory is incorrect: Rβ - q ≠ 0 (the restrictions do not hold). b* is biased. What does this mean? Var[b*] is still smaller than Var[b].
  • Slide 26
  • Linear Least Squares Subject to Restrictions. Restrictions: theory imposes certain restrictions on parameters. Some common applications: (1) Dropping variables from the equation: certain coefficients in b forced to equal 0. (Probably the most common testing situation: is a certain variable significant?) (2) Adding-up conditions: sums of certain coefficients must equal fixed values, e.g., adding-up conditions in demand systems, or constant returns to scale in production functions. (3) Equality restrictions: certain coefficients must equal other coefficients, e.g., using real vs. nominal variables in equations. General formulation for linear restrictions: minimize the sum of squares, e′e, subject to the linear constraint Rb = q.
  • Slide 27
  • Restricted Least Squares
  • Slide 28
  • Restricted Least Squares Solution. General approach: a programming problem. Minimize L(β) = (y - Xβ)′(y - Xβ) subject to Rβ = q. Each row of R holds the K coefficients of one restriction; with J restrictions, R has J rows. Examples: β₃ = 0: R = [0, 0, 1, 0, ...], q = (0). β₂ = β₃: R = [0, 1, -1, 0, ...], q = (0). β₂ = 0 and β₃ = 0: R = [0, 1, 0, 0, ... ; 0, 0, 1, 0, ...], q = (0, 0)′.
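The same three examples written out as arrays, assuming K = 5 coefficients purely for illustration:

```python
import numpy as np

# beta_3 = 0 (J = 1 restriction):
R1, q1 = np.array([[0., 0., 1., 0., 0.]]), np.array([0.])
# beta_2 = beta_3 (J = 1):
R2, q2 = np.array([[0., 1., -1., 0., 0.]]), np.array([0.])
# beta_2 = 0 and beta_3 = 0 (J = 2, so two rows in R):
R3 = np.array([[0., 1., 0., 0., 0.],
               [0., 0., 1., 0., 0.]])
q3 = np.array([0., 0.])
```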
  • Slide 29
  • Solution Strategy. Quadratic program: minimize a quadratic criterion subject to linear restrictions, all of which are binding. Solve using the Lagrangean formulation: minimize over (β, λ) the function L* = (y - Xβ)′(y - Xβ) + 2λ′(Rβ - q). (The 2 is for convenience; see the derivation below.)
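Setting the derivatives of L* to zero and solving yields the restricted estimator quoted on slide 24. The solution slides that follow carry no transcribed text, so here is a worked version of the standard steps:

```latex
\begin{aligned}
\frac{\partial L^*}{\partial \beta} &= -2X'(y - X\beta) + 2R'\lambda = 0
  \;\Rightarrow\; \beta = (X'X)^{-1}X'y - (X'X)^{-1}R'\lambda = b - (X'X)^{-1}R'\lambda,\\
\frac{\partial L^*}{\partial \lambda} &= 2(R\beta - q) = 0
  \;\Rightarrow\; Rb - R(X'X)^{-1}R'\lambda = q,\\
\lambda &= [R(X'X)^{-1}R']^{-1}(Rb - q),\\
b^* &= b - (X'X)^{-1}R'\,[R(X'X)^{-1}R']^{-1}(Rb - q).
\end{aligned}
```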
  • Slide 30
  • Restricted LS Solution
  • Slide 31
  • Restricted Least Squares
  • Slide 32
  • Aspects of Restricted LS. 1. b* = b - Cm, where m = the discrepancy vector Rb - q and C = (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹. Note what happens if m = 0. What does m = 0 mean? 2. λ = [R(X′X)⁻¹R′]⁻¹(Rb - q) = [R(X′X)⁻¹R′]⁻¹m. When does λ = 0? What does this mean? 3. Combining results: b* = b - (X′X)⁻¹R′λ. How could b* = b?
  • Slide 33
  • (no text transcribed for this slide)