Part 5: Regression Algebra and Fit
Econometrics I
Professor William Greene
Stern School of Business
Department of Economics
Gauss-Markov Theorem
A theorem of Gauss and Markov: Least Squares is the minimum variance linear unbiased estimator (MVLUE).
1. Linear estimator: b* = Σᵢ₌₁ⁿ vᵢyᵢ, a linear function of y.
2. Unbiased: E[b|X] = β.
Theorem: Var[b*|X] - Var[b|X] is nonnegative definite for any other linear and unbiased estimator b* that is not equal to b.
Definition: b is efficient in this class of estimators.
Implications of Gauss-Markov
Theorem: Var[b*|X] - Var[b|X] is nonnegative definite for any other linear and unbiased estimator b* that is not equal to b. This implies:
- bₖ = the kth particular element of b. Var[bₖ|X] = the kth diagonal element of Var[b|X]. Var[bₖ|X] ≤ Var[bₖ*|X] for each coefficient.
- c′b = any linear combination of the elements of b. Var[c′b|X] ≤ Var[c′b*|X] for any nonzero c and b* that is not equal to b.
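A quick way to see the theorem at work is simulation. The following Python sketch (illustrative only; the data-generating process and the alternative estimator are assumptions, not from the slides) compares the OLS slope b = Σxᵢyᵢ/Σxᵢ² with another linear unbiased estimator b* = Σyᵢ/Σxᵢ. Both are unbiased, but the OLS variance is smaller, as Gauss-Markov requires.

```python
import numpy as np

# Monte Carlo illustration of Gauss-Markov (assumed DGP: y = beta*x + eps).
# Both estimators below are linear in y and unbiased for beta.
rng = np.random.default_rng(0)
beta, n, reps = 2.0, 50, 10_000
x = rng.uniform(1.0, 3.0, size=n)        # regressors held fixed across replications

b_ols, b_alt = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = beta * x + rng.normal(0.0, 1.0, size=n)
    b_ols[r] = (x @ y) / (x @ x)         # OLS: weights v_i = x_i / sum(x_j^2)
    b_alt[r] = y.sum() / x.sum()         # another linear unbiased estimator

print("means:", b_ols.mean(), b_alt.mean())   # both close to beta = 2
print("vars :", b_ols.var(), b_alt.var())     # Var[OLS] <= Var[alternative]
```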
Summary: Finite Sample Properties of b

Unbiased: E[b] = β.
Variance: Var[b|X] = σ²(X′X)⁻¹.
Efficiency: Gauss-Markov Theorem, with all its implications.
Distribution: under normality, b|X ~ N[β, σ²(X′X)⁻¹].
(Without normality, the distribution is generally unknown.)
Model Comparison

We can compare models without using statistical tests, instead using measures known as information criteria that summarize a model's goodness of fit.
For these indicators, the key ingredient turns out to be a measure of the absolute magnitude of the errors.
Measure of Fit

R² = b′X′M⁰Xb / y′M⁰y

(Very Important Result.) R² is bounded by zero and one only if:
(a) there is a constant term in X, and
(b) the line is computed by linear least squares.

R² = Regression Variation / Total Variation = 1 - e′e / Σᵢ₌₁ᴺ (yᵢ - ȳ)²
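As a check on the algebra, here is a short numpy sketch (with simulated data, an assumption for illustration) that computes R² both ways: via b′X′M⁰Xb / y′M⁰y and via 1 - e′e / Σ(yᵢ - ȳ)². With a constant term in X, the two agree.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant term included
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
M0 = np.eye(n) - np.ones((n, n)) / n      # M0 @ z = z - mean(z): deviations from means

r2_ratio = (b @ X.T @ M0 @ X @ b) / (y @ M0 @ y)        # regression / total variation
r2_resid = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)  # 1 - e'e / total variation
print(r2_ratio, r2_resid)                               # equal up to rounding
```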
Comparing Fits of Regressions

Make sure the denominator in R² is the same, i.e., the same left-hand-side variable. Example: linear vs. loglinear. Loglinear will almost always appear to fit better because taking logs reduces variation.
Adjusted R Squared

Adjusted R² (adjusted for degrees of freedom):
Adjusted R² = 1 - [(n-1)/(n-K)](1 - R²)
It includes a penalty for variables that don't add much fit, and it can fall when a variable is added to the equation.
Adjusted R²

What is being adjusted? The penalty for using up degrees of freedom.
Adjusted R² = 1 - [e′e/(n-K)] / [y′M⁰y/(n-1)] uses the ratio of two 'unbiased' estimators. Is the ratio unbiased?
Adjusted R² = 1 - [(n-1)/(n-K)](1 - R²)
Will adjusted R² rise when a variable is added to the regression? Adjusted R² is higher with z than without z if and only if the t ratio on z in the regression, when z is added, is larger than one in absolute value.
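The |t| > 1 rule is easy to verify numerically. A minimal sketch (simulated data, assumed for illustration): adding a candidate regressor z raises the adjusted R² exactly when its t ratio in the augmented regression exceeds one in absolute value.

```python
import numpy as np

def fit(X, y):
    """Return adjusted R^2 and the t ratio on the last column of X."""
    n, K = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (n - 1) / (n - K) * (1.0 - r2)
    s2 = (e @ e) / (n - K)                          # s^2 = e'e/(n-K)
    V = s2 * np.linalg.inv(X.T @ X)                 # estimated Var[b|X]
    return adj_r2, b[-1] / np.sqrt(V[-1, -1])

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
z = rng.normal(size=n)                              # irrelevant candidate regressor

adj0, _ = fit(X, y)
adj1, t_z = fit(np.column_stack([X, z]), y)
print("adj R2 without z:", adj0)
print("adj R2 with z   :", adj1, " |t| on z:", abs(t_z))  # adj1 > adj0 iff |t| > 1
```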
Full Regression (Without PD)
----------------------------------------------------------------------
Ordinary least squares regression ............
LHS=G        Mean                 =   226.09444
             Standard deviation   =    50.59182
             Number of observs.   =          36
Model size   Parameters           =           9
             Degrees of freedom   =          27
Residuals    Sum of squares       =   596.68995
             Standard error of e  =     4.70102
Fit          R-squared            =      .99334  <**********
             Adjusted R-squared   =      .99137  <**********
Info criter. LogAmemiya Prd. Crt. =     3.31870  <**********
             Akaike Info. Criter. =     3.30788  <**********
Model test   F[  8,    27] (prob) = 503.3(.0000)
--------+-------------------------------------------------------------
Variable| Coefficient   Standard Error  t-ratio  P[|T|>t]   Mean of X
--------+-------------------------------------------------------------
Constant|  -8220.38**       3629.309     -2.265    .0317
      PG|  -26.8313***         5.76403   -4.655    .0001      2.31661
       Y|    .02214***         .00711     3.116    .0043      9232.86
     PNC|   36.2027           21.54563    1.680    .1044      1.67078
     PUC|   -6.23235           5.01098   -1.244    .2243      2.34364
     PPT|    9.35681           8.94549    1.046    .3048      2.74486
      PN|   53.5879*          30.61384    1.750    .0914      2.08511
      PS|  -65.4897***        23.58819   -2.776    .0099      2.36898
    YEAR|    4.18510**         1.87283    2.235    .0339      1977.50
--------+-------------------------------------------------------------
PD added to the model. R² rises, Adj. R² falls.
----------------------------------------------------------------------
Ordinary least squares regression ............
LHS=G        Mean                 =   226.09444
             Standard deviation   =    50.59182
             Number of observs.   =          36
Model size   Parameters           =          10
             Degrees of freedom   =          26
Residuals    Sum of squares       =   594.54206
             Standard error of e  =     4.78195
Fit          R-squared            =      .99336   Was 0.99334
             Adjusted R-squared   =      .99107   Was 0.99137
--------+-------------------------------------------------------------
Variable| Coefficient   Standard Error  t-ratio  P[|T|>t]   Mean of X
--------+-------------------------------------------------------------
Constant|  -7916.51**       3822.602     -2.071    .0484
      PG|  -26.8077***         5.86376   -4.572    .0001      2.31661
       Y|    .02231***         .00725     3.077    .0049      9232.86
     PNC|   30.0618           29.69543    1.012    .3207      1.67078
     PUC|   -7.44699           6.45668   -1.153    .2592      2.34364
     PPT|    9.05542           9.15246     .989    .3316      2.74486
      PD|   11.8023           38.50913     .306    .7617      1.65056  (NOTE LOW t ratio)
      PN|   47.3306           37.23680    1.271    .2150      2.08511
      PS|  -60.6202**         28.77798   -2.106    .0450      2.36898
    YEAR|    4.02861*          1.97231    2.043    .0514      1977.50
--------+-------------------------------------------------------------
Other Measures of Fit

For non-nested alternatives. Includes a degrees-of-freedom penalty.
Schwarz Information Criterion (BIC): n log(e′e) + K log(n)
Akaike Information Criterion (AIC): n log(e′e) + 2K
When deciding between two non-nested models, the better one is the one that produces the smaller value of the criterion. The penalty in BIC for including something irrelevant is larger than in AIC.
Other Measures of Fit

Both AIC and BIC increase as the residual sum of squares increases.
Both criteria also penalize models with many variables, and smaller values of AIC and BIC are preferred.
Since models with more variables tend to produce a smaller residual sum of squares but use more parameters, the best choice balances fit against the number of variables.
The penalty in BIC for including something irrelevant is larger than in AIC.
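A small helper, as a sketch using the formulas exactly as written on the earlier slide (some texts use e′e/n inside the log, which shifts both criteria by the same constant for a given n and leaves model rankings unchanged):

```python
import numpy as np

def aic_bic(X, y):
    """AIC = n log(e'e) + 2K and BIC = n log(e'e) + K log(n), per the slide."""
    n, K = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    sse = e @ e                                  # residual sum of squares
    return n * np.log(sse) + 2 * K, n * np.log(sse) + K * np.log(n)

# For two non-nested models with the same y, prefer the smaller criterion value.
```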
Multicollinearity
Functional Forms
Specification and Functional Form:Nonlinearity
2 21 2 3 4 1 2 3 4
2 3 2 3
22
Population Estimators
ˆ
[ | , ] ˆ2 2
ˆEstimator of the variance of
ˆ. [ ] [ ] 4
x x
x
x
y x x z y b b x b x b z
E y x zx b b x
x
EstVar Var b x Va
3 2 3[ ] 4 [ , ]r b xCov b b
Log Income Equation
----------------------------------------------------------------------
Ordinary least squares regression ............
LHS=LOGY     Mean                 =    -1.15746
             Standard deviation   =      .49149
             Number of observs.   =       27322
Model size   Parameters           =           7
             Degrees of freedom   =       27315
Residuals    Sum of squares       =  5462.03686
             Standard error of e  =      .44717
Fit          R-squared            =      .17237
--------+-------------------------------------------------------------
Variable| Coefficient   Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+-------------------------------------------------------------
     AGE|    .06225***       .00213        29.189    .0000    43.5272
   AGESQ|   -.00074***       .242482D-04  -30.576    .0000    2022.99
Constant|  -3.19130***      .04567       -69.884    .0000
 MARRIED|    .32153***       .00703        45.767    .0000     .75869
  HHKIDS|   -.11134***       .00655       -17.002    .0000     .40272
  FEMALE|   -.00491          .00552         -.889    .3739     .47881
    EDUC|    .05542***       .00120        46.050    .0000    11.3202
--------+-------------------------------------------------------------
Average Age = 43.5272. Estimated partial effect = .06225 - 2(.00074)(43.5272) = .00018.
Estimated variance = 4.54799e-6 + 4(43.5272)²(5.87973e-10) + 4(43.5272)(-5.1285e-8) = 7.4755086e-08. Estimated standard error = .00027341.
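The delta-method arithmetic at the foot of the output, redone as a sketch. The variances and covariance are the values quoted on the slide; the printed coefficients are rounded, so the point estimate computed from them can differ a little from the slide's figure, while the standard error matches.

```python
import math

# Values quoted on the slide for the log income equation.
b_age, b_agesq = 0.06225, -0.00074      # coefficients on AGE and AGESQ
v_age   = 4.54799e-6                    # Var[b_AGE]
v_agesq = 5.87973e-10                   # Var[b_AGESQ]
cov     = -5.1285e-8                    # Cov[b_AGE, b_AGESQ]
age     = 43.5272                       # average AGE

pe  = b_age + 2 * b_agesq * age                       # partial effect at mean age
var = v_age + 4 * age**2 * v_agesq + 4 * age * cov    # delta-method variance
print(pe, math.sqrt(var))               # std. error = sqrt(7.4755e-08) ≈ .00027341
```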
Specification and Functional Form: Interaction Effect

Population: y = β₁ + β₂x + β₃z + β₄xz + ε.  Estimator: ŷ = b₁ + b₂x + b₃z + b₄xz.
Partial effect: ∂E[y|x,z]/∂x = β₂ + β₄z.  Estimator: δ̂ = b₂ + b₄z.
Estimator of the variance of δ̂:
Est.Var[δ̂] = Var[b₂] + z² Var[b₄] + 2z Cov[b₂,b₄]
Interaction Effect
----------------------------------------------------------------------
Ordinary least squares regression ............
LHS=LOGY     Mean                 =    -1.15746
             Standard deviation   =      .49149
             Number of observs.   =       27322
Model size   Parameters           =           4
             Degrees of freedom   =       27318
Residuals    Sum of squares       =  6540.45988
             Standard error of e  =      .48931
Fit          R-squared            =      .00896
             Adjusted R-squared   =      .00885
Model test   F[  3, 27318] (prob) =  82.4(.0000)
--------+-------------------------------------------------------------
Variable| Coefficient   Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+-------------------------------------------------------------
Constant|  -1.22592***      .01605       -76.376    .0000
     AGE|    .00227***      .00036         6.240    .0000    43.5272
  FEMALE|    .21239***      .02363         8.987    .0000     .47881
 AGE_FEM|   -.00620***      .00052       -11.819    .0000    21.2960
--------+-------------------------------------------------------------
Do women earn more than men (in this sample)? The +.21239 coefficient on FEMALE would suggest so. But the female "difference" is +.21239 - .00620×Age. At the average Age, the effect is .21239 - .00620(43.5272) = -.05748.
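The same kind of computation for the interaction model, using the coefficients printed above (only the point effect appears on the slide; a standard error would follow from Var[b₂] + z²Var[b₄] + 2zCov[b₂,b₄], as on the earlier slide):

```python
# Gender "difference" in log income at a given age, from the output above.
b_female, b_age_fem = 0.21239, -0.00620
avg_age = 43.5272
print(b_female + b_age_fem * avg_age)   # = -.05748: negative at the average age
```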
Structural Break
Linear Restrictions

Context: How do linear restrictions affect the properties of the least squares estimator?
Model: y = Xβ + ε
Theory (information): Rβ - q = 0
Restricted least squares estimator: b* = b - (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rb - q)
Expected value: E[b*] = β - (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rβ - q)
Variance: σ²(X′X)⁻¹ - σ²(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹ = Var[b] - a nonnegative definite matrix ≤ Var[b]
Implication: (as before) nonsample information reduces the variance of the estimator.
Interpretation

Case 1: Theory is correct: Rβ - q = 0 (the restrictions do hold).
b* is unbiased. Var[b*] is smaller than Var[b]. How do we know this?

Case 2: Theory is incorrect: Rβ - q ≠ 0 (the restrictions do not hold).
b* is biased. What does this mean? Var[b*] is still smaller than Var[b].
Linear Least Squares Subject to Restrictions

Restrictions: Theory imposes certain restrictions on parameters. Some common applications:
- Dropping variables from the equation: certain coefficients in b forced to equal 0. (Probably the most common testing situation. "Is a certain variable significant?")
- Adding up conditions: sums of certain coefficients must equal fixed values. Adding up conditions in demand systems. Constant returns to scale in production functions.
- Equality restrictions: certain coefficients must equal other coefficients. Using real vs. nominal variables in equations.
General formulation for linear restrictions: minimize the sum of squares, e′e, subject to the linear constraint Rb = q.
Restricted Least Squares

In practice, restrictions can usually be imposed by solving them out.

1. Force a coefficient to equal zero: drop the variable from the equation.
Problem: minimize over β₁, β₂, β₃: Σᵢ₌₁ⁿ (yᵢ - β₁xᵢ₁ - β₂xᵢ₂ - β₃xᵢ₃)² subject to β₃ = 0.
Solution: minimize over β₁, β₂: Σᵢ₌₁ⁿ (yᵢ - β₁xᵢ₁ - β₂xᵢ₂)².

2. Adding up restriction: impose β₁ + β₂ + β₃ = 1. Strategy: β₃ = 1 - β₁ - β₂.
Solution: minimize over β₁, β₂: Σᵢ₌₁ⁿ (yᵢ - β₁xᵢ₁ - β₂xᵢ₂ - (1 - β₁ - β₂)xᵢ₃)²
= Σᵢ₌₁ⁿ [(yᵢ - xᵢ₃) - β₁(xᵢ₁ - xᵢ₃) - β₂(xᵢ₂ - xᵢ₃)]².

3. Equality restriction: impose β₃ = β₂.
Problem: minimize over β₁, β₂, β₃: Σᵢ₌₁ⁿ (yᵢ - β₁xᵢ₁ - β₂xᵢ₂ - β₃xᵢ₃)² subject to β₃ = β₂.
Solution: minimize over β₁, β₂: Σᵢ₌₁ⁿ [yᵢ - β₁xᵢ₁ - β₂(xᵢ₂ + xᵢ₃)]².

In each case, least squares using transformations of the data (see the sketch below).
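For instance, case 3 (the equality restriction β₂ = β₃) amounts to regressing y on x₁ and the constructed regressor (x₂ + x₃). A sketch with simulated data (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 * x1 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=n)  # beta2 = beta3 in the DGP

# Impose beta3 = beta2 by transforming the data: regress y on x1 and (x2 + x3).
Xr = np.column_stack([x1, x2 + x3])
b1, b2 = np.linalg.lstsq(Xr, y, rcond=None)[0]
print("restricted estimates: b1 =", b1, " b2 = b3 =", b2)
```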
Restricted Least Squares Solution

General Approach: Programming Problem
Minimize over β: L = (y - Xβ)′(y - Xβ) subject to Rβ = q.
Each row of R is the K coefficients in a restriction. There are J restrictions: J rows.
β₃ = 0: R = [0,0,1,0,…], q = (0).
β₂ = β₃: R = [0,1,-1,0,…], q = (0).
β₂ = 0, β₃ = 0: R = [0,1,0,0,… ; 0,0,1,0,…], q = (0, 0)′.
Solution Strategy

Quadratic program: minimize a quadratic criterion subject to linear restrictions.
All restrictions are binding. Solve using the Lagrangean formulation: minimize over (β, λ)
L* = (y - Xβ)′(y - Xβ) + 2λ′(Rβ - q).
(The 2 is for convenience - see below.)
Restricted LS Solution

Necessary conditions:
∂L*/∂β = -2X′(y - Xβ) + 2R′λ = 0
∂L*/∂λ = 2(Rβ - q) = 0
Divide everything by 2 and collect in matrix form:
[X′X  R′] [β̂]   [X′y]
[ R   0 ] [λ̂] = [ q ]    or    Aw = v.  Solution: w = A⁻¹v.
Does not rely on full rank of X; relies on column rank of A = K + J.
Restricted Least Squares

If X has full rank, there is a partitioned solution for β* and λ*:
β* = b - (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rb - q)
λ* = [R(X′X)⁻¹R′]⁻¹(Rb - q)
where b = (X′X)⁻¹X′y, the simple least squares coefficients.

There are cases in which X does not have full rank. E.g., X = [1, x₁, x₂, d₁, d₂, d₃, d₄] where d₁, d₂, d₃, d₄ are a complete set of dummy variables with coefficients a₁, a₂, a₃, a₄. Unrestricted b cannot be computed. Restricted LS with a₁ + a₂ + a₃ + a₄ = 0 can be computed.
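A numpy sketch of the partitioned solution (simulated data and a made-up restriction, both assumptions): it forms b* from b, checks that Rb* = q, and shows that the restricted residual sum of squares is no smaller than the unrestricted one, anticipating the criterion-function result below.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 100, 3
X = rng.normal(size=(n, K))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

R = np.array([[0.0, 1.0, 1.0]])          # one restriction (J = 1): beta2 + beta3 = 0
q = np.array([0.0])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                    # unrestricted least squares
m = R @ b - q                            # discrepancy vector
C = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
b_star = b - C @ m                       # restricted least squares

print("R b* - q :", R @ b_star - q)      # ~0: the restriction holds exactly
e, e_star = y - X @ b, y - X @ b_star
print("e'e      :", e @ e)
print("e*'e*    :", e_star @ e_star)     # never smaller than e'e
```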
Aspects of Restricted LS

1. b* = b - Cm, where m = the "discrepancy vector" Rb - q. Note what happens if m = 0. What does m = 0 mean?
2. λ* = [R(X′X)⁻¹R′]⁻¹(Rb - q) = [R(X′X)⁻¹R′]⁻¹m. When does λ* = 0? What does this mean?
3. Combining results: b* = b - (X′X)⁻¹R′λ*. How could b* = b?
Restrictions and the Criterion Function

Assume the full rank case (the usual case).
b = (X′X)⁻¹X′y uniquely minimizes (y - Xβ)′(y - Xβ):
(y - Xb)′(y - Xb) < (y - Xb*)′(y - Xb*) for any b* ≠ b.
Imposing restrictions cannot improve the criterion value. It follows that R*² ≤ R²: restrictions must degrade the fit.