Introductory Econometrics - Session 5 - The linear model


Page 1: Introductory Econometrics - Session 5 - The linear model


Introductory Econometrics - Session 5 - The linear model

Roland Rathelot

Sciences Po

July 2011

Page 2: Introductory Econometrics - Session 5 - The linear model

Multivariate econometrics

- Outcome
- Covariate(s)
- Model

In the simple regression: 1 outcome and 1 regressor

Page 3: Introductory Econometrics - Session 5 - The linear model

Assumption: random sample

- The population of interest should be defined
- (yi, xi), i = 1, …, n, are assumed to be i.i.d.
- Note that here yi and xi are random variables

Page 4: Introductory Econometrics - Session 5 - The linear model

Assumption: A linear model

y = α + βx + u

- y is the outcome (explained variable)
- x is the explanatory variable (covariate)
- α is the constant (intercept)
- β is the coefficient on x (slope)
- u is the error

Page 5: Introductory Econometrics - Session 5 - The linear model

Correlation and causality

- Comovement of ∆x and ∆y
- Interpreting β in a causal sense: when is it possible?
- In a causal framework, u collects the unobserved determinants of y

Page 7: Introductory Econometrics - Session 5 - The linear model

Assumption: Zero expectation of the error

E(u) = 0

When an intercept is included in the model, this assumption comes at no cost: any nonzero mean of the error can be absorbed into the intercept.

Page 8: Introductory Econometrics - Session 5 - The linear model

Assumption: Zero conditional expectation

E(u|x) = 0

- This means that the error is not correlated with any function of x
- Crucial assumption
- As a consequence:

E(y|x) = α + βx

Page 9: Introductory Econometrics - Session 5 - The linear model

What is the right estimator?

- Based on these assumptions, how can we estimate β (and α)?
- By the method of moments
- By least squares

Page 10: Introductory Econometrics - Session 5 - The linear model

The OLS estimator

β̂ = ∑i (xi − x̄)(yi − ȳ) / ∑i (xi − x̄)²

α̂ = ȳ − β̂ x̄
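
A minimal numerical sketch of these two formulas (not part of the original slides; the simulated data and the cross-check against np.polyfit are only illustrative):

```python
import numpy as np

# Simulated data (illustrative only): y = 2 + 0.5 x + noise
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)

# OLS slope and intercept from the moment formulas above
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# Cross-check against numpy's least-squares fit
check = np.polyfit(x, y, deg=1)  # returns [slope, intercept]
print(beta_hat, alpha_hat)
print(check)
```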

Page 11: Introductory Econometrics - Session 5 - The linear model

Algebraic properties of the OLS estimator

Let’s define

- the residual: ûi = yi − α̂ − β̂ xi
- the predicted value: ŷi = α̂ + β̂ xi

Then

1. ∑i ûi = 0
2. ∑i xi ûi = 0
3. (x̄, ȳ) is on the regression line
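
These three properties can be verified numerically; a minimal sketch with simulated data (all names and numbers are invented for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

u_hat = y - alpha_hat - beta_hat * x   # residuals
y_hat = alpha_hat + beta_hat * x       # predicted values

print(np.isclose(u_hat.sum(), 0.0))            # property 1: residuals sum to 0
print(np.isclose((x * u_hat).sum(), 0.0))      # property 2: residuals orthogonal to x
print(np.isclose(y.mean(), alpha_hat + beta_hat * x.mean()))  # property 3: (x-bar, y-bar) on the line
```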

Page 12: Introductory Econometrics - Session 5 - The linear model

Decomposition of the variance

We define:

SST = ∑i (yi − ȳ)²
SSE = ∑i (ŷi − ȳ)²
SSR = ∑i ûi²

Then, SST = SSE + SSR

Page 13: Introductory Econometrics - Session 5 - The linear model

Goodness of fit

The R-squared is usually used to assess the goodness of fit of the linear regression:

R² = SSE/SST = 1 − SSR/SST

It is the share of the explained variance in the total variance
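
A short sketch computing SST, SSE, SSR and the R-squared on simulated data (illustrative only; the decomposition SST = SSE + SSR is checked numerically):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=150)
y = 0.5 + 1.5 * x + rng.normal(size=150)

beta_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
alpha_hat = y.mean() - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)

print(np.isclose(SST, SSE + SSR))   # SST = SSE + SSR
print(SSE / SST, 1 - SSR / SST)     # two equivalent expressions of R-squared
```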

Page 14: Introductory Econometrics - Session 5 - The linear model

Assumptions: Summary

- (A1) Linear model
- (A2) Random sample from the population
- (A3) Variability of the covariate in the sample
- (A4) Zero conditional expectation of the error

Page 15: Introductory Econometrics - Session 5 - The linear model

Statistical properties

Under assumptions (A1) to (A4), the OLS estimator (α̂, β̂) has two important properties:

- The OLS estimator is consistent
- The OLS estimator is unbiased

Page 16: Introductory Econometrics - Session 5 - The linear model

An additional assumption

- So far, nothing has been said about precision
- (A5) Homoskedasticity: V(u|x) = σ²
- This assumption means that the variance of the error does not depend on the value of the covariate

Assumptions (A1)-(A5) are sometimes called the Gauss-Markov assumptions.

Page 17: Introductory Econometrics - Session 5 - The linear model

The variance of the OLS estimator

Conditional on the sample {x1, …, xn}, the variance of the OLS estimator is

V(β̂) = σ² / ∑i (xi − x̄)²

V(α̂) = σ² ∑i xi² / (n ∑i (xi − x̄)²)

Page 18: Introductory Econometrics - Session 5 - The linear model

Estimating the variance of β

- What is unknown and needed is σ²
- The usual estimator is σ̂² = ∑i ûi² / (n − 2)
- This estimator is unbiased
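
A sketch combining the variance formulas of the previous slide with this estimator of σ² to obtain standard errors (simulated data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=2.0, size=n)

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
u_hat = y - alpha_hat - beta_hat * x

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)            # unbiased estimator of sigma^2
var_beta = sigma2_hat / np.sum((x - x.mean()) ** 2)  # estimated V(beta_hat)
var_alpha = sigma2_hat * np.sum(x ** 2) / (n * np.sum((x - x.mean()) ** 2))  # estimated V(alpha_hat)

print(np.sqrt(var_beta), np.sqrt(var_alpha))         # standard errors
```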

Page 20: Introductory Econometrics - Session 5 - The linear model

Regression through the origin

It is possible to estimate the model with no intercept

y = βx + u

In this case, β̂ = ∑i xi yi / ∑i xi². When should this estimator be used instead of the one with an intercept?

- When there is a strong a priori reason or theory to believe that α = 0
- When the variables have been centered before the regression: x̃i = xi − x̄
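
A minimal numerical sketch of the no-intercept estimator (simulated data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 0.7 * x + rng.normal(size=100)

# Regression through the origin: slope only, no intercept
beta_hat_origin = np.sum(x * y) / np.sum(x ** 2)
print(beta_hat_origin)
```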

Page 21: Introductory Econometrics - Session 5 - The linear model

The multiple regression model

- Multiple: not just one but several covariates
- k covariates: 1, x1, x2, …, xk−1
- Ceteris paribus: principle and examples

Page 22: Introductory Econometrics - Session 5 - The linear model

Scalar notations

The linear model is now written:

yi = β0 + β1 x1,i + … + βk−1 xk−1,i + ui

where, for the sake of clarity, the subscript i is usually omitted

Page 23: Introductory Econometrics - Session 5 - The linear model

Matrix notations

y = Xβ + u

- y is the vector (y1, …, yn), of length n
- X is a matrix with n rows and k columns
- β is a vector of length k
- u is a vector of length n

Page 24: Introductory Econometrics - Session 5 - The linear model

The Gauss-Markov assumptions in the multiple case

- (A1) Linear model
- (A2) Random sample from the population
- (A3) No collinearity between covariates
- (A4) Zero conditional expectation of the error
- (A5) Homoskedasticity: Var(ui|xi = x) = σ²

Page 25: Introductory Econometrics - Session 5 - The linear model

Obtaining the OLS

Just as before, the two approaches:

- least squares
- method of moments

provide the same expression for the OLS estimator

β̂ = (β̂0, β̂1, …, β̂k−1)

Page 26: Introductory Econometrics - Session 5 - The linear model

A compact expression

β̂ = (X′X)⁻¹(X′y)

- Under (A1) to (A4), the estimator is consistent and unbiased
- Under (A1) to (A5), its variance is equal to σ²(X′X)⁻¹
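
A sketch of this compact expression with numpy (simulated data; solving the normal equations is used here instead of forming (X′X)⁻¹ explicitly, and in practice a dedicated routine such as np.linalg.lstsq would be preferred):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 500, 3
X = np.column_stack([np.ones(n),           # constant
                     rng.normal(size=n),   # x1
                     rng.normal(size=n)])  # x2
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# beta_hat = (X'X)^{-1} X'y, written as the solution of the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Variance under homoskedasticity: sigma^2 (X'X)^{-1}, with sigma^2 estimated
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k)
V_beta = sigma2_hat * np.linalg.inv(X.T @ X)

print(beta_hat)
print(np.sqrt(np.diag(V_beta)))  # standard errors
```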

Page 27: Introductory Econometrics - Session 5 - The linear model

Residuals and predicted values

The residuals and the predicted values are defined as before

û = y − Xβ̂

ŷ = Xβ̂

Page 28: Introductory Econometrics - Session 5 - The linear model

Goodness of fit

The R-squared is still used to assess the model's goodness of fit:

R² = SSE/SST = 1 − SSR/SST

Page 29: Introductory Econometrics - Session 5 - The linear model

Estimating the variance of the OLS estimator

- As in the simple case, σ² has to be estimated
- An unbiased estimator for σ² is:

σ̂² = (1 / (n − k)) ∑i ûi²

Page 30: Introductory Econometrics - Session 5 - The linear model

Projections

To interpret the meaning of the OLS estimator, it is useful to introduce:

P_X = X(X′X)⁻¹X′
M_X = I − X(X′X)⁻¹X′

- P_X and M_X are symmetric
- P_X and M_X are projectors

so that ŷ = P_X y and û = M_X y
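
A small sketch of P_X and M_X and of these properties (simulated data; forming the n×n matrices explicitly is only reasonable for small n):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the column space of X
M = np.eye(n) - P                      # projection onto its orthogonal complement

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(P @ y, X @ beta_hat))            # y_hat = P_X y
print(np.allclose(M @ y, y - X @ beta_hat))        # u_hat = M_X y
print(np.allclose(P, P.T), np.allclose(P @ P, P))  # symmetric and idempotent
```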

Page 31: Introductory Econometrics - Session 5 - The linear model

The Frisch-Waugh theorem

Split the covariates into two groups: X = (X1, X2), β = (β1, β2)

- First regress y on X1 and X2 on X1, and keep the residuals M_{X1} y and M_{X1} X2
- Now regress M_{X1} y on M_{X1} X2: the resulting estimator is equal to the OLS estimator β̂2 from the full regression
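
A numerical illustration of the theorem (simulated data; here X1 contains the constant and one covariate and X2 a single additional covariate, an assumption made only for the example):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)     # correlated with x1
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1])  # first group of covariates
X2 = x2.reshape(-1, 1)                  # second group
X = np.hstack([X1, X2])

# Full regression: the last coefficient is beta_2_hat
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Partialling out: residuals of y and X2 after regressing each on X1
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
y_res = M1 @ y
X2_res = M1 @ X2

beta_2_fw = np.linalg.solve(X2_res.T @ X2_res, X2_res.T @ y_res)
print(beta_full[-1], beta_2_fw.item())  # identical up to numerical error
```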

Page 32: Introductory Econometrics - Session 5 - The linear model

Frisch-Waugh and ceteris paribus

- Suppose we are especially interested in βj, the coefficient on xj
- First regress xj on all the other covariates and keep the residual M−j xj
- β̂j may be obtained from the regression of y on M−j xj

Page 33: Introductory Econometrics - Session 5 - The linear model

Frisch-Waugh and the variance of β̂j

Var(β̂j) = σ² / ((1 − R²j) SSTj)

- SSTj = ∑i (xij − x̄j)²
- R²j is the R-squared from regressing xj on all the other covariates
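
A sketch checking this formula against the corresponding diagonal element of σ̂²(X′X)⁻¹ (simulated data; j is taken to be the x1 column, an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

k = X.shape[1]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k)

# Formula with SST_j and R2_j, for j = the x1 column
others = np.column_stack([np.ones(n), x2])  # all the other covariates
g = np.linalg.solve(others.T @ others, others.T @ x1)
R2_j = 1 - np.sum((x1 - others @ g) ** 2) / np.sum((x1 - x1.mean()) ** 2)
SST_j = np.sum((x1 - x1.mean()) ** 2)
var_formula = sigma2_hat / ((1 - R2_j) * SST_j)

# Direct computation from sigma2_hat * (X'X)^{-1}
var_direct = sigma2_hat * np.linalg.inv(X.T @ X)[1, 1]
print(var_formula, var_direct)   # equal up to numerical error
```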

Page 34: Introductory Econometrics - Session 5 - The linear model

Misspecification

Suppose the true model is y = β0 + β1x1 + β2x2 + u

- β̂1 is the OLS estimator of the coefficient on x1 in the regression of y on x1 and x2
- β̃1 is the OLS estimator of the coefficient on x1 in the regression of y on x1 only

Page 35: Introductory Econometrics - Session 5 - The linear model

Misspecification (2)

- β̃1 is biased iff:

1. β2 ≠ 0
2. Cov(x1, x2) ≠ 0

- In terms of variance, β̃1 is always more precise than β̂1:

Var(β̂1) ≥ Var(β̃1)
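
A small simulation of this omitted-variable bias (all numbers invented for the illustration): when β2 ≠ 0 and Cov(x1, x2) ≠ 0, the short-regression estimate centres on β1 + β2·δ, where δ is the slope of x2 on x1, rather than on β1.

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps = 200, 2000
b0, b1, b2 = 1.0, 2.0, -1.5

short_estimates = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)   # Cov(x1, x2) != 0
    y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)
    # Short regression of y on x1 only (omitting x2)
    beta_tilde_1 = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean()) ** 2)
    short_estimates.append(beta_tilde_1)

# Mean of the short-regression estimates is approximately b1 + b2 * 0.8, not b1
print(np.mean(short_estimates), b1 + b2 * 0.8)
```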

Page 36: Introductory Econometrics - Session 5 - The linear model

BLUE

Among the linear estimators of the form β̃j = ∑i wi yi that are unbiased, the OLS estimator is the one with the smallest variance.

It is said to be the Best Linear Unbiased Estimator (BLUE).

Page 37: Introductory Econometrics - Session 5 - The linear model

Normality

- Even under the Gauss-Markov assumptions, the distribution of β̂ may still have any form
- To be able to do inference, we need to add a normality assumption

(A6) u is independent of x1, …, xk and is distributed as N(0, σ²)

Page 38: Introductory Econometrics - Session 5 - The linear model

Distribution of the estimator

Under (A1) to (A6), the OLS estimator is distributed as:

β̂j ∼ N(βj, Var(β̂j))

and

(β̂j − βj) / √Var(β̂j) ∼ N(0, 1)

Page 39: Introductory Econometrics - Session 5 - The linear model

The t-stat

- When Var(β̂j) is replaced by its estimate, the distribution of this statistic is a Student t with n − k degrees of freedom:

(β̂j − βj) / √V̂(β̂j) ∼ t(n − k)

- When n is not too small, this distribution is very close to a standard normal
- To test the significance of the coefficient βj, the t-statistic is usually used:

t = β̂j / √V̂(β̂j)
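
A sketch computing these t-statistics, and (assuming scipy is available) the corresponding p-values from the t(n − k) distribution; the data are simulated for illustration:

```python
import numpy as np
from scipy import stats  # assumed available, used only for the t distribution

rng = np.random.default_rng(10)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)   # last true coefficient is 0

k = X.shape[1]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

t_stats = beta_hat / se                          # H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - k)
print(t_stats)
print(p_values)
```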

Page 40: Introductory Econometrics - Session 5 - The linear model

To test any linear restriction

This test may be used in any case where there is a single linear restriction:

- the equality of a coefficient to 0
- the equality of a coefficient to any number
- the equality of two coefficients
- any linear relationship between two or more coefficients

Page 41: Introductory Econometrics - Session 5 - The linear model

Testing more restrictions

To test several linear restrictions jointly: the Fisher test (F-test)

Page 42: Introductory Econometrics - Session 5 - The linear model

What if normality is not likely?

- In many cases, the normality of the errors is a strong assumption
- How can we do inference without this assumption?
- We replace (A6) with another assumption
- (A6′) n is sufficiently large, so that we can use the asymptotic properties of the OLS estimator

Page 43: Introductory Econometrics - Session 5 - The linear model

Consistency of the OLS estimator

- The OLS estimator is consistent:

plim β̂ = β

Page 44: Introductory Econometrics - Session 5 - The linear model

Asymptotic normality

Using the Central Limit Theorem, under the Gauss-Markov assumptions:

- √n (β̂ − β) →d N(0, σ²/A2)
- where A2 = plim (X′X)/n
- σ̂² is a consistent estimator of σ²
- Finally, (β̂j − βj) / √V̂(β̂j) →d N(0, 1)

Page 45: Introductory Econometrics - Session 5 - The linear model

Asymptotic inference

When n is large, one may, without the normality assumption (A6):

- use the t-test for one linear restriction
- use the Fisher test for several linear restrictions

Page 46: Introductory Econometrics - Session 5 - The linear model

The asymptotic behavior of the variance

We already know that

Var(β̂j) = σ² / ((1 − R²j) SSTj)

- σ̂² is consistent for σ²
- R²j converges to some value between 0 and 1
- SSTj/n converges to Var(xj)

So the variance is O(1/n) and the standard error is O(1/√n)
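
A quick illustration of this 1/√n rate with simulated data (illustrative only: multiplying n by 4 roughly halves the standard error of the slope):

```python
import numpy as np

rng = np.random.default_rng(11)

def slope_se(n):
    # Standard error of the slope in a simple regression on simulated data
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)
    beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    u_hat = (y - y.mean()) - beta_hat * (x - x.mean())
    sigma2_hat = np.sum(u_hat ** 2) / (n - 2)
    return np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))

for n in (100, 400, 1600):
    print(n, slope_se(n))   # each step multiplies n by 4 and divides the SE by about 2
```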
