The simple regression model The multiple regression model Inference
Introductory Econometrics
Session 5 - The linear model
Roland Rathelot
Sciences Po
July 2011
Rathelot
Introductory Econometrics
Multivariate econometrics
- Outcome
- Covariate(s)
- Model
In the simple regression: 1 outcome and 1 regressor
Assumption: random sample
- The population of interest should be defined
- (yi, xi), i = 1, . . . , n, are assumed to be iid
- Note that here yi and xi are random variables
Assumption: A linear model
y = α + βx + u
- y is the outcome (explained variable)
- x is the explanatory variable (covariate)
- α is the constant (intercept)
- β is the coefficient on x (slope)
- u is the error
Correlation and causality
- Comovement of ∆x and ∆y
- Interpreting β in a causal sense: when is it possible?
- In a causal framework, u represents the unobserved determinants of y
Assumption: Zero expectation of the error
E(u) = 0
When an intercept is included in the model, this assumption comes at no cost
Assumption: Zero conditional expectation
E(u|x) = 0
- This means that the error is not correlated with any function of x
- Crucial assumption
- As a consequence:
E(y|x) = α + βx
What is the right estimator?
- Based on these assumptions, how can we estimate β (and α)?
- By the method of moments
- By least squares
The OLS estimator
β̂ = ∑i (yi − ȳ)(xi − x̄) / ∑i (xi − x̄)²
α̂ = ȳ − β̂x̄
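The slope and intercept formulas can be coded directly as a sanity check. This is a minimal sketch in Python with NumPy; the data and the coefficients (1, 2) are made up so that the estimates can be read off exactly.

```python
import numpy as np

def ols_simple(x, y):
    """OLS estimates from the moment formulas: slope, then intercept."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    beta = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta

# Hypothetical noiseless data: y = 1 + 2x, so OLS recovers (1, 2) exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
alpha, beta = ols_simple(x, y)
```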
Algebraic properties of the OLS estimator
Let’s define
- the residual: ûi = yi − α̂ − β̂xi
- the predicted value: ŷi = α̂ + β̂xi
Then
1. ∑ ûi = 0
2. ∑ xi ûi = 0
3. (x̄, ȳ) is on the regression line
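These three properties hold in any sample, by construction of the OLS estimates. A quick numerical check (Python/NumPy, simulated data with made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)  # hypothetical data-generating process

# OLS estimates via the moment formulas
beta = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()

u_hat = y - alpha - beta * x             # residuals
y_hat = alpha + beta * x                 # predicted values

sum_resid = np.sum(u_hat)                # property 1: zero up to rounding
sum_x_resid = np.sum(x * u_hat)          # property 2: zero up to rounding
line_at_xbar = alpha + beta * x.mean()   # property 3: equals ȳ
```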
Decomposition of the variance
We define:
SST = ∑ (yi − ȳ)²
SSE = ∑ (ŷi − ȳ)²
SSR = ∑ ûi²
Then, SST = SSE + SSR
Goodness of fit
The R-squared is usually used to appreciate the goodness of fit of the linear regression
R² = SSE/SST = 1 − SSR/SST
It is the share of the explained variance in the total variance
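The decomposition SST = SSE + SSR, and hence both expressions for R², can be verified numerically. A sketch in Python/NumPy on simulated data (coefficients made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.5 + 1.5 * x + rng.normal(size=100)  # hypothetical data-generating process

beta = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
y_hat = alpha + beta * x
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)   # total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
SSR = np.sum(u_hat ** 2)            # residual sum of squares
R2 = SSE / SST
```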
Assumptions: Summary
- (A1) Linear model
- (A2) Random sample in the population
- (A3) Variability of the covariate in the sample
- (A4) Zero conditional expectation of the error
Statistical properties
Under assumptions (A1) to (A4), two important properties hold for the OLS estimator (α̂, β̂):
- The OLS estimator is consistent
- The OLS estimator is unbiased
An additional assumption
- So far, nothing about precision
- (A5) Homoskedasticity: V(u|x) = σ²
- This assumption means that the variance of the error does not depend on the value of the covariate
Assumptions (A1)-(A5) are sometimes called the Gauss-Markov assumptions
The variance of the OLS estimator
Conditional on the sample {x1 . . . xn}, the variance of the OLS estimator is
V(β̂) = σ² / ∑ (xi − x̄)²
V(α̂) = σ² ∑ xi² / ( n ∑ (xi − x̄)² )
Estimating the variance of β
- What is unknown and needed is σ²
- The usual estimator is σ̂² = ∑ ûi² / (n − 2)
- This estimator is unbiased
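Putting the two slides together, the estimated variance of β̂ plugs σ̂² into the variance formula. A minimal sketch in Python/NumPy, on simulated data (sample size and error scale are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)   # homoskedastic errors
y = 1.0 + 2.0 * x + u               # hypothetical data-generating process

beta = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
u_hat = y - alpha - beta * x

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)               # unbiased estimator of σ²
var_beta_hat = sigma2_hat / np.sum((x - x.mean()) ** 2)  # estimated V(β̂)
se_beta_hat = np.sqrt(var_beta_hat)                      # standard error of β̂
```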
Regression through the origin
It is possible to estimate the model with no intercept:
y = βx + u
In this case, β̂ = ∑ xi yi / ∑ xi²
When should this estimator be used instead of the one with an intercept?
- When there is a strong a priori or theoretical reason to believe that α = 0
- When the variables have been centered before the regression: x̃i = xi − x̄
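The no-intercept formula is a one-liner. A sketch in Python/NumPy, with made-up data generated through the origin so the slope is recovered exactly:

```python
import numpy as np

# Hypothetical data with no intercept: y = 3x exactly
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

# Regression through the origin: β̂ = ∑ xi yi / ∑ xi²
beta_origin = np.sum(x * y) / np.sum(x ** 2)
```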
The multiple regression model
- Multiple: not just one but several covariates
- k covariates: 1, x1, x2, . . . , xk−1
- Ceteris paribus: principle and examples
Scalar notations
The linear model now writes:
yi = β0 + β1 x1,i + . . . + βk−1 xk−1,i + ui
where, for the sake of clarity, the subscript i is usually omitted
Matrix notations
y = Xβ + u
- y is the vector (y1 . . . yn), of length n
- X is a matrix with k columns and n rows
- β is a vector of length k
- u is a vector of length n
The Gauss-Markov assumption in the multiple case
- (A1) Linear model
- (A2) Random sample in the population
- (A3) No collinearity between covariates
- (A4) Zero conditional expectation of the error
- (A5) Homoskedasticity: Var(ui|xi = x) = σ²
Obtaining the OLS
Just the same as before:
- Least squares
- Moments
provide the same expression for the OLS estimator
β̂ = (β̂0, β̂1 . . . β̂k−1)
A compact expression
β̂ = (X′X)⁻¹(X′y)
- Under A1 to A4, the estimator is consistent and unbiased
- Under A1 to A5, its variance is equal to (X′X)⁻¹σ²
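The compact expression translates almost literally into code. A minimal sketch in Python/NumPy on simulated data (design and coefficients are made up); note that solving the normal equations X′Xβ̂ = X′y is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 3
# Design matrix: a constant column plus two hypothetical covariates
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# β̂ = (X'X)^{-1} X'y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

A quick way to check the result is the orthogonality condition X′(y − Xβ̂) = 0, which any OLS solution must satisfy.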
Residuals and predicted values
The residuals and the predicted values are defined as before:
û = y − Xβ̂
ŷ = Xβ̂
Goodness of fit
The R-squared is still used to assess the model’s goodness of fit:
R² = SSE/SST = 1 − SSR/SST
Estimating the variance of the OLS estimator
- As in the simple case, σ² has to be estimated
- An unbiased estimator for σ² is:
σ̂² = (1/(n − k)) ∑ ûi²
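The degrees-of-freedom correction n − k and the resulting variance matrix σ̂²(X′X)⁻¹ can be sketched as follows (Python/NumPy, simulated data with made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)  # hypothetical model

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

sigma2_hat = (u_hat @ u_hat) / (n - k)               # unbiased estimator of σ²
var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated variance matrix
```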
Projections
To interpret the meaning of the OLS estimators, it is useful to introduce:
PX = X(X′X)⁻¹X′
MX = I − X(X′X)⁻¹X′
- PX and MX are symmetric
- PX and MX are projectors
so that ŷ = PX y and û = MX y
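The projector properties (symmetry, idempotence, and the decomposition y = ŷ + û) can be checked numerically. A sketch in Python/NumPy on an arbitrary simulated design:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # constant + one covariate
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the column space of X
M = np.eye(n) - P                      # projection onto its orthogonal complement

y_hat = P @ y   # predicted values
u_hat = M @ y   # residuals
```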
The Frisch Waugh theorem
Split the covariates into two groups: X = (X1, X2), β = (β1, β2)
- First regress y on X1 and X2 on X1, and keep the residuals MX1 y and MX1 X2
- Now regress MX1 y on MX1 X2: the obtained estimator is equal to the OLS estimator β̂2
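The theorem says the two-step "partialling out" procedure reproduces the full-regression coefficient exactly, in any sample. A sketch in Python/NumPy (data and coefficients made up, with x2 deliberately correlated with x1):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)            # correlated covariates
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1])        # first group: constant and x1
X = np.column_stack([X1, x2])

# Full regression: coefficient on x2
b2_full = np.linalg.solve(X.T @ X, X.T @ y)[2]

# Frisch-Waugh: residualise y and x2 on X1, then regress residual on residual
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
y_r, x2_r = M1 @ y, M1 @ x2
b2_fw = (x2_r @ y_r) / (x2_r @ x2_r)
```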
Frisch Waugh and ceteris paribus
- Suppose we are especially interested in β̂j, the coefficient on xj
- First regress xj on all the other covariates and keep the residual M−j xj
- β̂j may be obtained by regressing y on M−j xj
Frisch Waugh and the variance of βj
Var(β̂j) = σ² / ((1 − Rj²) SSTj)
- SSTj = ∑ (xij − x̄j)²
- Rj² is the R-squared from regressing xj on all the other independent variables
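With an intercept in the regression, this expression coincides exactly with the corresponding diagonal element of σ̂²(X′X)⁻¹; checking the two routes against each other is a good exercise. A sketch in Python/NumPy on simulated data (made-up coefficients, x2 correlated with x1):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 150
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + x1 + x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - 3)

# Route 1: Var(β̂2) as a diagonal element of the matrix formula
var_matrix = (sigma2_hat * np.linalg.inv(X.T @ X))[2, 2]

# Route 2: σ̂² / ((1 − R²_j) SST_j), regressing x2 on the other covariates
Z = np.column_stack([np.ones(n), x1])
x2_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x2)
SST_j = np.sum((x2 - x2.mean()) ** 2)
R2_j = np.sum((x2_hat - x2.mean()) ** 2) / SST_j
var_fw = sigma2_hat / ((1 - R2_j) * SST_j)
```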
Misspecification
Suppose the true model is y = β0 + β1x1 + β2x2 + u
- β̂1 is the OLS estimator of the coefficient on x1 in the regression of y on x1 and x2
- β̃1 is the OLS estimator from the regression of y on x1 alone
Misspecification (2)
- β̃1 is biased iff:
1. β2 ≠ 0
2. Cov(x1, x2) ≠ 0
- In terms of variance, β̃1 is always more precise than β̂1:
Var(β̂1) ≥ Var(β̃1)
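The two bias conditions come from a standard sample decomposition (not stated on the slide): the short-regression slope equals the long-regression slope plus the long-regression coefficient on the omitted variable times the slope from regressing x2 on x1. A numerical check in Python/NumPy (all coefficients made up):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # correlated with x1, so omitting it matters
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X_long = np.column_stack([np.ones(n), x1, x2])
b_long = np.linalg.solve(X_long.T @ X_long, X_long.T @ y)      # long regression

X_short = np.column_stack([np.ones(n), x1])
b_short = np.linalg.solve(X_short.T @ X_short, X_short.T @ y)  # short regression

# Slope from regressing the omitted x2 on x1 (with constant)
delta = np.linalg.solve(X_short.T @ X_short, X_short.T @ x2)[1]

# Omitted-variable algebra: short slope = long slope + (coef on x2) * delta
lhs, rhs = b_short[1], b_long[1] + b_long[2] * delta
```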
BLUE
- Among the linear estimators
β̃j = ∑i wi yi
- that are unbiased,
- the OLS estimator is the one with the smallest variance
It is said to be the Best Linear Unbiased Estimator
Normality
- Even under the Gauss-Markov assumptions, the distribution of β̂ may still have any form
- To be able to make inference, we need to add a normality assumption
(A6) u is independent of x1 . . . xk and is distributed as N(0, σ²)
Distribution of the estimator
Under A1 to A6, the OLS estimator is distributed as:
β̂j ∼ N(βj, Var(β̂j))
and
(β̂j − βj) / √Var(β̂j) ∼ N(0, 1)
The t-stat
- The distribution of its empirical counterpart is a Student with n − k degrees of freedom:
(β̂j − βj) / √V̂(β̂j) ∼ t(n−k)
- When n is not too small, this distribution is really close to a standard normal
- To test the significance of the coefficient βj, the t-statistic is usually used:
t = β̂j / √V̂(β̂j)
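Computing the t-statistic for the slope takes only a few lines once β̂ and its estimated variance are in hand. A sketch in Python/NumPy (simulated data; the true slope of 2 is made up, so the t-statistic comes out clearly larger than conventional critical values):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 100, 2
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # hypothetical model with nonzero slope
X = np.column_stack([np.ones(n), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))  # standard errors

t_slope = beta_hat[1] / se[1]   # t-statistic for H0: β1 = 0
```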
To test any linear restriction
This test may be used in any case where there is one linear restriction:
- the equality of a coefficient to 0
- the equality of a coefficient to any number
- the equality of two coefficients
- any linear relationship between two or more coefficients
Testing more restrictions
The Fisher test
What if normality is not likely?
- In many cases, the normality of the errors is a strong assumption
- How can we do inference without this assumption?
- We replace A6 by another assumption
- (A′6) n is sufficiently large so that we can use the asymptotic properties of the OLS estimator
Consistency of the OLS estimator
- The OLS estimator is consistent:
plim β̂ = β
Asymptotic normality
Using the Central Limit Theorem, under the Gauss-Markov assumptions:
- √n (β̂ − β) ⇝ N(0, σ² A2⁻¹)
- where A2 = plim (X′X)/n
- σ̂² is a consistent estimator of σ²
- Finally, (β̂j − βj) / √V̂(β̂j) ⇝ N(0, 1)
Asymptotic inference
When n is large, one may, without any normality assumption (A6):
- use the t-test for one linear restriction
- use the Fisher test for several linear restrictions
The asymptotic behavior of the variance
We already know that
Var(β̂j) = σ̂² / ((1 − Rj²) SSTj)
- σ̂² is consistent for σ²
- Rj² converges to some value between 0 and 1
- SSTj/n converges to Var(xj)
So the variance is O(1/n) and the standard error is O(1/√n)