Econometrics: Multiple Linear Regression
Burcu Eke
UC3M
The Multiple Linear Regression Model
- Many economic problems involve more than one exogenous variable affecting the response variable
  - Demand for a product given prices of competing brands, advertising, household attributes, etc.
  - Production functions
- In SLR, we had $Y = \beta_0 + \beta_1 X_1 + \varepsilon$. Now, we are interested in modeling $Y$ with more variables, such as $X_2, X_3, \ldots, X_k$:

  $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \varepsilon$
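As a concrete illustration (not part of the original slides), here is a minimal sketch of fitting such a model by OLS in Python; the data and coefficient values are simulated and purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical regressors; x2 is deliberately correlated with x1
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)

# Assumed "true" model: Y = 1 + 2*X1 - 3*X2 + eps
y = 1 + 2 * x1 - 3 * x2 + rng.normal(size=n)

# OLS: regress y on a constant, x1, and x2
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately [1, 2, -3]
```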
Example 1: Crimes on campus
Consider the scatter plots: Crime vs. Enrollment and Crime vs. Police
[Figure: scatter plots of campus crimes vs. enrollment and campus crimes vs. police]
The Multiple Linear Regression Model
- $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \varepsilon$
  - The slope terms $\beta_j$, $j = 1, \ldots, k$, are interpreted as the partial effects, or ceteris paribus effects, of a change in the corresponding $X_j$
  - When the assumptions of MLR are met, this interpretation is correct even though the data do not come from an experiment: the MLR reproduces conditions similar to a controlled experiment (holding other factors constant) in a non-experimental setting
- In general, $X_1$ is not independent of $X_2, X_3, \ldots, X_k$.
Example 1, cont'd
Consider the scatter plot: Enrollment vs. Police
[Figure: scatter plot of enrollment vs. police]
Example 2: Effect of education on wages
- Let $Y$ be the wage and $X_1$ the years of education
- Even though our primary interest is to assess the effect of education, we know that other factors, such as gender ($X_2$), work experience ($X_3$), and performance ($X_4$), can influence wages
- $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$
- Do you think education and the other factors are independent?
- $\beta_1$: the partial effect of education, holding experience, gender, and performance constant. In SLR, we have no control over these factors since they are part of the error term.
Example 3: Effect of rooms on price of a house
- Suppose we have data on selling prices of apartments in different neighborhoods of Madrid
- We are interested in the effect of the number of bedrooms on these prices
- But the number of bathrooms also influences the price
- Moreover, as the number of bedrooms increases, the number of bathrooms may increase as well; e.g., it is rare to see a one-bedroom apartment with two bathrooms.
The Multiple Linear Regression Model: Assumptions
- A1: Linear in parameters
- A2: $E[\varepsilon \mid X_1, X_2, \ldots, X_k] = 0$. For a given combination of independent variables $(X_1, X_2, \ldots, X_k)$, the average of the error term is equal to 0. This assumption has implications similar to those in SLR:
  1. $E[\varepsilon] = 0$
  2. For any function $h(\cdot)$, $C(h(X_j), \varepsilon) = 0$ for all $j \in \{1, 2, \ldots, k\}$; in particular, $C(X_j, \varepsilon) = 0$ for all $j \in \{1, 2, \ldots, k\}$ (see the sketch below)
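The population condition $C(X_j, \varepsilon) = 0$ has an exact sample analogue: OLS residuals are orthogonal to every regressor by construction. A minimal check in Python (simulated, hypothetical data; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(size=n)

# OLS fit, then residuals
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Sample covariances C(xj, resid) are numerically zero by construction,
# mirroring the population condition C(Xj, eps) = 0 implied by A2
print(np.cov(x1, resid)[0, 1], np.cov(x2, resid)[0, 1])
```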
The Multiple Linear Regression Model: Assumptions
- A3: Conditional homoscedasticity. Given $X_1, X_2, \ldots, X_k$, the variance of $\varepsilon$ (and of $Y$) is the same for all observations:

  $V(\varepsilon \mid X_1, X_2, \ldots, X_k) = \sigma^2$
  $V(Y \mid X_1, X_2, \ldots, X_k) = \sigma^2$

- A4: No perfect multicollinearity: none of $X_1, X_2, \ldots, X_k$ can be written as a linear combination of the remaining ones (see the sketch below).
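To see why A4 matters, a small sketch (simulated data; the setup is hypothetical): if one regressor is an exact linear combination of the others, the design matrix loses rank and the normal equations no longer have a unique solution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 + 0.5 * x2              # exact linear combination: violates A4

X = np.column_stack([np.ones(n), x1, x2, x3])
print(np.linalg.matrix_rank(X))      # 3, not 4: X has linearly dependent columns

# X'X is singular, so the normal equations X'X b = X'y have no unique solution
print(np.linalg.cond(X.T @ X))       # astronomically large condition number
```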
The Multiple Linear Regression Model: Assumptions
- From A1 and A2, we have a conditional expectation function (CEF) such that $E[Y \mid X_1, X_2, \ldots, X_k] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k$
  - The CEF gives us the conditional mean of $Y$ for this specific subpopulation
  - Recall that $E[Y \mid X_1, X_2, \ldots, X_k]$ is the best prediction (achieved by minimizing $E[\varepsilon^2]$)
  - MLR is similar to SLR in the sense that, under the linearity assumption, the best prediction and the best linear prediction coincide.
The Multiple Linear Regression Model: Two-Variable Case
- Let's consider the MLR model with two independent variables. Our model is of the form $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
- Recall the housing price example, where $Y$ is the selling price, $X_1$ is the number of bedrooms, and $X_2$ is the number of bathrooms
The Multiple Linear Regression Model: Two-Variable Case
- Then, we have $E[Y \mid X_1, X_2] = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, hence

  $E[Y \mid X_1, X_2 = 0] = \beta_0 + \beta_1 X_1$
  $E[Y \mid X_1, X_2 = 1] = (\beta_0 + \beta_2) + \beta_1 X_1$

- This implies that, holding $X_1$ constant, the change from $E[Y \mid X_1, X_2 = 0]$ to $E[Y \mid X_1, X_2 = 1]$ is equal to $\beta_2$ or, more generally, $\Delta E[Y \mid X_1, X_2] = \beta_2 \Delta X_2$ (see the sketch below)
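This intercept shift is easy to see numerically with a binary $X_2$; a sketch with simulated data (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x1 = rng.normal(size=n)
x2 = rng.integers(0, 2, size=n)          # binary regressor, e.g. a bathroom dummy
y = 1.0 + 2.0 * x1 + 0.7 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Two parallel fitted lines with common slope b1:
# intercept b0 when x2 = 0, intercept b0 + b2 when x2 = 1
print(b0, b0 + b2)                        # differ by b2, roughly 0.7
```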
The Multiple Linear Regression Model: Interpretation of Coefficients
- More generally, with everything else held constant, a change in $X_j$ results in $\Delta E[Y \mid X_1, \ldots, X_k] = \beta_j \Delta X_j$. In other words:

  $\beta_j = \dfrac{\Delta E[Y \mid X_1, \ldots, X_k]}{\Delta X_j}$

- Then, $\beta_j$ can be interpreted as follows: when $X_j$ increases by one unit, holding everything else constant, $Y$, on average, changes by $\beta_j$ units.
- Do you think the multiple regression of $Y$ on $X_1, \ldots, X_k$ is equivalent to the individual simple regressions of $Y$ on $X_1, \ldots, X_k$ separately? WHY or WHY NOT?
The Multiple Linear Regression Model: Interpretation of Coefficients
- Recall Example 3. In the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, where $X_1$ is the number of bedrooms and $X_2$ is the number of bathrooms:
  - $\beta_1$ is the increase in housing prices, on average, for an additional bedroom while holding the number of bathrooms constant; in other words, for apartments with the same number of bathrooms
- If we were to perform an SLR, $Y = \alpha_0 + \alpha_1 X_1 + u$, where $X_1$ is the number of bedrooms:
  - $\alpha_1$ is the increase in housing prices, on average, for an additional bedroom. Notice that in this regression we have no control over the number of bathrooms. In other words, we ignore the differences due to the number of bathrooms
- $\beta_j$: partial regression coefficient
The Multiple Linear Regression Model: Full Model vs Reduced Model
- Recall our model with two independent variables: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, where $E[Y \mid X_1, X_2] = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
- In this model, the parameters satisfy:

  $E[\varepsilon] = 0 \;\Rightarrow\; \beta_0 = E[Y] - \beta_1 E[X_1] - \beta_2 E[X_2]$  (1)
  $C(X_1, \varepsilon) = 0 \;\Rightarrow\; \beta_1 V(X_1) + \beta_2 C(X_1, X_2) = C(X_1, Y)$  (2)
  $C(X_2, \varepsilon) = 0 \;\Rightarrow\; \beta_1 C(X_1, X_2) + \beta_2 V(X_2) = C(X_2, Y)$  (3)
The Multiple Linear Regression Model: Full Model vs Reduced Model
- From these equations, we then get:

  $\beta_1 = \dfrac{V(X_2)\,C(X_1, Y) - C(X_1, X_2)\,C(X_2, Y)}{V(X_1)\,V(X_2) - [C(X_1, X_2)]^2}$

  $\beta_2 = \dfrac{V(X_1)\,C(X_2, Y) - C(X_1, X_2)\,C(X_1, Y)}{V(X_1)\,V(X_2) - [C(X_1, X_2)]^2}$

- Notice that if $C(X_1, X_2) = 0$, these parameters are equivalent to the SLR slopes of $Y$ on $X_1$ and on $X_2$, respectively, i.e.,

  $\beta_1 = \dfrac{C(X_1, Y)}{V(X_1)} \quad \text{and} \quad \beta_2 = \dfrac{C(X_2, Y)}{V(X_2)}$

  (a numerical check follows below)
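These covariance formulas can be checked numerically: with sample variances and covariances in place of population moments, they reproduce the OLS slope estimates. A minimal sketch in Python (simulated, hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)       # correlated regressors
y = 1 + 2 * x1 - 3 * x2 + rng.normal(size=n)

# Sample moments
v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
c12 = np.cov(x1, x2)[0, 1]
c1y = np.cov(x1, y)[0, 1]
c2y = np.cov(x2, y)[0, 1]

# The slide formulas, evaluated at the sample moments
denom = v1 * v2 - c12**2
beta1 = (v2 * c1y - c12 * c2y) / denom
beta2 = (v1 * c2y - c12 * c1y) / denom
print(beta1, beta2)                       # ~2 and ~-3, same as OLS via lstsq
```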
The Multiple Linear Regression Model: Full Model vs Reduced Model
- Assume we are interested in the effect of $X_1$ on $Y$, and concerned that $X_1$ and $X_2$ may be correlated. Then an SLR does not give us the effect we want.
- Define the model $L(Y \mid X_1) = \alpha_0 + \alpha_1 X_1$ as the reduced model (we could also have considered a reduced model with $X_2$ as our independent variable; the results would have been the same). Then $\alpha_0$ and $\alpha_1$ satisfy the following equations, where $u$ denotes the reduced-model error:

  $E[u] = 0 \;\Rightarrow\; \alpha_0 = E[Y] - \alpha_1 E[X_1]$  (4)
  $C(X_1, u) = 0 \;\Rightarrow\; \alpha_1 = \dfrac{C(X_1, Y)}{V(X_1)}$  (5)
The Multiple Linear Regression Model: Full Model vs Reduced Model
- Using equations (2) and (5) we get:

  $\alpha_1 = \dfrac{C(X_1, Y)}{V(X_1)} = \dfrac{\beta_1 V(X_1) + \beta_2 C(X_1, X_2)}{V(X_1)} = \beta_1 + \beta_2 \dfrac{C(X_1, X_2)}{V(X_1)}$

- Notice the following:
  1. $\alpha_1 = \beta_1$ only when $C(X_1, X_2) = 0$ or when $\beta_2 = 0$
  2. $\dfrac{C(X_1, X_2)}{V(X_1)}$ is the slope of the linear prediction $L(X_2 \mid X_1) = \delta_0 + \delta_1 X_1$

  (a simulation check follows below)
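This decomposition, $\alpha_1 = \beta_1 + \beta_2\,C(X_1, X_2)/V(X_1)$, holds exactly in-sample when each term is estimated by OLS. A minimal simulation sketch (hypothetical parameter values):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)       # C(X1, X2) != 0
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

# Full model: OLS of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
_, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Reduced model: SLR slope of y on x1 alone
alpha1 = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

# Slope of the linear prediction L(X2 | X1)
delta1 = np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)

# alpha1 equals b1 + b2 * delta1 (here roughly 2 + 3 * 0.8 = 4.4)
print(alpha1, b1 + b2 * delta1)
```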
Review
- Our model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + \varepsilon_i$
  - We want $X_1$ and $X_2, \ldots, X_k$ NOT to be independent
- Assumptions:
  - A1 to A3 similar to SLR
  - A4: no perfect multicollinearity
    - For example, we don't have a case like $X_{1i} = 2X_{2i} + 0.5X_{3i}$ for all $i = 1, \ldots, n$
Review
- Interpretation

  Model                                                              | Coefficient of $X_1$
  $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$                   | For a one-unit increase in $X_1$, $Y$, on average, increases by $\beta_1$ units
  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$  | For a one-unit increase in $X_1$, holding everything else constant, $Y$, on average, increases by $\beta_1$ units
Review
- Full model vs reduced model, focusing on the two-variable case:
  - Full model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
  - Reduced model: $Y_i = \alpha_0 + \alpha_1 X_{1i} + u_i$
  - The reduced model cannot control for $X_2$, i.e., it cannot hold $X_2$ constant to isolate the effect of $X_1$
- $\alpha_1 = \dfrac{C(X_1, Y)}{V(X_1)} = \dfrac{\beta_1 V(X_1) + \beta_2 C(X_1, X_2)}{V(X_1)} = \beta_1 + \beta_2 \dfrac{C(X_1, X_2)}{V(X_1)}$

  $\alpha_1$ reflects the effect of $X_1$ holding $X_2$ constant ($\beta_1$), plus the omitted-variable term $\beta_2 \dfrac{C(X_1, X_2)}{V(X_1)}$
Review: An Example
- Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, where $Y$ is the University GPA, $X_1$ is the high school GPA, and $X_2$ is the total SAT score.

  Table: Dependent variable: University GPA

              Coefficient   Std. Error    t-ratio   p-value
  const       0.0881312     0.286664      0.3074    0.7592
  HighSGPA    0.407113      0.0905946     4.4938    0.0000
  SATtotal    0.00121666    0.000301086   4.0409    0.0001
Review: An Example
The corresponding simple regressions of University GPA on each regressor separately:

              Coefficient   Std. Error    t-ratio   p-value
  const       0.822036      0.190689      4.3109    0.0000
  HighSGPA    0.565491      0.0878337     6.4382    0.0000

              Coefficient   Std. Error    t-ratio   p-value
  const       0.151892      0.307993      0.4932    0.6230
  SATtotal    0.00180201    0.000296847   6.0705    0.0000
Review: An Example
Table: Dependent variable: HighSGPA

              Coefficient   Std. Error    t-ratio   p-value
  const       0.589574      0.314040      1.8774    0.0634
  SATtotal    0.00143780    0.000302675   4.7503    0.0000
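These tables let us check the full-vs-reduced decomposition with the reported numbers (a verification added here, not on the original slide). For the SLR of University GPA on SATtotal alone,

$\alpha_1^{SAT} = \beta_2 + \beta_1 \cdot \delta_1 = 0.00121666 + 0.407113 \times 0.00143780 \approx 0.00121666 + 0.00058535 = 0.00180201$,

which matches the SATtotal coefficient in the simple regression exactly; here $\delta_1 = 0.00143780$ is the slope of $L(X_1 \mid X_2)$ from the table above.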
MULTIPLE LINEAR REGRESSION: THE ESTIMATION
The Multiple Linear Regression Model: Estimation
- Our