Multiple Regression - Kasetsart University
fin.bus.ku.ac.th/135512 Economic Environment for...
The Multiple Regression Model
Examines the linear relationship between one dependent variable (Y) and two or more independent variables (Xi)
Multiple Regression Model with k Independent Variables:

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

(β0: Y-intercept; β1, …, βk: population slopes; ε: random error)
Multiple Regression Equation
The coefficients of the multiple regression model are
estimated using sample data
Multiple regression equation with k independent variables:

ŷi = b0 + b1x1i + b2x2i + … + bkxki

(ŷi: estimated or predicted value of y; b0: estimated intercept; b1, …, bk: estimated slope coefficients)
We will always use a computer to obtain the regression
slope coefficients and other regression summary
measures.
Sales Example
Salest = b0 + b1 (Price)t
+ b2 (Advertising)t + et
Week | Pie Sales | Price ($) | Advertising ($100s)
1 350 5.50 3.3
2 460 7.50 3.3
3 350 8.00 3.0
4 430 8.00 4.5
5 350 6.80 3.0
6 380 7.50 4.0
7 430 4.50 3.0
8 470 6.40 3.7
9 450 7.00 3.5
10 490 5.00 4.0
11 340 7.20 3.5
12 300 7.90 3.2
13 440 5.90 4.0
14 450 5.00 3.5
15 300 7.00 2.7
Multiple regression equation:
Multiple Regression Output
Regression Statistics
Multiple R 0.72213
R Square 0.52148
Adjusted R Square 0.44172
Standard Error 47.46341
Observations 15
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
Sales = 306.526 − 24.975(Price) + 74.131(Advertising)
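The estimates above come from Excel, but they can be cross-checked by solving the normal equations (X′X)b = X′y directly. A minimal pure-Python sketch, with Gaussian elimination standing in for the spreadsheet's solver:

```python
# Reproduce the pie-sales regression by solving the normal equations
# (X'X) b = X'y in plain Python (no third-party libraries).
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv   = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

X = [[1.0, p, a] for p, a in zip(price, adv)]   # design matrix with intercept column
k = 3
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
Xty = [sum(row[i] * y for row, y in zip(X, sales)) for i in range(k)]

# Gaussian elimination with partial pivoting on the augmented system
A = [r[:] + [v] for r, v in zip(XtX, Xty)]
for col in range(k):
    piv = max(range(col, k), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    for r in range(col + 1, k):
        f = A[r][col] / A[col][col]
        for c in range(col, k + 1):
            A[r][c] -= f * A[col][c]
b = [0.0] * k
for i in reversed(range(k)):
    b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]

print([round(v, 3) for v in b])   # intercept, Price slope, Advertising slope
```

The printed coefficients should match the Intercept, Price, and Advertising rows of the output table above.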
Adjusted R²

• R² never decreases when a new X variable is added to the model, even if the new variable is not an important predictor variable
– Hence, models with different numbers of explanatory variables cannot be compared by R²
• What is the net effect of adding a new variable?
– We lose a degree of freedom when a new X variable is added
– Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
• Adjusted R² penalizes excessive use of unimportant independent variables
• Adjusted R² is always smaller than R² (except when R² = 1)
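The penalty is easy to compute by hand. A small sketch of the adjustment formula, adj R² = 1 − (1 − R²)(n − 1)/(n − k − 1), checked against the pie-sales output:

```python
# Adjusted R-squared penalizes extra regressors:
#   adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
def adjusted_r2(r2, n, k):
    """r2: ordinary R^2; n: number of observations; k: number of X variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Pie-sales output: R^2 = 0.52148, n = 15, k = 2
adj = adjusted_r2(0.52148, 15, 2)
print(round(adj, 5))   # matches "Adjusted R Square" in the output (0.44172)
```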
F-Test for Overall Significance
of the Model
• Shows if there is a linear relationship between all of the
X variables considered together and Y
• Use F test statistic
• Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)
F = MSR / MSE = 14730.0 / 2252.8 = 6.5386
With 2 and 12 degrees of freedom, the P-value for the F-test is 0.01201.

The ANOVA Table in Regression

Source of Variation   Sum of Squares   Degrees of Freedom        Mean Square
Regression            SSR              k                         MSR = SSR / k
Error                 SSE              n − (k + 1) = n − k − 1   MSE = SSE / (n − (k + 1))
Total                 SST              n − 1                     MST = SST / (n − 1)

F Ratio: F = MSR / MSE

R² = SSR / SST = 1 − SSE / SST

Adjusted R² = 1 − [SSE / (n − (k + 1))] / [SST / (n − 1)] = 1 − MSE / MST

F in terms of R²: F = (R² / k) / [(1 − R²) / (n − (k + 1))]
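These ANOVA identities can be checked numerically against the pie-sales sums of squares; a quick sketch:

```python
# Build the ANOVA quantities from the sums of squares in the pie-sales output.
SSR, SSE = 29460.027, 27033.306
n, k = 15, 2
SST = SSR + SSE              # 56493.333
MSR = SSR / k                # 14730.013
MSE = SSE / (n - (k + 1))    # 2252.776
F = MSR / MSE
R2 = SSR / SST
print(round(F, 4), round(R2, 5))

# Equivalent expression of F in terms of R^2:
F_from_R2 = (R2 / k) / ((1 - R2) / (n - (k + 1)))
assert abs(F - F_from_R2) < 1e-9
```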
Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:
(1) H0: b1 = 0
H1: b1 ≠ 0
(2) H0: b2 = 0
H1: b2 ≠ 0
. . .
(k) H0: bk = 0
H1: bk ≠ 0

Test statistic for test i:

t(n − (k + 1)) = (bi − 0) / s(bi)
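Each t Stat in the regression output is just the coefficient divided by its standard error, compared against t with n − (k + 1) = 12 degrees of freedom. A quick check against the pie-sales coefficients table:

```python
# t statistic for an individual slope: t = (b_i - 0) / s(b_i).
# Coefficients and standard errors taken from the pie-sales output.
coef = {"Price": (-24.97509, 10.83213), "Advertising": (74.13096, 25.96732)}
tstats = {name: b / se for name, (b, se) in coef.items()}
print({name: round(t, 5) for name, t in tstats.items()})
# Price: about -2.30565; Advertising: about 2.85478, matching the t Stat column
```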
The Concept of Partial Regression
Coefficients
In multiple regression, the interpretation of slope
coefficients requires special attention:
• Here, b1 shows the relationship between X1 and
Y holding X2 constant (i.e. controlling for the
effect of X2 ).
ŷi = b0 + b1x1i + b2x2i

Purifying X1 from X2 (i.e., removing the effect of X2 on X1): run a regression of X1 on X2:

X1i = α0 + α1X2i + vi

vi = X1i − (α0 + α1X2i) is X1 purified from X2.

Then, run a regression of Yi on vi:

Yi = γ0 + γ1vi

γ1 is the b1 in the original multiple regression equation: b1 shows the relationship between X1 purified from X2 and Y.
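This purification claim can be verified with two bivariate regressions. A pure-Python sketch on the pie-sales data, taking Price as X1 and Advertising as X2 (the `simple_ols` helper is illustrative, not course code):

```python
# Frisch-Waugh check of the partial-coefficient interpretation,
# using only simple bivariate regressions.
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv   = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

def simple_ols(x, y):
    """Intercept and slope of y = a + b*x fit by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Step 1: purify X1 (Price) of X2 (Advertising)
a0, a1 = simple_ols(adv, price)
v = [p - (a0 + a1 * ad) for p, ad in zip(price, adv)]
# Step 2: regress Y (Sales) on the purified X1
g0, g1 = simple_ols(v, sales)
print(round(g1, 3))   # should equal b1, the Price slope from the full regression
```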
Whenever a new explanatory variable is added to the regression equation or removed from it, all b coefficients change (unless the covariance of the added or removed variable with all other variables is zero).
The Principle of Parsimony:
Any insignificant explanatory variable should be removed out of the regression equation.
The Principle of Generosity:
Any significant variable must be included in the regression equation.
Choosing the best model:
Choose the model with the highest adjusted R2 or F or the lowest AIC (Akaike Information Criterion) or SC (Schwarz Criterion).
Apply the stepwise regression procedure.
Multiple Regression
For example:
A researcher may be interested in the
relationship between Education and Income
and Number of Children in a family.
Independent Variables
Education
Family Income
Dependent Variable
Number of Children
Multiple Regression
For example:
Research Hypothesis: As education of respondents
increases, the number of children in families will
decline (negative relationship).
Research Hypothesis: As family income of
respondents increases, the number of children in
families will decline (negative relationship).
Multiple Regression
For example:
Null Hypothesis: There is no relationship between
education of respondents and the number of children
in families.
Null Hypothesis: There is no relationship between
family income and the number of children in families.
Multiple Regression
Bivariate regression is based on fitting a line as
close as possible to the plotted coordinates of
your data on a two-dimensional graph.
Trivariate regression is based on fitting a plane
as close as possible to the plotted coordinates of
your data on a three-dimensional graph.
Case:                1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Children (Y):        2  5  1  9  6  3  0  3  7  7  2  5  1  9  6  3  0  3  7 14  2  5  1  9  6
Education (X1):     12 16 20 12  9 18 16 14  9 12 12 10 20 11  9 18 16 14  9  8 12 10 20 11  9
Income 1=$10K (X2):  3  4  9  5  4 12 10  1  4  3 10  4  9  4  4 12 10  6  4  1 10  3  9  2  4
[3-D scatterplot: plotted coordinates (cases 1–10) for Education (X1), Income (X2), and Number of Children (Y)]
Multiple Regression
Case:                1  2  3  4  5  6  7  8  9 10
Children (Y):        2  5  1  9  6  3  0  3  7  7
Education (X1):     12 16 20 12  9 18 16 14  9 12
Income 1=$10K (X2):  3  4  9  5  4 12 10  1  4  3
[3-D scatterplot, as above: what multiple regression does is fit a plane to these coordinates]
Multiple Regression
• Mathematically, that plane is:
Y = a + b1X1 + b2X2
a = y-intercept, where X’s equal zero
b=coefficient or slope for each variable
For our problem, SPSS says the equation is:
Y = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income
Multiple Regression
• Let’s take a moment to reflect…
Why do I write the equation:
Y = a + b1X1 + b2X2
whereas KBM often write:
Yi = a + b1X1i + b2X2i + ei
One is the equation for a prediction; the other is the value of a data point for a person.
Multiple Regression
Model Summary

Model | R     | R Square | Adjusted R Square | Std. Error of the Estimate
1     | .757a | .573     | .534              | 2.33785

a. Predictors: (Constant), Income, Education

ANOVAb

Model 1    | Sum of Squares | df | Mean Square | F      | Sig.
Regression | 161.518        | 2  | 80.759      | 14.776 | .000a
Residual   | 120.242        | 22 | 5.466       |        |
Total      | 281.760        | 24 |             |        |

a. Predictors: (Constant), Income, Education
b. Dependent Variable: Children

Coefficientsa

Model 1    | Unstandardized B | Std. Error | Standardized Beta | t      | Sig.
(Constant) | 11.770           | 1.734      |                   | 6.787  | .000
Education  | -.364            | .173       | -.412             | -2.105 | .047
Income     | -.403            | .194       | -.408             | -2.084 | .049

a. Dependent Variable: Children
Y = 11.8 - .36X1 - .40X2
57% of the variation in
number of children is
explained by education
and income!
Y = 11.8 - .36X1 - .40X2
r² = [Σ(Y − Ȳ)² − Σ(Y − Ŷ)²] / Σ(Y − Ȳ)²

161.518 ÷ 281.76 = .573
Multiple Regression
So what does our equation tell us?
Y = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income
Try “plugging in” some values for your
variables.
Multiple Regression
So what does our equation tell us?
Y = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income
If Education equals:   If Income equals:   Then Ŷ (children) equals:
0                      0                   11.8
10                     0                   8.2
10                     10                  4.2
20                     10                  0.6
20                     11                  0.2
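The plugging-in exercise is just evaluating the fitted equation. A one-function sketch (the function name `expected_children` is mine, for illustration):

```python
# Evaluate the fitted equation: Y-hat = 11.8 - .36*Educ - .40*Income
# (income measured in $10K units, as in the data).
def expected_children(educ, income):
    return 11.8 - 0.36 * educ - 0.40 * income

for e, i in [(0, 0), (10, 0), (10, 10), (20, 10), (20, 11)]:
    print(e, i, round(expected_children(e, i), 1))
```

The printed values reproduce the table above: 11.8, 8.2, 4.2, 0.6, 0.2.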
Multiple Regression
So what does our equation tell us?
Y = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income
If Education equals:   If Income equals:   Then Ŷ (children) equals:
1                      0                   11.44
1                      1                   11.04
1                      5                   9.44
1                      10                  7.44
1                      15                  5.44
Multiple Regression
So what does our equation tell us?
Y = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income
If Education equals:   If Income equals:   Then Ŷ (children) equals:
0                      1                   11.40
1                      1                   11.04
5                      1                   9.60
10                     1                   7.80
15                     1                   6.00
Multiple Regression
If graphed, holding one variable constant produces a two-
dimensional graph for the other variable.
[Graph 1: Ŷ vs. Income (X2 from 0 to 15), Education held at 1; the line falls from 11.44 to 5.44, slope b = −.40]
[Graph 2: Ŷ vs. Education (X1 from 0 to 15), Income held at 1; the line falls from 11.40 to 6.00, slope b = −.36]
Dummy Explanatory Variables
Qualitative binomial (0,1) variables. Di
Yi = β0 + β1Xi + β2Di + ui
For Di = 0 : Yi = β0 + β1Xi + ui
For Di = 1 : Yi = β0 + β1Xi + β2 +ui
Yi = (β0+β2)+ β1Xi +ui
To measure the effect of Di on the relation between X and Y
Yi = β0 + β1Xi + β2Xi*Di + ui
For Di = 0 : Yi = β0 + β1Xi + ui
For Di = 1 : Yi = β0 + β1Xi + β2Xi +ui
Yi = β0+ (β1+β2)Xi +ui
Warning: dummy variables can be used only as regressors. Should the dependent variable be binomial, you need to use Logit or Probit regression models, which employ the ML estimator. This is because the binomial feature violates the normal-distribution assumption, which renders t-statistics invalid. (You can learn these techniques in Econometrics II.)
Time-period dummies can be used for:
1) measuring the stability of a relationship over time
2) treating outliers
Seasonal dummies can be used to treat seasonal variation in seasonally unadjusted data: simply create n − 1 dummies for n seasonal sections and include them as regressors to control for seasonal variation.
Multiple Regression
The way you use nominal variables in regression is by converting them to a series of dummy variables.
Recode into different dummy variables:

Nominal Variable: Race (1 = White, 2 = Black, 3 = Other)

Dummy Variables:
1. White: 0 = Not White; 1 = White
2. Black: 0 = Not Black; 1 = Black
3. Other: 0 = Not Other; 1 = Other
Multiple Regression
The way you use nominal variables in regression is by converting them to a series of dummy variables.
Recode into different dummy variables:

Nominal Variable: Religion (1 = Catholic, 2 = Protestant, 3 = Jewish, 4 = Muslim, 5 = Other Religions)

Dummy Variables:
1. Catholic: 0 = Not Catholic; 1 = Catholic
2. Protestant: 0 = Not Prot.; 1 = Protestant
3. Jewish: 0 = Not Jewish; 1 = Jewish
4. Muslim: 0 = Not Muslim; 1 = Muslim
5. Other Religions: 0 = Not Other; 1 = Other Relig.
Multiple Regression
When you need to use a nominal variable in
regression (like race), just convert it to a series
of dummy variables.
When you enter the variables into your model,
you MUST LEAVE OUT ONE OF THE
DUMMIES.
Leave Out One: White
Enter Rest into Regression: Black, Other
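The recode-and-leave-one-out step might look like this in code (hypothetical sample; the dummy variable names are mine):

```python
# Recoding a nominal race variable (1 = White, 2 = Black, 3 = Other)
# into dummies, leaving White out as the reference group.
race = [1, 2, 3, 1, 2, 2, 3, 1]            # hypothetical sample of 8 cases
black = [1 if r == 2 else 0 for r in race]
other = [1 if r == 3 else 0 for r in race]
# White cases are identified by black == 0 and other == 0, so a third
# "white" dummy would be redundant (perfect collinearity with the intercept).
print(black)   # [0, 1, 0, 0, 1, 1, 0, 0]
print(other)   # [0, 0, 1, 0, 0, 0, 1, 0]
```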
Multiple Regression
The reason you MUST LEAVE OUT ONE OF THE DUMMIES is that regression is mathematically impossible without an excluded group: if all of the dummies were included, they would sum to 1 for every case, making them perfectly collinear with the intercept (the dummy variable trap). If all were in, holding one of them constant would prohibit variation in all the rest.
Leave Out One: Catholic
Enter Rest into Regression: Protestant, Jewish, Muslim, Other Religion
Multiple Regression
The regression equations for dummies will
look the same.
For Race, with 3 dummies, predicting self-esteem:
Y = a + b1X1 + b2X2
a = the y-intercept,
which in this case is
the predicted value
of self-esteem for
the excluded group,
white.
b1 = the slope
for variable
X1, black
b2 = the slope
for variable
X2, other
Multiple Regression
• If our equation were:
For Race, with 3 dummies, predicting self-esteem:
Y = 28 + 5X1 – 2X2
a = the y-intercept,
which in this case is
the predicted value
of self-esteem for
the excluded group,
white.
5 = the slope
for variable
X1, black
-2 = the slope
for variable
X2, other
Plugging in values for
the dummies tells you
each group’s self-
esteem average:
White = 28
Black = 33
Other = 26
When cases’ values for X1 = 0 and X2 = 0, they are white;
when X1 = 1 and X2 = 0, they are black;
when X1 = 0 and X2 = 1, they are other.
Multiple Regression
• Dummy variables can be entered into multiple regression along with other dichotomous and continuous variables.
• For example, you could regress self-esteem on sex, race, and education:
Y = a + b1X1 + b2X2 + b3X3 + b4X4
How would you interpret this?
Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4
X1 = Female
X2 = Black
X3 = Other
X4 = Education
Multiple Regression
How would you interpret this?
Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4
1. Women’s self-esteem is 4 points lower than men’s.
2. Blacks’ self-esteem is 5 points higher than whites’.
3. Others’ self-esteem is 2 points lower than whites’ and
consequently 7 points lower than blacks’.
4. Each year of education improves self-esteem by 0.3
units.
X1 = Female
X2 = Black
X3 = Other
X4 = Education
Multiple Regression
How would you interpret this?
Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4
Plugging in some select values, we’d get self-esteem for
select groups:
• White males with 10 years of education = 33
• Black males with 10 years of education = 38
• Other females with 10 years of education = 27
• Other females with 16 years of education = 28.8
X1 = Female
X2 = Black
X3 = Other
X4 = Education
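These group predictions are easy to verify by evaluating the equation; a sketch (the function name is illustrative):

```python
# Evaluate Y = 30 - 4*X1 + 5*X2 - 2*X3 + 0.3*X4
# (X1 = female, X2 = black, X3 = other, X4 = years of education).
def esteem(female, black, other, educ):
    return 30 - 4 * female + 5 * black - 2 * other + 0.3 * educ

print(round(esteem(0, 0, 0, 10), 1))   # white male, 10 yrs of education
print(round(esteem(0, 1, 0, 10), 1))   # black male, 10 yrs
print(round(esteem(1, 0, 1, 10), 1))   # other female, 10 yrs
print(round(esteem(1, 0, 1, 16), 1))   # other female, 16 yrs
```

The four values reproduce the slide: 33, 38, 27, and 28.8.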
Multiple Regression
How would you interpret this?
Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4
The same regression rules apply. The slopes represent
the linear relationship of each independent variable in
relation to the dependent while holding all other
variables constant.
X1 = Female
X2 = Black
X3 = Other
X4 = Education
Make sure you get into the habit of saying the
slope is the effect of an independent variable
“while holding everything else constant.”
Seasonal adjustment using dummy variables
Example: Suppose a researcher is using seasonally-
unadjusted data at the quarterly frequency for the
variable Yt.
For 4 quarters, create 3 dummies:
D1= 1 if t is Q1, 0 otherwise
D2= 1 if t is Q2, 0 otherwise
D3= 1 if t is Q3, 0 otherwise
The residuals of the regression
Yt = β0 + β1D1,t + β2D2,t + β3D3,t + εt
are the seasonally adjusted Yt.
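When the only regressors are the quarterly dummies plus the intercept, the fitted values are simply the quarter means, so the residuals can be computed directly without a regression routine. A sketch on made-up quarterly data:

```python
# Seasonal adjustment with quarterly dummies (hypothetical data).
# Regressing Y on an intercept plus Q1-Q3 dummies fits each quarter's mean,
# so the residuals are Y with the quarterly pattern removed.
y = [10, 14, 8, 12,  11, 15, 9, 13,  12, 16, 10, 14]   # 3 years, Q1..Q4
quarter = [t % 4 for t in range(len(y))]                # 0..3 codes Q1..Q4

qmean = {q: sum(v for v, qq in zip(y, quarter) if qq == q) / quarter.count(q)
         for q in range(4)}
adjusted = [v - qmean[q] for v, q in zip(y, quarter)]   # regression residuals
print([round(a, 2) for a in adjusted])
```

Here the seasonal pattern vanishes and only the year-to-year movement (−1, 0, +1) remains.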
Log Transformations
Yi = β0 + β1Xi + ui
The β1 in the above regression indicates the
expected change in Yi resulting from a 1-unit
increase in Xi. – not the relationship in % terms –
If you need to compute the expected % change in
Yi resulting from a 1% increase in Xi , you need to
run the following regression:
Ln(Yi )= β0 + β1Ln(Xi) + ui
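A synthetic check of the log-log interpretation: if Y = A·X² exactly, the regression of Ln(Y) on Ln(X) recovers an elasticity of β1 = 2. The bivariate slope formula stands in for a full regression routine:

```python
import math

# Constant-elasticity illustration on synthetic data: Y = 3 * X^2.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0 * x ** 2 for x in X]

lx = [math.log(x) for x in X]
ly = [math.log(v) for v in Y]
n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n
beta1 = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / \
        sum((a - mx) ** 2 for a in lx)
print(round(beta1, 6))   # a 1% rise in X raises Y by about 2%
```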
Assumptions of OLS Estimator
1) E(ei) = 0 (unbiasedness)
2) Var(ei) is constant (homoscedasticity)
3) Cov(ui,uj) = 0 (independent error terms)
4) Cov(ui,Xi) = 0 (error terms unrelated to X’s)
ei ~ iid(0, σ²)
Gauss-Markov Theorem: If these conditions
hold, OLS is the best linear unbiased
estimator (BLUE).
Additional Assumption: ei’s are normally distributed.
Time Series Regressions
Lagged variable: Yt = β0+β1Xt+β2Xt-1+ut
Autoregressive Model: Xt = β1Xt-1+β2Xt-2+ut
Time-Trend: Yt = β0 + β1Xt + β2Tt+ut
Spurious Regressions
• As a general and very strict rule:
All variables in a time-series regression must
be stationary.
Never run a regression with nonstationary
variables!
* The DW statistic will warn you of spurious regression.
A nonstationary variable can be made stationary by
taking its first difference.
If X is nonstationary, ΔXt = Xt − Xt−1 may be stationary.
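A minimal illustration of first differencing: a series with a deterministic trend (and hence a nonconstant mean) becomes constant after differencing.

```python
# First differences of a trending series.
x = [2 * t + 5 for t in range(10)]                 # trending, hence nonstationary in mean
dx = [x[t] - x[t - 1] for t in range(1, len(x))]   # first differences
print(dx)   # every difference equals the trend slope, 2
```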
Exercise: How to create a regression?
• Descriptive statistics: mean, median, etc.
• Correlation: not over 0.5 among the xi (explanatory variables)
• Stationarity: ADF test
• Run the regression
• Test for heteroscedasticity and normality
• Test VIF in case of multicollinearity