
Lecture 19: Multiple (Linear) Regression

Thais Paiva
STA 111 - Summer 2013 Term II

August 1, 2013

Lecture Plan
1 Multiple regression
2 OLS estimates of β and α
3 Interpretation

Linear regression
A study on depression:
The response variable is Depression, the score on a self-report depression inventory.

Predictors:

Simplicity is a score that indicates a subject's need to see the world in black and white.
Fatalism is a score that indicates the subject's belief in the ability to control one's own destiny.

Depression is thought to be related to simplicity and fatalism.

Linear regression
Patient  Depression  Simplicity  Fatalism
   1        0.42        0.76       0.11
   2        0.52        0.73       1.00
   3        0.71        0.62       0.04
   4        0.66        0.84       0.42
   5        0.54        0.48       0.81
   6        0.34        0.41       1.23
   7        0.42        0.85       0.30
   8        1.08        1.50       1.20
   9        0.36        0.31       0.66
  10        0.92        1.41       0.85
  11        0.33        0.43       0.42
  12        0.41        0.53       0.07
  13        0.83        1.17       0.30
  14        0.65        0.42       1.09
  15        0.80        0.76       1.13
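For the code sketches later in the lecture, assume the scores are collected in an R data frame; the name dep and the lowercase column names are illustrative, not part of the original study. (The later slides' test statistics come from the full study data, so these fifteen rows will not reproduce them exactly.)

# Hypothetical data frame holding the fifteen patients above
dep <- data.frame(
  depression = c(0.42, 0.52, 0.71, 0.66, 0.54, 0.34, 0.42, 1.08, 0.36,
                 0.92, 0.33, 0.41, 0.83, 0.65, 0.80),
  simplicity = c(0.76, 0.73, 0.62, 0.84, 0.48, 0.41, 0.85, 1.50, 0.31,
                 1.41, 0.43, 0.53, 1.17, 0.42, 0.76),
  fatalism   = c(0.11, 1.00, 0.04, 0.42, 0.81, 1.23, 0.30, 1.20, 0.66,
                 0.85, 0.42, 0.07, 0.30, 1.09, 1.13))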

Depression data
[Figure: 3D scatterplot of the depression data; axes Simplicity, Fatalism, Depression]

Depression data
[Figure: the same 3D scatterplot of the depression data, shown again]

Depression data - residuals
[Figure: 3D scatterplot of the depression data, used to discuss residuals; axes Simplicity, Fatalism, Depression]

Assumptions for multiple linear regression
Yi = α + β1 X1i + β2 X2i + … + βp Xpi + εi

Just as with simple linear regression, the following have to hold:

1 Constant variance (also called homoscedasticity):
V(εi) = σ² for all i = 1, …, n, for some σ²

2 Linearity

3 Independence:
εi ⊥ εj for all i, j = 1, …, n, i ≠ j

Interpretation of the β’s
Yi = α + β1 X1i + β2 X2i + … + βp Xpi + εi

βj is the average effect on Y of increasing Xj by one unit, with all Xk, k ≠ j, held constant.

This is sometimes referred to as the effect of Xj after “controlling for” the other predictors Xk, k ≠ j.

So βsimplicity is the average effect of simplicity on depression after controlling for fatalism.
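As a minimal sketch of how such a model is fit in R (reusing the hypothetical dep data frame defined earlier):

# Fit depression on simplicity and fatalism by least squares
fit <- lm(depression ~ simplicity + fatalism, data = dep)
summary(fit)  # estimates, standard errors, and t-tests for each coefficient

The coefficient reported for simplicity then estimates βsimplicity: the average change in depression per one-unit increase in simplicity, holding fatalism fixed.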

Always plot residuals
[Figure: residuals ε plotted against simplicity (left panel) and against fatalism (right panel)]
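A short sketch of how these plots could be drawn in R, reusing the hypothetical fit from above:

# Residuals against each predictor; look for curvature or changing spread
res <- resid(fit)
plot(dep$simplicity, res, xlab = "simplicity", ylab = "residual")
abline(h = 0, lty = 2)
plot(dep$fatalism, res, xlab = "fatalism", ylab = "residual")
abline(h = 0, lty = 2)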

Histogram of residuals
[Figure: histogram of the residuals ε, with Frequency on the vertical axis]
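The corresponding sketch:

# Histogram of the residuals; a roughly bell shape supports normality
hist(resid(fit), xlab = "residual", main = "Histogram of residuals")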

OLS estimates of α, β1, …, βp

(This is only really reasonable to write down if p = 2)

Yi = α + β1 X1i + β2 X2i + εi

β̂1 = s_Y (r_{X1Y} − r_{X1X2} r_{X2Y}) / [ s_{X1} (1 − r²_{X1X2}) ]

β̂2 = s_Y (r_{X2Y} − r_{X1X2} r_{X1Y}) / [ s_{X2} (1 − r²_{X1X2}) ]

α̂ = Ȳ − β̂1 X̄1 − β̂2 X̄2,

where

r_{AB} = Σ_{i=1}^n (Ai − Ā)(Bi − B̄) / [ √(Σ_{i=1}^n (Ai − Ā)²) · √(Σ_{i=1}^n (Bi − B̄)²) ] for some A and B, and

s²_A = (1 / (n − 1)) Σ_{i=1}^n (Ai − Ā)² for some A
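These closed-form expressions translate directly into R; a sketch assuming numeric vectors y, x1, and x2 hold the response and the two predictors (e.g. y <- dep$depression, x1 <- dep$simplicity, x2 <- dep$fatalism):

# Closed-form OLS estimates for p = 2, following the formulas above
r_x1y  <- cor(x1, y)
r_x2y  <- cor(x2, y)
r_x1x2 <- cor(x1, x2)
b1 <- sd(y) * (r_x1y - r_x1x2 * r_x2y) / (sd(x1) * (1 - r_x1x2^2))
b2 <- sd(y) * (r_x2y - r_x1x2 * r_x1y) / (sd(x2) * (1 - r_x1x2^2))
a  <- mean(y) - b1 * mean(x1) - b2 * mean(x2)
c(alpha = a, beta1 = b1, beta2 = b2)  # should match coef(lm(y ~ x1 + x2))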

It is easier if you know matrix algebra
Y = Xβ + ε,

where

Y = (y1, y2, …, yn)ᵀ, β = (α, β1, …, βp)ᵀ, ε = (ε1, ε2, …, εn)ᵀ,

and X is the n × (p + 1) design matrix whose i-th row is (1, xi1, xi2, …, xip).

It is easier if you know matrix algebra
It turns out that the residual sum of squares can be written as

RSS(β) = (Y − Xβ)ᵀ (Y − Xβ)

Differentiating and setting the result to zero,

∂RSS/∂β = −2 Xᵀ(Y − Xβ) = 0
XᵀY − XᵀXβ = 0
XᵀY = XᵀXβ
β̂ = (XᵀX)⁻¹ XᵀY

It is easier if you know matrix algebra
A couple of things are clear from

β̂ = (XᵀX)⁻¹ XᵀY

1 β̂ is linear in Y
2 β̂ is easy to compute if we have a computer
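A sketch of this computation with the normal equations, using the same hypothetical vectors y, x1, x2:

# beta-hat = (X'X)^{-1} X'Y; the column of ones carries the intercept
X <- cbind(1, x1, x2)
beta_hat <- drop(solve(t(X) %*% X, t(X) %*% y))
beta_hat  # alpha-hat, beta1-hat, beta2-hat

solve(A, b) solves the linear system directly, which is numerically preferable to forming the inverse (XᵀX)⁻¹ explicitly.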

The coefficient of determination
Similarly to simple linear regression,

r² = ESS / TSS

and

TSS = ESS + RSS,

where

TSS = Σ_{i=1}^n (Yi − Ȳ)², ESS = Σ_{i=1}^n (Ŷi − Ȳ)², RSS = Σ_{i=1}^n (Yi − Ŷi)²

SS: Sum of Squares. T: Total. E: Explained. R: Residual.
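Continuing the R sketch, the decomposition is a few lines:

# Sum-of-squares decomposition and the coefficient of determination
y_hat <- drop(X %*% beta_hat)  # fitted values
TSS <- sum((y - mean(y))^2)
ESS <- sum((y_hat - mean(y))^2)
RSS <- sum((y - y_hat)^2)
c(r2 = ESS / TSS, identity = (ESS + RSS) / TSS)  # second entry should be 1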

s² and degrees of freedom

Similarly to simple linear regression,

s² = (1 / (n − p − 1)) Σ_{i=1}^n (Yi − Ŷi)² = RSS / (n − p − 1)

Note the n − p − 1 degrees of freedom. Why?

We had to estimate p + 1 regression parameters.
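In the same sketch, with p = 2:

# Residual variance estimate on n - p - 1 degrees of freedom
n  <- length(y)
s2 <- RSS / (n - 2 - 1)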

Hypothesis tests for βj
Suppose we are interested in testing

H0: βj = 0
HA: βj ≠ 0 (or the one-sided version)

Assuming p = 2 (tractable, but more complicated for p > 2), define

s²_{β̂1} = s² / [ (n − 1) s²_{X1} (1 − r²_{X1X2}) ]

and similarly for s²_{β̂2}.

Then (even for p > 2),

t_{β̂j} = (β̂j − βj) / s_{β̂j} ∼ t_{n−p−1}
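A sketch of the test for β1 in the p = 2 case, reusing the quantities computed above:

# Standard error, t-statistic, and two-sided p-value for H0: beta1 = 0
s_b1 <- sqrt(s2 / ((n - 1) * var(x1) * (1 - cor(x1, x2)^2)))
t1   <- b1 / s_b1
p1   <- 2 * pt(-abs(t1), df = n - 2 - 1)
c(t = t1, p = p1)  # should match the x1 row of summary(lm(y ~ x1 + x2))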

Hypothesis tests for βj
Notice that

s²_{β̂1} = s² / [ (n − 1) s²_{X1} (1 − r²_{X1X2}) ]

depends on r²_{X1X2}, which depends on X2.

So the test for β1 depends on the other predictor variables.

What is the interpretation of this test then?

“Assuming that the other βk, k ≠ j, are nonzero, can we reject the hypothesis that βj = 0?”

Scatterplot matrix
[Figure: scatterplot matrix of depression, simplicity, and fatalism]

3D Scatterplot and plane
[Figure: 3D scatterplot of the depression data with the fitted regression plane; axes Simplicity, Fatalism, Depression]

Tests for βsimplicity and βfatalism
β̂simplicity: t_{β̂simplicity} = 3.649 → p-value = 0.0005

β̂fatalism: t_{β̂fatalism} = 3.829 → p-value = 0.0003

But what if we take fatalism out of the model? Then we get

β̂simplicity: t_{β̂simplicity} = 4.175 → p-value = 2 × 10⁻⁸

Why?
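The comparison can be run directly in R with the hypothetical dep data frame (the slide's numbers come from the full study data, so the fifteen rows above will not reproduce them exactly):

# Test for simplicity with and without fatalism in the model
summary(lm(depression ~ simplicity + fatalism, data = dep))$coefficients
summary(lm(depression ~ simplicity, data = dep))$coefficients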

Scatterplot matrix
[Figure: scatterplot matrix of depression, simplicity, and fatalism, shown again]

1907 Romanian Peasant Rebellion
From Wikipedia:
The Romanian Peasants’ Revolt took place in March 1907 in Moldavia and quickly spread, reaching Wallachia.
Y = Intensity of the rebellion, by county
X1 = Commercialization of agriculture
X2 = Traditionalism
X3 = Strength of middle peasantry
X4 = Inequality of land tenure

Scatterplot matrix
[Figure: scatterplot matrix of intensity, commerce, tradition, midpeasant, and inequality]

Peasant Rebellion results
With all the predictors in the model:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -12.32796 5.74640 -2.145 0.0418 *
commerce 0.10055 0.02144 4.690 8.33e-05 ***
tradition 0.10578 0.06161 1.717 0.0984 .
midpeasant 0.09333 0.07466 1.250 0.2229
inequality 0.42198 3.11171 0.136 0.8932

Peasant Rebellion results
Without commerce, tradition becomes significant at α = 0.05:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.03497 7.40287 -2.706 0.0119 *
tradition 0.19705 0.07859 2.507 0.0187 *
midpeasant 0.03480 0.09897 0.352 0.7279
inequality 5.12172 3.96053 1.293 0.2073
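Both tables above come from fits of this shape (assuming the county data sit in a data frame, here called peasants for illustration):

# Full model, then the model with commerce dropped
full    <- lm(intensity ~ commerce + tradition + midpeasant + inequality,
              data = peasants)
reduced <- lm(intensity ~ tradition + midpeasant + inequality,
              data = peasants)
summary(full)
summary(reduced)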

Caveats
1 Be careful interpreting the coefficients! Multiple regression is usually applied to observational data

2 Do not think of the sign of a coefficient as special – it can actually change as other covariates are added to or removed from the model

3 Similarly, tests about any covariate are only meaningful in the context of the other covariates in the model
4 Always make sure a linear model is appropriate for all predictors!
5 Always check residuals for heteroscedasticity and normality

Caveats
In particular, a special case that you should be careful about is when the predictors are highly correlated.

In this situation the coefficient estimates may change erratically in response to small changes in the model or the data.

This phenomenon is called multicollinearity.

Because of that, the correlation matrix of the predictors is also something to look at (and report) in the analysis.
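A minimal sketch of that check in R, again with the hypothetical peasants data frame:

# Correlation matrix of the predictors; entries near +/-1 warn of
# potential multicollinearity
cor(peasants[, c("commerce", "tradition", "midpeasant", "inequality")])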

Summary
1 Multiple linear regression fits the best hyperplane to the data
2 We can test hypotheses about any of the βj's
3 Be careful about interpretation
4 Correlation of the predictors is also important because of multicollinearity