
Lecture 19: Multiple (Linear) Regression

Thais Paiva
STA 111 - Summer 2013 Term II

August 1, 2013


Lecture Plan

1 Multiple regression

2 OLS estimates of β and α

3 Interpretation


Linear regression

A study on depression:

The response variable is Depression, which is the score on a self-report depression inventory

Predictors:

Simplicity is the score that indicates a subject's need to see the world in black and white

Fatalism is the score that indicates the belief in the ability to control one's own destiny

Depression is thought to be related to simplicity and fatalism


Linear regression

Patient  Depression  Simplicity  Fatalism
      1        0.42        0.76      0.11
      2        0.52        0.73      1.00
      3        0.71        0.62      0.04
      4        0.66        0.84      0.42
      5        0.54        0.48      0.81
      6        0.34        0.41      1.23
      7        0.42        0.85      0.30
      8        1.08        1.50      1.20
      9        0.36        0.31      0.66
     10        0.92        1.41      0.85
     11        0.33        0.43      0.42
     12        0.41        0.53      0.07
     13        0.83        1.17      0.30
     14        0.65        0.42      1.09
     15        0.80        0.76      1.13


Depression data

[Figure: 3D scatterplot of the depression data; axes: Simplicity, Fatalism, Depression]


Depression data - residuals

[Figure: 3D scatterplot of the depression data showing the residuals; axes: Simplicity, Fatalism, Depression]


Assumptions for multiple linear regression

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$$

Just as with simple linear regression, the following have to hold:

1 Constant variance (also called homoscedasticity)

$$V(\varepsilon_i) = \sigma^2 \text{ for all } i = 1, \dots, n, \text{ for some } \sigma^2$$

2 Linearity

3 Independence

$$\varepsilon_i \perp \varepsilon_j \text{ for all } i, j = 1, \dots, n, \; i \neq j$$


Interpretation of the β’s

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$$

$\beta_j$ is the average effect on $Y$ of increasing $X_j$ by one unit, with all $X_k$, $k \neq j$, held constant

This is sometimes referred to as the effect of $X_j$ after “controlling for” $X_k$, $k \neq j$

So $\beta_{\text{simplicity}}$ is the average effect of simplicity on depression after controlling for fatalism
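The "one unit, others held constant" reading can be checked numerically. The sketch below uses made-up coefficient values (they are assumptions for illustration only, not estimates from the lecture data) and the hypothetical helper `predict`.

```python
# Hypothetical coefficients, chosen only for illustration
# (NOT estimated from the lecture's depression data).
alpha, beta_simplicity, beta_fatalism = 0.2, 0.4, 0.3


def predict(simplicity, fatalism):
    """Mean depression score implied by the linear model."""
    return alpha + beta_simplicity * simplicity + beta_fatalism * fatalism


# Raise simplicity by one unit while holding fatalism fixed at 0.5.
change = predict(1.8, 0.5) - predict(0.8, 0.5)

# The same one-unit increase at a different fixed fatalism level.
change_other = predict(1.8, 1.2) - predict(0.8, 1.2)

print(round(change, 10), round(change_other, 10))
```

Both differences equal the assumed simplicity coefficient (up to floating-point rounding), whatever value fatalism is held at.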


Always plot residuals

[Figure: two residual plots, ε against simplicity and ε against fatalism]


Histogram of residuals

[Figure: histogram of the residuals; x-axis: ε, y-axis: Frequency]


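OLS estimates of $\alpha, \beta_1, \dots, \beta_p$

(This is only really reasonable to write down if p = 2)

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

$$\hat\beta_1 = \frac{s_Y \,(r_{X_1 Y} - r_{X_1 X_2}\, r_{X_2 Y})}{s_{X_1} (1 - r_{X_1 X_2}^2)}$$

$$\hat\beta_2 = \frac{s_Y \,(r_{X_2 Y} - r_{X_1 X_2}\, r_{X_1 Y})}{s_{X_2} (1 - r_{X_1 X_2}^2)}$$

$$\hat\alpha = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2,$$

where

$$r_{AB} = \frac{\sum_{i=1}^n (A_i - \bar A)(B_i - \bar B)}{\sqrt{\sum_{i=1}^n (A_i - \bar A)^2}\,\sqrt{\sum_{i=1}^n (B_i - \bar B)^2}} \text{ for some } A \text{ and } B$$

and

$$s_A^2 = \frac{1}{n - 1} \sum_{i=1}^n (A_i - \bar A)^2 \text{ for some } A$$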
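The p = 2 formulas can be sketched in plain Python as follows. This is only an illustration: the data are the 15 patients listed in the lecture's table, which appear to be a subset of the sample behind the plots and tests, so the numbers need not match later results. The helper names (`mean`, `sd`, `corr`, `ols_two_predictors`) are hypothetical.

```python
from math import sqrt

# The 15 patients from the lecture's data table (assumed to be a subset
# of the full sample used elsewhere in the lecture).
depression = [0.42, 0.52, 0.71, 0.66, 0.54, 0.34, 0.42, 1.08,
              0.36, 0.92, 0.33, 0.41, 0.83, 0.65, 0.80]
simplicity = [0.76, 0.73, 0.62, 0.84, 0.48, 0.41, 0.85, 1.50,
              0.31, 1.41, 0.43, 0.53, 1.17, 0.42, 0.76]
fatalism = [0.11, 1.00, 0.04, 0.42, 0.81, 1.23, 0.30, 1.20,
            0.66, 0.85, 0.42, 0.07, 0.30, 1.09, 1.13]


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Sample standard deviation s_A (n - 1 denominator)."""
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))


def corr(a, b):
    """Sample correlation r_AB."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a)) * sqrt(sum((y - mb) ** 2 for y in b))
    return num / den


def ols_two_predictors(y, x1, x2):
    """Return (alpha_hat, beta1_hat, beta2_hat) from the p = 2 formulas."""
    r1y, r2y, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    b1 = sd(y) * (r1y - r12 * r2y) / (sd(x1) * (1 - r12 ** 2))
    b2 = sd(y) * (r2y - r12 * r1y) / (sd(x2) * (1 - r12 ** 2))
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2


alpha_hat, beta1_hat, beta2_hat = ols_two_predictors(depression, simplicity, fatalism)
print(alpha_hat, beta1_hat, beta2_hat)
```

A quick sanity check on any such fit: the residuals should sum to zero and be uncorrelated with each predictor.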


It is easier if you know matrix algebra

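$$Y = X\beta + \varepsilon,$$

where

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & \dots & x_{1p} \\ 1 & x_{21} & \dots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \dots & x_{np} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$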



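It turns out that the error sum of squares can be written as

$$\varepsilon^T \varepsilon = (Y - X\beta)^T (Y - X\beta)$$

$$\frac{\partial (\varepsilon^T \varepsilon)}{\partial \beta} = -2 X^T (Y - X\beta) \overset{\text{set}}{=} 0$$

$$X^T Y - X^T X \beta = 0$$

$$X^T Y = X^T X \beta$$

$$(X^T X)^{-1} X^T Y = \hat\beta$$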



A couple of things are clear

$$\hat\beta = (X^T X)^{-1} X^T Y$$

1 $\hat\beta$ is linear in $Y$

2 $\hat\beta$ is easy to compute if we have a computer
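To make "easy to compute if we have a computer" concrete, here is a minimal pure-Python sketch that solves the normal equations $X^T X \beta = X^T Y$ by Gaussian elimination instead of forming the inverse explicitly. The function names (`transpose`, `matmul`, `solve`, `ols`) and the toy data are assumptions for illustration; in practice one would normally call a linear-algebra library routine such as `numpy.linalg.lstsq`.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def ols(predictors, y):
    """predictors: rows [x_i1, ..., x_ip]; returns [alpha, beta_1, ..., beta_p]."""
    X = [[1.0] + list(row) for row in predictors]  # prepend the intercept column
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Toy data (hypothetical): y is an exact linear function of the predictors,
# so OLS should recover the coefficients 1, 2 and -3.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [1 + 2 * a - 3 * b for a, b in rows]
coef = ols(rows, y)
print(coef)
```

Because $\hat\beta$ is linear in $Y$, doubling every response in `y` would double every entry of `coef`.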


The coefficient of determination

Similarly to simple linear regression,

$$r^2 = \frac{ESS}{TSS}$$

and

$$TSS = ESS + RSS,$$

where

$$TSS = \sum_{i=1}^n (Y_i - \bar Y)^2, \quad ESS = \sum_{i=1}^n (\hat Y_i - \bar Y)^2, \quad RSS = \sum_{i=1}^n (Y_i - \hat Y_i)^2$$

SS: Sum of Squares. T: Total. E: Explained. R: Residual
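The decomposition can be verified numerically. The sketch below fits the two-predictor model to the 15 patients from the lecture's table (assumed to be only a subset of the full sample, so the resulting $r^2$ is illustrative) and computes the three sums of squares. The helper `fit_two_predictors` is hypothetical; it solves the centered 2×2 normal equations directly.

```python
depression = [0.42, 0.52, 0.71, 0.66, 0.54, 0.34, 0.42, 1.08,
              0.36, 0.92, 0.33, 0.41, 0.83, 0.65, 0.80]
simplicity = [0.76, 0.73, 0.62, 0.84, 0.48, 0.41, 0.85, 1.50,
              0.31, 1.41, 0.43, 0.53, 1.17, 0.42, 0.76]
fatalism = [0.11, 1.00, 0.04, 0.42, 0.81, 1.23, 0.30, 1.20,
            0.66, 0.85, 0.42, 0.07, 0.30, 1.09, 1.13]


def fit_two_predictors(y, x1, x2):
    """OLS for p = 2 via the centered 2x2 normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


alpha_hat, b1, b2 = fit_two_predictors(depression, simplicity, fatalism)
fitted = [alpha_hat + b1 * s + b2 * f for s, f in zip(simplicity, fatalism)]
y_bar = sum(depression) / len(depression)

tss = sum((y - y_bar) ** 2 for y in depression)                 # total
ess = sum((yh - y_bar) ** 2 for yh in fitted)                   # explained
rss = sum((y - yh) ** 2 for y, yh in zip(depression, fitted))   # residual
r_squared = ess / tss
print(tss, ess + rss, r_squared)
```

The first two printed values agree up to rounding, which is the identity TSS = ESS + RSS; equivalently, $r^2 = 1 - RSS/TSS$.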


$s^2$ and degrees of freedom

Similarly to simple linear regression,

$$s^2 = \frac{1}{n - p - 1} \sum_{i=1}^n (Y_i - \hat Y_i)^2 = \frac{RSS}{n - p - 1}$$

Note the n − p − 1 degrees of freedom. Why?

We had to estimate p + 1 regression parameters.


Hypothesis tests for βj

Suppose we are interested in testing

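$$H_0 : \beta_j = 0$$

$$H_A : \beta_j \neq 0 \text{ (or the one-sided version)}$$

Assuming p = 2 (tractable, but more complicated for p > 2), define

$$s_{\hat\beta_1}^2 = \frac{s^2}{(n - 1)\, s_{X_1}^2 \,(1 - r_{X_1 X_2}^2)}$$

and similarly for $s_{\hat\beta_2}^2$.

Then (even for p > 2),

$$t_{\hat\beta_j} = \frac{\hat\beta_j - \beta_j}{s_{\hat\beta_j}} \sim t_{n - p - 1}$$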
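A sketch of the p = 2 test statistics under $H_0: \beta_j = 0$, combining $s^2 = RSS/(n - p - 1)$ with the standard-error formula above. The function name `t_statistics` is hypothetical, and the data are the 15 patients from the lecture's table (assumed to be a subset of the full sample, so these t values are not expected to reproduce the ones reported later). Converting t to a p-value needs a t-distribution CDF, which the Python standard library does not provide, so that step is left out.

```python
from math import sqrt


def t_statistics(y, x1, x2):
    """Return (t_1, t_2) for H0: beta_j = 0 in a model with p = 2 predictors."""
    n, p = len(y), 2
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a0 = my - b1 * m1 - b2 * m2

    rss = sum((c - (a0 + b1 * u + b2 * v)) ** 2 for c, u, v in zip(y, x1, x2))
    s2 = rss / (n - p - 1)                      # residual variance estimate
    r12_sq = s12 ** 2 / (s11 * s22)             # squared correlation of X1, X2
    var_x1, var_x2 = s11 / (n - 1), s22 / (n - 1)
    se1 = sqrt(s2 / ((n - 1) * var_x1 * (1 - r12_sq)))
    se2 = sqrt(s2 / ((n - 1) * var_x2 * (1 - r12_sq)))
    return b1 / se1, b2 / se2


depression = [0.42, 0.52, 0.71, 0.66, 0.54, 0.34, 0.42, 1.08,
              0.36, 0.92, 0.33, 0.41, 0.83, 0.65, 0.80]
simplicity = [0.76, 0.73, 0.62, 0.84, 0.48, 0.41, 0.85, 1.50,
              0.31, 1.41, 0.43, 0.53, 1.17, 0.42, 0.76]
fatalism = [0.11, 1.00, 0.04, 0.42, 0.81, 1.23, 0.30, 1.20,
            0.66, 0.85, 0.42, 0.07, 0.30, 1.09, 1.13]

t_simplicity, t_fatalism = t_statistics(depression, simplicity, fatalism)
print(t_simplicity, t_fatalism)  # compare with a t distribution on n - 3 df
```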


Hypothesis tests for βj

Notice that

$$s_{\hat\beta_1}^2 = \frac{s^2}{(n - 1)\, s_{X_1}^2 \,(1 - r_{X_1 X_2}^2)}$$

depends on $r_{X_1 X_2}^2$, which depends on $X_2$.

So the test for $\beta_1$ depends on the other predictor variables

What is the interpretation of this test then?

“Assuming that the other $\beta_k \neq 0$ ($k \neq j$), can we reject the hypothesis that $\beta_j = 0$?”


Scatterplot matrix

[Figure: scatterplot matrix of depression, simplicity, and fatalism]


3D Scatterplot and plane

[Figure: 3D scatterplot of the depression data with the fitted regression plane; axes: Simplicity, Fatalism, Depression]


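Tests for $\beta_{\text{simplicity}}$ and $\beta_{\text{fatalism}}$

$\beta_{\text{simplicity}}$: $t_{\hat\beta_{\text{simplicity}}} = 3.649$ → p-value = 0.0005

$\beta_{\text{fatalism}}$: $t_{\hat\beta_{\text{fatalism}}} = 3.829$ → p-value = 0.0003

But what if we take fatalism out of the model? Then we get

$\beta_{\text{simplicity}}$: $t_{\hat\beta_{\text{simplicity}}} = 4.175$ → p-value = $2 \times 10^{-8}$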

Why?


Scatterplot matrix

[Figure: scatterplot matrix of depression, simplicity, and fatalism]


1907 Romanian Peasant Rebellion

From Wikipedia:

The Romanian Peasants’ Revolt took place in March 1907 in Moldavia and it quickly spread, reaching Wallachia.

Y = Intensity of the rebellion, by county

X1 = Commercialization of agriculture

X2 = Traditionalism

X3 = Strength of middle peasantry

X4 = Inequality of land tenure


Scatterplot matrix

[Figure: scatterplot matrix of intensity, commerce, tradition, midpeasant, and inequality]


Peasant Rebellion results

With all the predictors in the model:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -12.32796    5.74640  -2.145   0.0418 *
commerce      0.10055    0.02144   4.690 8.33e-05 ***
tradition     0.10578    0.06161   1.717   0.0984 .
midpeasant    0.09333    0.07466   1.250   0.2229
inequality    0.42198    3.11171   0.136   0.8932


Peasant Rebellion results

Without commerce, tradition becomes significant at α = 0.05:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.03497    7.40287  -2.706   0.0119 *
tradition     0.19705    0.07859   2.507   0.0187 *
midpeasant    0.03480    0.09897   0.352   0.7279
inequality    5.12172    3.96053   1.293   0.2073
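The "t value" column in both tables is the estimate divided by its standard error, i.e. the test statistic for $H_0: \beta_j = 0$. The short check below recomputes it from the numbers reported in the two tables; the variable names are hypothetical.

```python
# (term, estimate, std. error, reported t value), copied from the two tables.
full_model = [
    ("(Intercept)", -12.32796, 5.74640, -2.145),
    ("commerce", 0.10055, 0.02144, 4.690),
    ("tradition", 0.10578, 0.06161, 1.717),
    ("midpeasant", 0.09333, 0.07466, 1.250),
    ("inequality", 0.42198, 3.11171, 0.136),
]
without_commerce = [
    ("(Intercept)", -20.03497, 7.40287, -2.706),
    ("tradition", 0.19705, 0.07859, 2.507),
    ("midpeasant", 0.03480, 0.09897, 0.352),
    ("inequality", 5.12172, 3.96053, 1.293),
]

# Largest gap between estimate / std. error and the reported t value.
max_gap = 0.0
for label, table in (("all predictors", full_model),
                     ("without commerce", without_commerce)):
    print(label)
    for term, estimate, std_error, reported_t in table:
        t_value = estimate / std_error  # test statistic for H0: beta_j = 0
        max_gap = max(max_gap, abs(t_value - reported_t))
        print(f"  {term:<12} t = {t_value:7.3f}  (reported {reported_t:7.3f})")
```

Note how the estimate for tradition nearly doubles and its standard error grows only modestly once commerce is dropped, which is what pushes its t value past the 0.05 threshold.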


Caveats

1 Be careful interpreting the coefficients! Multiple regression is usually applied to observational data

2 Do not think of the sign of the coefficient as special: it can actually change as other covariates are added or removed from the model

3 Similarly, tests about any covariate are only meaningful in the context of the other covariates in the model

4 Always make sure a linear model is appropriate for all predictors!

5 Always check residuals for heteroscedasticity and normality
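Caveat 2 can be seen in a small constructed example. The data below are invented for illustration (they are not from the lecture): the response is exactly $-X_1 + 2X_2$, and $X_1$ and $X_2$ are strongly correlated. Regressing on $X_1$ alone gives a positive slope, while the two-predictor fit gives a negative coefficient on $X_1$. The helper names are hypothetical.

```python
def simple_slope(y, x):
    """Slope from simple linear regression of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def fit_two_predictors(y, x1, x2):
    """OLS for p = 2 via the centered 2x2 normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Invented data: x1 and x2 move together, and y = -x1 + 2*x2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 2, 3, 4, 6]
y = [2 * b - a for a, b in zip(x1, x2)]

slope_alone = simple_slope(y, x1)          # x2 left out of the model
_, b1, b2 = fit_two_predictors(y, x1, x2)  # x2 included
print(slope_alone, b1, b2)
```

With $X_2$ omitted, $X_1$ absorbs part of the effect of $X_2$, so its coefficient flips sign once $X_2$ enters the model.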


Caveats

In particular, a special case that you should be careful about is when the predictors are highly correlated

In this situation the coefficient estimates may change erratically in response to small changes in the model or the data

This phenomenon is called multicollinearity

Because of that, the correlation matrix of the predictors is also something to look at (and report) in the analysis
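As a minimal sketch of that check, the code below computes the correlation matrix of the two predictors using the 15 patients from the lecture's data table (assumed to be a subset of the full sample, so the value is illustrative). The names `predictors`, `corr`, and `corr_matrix` are hypothetical.

```python
from math import sqrt

predictors = {
    "simplicity": [0.76, 0.73, 0.62, 0.84, 0.48, 0.41, 0.85, 1.50,
                   0.31, 1.41, 0.43, 0.53, 1.17, 0.42, 0.76],
    "fatalism": [0.11, 1.00, 0.04, 0.42, 0.81, 1.23, 0.30, 1.20,
                 0.66, 0.85, 0.42, 0.07, 0.30, 1.09, 1.13],
}


def corr(a, b):
    """Sample correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a)) * sqrt(sum((y - mb) ** 2 for y in b))
    return num / den


names = list(predictors)
corr_matrix = [[corr(predictors[r], predictors[c]) for c in names] for r in names]

for name, row in zip(names, corr_matrix):
    print(f"{name:<11}", [round(v, 3) for v in row])
```

Off-diagonal entries close to ±1 are the warning sign: the factor $1 - r_{X_1 X_2}^2$ in the standard-error formula shrinks toward zero and the coefficient estimates become unstable.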


Summary

1 Multiple linear regression fits the best hyperplane to the data

2 We can test hypotheses about any of the $\beta_j$'s

3 Be careful about interpretation

4 Correlation among the predictors is also important because of multicollinearity

