Post on 07-Sep-2020
Chapter 4
Multiple Linear Regression
The model:
$$y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + e$$
$y$: the dependent variable
$x_1, x_2, \ldots, x_p$: the independent variables, or carriers
$e$: residual or error.

A special case: $p = 2$ and $x_1 = 1$. Writing $\alpha = \beta_1$, $\beta = \beta_2$ and $x = x_2$, the model becomes
$$y = \alpha + \beta x + e,$$
which is the simple linear regression model.
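Polynomial regression: the response $y$ is modeled as a polynomial function $f(x)$ of a single variable $x$, and the carriers are $x, x^2, \ldots, x^t$.
Example:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_t x^t + e = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_t x_t + e,$$
where $x_1 = x$, $x_2 = x^2$, ..., $x_t = x^t$.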
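A minimal sketch of the point above: a polynomial model is fitted as an ordinary multiple regression once the carriers $1, x, x^2, \ldots, x^t$ are placed in the columns of the design matrix. The data are hypothetical and noise-free (generated from $1 + 2x - 0.5x^2$), so least squares should recover the coefficients; `polynomial_design` is an illustrative helper name, not part of the notes.

```python
import numpy as np

def polynomial_design(x, t):
    """Columns 1, x, x^2, ..., x^t: the carriers of a degree-t polynomial model."""
    return np.vander(np.asarray(x, dtype=float), N=t + 1, increasing=True)

# Hypothetical noise-free data from y = 1 + 2x - 0.5x^2 (illustration only).
x = np.linspace(-2.0, 3.0, 12)
y = 1.0 + 2.0 * x - 0.5 * x**2

X = polynomial_design(x, t=2)                 # 12 x 3 design matrix
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))                  # estimates of beta_0, beta_1, beta_2
```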
Data Model:
$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i, \qquad i = 1, 2, \ldots, n.$$
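Vector model:
$$\mathbf{y} = \beta_1 \mathbf{x}_1 + \beta_2 \mathbf{x}_2 + \cdots + \beta_p \mathbf{x}_p + \mathbf{e},$$
where
$$\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{x}_j = \begin{pmatrix} x_{1j} \\ \vdots \\ x_{nj} \end{pmatrix} \ (j = 1, \ldots, p), \quad \mathbf{e} = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}.$$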
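Matrix Model:
$$\mathbf{y}_{n \times 1} = \mathbf{X}_{n \times p}\,\boldsymbol{\beta}_{p \times 1} + \mathbf{e}_{n \times 1},$$
where
$$\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_p) = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}.$$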
Least squares estimates
We minimize
$$Q(\beta_1, \ldots, \beta_p) = \sum_{i=1}^{n} (y_i - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_p x_{ip})^2$$
with respect to $\beta_1, \ldots, \beta_p$ to obtain the least squares estimates, denoted by $\hat\beta_1, \ldots, \hat\beta_p$.
The fitted response is
$$\hat y_i = \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \cdots + \hat\beta_p x_{ip},$$
and the fitted residuals are
$$\hat e_i = y_i - \hat y_i, \qquad i = 1, \ldots, n.$$
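Observation space representation
[Figure: the fitted plane $\hat y = \hat\alpha + \hat\beta_1 x_1 + \hat\beta_2 x_2$ in $(x_1, x_2, y)$ space.]
When using least squares to estimate the parameters, we obtain the plane for which the sum of the squared vertical distances from the observations to the plane is minimized.

The variable space representation
[Figure: $\mathbf{y}$, its fitted value $\hat{\mathbf{y}}$ in $\mathrm{span}\{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$, and the residual $\mathbf{y} - \hat{\mathbf{y}}$.]
$$\min_{\boldsymbol\beta} Q(\boldsymbol\beta) = \min_{\boldsymbol\beta} \lVert \mathbf{y} - \beta_1 \mathbf{x}_1 - \cdots - \beta_p \mathbf{x}_p \rVert^2$$
Normal equations:
$$\mathbf{y} - \hat{\mathbf{y}} \perp \mathrm{span}\{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$$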
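$$\begin{aligned}
\mathbf{x}_1^T(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}) &= 0 \\
\mathbf{x}_2^T(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}) &= 0 \\
&\ \vdots \\
\mathbf{x}_p^T(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}) &= 0
\end{aligned}
\;\Longrightarrow\; \mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}) = \mathbf{0}
\;\Longrightarrow\; \mathbf{X}^T\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}^T\mathbf{y}$$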
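If $\mathbf{X}^T\mathbf{X}$ is nonsingular (i.e., $\boldsymbol\beta$ is estimable), then
$$\hat{\boldsymbol\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}.$$
Note: $\hat{\boldsymbol\beta}$ is a linear function of $\mathbf{y}$, i.e., $\hat{\boldsymbol\beta}$ is a linear estimator.
• $\mathbf{X}^T\mathbf{X}$ is nonsingular if and only if the columns $\mathbf{x}_1, \ldots, \mathbf{x}_p$ of $\mathbf{X}$ are linearly independent.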
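The normal equations can be checked numerically. This is a sketch on simulated data (the design, the true coefficients and the seed are all assumptions made for illustration). It solves $\mathbf{X}^T\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}^T\mathbf{y}$ with a linear solver rather than forming $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly, and confirms that the residual vector is orthogonal to every column of $\mathbf{X}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
# Hypothetical design: a column of ones plus two random carriers.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(0, 5, size=n)])
beta_true = np.array([2.0, -1.0, 0.5])          # assumed values for the illustration
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Solve the normal equations X^T X beta = X^T y (columns assumed linearly independent).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The residual vector y - X beta_hat is orthogonal to every column of X.
resid = y - X @ beta_hat
print(beta_hat)
print(X.T @ resid)                               # numerically zero
```

In practice `np.linalg.lstsq(X, y, rcond=None)` gives the same estimate and is numerically more stable when the columns are nearly dependent.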
Multiple correlation and coefficient of determination.
• The sample multiple correlation between a variable $y$ and variables $x_1, \ldots, x_t$ is denoted by $R(y; x_1, \ldots, x_t)$.
• By definition, $R(y; x_1, \ldots, x_t) = r_{y\hat y}$, where $\hat y$ is the fitted value obtained from the intercept model
$$y = \alpha + \beta_1 x_1 + \cdots + \beta_t x_t + e.$$
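• $R(y; x_1, \ldots, x_t) = \cos\theta$, where $\theta$ is the angle between $\mathbf{y} - \bar y\mathbf{1}$ and $\mathrm{span}\{\mathbf{x}_1 - \bar x_1\mathbf{1}, \ldots, \mathbf{x}_t - \bar x_t\mathbf{1}\}$, i.e., the angle between $\mathbf{y} - \bar y\mathbf{1}$ and its projection $\hat{\mathbf{y}} - \bar y\mathbf{1}$ onto that span.
[Figure: the angle $\theta$ between $\mathbf{y} - \bar y\mathbf{1}$ and the span of the centered carriers.]
• If $t = 1$, then $R(y; x) = |r_{xy}|$.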
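A small numerical check of the two bullets above, on hypothetical simulated data: $R$ computed as $\cos\theta$ between the centered $\mathbf{y}$ and the centered fitted values agrees with $r_{y\hat y}$, and with a single carrier and a negative slope it equals $|r_{xy}|$ rather than $r_{xy}$. The helper names are illustrative only.

```python
import numpy as np

def fitted_intercept_model(carriers, y):
    """Least squares fitted values for y = alpha + beta_1 x_1 + ... + beta_t x_t + e."""
    X = np.column_stack([np.ones(len(y)), carriers])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

def multiple_correlation(carriers, y):
    """R(y; x_1, ..., x_t) as cos(theta) between y - ybar*1 and yhat - ybar*1."""
    u = y - y.mean()
    v = fitted_intercept_model(carriers, y) - y.mean()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
n = 40
Xc = rng.normal(size=(n, 2))                                   # hypothetical carriers x_1, x_2
y = 1.0 + Xc @ np.array([0.8, -0.4]) + rng.normal(size=n)

R_cos = multiple_correlation(Xc, y)
R_corr = np.corrcoef(y, fitted_intercept_model(Xc, y))[0, 1]   # r_{y yhat}

# t = 1 with a negative slope: R(y; x) equals |r_xy|, not r_xy.
x = rng.normal(size=n)
y1 = 3.0 - 2.0 * x + rng.normal(size=n)
R_one = multiple_correlation(x, y1)
r_xy = np.corrcoef(x, y1)[0, 1]
print(R_cos, R_corr, R_one, r_xy)
```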
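It can be shown (Problem 15) that, for the intercept model,
$$S_y^2 = S_{\hat y}^2 + S_{\hat e}^2.$$
Hints:
$$\hat{\mathbf{e}} \perp \mathbf{1} \;\Rightarrow\; \sum_i \hat e_i = 0 \;\Rightarrow\; \bar{\hat e} = 0 \text{ and } \bar{\hat y} = \bar y \qquad (*)$$
$$S_{\hat y \hat e} = \sum_i (\hat y_i - \bar{\hat y})(\hat e_i - \bar{\hat e}) = 0 \quad \text{(why?)} \qquad (**)$$
Since $y_i = \hat y_i + \hat e_i$,
$$S_y^2 = \sum_i (y_i - \bar y)^2 = \sum_i \big[(\hat y_i - \bar{\hat y}) + (\hat e_i - \bar{\hat e})\big]^2.$$
Expand the RHS and use $(*)$ and $(**)$ to show that RHS $= S_{\hat y}^2 + S_{\hat e}^2$.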
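The decomposition can be verified on simulated data. In this sketch $S^2$ is taken to be the sum of squared deviations about the mean, matching the hint's $S_y^2 = \sum_i (y_i - \bar y)^2$; the design, coefficients and seed are assumptions. Both hint identities $(*)$ and $(**)$ are printed as well.

```python
import numpy as np

def sum_sq_dev(v):
    """S^2 = sum of squared deviations about the mean."""
    return float(np.sum((v - v.mean()) ** 2))

rng = np.random.default_rng(2)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])    # intercept model
y = X @ np.array([4.0, 1.5, -0.7]) + rng.normal(size=n)       # hypothetical data

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
e_hat = y - y_hat

print(e_hat.sum())                                                # (*): residuals sum to zero
print(np.sum((y_hat - y_hat.mean()) * (e_hat - e_hat.mean())))    # (**): zero
print(sum_sq_dev(y), sum_sq_dev(y_hat) + sum_sq_dev(e_hat))       # the two sides agree
```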
Coefficient of determination
The proportion of the variance of $y$ explained by the regression of $y$ on $x_1, \ldots, x_t$ is defined by
$$R^2 = \frac{S_{\hat y}^2}{S_y^2}.$$
• $R^2 = r_{y\hat y}^2 = R^2(y; x_1, \ldots, x_t)$ (Problem 15).
Hint: We want to show
$$r_{y\hat y}^2 = \frac{S_{y\hat y}^2}{S_y^2\,S_{\hat y}^2} = \frac{S_{\hat y}^2}{S_y^2}.$$
Show that $S_{y\hat y} = S_{\hat y}^2$ by replacing $y = \hat y + \hat e$ and using the fact that $S_{\hat y\hat e} = 0$.
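A quick numerical illustration of $R^2 = S_{\hat y}^2 / S_y^2 = r_{y\hat y}^2$ and of the hint $S_{y\hat y} = S_{\hat y}^2$, using hypothetical simulated data for an intercept model with three carriers (sums of squared deviations are used for the $S$ quantities).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
Xc = rng.normal(size=(n, 3))                       # hypothetical carriers x_1, x_2, x_3
X = np.column_stack([np.ones(n), Xc])              # intercept model
y = X @ np.array([0.5, 1.0, 0.0, -2.0]) + rng.normal(scale=1.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

S_y2 = np.sum((y - y.mean()) ** 2)
S_yhat2 = np.sum((y_hat - y_hat.mean()) ** 2)
S_y_yhat = np.sum((y - y.mean()) * (y_hat - y_hat.mean()))

R2_ratio = S_yhat2 / S_y2                          # definition: explained proportion
R2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2         # squared correlation r_{y yhat}
print(R2_ratio, R2_corr, np.isclose(S_y_yhat, S_yhat2))
```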
$R^2$ for multiple regression has the same problems as those discussed for simple linear regression, and possibly more.
$R^2$ is not defined for models without an intercept: for example, if $S_y^2 = 0$ and $S_{\hat y}^2 > 0$, then $R^2 = \infty$!
$R^2$ is an index that measures the degree of linear relationship between $y$ and $x_1, \ldots, x_t$; it is a meaningful summary only if the relation is linear.
[Figure: $y_1$ and $y_2$ plotted against $x$.]
When a model without an intercept is fit, statistical programs generally output a number in the interval [0, 1] as $R^2$. However, there is no commonly accepted definition of $R^2$ for models without an intercept. Beware!
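To see why the warning matters, here is a sketch with four hypothetical data points where $y$ barely varies but $x$ does. Fitting $y = \beta x + e$ through the origin, the ratio $S_{\hat y}^2 / S_y^2$ lands far above 1, and $S_y^2 = S_{\hat y}^2 + S_{\hat e}^2$ no longer holds. The data are made up purely for illustration.

```python
import numpy as np

# Hypothetical data: y hardly varies, but x does.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 5.1, 4.9, 5.0])

# Least squares fit of y = beta * x + e (no intercept): beta_hat = sum(x*y) / sum(x^2).
beta_hat = np.sum(x * y) / np.sum(x * x)
y_hat = beta_hat * x
e_hat = y - y_hat

S_y2 = np.sum((y - y.mean()) ** 2)
S_yhat2 = np.sum((y_hat - y_hat.mean()) ** 2)
S_ehat2 = np.sum((e_hat - e_hat.mean()) ** 2)

print(S_yhat2 / S_y2)                            # far above 1
print(np.isclose(S_y2, S_yhat2 + S_ehat2))       # the decomposition fails
```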
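A review of the algebra of random vectors:
Let $\mathbf{A}$ be an $n \times p$ matrix of nonrandom elements, and let $\hat{\boldsymbol\theta} = (\hat\theta_1, \ldots, \hat\theta_p)^T$ be a $p \times 1$ vector of random elements, with
$$E(\hat{\boldsymbol\theta}) = \boldsymbol\theta = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_p \end{pmatrix} \quad \text{and} \quad \mathrm{Cov}(\hat{\boldsymbol\theta}) = \boldsymbol\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix},$$
where $E(\hat\theta_i) = \theta_i$ and $\sigma_{kl} = \mathrm{Cov}(\hat\theta_k, \hat\theta_l)$.
The following properties hold:
(1) $E(\mathbf{A}\hat{\boldsymbol\theta}) = \mathbf{A}\,E(\hat{\boldsymbol\theta})$.
(2) $\mathrm{Cov}(\mathbf{A}\hat{\boldsymbol\theta}) = \mathbf{A}\,\mathrm{Cov}(\hat{\boldsymbol\theta})\,\mathbf{A}^T$.
(3) If $\mathbf{Z}$ is a random vector and $\boldsymbol\mu$ is a constant vector, then $E(\mathbf{Z} + \boldsymbol\mu) = E(\mathbf{Z}) + \boldsymbol\mu$ and $\mathrm{Cov}(\mathbf{Z} + \boldsymbol\mu) = \mathrm{Cov}(\mathbf{Z})$.
(4) If $\mathbf{Z} \sim N(\boldsymbol\mu, \boldsymbol\Sigma)$, then $\mathbf{A}\mathbf{Z} \sim N(\mathbf{A}\boldsymbol\mu, \mathbf{A}\boldsymbol\Sigma\mathbf{A}^T)$.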
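Properties (1) and (2) can be illustrated by simulation. This sketch assumes a particular mean vector $\boldsymbol\mu$, covariance $\boldsymbol\Sigma$ and $2 \times 3$ matrix $\mathbf{A}$ (all made up), draws many normal vectors $\mathbf{Z}$, and compares sample moments of $\mathbf{A}\mathbf{Z}$ with $\mathbf{A}\boldsymbol\mu$ and $\mathbf{A}\boldsymbol\Sigma\mathbf{A}^T$. For sample moments the linear rule holds exactly, which the last line checks.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0, 0.5])                 # assumed mean vector
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [-0.3, 0.2, 0.8]])
Sigma = L @ L.T                                 # assumed covariance matrix (positive definite)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])                # nonrandom 2 x 3 matrix

Z = rng.multivariate_normal(mu, Sigma, size=200_000)   # each row is one draw of Z
AZ = Z @ A.T                                           # each row is the matching draw of A Z

# Property (1): the sample mean of A Z is close to A mu.
print(AZ.mean(axis=0), A @ mu)
# Property (2): the sample covariance of A Z is close to A Sigma A^T ...
print(np.cov(AZ, rowvar=False))
print(A @ Sigma @ A.T)
# ... and for sample moments the same rule holds exactly (up to rounding).
print(np.allclose(np.cov(AZ, rowvar=False), A @ np.cov(Z, rowvar=False) @ A.T))
```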
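Statistical properties of LS estimates:
Recall the data model
$$y_i = \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + e_i, \qquad i = 1, \ldots, n,$$
where $x_{i1}, \ldots, x_{ip}$ are fixed (nonrandom), $e_i$ is random, and $\beta_1, \ldots, \beta_p$ are unknown and fixed.

(a) Unbiased errors: if $E(\mathbf{e}) = \mathbf{0}$, then $E(\hat{\boldsymbol\beta}) = \boldsymbol\beta$, provided that $\boldsymbol\beta$ is estimable.
Proof:
$$E(\hat{\boldsymbol\beta}) = E\big((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\big) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E(\mathbf{y}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol\beta = \boldsymbol\beta,$$
since $E(\mathbf{y}) = \mathbf{X}\boldsymbol\beta + E(\mathbf{e}) = \mathbf{X}\boldsymbol\beta$.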
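(b) Uncorrelated, constant variance errors: $\mathrm{Cov}(\mathbf{e}) = \sigma^2\mathbf{I}$.
This assumption is equivalent to having
$$\mathrm{Cov}(\mathbf{e}) = \sigma^2 \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix},$$
i.e., the errors are uncorrelated and each has variance $\sigma^2$.
If $\boldsymbol\beta$ is estimable, then
$$\mathrm{Cov}(\hat{\boldsymbol\beta}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}.$$
Proof: Let $\mathbf{A} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$, so that $\hat{\boldsymbol\beta} = \mathbf{A}\mathbf{y}$. Then
$$\mathrm{Cov}(\hat{\boldsymbol\beta}) = \mathbf{A}\,\mathrm{Cov}(\mathbf{y})\,\mathbf{A}^T = \mathbf{A}\,\mathrm{Cov}(\mathbf{e})\,\mathbf{A}^T = \sigma^2\mathbf{A}\mathbf{A}^T = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}.$$
This, in particular, leads to the formulas of Chapter 2 by using $\mathbf{X} = (\mathbf{1}, \mathbf{x})$ (Problem 16).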
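A sketch of the special case $\mathbf{X} = (\mathbf{1}, \mathbf{x})$: the entries of $\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$ are compared with the usual simple linear regression expressions $\mathrm{Var}(\hat\alpha) = \sigma^2(1/n + \bar x^2/S_{xx})$, $\mathrm{Var}(\hat\beta) = \sigma^2/S_{xx}$ and $\mathrm{Cov}(\hat\alpha, \hat\beta) = -\sigma^2\bar x/S_{xx}$. We assume these are the Chapter 2 formulas the text refers to; $\sigma^2$ and the design points are made up.

```python
import numpy as np

sigma2 = 2.0                                         # assumed error variance
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 8.0])         # hypothetical design points
n = len(x)
X = np.column_stack([np.ones(n), x])                 # X = (1, x)

cov_matrix = sigma2 * np.linalg.inv(X.T @ X)         # sigma^2 (X^T X)^{-1}

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
var_alpha = sigma2 * (1.0 / n + xbar**2 / Sxx)       # Var(alpha_hat)
var_beta = sigma2 / Sxx                              # Var(beta_hat)
cov_ab = -sigma2 * xbar / Sxx                        # Cov(alpha_hat, beta_hat)
simple_formulas = np.array([[var_alpha, cov_ab], [cov_ab, var_beta]])

print(cov_matrix)
print(simple_formulas)
```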
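$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat y_i)^2, \qquad \hat\sigma^2 = \mathrm{RMS} = \frac{\mathrm{RSS}}{n - p}.$$
Under $E(\mathbf{e}) = \mathbf{0}$ and $\mathrm{Cov}(\mathbf{e}) = \sigma^2\mathbf{I}$,
$$E(\hat\sigma^2) = \sigma^2.$$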
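A Monte Carlo sketch of $E(\hat\sigma^2) = \sigma^2$: with a fixed hypothetical design ($n = 20$, $p = 3$, $\sigma = 1$, all assumed), the average of $\mathrm{RSS}/(n-p)$ over many simulated samples should be close to 1, while dividing by $n$ instead is biased downward toward $(n-p)/n = 0.85$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 20, 3, 1.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])  # fixed design
beta = np.array([1.0, 0.3, -0.2])                                       # assumed true beta

reps = 4000
rms, naive = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)
    rms[r] = rss / (n - p)        # sigma_hat^2 = RSS / (n - p)
    naive[r] = rss / n            # divides by n instead: biased downward

print(rms.mean(), naive.mean())   # should be near 1.0 and near 0.85
```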
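Gauss-Markov Theorem
If $E(\mathbf{e}) = \mathbf{0}$ and $\mathrm{Cov}(\mathbf{e}) = \sigma^2\mathbf{I}$, then $\hat{\boldsymbol\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ is the MVUE for $\boldsymbol\beta$ among all linear unbiased estimators of $\boldsymbol\beta$.
Additionally, $\hat f(\mathbf{x}) = \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p$ is the MVUE for $f(\mathbf{x}) = \beta_1 x_1 + \cdots + \beta_p x_p$, again among linear unbiased estimators.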
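As a concrete instance of the theorem, compare two linear unbiased estimators of the slope in $y = \alpha + \beta x + e$ with uncorrelated, constant-variance errors: the least squares slope, with variance $\sigma^2/S_{xx}$, and the endpoint slope $(y_n - y_1)/(x_n - x_1)$, with variance $2\sigma^2/(x_n - x_1)^2$. The design points and $\sigma^2$ below are assumptions; the least squares variance comes out smaller, as the theorem guarantees.

```python
import numpy as np

sigma2 = 1.0                                   # assumed error variance
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])  # hypothetical design points (sorted)
Sxx = np.sum((x - x.mean()) ** 2)

# Least squares slope in y = alpha + beta x + e: Var = sigma^2 / Sxx.
var_ls = sigma2 / Sxx

# Another linear unbiased estimator of beta: (y_n - y_1) / (x_n - x_1).
# Its variance is 2 sigma^2 / (x_n - x_1)^2 because e_1 and e_n are uncorrelated.
var_endpoint = 2 * sigma2 / (x[-1] - x[0]) ** 2

print(var_ls, var_endpoint)
```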
An example of using SAS for multiple regression:
STACK-LOSS DATA, Brownlee (1965)
The data describe the operation of a plant for the oxidation of ammonia to nitric acid. Four variables are observed over a period of 21 days. The variation in the variable stack loss (LOSS) is to be explained by the independent variables:
AIR: airflow
TEMP: water temperature
ACID: acid concentration.
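With the assumption of normality we have the following results:
If $\mathbf{e} \sim N(\mathbf{0}, \sigma^2\mathbf{I}_n)$, then
$$\hat{\boldsymbol\beta} \sim N\big(\boldsymbol\beta, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\big) \quad \text{and} \quad \frac{\mathrm{RSS}}{\sigma^2} \sim \chi^2_{n-p},$$
and $\hat{\boldsymbol\beta}$ is independent of RSS.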
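Confidence and prediction intervals
Here we assume $\mathbf{e} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$, i.e., the errors $e_i$ are independent $N(0, \sigma^2)$.
Let $\mathrm{Var}(\hat\beta_r) = \sigma^2[(\mathbf{X}^T\mathbf{X})^{-1}]_{rr}$, let $\hat\sigma^2$ be the estimate of $\sigma^2$, and let
$$\widehat{\mathrm{Var}}(\hat\beta_r) = \hat\sigma^2[(\mathbf{X}^T\mathbf{X})^{-1}]_{rr}, \qquad \widehat{\mathrm{std}}(\hat\beta_r) = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_r)}.$$
Then, using the characterization of $t$ from Chapter 2, we have
$$\frac{\hat\beta_r - \beta_r}{\widehat{\mathrm{std}}(\hat\beta_r)} \sim t_{(n-p)}.$$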
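More generally, let $\mathbf{c} = (c_1, \ldots, c_p)^T$ be a vector of constant values. Then
$$\mathrm{Var}(\mathbf{c}^T\hat{\boldsymbol\beta}) = \mathbf{c}^T\,\mathrm{Var}(\hat{\boldsymbol\beta})\,\mathbf{c} = \sigma^2\,\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}.$$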
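If $\gamma = \mathbf{c}^T\boldsymbol\beta = c_1\beta_1 + \cdots + c_p\beta_p$ and $\hat\gamma = \mathbf{c}^T\hat{\boldsymbol\beta}$, then
$$\frac{\hat\gamma - \gamma}{\widehat{\mathrm{std}}(\hat\gamma)} \sim t_{(n-p)},$$
where
$$\widehat{\mathrm{Var}}(\hat\gamma) = \hat\sigma^2\,\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c} \quad \text{and} \quad \widehat{\mathrm{std}}(\hat\gamma) = \sqrt{\widehat{\mathrm{Var}}(\hat\gamma)}.$$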
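Example: Let $\gamma = \beta_1 - 2\beta_2$ for the model $y = \beta_1 x_1 + \beta_2 x_2 + e$, so that $\mathbf{c}^T = (1, -2)$. Then
$$\mathrm{Var}(\hat\gamma) = (1, -2)\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\begin{pmatrix} 1 \\ -2 \end{pmatrix} = \sigma_{11} - 4\sigma_{12} + 4\sigma_{22},$$
where
$$\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}.$$
To obtain $\widehat{\mathrm{Var}}(\hat\gamma)$, we replace $\sigma^2$ by $\hat\sigma^2$ and accordingly replace $\sigma_{ij}$ by $\hat\sigma_{ij}$ in the above formula.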
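A sketch of the example on hypothetical simulated data for the two-carrier model without intercept: the estimated covariance matrix $[\hat\sigma_{ij}] = \hat\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$ is formed, and $\widehat{\mathrm{Var}}(\hat\gamma)$ is computed both as the quadratic form $\mathbf{c}^T[\hat\sigma_{ij}]\mathbf{c}$ and as $\hat\sigma_{11} - 4\hat\sigma_{12} + 4\hat\sigma_{22}$. The design, coefficients and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 15
X = np.column_stack([rng.uniform(1, 3, size=n), rng.normal(size=n)])   # carriers x_1, x_2 (no intercept)
y = X @ np.array([2.0, 0.5]) + rng.normal(scale=0.4, size=n)            # hypothetical data

p = X.shape[1]
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - p)
S = sigma2_hat * np.linalg.inv(X.T @ X)          # estimated [sigma_ij]

c = np.array([1.0, -2.0])                        # gamma = beta_1 - 2 beta_2
gamma_hat = c @ beta_hat
var_quadform = c @ S @ c                          # c^T [sigma_ij hat] c
var_expanded = S[0, 0] - 4 * S[0, 1] + 4 * S[1, 1]

print(gamma_hat, var_quadform, var_expanded)
```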
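In general: let $\mathbf{x} = (x_1, \ldots, x_p)^T$; then
$$f(\mathbf{x}) = \beta_1 x_1 + \cdots + \beta_p x_p = \mathbf{x}^T\boldsymbol\beta, \qquad \hat f(\mathbf{x}) = \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p = \mathbf{x}^T\hat{\boldsymbol\beta},$$
which is the case $\mathbf{c} = \mathbf{x}$, and
$$\frac{\hat f(\mathbf{x}) - f(\mathbf{x})}{\widehat{\mathrm{std}}(\hat f(\mathbf{x}))} \sim t_{(n-p)}, \qquad \text{where } \widehat{\mathrm{Var}}(\hat f(\mathbf{x})) = \hat\sigma^2\,\mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}.$$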
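Let $\gamma = \mathbf{c}^T\boldsymbol\beta$ and $\hat\gamma = \mathbf{c}^T\hat{\boldsymbol\beta}$, where $\mathbf{c} = (c_1, \ldots, c_p)^T$ is a constant vector.
A $100(1-\delta)\%$ confidence interval for $\gamma$ is given by
$$\hat\gamma \pm t_{(n-p)}(\delta/2)\,\widehat{\mathrm{std}}(\hat\gamma),$$
where $t_{(n-p)}(\delta)$ denotes the upper $\delta$ point of the $t_{(n-p)}$ distribution.
To test a hypothesis about $\gamma$, with hypothesized value $\gamma_o$, at the $100\delta\%$ level, compute the test statistic
$$t = \frac{\hat\gamma - \gamma_o}{\widehat{\mathrm{std}}(\hat\gamma)}$$
and compare it with $t_{(n-p)}(\delta/2)$ for two-sided tests, and with $t_{(n-p)}(\delta)$ for one-sided tests.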
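The recipe above can be wrapped in one function. This is a sketch, not the notes' method: `contrast_inference` is a hypothetical helper, and the critical value `t_crit` is passed in (for example 2.11, the value of $t_{(17)}(.025)$ used in the stack-loss examples) rather than computed, so that only numpy is needed. The usage below runs on simulated data with $n = 21$, $p = 4$.

```python
import numpy as np

def contrast_inference(X, y, c, gamma0, t_crit):
    """Estimate gamma = c^T beta, its estimated std, a CI, and the t statistic.

    t_crit is the critical value t_{(n-p)}(delta/2), supplied by the caller
    (e.g. read from a t table).
    """
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - p)     # RSS / (n - p)
    XtX_inv = np.linalg.inv(X.T @ X)
    gamma_hat = c @ beta_hat
    std_hat = np.sqrt(sigma2_hat * (c @ XtX_inv @ c))
    ci = (gamma_hat - t_crit * std_hat, gamma_hat + t_crit * std_hat)
    t_stat = (gamma_hat - gamma0) / std_hat
    return gamma_hat, std_hat, ci, t_stat

# Usage on hypothetical data with n = 21 and p = 4, so n - p = 17.
rng = np.random.default_rng(7)
n = 21
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.7, 1.3, -0.2]) + rng.normal(size=n)
c = np.array([0.0, 3.0, 1.0, 0.0])                             # gamma = 3 beta_2 + beta_3
gamma_hat, std_hat, ci, t_stat = contrast_inference(X, y, c, gamma0=3.5, t_crit=2.11)
print(gamma_hat, std_hat, ci, t_stat)
```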
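For the model $y = \beta_1 x_1 + \beta_2 x_2 + e$ with $\gamma = \beta_1 - 2\beta_2$: to test the hypothesis
$$H_o: \beta_1 - 2\beta_2 = 5 \quad \text{vs} \quad H_a: \beta_1 - 2\beta_2 \ne 5,$$
we use
$$t = \frac{\hat\beta_1 - 2\hat\beta_2 - 5}{\widehat{\mathrm{std}}(\hat\gamma)}$$
and compare the observed value to $t_{(n-2)}(\delta/2)$.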
Example: (a) Compute a 95% confidence interval for the coefficient of AIR ($\beta_2$) in the stack-loss data. (b) Test $H_o: \beta_2 \ge 1$ at the 5% level.
Here $n - p = 21 - 4 = 17$.
(a)
$$\hat\beta_2 \pm t_{(17)}(.025)\,\widehat{\mathrm{std}}(\hat\beta_2) = 0.716 \pm 2.11(0.135) = (0.4312, 1.0009)$$
(b)
$$H_o: \beta_2 \ge 1 \quad \text{vs} \quad H_a: \beta_2 < 1,$$
$$t = \frac{\hat\beta_2 - 1}{\widehat{\mathrm{std}}(\hat\beta_2)} = \frac{0.716 - 1}{0.135} = -2.10 < -t_{(17)}(.05) = -1.74.$$
Reject $H_o$. There is sufficient evidence at the 5% level to support $H_a$.
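The arithmetic of this example, reproduced from the reported estimate and standard error. The critical values 2.11 and 1.74 are the t-table values $t_{(17)}(.025)$ and $t_{(17)}(.05)$ used in these notes; they are hard-coded rather than computed.

```python
beta2_hat, std_beta2 = 0.716, 0.135   # estimate and std error for AIR, as reported
t_025, t_05 = 2.11, 1.74              # t_(17)(.025) and t_(17)(.05) from a t table

# (a) 95% confidence interval for beta_2.
lower = beta2_hat - t_025 * std_beta2
upper = beta2_hat + t_025 * std_beta2

# (b) One-sided test of H_o: beta_2 >= 1 against H_a: beta_2 < 1.
t_stat = (beta2_hat - 1) / std_beta2
reject = t_stat < -t_05

print(lower, upper, round(t_stat, 2), reject)
```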
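Example: For the stack-loss data, (a) write a 95% confidence interval for $3\beta_2 + \beta_3$, and (b) test $H_o: 3\beta_2 + \beta_3 \le 3.5$.
Note: $\boldsymbol\beta = (\beta_1, \beta_2, \beta_3, \beta_4)^T$, so $\gamma = 3\beta_2 + \beta_3 = \mathbf{c}^T\boldsymbol\beta$ with $\mathbf{c}^T = (0, 3, 1, 0)$.
$$\widehat{\mathrm{Var}}(\hat\gamma) = (0, 3, 1, 0)\,\widehat{\mathrm{Cov}}(\hat{\boldsymbol\beta})\begin{pmatrix} 0 \\ 3 \\ 1 \\ 0 \end{pmatrix}, \qquad \widehat{\mathrm{Cov}}(\hat{\boldsymbol\beta}) = \begin{pmatrix} \hat\sigma_{11} & \hat\sigma_{12} & \hat\sigma_{13} & \hat\sigma_{14} \\ \hat\sigma_{21} & \hat\sigma_{22} & \hat\sigma_{23} & \hat\sigma_{24} \\ \hat\sigma_{31} & \hat\sigma_{32} & \hat\sigma_{33} & \hat\sigma_{34} \\ \hat\sigma_{41} & \hat\sigma_{42} & \hat\sigma_{43} & \hat\sigma_{44} \end{pmatrix}$$
$$\Rightarrow \widehat{\mathrm{Var}}(\hat\gamma) = (3, 1)\begin{pmatrix} \hat\sigma_{22} & \hat\sigma_{23} \\ \hat\sigma_{32} & \hat\sigma_{33} \end{pmatrix}\begin{pmatrix} 3 \\ 1 \end{pmatrix} = (3, 1)\begin{pmatrix} .0182 & -0.0365 \\ -0.0365 & 0.135 \end{pmatrix}\begin{pmatrix} 3 \\ 1 \end{pmatrix} = 9(.0182) - 6(.0365) + .135 = 0.0798.$$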
(a)
$$3\hat\beta_2 + \hat\beta_3 \pm t_{(17)}(.025)\,\widehat{\mathrm{std}}(3\hat\beta_2 + \hat\beta_3) = 3.72 \pm 2.11\sqrt{.0798}.$$
The 95% CI is $(3.124, 4.316)$.
(b)
$$H_o: 3\beta_2 + \beta_3 \le 3.5 \quad \text{vs} \quad H_a: 3\beta_2 + \beta_3 > 3.5,$$
$$t = \frac{3.72 - 3.5}{\sqrt{.0798}} = .78 < t_{(17)}(.05) = 1.74.$$
Do not reject $H_o$ at the 5% level.
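The same computation in code, using only the numbers quoted in this example: the $2 \times 2$ block of $\widehat{\mathrm{Cov}}(\hat{\boldsymbol\beta})$ for $(\hat\beta_2, \hat\beta_3)$, the value 3.72 for $3\hat\beta_2 + \hat\beta_3$, and the table values 2.11 and 1.74. Restricting to the block is valid because $\mathbf{c}$ has zeros in the other positions.

```python
import numpy as np

# Estimated covariance block for (beta_2_hat, beta_3_hat), as quoted above.
S = np.array([[0.0182, -0.0365],
              [-0.0365, 0.135]])
w = np.array([3.0, 1.0])                 # nonzero part of c = (0, 3, 1, 0)
var_gamma = w @ S @ w                    # 9(.0182) - 6(.0365) + .135
std_gamma = np.sqrt(var_gamma)

gamma_hat = 3.72                         # value of 3*beta_2_hat + beta_3_hat used above
ci = (gamma_hat - 2.11 * std_gamma, gamma_hat + 2.11 * std_gamma)   # (a)
t_stat = (gamma_hat - 3.5) / std_gamma                              # (b), H_o: gamma <= 3.5
print(round(var_gamma, 4), ci, round(t_stat, 2), t_stat > 1.74)
```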
Example: Compute a 95% confidence interval for $f(\mathbf{x})$, where $\mathbf{x}^T = (82\ 27\ 89)$. (Note: $p = 4$, so with the intercept $\mathbf{x}^T = (1\ 82\ 27\ 89)$.)
The method of the previous example can be applied, but the computation is cumbersome. There is an easier way. Use
MODEL LOSS = AIR TEMP ACID / R;
The R option produces $\hat y_i$, $\hat e_i$, and their standard errors.
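In this example $\mathbf{x}^T = (82\ 27\ 89)$ is the 1st observation. A 95% CI for $f(\mathbf{x})$ is:
$$\hat f(\mathbf{x}) \pm t_{(n-p)}(.025)\,\widehat{\mathrm{std}}(\hat f(\mathbf{x})) = 38.765 \pm 2.11(1.781) = (35.007, 42.523).$$
$\widehat{\mathrm{std}}(\hat f(\mathbf{x}))$ can also be used for tests of hypotheses.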
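The interval arithmetic, reproduced from the fitted value and standard error quoted above (as produced by the R option) with $t_{(17)}(.025) = 2.11$ hard-coded from a t table.

```python
f_hat, std_f = 38.765, 1.781        # fitted value and its std error, as quoted above
t_crit = 2.11                       # t_(17)(.025)

half_width = t_crit * std_f
lower, upper = f_hat - half_width, f_hat + half_width
print(round(lower, 3), round(upper, 3))
```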
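Suppose that we want a 95% confidence interval for $f(\mathbf{x})$, where $\mathbf{x}^T = (60\ 20\ 90)$. This point is not part of the data, so we add the data line
. 60 20 90
to the end of the data, where "." is placed in the location of the dependent variable (see results next page). The observation with a missing response is not used to fit the model, but its predicted value and standard error are still reported. The interval is then
$$\hat f(\mathbf{x}) \pm t_{(n-p)}(.025)\,\widehat{\mathrm{std}}(\hat f(\mathbf{x})),$$
with $\hat f(\mathbf{x})$ and $\widehat{\mathrm{std}}(\hat f(\mathbf{x}))$ taken from the output for the added observation.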
Perform the following tests of hypotheses for the Stack-Loss data:
(a) $H_o: 2\beta_2 + 3\beta_3 - 4\beta_4 = 7$
(b) $H_o: \beta_1 + \beta_2 + \beta_3 + \beta_4 = -40$
(c) $H_o: \beta_2 + \beta_3 = 0$
For each of the problems the value of $\mathbf{c}$ is
(a) $\mathbf{c}^T = (0, 2, 3, -4)$
(b) $\mathbf{c}^T = (1, 1, 1, 1)$
(c) $\mathbf{c}^T = (0, 1, 1, 0)$
To test in SAS use
(a) Test 2*AIR+3*TEMP-4*ACID=7/PRINT;
(b) Test INTERCEPT+AIR+TEMP+ACID=-40/PRINT;
(c) Test AIR+TEMP/PRINT;
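In (a)-(c) we are testing
$$H_o: \mathbf{c}^T\boldsymbol\beta = b \quad \text{vs} \quad H_a: \mathbf{c}^T\boldsymbol\beta \ne b,$$
where $b$ is a constant scalar. Use
$$t = \frac{\mathbf{c}^T\hat{\boldsymbol\beta} - b}{\widehat{\mathrm{std}}(\mathbf{c}^T\hat{\boldsymbol\beta})},$$
where
$$\widehat{\mathrm{Var}}(\mathbf{c}^T\hat{\boldsymbol\beta}) = \mathbf{c}^T\,\widehat{\mathrm{Var}}(\hat{\boldsymbol\beta})\,\mathbf{c} = \hat\sigma^2\,\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}.$$
The numerator and $\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}$ (which, multiplied by $\hat\sigma^2$, gives the square of the denominator) are given in the SAS output (see next page).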
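$\hat\sigma^2 = 10.5$ for the stack-loss data; $t_{(17)}(.025) = 2.11$.
(a) $\mathbf{c}^T\hat{\boldsymbol\beta} - b = -1.07$ and $\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c} = 0.129$, so
$$t = \frac{-1.07}{\sqrt{(10.5)(.129)}} = -0.919, \qquad |t| < 2.11.$$
Do not reject $H_o$.
(b)
$$t = \frac{1.94}{\sqrt{(10.5)(10.1)}} = 0.19 < 2.11.$$
Do not reject $H_o$.
(c)
$$t = \frac{2.01}{\sqrt{(0.008)(10.5)}} = 6.9 > 2.11.$$
Reject $H_o$.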
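A small sketch reproducing tests (a) and (c) from the quantities read off the SAS output: the numerator $\mathbf{c}^T\hat{\boldsymbol\beta} - b$ and the quadratic form $\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}$, together with $\hat\sigma^2 = 10.5$ and $t_{(17)}(.025) = 2.11$. The helper `t_statistic` is an illustrative name, not a SAS feature.

```python
import math

sigma2_hat, t_crit = 10.5, 2.11    # values reported for the stack-loss fit

def t_statistic(numerator, quad_form):
    """t = (c^T beta_hat - b) / sqrt(sigma2_hat * c^T (X^T X)^{-1} c)."""
    return numerator / math.sqrt(sigma2_hat * quad_form)

t_a = t_statistic(-1.07, 0.129)    # test (a)
t_c = t_statistic(2.01, 0.008)     # test (c)
for name, t in [("a", t_a), ("c", t_c)]:
    print(name, round(t, 3), "reject" if abs(t) > t_crit else "do not reject")
```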
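A trick to do (c) without using the TEST command: fit the model
$$y = \beta_1 + (\beta_2 + \beta_3)\,\mathrm{AIR} + \beta_3\,(\mathrm{TEMP} - \mathrm{AIR}) + \beta_4\,\mathrm{ACID} + e \qquad \text{(SAME model)},$$
which is the same model because $\beta_2\,\mathrm{AIR} + \beta_3\,\mathrm{TEMP} = (\beta_2 + \beta_3)\,\mathrm{AIR} + \beta_3\,(\mathrm{TEMP} - \mathrm{AIR})$.
This gives the s.e. for $\hat\beta_2 + \hat\beta_3$, the estimated coefficient of AIR.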
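A sketch checking the trick on simulated data (not the stack-loss data; the AIR, TEMP and ACID values and coefficients are made up): regressing on $(1, \mathrm{AIR}, \mathrm{TEMP} - \mathrm{AIR}, \mathrm{ACID})$ gives the same fitted values, the AIR coefficient equals $\hat\beta_2 + \hat\beta_3$ from the original fit, and its standard error equals $\sqrt{\mathbf{c}^T\widehat{\mathrm{Cov}}(\hat{\boldsymbol\beta})\mathbf{c}}$ with $\mathbf{c}^T = (0, 1, 1, 0)$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 21
# Hypothetical covariate values and response (illustration only).
air, temp, acid = rng.uniform(50, 80, n), rng.uniform(17, 27, n), rng.uniform(72, 93, n)
y = -40 + 0.7 * air + 1.3 * temp - 0.15 * acid + rng.normal(scale=3.0, size=n)

def ols(X, y):
    """Return beta_hat, its estimated covariance matrix, and the fitted values."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    sigma2_hat = np.sum((y - fitted) ** 2) / (n - p)
    return beta_hat, sigma2_hat * np.linalg.inv(X.T @ X), fitted

ones = np.ones(n)
b1, cov1, fit1 = ols(np.column_stack([ones, air, temp, acid]), y)          # original model
b2, cov2, fit2 = ols(np.column_stack([ones, air, temp - air, acid]), y)    # reparametrized model

c = np.array([0.0, 1.0, 1.0, 0.0])
print(np.allclose(fit1, fit2))                       # same fitted values: same model
print(b2[1], c @ b1)                                 # AIR coefficient = beta_2_hat + beta_3_hat
print(np.sqrt(cov2[1, 1]), np.sqrt(c @ cov1 @ c))    # same standard error
```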
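Note: In general, the TEST statement can be used to obtain $\mathbf{c}^T\hat{\boldsymbol\beta}$ and $\widehat{\mathrm{Var}}(\mathbf{c}^T\hat{\boldsymbol\beta})$ (the latter up to the factor $\hat\sigma^2$). These, for example, can be used to obtain confidence intervals for $\mathbf{c}^T\boldsymbol\beta$.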