bbb+++ xxxe Lbme2.aut.ac.ir/~towhidkhah/MI/Resources/... · pp xx s bb bb-= = =++ 2 1 11 XXX eeI L...

Post on 07-Sep-2020


Chapter 4

Multiple Linear Regression

The model:

$$y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + e$$

$y$: the dependent variable

$x_1, \ldots, x_p$: the independent variables, or carriers

$e$: the residual, or error

A special case: $p = 2$, $x_1 = 1$, $x_2 = x$, $\beta_1 = \alpha$, $\beta_2 = \beta$, so that

$$y = \beta_1 x_1 + \beta_2 x_2 + e \quad \text{or} \quad y = \alpha + \beta x + e$$

Example: Polynomial regression

$$y = \beta_o + \beta_1 x + \beta_2 x^2 + \cdots + \beta_t x^t + e$$

The response function is

$$f(x) = \beta_o + \beta_1 x + \beta_2 x^2 + \cdots + \beta_t x^t$$

with carriers $x_1 = x,\ x_2 = x^2,\ \ldots,\ x_t = x^t$.
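The point of the polynomial example is that the model stays linear in the coefficients once the powers of $x$ are treated as separate carriers. A minimal sketch of that bookkeeping follows; the function names `polynomial_carriers` and `response` are illustrative and do not come from the slides.

```python
def polynomial_carriers(x, t):
    """Return the carriers [x, x**2, ..., x**t] for a single value x."""
    return [x ** k for k in range(1, t + 1)]


def response(x, beta0, betas):
    """Evaluate f(x) = beta0 + beta_1*x + ... + beta_t*x**t.

    `betas` holds (beta_1, ..., beta_t); the degree t is its length.
    """
    carriers = polynomial_carriers(x, len(betas))
    # f(x) is a linear combination of the carriers, exactly as in the
    # multiple regression model, even though it is nonlinear in x.
    return beta0 + sum(b * c for b, c in zip(betas, carriers))


print(polynomial_carriers(2.0, 3))
print(response(2.0, 1.0, [0.5, 0.25]))
```

Each observation contributes one row of carriers, so a degree-$t$ polynomial fit is a multiple regression with $t$ carriers plus the constant.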

Data model:

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i, \qquad i = 1, 2, \ldots, n$$

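Vector model:

$$\mathbf{y} = \beta_1 \mathbf{x}_1 + \beta_2 \mathbf{x}_2 + \cdots + \beta_p \mathbf{x}_p + \mathbf{e}$$

where

$$\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
\mathbf{x}_1 = \begin{pmatrix} x_{11} \\ \vdots \\ x_{n1} \end{pmatrix}, \quad
\mathbf{x}_2 = \begin{pmatrix} x_{12} \\ \vdots \\ x_{n2} \end{pmatrix}, \quad \ldots, \quad
\mathbf{x}_p = \begin{pmatrix} x_{1p} \\ \vdots \\ x_{np} \end{pmatrix}, \quad
\mathbf{e} = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}$$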

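Matrix model:

$$\mathbf{y} = X \boldsymbol{\beta} + \mathbf{e}$$

where

$$\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X_{n \times p} = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ x_{21} & \cdots & x_{2p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix} = (\mathbf{x}_1\ \cdots\ \mathbf{x}_p), \quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\mathbf{e} = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}$$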

Least squares estimates

We minimize

$$Q(\beta_1, \ldots, \beta_p) = \sum_{i=1}^{n} (y_i - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_p x_{ip})^2$$

with respect to $\beta_1, \ldots, \beta_p$ to obtain the estimates, denoted by $\hat{\beta}_1, \ldots, \hat{\beta}_p$.

The fitted response is

$$\hat{y}_i = \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_p x_{ip}$$

The fitted residuals are

$$\hat{e}_i = y_i - \hat{y}_i \quad \text{for } i = 1, \ldots, n$$

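Observation space representation

When least squares is used to estimate the parameters, we obtain the plane for which the sum of squares of the distances from each observation to the plane, measured along the $y$ axis, is minimized.

[Figure: the fitted plane $\hat{y} = \hat{\alpha} + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$ drawn over the $x_1$ and $x_2$ axes, with $y$ on the vertical axis.]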

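The variable space representation

[Figure: the vector $\mathbf{y}$, the span of $\mathbf{x}_1, \ldots, \mathbf{x}_p$, and the residual vector $\mathbf{y} - \hat{\mathbf{y}}$.]

$$\min_{\boldsymbol{\beta}}\ Q(\boldsymbol{\beta}) = \lVert \mathbf{y} - \beta_1 \mathbf{x}_1 - \cdots - \beta_p \mathbf{x}_p \rVert^2$$

Normal equations:

$$\mathbf{y} - \hat{\mathbf{y}} \perp \operatorname{span}\{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$$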

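Written out, with $\hat{\mathbf{y}} = X \hat{\boldsymbol{\beta}}$:

$$\begin{aligned}
\mathbf{x}_1^T (\mathbf{y} - X \hat{\boldsymbol{\beta}}) &= 0 \\
\mathbf{x}_2^T (\mathbf{y} - X \hat{\boldsymbol{\beta}}) &= 0 \\
&\ \,\vdots \\
\mathbf{x}_p^T (\mathbf{y} - X \hat{\boldsymbol{\beta}}) &= 0
\end{aligned}
\quad \Rightarrow \quad X^T (\mathbf{y} - X \hat{\boldsymbol{\beta}}) = \mathbf{0}
\quad \Rightarrow \quad X^T X \hat{\boldsymbol{\beta}} = X^T \mathbf{y}$$

If $X^T X$ is nonsingular (i.e. $\boldsymbol{\beta}$ is estimable), then

$$\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y}$$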
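To make the normal equations concrete, here is a minimal pure-Python sketch that forms $X^T X$ and $X^T \mathbf{y}$ and solves $X^T X \hat{\boldsymbol{\beta}} = X^T \mathbf{y}$ by Gaussian elimination. The helper names, the singularity tolerance, and the small data set are all made up for illustration; a numerical library would normally be used instead.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[piv][k]) < 1e-12:  # illustrative tolerance (assumption)
            raise ValueError("X^T X is singular: beta is not estimable")
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        s = sum(M[k][c] * z[c] for c in range(k + 1, n))
        z[k] = (M[k][n] - s) / M[k][k]
    return z


def least_squares(X, y):
    """Return beta_hat solving the normal equations X^T X beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Made-up data lying exactly on y = 1 + 2x; the first carrier is the constant 1.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]
beta_hat = least_squares(X, y)
print(beta_hat)
```

Because the data lie exactly on a line, the residual vector is zero and the estimates reproduce the intercept 1 and slope 2 up to rounding.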

Note: $\hat{\boldsymbol{\beta}}$ is a linear function of $\mathbf{y}$, i.e. $\hat{\boldsymbol{\beta}}$ is a linear estimator.

$X^T X$ is nonsingular if and only if the columns of $X$, $\mathbf{x}_1, \ldots, \mathbf{x}_p$, are linearly independent.

Multiple correlation and coefficient of determination.

The sample multiple correlation between a variable $y$ and variables $x_1, \ldots, x_t$ is denoted by $R_{y;\,x_1, \ldots, x_t}$.

By definition

$$R_{y;\,x_1, \ldots, x_t} = r_{y\hat{y}}$$

where $\hat{y}$ is the fitted value obtained from the intercept model

$$y = \alpha + \beta_1 x_1 + \cdots + \beta_t x_t + e$$

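$$R_{y;\,x_1, \ldots, x_t} = \cos\theta$$

where $\theta$ is the angle between $\mathbf{y} - \bar{y}\mathbf{1}$ and $\operatorname{span}\{\mathbf{x}_1 - \bar{x}_1 \mathbf{1}, \ldots, \mathbf{x}_t - \bar{x}_t \mathbf{1}\}$.

[Figure: the centered vector $\mathbf{y} - \bar{y}\mathbf{1}$, the span of the centered carriers, and the angle $\theta$ between them.]

If $t = 1$, then $R_{y;\,x} = |r_{xy}|$.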

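It can be shown that (problem 15)

$$S_y^2 = S_{\hat{y}}^2 + S_{\hat{e}}^2$$

Hints: For the intercept model,

$$\hat{\mathbf{e}} \perp \mathbf{1} \ \Rightarrow\ \sum_i \hat{e}_i = 0 \ \Rightarrow\ \bar{\hat{e}} = 0 \qquad (*)$$

$$S_{\hat{y}\hat{e}} = 0, \ \text{i.e.}\ \sum_i (\hat{y}_i - \bar{\hat{y}})(\hat{e}_i - \bar{\hat{e}}) = 0 \quad \text{(why?)} \qquad (**)$$

Since $y_i = \hat{y}_i + \hat{e}_i$, $S_y^2$ is, up to its normalizing constant,

$$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i + \hat{e}_i - \bar{\hat{y}} - \bar{\hat{e}})^2$$

Expand the RHS and use $(*)$ and $(**)$ to show that, after dividing by the same constant, RHS $= S_{\hat{y}}^2 + S_{\hat{e}}^2$.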

Coefficient of determination

It is defined by

$$R^2 = \frac{S_{\hat{y}}^2}{S_y^2}$$

the proportion of variance explained by the regression of $y$ on $x_1, \ldots, x_t$.

$$R^2 = R^2_{y;\,x_1, \ldots, x_t} = r^2_{y\hat{y}} \quad \text{(problem 15)}$$

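Want to show

$$r^2_{y\hat{y}} = \frac{S_{\hat{y}}^2}{S_y^2}$$

Hint: Show that $S_{y\hat{y}} = S_{\hat{y}}^2$ by replacing $y = \hat{y} + \hat{e}$ and using the fact that $S_{\hat{y}\hat{e}} = 0$. Then

$$r^2_{y\hat{y}} = \frac{S_{y\hat{y}}^2}{S_y^2\, S_{\hat{y}}^2} = \frac{(S_{\hat{y}}^2)^2}{S_y^2\, S_{\hat{y}}^2} = \frac{S_{\hat{y}}^2}{S_y^2}$$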
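The two identities from problem 15 can be checked numerically. The sketch below fits a simple intercept model to made-up data and compares $S_{\hat{y}}^2 / S_y^2$ with $r^2_{y\hat{y}}$. The data values and the $n - 1$ normalization are assumptions for illustration; the normalization cancels in both ratios.

```python
def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Sample covariance with an n-1 divisor (assumed convention)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


# Made-up data, roughly linear in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least squares fit of the intercept model y = alpha + beta*x + e.
beta = cov(x, y) / cov(x, x)
alpha = mean(y) - beta * mean(x)
y_hat = [alpha + beta * xi for xi in x]
e_hat = [yi - fi for yi, fi in zip(y, y_hat)]

# R^2 as a variance ratio and as a squared correlation.
r2_ratio = cov(y_hat, y_hat) / cov(y, y)
r2_corr = cov(y, y_hat) ** 2 / (cov(y, y) * cov(y_hat, y_hat))

print(r2_ratio, r2_corr)
print(cov(y_hat, e_hat))  # should be zero up to rounding, as in (**)
```

The same check works for any number of carriers, provided the model contains an intercept.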

$R^2$ for multiple regression has the same problems as those discussed in simple linear regression, and possibly more.

$R^2$ is not defined for models without an intercept:

$$S_y^2 = 0,\ S_{\hat{y}}^2 > 0 \ \Rightarrow\ R^2 = \infty\ !$$

$R^2$ is an index that measures the degree of linear relationship between $y$ and $x_1, \ldots, x_t$, if the relation is linear.

When a model without an intercept is fit, statistical programs generally output a number in the interval $[0, 1]$. However, there is no commonly accepted definition of $R^2$ for models without an intercept.

Beware!

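A review of algebra of random vectors:

Let $A$ be an $n \times p$ matrix of nonrandom elements, and let $\hat{\boldsymbol{\theta}} = (\hat{\theta}_1, \ldots, \hat{\theta}_p)^T$ be a vector of random elements. Then

$$E(\hat{\boldsymbol{\theta}}) = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_p \end{pmatrix} = \boldsymbol{\theta}
\quad \text{and} \quad
Cov(\hat{\boldsymbol{\theta}}) = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$

where $\theta_i = E(\hat{\theta}_i)$ and $\sigma_{kl} = Cov(\hat{\theta}_k, \hat{\theta}_l)$.

The following properties hold:

(1) $E(A \hat{\boldsymbol{\theta}}) = A\, E(\hat{\boldsymbol{\theta}})$.

(2) $Cov(A \hat{\boldsymbol{\theta}}) = A\, Cov(\hat{\boldsymbol{\theta}})\, A^T$.

(3) If $\hat{\boldsymbol{\theta}} \sim N$ then $A \hat{\boldsymbol{\theta}} \sim N$.

(4) If $\mathbf{Z}$ is a constant vector, then $E(\mathbf{Z} + \hat{\boldsymbol{\theta}}) = \mathbf{Z} + E(\hat{\boldsymbol{\theta}}) = \mathbf{Z} + \boldsymbol{\theta}$ and $Cov(\mathbf{Z} + \hat{\boldsymbol{\theta}}) = Cov(\hat{\boldsymbol{\theta}})$.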

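Statistical properties of LS estimates:

Recall the data model

$$y_i = \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + e_i, \qquad i = 1, \ldots, n$$

$x_{i1}, \ldots, x_{ip}$: fixed (nonrandom)

$e_i$: random

$\beta_1, \ldots, \beta_p$: unknown and fixed

Unbiased errors: If $E(\mathbf{e}) = \mathbf{0}$ then $E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$, provided that $\boldsymbol{\beta}$ is estimable.

Proof: Since $E(\mathbf{e}) = \mathbf{0}$ gives $E(\mathbf{y}) = X \boldsymbol{\beta}$,

$$E(\hat{\boldsymbol{\beta}}) = E\big((X^T X)^{-1} X^T \mathbf{y}\big) = (X^T X)^{-1} X^T E(\mathbf{y}) = (X^T X)^{-1} X^T X \boldsymbol{\beta} = \boldsymbol{\beta}$$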

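Uncorrelated, constant variance errors:

This assumption is equivalent to having

$$Cov(\mathbf{e}) = \sigma^2 \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix} = \sigma^2 I$$

If $\boldsymbol{\beta}$ is estimable, then

$$Cov(\hat{\boldsymbol{\beta}}) = \sigma^2 (X^T X)^{-1}$$

Proof: Let $A = (X^T X)^{-1} X^T$, then $\hat{\boldsymbol{\beta}} = A \mathbf{y}$ and

$$\begin{aligned}
Cov(\hat{\boldsymbol{\beta}}) = Cov(A \mathbf{y}) &= A\, Cov(\mathbf{y})\, A^T = A\, Cov(\mathbf{e})\, A^T = \sigma^2 A A^T \\
&= \sigma^2 (X^T X)^{-1} X^T X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}
\end{aligned}$$

This, in particular, leads to the formulas of chapter 2 by using $X = (\mathbf{1}, \mathbf{x})$ (Problem 16).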
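As a worked sketch of the reduction to simple linear regression (problem 16 asks for the details), take the two-column design matrix $X = (\mathbf{1}, \mathbf{x})$ with coefficients $(\alpha, \beta)$. The expressions below are the standard results for that case, not a transcription of chapter 2.

```latex
\begin{aligned}
X^T X &= \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix},
\qquad
\det(X^T X) = n \sum_i x_i^2 - \Big(\sum_i x_i\Big)^2 = n \sum_i (x_i - \bar{x})^2 \\[6pt]
\sigma^2 (X^T X)^{-1} &= \frac{\sigma^2}{n \sum_i (x_i - \bar{x})^2}
\begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix} \\[6pt]
Var(\hat{\beta}) &= \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2},
\qquad
Var(\hat{\alpha}) = \sigma^2 \Big( \frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} \Big),
\qquad
Cov(\hat{\alpha}, \hat{\beta}) = -\frac{\sigma^2\, \bar{x}}{\sum_i (x_i - \bar{x})^2}
\end{aligned}
```

The expression for $Var(\hat{\alpha})$ uses $\sum_i x_i^2 = \sum_i (x_i - \bar{x})^2 + n \bar{x}^2$.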

$$RSS(\hat{\boldsymbol{\beta}}) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$\hat{\sigma}^2 = RMS = \frac{RSS}{n - p}$$

$$(*) \quad E(\hat{\sigma}^2) = \sigma^2$$

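Gauss-Markov Theorem

If $E(\mathbf{e}) = \mathbf{0}$ and $Cov(\mathbf{e}) = \sigma^2 I$, then $\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y}$ is the MVUE for $\boldsymbol{\beta}$, amongst all linear unbiased estimators of $\boldsymbol{\beta}$.

Additionally, $\hat{f}(\mathbf{x}) = \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$ is the MVUE for $f(\mathbf{x})$.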

An example of using SAS for multiple regression:

STACK-LOSS DATA, Brownlee (1965):

The data describe the operation of a plant for the oxidation of ammonia to nitric acid. Four variables are observed over a period of 21 days. The variation in the variable stack loss (LOSS) is to be explained by the independent variables:

AIR: airflow

TEMP: water temperature

ACID: acid concentration.

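With the assumption of normality we have the following results:

If $\mathbf{e} \sim N(\mathbf{0}, \sigma^2 I)$, then

$$\hat{\boldsymbol{\beta}} \sim N\big(\boldsymbol{\beta},\ \sigma^2 (X^T X)^{-1}\big)$$

$$\frac{RSS}{\sigma^2} \sim \chi^2_{n-p}$$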

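Confidence and prediction intervals

We make the assumption $\mathbf{e} \sim N(\mathbf{0}, \sigma^2 I)$.

Let $Var(\hat{\beta}_r) = \sigma^2 (X^T X)^{-1}_{rr}$, where $(X^T X)^{-1}_{rr}$ is the $r$-th diagonal element of $(X^T X)^{-1}$, and let

$$\widehat{Var}(\hat{\beta}_r) = \hat{\sigma}^2 (X^T X)^{-1}_{rr}$$

be the estimate of $Var(\hat{\beta}_r)$, with $\widehat{std}(\hat{\beta}_r) = \sqrt{\widehat{Var}(\hat{\beta}_r)}$.

Then, using the characterization of $t$ from Chapter 2, we have

$$\frac{\hat{\beta}_r - \beta_r}{\widehat{std}(\hat{\beta}_r)} \sim t_{(n-p)}$$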

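More generally, let $\mathbf{c} = (c_1, \ldots, c_p)^T$ be a vector of constant values. Then

$$Var(\mathbf{c}^T \hat{\boldsymbol{\beta}}) = \mathbf{c}^T\, Var(\hat{\boldsymbol{\beta}})\, \mathbf{c} = \sigma^2\, \mathbf{c}^T (X^T X)^{-1} \mathbf{c}$$

If $\gamma = c_1 \beta_1 + \cdots + c_p \beta_p = \mathbf{c}^T \boldsymbol{\beta}$ and $\hat{\gamma} = \mathbf{c}^T \hat{\boldsymbol{\beta}}$, then

$$\frac{\hat{\gamma} - \gamma}{\widehat{std}(\hat{\gamma})} \sim t_{(n-p)}$$

where

$$\widehat{Var}(\hat{\gamma}) = \hat{\sigma}^2\, \mathbf{c}^T (X^T X)^{-1} \mathbf{c} \quad \text{and} \quad \widehat{std}(\hat{\gamma}) = \sqrt{\widehat{Var}(\hat{\gamma})}$$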

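Example: Let $\hat{\gamma} = \hat{\beta}_1 - 2 \hat{\beta}_2$ for the model $y = \beta_1 x_1 + \beta_2 x_2 + e$. Then

$$\hat{\gamma} = \mathbf{c}^T \hat{\boldsymbol{\beta}} = (1,\ -2) \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}$$

$$Var(\hat{\gamma}) = (1,\ -2) \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \sigma_1^2 - 4 \sigma_{12} + 4 \sigma_2^2$$

where

$$\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = \sigma^2 (X^T X)^{-1}, \qquad \sigma_{11} = \sigma_1^2,\ \ \sigma_{22} = \sigma_2^2$$

To obtain $\widehat{Var}(\hat{\gamma})$ we replace $\sigma^2$ by $\hat{\sigma}^2$ and accordingly replace $\sigma_{ij}$ by $\hat{\sigma}_{ij}$ in the above formula.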

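In general: Let $\mathbf{x} = (x_1, \ldots, x_p)^T$. Then, taking $\mathbf{c} = \mathbf{x}$,

$$\hat{f}(\mathbf{x}) = \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p = \mathbf{x}^T \hat{\boldsymbol{\beta}}, \quad \text{where } \hat{\boldsymbol{\beta}} = (\hat{\beta}_1, \ldots, \hat{\beta}_p)^T$$

$$\frac{\hat{f}(\mathbf{x}) - f(\mathbf{x})}{\widehat{std}\big(\hat{f}(\mathbf{x})\big)} \sim t_{(n-p)}, \qquad \widehat{Var}\big(\hat{f}(\mathbf{x})\big) = \hat{\sigma}^2\, \mathbf{x}^T (X^T X)^{-1} \mathbf{x}$$

Let $\gamma = \mathbf{c}^T \boldsymbol{\beta}$ and $\hat{\gamma} = \mathbf{c}^T \hat{\boldsymbol{\beta}}$, where $\mathbf{c} = (c_1, \ldots, c_p)^T$ is a constant vector.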

A $100(1 - \delta)\%$ confidence interval for $\gamma$ is given by

$$\hat{\gamma} \pm t_{(n-p)}^{\delta/2}\ \widehat{std}(\hat{\gamma})$$

To test the hypothesis $H_o: \gamma = \gamma_o$ at the $100\delta\%$ level, compute the test statistic

$$t = \frac{\hat{\gamma} - \gamma_o}{\widehat{std}(\hat{\gamma})}$$

and compare it with $t_{(n-p)}^{\delta/2}$ for two-sided tests, and with $t_{(n-p)}^{\delta}$ for one-sided tests.

To test the hypothesis

$$H_o: \beta_1 - 2 \beta_2 = 5 \qquad H_a: \beta_1 - 2 \beta_2 \neq 5$$

we use

$$t = \frac{\hat{\beta}_1 - 2 \hat{\beta}_2 - 5}{\widehat{std}(\hat{\gamma})}$$

and compare the observed value to $t_{(n-2)}^{\delta/2}$.

Example:

(a) Compute a 95% confidence interval for the coefficient of AIR ($\beta_2$) in the Stack-loss data.

(b) Test $\beta_2 \ge 1$ at the 5% level.

(a)

$$\hat{\beta}_2 \pm t_{(17)}^{.025}\ \widehat{std}(\hat{\beta}_2) = 0.716 \pm 2.11\,(0.135) = (0.4312,\ 1.0009)$$

(b)

$$H_o: \beta_2 \ge 1 \qquad H_a: \beta_2 < 1$$

$$t = \frac{\hat{\beta}_2 - 1}{\widehat{std}(\hat{\beta}_2)} = \frac{0.716 - 1}{0.135} = -2.10$$

$$-2.10 < -t_{(17)}^{.05} = -1.74$$

Reject $H_o$. There is sufficient evidence at the 5% level to support $\beta_2 < 1$.
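The arithmetic in parts (a) and (b) can be reproduced with a few lines. The sketch below hard-codes the estimate, standard error, and critical values quoted in this document; it does not refit the regression, and the variable names are illustrative.

```python
# Values quoted in the text for the AIR coefficient of the stack-loss fit.
beta2_hat = 0.716   # estimate of beta_2
std_beta2 = 0.135   # estimated standard error of beta_2 hat
t_025_17 = 2.11     # upper 2.5% point of t with 17 degrees of freedom
t_05_17 = 1.74      # upper 5% point of t with 17 degrees of freedom

# (a) 95% confidence interval: estimate +/- t * std.
half_width = t_025_17 * std_beta2
ci = (beta2_hat - half_width, beta2_hat + half_width)

# (b) One-sided test of H_o: beta_2 >= 1 against H_a: beta_2 < 1.
t_stat = (beta2_hat - 1.0) / std_beta2
reject = t_stat < -t_05_17   # reject for large negative t

print(ci)
print(t_stat, reject)
```

The degrees of freedom are $n - p = 21 - 4 = 17$, since the fitted model has an intercept and three carriers.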

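Example: For the stack-loss data, (a) write a confidence interval for $3 \beta_2 + \beta_3$ and (b) test $3 \beta_2 + \beta_3 \le 3.5$.

Note that

$$\gamma = 3 \beta_2 + \beta_3 = \mathbf{c}^T \boldsymbol{\beta}, \qquad \boldsymbol{\beta} = (\beta_1, \beta_2, \beta_3, \beta_4)^T, \qquad \mathbf{c} = (0, 3, 1, 0)^T$$

$$\widehat{Cov}(\hat{\boldsymbol{\beta}}) = \begin{pmatrix} \hat{\sigma}_{11} & \hat{\sigma}_{12} & \hat{\sigma}_{13} & \hat{\sigma}_{14} \\ \hat{\sigma}_{21} & \hat{\sigma}_{22} & \hat{\sigma}_{23} & \hat{\sigma}_{24} \\ \hat{\sigma}_{31} & \hat{\sigma}_{32} & \hat{\sigma}_{33} & \hat{\sigma}_{34} \\ \hat{\sigma}_{41} & \hat{\sigma}_{42} & \hat{\sigma}_{43} & \hat{\sigma}_{44} \end{pmatrix}$$

$$\widehat{Var}(\hat{\gamma}) = (0, 3, 1, 0)\ \widehat{Cov}(\hat{\boldsymbol{\beta}}) \begin{pmatrix} 0 \\ 3 \\ 1 \\ 0 \end{pmatrix}
\ \Rightarrow\
\widehat{Var}(\hat{\gamma}) = (3\ \ 1) \begin{pmatrix} \hat{\sigma}_{22} & \hat{\sigma}_{23} \\ \hat{\sigma}_{32} & \hat{\sigma}_{33} \end{pmatrix} \begin{pmatrix} 3 \\ 1 \end{pmatrix}$$

$$= (3\ \ 1) \begin{pmatrix} .0182 & -.0365 \\ -.0365 & .135 \end{pmatrix} \begin{pmatrix} 3 \\ 1 \end{pmatrix} = 9(.0182) - 6(.0365) + .135 = 0.0798$$

(a)

$$3 \hat{\beta}_2 + \hat{\beta}_3 \pm t_{(17)}^{.025}\ \widehat{std}(3 \hat{\beta}_2 + \hat{\beta}_3) = 3.72 \pm 2.11 \sqrt{.0798} = 3.72 \pm 0.596$$

The 95% CI is $(3.124,\ 4.316)$.

(b)

$$H_o: 3 \beta_2 + \beta_3 \le 3.5 \qquad H_a: 3 \beta_2 + \beta_3 > 3.5$$

$$t = \frac{3.72 - 3.5}{\sqrt{.0798}} = .78 < t_{(17)}^{.05} = 1.74$$

Do not reject $H_o$ at the 5% level.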
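Because $\mathbf{c}$ has zeros in positions 1 and 4, only the $2 \times 2$ block of the estimated covariance matrix for $(\hat{\beta}_2, \hat{\beta}_3)$ enters the quadratic form. The sketch below evaluates $\mathbf{c}^T \widehat{Cov}\, \mathbf{c}$ for that block using the entries and the point estimate quoted in this document; the helper `quad_form` is an illustrative name.

```python
import math


def quad_form(c, S):
    """Return c^T S c for a vector c and a square matrix S (nested lists)."""
    n = len(c)
    return sum(c[i] * S[i][j] * c[j] for i in range(n) for j in range(n))


# Nonzero part of c = (0, 3, 1, 0) and the matching covariance block,
# with the entries as quoted in the text.
c = [3.0, 1.0]
cov_block = [[0.0182, -0.0365],
             [-0.0365, 0.135]]

var_gamma = quad_form(c, cov_block)   # 9(.0182) - 6(.0365) + .135
std_gamma = math.sqrt(var_gamma)

# One-sided test of H_o: 3*beta_2 + beta_3 <= 3.5, using the point
# estimate 3.72 quoted in the text.
t_stat = (3.72 - 3.5) / std_gamma

print(var_gamma, std_gamma, t_stat)
```

The same function applies to the full four-dimensional $\mathbf{c}$ and the full covariance matrix; dropping the zero coordinates only saves arithmetic.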

Example: Compute a 95% confidence interval for $f(\mathbf{x})$, where $\mathbf{x} = (82\ \ 27\ \ 89)^T$.

The method of the previous example can be applied. (Note: $\mathbf{c} = (1\ \ 82\ \ 27\ \ 89)^T$ and the computation is cumbersome.) But there is an easier way.

Use

    Model Loss = AIR TEMP ACID / R;

The R option produces $\hat{e}_i$, $\hat{y}_i$ and their standard errors.

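In this example $\mathbf{x} = (82\ \ 27\ \ 89)^T$ is the 1st observation.

A 95% CI for $f(\mathbf{x})$ is:

$$\hat{f}(\mathbf{x}) \pm t_{(n-p)}^{.025}\ \widehat{std}\big(\hat{f}(\mathbf{x})\big) = 38.765 \pm 2.11\,(1.781) = (35.007,\ 42.523)$$

$\widehat{std}\big(\hat{f}(\mathbf{x})\big)$ can also be used for tests of hypotheses.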

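Suppose that we want a 95% confidence interval for $f(\mathbf{x})$, where $\mathbf{x} = (60\ \ 20\ \ 90)^T$. This is not part of the data. So we add the data point

    . 60 20 90

to the end of the data, where "." is placed in the location of the dependent variable (see results next page). The interval is then

$$\hat{f}(\mathbf{x}) \pm t_{(n-p)}^{.025}\ \widehat{std}\big(\hat{f}(\mathbf{x})\big)$$

with $\hat{f}(\mathbf{x})$ and $\widehat{std}\big(\hat{f}(\mathbf{x})\big)$ read from the output line for the added observation.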

Perform the following tests of hypotheses for the Stack-Loss data:

a) $2 \beta_2 + 3 \beta_3 - 4 \beta_4 = 7$

b) $\beta_1 + \beta_2 + \beta_3 + \beta_4 = -40$

c) $\beta_3 + \beta_2 = 0$

For each of the problems the value of $\mathbf{c}$ is

a) $\mathbf{c}^T = (0, 2, 3, -4)$

b) $\mathbf{c}^T = (1, 1, 1, 1)$

c) $\mathbf{c}^T = (0, 1, 1, 0)$

To test in SAS use

    a) Test 2*AIR+3*TEMP-4*ACID = 7 / PRINT;
    b) Test INTERCEPT+AIR+TEMP+ACID = -40 / PRINT;
    c) Test AIR+TEMP / PRINT;

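In (a)-(c) we are testing

$$H_o: \mathbf{c}^T \boldsymbol{\beta} = b \qquad H_a: \mathbf{c}^T \boldsymbol{\beta} \neq b$$

where $b$ is a constant scalar. Use

$$t = \frac{\mathbf{c}^T \hat{\boldsymbol{\beta}} - b}{\widehat{std}(\mathbf{c}^T \hat{\boldsymbol{\beta}})}$$

where

$$\widehat{Var}(\mathbf{c}^T \hat{\boldsymbol{\beta}}) = \mathbf{c}^T\, \widehat{Var}(\hat{\boldsymbol{\beta}})\, \mathbf{c} = \hat{\sigma}^2\, \mathbf{c}^T (X^T X)^{-1} \mathbf{c}$$

The numerator $\mathbf{c}^T \hat{\boldsymbol{\beta}} - b$ and the factor $\mathbf{c}^T (X^T X)^{-1} \mathbf{c}$ in the square of the denominator are given in the SAS output (see next page).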

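$\hat{\sigma}^2 = 10.5$ for the stack loss data; $t_{(17)}^{.025} = 2.11$.

a) $\mathbf{c}^T (X^T X)^{-1} \mathbf{c} = 0.129$ and $\mathbf{c}^T \hat{\boldsymbol{\beta}} - b = -1.07$, so

$$t = \frac{-1.07}{\sqrt{(10.5)(.129)}} = -0.919$$

Do not reject $H_o$.

b)

$$t = \frac{1.94}{\sqrt{(10.5)(10.1)}} \approx 0.19$$

Do not reject $H_o$.

c)

$$t = \frac{2.01}{\sqrt{(0.008)(10.5)}} = 6.9$$

Reject $H_o$.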
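The $t$ statistics for tests (a) and (c) can be recomputed from the two quantities printed by SAS, the numerator $\mathbf{c}^T \hat{\boldsymbol{\beta}} - b$ and the factor $\mathbf{c}^T (X^T X)^{-1} \mathbf{c}$. The sketch below uses the numbers quoted in this document and a hypothetical helper `t_statistic`; it is a check of the arithmetic, not a refit of the model.

```python
import math

sigma2_hat = 10.5   # estimate of sigma^2 quoted in the text
t_crit = 2.11       # upper 2.5% point of t with 17 degrees of freedom


def t_statistic(numerator, q):
    """t = (c^T beta_hat - b) / sqrt(sigma2_hat * c^T (X^T X)^{-1} c)."""
    return numerator / math.sqrt(sigma2_hat * q)


# (a) 2*beta_2 + 3*beta_3 - 4*beta_4 = 7
t_a = t_statistic(-1.07, 0.129)
# (c) beta_2 + beta_3 = 0
t_c = t_statistic(2.01, 0.008)

for label, t in (("a", t_a), ("c", t_c)):
    decision = "reject H_o" if abs(t) > t_crit else "do not reject H_o"
    print(label, round(t, 3), decision)
```

Both tests are two-sided, so the observed $|t|$ is compared with $t_{(17)}^{.025}$.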

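A trick to do (c) without using the TEST command.

Fit the model

$$y = \beta_1 + (\beta_2 + \beta_3)\, \text{AIR} + \beta_3\, (\text{TEMP} - \text{AIR}) + \beta_4\, \text{ACID} + e \qquad \text{(SAME model)}$$

This gives the s.e. for $\hat{\beta}_2 + \hat{\beta}_3$.

Note: In general the TEST statement can be used to obtain $\mathbf{c}^T \hat{\boldsymbol{\beta}}$ and $\widehat{Var}(\mathbf{c}^T \hat{\boldsymbol{\beta}})$. These, for example, can be used to obtain confidence intervals for $\mathbf{c}^T \boldsymbol{\beta}$.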
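A short check of why the reparameterized fit is the same model: expanding the AIR and TEMP terms recovers the original parameterization, so the coefficient reported for AIR in the new fit is $\beta_2 + \beta_3$ and its printed standard error is the one needed for test (c).

```latex
\begin{aligned}
(\beta_2 + \beta_3)\,\text{AIR} + \beta_3\,(\text{TEMP} - \text{AIR})
  &= \beta_2\,\text{AIR} + \beta_3\,\text{AIR} + \beta_3\,\text{TEMP} - \beta_3\,\text{AIR} \\
  &= \beta_2\,\text{AIR} + \beta_3\,\text{TEMP}
\end{aligned}
```

The new carrier TEMP − AIR has to be created in the data step before the model is fitted.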