
### Transcript of Regression Estimation – Least Squares and Maximum Likelihood

• Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 1

Regression Estimation: Least Squares and Maximum Likelihood

Dr. Frank Wood

Least Squares Max(min)imization

Function to minimize w.r.t. $\beta_0$, $\beta_1$:

$$Q = \sum_{i=1}^{n} \left( Y_i - (\beta_0 + \beta_1 X_i) \right)^2$$

Minimize $Q$ (equivalently, maximize $-Q$): find the partial derivatives with respect to $\beta_0$ and $\beta_1$ and set both equal to zero (go to board).

Normal Equations

The results of this minimization step are called the normal equations; $b_0$ and $b_1$ are called point estimators of $\beta_0$ and $\beta_1$ respectively:

$$\sum Y_i = n b_0 + b_1 \sum X_i$$

$$\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$$

This is a system of two equations in two unknowns (write these on board); its solution is given on the next slide.

Solution to Normal Equations

After a lot of algebra one arrives at

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

$$b_0 = \bar{Y} - b_1 \bar{X}$$

where

$$\bar{X} = \frac{\sum X_i}{n}, \qquad \bar{Y} = \frac{\sum Y_i}{n}$$
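These formulas can be checked directly; a minimal sketch in Python with synthetic data (the data-generating line y = 2x + 9 mirrors the example on the following slides, but the sample itself is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.uniform(1, 11, n)
# Synthetic data from the "true" line y = 2x + 9 with Gaussian noise
Y = 2 * X + 9 + rng.normal(0, 2, n)

Xbar, Ybar = X.mean(), Y.mean()
b1 = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)
b0 = Ybar - b1 * Xbar

# The estimates satisfy both normal equations
assert np.isclose(Y.sum(), n * b0 + b1 * X.sum())
assert np.isclose((X * Y).sum(), b0 * X.sum() + b1 * (X ** 2).sum())
```

With this sample size, `b1` and `b0` land close to the true slope 2 and intercept 9.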

Least Squares Fit

[Figure: data with two lines over Predictor/Input (x) vs. Response/Output (y). Estimate: y = 2.09x + 8.36, MSE 4.15. True: y = 2x + 9, MSE 4.22.]

Guess #1

[Figure: same data. Guess: y = 0x + 21.2, MSE 37.1. True: y = 2x + 9, MSE 4.22.]

Guess #2

[Figure: same data. Guess: y = 1.5x + 13, MSE 7.84. True: y = 2x + 9, MSE 4.22.]

The solution to this equation is the solution to least squares linear regression (and to maximum likelihood under the normal error distribution assumption):

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} X_1 & 1 \\ X_2 & 1 \\ \vdots & \vdots \\ X_n & 1 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_0 \end{bmatrix}$$
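In this matrix form the fit can be computed with a standard least-squares solver; a sketch assuming NumPy, with the same [X_i, 1] column ordering as the matrix above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.uniform(1, 11, n)
Y = 2 * X + 9 + rng.normal(0, 2, n)

# Design matrix with columns [X_i, 1], so the solution vector is [b1, b0]
A = np.column_stack([X, np.ones(n)])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
b1, b0 = beta
```

`np.linalg.lstsq` solves the same normal equations internally (via SVD), so `b1` and `b0` agree with the closed-form estimators.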

• Is the relationship really linear?

• What is the distribution of the errors?

• Is the fit good?

• How much of the variability of the response is accounted for by including the predictor variable?

• Is the chosen predictor variable the best one?

Is This Better?

[Figure: same data fit with a 7th-order polynomial, MSE 3.18.]

Goals for First Half of Course

• How to do linear regression

• Self-familiarization with software tools

• How to interpret standard linear regression results

• How to derive tests

• How to assess and address deficiencies in regression models

Properties of Solution

The $i$th residual is defined to be

$$e_i = Y_i - \hat{Y}_i$$

The sum of the residuals is zero:

$$\sum_i e_i = \sum_i (Y_i - b_0 - b_1 X_i) = \sum_i Y_i - n b_0 - b_1 \sum_i X_i = 0$$

by the first normal equation.

Properties of Solution

The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:

$$\begin{aligned}
\sum_i \hat{Y}_i &= \sum_i (b_1 X_i + b_0) \\
&= \sum_i (b_1 X_i + \bar{Y} - b_1 \bar{X}) \\
&= b_1 \sum_i X_i + n\bar{Y} - b_1 n \bar{X} \\
&= b_1 n \bar{X} + \sum_i Y_i - b_1 n \bar{X} \\
&= \sum_i Y_i
\end{aligned}$$

Properties of Solution

The sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the level of the predictor variable in the $i$th trial:

$$\sum_i X_i e_i = \sum_i X_i (Y_i - b_0 - b_1 X_i) = \sum_i X_i Y_i - b_0 \sum_i X_i - b_1 \sum_i X_i^2 = 0$$

by the second normal equation.

Properties of Solution

The sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the fitted value of the response variable for the $i$th trial:

$$\sum_i \hat{Y}_i e_i = \sum_i (b_0 + b_1 X_i) e_i = b_0 \sum_i e_i + b_1 \sum_i e_i X_i = 0$$

by the previous two properties.
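All of these properties follow from the normal equations and can be verified numerically; a small sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = rng.uniform(1, 11, n)
Y = 2 * X + 9 + rng.normal(0, 2, n)

# Closed-form least squares fit
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X
e = Y - Yhat  # residuals

assert abs(e.sum()) < 1e-8               # residuals sum to zero
assert abs((X * e).sum()) < 1e-8         # X-weighted residuals sum to zero
assert abs((Yhat * e).sum()) < 1e-8      # fitted-value-weighted residuals sum to zero
assert abs(Y.sum() - Yhat.sum()) < 1e-8  # observed and fitted sums agree
```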

Properties of Solution

The regression line always goes through the point $(\bar{X}, \bar{Y})$.

Estimating Error Term Variance

• Review estimation in a non-regression setting.

• Show estimation results for the regression setting.

Estimation Review

An estimator is a rule that tells how to calculate the value of an estimate based on the measurements contained in a sample, e.g. the sample mean

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i$$

Point Estimators and Bias

Point estimator of an unknown quantity / parameter $\theta$:

$$\hat{\theta} = f(\{Y_1, \ldots, Y_n\})$$

Definition: bias of an estimator

$$B(\hat{\theta}) = E(\hat{\theta}) - \theta$$

One Sample Example

[Figure: samples drawn from a Normal distribution with μ = 5, σ = 0.75, shown against the density, with the sample-mean estimate marked.]

run bias_example_plot.m

Distribution of Estimator

If the estimator is a function of the samples and the distribution of the samples is known, then the distribution of the estimator can (often) be determined.

Methods:

• Distribution (CDF) functions

• Transformations

• Moment generating functions

• Jacobians (change of variable)

Example

Samples from a Normal($\mu$, $\sigma^2$) distribution; estimate the population mean:

$$Y_i \sim \mathrm{Normal}(\mu, \sigma^2), \qquad \theta = \mu, \qquad \hat{\theta} = \bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i$$

Sampling Distribution of the Estimator

First moment:

$$E(\hat{\theta}) = E\left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = \frac{1}{n}\sum_{i=1}^n E(Y_i) = \frac{n\mu}{n} = \mu$$

$$B(\hat{\theta}) = E(\hat{\theta}) - \mu = 0$$

This is an example of an unbiased estimator.

Variance of Estimator

Definition: variance of an estimator

$$V(\hat{\theta}) = E([\hat{\theta} - E(\hat{\theta})]^2)$$

Remember:

$$V(cY) = c^2 V(Y)$$

$$V\left(\sum_{i=1}^n Y_i\right) = \sum_{i=1}^n V(Y_i)$$

where the latter holds only if the $Y_i$ are independent with finite variance.

Example Estimator Variance

For the sample-mean estimator (note the independence assumption):

$$V(\hat{\theta}) = V\left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = \frac{1}{n^2}\sum_{i=1}^n V(Y_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
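Both moments of the sample mean can be confirmed by simulation; a sketch using the μ = 5, σ = 0.75 values from the earlier example (sample size and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 0.75, 20, 200_000

# Draw many independent samples of size n; compute each sample's mean
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# First moment is close to mu; variance is close to sigma^2 / n
assert abs(means.mean() - mu) < 0.01
assert abs(means.var() - sigma**2 / n) < 0.001
```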

Distribution of Sample Mean Estimator

[Figure: histogram of 1000 sample-mean estimates, shown against the density.]

The mean squared error of an estimator

$$MSE(\hat{\theta}) = E([\hat{\theta} - \theta]^2)$$

can be re-expressed as

$$MSE(\hat{\theta}) = V(\hat{\theta}) + (B(\hat{\theta}))^2$$

MSE = VAR + BIAS²

Proof:

$$\begin{aligned}
MSE(\hat{\theta}) &= E((\hat{\theta} - \theta)^2) \\
&= E(([\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta])^2) \\
&= E([\hat{\theta} - E(\hat{\theta})]^2) + 2E([E(\hat{\theta}) - \theta][\hat{\theta} - E(\hat{\theta})]) + E([E(\hat{\theta}) - \theta]^2) \\
&= V(\hat{\theta}) + 2[E(\hat{\theta}) - \theta]\,E(\hat{\theta} - E(\hat{\theta})) + (B(\hat{\theta}))^2 \\
&= V(\hat{\theta}) + 2[E(\hat{\theta}) - \theta] \cdot 0 + (B(\hat{\theta}))^2 \\
&= V(\hat{\theta}) + (B(\hat{\theta}))^2
\end{aligned}$$

The cross term vanishes because $E(\hat{\theta}) - \theta$ is a constant and $E(\hat{\theta} - E(\hat{\theta})) = 0$.

Think of variance as confidence and bias as correctness; these intuitions (largely) apply.

Sometimes a biased estimator can produce a lower MSE if it lowers the variance.
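A sketch of that last point, using a simple shrinkage estimator of a normal mean (the 0.4 shrinkage factor and the parameter values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 1.0, 3.0, 5, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
unbiased = samples.mean(axis=1)   # sample mean: unbiased, variance sigma^2/n = 1.8
shrunk = 0.4 * unbiased           # shrink toward 0: biased, but much lower variance

mse_unbiased = np.mean((unbiased - mu) ** 2)
mse_shrunk = np.mean((shrunk - mu) ** 2)

# MSE = Var + Bias^2 holds exactly for the sample moments
assert np.isclose(mse_shrunk, shrunk.var() + (shrunk.mean() - mu) ** 2)
# Here the biased shrinkage estimator attains the lower MSE
assert mse_shrunk < mse_unbiased
```

Whether shrinkage helps depends on how large the bias it introduces is relative to the variance it removes; with μ far from 0 the same factor would hurt.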

Estimating Error Term Variance

Regression model:

• The variance $\sigma^2$ of each observation $Y_i$ is the same as the variance of the error term $\epsilon_i$.

• Each $Y_i$ comes from a different probability distribution, with a mean that depends on the level $X_i$.

• The deviation of an observation $Y_i$ must therefore be calculated around its own estimated mean.

$s^2$ Estimator for $\sigma^2$

MSE is an unbiased estimator of $\sigma^2$.

The sum of squares SSE has $n-2$ degrees of freedom.
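The transcript cuts off here, but the stated property can be checked numerically: dividing SSE by its n − 2 degrees of freedom gives an estimate that averages to σ². A sketch with synthetic data (the simulation setup is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, n, reps = 2.0, 15, 20_000
X = np.linspace(1, 11, n)

s2_vals = np.empty(reps)
for r in range(reps):
    Y = 2 * X + 9 + rng.normal(0, sigma, n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    # SSE: deviations of each Y_i around its own fitted mean
    sse = np.sum((Y - (b0 + b1 * X)) ** 2)
    s2_vals[r] = sse / (n - 2)  # divide by the n-2 degrees of freedom

# Averaged over replications, s^2 is close to sigma^2 = 4
```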