Regression Estimation – Least Squares and Maximum Likelihood

Frank Wood, [email protected] Linear Regression Models Lecture 3, Slide 1 Regression Estimation – Least Squares and Maximum Likelihood Dr. Frank Wood


Least Squares Max(min)imization

• Function to minimize w.r.t. β0 and β1:

• Minimize this by maximizing –Q

• Find partials and set both equal to zero

Q = \sum_{i=1}^n (Y_i - (\beta_0 + \beta_1 X_i))^2

go to board


Normal Equations

• The results of this maximization step are called the normal equations. b0 and b1 are called point estimators of β0 and β1 respectively.

• This is a system of two equations and two unknowns. The solution is given by…

\sum Y_i = n b_0 + b_1 \sum X_i

\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2

Write these on board
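Not from the slides, but as a concrete sketch: the two normal equations form a 2×2 linear system in b0 and b1, which NumPy can solve directly. The data here are made up for illustration.

```python
import numpy as np

# Made-up example data (not from the lecture).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([11.2, 12.9, 15.1, 17.2, 18.8])
n = len(X)

# Normal equations:
#   sum(Y)   = n*b0      + b1*sum(X)
#   sum(X*Y) = b0*sum(X) + b1*sum(X^2)
A = np.array([[n,       X.sum()],
              [X.sum(), (X**2).sum()]])
c = np.array([Y.sum(), (X * Y).sum()])

b0, b1 = np.linalg.solve(A, c)
print(b0, b1)
```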


Solution to Normal Equations

• After a lot of algebra one arrives at

b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}

b_0 = \bar{Y} - b_1 \bar{X}

\bar{X} = \frac{\sum X_i}{n}, \quad \bar{Y} = \frac{\sum Y_i}{n}
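Continuing the sketch above (same made-up data), the closed-form expressions give the same answer as solving the normal equations directly:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([11.2, 12.9, 15.1, 17.2, 18.8])

Xbar, Ybar = X.mean(), Y.mean()

# b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
b1 = ((X - Xbar) * (Y - Ybar)).sum() / ((X - Xbar) ** 2).sum()
# b0 = Ybar - b1 * Xbar
b0 = Ybar - b1 * Xbar
print(b0, b1)  # matches the normal-equation solution above
```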


Least Squares Fit

[Figure: simulated data, Predictor/Input on the x-axis and Response/Output on the y-axis, with two lines. Estimate: y = 2.09x + 8.36, mse: 4.15. True: y = 2x + 9, mse: 4.22.]
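A hedged sketch of how such a figure could be produced: simulate data from the true line y = 2x + 9 (the noise level σ = 2 is an assumption, not stated on the slide) and fit by least squares. The fitted line can have lower MSE on the sample than the true line, exactly as on the slide, because least squares minimizes MSE over the observed data.

```python
import numpy as np

rng = np.random.default_rng(0)

# True line from the slide: y = 2x + 9. The noise level sigma = 2
# is an assumption for illustration.
x = np.arange(1, 12, dtype=float)
y = 2 * x + 9 + rng.normal(0, 2, size=x.shape)

# Least squares fit.
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

mse_fit = ((y - (b0 + b1 * x)) ** 2).mean()
mse_true = ((y - (2 * x + 9)) ** 2).mean()
print(mse_fit, mse_true)  # the fit's MSE is never larger than the true line's
```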


Guess #1

[Figure: the same data with a guessed line. Guess: y = 0x + 21.2, mse: 37.1. True: y = 2x + 9, mse: 4.22.]


Guess #2

[Figure: the same data with a second guessed line. Guess: y = 1.5x + 13, mse: 7.84. True: y = 2x + 9, mse: 4.22.]


Looking Ahead: Matrix Least Squares

• The solution to this equation is the solution to least squares linear regression (and to maximum likelihood under the normal error distribution assumption)

\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} X_1 & 1 \\ X_2 & 1 \\ \vdots & \vdots \\ X_n & 1 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_0 \end{bmatrix}
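A minimal sketch of the matrix formulation, assuming NumPy's np.linalg.lstsq as the solver and keeping the slide's column ordering [X, 1], so the solution vector is [b1, b0]:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 12, dtype=float)
y = 2 * x + 9 + rng.normal(0, 2, size=x.shape)  # assumed noise level

# Design matrix with columns [X, 1], matching the slide's ordering.
A = np.column_stack([x, np.ones_like(x)])
(b1, b0), *_ = np.linalg.lstsq(A, y, rcond=None)
print(b1, b0)
```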


Questions to Ask

• Is the relationship really linear?

• What is the distribution of the “errors”?

• Is the fit good?

• How much of the variability of the response is accounted for by including the predictor variable?

• Is the chosen predictor variable the best one?


Is This Better?

[Figure: the same data fit with a 7th-order polynomial. 7th order, mse: 3.18.]
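A sketch of the comparison, assuming np.polyfit and the same simulated setup as above: a 7th-order polynomial always achieves a lower training MSE than the line, which is why training error alone cannot answer the slide's question.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 12, dtype=float)
y = 2 * x + 9 + rng.normal(0, 2, size=x.shape)  # assumed noise level

for degree in (1, 7):
    coeffs = np.polyfit(x, y, degree)      # least squares polynomial fit
    mse = ((y - np.polyval(coeffs, x)) ** 2).mean()
    print(degree, mse)  # the degree-7 fit has lower *training* MSE
```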


Goals for First Half of Course

• How to do linear regression

– Self-familiarization with software tools

• How to interpret standard linear regression results

• How to derive tests

• How to assess and address deficiencies in regression models


Properties of Solution

• The ith residual is defined to be

e_i = Y_i - \hat{Y}_i

• The sum of the residuals is zero:

\sum_i e_i = \sum_i (Y_i - b_0 - b_1 X_i) = \sum_i Y_i - n b_0 - b_1 \sum_i X_i = 0

by the first normal equation.


Properties of Solution

• The sum of the observed values Y_i equals the sum of the fitted values \hat{Y}_i:

\sum_i \hat{Y}_i = \sum_i (b_1 X_i + b_0)
= \sum_i (b_1 X_i + \bar{Y} - b_1 \bar{X})
= b_1 \sum_i X_i + n\bar{Y} - b_1 n \bar{X}
= b_1 n \bar{X} + \sum_i Y_i - b_1 n \bar{X}
= \sum_i Y_i


Properties of Solution

• The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the level of the predictor variable in the ith trial

\sum_i X_i e_i = \sum_i X_i (Y_i - b_0 - b_1 X_i)
= \sum_i X_i Y_i - b_0 \sum_i X_i - b_1 \sum_i X_i^2
= 0

by the second normal equation.


Properties of Solution

• The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the fitted value of the response variable for the ith trial

\sum_i \hat{Y}_i e_i = \sum_i (b_0 + b_1 X_i) e_i
= b_0 \sum_i e_i + b_1 \sum_i e_i X_i
= 0

by the previous two properties.


Properties of Solution

• The regression line always goes through the point (\bar{X}, \bar{Y}).
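All of these properties are easy to verify numerically. A sketch on simulated data (same assumptions as before):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 12, dtype=float)
y = 2 * x + 9 + rng.normal(0, 2, size=x.shape)

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
e = y - yhat

print(np.isclose(e.sum(), 0))            # residuals sum to zero
print(np.isclose(y.sum(), yhat.sum()))   # observed and fitted sums agree
print(np.isclose((x * e).sum(), 0))      # X-weighted residuals sum to zero
print(np.isclose((yhat * e).sum(), 0))   # fitted-value-weighted residuals too
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # line through (Xbar, Ybar)
```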


Estimating Error Term Variance σ²

• Review estimation in non-regression setting.

• Show estimation results for regression setting.


Estimation Review

• An estimator is a rule that tells how to calculate the value of an estimate based on the measurements contained in a sample

• e.g., the sample mean

\bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i


Point Estimators and Bias

• Point estimator: \hat{\theta} = f(\{Y_1, \ldots, Y_n\})

• Unknown quantity / parameter: \theta

• Definition: bias of an estimator

B(\hat{\theta}) = E(\hat{\theta}) - \theta


One Sample Example

[Figure: sampling density with µ = 5, σ = 0.75; the samples, the true θ, and the estimated θ are marked.]

run bias_example_plot.m


Distribution of Estimator

• If the estimator is a function of the samples and the distribution of the samples is known then the distribution of the estimator can (often) be determined

– Methods

• Distribution (CDF) functions

• Transformations

• Moment generating functions

• Jacobians (change of variable)


Example

• Samples from a Normal(µ, σ²) distribution

• Estimate the population mean

Y_i \sim \mathcal{N}(\mu, \sigma^2)

\theta = \mu, \quad \hat{\theta} = \bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i


Sampling Distribution of the Estimator

• First moment

• This is an example of an unbiased estimator

E(\hat{\theta}) = E\left(\frac{1}{n} \sum_{i=1}^n Y_i\right) = \frac{1}{n} \sum_{i=1}^n E(Y_i) = \frac{n\mu}{n} = \theta

B(\hat{\theta}) = E(\hat{\theta}) - \theta = 0


Variance of Estimator

• Definition: Variance of estimator

• Remember:

V(\hat{\theta}) = E([\hat{\theta} - E(\hat{\theta})]^2)

V(cY) = c^2 V(Y)

V\left(\sum_{i=1}^n Y_i\right) = \sum_{i=1}^n V(Y_i)

(the last identity holds only if the Y_i are independent with finite variance)


Example Estimator Variance

• For the N(µ, σ²) mean estimator

• Note assumptions

V(\hat{\theta}) = V\left(\frac{1}{n} \sum_{i=1}^n Y_i\right) = \frac{1}{n^2} \sum_{i=1}^n V(Y_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
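A Monte Carlo sketch of both results, reusing the µ = 5, σ = 0.75 values from the earlier example figure: the sample mean is unbiased and its variance shrinks as σ²/n.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 0.75, 20, 100_000

# Draw many independent samples of size n and compute the mean of each.
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(means.mean())    # ~ mu            (unbiasedness)
print(means.var())     # ~ sigma**2 / n  (variance of the estimator)
print(sigma**2 / n)
```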


Distribution of sample mean estimator

[Figure: histogram of the sample mean estimator over 1000 samples.]


Bias Variance Trade-off

• The mean squared error of an estimator

• Can be re-expressed

MSE(\hat{\theta}) = E([\hat{\theta} - \theta]^2)

MSE(\hat{\theta}) = V(\hat{\theta}) + (B(\hat{\theta}))^2


MSE = VAR + BIAS²

• Proof

MSE(\hat{\theta}) = E((\hat{\theta} - \theta)^2)
= E(([\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta])^2)
= E([\hat{\theta} - E(\hat{\theta})]^2) + 2E([E(\hat{\theta}) - \theta][\hat{\theta} - E(\hat{\theta})]) + E([E(\hat{\theta}) - \theta]^2)
= V(\hat{\theta}) + 2E(E(\hat{\theta})[\hat{\theta} - E(\hat{\theta})] - \theta[\hat{\theta} - E(\hat{\theta})]) + (B(\hat{\theta}))^2
= V(\hat{\theta}) + 2(0 + 0) + (B(\hat{\theta}))^2
= V(\hat{\theta}) + (B(\hat{\theta}))^2

(the cross terms vanish because E(\hat{\theta}) and \theta are constants and E[\hat{\theta} - E(\hat{\theta})] = 0)


Trade-off

• Think of variance as confidence and bias as correctness.

– Intuitions (largely) apply

• Sometimes a biased estimator can produce lower MSE if it lowers the variance.
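A classic instance, sketched numerically: for normal samples, the variance estimator that divides by n (the ML estimator, seen later) is biased low yet has lower MSE than the unbiased divide-by-(n − 1) version, because its variance is smaller.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 1.0, 5, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n-1: unbiased
s2_ml = samples.var(axis=1, ddof=0)         # divide by n: biased low

print(((s2_unbiased - sigma2) ** 2).mean())  # larger MSE
print(((s2_ml - sigma2) ** 2).mean())        # smaller MSE despite the bias
```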


Estimating Error Term Variance σ²

• Regression model

• The variance of each observation Yi is σ² (the same as for the error term εi)

• Each Yi comes from a different probability distribution with different means that depend on the level Xi

• The deviation of an observation Yi must be calculated around its own estimated mean.


s² Estimator for σ²

• MSE is an unbiased estimator of σ² (here MSE denotes the error mean square of the regression, not the estimator MSE defined earlier)

• The sum of squares SSE has n-2 degrees of freedom associated with it.

s^2 = MSE = \frac{SSE}{n-2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum e_i^2}{n-2}

E(MSE) = \sigma^2
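A simulation sketch of the unbiasedness claim, under the same assumed setup as earlier (true line y = 2x + 9, σ = 2): averaging s² = SSE/(n − 2) over many simulated data sets recovers σ² ≈ 4.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 12, dtype=float)
sigma = 2.0  # assumed noise level
reps, s2_draws = 10_000, []

for _ in range(reps):
    y = 2 * x + 9 + rng.normal(0, sigma, size=x.shape)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    e = y - (b0 + b1 * x)
    s2_draws.append((e ** 2).sum() / (len(x) - 2))  # SSE / (n - 2)

print(np.mean(s2_draws), sigma**2)  # average s^2 is close to sigma^2
```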


Normal Error Regression Model

• No matter how the error terms εi are distributed, the least squares method provides unbiased point estimators of β0 and β1 that also have minimum variance among all unbiased linear estimators.

• To set up interval estimates and make tests we need to specify the distribution of the εi.

• We will assume that the εi are normally distributed.


Normal Error Regression Model

Y_i = \beta_0 + \beta_1 X_i + \epsilon_i

• Yi is the value of the response variable in the ith trial

• β0 and β1 are parameters

• Xi is a known constant, the value of the predictor variable in the ith trial

• εi ~iid N(0, σ²)

• i = 1, …, n


Notational Convention

• When you see εi ~iid N(0, σ²)

• It is read as: εi is distributed independently and identically according to a normal distribution with mean 0 and variance σ²

• Examples

– θ ~ Poisson(λ)

– z ~ G(θ)


Maximum Likelihood Principle

• The method of maximum likelihood chooses as estimates those values of the parameters that are most consistent with the sample data.


Likelihood Function

• If

X_i \sim F(\Theta), \quad i = 1, \ldots, n

then the likelihood function is

L(\{X_i\}_{i=1}^n, \Theta) = \prod_{i=1}^n F(X_i; \Theta)


Example, N(10,3) Density, Single Obs.

[Figure: N(10, 3) density over the sample space with a single observation marked. N=10, negative log likelihood = 4.3038.]


Example, N(10,3) Density, Single Obs. Again

[Figure: the same N(10, 3) density with a single observation marked. N=10, negative log likelihood = 4.3038.]


Example, N(10,3) Density, Multiple Obs.

[Figure: N(10, 3) density with multiple observations marked. N=10, negative log likelihood = 36.2204.]


Maximum Likelihood Estimation

• The likelihood function can be maximized w.r.t. the parameter(s) Θ; doing this, one arrives at estimators for the parameters.

• To do this, find solutions to (analytically or by following the gradient)

L(\{X_i\}_{i=1}^n, \Theta) = \prod_{i=1}^n F(X_i; \Theta)

\frac{dL(\{X_i\}_{i=1}^n, \Theta)}{d\Theta} = 0


Important Trick

• (Almost) never maximize the likelihood function directly; maximize the log likelihood function instead.

Quite often the log of the density is easier to work with mathematically.

\log(L(\{X_i\}_{i=1}^n, \Theta)) = \log\left(\prod_{i=1}^n F(X_i; \Theta)\right) = \sum_{i=1}^n \log(F(X_i; \Theta))
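There is also a numerical reason for the trick: the product of many densities underflows in floating point, while the sum of log densities does not. A sketch using scipy.stats, assuming the "3" in N(10, 3) is the variance:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(10, np.sqrt(3), size=2000)

# The product of many densities underflows to 0.0 in floating point...
print(np.prod(norm.pdf(x, loc=10, scale=np.sqrt(3))))   # 0.0

# ...while the log likelihood is perfectly well behaved.
print(norm.logpdf(x, loc=10, scale=np.sqrt(3)).sum())
```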


ML Normal Regression

• Likelihood function

L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^n \frac{1}{(2\pi\sigma^2)^{1/2}} e^{-\frac{1}{2\sigma^2}(Y_i - \beta_0 - \beta_1 X_i)^2} = \frac{1}{(2\pi\sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2}

which if you maximize (how?) w.r.t. the parameters you get…
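One way to answer the "how?": maximize the log likelihood numerically. A sketch with scipy.optimize.minimize on simulated data (the data-generating settings are assumptions); the resulting b0 and b1 closely match the least squares estimates, as the next slide states.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.arange(1, 12, dtype=float)
y = 2 * x + 9 + rng.normal(0, 2, size=x.shape)
n = len(x)

def nll(params):
    """Negative log likelihood of the normal error regression model."""
    b0, b1, log_s2 = params        # optimize log(sigma^2) to keep it positive
    s2 = np.exp(log_s2)
    resid = y - b0 - b1 * x
    return 0.5 * n * np.log(2 * np.pi * s2) + (resid ** 2).sum() / (2 * s2)

res = minimize(nll, x0=np.array([0.0, 0.0, 0.0]))
b0_ml, b1_ml, s2_ml = res.x[0], res.x[1], np.exp(res.x[2])
print(b0_ml, b1_ml, s2_ml)  # b0, b1 agree with the least squares estimates
```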


Maximum Likelihood Estimator(s)

• β0: b0, same as in the least squares case

• β1: b1, same as in the least squares case

• σ²:

\hat{\sigma}^2 = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{n}

• Note that the ML estimator is biased, whereas s² is unbiased, and

s^2 = MSE = \frac{n}{n-2} \hat{\sigma}^2
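A simulation sketch of the bias (same assumed setup as before, σ² = 4): the ML estimate SSE/n is systematically low, and rescaling by n/(n − 2) removes the bias.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 12, dtype=float)
n, sigma = len(x), 2.0  # assumed noise level
reps, ml_draws = 20_000, []

for _ in range(reps):
    y = 2 * x + 9 + rng.normal(0, sigma, size=x.shape)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    ml_draws.append(((y - b0 - b1 * x) ** 2).sum() / n)  # ML estimate: SSE/n

ml_mean = np.mean(ml_draws)
print(ml_mean)                # below sigma^2 = 4: the ML estimator is biased
print(ml_mean * n / (n - 2))  # rescaling by n/(n-2) recovers ~sigma^2
```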


Comments

• Least squares minimizes the squared error between the prediction and the true output

• The normal distribution is fully characterized by its first two central moments (mean and variance)

• Food for thought:

– What does the bias in the ML estimator of the error variance mean? And where does it come from?