Regression Estimation – Least Squares and Maximum Likelihood
Dr. Frank Wood, fwood@stat.columbia.edu
Linear Regression Models, Lecture 3
Least Squares Max(min)imization
Function to minimize w.r.t. β0, β1:
Q = \sum_{i=1}^{n} \left(Y_i - (\beta_0 + \beta_1 X_i)\right)^2
Minimize this by maximizing −Q
Find partials and set both equal to zero
go to board
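A sketch of the step deferred to the board: setting both partial derivatives of Q to zero gives

\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n}\left(Y_i - \beta_0 - \beta_1 X_i\right) = 0
\frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} X_i\left(Y_i - \beta_0 - \beta_1 X_i\right) = 0

Dividing by −2 and rearranging, with the minimizers written as b0 and b1, yields the normal equations on the next slide.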
Normal Equations
The results of this maximization step are called the normal equations. b0 and b1 are called point estimators of β0 and β1, respectively.
This is a system of two equations and two unknowns. The solution is given by
\sum Y_i = n b_0 + b_1 \sum X_i
\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2
Write these on board
Solution to Normal Equations
After a lot of algebra one arrives at

b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
b_0 = \bar{Y} - b_1 \bar{X}

where \bar{X} = \frac{\sum X_i}{n} and \bar{Y} = \frac{\sum Y_i}{n}.
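A minimal numerical sketch of these formulas in NumPy; the data are made up to mimic the plots that follow (true line y = 2x + 9 plus Gaussian noise), not the slides' actual data.

import numpy as np

# Illustrative data: true line y = 2x + 9 plus noise.
rng = np.random.default_rng(0)
X = np.arange(1.0, 11.0)
Y = 2.0 * X + 9.0 + rng.normal(0.0, 2.0, size=X.size)

# Closed-form least squares estimates from the normal equations.
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

The fitted line b0 + b1·X plays the role of the "Estimate" line in the plots that follow.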
Least Squares Fit
[Figure: scatter of Response/Output vs. Predictor/Input with two lines.
Estimate, y = 2.09x + 8.36, mse: 4.15
True, y = 2x + 9, mse: 4.22]
Guess #1
[Figure: same scatter with a constant guessed line.
Guess, y = 0x + 21.2, mse: 37.1
True, y = 2x + 9, mse: 4.22]
Guess #2
[Figure: same scatter with a second guessed line.
Guess, y = 1.5x + 13, mse: 7.84
True, y = 2x + 9, mse: 4.22]
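For comparison with the mse values in these plots, a small helper (reusing X, Y, b0, b1 from the sketch above) that computes the mean squared error of any candidate line:

def line_mse(intercept, slope, X, Y):
    """Mean squared error of the line intercept + slope * X against the observed Y."""
    return np.mean((Y - (intercept + slope * X)) ** 2)

# line_mse(21.2, 0.0, X, Y)   -> Guess #1
# line_mse(13.0, 1.5, X, Y)   -> Guess #2
# line_mse(b0, b1, X, Y)      -> least squares fit (smallest of the three)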
Looking Ahead: Matrix Least Squares
The solution to this equation is the solution to least squares linear regression (and to maximum likelihood under the normal error distribution assumption).
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix} X_1 & 1 \\ X_2 & 1 \\ \vdots & \vdots \\ X_n & 1 \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_0 \end{bmatrix}
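A matrix-form sketch of the same fit, reusing X and Y from the earlier snippet; np.linalg.lstsq solves the least squares problem for the design matrix directly.

# Design matrix: predictor column followed by a column of ones,
# matching the [beta1; beta0] ordering above.
design = np.column_stack([X, np.ones_like(X)])
beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
b1_mat, b0_mat = beta   # should agree with the closed-form b1, b0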
Questions to Ask
Is the relationship really linear?
What is the distribution of the errors?
Is the fit good?
How much of the variability of the response is accounted for by including the predictor variable?
Is the chosen predictor variable the best one?
Is This Better?
[Figure: same scatter with a 7th-order polynomial fit.
7th order, mse: 3.18]
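For reference, a sketch of how such a fit could be produced on the same illustrative data (np.polyfit with degree 7); the lower training mse here is a symptom of overfitting rather than a better model.

# Fit a degree-7 polynomial and compute its training MSE.
coeffs = np.polyfit(X, Y, deg=7)
mse_poly = np.mean((Y - np.polyval(coeffs, X)) ** 2)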
Goals for First Half of Course
How to do linear regression
Self-familiarization with software tools
How to interpret standard linear regression results
How to derive tests
How to assess and address deficiencies in regression models
Properties of Solution
The ith residual is defined to be e_i = Y_i - \hat{Y}_i.
The sum of the residuals is zero:
\sum_i e_i = \sum_i (Y_i - b_0 - b_1 X_i)
           = \sum_i Y_i - n b_0 - b_1 \sum_i X_i
           = 0
(by the first normal equation)
Properties of Solution
The sum of the observed values Y_i equals the sum of the fitted values \hat{Y}_i:
\sum_i \hat{Y}_i = \sum_i (b_1 X_i + b_0)
                 = \sum_i (b_1 X_i + \bar{Y} - b_1 \bar{X})
                 = b_1 \sum_i X_i + n\bar{Y} - b_1 n\bar{X}
                 = b_1 n\bar{X} + \sum_i Y_i - b_1 n\bar{X}
                 = \sum_i Y_i
Properties of Solution
The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the level of the predictor variable in the ith trial
\sum_i X_i e_i = \sum_i X_i (Y_i - b_0 - b_1 X_i)
               = \sum_i X_i Y_i - b_0 \sum_i X_i - b_1 \sum_i X_i^2
               = 0
(by the second normal equation)
Properties of Solution
The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the fitted value of the response variable for the ith trial
\sum_i \hat{Y}_i e_i = \sum_i (b_0 + b_1 X_i) e_i
                     = b_0 \sum_i e_i + b_1 \sum_i X_i e_i
                     = 0
(by the previous two properties)
Properties of Solution
The regression line always goes through the point (\bar{X}, \bar{Y}).
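A quick numerical check of the properties from the last few slides, reusing X, Y, b0, b1 from the sketch above; every printed quantity should be zero up to floating point error.

Y_hat = b0 + b1 * X
e = Y - Y_hat

print(np.sum(e))                        # sum of residuals ~ 0
print(np.sum(Y) - np.sum(Y_hat))        # observed and fitted sums agree
print(np.sum(X * e))                    # predictor-weighted residuals ~ 0
print(np.sum(Y_hat * e))                # fitted-value-weighted residuals ~ 0
print(Y.mean() - (b0 + b1 * X.mean()))  # line passes through (X bar, Y bar)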
Estimating Error Term Variance
Review estimation in non-regression setting.
Show estimation results for regression setting.
Estimation Review
An estimator is a rule that tells how to calculate the value of an estimate based on the measurements contained in a sample
e.g. the sample mean
\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i
Point Estimators and Bias
Point estimator: \hat{\theta} = f(\{Y_1, \ldots, Y_n\})
Unknown quantity / parameter: \theta
Definition: the bias of an estimator is B(\hat{\theta}) = E(\hat{\theta}) - \theta
One Sample Example
[Figure: samples drawn from a Normal(μ = 5, σ = 0.75) density, with the sample-mean estimate marked (legend: samples, est.)]
run bias_example_plot.m
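bias_example_plot.m itself is not reproduced here; a rough Python analogue of what the plot appears to show (one sample from a Normal(5, 0.75) density with the sample-mean estimate marked; the sample size is a guess) might be:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu, sigma, n = 5.0, 0.75, 25            # mu and sigma from the slide; n is a guess
rng = np.random.default_rng(1)
sample = rng.normal(mu, sigma, size=n)

grid = np.linspace(0.0, 10.0, 200)
plt.plot(grid, norm.pdf(grid, mu, sigma), label='true density')
plt.hist(sample, density=True, alpha=0.4, label='samples')
plt.axvline(sample.mean(), color='r', label='est.')
plt.legend()
plt.show()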
Distribution of Estimator
If the estimator is a function of the samples and the distribution of the samples is known then the distribution of the estimator can (often) be determined
Methods
Distribution (CDF) functions
Transformations
Moment generating functions
Jacobians (change of variable)
Example
Samples from a Normal(μ, σ²) distribution: Y_i \sim \mathrm{Normal}(\mu, \sigma^2)
Estimate the population mean:
\theta = \mu, \qquad \hat{\theta} = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i
Sampling Distribution of the Estimator
First moment
This is an example of an unbiased estimator
E(\hat{\theta}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{n\mu}{n} = \mu
B(\hat{\theta}) = E(\hat{\theta}) - \mu = 0
Variance of Estimator
Definition: Variance of estimator
Remember:
V(\hat{\theta}) = E\!\left([\hat{\theta} - E(\hat{\theta})]^2\right)
V(cY) = c^2 V(Y)
V\!\left(\sum_{i=1}^{n} Y_i\right) = \sum_{i=1}^{n} V(Y_i)
Only if the Yi are independent with finite variance
Example Estimator Variance
For the Normal(μ, σ²) mean estimator
Note assumptions
V(\hat{\theta}) = V\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(Y_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
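A Monte Carlo sanity check of this result (reusing numpy; the parameter values are illustrative):

mu, sigma, n, reps = 5.0, 0.75, 25, 10_000
rng = np.random.default_rng(2)
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(means.var(), sigma ** 2 / n)   # the two values should be close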
Distribution of sample mean estimator
[Figure: histogram of the sample mean estimator over 1000 repeated samples, concentrated around μ = 5]
Bias Variance Trade-off
The mean squared error of an estimator
Can be re-expressed
\mathrm{MSE}(\hat{\theta}) = E\!\left([\hat{\theta} - \theta]^2\right)
\mathrm{MSE}(\hat{\theta}) = V(\hat{\theta}) + \left(B(\hat{\theta})\right)^2
MSE = VAR + BIAS^2
Proof:
\mathrm{MSE}(\hat{\theta}) = E\!\left((\hat{\theta} - \theta)^2\right)
= E\!\left(\left([\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta]\right)^2\right)
= E\!\left([\hat{\theta} - E(\hat{\theta})]^2\right) + 2E\!\left([E(\hat{\theta}) - \theta][\hat{\theta} - E(\hat{\theta})]\right) + E\!\left([E(\hat{\theta}) - \theta]^2\right)
= V(\hat{\theta}) + 2[E(\hat{\theta}) - \theta]\,E\!\left(\hat{\theta} - E(\hat{\theta})\right) + \left(B(\hat{\theta})\right)^2
= V(\hat{\theta}) + 2[E(\hat{\theta}) - \theta]\cdot 0 + \left(B(\hat{\theta})\right)^2
= V(\hat{\theta}) + \left(B(\hat{\theta})\right)^2
The cross term vanishes because E(\hat{\theta}) - \theta is a constant and E(\hat{\theta} - E(\hat{\theta})) = 0.
Trade-off
Think of variance as confidence and bias as correctness.
Intuitions (largely) apply
Sometimes a biased estimator can produce lower MSE if it lowers the variance.
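A small illustration of that last point (not from the slides): a shrinkage estimator c·Ȳ with c < 1 is biased, but its variance drops by c², which can give a lower MSE than the plain sample mean.

mu, sigma, n, reps, c = 1.0, 3.0, 10, 100_000, 0.8
rng = np.random.default_rng(3)
ybar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)   # unbiased estimator
shrunk = c * ybar                                           # biased (shrinkage) estimator
print(np.mean((ybar - mu) ** 2))     # ~ sigma^2 / n = 0.9
print(np.mean((shrunk - mu) ** 2))   # ~ c^2 * 0.9 + (c - 1)^2 * mu^2 ≈ 0.62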
Estimating Error Term Variance
Regression model
Variance of each observation Yi is σ² (the same as for the error term εi)
Each Yi comes from a different probability distribution with different means that depend on the level Xi
The deviation of an observation Yi must be calculated around its own estimated mean.
s² estimator for σ²
MSE is an unbiased estimator of σ²
The sum of squares SSE has n-2 degrees of freedom
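On the regression sketch from earlier, this estimator is a couple of lines (a sketch; s² divides SSE by n − 2 because two parameters were estimated):

SSE = np.sum((Y - (b0 + b1 * X)) ** 2)
s2 = SSE / (X.size - 2)   # MSE = SSE / (n - 2), an unbiased estimate of sigma^2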