Regression Estimation – Least Squares and Maximum Likelihood


Transcript of Regression Estimation – Least Squares and Maximum Likelihood

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 1

Regression Estimation – Least Squares and Maximum Likelihood

    Dr. Frank Wood

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 2

    Least Squares Max(min)imization

    Function to minimize w.r.t. $\beta_0$, $\beta_1$

    Minimize this by maximizing $-Q$

    Find partials and set both equal to zero

    $$Q = \sum_{i=1}^{n} \left(Y_i - (\beta_0 + \beta_1 X_i)\right)^2$$

    go to board
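    As a concrete illustration (not from the slides), here is a minimal Python sketch of the criterion Q. The data-generating line y = 2x + 9 is borrowed from the example slides later in the lecture; the noise level, seed, and candidate parameter values are my assumptions.

    ```python
    import numpy as np

    def Q(b0, b1, X, Y):
        """Least squares criterion: sum of squared deviations from the line."""
        return np.sum((Y - (b0 + b1 * X)) ** 2)

    # Synthetic data from the true model used later in the lecture: y = 2x + 9 plus noise.
    rng = np.random.default_rng(0)
    X = np.arange(1, 12, dtype=float)
    Y = 2 * X + 9 + rng.normal(scale=2.0, size=X.size)

    # Q is smaller for parameter guesses closer to the least squares solution.
    print(Q(21.2, 0.0, X, Y))   # a flat line (cf. "Guess #1" later)
    print(Q(13.0, 1.5, X, Y))   # a better guess (cf. "Guess #2")
    print(Q(9.0, 2.0, X, Y))    # the true line
    ```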

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 3

    Normal Equations

    The result of this minimization step is called the normal equations; $b_0$ and $b_1$ are called point estimators of $\beta_0$ and $\beta_1$, respectively.

    $$\sum_i Y_i = n b_0 + b_1 \sum_i X_i$$

    $$\sum_i X_i Y_i = b_0 \sum_i X_i + b_1 \sum_i X_i^2$$

    This is a system of two equations in two unknowns; the solution is given on the next slide.

    Write these on board
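    As a hedged sketch of my own (the data set and seed are assumptions, not from the slides), the two normal equations can be written as a 2x2 linear system and solved numerically:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.arange(1, 12, dtype=float)
    Y = 2 * X + 9 + rng.normal(scale=2.0, size=X.size)
    n = X.size

    # Normal equations in matrix form:
    #   [ n        sum(X)   ] [b0]   [ sum(Y)   ]
    #   [ sum(X)   sum(X^2) ] [b1] = [ sum(X*Y) ]
    A = np.array([[n, X.sum()],
                  [X.sum(), (X ** 2).sum()]])
    rhs = np.array([Y.sum(), (X * Y).sum()])
    b0, b1 = np.linalg.solve(A, rhs)
    print(b0, b1)
    ```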

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 4

    Solution to Normal Equations

    After a lot of algebra one arrives at

    $$b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$

    where $\bar{X} = \frac{\sum_i X_i}{n}$ and $\bar{Y} = \frac{\sum_i Y_i}{n}$.
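    The same estimates via the closed-form expressions above, as a minimal Python sketch (the data and seed are my assumptions); comparing against a library fit such as np.polyfit is just a sanity check:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.arange(1, 12, dtype=float)
    Y = 2 * X + 9 + rng.normal(scale=2.0, size=X.size)

    Xbar, Ybar = X.mean(), Y.mean()
    b1 = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)
    b0 = Ybar - b1 * Xbar

    print(b1, b0)
    print(np.polyfit(X, Y, 1))  # returns [slope, intercept]; should agree
    ```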

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 5

    Least Squares Fit

    [Figure: Response/Output vs. Predictor/Input scatter with fitted and true lines. Estimate: y = 2.09x + 8.36, mse: 4.15. True: y = 2x + 9, mse: 4.22]

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 6

    Guess #1

    [Figure: Response/Output vs. Predictor/Input scatter. Guess: y = 0x + 21.2, mse: 37.1. True: y = 2x + 9, mse: 4.22]

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 7

    Guess #2

    [Figure: Response/Output vs. Predictor/Input scatter. Guess: y = 1.5x + 13, mse: 7.84. True: y = 2x + 9, mse: 4.22]

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 8

    Looking Ahead: Matrix Least Squares

    The solution to this equation is the least squares linear regression solution (and the maximum likelihood solution under the normal error distribution assumption)

    $$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
    =
    \begin{bmatrix} X_1 & 1 \\ X_2 & 1 \\ \vdots & \vdots \\ X_n & 1 \end{bmatrix}
    \begin{bmatrix} \beta_1 \\ \beta_0 \end{bmatrix}$$
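    In this matrix form the least squares solution can be obtained directly from the design matrix. A hedged sketch using NumPy's least squares solver (synthetic data are assumed; the column order [X, 1] matches the slide, so the solution vector is [b1, b0]):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.arange(1, 12, dtype=float)
    Y = 2 * X + 9 + rng.normal(scale=2.0, size=X.size)

    # Design matrix with a column for X and a column of ones, as on the slide.
    A = np.column_stack([X, np.ones_like(X)])
    coef, residuals, rank, sv = np.linalg.lstsq(A, Y, rcond=None)
    b1, b0 = coef  # same ordering as the [beta_1, beta_0] vector above
    print(b1, b0)
    ```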

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 9

    Questions to Ask

    Is the relationship really linear?

    What is the distribution of the errors?

    Is the fit good?

    How much of the variability of the response is accounted for by including the predictor variable?

    Is the chosen predictor variable the best one?

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 10

    Is This Better?

    [Figure: Response/Output vs. Predictor/Input scatter with a 7th-order polynomial fit, mse: 3.18]
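    A small sketch of my own (under an assumed data set) of the point this slide raises: a 7th-order polynomial drives the training MSE below that of the straight line, but a lower training error alone does not make it a better model.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.arange(1, 12, dtype=float)
    Y = 2 * X + 9 + rng.normal(scale=2.0, size=X.size)

    def train_mse(degree):
        coeffs = np.polyfit(X, Y, degree)   # least squares polynomial fit
        Yhat = np.polyval(coeffs, X)
        return np.mean((Y - Yhat) ** 2)

    print("linear fit:", train_mse(1))
    print("7th order :", train_mse(7))  # lower on the training data, likely overfit
    ```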

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 11

    Goals for First Half of Course

    How to do linear regression

    Self-familiarization with software tools

    How to interpret standard linear regression results

    How to derive tests

    How to assess and address deficiencies in regression models

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 12

    Properties of Solution

    The ith residual is defined to be

    The sum of the residuals is zero:

    $$e_i = Y_i - \hat{Y}_i$$

    $$\sum_i e_i = \sum_i (Y_i - b_0 - b_1 X_i) = \sum_i Y_i - n b_0 - b_1 \sum_i X_i = 0$$

    by the first normal equation.

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 13

    Properties of Solution

    The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:

    $$\sum_i \hat{Y}_i = \sum_i (b_1 X_i + b_0) = \sum_i (b_1 X_i + \bar{Y} - b_1 \bar{X}) = b_1 \sum_i X_i + n\bar{Y} - b_1 n \bar{X} = b_1 n \bar{X} + \sum_i Y_i - b_1 n \bar{X} = \sum_i Y_i$$

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 14

    Properties of Solution

    The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the level of the predictor variable in the ith trial

    i

    $$\sum_i X_i e_i = \sum_i X_i (Y_i - b_0 - b_1 X_i) = \sum_i X_i Y_i - b_0 \sum_i X_i - b_1 \sum_i X_i^2 = 0$$

    by the second normal equation.

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 15

    Properties of Solution

    The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the fitted value of the response variable for the ith trial

    i

    $$\sum_i \hat{Y}_i e_i = \sum_i (b_0 + b_1 X_i) e_i = b_0 \sum_i e_i + b_1 \sum_i e_i X_i = 0$$

    by the previous two properties.
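    A quick numeric check (my addition, on an assumed synthetic data set) of the three residual properties just derived:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.arange(1, 12, dtype=float)
    Y = 2 * X + 9 + rng.normal(scale=2.0, size=X.size)

    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    Yhat = b0 + b1 * X
    e = Y - Yhat

    # All three sums should be zero up to floating point error.
    print(np.sum(e))          # sum of residuals
    print(np.sum(X * e))      # residuals weighted by the predictor
    print(np.sum(Yhat * e))   # residuals weighted by the fitted values
    ```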

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 16

    Properties of Solution

    The regression line always goes through the point

    $(\bar{X}, \bar{Y})$

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 17

    Estimating Error Term Variance

    Review estimation in non-regression setting.

    Show estimation results for regression setting.

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 18

    Estimation Review

    An estimator is a rule that tells how to calculate the value of an estimate based on the measurements contained in a sample

    e.g., the sample mean

    $$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 19

    Point Estimators and Bias

    Point estimator: $\hat{\theta} = f(\{Y_1, \ldots, Y_n\})$

    Unknown quantity / parameter: $\theta$

    Definition: bias of the estimator, $B(\hat{\theta}) = E(\hat{\theta}) - \theta$

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 20

    One Sample Example

    [Figure: one-sample bias example with true mean $\mu = 5$; the samples and the resulting estimate ("est.") are marked on the density]

    run bias_example_plot.m
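    The MATLAB script mentioned above is not reproduced in this transcript; the following is only a rough Python stand-in I am assuming for the same idea: draw one small sample around $\mu = 5$ and mark the sample-mean estimate against the true density.

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    mu, sigma, n = 5.0, 1.0, 10          # assumed values; only mu = 5 appears on the slide
    samples = rng.normal(mu, sigma, n)
    estimate = samples.mean()

    xs = np.linspace(0, 10, 200)
    density = np.exp(-(xs - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    plt.plot(xs, density, label="true density, mu = 5")
    plt.plot(samples, np.zeros_like(samples), "x", label="samples")
    plt.axvline(estimate, linestyle="--", label="est. (sample mean)")
    plt.legend()
    plt.show()
    ```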

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 21

    Distribution of Estimator

    If the estimator is a function of the samples and the distribution of the samples is known then the distribution of the estimator can (often) be determined

    Methods

    Distribution (CDF) functions

    Transformations

    Moment generating functions

    Jacobians (change of variable)

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 22

    Example

    Samples from a Normal($\mu$, $\sigma^2$) distribution

    Estimate the population mean

    $$Y_i \sim \mathcal{N}(\mu, \sigma^2), \qquad \theta = \mu, \qquad \hat{\theta} = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 23

    Sampling Distribution of the Estimator

    First moment

    $$E(\hat{\theta}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{n\mu}{n} = \mu$$

    $$B(\hat{\theta}) = E(\hat{\theta}) - \mu = 0$$

    This is an example of an unbiased estimator.

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 24

    Variance of Estimator

    Definition: variance of the estimator

    $$V(\hat{\theta}) = E([\hat{\theta} - E(\hat{\theta})]^2)$$

    Remember:

    $$V(cY) = c^2 V(Y), \qquad V\!\left(\sum_{i=1}^{n} Y_i\right) = \sum_{i=1}^{n} V(Y_i)$$

    Only if the Yi are independent with finite variance

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 25

    Example Estimator Variance

    For the Normal($\mu$, $\sigma^2$) mean estimator

    Note assumptions

    $$V(\hat{\theta}) = V\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(Y_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 26

    Distribution of sample mean estimator

    [Figure: histogram of the sample mean estimator over 1000 samples]
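    A sketch of my own that simulates the sampling distribution of $\bar{Y}$ and numerically checks the unbiasedness and $\sigma^2/n$ variance results; the "1000 samples" replication count is from the slide, while $\mu = 5$, $\sigma = 1$, and $n = 10$ are assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 5.0, 1.0, 10, 1000   # mu, sigma, n assumed; 1000 replications as in the figure

    means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

    print("empirical mean of Ybar:", means.mean(), " (theory:", mu, ")")
    print("empirical var of Ybar :", means.var(ddof=1), " (theory:", sigma**2 / n, ")")
    ```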

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 27

    Bias Variance Trade-off

    The mean squared error of an estimator

    Can be re-expressed

    $$MSE(\hat{\theta}) = E([\hat{\theta} - \theta]^2)$$

    $$MSE(\hat{\theta}) = V(\hat{\theta}) + (B(\hat{\theta}))^2$$

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 28

    MSE = VAR + BIAS²

    Proof

    $$\begin{aligned}
    MSE(\hat{\theta}) &= E((\hat{\theta} - \theta)^2) \\
    &= E\big(([\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta])^2\big) \\
    &= E([\hat{\theta} - E(\hat{\theta})]^2) + 2E\big([E(\hat{\theta}) - \theta][\hat{\theta} - E(\hat{\theta})]\big) + E([E(\hat{\theta}) - \theta]^2) \\
    &= V(\hat{\theta}) + 2\big(E(\hat{\theta})\,E[\hat{\theta} - E(\hat{\theta})] - \theta\,E[\hat{\theta} - E(\hat{\theta})]\big) + (B(\hat{\theta}))^2 \\
    &= V(\hat{\theta}) + 2(0 + 0) + (B(\hat{\theta}))^2 \\
    &= V(\hat{\theta}) + (B(\hat{\theta}))^2
    \end{aligned}$$

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 29

    Trade-off

    Think of variance as confidence and bias as correctness.

    Intuitions (largely) apply

    Sometimes a biased estimator can produce lower MSE if it lowers the variance.
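    One standard illustration of this claim (my example, not from the slides) is a shrinkage estimator $c\bar{Y}$ with $0 < c < 1$: it is biased toward zero, but for a suitable $c$ the added bias is more than paid for by the reduced variance. The Monte Carlo MSE also matches the $V + B^2$ decomposition from the previous slide. The parameter values below are arbitrary assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 2.0, 3.0, 5, 100_000  # assumed setting in which shrinkage helps
    c = 0.7                                    # arbitrary shrinkage factor

    samples = rng.normal(mu, sigma, size=(reps, n))
    unbiased = samples.mean(axis=1)            # Ybar
    shrunk = c * unbiased                      # biased toward zero

    for name, est in [("Ybar", unbiased), ("c*Ybar", shrunk)]:
        mse = np.mean((est - mu) ** 2)         # Monte Carlo MSE
        var = est.var()
        bias = est.mean() - mu
        print(f"{name}: MSE={mse:.3f}  Var+Bias^2={var + bias**2:.3f}")
    ```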

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 30

    Estimating Error Term Variance

    Regression model

    The variance of each observation $Y_i$ is $\sigma^2$ (the same as the variance of the error term $\varepsilon_i$)

    Each Yi comes from a different probability distribution with different means that depend on the level Xi

    The deviation of an observation Yi must be calculated around its own estimated mean.

  • Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 31

    The $s^2$ estimator for $\sigma^2$

    MSE is an unbiased estimator of $\sigma^2$

    The error sum of squares SSE has $n - 2$ degrees of freedom
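    Based on the definitions above (SSE with $n - 2$ degrees of freedom), a minimal sketch of computing $s^2 = \mathrm{MSE} = \mathrm{SSE}/(n - 2)$ for a fitted line; the data and seed are assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.arange(1, 12, dtype=float)
    Y = 2 * X + 9 + rng.normal(scale=2.0, size=X.size)
    n = X.size

    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    e = Y - (b0 + b1 * X)          # residuals around each estimated mean

    SSE = np.sum(e ** 2)
    s2 = SSE / (n - 2)             # MSE: unbiased estimator of the error variance sigma^2
    print(SSE, s2)
    ```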