Multiple Regression - Kasetsart Economic Environment for Finance

  • Multiple Regression

    Peerapat Wongchaiwat, Ph.D.

    wongchaiwat@hotmail.com

  • The Multiple Regression Model

    Examines the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).

    Multiple regression model with k independent variables:

    Y = β0 + β1X1 + β2X2 + … + βkXk + ε

    where β0 is the Y-intercept, β1, …, βk are the population slopes, and ε is the random error.

  • Multiple Regression Equation

    The coefficients of the multiple regression model are estimated using sample data.

    Multiple regression equation with k independent variables:

    ŷi = b0 + b1x1i + b2x2i + … + bkxki

    where b0 is the estimated intercept, b1, …, bk are the estimated slope coefficients, and ŷi is the estimated (or predicted) value of y.

    We will always use a computer to obtain the regression slope coefficients and other regression summary measures.
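    For reference, the estimates a computer package returns are the ordinary least-squares solution. In matrix form (a standard result, included here for context rather than taken from the slides):

    b = (XᵀX)⁻¹ Xᵀ y

    where y stacks the n observed responses and X is the n × (k + 1) design matrix whose first column is all ones.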

  • Sales Example

    Multiple regression equation:

    Salest = b0 + b1(Price)t + b2(Advertising)t + et

    Week   Pie Sales   Price ($)   Advertising ($100s)
       1         350        5.50                   3.3
       2         460        7.50                   3.3
       3         350        8.00                   3.0
       4         430        8.00                   4.5
       5         350        6.80                   3.0
       6         380        7.50                   4.0
       7         430        4.50                   3.0
       8         470        6.40                   3.7
       9         450        7.00                   3.5
      10         490        5.00                   4.0
      11         340        7.20                   3.5
      12         300        7.90                   3.2
      13         440        5.90                   4.0
      14         450        5.00                   3.5
      15         300        7.00                   2.7
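    The fit can be reproduced in a few lines; a minimal sketch in Python with statsmodels (the slides do not name the software, and any OLS routine gives the same estimates):

```python
# Fit the pie-sales multiple regression: Sales on Price and Advertising.
import numpy as np
import statsmodels.api as sm

sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300])
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00,
                  5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5,
                        4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

X = sm.add_constant(np.column_stack([price, advertising]))  # prepends the intercept column
fit = sm.OLS(sales, X).fit()
print(fit.params)    # approx. [306.526, -24.975, 74.131]
print(fit.rsquared)  # approx. 0.52148
```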

  • Multiple Regression Output

    Regression Statistics
    Multiple R            0.72213
    R Square              0.52148
    Adjusted R Square     0.44172
    Standard Error       47.46341
    Observations         15

    ANOVA         df          SS          MS         F   Significance F
    Regression     2   29460.027   14730.013   6.53861          0.01201
    Residual      12   27033.306    2252.776
    Total         14   56493.333

                  Coefficients   Standard Error    t Stat   P-value   Lower 95%   Upper 95%
    Intercept        306.52619        114.25389   2.68285   0.01993    57.58835   555.46404
    Price            -24.97509         10.83213  -2.30565   0.03979   -48.57626    -1.37392
    Advertising       74.13096         25.96732   2.85478   0.01449    17.55303   130.70888

    Estimated multiple regression equation:

    Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
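    Holding the other variable fixed, each additional dollar of price is associated with roughly 25 fewer pies sold per week, and each additional $100 of advertising with roughly 74 more. As a worked example with hypothetical inputs, predicted sales for a week with Price = $5.50 and Advertising = 3.5 (i.e. $350):

    Sales = 306.526 - 24.975(5.50) + 74.131(3.5) ≈ 428.6 pies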

  • Adjusted R2

    • R2 never decreases when a new X variable is added to the model, even if the new variable is not an important predictor variable

      – Hence, models with different numbers of explanatory variables cannot be compared by R2

    • What is the net effect of adding a new variable?

      – We lose a degree of freedom when a new X variable is added

      – Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

    • Adjusted R2 penalizes excessive use of unimportant independent variables:

      Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1)

    • Adjusted R2 is always smaller than R2 (except when R2 = 1)
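    The reported value is easy to verify; a sketch in Python using the sums of squares from the output above:

```python
# Verify R^2 and adjusted R^2 for the pie-sales model (n = 15 observations, k = 2 predictors).
sse, sst, n, k = 27033.306, 56493.333, 15, 2
r2 = 1 - sse / sst                                  # approx. 0.52148
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))  # approx. 0.44172
print(round(r2, 5), round(adj_r2, 5))
```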

  • F-Test for Overall Significance of the Model

    • Shows if there is a linear relationship between all of the X variables considered together and Y

    • Use the F test statistic

    • Hypotheses:

      H0: β1 = β2 = … = βk = 0 (no linear relationship)

      H1: at least one βi ≠ 0 (at least one independent variable affects Y)

  • 6.5386 2252.8

    14730.0

    MSE

    MSR F 

    Regression Statistics

    Multiple R 0.72213

    R Square 0.52148

    Adjusted R Square 0.44172

    Standard Error 47.46341

    Observations 15

    ANOVA df SS MS F Significance F

    Regression 2 29460.027 14730.013 6.53861 0.01201

    Residual 12 27033.306 2252.776

    Total 14 56493.333

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404

    Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392

    Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888

    (continued)

    F-Test for Overall Significance

    With 2 and 12 degrees of

    freedom P-value for

    the F-Test
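    The reported Significance F can be reproduced from the F distribution; a sketch assuming scipy is available:

```python
# p-value of F = MSR/MSE under the F distribution with (2, 12) degrees of freedom.
from scipy import stats

f_stat = 14730.013 / 2252.776         # approx. 6.53861
p_value = stats.f.sf(f_stat, 2, 12)   # upper-tail probability, approx. 0.01201
print(f_stat, p_value)
```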

  • The ANOVA Table in Regression

    Source of Variation   Sum of Squares   Degrees of Freedom        Mean Square                 F Ratio
    Regression            SSR              k                         MSR = SSR / k               F = MSR / MSE
    Error                 SSE              n - (k + 1) = n - k - 1   MSE = SSE / (n - (k + 1))
    Total                 SST              n - 1                     MST = SST / (n - 1)

    R2 = SSR / SST = 1 - SSE / SST

    Adjusted R2 = 1 - [SSE / (n - (k + 1))] / [SST / (n - 1)] = 1 - MSE / MST

    Equivalently, the F ratio can be written in terms of R2:

    F = (R2 / k) / [(1 - R2) / (n - (k + 1))]
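    These identities are easy to verify numerically; a sketch using the pie-sales sums of squares from the output above:

```python
# Rebuild the ANOVA quantities for the pie-sales regression (n = 15, k = 2).
ssr, sse, n, k = 29460.027, 27033.306, 15, 2
sst = ssr + sse                                  # 56493.333
msr, mse = ssr / k, sse / (n - k - 1)            # 14730.013, 2252.776
r2 = ssr / sst
f_from_ms = msr / mse                            # approx. 6.53861
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))  # identical, via the R^2 form
print(round(f_from_ms, 5), round(f_from_r2, 5))
```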

  • Tests of the Significance of Individual Regression Parameters

    Hypothesis tests about individual regression slope parameters:

    (1) H0: β1 = 0    H1: β1 ≠ 0

    (2) H0: β2 = 0    H1: β2 ≠ 0

    ...

    (k) H0: βk = 0    H1: βk ≠ 0

    Test statistic for test i:

    t = (bi - 0) / s(bi),  with n - (k + 1) degrees of freedom
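    For instance, the Price row of the output above can be checked directly; a sketch assuming scipy:

```python
# t statistic and two-sided p-value for the Price coefficient (12 = n - k - 1 df).
from scipy import stats

b_price, se_price = -24.97509, 10.83213
t_stat = b_price / se_price                # approx. -2.30565
p_value = 2 * stats.t.sf(abs(t_stat), 12)  # approx. 0.03979
print(t_stat, p_value)
```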

  • The Concept of Partial Regression Coefficients

    In multiple regression, the interpretation of slope coefficients requires special attention:

    ŷi = b0 + b1x1i + b2x2i

    • Here, b1 shows the relationship between X1 and Y holding X2 constant (i.e. controlling for the effect of X2).

  • Purifying X1 from X2 (i.e. removing the effect of X2 on X1): run a regression of X1 on X2:

    X1i = γ0 + γ1X2i + vi

    vi = X1i - (γ0 + γ1X2i) is X1 purified from X2.

    Then, run a regression of Yi on vi:

    Yi = α0 + α1vi

    α1 is the b1 in the original multiple regression equation.
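    This purification (the Frisch-Waugh result) can be demonstrated on the pie-sales data; a sketch, with the arrays repeated so the block is self-contained:

```python
# Residualize Price on Advertising, then regress Sales on those residuals:
# the resulting slope equals b1 from the full multiple regression.
import numpy as np
import statsmodels.api as sm

sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300])
price = np.array([5.5, 7.5, 8.0, 8.0, 6.8, 7.5, 4.5, 6.4, 7.0,
                  5.0, 7.2, 7.9, 5.9, 5.0, 7.0])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5,
                        4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

v = sm.OLS(price, sm.add_constant(advertising)).fit().resid  # Price purified from Advertising
b1 = sm.OLS(sales, sm.add_constant(v)).fit().params[1]
print(b1)  # approx. -24.975, matching the multiple-regression slope on Price
```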

  • b1 shows the relationship between X1 purified from X2 and Y.

    Whenever a new explanatory variable is added to the regression equation or removed from the equation, all b coefficients change (unless the covariance of the added or removed variable with all other variables is zero).

  • The Principle of Parsimony:

    Any insignificant explanatory variable should be removed from the regression equation.

    The Principle of Generosity:

    Any significant variable must be included in the regression equation.

    Choosing the best model:

    Choose the model with the highest adjusted R2 or F, or the lowest AIC (Akaike Information Criterion) or SC (Schwarz Criterion).

    Apply the stepwise regression procedure.
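    As an illustration of comparing candidate models, a sketch with statsmodels (which reports both adjusted R2 and AIC) on the pie-sales data:

```python
# Compare the full pie-sales model against a price-only model.
import numpy as np
import statsmodels.api as sm

sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300])
price = np.array([5.5, 7.5, 8.0, 8.0, 6.8, 7.5, 4.5, 6.4, 7.0,
                  5.0, 7.2, 7.9, 5.9, 5.0, 7.0])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5,
                        4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

full = sm.OLS(sales, sm.add_constant(np.column_stack([price, advertising]))).fit()
reduced = sm.OLS(sales, sm.add_constant(price)).fit()
# Prefer the model with the higher adjusted R^2 and the lower AIC.
print(full.rsquared_adj, full.aic)
print(reduced.rsquared_adj, reduced.aic)
```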

  • Multiple Regression

    For example:

    A researcher may be interested in the relationship between Education, Family Income, and the Number of Children in a family.

    Independent Variables: Education, Family Income

    Dependent Variable: Number of Children

  • Multiple Regression

    For example:

    Research Hypothesis: As education of respondents increases, the number of children in families will decline (negative relationship).

    Research Hypothesis: As family income of respondents increases, the number of children in families will decline (negative relationship).

    Independent Variables: Education, Family Income

    Dependent Variable: Number of Children

  • Multiple Regression

    For example:

    Null Hypothesis: There is no relationship between education of respondents and the number of children in families.

    Null Hypothesis: There is no relationship between family income and the number of children in families.

    Independent Variables: Education, Family Income

    Dependent Variable: Number of Children

  • Multiple Regression

    Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph.

    Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.

    Case:                   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
    Children (Y):           2  5  1  9  6  3  0  3  7  7  2  5  1  9  6  3  0  3  7 14  2  5  1  9  6
    Education (X1):        12 16 20 12  9 18 16 14  9 12 12 10 20 11  9 18 16 14  9  8 12 10 20 11  9
    Income (X2, 1=$10K):    3  4  9  5  4 12 10  1  4  3 10  4  9  4  4 12 10  6  4  1 10  3  9  2  4
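    A sketch of fitting this trivariate model (a plane) with statsmodels:

```python
# Fit Children on Education and Income for the 25 cases above.
import numpy as np
import statsmodels.api as sm

children = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6,
                     3, 0, 3, 7, 14, 2, 5, 1, 9, 6])
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20,
                      11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9])
income = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4,
                   12, 10, 6, 4, 1, 10, 3, 9, 2, 4])

X = sm.add_constant(np.column_stack([education, income]))
plane = sm.OLS(children, X).fit()
print(plane.params)  # intercept and the two slope coefficients of the fitted plane
```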
