# Multiple Regression - Kasetsart Economic Environment for Finance

Posted: 24-Jul-2020


Multiple Regression

Peerapat Wongchaiwat, Ph.D.

The Multiple Regression Model

Examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi)

Multiple Regression Model with k Independent Variables:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon$ is the random error.

Multiple Regression Equation

The coefficients of the multiple regression model are estimated using sample data. Multiple regression equation with k independent variables:

$$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki}$$

where $b_0$ is the estimated intercept, $b_1, \ldots, b_k$ are the estimated slope coefficients, and $\hat{y}_i$ is the estimated (or predicted) value of y.

We will always use a computer to obtain the regression slope coefficients and other regression summary measures.

Sales Example

$$\text{Sales}_t = b_0 + b_1 (\text{Price})_t + b_2 (\text{Advertising})_t + e_t$$

| Week | Pie Sales | Price ($) | Advertising ($100s) |
|------|-----------|-----------|---------------------|
| 1 | 350 | 5.50 | 3.3 |
| 2 | 460 | 7.50 | 3.3 |
| 3 | 350 | 8.00 | 3.0 |
| 4 | 430 | 8.00 | 4.5 |
| 5 | 350 | 6.80 | 3.0 |
| 6 | 380 | 7.50 | 4.0 |
| 7 | 430 | 4.50 | 3.0 |
| 8 | 470 | 6.40 | 3.7 |
| 9 | 450 | 7.00 | 3.5 |
| 10 | 490 | 5.00 | 4.0 |
| 11 | 340 | 7.20 | 3.5 |
| 12 | 300 | 7.90 | 3.2 |
| 13 | 440 | 5.90 | 4.0 |
| 14 | 450 | 5.00 | 3.5 |
| 15 | 300 | 7.00 | 2.7 |

Multiple regression equation:

Multiple Regression Output

Regression Statistics

Multiple R 0.72213

R Square 0.52148

Adjusted R Square 0.44172

Standard Error 47.46341

Observations 15

| ANOVA | df | SS | MS | F | Significance F |
|------------|----|-----------|-----------|---------|---------|
| Regression | 2 | 29460.027 | 14730.013 | 6.53861 | 0.01201 |
| Residual | 12 | 27033.306 | 2252.776 | | |
| Total | 14 | 56493.333 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|-------------|-----------|-----------|----------|---------|-----------|-----------|
| Intercept | 306.52619 | 114.25389 | 2.68285 | 0.01993 | 57.58835 | 555.46404 |
| Price | -24.97509 | 10.83213 | -2.30565 | 0.03979 | -48.57626 | -1.37392 |
| Advertising | 74.13096 | 25.96732 | 2.85478 | 0.01449 | 17.55303 | 130.70888 |

$$\widehat{\text{Sales}} = 306.526 - 24.975\,(\text{Price}) + 74.131\,(\text{Advertising})$$
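The fitted equation can be reproduced directly from the data table. A minimal ordinary-least-squares sketch with NumPy (the arrays restate the 15 observations; `np.linalg.lstsq` computes the least-squares solution):

```python
import numpy as np

# Pie sales data for the 15 weeks in the table above
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                  450, 490, 340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adv = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

# Design matrix: a column of ones for the intercept, then the two regressors
X = np.column_stack([np.ones_like(sales), price, adv])

# Least-squares estimates of (b0, b1, b2)
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(b.round(3))  # intercept, price slope, advertising slope
```

The printed values match the coefficient column of the output above.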

Adjusted R²

- R² never decreases when a new X variable is added to the model, even if the new variable is not an important predictor variable
  - Hence, models with different numbers of explanatory variables cannot be compared by R²
- What is the net effect of adding a new variable?
  - We lose a degree of freedom when a new X variable is added
  - Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
- Adjusted R² penalizes excessive use of unimportant independent variables:

$$\bar{R}^2 = 1 - \frac{SSE/(n-(k+1))}{SST/(n-1)}$$

- Adjusted R² is always smaller than R² (except when R² = 1)
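As a check on these definitions, the R² and adjusted R² in the regression output above can be recomputed by hand from the sums of squares in the ANOVA table:

```python
# R-squared and adjusted R-squared from the ANOVA sums of squares above
SSE, SST = 27033.306, 56493.333
n, k = 15, 2  # 15 observations, 2 independent variables

r2 = 1 - SSE / SST
adj_r2 = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))
print(round(r2, 5), round(adj_r2, 5))  # 0.52148 0.44172
```

Both values agree with the Regression Statistics block of the output.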

F-Test for Overall Significance of the Model

- Shows whether there is a linear relationship between all of the X variables considered together and Y
- Use the F test statistic
- Hypotheses:
  - H0: β1 = β2 = … = βk = 0 (no linear relationship)
  - H1: at least one βi ≠ 0 (at least one independent variable affects Y)

$$F = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386$$
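The F statistic and its p-value (the Significance F in the output) can be recomputed from the mean squares. For the p-value, this sketch uses the closed form of the F survival function that holds when the numerator degrees of freedom equal 2; this is a hand check, not how the spreadsheet computes it:

```python
# F statistic from the ANOVA table: F = MSR / MSE
MSR, MSE = 14730.013, 2252.776
F = MSR / MSE
print(round(F, 4))  # 6.5386

# With numerator df = 2, the F(2, d2) survival function has the closed form
# P(F > x) = (1 + 2x/d2) ** (-d2/2)
d2 = 12
p = (1 + 2 * F / d2) ** (-d2 / 2)
print(round(p, 5))  # 0.01201
```

Since p = 0.01201 < 0.05, we reject H0 at the 5% significance level.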


F-Test for Overall Significance (continued)

With 2 and 12 degrees of freedom, the p-value for the F-test (Significance F in the output) is 0.01201, so the model as a whole is significant at the 5% level.

The ANOVA Table in Regression

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Ratio |
|---------------------|----------------|--------------------|-------------|---------|
| Regression | SSR | k | MSR = SSR / k | F = MSR / MSE |
| Error | SSE | n - (k + 1) = n - k - 1 | MSE = SSE / (n - (k + 1)) | |
| Total | SST | n - 1 | MST = SST / (n - 1) | |

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$$\bar{R}^2 = 1 - \frac{SSE/(n-(k+1))}{SST/(n-1)} = 1 - \frac{MSE}{MST}$$

The F ratio can also be written in terms of R²:

$$F = \frac{R^2 / k}{(1 - R^2)/(n-(k+1))}$$
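A quick numerical check that the two expressions for F agree, using the sums of squares from the pie-sales output above:

```python
# Verify the ANOVA identities with the pie-sales sums of squares
SSR, SSE, SST = 29460.027, 27033.306, 56493.333
n, k = 15, 2

MSR = SSR / k
MSE = SSE / (n - (k + 1))
F_from_anova = MSR / MSE

R2 = SSR / SST
F_from_r2 = (R2 / k) / ((1 - R2) / (n - (k + 1)))  # algebraically the same F

print(round(F_from_anova, 4), round(F_from_r2, 4))
```

Both expressions reduce to (n - k - 1)/k times SSR/SSE, so they are identical, not merely close.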

Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1) H0: b1 = 0 vs. H1: b1 ≠ 0
(2) H0: b2 = 0 vs. H1: b2 ≠ 0
…
(k) H0: bk = 0 vs. H1: bk ≠ 0

Test statistic for test i:

$$t = \frac{b_i - 0}{s(b_i)}, \qquad df = n - (k + 1)$$
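The t statistics and 95% confidence intervals in the coefficient table above can be recomputed from the estimates and their standard errors; the critical value 2.17881 is t at α/2 = 0.025 with n - (k + 1) = 12 degrees of freedom:

```python
# t statistics and 95% CIs from the coefficient estimates and standard errors above
coefs = {"Price": (-24.97509, 10.83213), "Advertising": (74.13096, 25.96732)}
t_crit = 2.17881  # two-sided 5% critical value, 12 df

results = {}
for name, (b, se) in coefs.items():
    t = b / se                              # t = (b_i - 0) / s(b_i)
    lo, hi = b - t_crit * se, b + t_crit * se
    results[name] = (t, lo, hi)
    print(f"{name}: t = {t:.5f}, 95% CI = ({lo:.5f}, {hi:.5f})")
```

The values reproduce the t Stat, Lower 95%, and Upper 95% columns of the output.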

The Concept of Partial Regression Coefficients

In multiple regression, the interpretation of slope coefficients requires special attention. In the two-variable equation

$$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i}$$

b1 shows the relationship between X1 and Y holding X2 constant (i.e., controlling for the effect of X2).

Purifying X1 from X2 (i.e., removing the effect of X2 on X1): run a regression of X1 on X2,

$$X_{1i} = \alpha_0 + \alpha_1 X_{2i} + v_i$$

so that the residual

$$v_i = X_{1i} - (\alpha_0 + \alpha_1 X_{2i})$$

is X1 purified from X2. Then run a regression of Yi on vi:

$$Y_i = \gamma_0 + \gamma_1 v_i$$

γ1 equals b1 in the original multiple regression equation; that is, b1 shows the relationship between X1 purified from X2 and Y.
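The purification argument can be verified numerically on the pie-sales example above: residualize Price on Advertising, then regress Sales on the residual; the slope matches b1 from the full regression (the Frisch-Waugh result). A sketch with NumPy:

```python
import numpy as np

# Pie sales data from the example above (15 weeks)
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                  450, 490, 340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adv = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

# Step 1: purify Price from Advertising -- regress price on a constant
# and adv, keeping the residual v (the part of price unrelated to adv)
Z = np.column_stack([np.ones_like(adv), adv])
g, *_ = np.linalg.lstsq(Z, price, rcond=None)
v = price - Z @ g

# Step 2: regress sales on the purified variable (plus a constant)
W = np.column_stack([np.ones_like(v), v])
c, *_ = np.linalg.lstsq(W, sales, rcond=None)
print(round(c[1], 3))  # matches b1 for Price in the full regression
```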

Whenever a new explanatory variable is added to or removed from the regression equation, all b coefficients change (unless the covariance of the added or removed variable with all other variables is zero).

The Principle of Parsimony:

Any insignificant explanatory variable should be removed from the regression equation.

The Principle of Generosity:

Any significant variable must be included in the regression equation.

Choosing the best model:

Choose the model with the highest adjusted R2 or F or the lowest AIC (Akaike Information Criterion) or SC (Schwarz Criterion).

Apply the stepwise regression procedure.
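As an illustration of comparing models by adjusted R², a sketch that fits the Price-only and Price-plus-Advertising models to the pie-sales data and compares their adjusted R² values (the helper `adj_r2` is ours, not a library function):

```python
import numpy as np

# Pie sales data from the example above
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                  450, 490, 340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adv = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

def adj_r2(X, y):
    """Fit OLS and return adjusted R-squared (X must include the intercept column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    n, p = X.shape  # p columns including the intercept, so k = p - 1
    return 1 - (sse / (n - p)) / (sst / (n - 1))

ones = np.ones_like(sales)
small = adj_r2(np.column_stack([ones, price]), sales)      # Price only
full = adj_r2(np.column_stack([ones, price, adv]), sales)  # Price + Advertising
print(round(small, 5), round(full, 5))
```

Advertising is significant (|t| well above 1), so adding it raises adjusted R²; the larger model is preferred here.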

Multiple Regression

For example: a researcher may be interested in the relationship between Education, Family Income, and the Number of Children in a family.

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

Multiple Regression

For example:

Research Hypothesis: As education of respondents increases, the number of children in families will decline (negative relationship).

Research Hypothesis: As family income of respondents increases, the number of children in families will decline (negative relationship).


Multiple Regression

For example:

Null Hypothesis: There is no relationship between education of respondents and the number of children in families.

Null Hypothesis: There is no relationship between family income and the number of children in families.


Multiple Regression

Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph. Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.

Case: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Children (Y): 2 5 1 9 6 3 0 3 7 7 2 5 1 9 6 3 0 3 7 14 2 5 1 9 6

Education (X1): 12 16 20 12 9 18 16 14 9 12 12 10 20 11 9 18 16 14 9 8 12 10 20 11 9

Income, 1 = $10K (X2): 3 4 9 5 4 12 10 1 4 3 10 4 9 4 4 12 10 6 4 1 10 3 9 2 4
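A sketch of fitting the trivariate model to these 25 cases (the arrays restate the data as transcribed; no particular coefficient values are asserted):

```python
import numpy as np

# The 25 cases transcribed above
children = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1,
                     9, 6, 3, 0, 3, 7, 14, 2, 5, 1, 9, 6], dtype=float)
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20,
                      11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9], dtype=float)
income = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9,
                   4, 4, 12, 10, 6, 4, 1, 10, 3, 9, 2, 4], dtype=float)

# Fit the plane: Children = b0 + b1*Education + b2*Income
X = np.column_stack([np.ones_like(children), education, income])
b, *_ = np.linalg.lstsq(X, children, rcond=None)
print(b.round(3))  # intercept, education slope, income slope
```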
