
### Multiple Regression - Kasetsart Economic Environment for Finance

• Multiple Regression

Peerapat Wongchaiwat, Ph.D.

• The Multiple Regression Model

Examines the linear relationship between one dependent variable (Y) and two or more independent variables (Xi)

Multiple Regression Model with k Independent Variables:

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

where β0 is the Y-intercept, β1, …, βk are the population slopes, and ε is the random error.

• Multiple Regression Equation

The coefficients of the multiple regression model are estimated using sample data.

Multiple regression equation with k independent variables:

ŷi = b0 + b1x1i + b2x2i + … + bkxki

where b0 is the estimated intercept, b1, …, bk are the estimated slope coefficients, and ŷi is the estimated (or predicted) value of y.

We will always use a computer to obtain the regression slope coefficients and other regression summary measures.
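As a minimal sketch of what the computer is doing, the least-squares coefficients can be obtained with NumPy. The data below are illustrative (not from the slides): y is built exactly from known coefficients, so least squares should recover them.

```python
import numpy as np

# Noise-free synthetic data generated from y = 10 + 2*x1 - 3*x2,
# so least squares should recover the coefficients exactly.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 10 + 2 * x1 - 3 * x2

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the least-squares problem min ||Xb - y||^2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # approximately [10.  2. -3.]
```

The column of ones is what produces the intercept; omitting it would force the fitted plane through the origin.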

• Sales Example

Multiple regression equation:

Salest = b0 + b1(Price)t + b2(Advertising)t + et

| Week | Pie Sales | Price ($) | Advertising ($100s) |
|------|-----------|-----------|---------------------|
| 1 | 350 | 5.50 | 3.3 |
| 2 | 460 | 7.50 | 3.3 |
| 3 | 350 | 8.00 | 3.0 |
| 4 | 430 | 8.00 | 4.5 |
| 5 | 350 | 6.80 | 3.0 |
| 6 | 380 | 7.50 | 4.0 |
| 7 | 430 | 4.50 | 3.0 |
| 8 | 470 | 6.40 | 3.7 |
| 9 | 450 | 7.00 | 3.5 |
| 10 | 490 | 5.00 | 4.0 |
| 11 | 340 | 7.20 | 3.5 |
| 12 | 300 | 7.90 | 3.2 |
| 13 | 440 | 5.90 | 4.0 |
| 14 | 450 | 5.00 | 3.5 |
| 15 | 300 | 7.00 | 2.7 |

• Multiple Regression Output

| Regression Statistics | |
|---|---|
| Multiple R | 0.72213 |
| R Square | 0.52148 |
| Adjusted R Square | 0.44172 |
| Standard Error | 47.46341 |
| Observations | 15 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 29460.027 | 14730.013 | 6.53861 | 0.01201 |
| Residual | 12 | 27033.306 | 2252.776 | | |
| Total | 14 | 56493.333 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 306.52619 | 114.25389 | 2.68285 | 0.01993 | 57.58835 | 555.46404 |
| Price | -24.97509 | 10.83213 | -2.30565 | 0.03979 | -48.57626 | -1.37392 |
| Advertising | 74.13096 | 25.96732 | 2.85478 | 0.01449 | 17.55303 | 130.70888 |

The estimated regression equation:

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
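The output above can be reproduced with a NumPy least-squares sketch; the arrays below are copied from the pie sales table, and the coefficients should match those reported in the output.

```python
import numpy as np

# The 15 weeks of pie sales data from the table.
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                        3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

# Design matrix with an intercept column; solve for b0, b1, b2.
X = np.column_stack([np.ones_like(price), price, advertising])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
# b should match the output: about [306.526, -24.975, 74.131]

# Predicted sales for week 1 (price 5.50, advertising 3.3 hundred dollars):
y_hat = b[0] + b[1] * 5.50 + b[2] * 3.3
```

The fitted equation can then be used for prediction exactly as the estimated equation above: plug a price and an advertising level into Sales = b0 + b1(Price) + b2(Advertising).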

• R2 never decreases when a new X variable is added to the model, even if the new variable is not an important predictor variable

– Hence, models with different numbers of explanatory variables cannot be compared by R2

• What is the net effect of adding a new variable?

– We lose a degree of freedom when a new X variable is added

– Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

• Adjusted R2 penalizes excessive use of unimportant independent variables:

Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1)

Adjusted R2 is always smaller than R2 (except when R2 = 1).
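Checking the adjustment on the pie sales output above (n = 15 observations, k = 2 independent variables, R Square = 0.52148):

```python
# Adjusted R^2 computed from the reported R Square.
n, k = 15, 2
r2 = 0.52148

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 5))  # ≈ 0.4417, close to the reported 0.44172
```

The small discrepancy in the last digit comes from using the rounded R Square as input.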

• F-Test for Overall Significance of the Model

• Shows whether there is a linear relationship between all of the X variables considered together and Y

• Uses the F test statistic

• Hypotheses:

H0: β1 = β2 = … = βk = 0 (no linear relationship)

H1: at least one βi ≠ 0 (at least one independent variable affects Y)

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386


F-Test for Overall Significance (continued)

From the output above, F = 6.5386 with 2 and 12 degrees of freedom; the p-value for the F-test (Significance F) is 0.01201.

• The ANOVA Table in Regression

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Ratio |
|---|---|---|---|---|
| Regression | SSR | k | MSR = SSR/k | F = MSR/MSE |
| Error | SSE | n - (k+1) = n - k - 1 | MSE = SSE/(n - (k+1)) | |
| Total | SST | n - 1 | MST = SST/(n - 1) | |

R2 = SSR/SST = 1 - SSE/SST

Adjusted R2 = 1 - [SSE/(n - (k+1))] / [SST/(n - 1)] = 1 - MSE/MST

F can also be written in terms of R2:

F = [R2 / k] / [(1 - R2) / (n - (k+1))]
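These identities can be checked on the pie sales ANOVA numbers: F computed from the mean squares and F computed from R2 must agree.

```python
# Sums of squares from the pie sales ANOVA table: n = 15, k = 2.
ssr, sse, sst = 29460.027, 27033.306, 56493.333
n, k = 15, 2

# F from the mean squares.
msr = ssr / k
mse = sse / (n - (k + 1))
f_from_ms = msr / mse

# F from R^2; algebraically identical to MSR/MSE.
r2 = ssr / sst
f_from_r2 = (r2 / k) / ((1 - r2) / (n - (k + 1)))
# Both should be about 6.5386, the F reported in the ANOVA table.
```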

• Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1) H0: β1 = 0 vs. H1: β1 ≠ 0
(2) H0: β2 = 0 vs. H1: β2 ≠ 0
…
(k) H0: βk = 0 vs. H1: βk ≠ 0

Test statistic for test i:

t = (bi - 0) / s(bi), with n - (k+1) degrees of freedom
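Applying t = bi / s(bi) to the coefficients and standard errors in the pie sales output reproduces the t Stat column:

```python
# (coefficient, standard error) pairs from the output table.
coefs = {"Intercept": (306.52619, 114.25389),
         "Price": (-24.97509, 10.83213),
         "Advertising": (74.13096, 25.96732)}

# t statistic for each parameter: t = b_i / s(b_i),
# with n - (k+1) = 12 degrees of freedom here.
t_stats = {name: b / s for name, (b, s) in coefs.items()}
# Expected: roughly 2.68285, -2.30565, 2.85478, as in the t Stat column.
```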

• The Concept of Partial Regression Coefficients

In multiple regression, the interpretation of slope coefficients requires special attention:

ŷi = b0 + b1x1i + b2x2i

• Here, b1 shows the relationship between X1 and Y holding X2 constant (i.e. controlling for the effect of X2).

• Purifying X1 from X2 (i.e. removing the effect of X2 on X1): run a regression of X1 on X2:

X1i = α0 + α1X2i + vi

vi = X1i - (α0 + α1X2i) is X1 purified from X2.

Then, run a regression of Yi on vi:

Yi = γ0 + γ1vi

γ1 is the b1 in the original multiple regression equation.

• b1 shows the relationship between X1 purified from X2 and Y.

Whenever a new explanatory variable is added to the regression equation or removed from it, all b coefficients change (unless the covariance of the added or removed variable with all other variables is zero).

• The Principle of Parsimony: any insignificant explanatory variable should be removed from the regression equation.

• The Principle of Generosity: any significant variable must be included in the regression equation.

Choosing the best model:

– Choose the model with the highest adjusted R2 or F, or the lowest AIC (Akaike Information Criterion) or SC (Schwarz Criterion).

– Apply the stepwise regression procedure.
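A sketch of the comparison criteria, assuming one common sample form of AIC, n·ln(SSE/n) + 2(k+1); textbooks differ on the exact variant, so treat the helper below as illustrative.

```python
import math

def adjusted_r2(sse, sst, n, k):
    """Adjusted R^2 for a model with k explanatory variables."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

def aic(sse, n, k):
    """One common sample form of AIC; smaller is better."""
    return n * math.log(sse / n) + 2 * (k + 1)

# Pie sales model from the output earlier: n = 15, k = 2.
adj = adjusted_r2(27033.306, 56493.333, 15, 2)
a = aic(27033.306, 15, 2)
```

Candidate models would each get their own (SSE, k); the one with the highest adjusted R2 or the lowest AIC is preferred.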

• Multiple Regression

For example:

A researcher may be interested in the relationship between Education, Family Income, and the Number of Children in a family.

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

• Multiple Regression

For example:

Research Hypothesis: As education of respondents increases, the number of children in families will decline (negative relationship).

Research Hypothesis: As family income of respondents increases, the number of children in families will decline (negative relationship).

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

• Multiple Regression

For example:

Null Hypothesis: There is no relationship between education of respondents and the number of children in families.

Null Hypothesis: There is no relationship between family income and the number of children in families.

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

• Multiple Regression

Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph.

Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.

Case: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Children (Y): 2 5 1 9 6 3 0 3 7 7 2 5 1 9 6 3 0 3 7 14 2 5 1 9 6

Education (X1): 12 16 20 12 9 18 16 14 9 12 12 10 20 11 9 18 16 14 9 8 12 10 20 11 9

Income 1=\$10K (X2): 3 4 9 5 4 12 10 1 4 3 10 4 9 4 4 12 10 6 4 1 10 3 9 2 4
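The trivariate example can be fitted the same way as the pie sales model; the slides do not report coefficients for this data, so the sketch below only fits the plane and computes R2, without asserting particular values.

```python
import numpy as np

# The 25 cases from the table above.
children = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6,
                     3, 0, 3, 7, 14, 2, 5, 1, 9, 6], dtype=float)
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10,
                      20, 11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9], dtype=float)
income = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4,
                   12, 10, 6, 4, 1, 10, 3, 9, 2, 4], dtype=float)

# Fit the plane: Children = b0 + b1*Education + b2*Income.
X = np.column_stack([np.ones_like(education), education, income])
b, *_ = np.linalg.lstsq(X, children, rcond=None)

# R^2 from the residuals.
resid = children - X @ b
r2 = 1 - (resid @ resid) / ((children - children.mean()) @ (children - children.mean()))
```

The signs of b[1] and b[2] can then be checked against the research hypotheses stated earlier.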
