Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management...

14

Click here to load reader

Transcript of Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management...

Page 1: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Lecture Notes for

STATISTICAL METHODS FOR BUSINESS II

BMGT 212

Chapters 14, 15 & 16

Professor Ahmadi, Ph.D.

Department of Management

Revised for the 9th Edition

Page 2: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Chapter 14 Formulas

Simple Linear Regression Model: y = + x +0 1β β εSimple Linear Regression Equation: E(y) = + x0 1β βLeast Squares Criterion: ( )Min y yi i−∑ $

2

Estimated Simple Linear Regression Equation

$y b b xo 1= + where = the estimated value of the dependent variable $y b0 = the y-intercept and b1 = the slope of the line

∑ −

−−∑=

2i

ii1 )xx(

)yy)(xx(b and b y bo 1= − x

Sum of Squares Due to Regression:

[ ]

∑ −−−∑

= 2i

2

ii

)x(x)y)(yx(xSSR

Total Sum of Squares: ∑ −=

2)y(ySST Also: SST = SSR + SSE

Sum of Squares Due to Error: SSE = ( ) y yi i−∑ $

2

Coefficient of Determination:

SSTSSR r 2 = Also

SSTSSE1r 2 −=

Sample Correlation Coefficient:

r 2 r = (the sign of b1) Coefficient of Determination = + where b1 = the slope of the regression equation

Professor Ahmadi’s Lecture Notes Page 2

Page 3: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

t Test for significance of individual coefficients in Linear Regression

0:H0:H

1a

1o

≠β=β

t - statistic: tb

sb=

−1 1

1

β

where 1(Estimated Standard Deviation of bbs 1) is

2

i

b)xΣ(x

ss1 −= and MSEs =

Reject Ho if t t< − α 2 or: t t> α 2 (degrees of freedom = n – p – 1)

F Test for Significance of the Linear Regression Model (ANOVA)

0:H

1o =β (The model is not significant) 0:H

1a ≠β (The model is significant)

Source of Sum of Degrees of Mean Test Statistic Variation Squares Freedom Square F Regression SSR p MSR MSR MSE Error (Residual) SSE n - p - 1 MSE Total SST n - 1 Where: p = Number of independent variables n = The sample size Reject Ho if the Test statistic F > Critical Fα

Confidence Interval Estimate for the Mean Value of y, that is E(yp)

$ $y t sp

p± α 2 y where Estimated Standard Deviation of is py

2

i

2p

y )xx(

)xx(n1ss

p −Σ

−+= Remember: s MS= E

Professor Ahmadi’s Lecture Notes Page 3

Page 4: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Chapter 14

Simple (Bivariate) Linear Regression and Correlation Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's yearly sales volume and their advertising expenditure over a period of 8 years. (Y) (X) Sales Advertising Year (In $1,000,000) (In $10,000) 1996 15 32 1997 16 33 1998 18 35 1999 17 34 2000 16 36 2001 19 37 2002 19 39 2003 24 42 a. Develop a scatter diagram of sales versus advertising. b. Use the method of least squares to compute an estimated regression line between sales and

advertising. c. If the company's advertising expenditure is $400,000, what is the predicted sales? Give

the answer in dollars. d. What does the slope of the estimated regression line indicate? e. Compute the coefficient of determination and fully interpret its meaning. f. Use the F test to determine whether or not the regression model is significant. Let α = 0.05. g. Use the t test to determine whether the slope of the regression model is significant. Let α = 0.05 h. Explain the basic assumptions about the error term in regression. i. Develop a 95% confidence interval for predicting the average sales for the years when

$400,000 was spent on advertising. j. Use Excel and solve the above problems. k. Using Excel determine the regression equation between sales an time (where 1996 = 1).

Professor Ahmadi’s Lecture Notes Page 4

Page 5: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Chapters 15 and 16 Formulas Multiple Regression Model: y = β0 + β1x1 + β2x2 + . . . βpxp + ε Multiple Regression Equation: E(y) = β0 + β1x1 + β2x2 + . . . βpxp Estimated Regression Equation: y$ = b0 + b1x1 + b2x2 + . . . + bpxp Multiple Coefficient of Determination:

SSTSSRR 2 = Also

SSTSSE1R 2 −=

Adjusted Multiple Coefficient of Determination:

R 2a = 1 - (1 - R2)(

nn p

−− −

11 )

F Statistic for Determining When to Add or Delete x2:

F =

SSE x SSE x x

SSE x xn p

( ) ( , )

( , )

1 1

1 2

1

1

− −

2

General F Test for Adding or Deleting Variables:

F =

SSE x x x SSE x x x x xp q

SSE x x x x xn p

q q q

q q p

( , ,..., ) ( , ... , ... )

( , ,..., , ,..., )

1 2 1 2 1

1 2 1

1

− p+ + + +

− −

+

+

t Test for significance of individual coefficients in Linear Regression

0:0:

≠=

ia

io

HH

ββ

For i=1, 2, 3,…p

t statistic: ib

i

sb

=t where iis the estimated Standard Deviation of bi bs

Decision Rule:

Reject Ho if t t< − α 2 or: t t> α 2 (degrees of freedom = n – p – 1)

Using the p-value approach: Reject Ho if p-value < α

Professor Ahmadi’s Lecture Notes Page 5

Page 6: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

F Test for Significance of the Linear Regression Model (ANOVA)

0...:H 2o 1

=== pβββ (i.e., the regression model is NOT significant) :Ha At least one of the coefficients is significantly different from zero (the regression model IS

significant) ANOVA Source of Sum of Degrees of Mean Test Statistic Variation Squares Freedom Square F Regression SSR p MSR MSR MSE Error (Residual) SSE n - p - 1 MSE Total SST n - 1 Where: p = Number of independent variables n = The sample size

Decision Rule:

Reject Ho if the Test statistic F > Critical Fα

Using the p-value approach: Reject Ho if p-value < α

Professor Ahmadi’s Lecture Notes Page 6

Page 7: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Chapter 15

Problem 1 Introduction to Multiple Regression and Correlation

Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's yearly sales volume, their advertising expenditure, and the number of individuals in the sales force over a period of 15 years: (Y) X1 X2 X3 Sales Advertising Sales Force Time Year ($1,000,000) ($10,000) (100) 1989 15 32 10 1 1990 16 33 12 2 1991 18 35 11 3 1992 17 34 14 4 1993 16 36 16 5 1994 19 37 18 6 1995 19 39 17 7 1996 24 42 20 8 1997 25 44 25 9 1998 27 40 22 10 1999 30 45 27 11 2000 33 50 28 12 2001 38 49 30 13 2002 40 50 30 14 2003 45 55 35 15 a. Using Excel, enter the above data in a file and save the file. Print the file as well as the

results of all of the following parts. b. Run the correlation analysis relating sales (Y) and all of the independent variables. (Do

not include the column of Year.) Explain the results. Discuss the concept of multicollinearity.

c. Run the Regression analyses relating sales (Y) and advertising (X1). Explain the results. d. Run a regression analysis relating sales (Y) and two independent variables X1 and X2.

Explain the results. e. Use an F test (α = 0.05) to determine if variable X2 contributes significantly to the model.

(Topic from Chapter Sixteen section 16.2) f. Run a regression analysis relating sales (Y) and two independent variables X1 and X3.

Explain the results. g. Using the model developed in part "f", predict sales for 2004 assuming we are planning to

advertise $700,000. h. Run a regression analysis relating sales (Y) and Time (X3). Explain the results. i. Using the model developed in part "h" predict sales for 2008. j. Run a regression analysis relating sales (Y) and three independent variables X1, X2, and

X3. Explain the results.

Professor Ahmadi’s Lecture Notes Page 7

Page 8: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Problem 2 Interpretation of Coefficients and Other Issues in Multiple Regression

A multiple regression model relating the price of Rawlston, Inc. stock (Y), the number of shares of the company's stocks sold (X1 in 100s), and the volume of exchange on the New York Stock Exchange (X2 in millions) was developed and part of the results are shown below. ANOVA

df SS MS F Significance F Regression 2 118.8474 59.4237 40.9216 0.0000 Residual 9 13.0692 1.4521 Total 11 131.9167

Coefficients Standard Error t Stat P-value Intercept 118.5059 33.5753 3.5296 0.0064 X1 -0.0163 0.0315 -0.5171 0.6176 X2 -1.5726 0.3590 -4.3807 0.0018 a. Use the output shown above and write an equation that can be used to predict the price of the stock. b. Interpret the coefficients of the estimated regression equation. c. At 95% confidence, determine which variables are significant and which are not. d. At 95% confidence, test to determine if the regression model represents a significant

relationship between the independent variables and the dependent variable. e. If in a given day, the number of shares of stock that were sold was 94,500 and the volume of

exchange on the New York Stock Exchange was 16 million, what would you expect the price of the stock to be?

Professor Ahmadi’s Lecture Notes Page 8

Page 9: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Problem 3 Multiple Regression and Qualitative Independent Variables

The following data is part of a sample taken from the mortality tables of a life insurance company. Data provide information on how life expectancy (dependent variable Y) relates to two independent variables: weight (X1 in pounds) and whether or not the individual is a smoker (X2), where:

⎪⎩

⎪⎨

smoker a is individual theif 1

nonsmoker a is individual theif 0=x2

Age (Y)

Weight (X1)

Smoker (X2)

59 253 1 93 180 0 70 201 1 60 268 1 70 152 0 . . .

etc. etc. etc. The results of regression analysis, relating Y to X1 and X2 is shown below.

Regression Statistics Multiple R 0.5983 R Square 0.3580 Adjusted R Square 0.3373 Standard Error 8.5599 Observations 65 ANOVA

df SS MS F Significance F Regression 2 2533.19 1266.60 17.29 0.0000Residual 62 4542.87 73.27 Total 64 7076.06

Coefficients Standard Error t Stat P-value Intercept 92.8770 4.0964 22.6729 0.0000 Weight -0.0623 0.0247 -2.5208 0.0143 Smoker -6.2675 2.9096 -2.1541 0.0351

Professor Ahmadi’s Lecture Notes Page 9

Page 10: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

a. Use the output shown above and write the regression equation. b. Interpret the coefficients of the estimated regression equation. c. At 95% confidence, determine which variables are significant and which are not. d. At 95% confidence, test to determine if the regression model represents a significant

relationship between the independent variables and the dependent variable. e. Predict the life expectancy of a nonsmoker who weighs 150 pounds. f. Predict the life expectancy of a person who smokes 1 pack of cigarettes per day and weighs

150 pounds. g. Predict the life expectancy of a person who smokes 3 packs of cigarettes per day and weighs

150 pounds.

Professor Ahmadi’s Lecture Notes Page 10

Page 11: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Chapter 16

Problem 1 Curvilinear Regression

Monthly total production costs and the number of units produced at a local company over a period of 10 months are shown below. Production Costs (Yi) Units Produced (Xi) Month (in $ millions) (in millions) Z = X2 1 1 2 4 2 1 3 9 3 1 4 16 4 2 5 25 5 2 6 36 6 4 7 49 7 5 8 64 8 7 9 81 9 9 10 100 10 12 10 100 a. Using Excel, enter the above data in a file and save the file. b. Draw a scatter diagram relating X & Y. c. Perform a regression and correlation analysis relating X & Y. d. Draw a scatter diagram relating X2 & Y). e. If we can assume that a model in the form of: Y = β0 + β1X2 + ε best describes the relationship between X and Y, Perform a regression and correlation

analysis between X2 & Y. f. Compare the results of parts “c” and “d” and explain which would be a better model and

why?

Professor Ahmadi’s Lecture Notes Page 11

Page 12: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Chapter 16

Problem 2 Multiple Regression & Correlation With Dummy Variables

Fill in the Blanks

Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's yearly sales volume, their advertising expenditure, and whether in a given year they used all Television advertising (X2 = 0) or used Multimedia advertising (X2 = 1). (Y) X1 X2 Sales Advertising Dummy Variable Year ($1,000,000) ($10,000) (0,1) 1989 15 32 0 1990 16 33 1 1991 18 35 1 1992 17 34 1 1993 16 36 0 1994 19 37 1 1995 19 39 0 1996 24 42 0 1997 25 44 1 1998 27 40 0 1999 30 45 1 2000 33 50 1 2001 38 49 0 2002 40 50 0 2003 45 55 1 Regression procedure of Excel was used on the above data and parts of the results are shown on the next page. a. Fill in all the blanks on the next page. b. Write the estimated regression equation. c. Using the results shown on the next page, predict sales for the year 2004 assuming we are

planning to use $700,000 for television advertising only. d. Using the results shown on the next page, predict sales for the year 2004 assuming we are

planning to use $700,000 for multimedia advertising.

Professor Ahmadi’s Lecture Notes Page 12

Page 13: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

SUMMARY OUTPUT Multiple R ___________? R Square ___________? Adjusted R Square ___________? Standard Error 2.715 Observations ___________? ANOVA df SS MS F Significance FRegression ___________? 1243.274 ___________? ___________? 8.59E-08 Residual ___________? ___________? ___________? Total ___________? ___________? Coefficients Standard Error t Stat P-value Intercept -28.462401 4.285592715 ___________? ___________? Advertising 1.31332227 0.10113336 ___________? ___________? Dummy -0.8296375 1.406029116 ___________? ___________?

Professor Ahmadi’s Lecture Notes Page 13

Page 14: Department of Managementweb2.utc.edu/~fhq532/212/lecture/ch14_16.pdf · Department of Management ... Develop a scatter diagram of sales versus advertising. ... relationship between

Your Turn – One Final Example

Significance of Variables and Other Issues 3. Ahmadi, Inc. produces several models of computer printers. Data on a few variables for one of the company’s printers are presented below.

Sales (Y) (In $1,000,000)

Advertising (X1) (In $1,000)

Price (X2) (In $100)

Competitor's Price (X3) (In $100)

Time (X4) (In Years)

Rating (X5) (0 to 10)

1578 588 21 20 1 4 1741 600 20 22 2 2 2295 600 17 19 3 4 2134 780 21 21 4 8 2035 750 21 21 5 6 2408 820 19 21 6 8 2337 810 20 20 7 8 2468 840 25 22 8 6 2533 700 25 24 9 8 2800 970 16 18 10 8 2729 920 15 21 11 6 2799 950 24 23 12 6 3264 980 17 23 13 6 3367 1167 19 17 14 4 3289 800 12 18 15 6 3453 1255 17 16 16 6 5031 1706 17 25 17 8 6125 1890 12 26 18 8 6519 1996 17 28 19 8 4586 1700 15 18 20 10 4876 1706 21 24 21 4 4675 1888 14 23 22 6 3473 1300 19 24 23 10 3669 1500 18 21 24 8 4167 1400 24 23 25 4

a. Enter the above data into an Excel file and save the file. Print the file and the results of

all of the following parts. b. Run a correlation analysis (among all variables) and print the results. Fully discuss the

meaning of the correlation coefficients. Be sure to discuss the concept of multicollinearity.

c. Run a regression analysis relating sales (Y) and ALL the independent variables. Fully explain the results.

d. Drop the variable(s) that at 95% confidence were not significant in part “c” and run a new regression analysis. Fully explain your results.

Professor Ahmadi’s Lecture Notes Page 14