Download - BIVARIATE AND MULTIPLE REGRESSION …users.unimi.it/baldi/Chap8ENGrev2016(senzaspss).pdf · LABORATORY LESSONS


BIVARIATE AND MULTIPLE REGRESSION

LABORATORY LESSONS

MARKETING course

L. Baldi

Università degli Studi di Milano


Excerpt from Ch. 8 of: “Statistics for Marketing and Consumer Research”, M. Mazzocchi, SAGE, 2008.


BIVARIATE LINEAR REGRESSION

$y_i = \alpha + \beta x_i + \varepsilon_i$

where $y_i$ is the dependent variable, $\alpha$ the intercept, $\beta$ the regression coefficient, $x_i$ the explanatory variable, and $\varepsilon_i$ the (random) error term.

• Causality (from x to y) is assumed

• The error term embodies anything which is not accounted for by the linear relationship

• The unknown parameters (α and β) need to be estimated (usually on sample data). We refer to the sample parameter estimates as a and b


THE SIMPLE LINEAR REGRESSION MODEL

Ordinary least squares (OLS) method:

A technique for finding the equation of the straight line that minimizes the total sum of the squared deviations (errors) between the observed data and the points on the line.


TO STUDY IN DETAIL:

LEAST SQUARES ESTIMATION OF THE UNKNOWN PARAMETERS

• For a given value of the parameters, the error (residual) term for each observation is

$e_i = y_i - a - b x_i$

• The least squares parameter estimates are those that minimize the sum of squared errors:

$SSE = \sum_{i=1}^{n} (y_i - a - b x_i)^2 = \sum_{i=1}^{n} e_i^2$
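For the bivariate case, minimizing SSE has a well-known closed-form solution: b is the ratio of the co-deviation of x and y to the squared deviation of x, and a follows from the sample means. The sketch below illustrates this in Python; the function names `fit_ols` and `sse` and the data are illustrative assumptions, not taken from the slides.

```python
def fit_ols(x, y):
    """Return (a, b) minimizing SSE = sum((y_i - a - b*x_i)**2)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b = s_xy / s_xx           # slope estimate
    a = y_mean - b * x_mean   # intercept estimate
    return a, b


def sse(x, y, a, b):
    """Sum of squared residuals e_i = y_i - a - b*x_i."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_ols(x, y)
print("a =", round(a, 4), "b =", round(b, 4), "SSE =", round(sse(x, y, a, b), 4))
```

Any other pair (a, b) gives a larger SSE on the same data, which is what "least squares" means.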


PREDICTION

• Once a and b have been estimated, it is possible to predict the value of the dependent variable for any given value of the explanatory variable

$\hat{y}_j = a + b x_j$

Example: if the price x changes, what happens to consumption y?
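The price example can be sketched in a few lines of Python. The estimates `a_hat` and `b_hat` below are hypothetical values chosen for illustration (they do not come from the slides); the point is that a one-unit change in x changes the prediction by exactly b.

```python
def predict(a, b, x_new):
    """Predicted value y_hat = a + b * x_new from the fitted line."""
    return a + b * x_new


# Hypothetical estimates: consumption falls by 4 units per unit of price
a_hat, b_hat = 120.0, -4.0

y_at_5 = predict(a_hat, b_hat, 5.0)   # predicted consumption at price 5
y_at_6 = predict(a_hat, b_hat, 6.0)   # predicted consumption at price 6
change = y_at_6 - y_at_5              # equals b_hat for a one-unit price rise

print(y_at_5, y_at_6, change)
```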


MODEL EVALUATION

• An evaluation of the model performance can be based on the residuals ($y_i - \hat{y}_i$), which provide information on the capability of the model predictions to fit the original data (goodness-of-fit)

• Since the parameters a and b are estimated on the sample, just like a mean, they are accompanied by standard errors, which measure the precision of these estimates and depend on the sample size.

• Knowledge of the standard errors opens the way to hypothesis testing.


HYPOTHESIS TESTING ON REGRESSION COEFFICIENTS

• t-test on each of the individual coefficients
  • Null hypothesis: the corresponding population coefficient is zero.
  • The p-value allows one to decide whether or not to reject the null hypothesis that the coefficient is zero (usually, p < 0.05 → reject the null hypothesis).
  • In practice, a coefficient is significant at the 5% level when the absolute value of its t-statistic is greater than about 2.
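As a rough illustration of the "greater than about 2" rule, the sketch below divides each coefficient by its standard error, using the rounded figures reported in the Excel output later in these slides. Because the inputs are rounded to two decimals, the recomputed t-values only approximate the ones Excel reports; the 2.0 threshold is the rule of thumb from the slide, not an exact critical value, and the helper names are assumptions.

```python
# (coefficient, standard error) as reported, rounded, in the Excel output
estimates = {
    "Intercept": (-85.05, 16.56),
    "Tmax": (0.62, 0.32),
    "Tmin": (1.31, 0.30),
    "velvento": (-1.96, 2.71),
    "nuvole": (-0.19, 1.75),
}


def t_statistic(coef, std_err):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_err


def is_significant(coef, std_err, threshold=2.0):
    """Rule of thumb: significant at about the 5% level if |t| > threshold."""
    return abs(t_statistic(coef, std_err)) > threshold


for name, (coef, std_err) in estimates.items():
    print(name, round(t_statistic(coef, std_err), 2), is_significant(coef, std_err))

significant = [n for n, (c, s) in estimates.items() if is_significant(c, s)]
```

Under this rule only the intercept and Tmin come out as significant, which is consistent with the p-values in the same output (Tmax is borderline at 0.0574).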


COEFFICIENT OF DETERMINATION R²

The natural candidate for measuring how well the model fits the data is the coefficient of determination, which varies between zero (when the model does not explain any of the variability of the dependent variable) and 1 (when the model fits the data perfectly)

$0 \le R^2 \le 1$

Definition: A statistical measure of the ‘goodness of fit’ in a regression equation. It gives the proportion of the total variance of the forecasted variable that is explained by the fitted regression equation, i.e. the independent explanatory variables.
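The slides do not spell out a formula, so the sketch below assumes the usual definition for a least-squares model with an intercept, R² = 1 − SSE/SST, where SST is the total sum of squared deviations of y from its mean. The function name and the data are illustrative assumptions.

```python
def r_squared(y, y_hat):
    """Coefficient of determination, assuming R^2 = 1 - SSE/SST."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total
    return 1.0 - sse / sst


# Hypothetical observed values
y = [2.0, 4.0, 6.0, 8.0]

perfect_fit = r_squared(y, y)                      # predictions equal the data
mean_only = r_squared(y, [5.0, 5.0, 5.0, 5.0])     # model predicts only the mean
partial_fit = r_squared(y, [3.0, 4.0, 6.0, 7.0])   # something in between

print(perfect_fit, mean_only, round(partial_fit, 3))
```

The two extreme cases correspond to the bounds on the slide: a model that explains nothing gives 0, and a perfect fit gives 1.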


MULTIPLE REGRESSION

The principle is identical to bivariate regression, but there are more explanatory variables

$y_i = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \dots + \alpha_k x_{ki} + \varepsilon_i$


ADDITIONAL ISSUES:

Collinearity (or multicollinearity) problem:

• The independent variables must also be independent of each other.

• Otherwise we could run into a double-counting problem, and it would become very difficult to separate the effect of each variable.
  • Inefficient estimates
  • Apparently good model but poor forecasts
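One simple first check for collinearity is the pairwise correlation between two explanatory variables. The sketch below is only an illustration: the data are hypothetical, and the 0.8 cut-off is an assumed rule of thumb, not a threshold given in the slides.

```python
import math


def pearson_r(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(u)
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    s_uv = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    s_uu = sum((a - u_mean) ** 2 for a in u)
    s_vv = sum((b - v_mean) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# Hypothetical regressors that move almost in lockstep
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x1, x2)
collinearity_warning = abs(r) > 0.8   # assumed rule-of-thumb threshold

print(round(r, 4), collinearity_warning)
```

A correlation this close to 1 means the two regressors carry nearly the same information, which is the double-counting problem described above.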


GOODNESS-OF-FIT

• The coefficient of determination R² never decreases (and typically increases) with the inclusion of additional regressors

• Thus, a proper indicator is the adjusted R², which accounts for the number of explanatory variables (k) in relation to the number of observations (n)

$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$

$0 \le R^2 \le 1$
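The adjustment can be checked against the Excel example in these slides, which reports a regression sum of squares of 7997.08, a total sum of squares of 10918.98, n = 47 observations and k = 4 explanatory variables. The sketch below assumes R² is the ratio of those two sums of squares; the function name is illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Sums of squares as reported in the Excel ANOVA table of the example
ss_regression = 7997.08
ss_total = 10918.98

r2 = ss_regression / ss_total
adj_r2 = adjusted_r_squared(r2, n=47, k=4)

print(round(r2, 3), round(adj_r2, 3))
```

The two rounded values agree with the "R squared" and "adjusted R squared" entries in the Excel summary (0.732 and 0.707).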


Application of multiple regression with EXCEL

FILE: esregress.xls

where:

cons_elett = electricity consumption for air conditioning

Tmax = maximum recorded temperature

Tmin = minimum recorded temperature

velvento = wind speed (greater or less than 6 knots)

nuvole = degree of cloud cover

obs.  cons_elett  Tmax  Tmin  velvento  nuvole
1     45          87    68    1         2.0
2     73          90    70    1         1.0
3     43          88    68    1         1.0
4     61          88    69    1         1.5
5     52          86    69    1         2.0
6     56          91    75    1         2.0
7     70          91    76    1         1.5
8     69          90    73    1         2.0
9     53          79    72    0         3.0
10    51          76    63    0         0.0
11    39          83    57    0         0.0
12    55          86    61    1         1.0
13    55          85    70    1         2.0
14    57          89    69    0         2.0
15    68          88    72    1         1.5
16    73          85    73    0         3.0
17    57          84    68    1         3.0
18    51          83    69    0         2.0
19    55          81    70    0         1.0
20    56          89    70    1         1.5
21    72          88    69    1         0.0
22    73          88    76    1         2.5
23    69          77    66    1         3.0
24    38          75    65    1         2.5
25    50          72    64    1         3.0
26    37          68    65    1         3.0
27    43          71    67    0         3.0
28    42          75    66    1         3.0
29    25          74    52    1         0.0
30    31          77    51    0         0.0
31    31          79    50    0         0.0
32    32          80    50    0         0.0
33    35          80    53    0         0.0
34    32          81    53    1         0.0
35    34          80    53    0         0.0
36    35          81    54    1         2.0
37    41          83    67    0         2.0
38    51          84    67    1         1.5
39    34          80    63    1         3.0
40    19          73    53    1         1.0
41    19          71    49    0         0.0
42    30          72    56    1         3.0
43    23          72    53    1         0.0
44    35          79    48    1         0.0
45    29          84    63    1         1.0
46    55          74    62    0         3.0
47    56          83    72    1         2.5


SUMMARY OUTPUT

Regression statistics
Multiple R          0.856
R squared           0.732
Adjusted R squared  0.707
Standard error      8.341
Observations        47

ANALYSIS OF VARIANCE

            df  SS        MS       F      Significance F
Regression  4   7997.08   1999.27  28.74  0.00000
Residual    42  2921.90   69.57
Total       46  10918.98
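The entries of the summary and ANOVA tables are linked by simple arithmetic, which the sketch below reproduces from the reported sums of squares and degrees of freedom. It assumes the standard relations (mean square = sum of squares / degrees of freedom, F = regression mean square / residual mean square, standard error = square root of the residual mean square); the variable names are illustrative.

```python
import math

# Values as reported in the ANOVA table
df_regression, df_residual = 4, 42
ss_regression, ss_residual, ss_total = 7997.08, 2921.90, 10918.98

ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual
f_statistic = ms_regression / ms_residual       # overall F test
std_error = math.sqrt(ms_residual)              # standard error of the regression
r_squared = ss_regression / ss_total            # share of variance explained
multiple_r = math.sqrt(r_squared)               # "Multiple R"

print(round(ms_regression, 2), round(ms_residual, 2), round(f_statistic, 2))
print(round(std_error, 3), round(r_squared, 3), round(multiple_r, 3))
```

Up to rounding, these reproduce the MS, F, standard error, R squared and multiple R figures shown in the output.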

           Coefficients  Standard error  t Stat  P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  -85.05        16.56           -5.14   0.0000   -118.46    -51.64     -118.46      -51.64
Tmax       0.62          0.32            1.95    0.0574   -0.02      1.25       -0.02        1.25
Tmin       1.31          0.30            4.45    0.0001   0.72       1.91       0.72         1.91
velvento   -1.96         2.71            -0.72   0.4735   -7.42      3.51       -7.42        3.51
nuvole     -0.19         1.75            -0.11   0.9160   -3.71      3.34       -3.71        3.34
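For readers without Excel, the same kind of fit can be sketched in plain Python by solving the normal equations (X'X)a = X'y. This is a minimal illustration rather than the method Excel uses internally: the data are transcribed from the table in these slides, the helper names are assumptions, and the printed coefficients should be compared with the Excel output above rather than taken as given.

```python
cons_elett = [45, 73, 43, 61, 52, 56, 70, 69, 53, 51, 39, 55, 55, 57, 68, 73,
              57, 51, 55, 56, 72, 73, 69, 38, 50, 37, 43, 42, 25, 31, 31, 32,
              35, 32, 34, 35, 41, 51, 34, 19, 19, 30, 23, 35, 29, 55, 56]
tmax = [87, 90, 88, 88, 86, 91, 91, 90, 79, 76, 83, 86, 85, 89, 88, 85,
        84, 83, 81, 89, 88, 88, 77, 75, 72, 68, 71, 75, 74, 77, 79, 80,
        80, 81, 80, 81, 83, 84, 80, 73, 71, 72, 72, 79, 84, 74, 83]
tmin = [68, 70, 68, 69, 69, 75, 76, 73, 72, 63, 57, 61, 70, 69, 72, 73,
        68, 69, 70, 70, 69, 76, 66, 65, 64, 65, 67, 66, 52, 51, 50, 50,
        53, 53, 53, 54, 67, 67, 63, 53, 49, 56, 53, 48, 63, 62, 72]
velvento = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0,
            1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0,
            0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
nuvole = [2.0, 1.0, 1.0, 1.5, 2.0, 2.0, 1.5, 2.0, 3.0, 0.0, 0.0, 1.0, 2.0, 2.0, 1.5, 3.0,
          3.0, 2.0, 1.0, 1.5, 0.0, 2.5, 3.0, 2.5, 3.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 2.0, 2.0, 1.5, 3.0, 1.0, 0.0, 3.0, 0.0, 0.0, 1.0, 3.0, 2.5]


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_multiple_ols(columns, y):
    """Return (coefficients, fitted values); the first coefficient is the intercept."""
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    coefs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in X]
    return coefs, fitted


coefs, fitted = fit_multiple_ols([tmax, tmin, velvento, nuvole], cons_elett)
residuals = [yi - fi for yi, fi in zip(cons_elett, fitted)]
mean_y = sum(cons_elett) / len(cons_elett)
sse = sum(e * e for e in residuals)
sst = sum((yi - mean_y) ** 2 for yi in cons_elett)
r2 = 1.0 - sse / sst

for name, c in zip(["Intercept", "Tmax", "Tmin", "velvento", "nuvole"], coefs):
    print(name, round(c, 2))
print("SST =", round(sst, 2), "SSE =", round(sse, 2), "R2 =", round(r2, 3))
```

Solving the normal equations directly is fine for a small, well-behaved data set like this one; dedicated statistical software generally uses numerically more robust decompositions.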


[Figure: Comparison between actual and estimated values from the multiple regression model. Line chart over observations 1 to 47 (horizontal axis), vertical axis from 0 to 80; series: Predicted cons_elett, cons_elett]