Chapter 6
Multiple Regression I
許湘伶
Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li)
Multiple regression analysis is one of the most widely used of all statistical methods.
a variety of multiple regression models
basic statistical results in matrix form
6.1 Multiple Regression Models
First-Order Models with Two Predictor Variables
Two predictor variables: X1, X2
Yi = β0 + β1Xi1 + β2Xi2 + εi
first-order model with two predictor variables: linear in the predictor variables
Assume E{εi} = 0:
E{Y} = β0 + β1X1 + β2X2
called a regression surface or a response surface
Figure: Response function is a plane (Sales Promotion example).
Meaning of Regression Coefficients
β0: the intercept of the regression plane; the mean response when X1 = 0 and X2 = 0
β1: the change in the mean response with ΔX1 = 1 and X2 held constant; ∂E{Y}/∂X1 = β1
β2: the change in the mean response with ΔX2 = 1 and X1 held constant; ∂E{Y}/∂X2 = β2
β1, β2: partial regression coefficients, because they reflect the partial effect of one predictor when the other is held constant
X1 and X2 have additive effects (they do not interact) in the model Yi = β0 + β1Xi1 + β2Xi2 + εi
First-Order Model with More than Two Predictor Variables
The first-order regression model with p − 1 predictor variables:

Yi = β0 + β1Xi1 + β2Xi2 + · · · + βp−1Xi,p−1 + εi
   = β0 + Σ_{k=1}^{p−1} βk Xik + εi
   = Σ_{k=0}^{p−1} βk Xik + εi,  where Xi0 ≡ 1
Assuming E{εi} = 0, the response function is a hyperplane:

E{Y} = β0 + β1X1 + β2X2 + · · · + βp−1Xp−1
A hyperplane is the response surface in more than two dimensions; it is no longer possible to picture it
βk: the change in the mean response E{Y} with ΔXk = 1 when all other predictor variables are held constant
the predictor effects are additive and do not interact
General Linear Regression Model
The general linear regression model with normal error terms:
Yi = β0 + β1Xi1 + β2Xi2 + · · · + βp−1Xi,p−1 + εi,  i = 1, . . . , n
   = β0Xi0 + β1Xi1 + β2Xi2 + · · · + βp−1Xi,p−1 + εi  (Xi0 ≡ 1)
   = Σ_{k=0}^{p−1} βk Xik + εi,  where Xi0 ≡ 1
parameters: β0, β1, . . . , βp−1
known constants: Xi1, . . . , Xi,p−1
εi: independent N(0, σ²)
Special cases encompassed by the model include:
Qualitative predictor variables: predictors may be quantitative or qualitative
Polynomial regression: X, X²
Transformed variables: log Y
Interaction effects: X1X2
Combination of cases: X1², X2², X1X2
Qualitative Predictor Variables
Illustration: Y = length of hospital stay; X1 = age; X2 = gender
Yi = β0 + β1Xi1 + β2Xi2 + εi
Xi1: patient’s age
Xi2 = 1 if the patient is female, 0 if the patient is male
The response function: E{Y} = β0 + β1X1 + β2X2
The response function for male patients (X2 = 0): E{Y} = β0 + β1X1
The response function for female patients (X2 = 1): E{Y} = (β0 + β2) + β1X1
(More on qualitative predictors in Chap. 8.)
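A minimal sketch of this coding in Python/NumPy (the data and variable names are hypothetical, invented for illustration): the 0/1 indicator enters the design matrix as an ordinary column, and the two parallel response functions fall out of the fitted coefficients.

```python
import numpy as np

# Hypothetical data: stay length Y, age X1, gender indicator X2 (1 = female)
age    = np.array([25, 40, 33, 58, 47, 62], dtype=float)
female = np.array([ 1,  0,  1,  0,  1,  0], dtype=float)
Y      = np.array([ 4,  6,  5, 10,  8, 11], dtype=float)

# Design matrix with an intercept column (Xi0 = 1)
X = np.column_stack([np.ones_like(age), age, female])

# Least squares fit: b = (X'X)^{-1} X'Y
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
b0, b1, b2 = b

# Two parallel response functions that differ only in intercept
print(f"male:   E{{Y}} = {b0:.2f} + {b1:.3f} X1")
print(f"female: E{{Y}} = {b0 + b2:.2f} + {b1:.3f} X1")
```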
Polynomial Regression (Chap. 8)
Special cases of the general linear regression model
Squared and higher-order terms of the predictor variables ⇒ a curvilinear response function
Yi = β0 + β1Xi + β2Xi² + εi, which has the form Yi = β0 + β1Xi1 + β2Xi2 + εi with Xi1 = Xi and Xi2 = Xi²
Transformed Variables
Allow complex, curvilinear response functions to be fitted, e.g.:

log Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + εi  (Y′i = log Yi)

Yi = 1/(β0 + β1Xi1 + β2Xi2 + εi)  (Y′i = 1/Yi)
Interaction Effects and Combination of Cases
An example of a (more complex) nonadditive regression model with X1 and X2:
Yi = β0 + β1Xi1 + β2Xi2 + β3Xi1Xi2 + εi
= β0 + β1Xi1 + β2Xi2 + β3Xi3 + εi, (Xi3 = Xi1Xi2)
Combining squared and cross-product terms:

Yi = β0 + β1Xi1 + β2Xi1² + β3Xi2 + β4Xi2² + β5Xi1Xi2 + εi
   = β0 + β1Zi1 + β2Zi2 + β3Zi3 + β4Zi4 + β5Zi5 + εi

where Zi1 = Xi1, Zi2 = Xi1², Zi3 = Xi2, Zi4 = Xi2², Zi5 = Xi1Xi2
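A brief sketch (hypothetical values) of how the Z columns could be built from X1 and X2; note the model stays linear in the parameters even though it is curvilinear in the predictors.

```python
import numpy as np

# Hypothetical predictor columns
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Columns for Yi = b0 + b1*Z1 + b2*Z2 + b3*Z3 + b4*Z4 + b5*Z5 + ei
Z = np.column_stack([
    np.ones_like(X1),   # intercept
    X1,                 # Z1 = X1
    X1 ** 2,            # Z2 = X1^2
    X2,                 # Z3 = X2
    X2 ** 2,            # Z4 = X2^2
    X1 * X2,            # Z5 = X1*X2 (cross-product / interaction term)
])
print(Z)
```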
Figure: Additional examples of response functions.
Meaning of Linear in General Linear Regression Model
A regression model is linear in the parameters when it can be written as:
Yi = ci0β0 + ci1β1 + ci2β2 + · · ·+ ci,p−1βp−1 + εi
where the cik, k = 0, . . . , p − 1, are coefficients involving the predictor variables
Illustration of a model that is not linear in the parameters:
Yi = β0 exp(β1Xi) + εi
6.2 General Linear Regression Model in Matrix Terms
The general linear regression model:
Yi = β0 + β1Xi1 + β2Xi2 + · · ·+ βp−1Xi,p−1 + εi, i = 1, . . . , n
where

Y (n×1) = [Y1, Y2, . . . , Yn]′

X (n×p) =
[ 1  X11  X12  · · ·  X1,p−1 ]
[ 1  X21  X22  · · ·  X2,p−1 ]
[ . . .                      ]
[ 1  Xn1  Xn2  · · ·  Xn,p−1 ]

β (p×1) = [β0, β1, . . . , βp−1]′

ε (n×1) = [ε1, ε2, . . . , εn]′
⇒ Y = Xβ + ε  (Y: n×1; X: n×p; β: p×1; ε: n×1)

Y: vector of responses
β: vector of parameters
X: matrix of constants
ε: vector of independent normal random variables with E{ε} = 0 and σ²{ε} (n×n) = σ²I

Expectation and variance-covariance matrix of Y:

E{Y} = Xβ (n×1);  σ²{Y} = σ²I (n×n)
6.3 Estimation of Regression Coefficients
The general linear regression model:
Yi = β0 + β1Xi1 + β2Xi2 + · · ·+ βp−1Xi,p−1 + εi, i = 1, . . . , n
The least squares criterion: the estimates of β0, . . . , βp−1 are the values that minimize

Q = Σ_{i=1}^{n} (Yi − β0 − β1Xi1 − · · · − βp−1Xi,p−1)²
The vector of least squares estimated regression coefficients:

b (p×1) = [b0, b1, . . . , bp−1]′

The normal equations:

X′Xb = X′Y

LSE:

b = (X′X)⁻¹X′Y  ((X′X)⁻¹: p×p; X′Y: p×1)
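A minimal NumPy sketch of this computation on simulated data; it solves the normal equations X′Xb = X′Y directly with a linear solver, a standard and numerically safer alternative to forming (X′X)⁻¹ explicitly.

```python
import numpy as np

# Simulated data: n observations, p columns including the intercept
rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, p - 1))])
beta = np.array([2.0, 0.5, -1.0])
Y = X @ beta + rng.normal(0, 1, n)

# Solve the normal equations X'X b = X'Y for b
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)   # should be close to beta
```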
For the normal error regression model, the method of MLE leads to the same estimators
The likelihood function:
L(β, σ²) = [1/(2πσ²)^{n/2}] exp[ −(1/(2σ²)) Σ_{i=1}^{n} (Yi − β0 − β1Xi1 − · · · − βp−1Xi,p−1)² ]
Maximizing the likelihood function with respect to β0, β1, . . . , βp−1 leads to the estimators b
6.4 Fitted Values and Residuals
The vector of fitted values Ŷi and the vector of residuals ei = Yi − Ŷi:

Ŷ (n×1) = [Ŷ1, Ŷ2, . . . , Ŷn]′ = Xb = HY,  where H (n×n) = X(X′X)⁻¹X′

e (n×1) = [e1, e2, . . . , en]′ = Y − Ŷ = Y − Xb = (I − H)Y (= (I − H)ε)
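A short sketch (simulated data) that computes H, Ŷ, and e and checks numerically that the hat matrix is symmetric and idempotent.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), rng.uniform(size=n), rng.uniform(size=n)])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 0.3, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix H = X(X'X)^{-1}X'
Y_hat = H @ Y                           # fitted values  Y^ = HY
e = (np.eye(n) - H) @ Y                 # residuals      e = (I - H)Y

# H is symmetric and idempotent
assert np.allclose(H, H.T) and np.allclose(H @ H, H)
```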
The variance-covariance matrix of the residuals:

σ²{e} (n×n) = σ²(I − H),  estimated by s²{e} (n×n) = MSE(I − H)
6.5 Analysis of Variance Results
The sums of squares for ANOVA in matrix terms:
SSTO = Y′Y − (1/n)Y′JY = Y′[I − (1/n)J]Y

SSE = e′e = (Y − Xb)′(Y − Xb) = Y′Y − b′X′Y = Y′(I − H)Y

SSR = b′X′Y − (1/n)Y′JY = Y′[H − (1/n)J]Y

MSR = SSR/(p − 1),  MSE = SSE/(n − p)
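The quadratic forms translate directly into code; a sketch on simulated data, verifying the decomposition SSTO = SSR + SSE.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.8, -0.6]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
J = np.ones((n, n))                         # n x n matrix of ones

SSTO = Y @ Y - (1 / n) * (Y @ J @ Y)        # total sum of squares
SSE  = Y @ Y - b @ X.T @ Y                  # error sum of squares
SSR  = b @ X.T @ Y - (1 / n) * (Y @ J @ Y)  # regression sum of squares
assert np.isclose(SSTO, SSR + SSE)

MSR, MSE = SSR / (p - 1), SSE / (n - p)
print(MSR, MSE)
```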
Table: ANOVA Table for the General Linear Regression Model (6.19).

Source of Variation    SS                         df      MS
Regression             SSR = b′X′Y − (1/n)Y′JY    p − 1   MSR = SSR/(p − 1)
Error                  SSE = Y′Y − b′X′Y          n − p   MSE = SSE/(n − p)
Total                  SSTO = Y′Y − (1/n)Y′JY     n − 1
E{MSE} = σ²
For p − 1 = 2:

E{MSR} = σ² + (1/2)[β1² Σ(Xi1 − X̄1)² + β2² Σ(Xi2 − X̄2)² + 2β1β2 Σ(Xi1 − X̄1)(Xi2 − X̄2)]

If β1 = β2 = 0, then E{MSR} = σ²; otherwise E{MSR} > σ²
F Test for Regression Relation
Test whether there is a regression relation between Y and the set of X variables:

H0: β1 = β2 = · · · = βp−1 = 0
Ha: not all βk (k = 1, . . . , p − 1) equal zero

⇒ test statistic: F* = MSR/MSE
The decision rule to control the Type I error at α:
If F* ≤ F(1 − α; p − 1, n − p), conclude H0
If F* > F(1 − α; p − 1, n − p), conclude Ha
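A sketch of the decision rule at α = 0.05 on simulated data (the F percentile comes from scipy.stats).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.8, -0.6]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = Y @ Y - b @ X.T @ Y
SSR = b @ X.T @ Y - n * Y.mean() ** 2      # (1/n) Y'JY = n * Ybar^2
F_star = (SSR / (p - 1)) / (SSE / (n - p))

F_crit = stats.f.ppf(1 - 0.05, p - 1, n - p)   # alpha = 0.05
print("conclude", "Ha" if F_star > F_crit else "H0")
```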
Coefficient of Multiple Determination
The coefficient of multiple determination R²:

R² = SSR/SSTO = 1 − SSE/SSTO

Measures the proportionate reduction of total variation in Y associated with X1, . . . , Xp−1

0 ≤ R² ≤ 1

Adding more X variables ⇒ R² ↑:
SSE can never become larger with more X variables
SSTO is always the same for a given set of responses
The adjusted coefficient of multiple determination R²a:

R²a = 1 − (SSE/(n − p)) / (SSTO/(n − 1)) = 1 − ((n − 1)/(n − p)) (SSE/SSTO)

R²a may become smaller when another X is introduced into the model: the decrease in SSE may be more than offset by the loss of a degree of freedom in n − p
Coefficient of multiple correlation: R = √R²
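A small numeric sketch; the SSE and SSTO values here are purely illustrative stand-ins for quantities computed as in the ANOVA sketch above.

```python
import numpy as np

# Illustrative values for SSE, SSTO (stand-ins for computed quantities)
n, p, SSE, SSTO = 25, 3, 19.8, 48.3

R2  = 1 - SSE / SSTO
R2a = 1 - (n - 1) / (n - p) * SSE / SSTO   # penalizes extra X variables
R   = np.sqrt(R2)                          # coefficient of multiple correlation
print(R2, R2a, R)
```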
6.6 Inferences about Regression Parameters
Unbiased: E{b} = β

The variance-covariance matrix σ²{b}:

σ²{b} (p×p) =
[ σ²{b0}        σ{b0, b1}     · · ·  σ{b0, bp−1} ]
[ σ{b1, b0}     σ²{b1}        · · ·  σ{b1, bp−1} ]
[ . . .                                          ]
[ σ{bp−1, b0}   σ{bp−1, b1}   · · ·  σ²{bp−1}    ]
= σ²(X′X)⁻¹
The estimated variance-covariance matrix s²{b}:

s²{b} (p×p) =
[ s²{b0}        s{b0, b1}     · · ·  s{b0, bp−1} ]
[ s{b1, b0}     s²{b1}        · · ·  s{b1, bp−1} ]
[ . . .                                          ]
[ s{bp−1, b0}   s{bp−1, b1}   · · ·  s²{bp−1}    ]
= MSE(X′X)⁻¹
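A sketch computing s²{b} and the standard errors s{bk} on simulated data.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)

s2_b = MSE * XtX_inv                 # s^2{b} = MSE (X'X)^{-1}
se_b = np.sqrt(np.diag(s2_b))        # estimated standard errors s{bk}
print(se_b)
```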
Interval Estimation of βk
For the normal error regression model:
(bk − βk)/s{bk} ~ t(n − p),  k = 0, 1, . . . , p − 1
The confidence limits for βk with 1− α confidence coefficient:
bk ± t(1 − α/2; n − p) s{bk}
Tests for βk:
H0: βk = 0  vs.  Ha: βk ≠ 0

⇒ t* = bk/s{bk}
⇒ The decision rule:
If |t∗| ≤ t(1− α/2; n− p), conclude H0
Otherwise conclude Ha
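A sketch of both the confidence limits and the test; b and s{bk} here are illustrative values standing in for the output of a fit such as the s²{b} sketch above.

```python
import numpy as np
from scipy import stats

# Illustrative values for b and s{bk} (stand-ins for a fitted model)
n, p = 40, 3
b    = np.array([1.10, 0.48, -0.22])
se_b = np.array([0.20, 0.05, 0.06])

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - p)

# 1 - alpha confidence limits for each beta_k
ci = np.column_stack([b - t_crit * se_b, b + t_crit * se_b])

# t* = bk / s{bk}; conclude Ha: beta_k != 0 when |t*| > t(1 - alpha/2; n - p)
t_star = b / se_b
print(ci, np.abs(t_star) > t_crit)
```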
Joint Inferences
The Bonferroni joint confidence intervals for g parameters (g ≤ p) with family confidence coefficient 1 − α:

bk ± B s{bk},  where B = t(1 − α/(2g); n − p)
(Chap. 7: tests concerning subsets of the regression parameters)
6.7 Estimate of Mean Response and Prediction of New Observation
Interval estimation of E{Yh}
Given values of X1, . . . , Xp−1 denoted by Xh1, . . . , Xh,p−1, define

Xh (p×1) = [1, Xh1, . . . , Xh,p−1]′
The mean response E{Yh}:
E{Yh} = X′hβ
The estimated mean response Ŷh:

Ŷh = X′h b  ⇒  E{Ŷh} = X′h β = E{Yh}  (unbiased)

σ²{Ŷh} = σ² X′h(X′X)⁻¹Xh = X′h σ²{b} Xh

The estimated variance s²{Ŷh}:

s²{Ŷh} = MSE · X′h(X′X)⁻¹Xh = X′h s²{b} Xh

The 1 − α confidence limits for E{Yh}:

Ŷh ± t(1 − α/2; n − p) s{Ŷh}
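A sketch computing Ŷh, s{Ŷh}, and the 1 − α confidence limits at an arbitrarily chosen Xh (simulated data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 5, (n, p - 1))])
Y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)

Xh = np.array([1.0, 2.5, 1.5])            # (1, Xh1, Xh2), arbitrary levels
Yh_hat = Xh @ b                           # estimated mean response Yh^
s_Yh = np.sqrt(MSE * Xh @ XtX_inv @ Xh)   # s{Yh^}

t_crit = stats.t.ppf(1 - 0.05 / 2, n - p)
print(Yh_hat - t_crit * s_Yh, Yh_hat + t_crit * s_Yh)
```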
Confidence region for regression surface
The Working-Hotelling confidence band for the regression line extends to the regression surface
Boundary points of the confidence region at Xh:

Ŷh ± W s{Ŷh},  where W² = pF(1 − α; p, n − p)
Simultaneous Confidence Intervals for Several Mean Responses
To estimate E{Yh} at several different Xh vectors with family confidence coefficient 1 − α:

Working-Hotelling confidence region bounds:

Ŷh ± W s{Ŷh}

Bonferroni simultaneous confidence intervals (g interval estimates):

Ŷh ± B s{Ŷh},  where B = t(1 − α/(2g); n − p)
Prediction of New Observation Yh(new)
The 1 − α prediction limits for a new observation Yh(new) at Xh:

Ŷh ± t(1 − α/2; n − p) s{pred},  where s²{pred} = MSE + s²{Ŷh} = MSE(1 + X′h(X′X)⁻¹Xh)

Prediction limits for the mean of m new observations at Xh:

Ŷh ± t(1 − α/2; n − p) s{predmean},  where s²{predmean} = MSE/m + s²{Ŷh} = MSE(1/m + X′h(X′X)⁻¹Xh)
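A sketch of both kinds of prediction limits; MSE, Ŷh, and s²{Ŷh} here are illustrative values standing in for a fitted model.

```python
import numpy as np
from scipy import stats

# Illustrative values (stand-ins for a fitted model at a chosen Xh)
n, p, MSE = 30, 3, 1.1
Yh_hat, s2_Yh = 6.2, 0.08      # Yh^ and s^2{Yh^}

t_crit = stats.t.ppf(1 - 0.05 / 2, n - p)

# Single new observation: s^2{pred} = MSE + s^2{Yh^}
s_pred = np.sqrt(MSE + s2_Yh)
print(Yh_hat - t_crit * s_pred, Yh_hat + t_crit * s_pred)

# Mean of m new observations: s^2{predmean} = MSE/m + s^2{Yh^}
m = 4
s_predmean = np.sqrt(MSE / m + s2_Yh)
print(Yh_hat - t_crit * s_predmean, Yh_hat + t_crit * s_predmean)
```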
Predictions of g new observations
For g new observations at g different levels Xh with family confidence coefficient 1 − α (Scheffé):

Ŷh ± S s{pred},  where S² = gF(1 − α; g, n − p)

Bonferroni simultaneous prediction limits:

Ŷh ± B s{pred},  where B = t(1 − α/(2g); n − p)
Caution about Hidden Extrapolations (Chap. 10)
Danger: the model may not be appropriate when it is extended outside the region of the observations. In multiple regression, it is particularly easy to lose track of this region, since the levels of X1, . . . , Xp−1 jointly define the region.
Figure: Region of observations on X1 and X2 jointly, compared with ranges of X1 and X2 individually.
6.8 Diagnostics and Remedial Measures
Similar to those for the simple linear regression model given in Chap. 3
Some important additional ones: Chaps. 10, 11
Figure: SYGRAPH scatter plot matrix and correlation matrix (Dwaine Studios example).
Scatter Plots
Residual Plots
Correlation Test for Normality
Test for Constancy of Error Variance
F Test for Lack of Fit
Box-Cox transformations
F test of lack of fit
To test whether the multiple regression response function
E{Y} = β0 + β1X1 + · · ·+ βp−1Xp−1
is an appropriate response surface
Requires repeat observations
SSE is decomposed into pure error (PE) and lack-of-fit (LOF) components
Testing:
H0: E{Y} = β0 + β1X1 + · · · + βp−1Xp−1
Ha: E{Y} ≠ β0 + β1X1 + · · · + βp−1Xp−1
The test statistic:

F* = (SSLF/(c − p)) ÷ (SSPE/(n − c)) = MSLF/MSPE

where c is the number of distinct combinations of the X levels
⇒ If F* ≤ F(1 − α; c − p, n − c), conclude H0
   If F* > F(1 − α; c − p, n − c), conclude Ha
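A sketch of the decomposition and test; for brevity it uses a single predictor (p = 2) with hypothetical repeat observations, but the computation is identical in multiple regression with c distinct combinations of X levels.

```python
import numpy as np
from scipy import stats

# Hypothetical data with repeat observations at c = 3 distinct X levels
X_lev = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0])
Y     = np.array([2.1, 2.3, 3.9, 4.2, 4.0, 6.2, 5.8])
n, p = len(Y), 2                       # intercept + one predictor
c = len(np.unique(X_lev))

X = np.column_stack([np.ones(n), X_lev])
b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = np.sum((Y - X @ b) ** 2)

# Pure error: deviations of Y around the mean at each distinct level
SSPE = sum(np.sum((Y[X_lev == x] - Y[X_lev == x].mean()) ** 2)
           for x in np.unique(X_lev))
SSLF = SSE - SSPE                      # lack-of-fit component

F_star = (SSLF / (c - p)) / (SSPE / (n - c))
F_crit = stats.f.ppf(1 - 0.05, c - p, n - c)
print("conclude", "Ha (lack of fit)" if F_star > F_crit else "H0")
```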