
Statistics 203: Introduction to Regression and Analysis of Variance

Multiple Linear Regression: Inference & Polynomial

Jonathan Taylor


Today

■ Inference: trying to “reduce” the model.
■ Polynomial regression.
■ Splines + other bases.


Summary of last class

$$Y_{n\times 1} = X_{n\times p}\,\beta_{p\times 1} + \varepsilon_{n\times 1}$$

$$\hat{Y} = HY, \qquad H = X(X^tX)^{-1}X^t$$

$$e = (I - H)Y$$

$$\|e\|^2 \sim \sigma^2\chi^2_{n-p}$$

■ Generally, if $P$ is a projection onto a subspace $L$ such that $P(X\beta) = 0$, then
$$\|PY\|^2 = \|P(X\beta + \varepsilon)\|^2 = \|P\varepsilon\|^2 \sim \sigma^2\chi^2_{\dim L}.$$
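As a numerical companion to these facts, here is a minimal Python sketch (not part of the original slides; the simulated data and all names are mine) that forms the hat matrix and checks that $\|e\|^2/(n-p)$ estimates $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, 2.0, -1.0, 0.5])
sigma = 2.0
Y = X @ beta + sigma * rng.standard_normal(n)

# Hat matrix H = X (X'X)^{-1} X' and residual vector e = (I - H) Y.
H = X @ np.linalg.solve(X.T @ X, X.T)
e = (np.eye(n) - H) @ Y

# ||e||^2 ~ sigma^2 * chi^2_{n-p}, which has mean sigma^2 * (n - p),
# so ||e||^2 / (n - p) is the usual unbiased estimate of sigma^2.
print("estimate of sigma^2:", e @ e / (n - p))  # should be near 4.0
```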


$R^2$ for multiple regression

$$SSE = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \|Y - \hat{Y}\|^2$$

$$SSR = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 = \|\hat{Y} - \bar{Y}\mathbf{1}\|^2$$

$$SST = \sum_{i=1}^n (Y_i - \bar{Y})^2 = \|Y - \bar{Y}\mathbf{1}\|^2$$

$$R^2 = \frac{SSR}{SST}$$


Adjusted $R^2$

■ As we add more and more variables to the model, even purely random ones, $R^2$ will go to 1.

■ Adjusted $R^2$ tries to take this into account by replacing sums of squares by “mean” squares:

$$R^2_a = 1 - \frac{SSE/(n-p)}{SST/(n-1)} = 1 - \frac{MSE}{MST}.$$

■ Here is an example.
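The example shown in class is not reproduced in the transcript; the following Python sketch (simulated data, all names mine) illustrates the same point: appending pure-noise columns drives $R^2$ toward 1 while adjusted $R^2$ stays near its honest value.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.standard_normal(n)
Y = 1 + 2 * x + rng.standard_normal(n)

def r2_and_adjusted(X, Y):
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sse = np.sum((Y - X @ beta_hat) ** 2)
    sst = np.sum((Y - Y.mean()) ** 2)
    return 1 - sse / sst, 1 - (sse / (n - p)) / (sst / (n - 1))

base = np.column_stack([np.ones(n), x])
for k in (0, 10, 20, 40):
    # append k pure-noise columns to the design
    Xk = np.column_stack([base, rng.standard_normal((n, k))]) if k else base
    r2, r2a = r2_and_adjusted(Xk, Y)
    print(f"{k:2d} noise columns: R^2 = {r2:.3f}, adjusted R^2 = {r2a:.3f}")
```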


Inference in multiple regression

■ $F$-statistics.
■ Dropping a subset of variables.
■ General linear hypothesis.


Testing $H_0 : \beta_2 = 0$

■ Can be tested with a $t$-test:

$$T = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)}.$$

■ Alternatively, using an $F$-test with a “full” and a “reduced” model:
◆ (F) $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$
◆ (R) $Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$

■ $F$-statistic: under $H_0 : \beta_2 = 0$,

$$SSE_F = \|Y - \hat{Y}_F\|^2 \sim \sigma^2\chi^2_{n-3}$$

$$SSE_R = \|Y - \hat{Y}_R\|^2 \sim \sigma^2\chi^2_{n-2}$$

$$SSE_R - SSE_F = \|\hat{Y}_F - \hat{Y}_R\|^2 \sim \sigma^2\chi^2_{1}$$

and $SSE_R - SSE_F$ is independent of $SSE_F$ (see details).


Testing $H_0 : \beta_2 = 0$

■ Under $H_0$,

$$F = \frac{(SSE_R - SSE_F)/1}{SSE_F/(n-3)} \sim F_{1,n-3}.$$

■ Reject $H_0$ at level $\alpha$ if $F > F_{1,n-3,1-\alpha}$.
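A minimal sketch of this full-vs-reduced test in Python, assuming numpy and scipy (simulated data, names mine); note that the $t$-statistic for $\beta_2$ satisfies $T^2 = F$ here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40
X1, X2 = rng.standard_normal(n), rng.standard_normal(n)
Y = 1 + 2 * X1 + rng.standard_normal(n)        # beta_2 = 0: H0 is true

def sse(X, Y):
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta_hat
    return resid @ resid

XF = np.column_stack([np.ones(n), X1, X2])     # full model
XR = np.column_stack([np.ones(n), X1])         # reduced model

F = (sse(XR, Y) - sse(XF, Y)) / (sse(XF, Y) / (n - 3))
print("F =", F, "p-value =", stats.f.sf(F, 1, n - 3))
```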


Some details

■ $SSE_F \sim \sigma^2\chi^2_{n-3}$ if the full model is correct, and $SSE_R \sim \sigma^2\chi^2_{n-2}$ if $H_0$ is correct, because

$$H_F Y = H_F(X\beta + \varepsilon) = X\beta + H_F\varepsilon$$

$$H_R Y = H_R(X\beta + \varepsilon) = X\beta + H_R\varepsilon \quad (\text{under } H_0).$$

If $H_0$ is false, $SSE_R$ is $\sigma^2$ times a non-central $\chi^2_{n-2}$.

■ Why is $SSE_R - SSE_F$ independent of $SSE_F$?

$$\begin{aligned}
SSE_R - SSE_F &= \|Y - H_R Y\|^2 - \|Y - H_F Y\|^2\\
&= \|H_R Y - H_F Y\|^2 \quad (\text{Pythagoras})\\
&= \|(H_R - H_F)\varepsilon\|^2 \quad (\text{under } H_0).
\end{aligned}$$

$(H_R - H_F)\varepsilon$ lies in $L_F$, the subspace of the full model, while $e_F = (I - H_F)\varepsilon$ lies in $L_F^{\perp}$, the orthogonal complement of the full model; therefore $e_F$ is independent of $(H_R - H_F)\varepsilon$.


Overall goodness of fit

■ Testing $H_0 : \beta_1 = \beta_2 = 0$.

■ Two models:
◆ (F) $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$
◆ (R) $Y_i = \beta_0 + \varepsilon_i$

■ $F$-statistic, under $H_0$:

$$F = \frac{(SSE_R - SSE_F)/2}{SSE_F/(n-3)} = \frac{\|(H_R - H_F)Y\|^2/2}{\|(I - H_F)Y\|^2/(n-3)} \sim F_{2,n-3}.$$

■ Reject $H_0$ if $F > F_{2,n-3,1-\alpha}$.
■ Details: same as before.


Dropping subsets

■ Suppose we have the model

$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$

and we want to test whether we can simplify the model by dropping variables, i.e. testing

$$H_0 : \beta_{j_1} = \cdots = \beta_{j_k} = 0.$$

■ Two models:
◆ (F) the model above
◆ (R) the model with columns $X_{j_1}, \ldots, X_{j_k}$ omitted from the design matrix.

■ Under $H_0$,

$$F = \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F} \sim F_{df_R - df_F,\, df_F}$$

where $df_F$ and $df_R$ are the “residual” degrees of freedom of the two models.
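A generic version of this partial $F$-test as a Python sketch (function and variable names are mine); the overall goodness-of-fit test on the previous slide is the special case of dropping every column except the intercept.

```python
import numpy as np
from scipy import stats

def sse_and_df(X, Y):
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta_hat
    return resid @ resid, len(Y) - X.shape[1]  # assumes X has full column rank

def drop_subset_F(X, Y, drop):
    """Partial F test of H0: coefficients of the columns in `drop` are zero."""
    keep = [j for j in range(X.shape[1]) if j not in set(drop)]
    sse_F, df_F = sse_and_df(X, Y)
    sse_R, df_R = sse_and_df(X[:, keep], Y)
    F = ((sse_R - sse_F) / (df_R - df_F)) / (sse_F / df_F)
    return F, stats.f.sf(F, df_R - df_F, df_F)

# Example: columns 2 and 3 truly have zero coefficients.
rng = np.random.default_rng(3)
n = 60
X = np.column_stack([np.ones(n), rng.standard_normal((n, 4))])
Y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + rng.standard_normal(n)
print(drop_subset_F(X, Y, drop=[2, 3]))
```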


General linear hypothesis

■ On the previous slide we had to fit two models; moreover, we might want to test more than just whether some coefficients are zero.

■ Suppose we want to test

$$H_0 : C_{k\times p}\,\beta_{p\times 1} = h_{k\times 1}.$$

Specifying the reduced model explicitly can be difficult.

■ Under $H_0$,

$$C\hat{\beta} - h \sim N\left(0,\; \sigma^2 C(X^tX)^{-1}C^t\right).$$

■ As long as $C(X^tX)^{-1}C^t$ is invertible,

$$(C\hat{\beta} - h)^t \left(C(X^tX)^{-1}C^t\right)^{-1}(C\hat{\beta} - h) = SSE_R - SSE_F \sim \sigma^2\chi^2_k.$$

■ $F$-statistic:

$$F = \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F} \sim F_{df_R - df_F,\, df_F}.$$
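A sketch of this test in Python using the quadratic-form identity above, with the estimated $\sigma^2$ in the denominator (all names and the example hypothesis are mine):

```python
import numpy as np
from scipy import stats

def general_linear_F(X, Y, C, h):
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)       # SSE_F / df_F
    d = C @ beta_hat - h
    # (C b - h)' (C (X'X)^{-1} C')^{-1} (C b - h) equals SSE_R - SSE_F
    quad = d @ np.linalg.solve(C @ XtX_inv @ C.T, d)
    k = C.shape[0]
    F = (quad / k) / sigma2_hat
    return F, stats.f.sf(F, k, n - p)

# Example: test H0: beta_1 = beta_2, i.e. C = [[0, 1, -1]], h = [0].
rng = np.random.default_rng(4)
n = 50
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
Y = X @ np.array([1.0, 2.0, 2.0]) + rng.standard_normal(n)
print(general_linear_F(X, Y, C=np.array([[0.0, 1.0, -1.0]]), h=np.zeros(1)))
```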


Another fact about multivariate normal

■ Suppose that $Z_{k\times 1} \sim N(0, \Sigma_{k\times k})$ where $\Sigma$ is invertible. Then

$$Z^t\Sigma^{-1}Z \sim \chi^2_k.$$

■ Why? Let $\Sigma^{-1/2}$ be a square root of $\Sigma^{-1}$, i.e. $\Sigma^{-1/2}$ is a symmetric matrix such that

$$\Sigma^{-1/2}\Sigma\,\Sigma^{-1/2} = I_{k\times k}, \qquad \Sigma^{-1/2}\Sigma^{-1/2} = \Sigma^{-1}.$$

■ Then

$$\Sigma^{-1/2}Z \sim N(0, I_{k\times k})$$

and

$$Z^t\Sigma^{-1}Z = \|\Sigma^{-1/2}Z\|^2 \sim \chi^2_k.$$
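A quick Monte Carlo check of this fact, as a Python sketch (the covariance and all names are mine): the empirical quantiles of $Z^t\Sigma^{-1}Z$ should match the $\chi^2_k$ quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
k, reps = 3, 100_000
A = rng.standard_normal((k, k))
Sigma = A @ A.T + k * np.eye(k)           # a generic invertible covariance
L = np.linalg.cholesky(Sigma)
Z = rng.standard_normal((reps, k)) @ L.T  # each row ~ N(0, Sigma)
Q = np.sum(Z * np.linalg.solve(Sigma, Z.T).T, axis=1)  # Z' Sigma^{-1} Z

for q in (0.5, 0.9, 0.99):
    print(q, np.quantile(Q, q), stats.chi2.ppf(q, k))
```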


Polynomial models

■ So far, we have considered models that are linear in the $x$'s.
■ We could instead let the regression model be linear in known functions of $x$, for example polynomials:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_k X_i^k + \varepsilon_i.$$

■ Here is an example.
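The example from class is not reproduced in the transcript; here is a minimal Python sketch (simulated data, names mine) showing that a polynomial fit is just ordinary least squares on an expanded design matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 100, 3
x = np.sort(rng.uniform(-2, 2, n))
Y = np.sin(x) + 0.2 * rng.standard_normal(n)   # a nonlinear truth

# Design matrix with columns 1, x, x^2, ..., x^k: the model is still
# linear in beta, so ordinary least squares applies unchanged.
X = np.vander(x, k + 1, increasing=True)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("fitted coefficients:", beta_hat)
```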


Polynomial models

■ Caution should be used in choosing the degree of the polynomial: it is easy to overfit the model.
■ Useful when there is reason to believe the relation is nonlinear.
■ Easy to add polynomials in more than one variable to the regression: interactions.
■ Although polynomials can approximate any continuous function (Bernstein polynomials), there are sometimes better bases. For instance, the regression function may not be polynomial, but only “piecewise” polynomial.
■ The design matrix $X$ can become ill-conditioned, which can cause numerical problems.


Spline models

■ Splines are piecewise polynomial functions, i.e. on an interval $(t_i, t_{i+1})$ between “knots” the spline $f(x)$ is a polynomial, but the coefficients change from interval to interval.
■ Example: a cubic spline with knots at $t_1 < t_2 < \cdots < t_h$:

$$f(x) = \sum_{j=0}^{3}\beta_{0j}x^j + \sum_{i=1}^{h}\beta_i(x - t_i)_+^3$$

where

$$(x - t_i)_+ = \begin{cases} x - t_i & \text{if } x - t_i \ge 0\\ 0 & \text{otherwise.}\end{cases}$$

■ Here is an example (see the sketch below).
■ Conditioning problem again: B-splines are used to keep the model subspace the same but make the design less ill-conditioned.
■ Other bases one might use:
◆ Fourier: sin and cos waves.
◆ Wavelets: space/time localized bases for functions.
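A Python sketch of the truncated power basis above (simulated data, names mine); the printed condition number is typically large, which is exactly why one would switch to a B-spline basis (e.g. scipy.interpolate.BSpline) spanning the same space:

```python
import numpy as np

def truncated_power_design(x, knots):
    """Columns 1, x, x^2, x^3, then (x - t)_+^3 for each knot t."""
    cols = [x ** j for j in range(4)]
    cols += [np.where(x >= t, (x - t) ** 3, 0.0) for t in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(7)
n = 200
x = np.sort(rng.uniform(0, 10, n))
Y = np.sin(x) + 0.3 * rng.standard_normal(n)

X = truncated_power_design(x, knots=(2.5, 5.0, 7.5))
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("condition number of X:", np.linalg.cond(X))  # large: why B-splines help
```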