3. The F Test for Comparing Reduced vs. Full Models · Now back to determining the distribution of...

3. The F Test for ComparingReduced vs. Full Models

Assume the Gauss-Markov Model with normal errors:

y = Xβ + ε, ε ∼ N(0, σ2I).

Suppose C(X0) ⊂ C(X) and we wish to test

H0 : E(y) ∈ C(X0) vs. HA : E(y) ∈ C(X) \ C(X0).

The “reduced” model corresponds to the null hypothesis andsays that E(y) ∈ C(X0), a specified subspace of C(X).

The “full” model says that E(y) can be anywhere in C(X).

For example, suppose

111111

and X =

1 0 01 0 00 1 00 1 00 0 10 0 1

In this case, the reduced model says that all 6 observations havethe same mean.

The full model says that there are three groups of twoobservations. Within each group, observations have the samemean. The three group means may be different from oneanother.

For this example, let µ1, µ2, and µ3 be the elements of β in thefull model, i.e., β = [µ1, µ2, µ3]

′. Then, for the full model,

E(y) = Xβ =

1 0 01 0 00 1 00 1 00 0 10 0 1

H0 : E(y) ∈ C(X0) vs. HA : E(y) ∈ C(X) \ C(X0)

is equivalent to

H0 : µ1 = µ2 = µ3 vs. HA : µi 6= µj, for some i 6= j.

For the general case, consider the test statistic

F =y′(PX − PX0)y/ [rank(X)− rank(X0)]

y′(I − PX)y/ [n− rank(X)].

To show that this statistic has an F distribution, we will use thefollowing fact:

PX0PX = PXPX0 = PX0 .

There are many ways to see that this fact is true. First,

C(X0) ⊂ C(X) =⇒ Each column of X0 ∈ C(X)

=⇒ PXX0 = X0.

PXPX0 = PXX0(X′0X0)−X′0 = X0(X′0X0)

−X′0= PX0 .

This implies that

(PXPX0)′ = P′X0

=⇒ P′X0P′X = P′X0

=⇒ PX0PX = PX0 . 2

Alternatively,

∀ a ∈ Rn, PX0a ∈ C(X0) ⊂ C(X).

Thus, ∀ a ∈ Rn, PXPX0a = PX0a.

This implies PXPX0 = PX0.

Transposing both sides of this equality and using symmetry ofprojection matrices yields

PX0PX = PX0 . 2

Alternatively, C(X0) ⊂ C(X) =⇒ XB = X0 for some B becauseevery column of X0 must be in C(X).

PX0PX = X0(X′0X0)−X′0PX = X0(X′0X0)

−(XB)′PX

= X0(X′0X0)−B′X′PX = X0(X′0X0)

−B′X′

= X0(X′0X0)−(XB)′ = X0(X′0X0)

−X′0 = PX0 .

PXPX0 = PXX0(X′0X0)−X′0 = PXXB(X′0X0)

−X′0

= XB(X′0X0)−X′0 = X0(X′0X0)

−X′0 = PX0 .

Note that PX − PX0 is a symmetric and idempotent matrix:

(PX − PX0)′ = P′X − P′X0

= PX − PX0 .

(PX − PX0)(PX − PX0) = PXPX − PXPX0 − PX0PX + PX0PX0

= PX − PX0 − PX0 + PX0

= PX − PX0 .

Now back to determining the distribution of

y′(I − PX)y/ [n− rank(X)].

An important first step is to note that

F =y′(

PX−PX0σ2

)y/ [rank(X)− rank(X0)]

y′( I−PX

)y/ [n− rank(X)]

Now we can show that the numerator is a chi-square randomvariable divided by its degrees of freedom, independent of thedenominator, which is a central chi-square random variabledivided by its degrees of freedom. Once we show all thesethings, we will have established that the statistic F has an Fdistribution (see Slide Set 1 Slide 35).

We will use the following result from Slide Set 1.

Suppose Σ is an n× n positive definite matrix.

Suppose A is an n× n symmetric matrix of rank m such thatAΣ is idempotent (i.e., AΣAΣ = AΣ).

Then y ∼ N(µ,Σ) =⇒ y′Ay ∼ χ2m(µ

′Aµ/2).

For the numerator of our F statistic, we have

µ = Xβ, Σ = σ2I, A =

(PX − PX0

), and

m = rank(A) = rank(

PX − PX0

)= rank(PX − PX0)

= tr(PX − PX0) = tr(PX)− tr(PX0)

= rank(PX)− rank(PX0) = rank(X)− rank(X0).

(Multiplying by a nonzero constant does not affect the rank of amatrix. Rank is the same as trace for idempotent matrices.Trace of a difference is the same as the difference of traces. Therank of a projection matrix is equal to the rank of the matrixwhose column space is projected onto.)

To verify that Σ is positive definite, note that for any a ∈ Rn \ {0},

a′Σa = a′(σ2I)

a = σ2a′a = σ2n∑

a2i > 0.

To verify that AΣ is idempotent, we have

(PX − PX0

)(σ2I) = PX − PX0 .

Thus, we have

PX − PX0

)y ∼ χ2

rank(X)−rank(X0)

(12β′X′

(PX − PX0

Denominator Distribution

From the result on Slide Set 2 Slide 19, we have

I − PX

)y ∼ χ2

n−rank(X).

(This fact can be proved using the result on Slide Set 1 Slide 31,following the same type of argument used to show thedistribution of the numerator.)

Independence of Numerator and Denominator

By Slide Set 1 Slide 38,

PX−PX0σ2

)y is independent of y′

( I−PXσ2

)y because

(PX − PX0

)(σ2I)

(I − PX

1σ2 (PX − PXPX − PX0 + PX0PX)

=1σ2 (PX − PX − PX0 + PX0) = 0.

Thus, it follows that

y′(I − PX)y/ [n− rank(X)]

∼ Frank(X)−rank(X0),n−rank(X)

(β′X′(PX − PX0)Xβ

If H0 is true, i.e. if E(y) = Xβ ∈ C(X0), then the noncentralityparameter is 0 because

(PX − PX0)Xβ = PXXβ − PX0Xβ

= Xβ − Xβ = 0.

In general, the noncentrality parameter quantifies how far themean of y is from C(X0) because

β′X′(PX − PX0)Xβ

= β′X′(PX − PX0)′(PX − PX0)Xβ

= || (PX − PX0)Xβ ||2= || PXXβ − PX0Xβ ||2

= || Xβ − PX0Xβ ||2= || E(y)− PX0E(y) ||2 .

Note that

y′(PX − PX0)y = y′ [(I − PX0)− (I − PX)] y

= y′(I − PX0)y− y′(I − PX)y

= SSEREDUCED − SSEFULL.

Also rank(X)−rank(X0)

= [n− rank(X0)]− [n− rank(X)]

= DFEREDUCED − DFEFULL,

where DFE = Degrees of Freedom for Error.

Thus, the F statistic has the familiar form

(SSEREDUCED − SSEFULL)/(DFEREDUCED − DFEFULL)

SSEFULL/DFEFULL.

Equivalence of F Tests

It turns out that this reduced vs. full model F test is equivalent tothe F test for testing

H0 : Cβ = d vs. HA : Cβ 6= d

with an appropriately chosen C and d.

The equivalence of these tests is proved in STAT 611.

Example: F Test for Lack of Linear Fit

Suppose a balanced, completely randomized design is used toassign 1, 2, or 3 units of fertilizer to a total of 9 plots of land.

The yield harvested from each plot is recorded as the response.

Let yij denote the yield from the jth plot that received i units offertilizer (i, j = 1, 2, 3).

Suppose all yields are independent and yij ∼ N(µi, σ2) for all

i, j = 1, 2, 3.

If y =

, then E(y) =

Suppose we wish to determine whether there is a linearrelationship between the amount of fertilizer applied to a plot andthe expected value of a plot’s yield.

In other words, suppose we wish to know if there exists realnumbers β1 and β2 such that

µi = β1 + β2(i) for all i = 1, 2, 3.

Consider testing H0 : E(y) ∈ C(X0) vs. HA : E(y) ∈ C(X) \ C(X0),where

1 11 11 11 21 21 21 31 31 3

and X =

1 0 01 0 01 0 00 1 00 1 00 1 00 0 10 0 10 0 1

Note H0 : E(y) ∈ C(X0)⇐⇒ ∃ β ∈ R2 3 E(y) = X0β

⇐⇒ ∃[β1

]∈ R2 3

1 11 11 11 21 21 21 31 31 3

β1 + β2(1)β1 + β2(1)β1 + β2(1)β1 + β2(2)β1 + β2(2)β1 + β2(2)β1 + β2(3)β1 + β2(3)β1 + β2(3)

⇐⇒ µi = β1 + β2(i) for all i = 1, 2, 3.

Note E(y) ∈ C(X)⇐⇒ ∃ β ∈ R3 3 E(y) = Xβ

⇐⇒ ∃

∈ R3 3

1 0 01 0 01 0 00 1 00 1 00 1 00 0 10 0 10 0 1

This condition clearly holds with βi = µi for all i = 1, 2, 3.

The alternative hypothesis

HA : E(y) ∈ C(X) \ C(X0)

is equivalent to

HA : There do not exist β1, β2 ∈ R such that

µi = β1 + β2(i) ∀ i = 1, 2, 3.

Because the lack of fit test is a reduced vs. full model F test, wecan also obtain this test by testing

H0 : Cβ = d vs. HA : Cβ 6= d

for appropriate C and d.

C =? d =?

R Code and Output

> x=rep(1:3,each=3)> x[1] 1 1 1 2 2 2 3 3 3>> y=c(11,13,9,18,22,23,19,24,22)>> plot(x,y,pch=16,col=4,xlim=c(.5,3.5),+ xlab="Fertilizer Amount",+ ylab="Yield",axes=F,cex.lab=1.5)> axis(1,labels=1:3,at=1:3)> axis(2)> box()

Fertilizer Amount

> X0=model.matrix(˜x)> X0

(Intercept) x1 1 12 1 13 1 14 1 25 1 26 1 27 1 38 1 39 1 3

> X=model.matrix(˜0+factor(x))> X

factor(x)1 factor(x)2 factor(x)31 1 0 02 1 0 03 1 0 04 0 1 05 0 1 06 0 1 07 0 0 18 0 0 19 0 0 1

> proj=function(x){+ x%*%ginv(t(x)%*%x)%*%t(x)+ }>> library(MASS)> PX0=proj(X0)> PX=proj(X)

> Fstat=(t(y)%*%(PX-PX0)%*%y/1)/+ (t(y)%*%(diag(rep(1,9))-PX)%*%y/(9-3))> Fstat

[,1][1,] 7.538462>> pvalue=1-pf(Fstat,1,6)> pvalue

[,1][1,] 0.03348515

> reduced=lm(y˜x)> full=lm(y˜0+factor(x))>> rvsf=function(reduced,full)+ {+ sser=deviance(reduced)+ ssef=deviance(full)+ dfer=reduced$df+ dfef=full$df+ dfn=dfer-dfef+ Fstat=(sser-ssef)/dfn/+ (ssef/dfef)+ pvalue=1-pf(Fstat,dfn,dfef)+ list(Fstat=Fstat,dfn=dfn,dfd=dfef,+ pvalue=pvalue)+ }

> rvsf(reduced,full)$Fstat[1] 7.538462

$dfn[1] 1

$dfd[1] 6

$pvalue[1] 0.03348515

> anova(reduced,full)Analysis of Variance Table

Model 1: y ˜ xModel 2: y ˜ 0 + factor(x)

Res.Df RSS Df Sum of Sq F Pr(>F)1 7 78.2222 6 34.667 1 43.556 7.5385 0.03349 *---Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

> test=function(lmout,C,d=0){+ b=coef(lmout)+ V=vcov(lmout)+ dfn=nrow(C)+ dfd=lmout$df+ Cb.d=C%*%b-d+ Fstat=drop(

t(Cb.d)%*%solve(C%*%V%*%t(C))%*%Cb.d/dfn)+ pvalue=1-pf(Fstat,dfn,dfd)+ list(Fstat=Fstat,pvalue=pvalue)+ }> test(full,matrix(c(1,-2,1),nrow=1))$Fstat[1] 7.538462$pvalue[1] 0.03348515

SAS Code and Output

data d;input x y;cards;

1 111 131 92 182 222 233 193 243 22;run;

proc glm;class x;model y=x;contrast ’Lack of Linear Fit’ x 1 -2 1;

The SAS System

The GLM Procedure

Dependent Variable: y

Sum ofSource DF Squares Mean Square F Value Pr > F

Model 2 214.2222222 107.1111111 18.54 0.0027

Error 6 34.6666667 5.7777778

Corrected Total 8 248.8888889

R-Square Coeff Var Root MSE y Mean

0.860714 13.43684 2.403701 17.88889

Source DF Type I SS Mean Square F Value Pr > F

x 2 214.2222222 107.1111111 18.54 0.0027

Source DF Type III SS Mean Square F Value Pr > F

x 2 214.2222222 107.1111111 18.54 0.0027

Contrast DF Contrast SS Mean Square F Value Pr > F

Lack of Linear Fit 1 43.55555556 43.55555556 7.54 0.0335

3. The F Test for Comparing Reduced vs. Full Models · Now back to determining the distribution of...

Documents

Transcript of 3. The F Test for Comparing Reduced vs. Full Models · Now back to determining the distribution of...

Rank 3 permutation groups

A short course on groups of finite Morley rank Part 1webpages.csus.edu/wiscons/research/GFMR-Minicourse-Slides.pdf · A short course on groups of ﬁnite Morley rank —Part 1—

Rank Tests - University of Kentucky · Locally most powerful rank tests One way of constructing powerful tests based on ranks is to nd the locally most powerful rank test We will

Using Bayesian statistics to rank sports teams (or, my ...ccrg.rit.edu/~whelan/talks/whelan20101008.pdf · Using Bayesian statistics to rank sports teams (or, my replacement for the

X. X X X X X X X X X X X X X X X X.

Performance of Random Sampling for Computing Low-rank ...

X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ X _ μ.

SATURATION AND ELEMENTARY EQUIVALENCE OF C* …eaglec/Papers/Saturation.pdfSATURATION AND ELEMENTARY EQUIVALENCE OF C*-ALGEBRAS 3 space X. We focus in particular on the real rank zero

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

An Historical Analysis of the California Current using ... · J NL =(z − zb)T D−1(z − zb)+(y0 − H(x))T R−1(y0 − H(x)) Prior information Observation Observation operator

icerm.brown.edu · Symmetricmatrixcompletion ? −? ? − ? − Constraints: ()Rank(smallestpossible,ﬁxed,…) ()Positivesemideﬁnite ∗ ? ∗? ∗ ? ∗ ? ∗ SpectralTheorem

trendence Graduate Barometer 2013 - European Business ... · Rank 2013 Employer % 2013 Rank 2012 % 2012 Δ Rank Δ % 29 Inditex 2,08% 42 1,55% +0,53% 30 Porsche 2,00% 28 2,09% -0,09%

Still Photography Throwdown: Silver Halide vs. Siliconmarkfairchild.org/PDFs/PRES18.pdf0 20 40 60 80 100 1 2 3 4 Percent in Rank Rank Hasselblad 120 Nikon D2x Nikon F3 135 Nikon CP

Critical Groups of Rank 3 graphs - People · Critical Groups of Rank 3 graphs Peter Sin, U. of Florida Finite Geometry and Extremal Combinatorics, U. Delaware August 2019. Contents

RANK FULL MODEL (VARIANCE ESTIMATION)

SML-Z1(C) Series: LEDs - Farnell element14 · [Data Sheet] Rank Reference of Brightness* *Measurement tolerance:±10% [SML-Z14x(C) series] Red(V,U) (Ta=25ºC, I F＝20mA) Rank AM

Fast Rank Algorithms Based on Multiscale Histograms

1104LS Series - Stanley Electronic Components D4 1,800 2,200. 1104LS Series (High Reliability type, V-Series) Single Color PLCC-2 Type 2010.12.15 Page 8 Luminous Intensity Rank Rank

Classifying higher rank analytic Toeplitz algebrasnyjm.albany.edu/j/2007/13-14v.pdfClassifying higher rank analytic Toeplitz algebras Stephen C. Power Abstract. To a higher rank directed

Representations of Higher Rank Graph Algebraskrdavids/Preprints/DavYang_hrg.pdfHigher Rank Graph Algebras 3 there is a permutation θ ij in S m i×m j so that ei se j t = e t0 e i