
Hypothesis Testing: The Generalized Likelihood Ratio Test

Consider testing the hypotheses

H0 : θ ∈ Θ0
H1 : θ ∈ Θ \ Θ0

Definition: The Generalized Likelihood Ratio (GLR)

Let L(θ) be a likelihood for a random sample having joint pdf f(~x; θ) for θ ∈ Θ.

The (generalized) likelihood ratio (GLR) is defined to be

λ = λ(~x) = [ max_{θ∈Θ0} L(θ) ] / [ max_{θ∈Θ\Θ0} L(θ) ] = L(θ̂0) / L(θ̂)

where θ̂ denotes the usual "unrestricted MLE" and θ̂0 denotes the MLE when H0 is true.

(If those maximums don't exist, use supremums!)

Example: Suppose we wish to test H0 : θ ≤ 3 versus H1 : θ > 3 and we consider the likelihood function L(θ).

We maximize L(θ) by setting (d/dθ)L(θ) = 0.

It is usually the case that there is only one solution to (d/dθ)L(θ) = 0 (and that it is indeed a max and not a min!). Call this solution θ̂.

If θ̂ is the unique solution, there will be no other "turning points" on the graph of L(θ). Suppose the graph of L(θ) is as shown in Figure 1.

Figure 1: A Unimodal Likelihood

From Figure 1, we see the location of the standard unrestricted MLE θ̂, and we see that the maximum of the likelihood over the restricted set where θ ≤ 3 occurs at θ̂0 = 3.


Figure 2: A Bimodal Likelihood

Of course it's possible (though not probable) that the likelihood function looks like that shown in Figure 2. (This behavior would be reflected in multiple solutions to (d/dθ)L(θ) = 0.) In this case, the standard unrestricted MLE (θ̂) and restricted MLE (θ̂0) are shown.
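Whatever the shape, the two maximizations can be done numerically. Below is a minimal Python sketch (my own illustration, not part of the notes; the N(θ, 1) model and the data values are hypothetical choices) that maximizes a unimodal likelihood over the restricted set Θ0 = {θ ≤ 3} and over all of Θ, then forms λ = L(θ̂0)/L(θ̂).

import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical data from a N(theta, 1) model; the unrestricted MLE is x-bar.
x = np.array([3.8, 4.1, 3.5, 4.4, 3.9])

def neg_log_lik(theta):
    # Negative log-likelihood for N(theta, 1), dropping constants.
    return 0.5 * np.sum((x - theta) ** 2)

# Unrestricted maximization over (a wide interval standing in for) Theta.
unrestricted = minimize_scalar(neg_log_lik, bounds=(-100.0, 100.0), method="bounded")

# Restricted maximization over Theta_0 = {theta <= 3}.
restricted = minimize_scalar(neg_log_lik, bounds=(-100.0, 3.0), method="bounded")

theta_hat = unrestricted.x      # close to x.mean() = 3.94
theta_hat0 = restricted.x       # pinned near the boundary value 3
lam = np.exp(neg_log_lik(theta_hat) - neg_log_lik(theta_hat0))   # L(theta_hat0)/L(theta_hat)

print(theta_hat, theta_hat0, lam)

Since the unrestricted MLE here exceeds 3, the restricted maximizer lands on the boundary θ̂0 = 3 and λ < 1, exactly as Figure 1 suggests.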

***************************************************
The Generalized Likelihood Ratio Test

Reject H0 if λ ≤ k where k is chosen to give a size α test.

(i.e.: α = max_{θ∈Θ0} P(λ ≤ k; θ))

***************************************************

Remarks:

1. This is just like the Neyman-Pearson test in the simple versus simple case; the only change here is the addition of the "maxes".

2. The original Neyman-Pearson test can be thought of as a simple likelihood ratio test (or SLRT, pronounced "slirt") since max_{θ∈Θ0} L(θ) = L(θ0) when Θ0 = {θ0}.
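As a rough illustration of how k might be found when the null distribution of λ is not tractable in closed form, here is a Monte Carlo sketch (my own, not part of the notes; the function name glr_normal_mean, the seed, and the numbers are arbitrary choices, and the statistic it computes is the one derived in the normal example below): simulate λ under H0 and take an empirical α-quantile as k.

import numpy as np

rng = np.random.default_rng(0)

def glr_normal_mean(x, mu0, sigma):
    # lambda = L(mu0)/L(x-bar) for a N(mu, sigma^2) sample with sigma known
    # (this is the simplified statistic derived in the example below).
    n = len(x)
    return np.exp(-n * (x.mean() - mu0) ** 2 / (2 * sigma ** 2))

mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05

# Simulate lambda under H0: mu = mu0 and take the empirical alpha-quantile as k.
lams = np.array([glr_normal_mean(rng.normal(mu0, sigma, n), mu0, sigma)
                 for _ in range(100_000)])
k = np.quantile(lams, alpha)

# By construction, the rule "reject when lambda <= k" fires about alpha of the time under H0.
print(k, np.mean(lams <= k))

For a composite Θ0, one would roughly repeat this over a grid of θ values in Θ0 and take the smallest resulting k, so that the worst-case rejection probability over Θ0 is still about α.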

Example:

Let X1, X2, . . . , Xn be a random sample from the N(µ, σ²) distribution where σ² is known.

Test H0 : µ = µ0 versus H1 : µ ≠ µ0.

(Note: There is no UMP test for this problem, as is often the case with a two-sided alternative hypothesis.)


Derive a GLRT of size α.

The joint pdf is

f(~x; µ) = (2πσ²)^{−n/2} e^{−∑(xi−µ)²/(2σ²)}.

Since a likelihood is any function proportional to the joint pdf, let’s take

L(µ) = e^{−∑(xi−µ)²/(2σ²)}.

We already know the usual (unrestricted) MLE for µ: µ̂ = X̄.

Question: Now what maximizes L(µ) when H0 is true?

Answer: That’s easy since H0 contains only one point! (µ0)

So, max_{µ=µ0} L(µ) = L(µ0)!

(exciting, not factorial...)

So,

λ = e^{−∑(xi−µ0)²/(2σ²)} / e^{−∑(xi−x̄)²/(2σ²)} = e^{−[∑(xi−µ0)² − ∑(xi−x̄)²]/(2σ²)}

Since we're going to have to compute a probability P(λ(~X) ≤ k; H0), let's simplify λ:

∑(xi − µ0)² − ∑(xi − x̄)² = ∑xi² − 2µ0∑xi + nµ0² − ∑xi² + 2x̄∑xi − nx̄²

= −2µ0∑xi + nµ0² + 2x̄∑xi − nx̄²

Hey! This sort of looks like something squared...

In fact, if we pull the n out:

n(−2µ0(1/n)∑xi + µ0² + 2x̄(1/n)∑xi − x̄²) = n(−2µ0x̄ + µ0² + 2x̄² − x̄²)

= n(−2µ0x̄ + µ0² + x̄²)

= n(x̄ − µ0)²

(Cool!)

So,

λ = exp[−n(x̄ − µ0)²/(2σ²)]
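As a quick numerical check (mine, not in the notes; the sample, µ0, σ, and seed are arbitrary) that the algebra above is right, the following Python snippet compares the original ratio of exponentials to the simplified form exp[−n(x̄ − µ0)²/(2σ²)].

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=20)   # arbitrary sample
mu0, sigma = 1.8, 1.5               # hypothetical mu0 and known sigma

xbar, n = x.mean(), len(x)

# Original form: ratio of the two likelihood values.
lam_ratio = np.exp(-np.sum((x - mu0) ** 2) / (2 * sigma ** 2)) / \
            np.exp(-np.sum((x - xbar) ** 2) / (2 * sigma ** 2))

# Simplified form from the algebra above.
lam_simple = np.exp(-n * (xbar - mu0) ** 2 / (2 * sigma ** 2))

print(np.isclose(lam_ratio, lam_simple))   # True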

Recall that we will reject H0 if λ ≤ k where k is such that P(λ(~X) ≤ k; H0) = α:


P(exp[−n(X̄−µ0)²/(2σ²)] ≤ k; H0) = P(−n(X̄−µ0)²/(2σ²) ≤ ln k; H0)

= P(n(X̄−µ0)²/σ² ≥ −2 ln k; H0)

= P(((X̄−µ0)/(σ/√n))² ≥ k1; H0)

where k1 is such that this probability is α.

Now if H0 is true and µ is indeed µ0, then (X̄ − µ0)/(σ/√n) ∼ N(0, 1), and so

((X̄ − µ0)/(σ/√n))² ∼ χ²(1).

So

P(((X̄ − µ0)/(σ/√n))² ≥ k1; H0) = P(W ≥ k1)

where W ∼ χ²(1).

Hence, k1 = χ²_α(1).

So, we will reject H0 if

((X̄ − µ0)/(σ/√n))² ≥ χ²_α(1).

This is the GLRT of size α!
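Putting the pieces together, here is a short Python sketch of the size-α test just derived (my own code, not from the notes; the function name glrt_normal_mean and the particular µ0, σ, n, and seed are arbitrary), with a Monte Carlo confirmation that it rejects about 100α% of the time when H0 is true.

import numpy as np
from scipy.stats import chi2

def glrt_normal_mean(x, mu0, sigma, alpha=0.05):
    """Size-alpha GLRT of H0: mu = mu0 vs H1: mu != mu0, sigma known.
    Rejects when ((xbar - mu0)/(sigma/sqrt(n)))^2 >= chi2_alpha(1)."""
    n = len(x)
    stat = (np.sqrt(n) * (np.mean(x) - mu0) / sigma) ** 2
    return stat >= chi2.ppf(1 - alpha, df=1)

# Monte Carlo check of the size under H0: mu = mu0.
rng = np.random.default_rng(3)
mu0, sigma, n, alpha = 5.0, 2.0, 30, 0.05
rejections = np.mean([glrt_normal_mean(rng.normal(mu0, sigma, n), mu0, sigma, alpha)
                      for _ in range(100_000)])
print(rejections)   # should be close to 0.05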

Example:

(Note: I could only think of one example of a composite versus composite that isn't a computational nightmare, so I'll save this easy problem for you!)

Let X1, X2, . . . , Xn be a random sample from the unif(0, θ] distribution.

(Note that I closed the right side of the interval. I only did this so that we won't have a problem with a "max", but this wouldn't matter at all if we were using the more general definition of the GLR that uses supremums.)

Find the GLRT of size α for H0 : θ = θ0 versus H1 : θ ≠ θ0.

f(x; θ) = (1/θ) I(0,θ](x)


⇒ L(θ) = θ^{−n} ∏_{i=1}^{n} I(0,θ](xi) = θ^{−n} I(0,θ](x(n))

We already know that the plain old unrestricted MLE is θ̂ = X(n). (This is because setting the derivative of θ^{−n} with respect to θ equal to zero gives no information; since 0 < xi ≤ θ for i = 1, 2, . . . , n, the smallest θ can be is x(n), which then maximizes the decreasing L(θ) = θ^{−n}.)
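A quick numerical way to see this (my own toy check, not in the notes; the sample, true θ = 7, and grid are arbitrary): evaluate L(θ) = θ^{−n} I(x(n) ≤ θ) on a grid of θ values and confirm the maximizer sits at (essentially) x(n).

import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 7.0, size=10)        # hypothetical unif(0, theta] sample with theta = 7
x_max = x.max()

theta_grid = np.linspace(0.01, 12.0, 5000)
# L(theta) = theta^{-n} when theta >= x_(n), and 0 otherwise.
L = np.where(theta_grid >= x_max, theta_grid ** (-len(x)), 0.0)

print(theta_grid[np.argmax(L)], x_max)    # the grid maximizer sits at (about) x_(n)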

Since H0 consists of only one point: θ = θ0, the maximum of L(θ) restricted to this one-point set is simply L(θ0).

So,

λ = [θ0^{−n} I(0,θ0](x(n))] / [(x(n))^{−n} I(0,x(n)](x(n))] = (x(n)/θ0)^n I(0,θ0](x(n))

As usual, we will reject H0 if

(x(n)/θ0)^n I(0,θ0](x(n)) ≤ k

where k is such that

P((X(n)/θ0)^n I(0,θ0](X(n)) ≤ k; H0) = α

Well, under H0, that indicator is always 1, so we can drop it:

α = P((X(n)/θ0)^n ≤ k; H0) = P(X(n) ≤ θ0 k^{1/n}; H0) = P(X(n) ≤ k1; H0)

Finally, we solve for k1:

α = P(X(n) ≤ k1; H0) = [P(X1 ≤ k1; H0)]^n = (k1/θ0)^n (using independence)

⇒ k1 = θ0 α^{1/n}.
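To double-check this cutoff numerically (my own simulation, not part of the notes; θ0 = 4, n = 8, and the seed are arbitrary), one can simulate samples under H0: θ = θ0 and verify that X(n) ≤ θ0 α^{1/n} happens about 100α% of the time.

import numpy as np

rng = np.random.default_rng(5)
theta0, n, alpha = 4.0, 8, 0.05
cutoff = theta0 * alpha ** (1 / n)

# Simulate the sample maximum under H0: theta = theta0 and estimate P(X_(n) <= cutoff).
x_max = rng.uniform(0.0, theta0, size=(200_000, n)).max(axis=1)
print(np.mean(x_max <= cutoff))   # should be close to alpha = 0.05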

So, the GLRT of size α, for H0 : θ = θ0 versus H1 : θ ≠ θ0 for a random sample of size n from the unif(0, θ] distribution, is to reject H0 in favor of H1 if

X(n) ≤ θ0α1/n.

Wait a minute...


Whoa, what's going on in that last problem? Does that rejection rule make sense? One would certainly think that it is not true that θ, the upper limit of the support set for the sample, is equal to θ0 if we happened to observe x(n) > θ0. Shouldn't we be rejecting for some values of X(n) that are too large? We don't just want to say: "Well, of course we would automatically reject if X(n) > θ0," because making up rules to suit our needs will mess with the size of the test that we worked so hard to obtain.

The answer is "yes," but I didn't mention it as we were going through the example because I didn't want to muck up the steps of a standard GLRT procedure with this special case of having the indicators mixed with the parameters. Normally,

• We set something less than or equal to k.

• We simplify this to something else less than (or greater than) or equal to some new constant k1.

• We say that the original k doesn’t matter anymore.

With indicators mixed with parameters, it does matter... this is a weird, sticky little point that I will not hold you accountable for in this course, but, for the record, I will expound upon it here.

Going back to the original rejection rule, we reject H0 if

(X(n)/θ0)^n I(0,θ0](X(n)) ≤ k.

Since k = (k1/θ0)^n, this becomes "reject H0 if":

(X(n)/θ0)^n I(0,θ0](X(n)) ≤ (k1/θ0)^n = (θ0 α^{1/n}/θ0)^n = α.

For a non-trivial α (i.e., α > 0), if X(n) is greater than θ0, the left-hand side of this inequality will be zero, hence less than α, hence we will reject, as desired.
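For completeness, a last Python sketch (my own, not part of the notes; glr_unif, the sample values, θ0 = 4, n = 8, and the seed are arbitrary) evaluates the full statistic (X(n)/θ0)^n I(0,θ0](X(n)) and checks two things: a sample whose maximum exceeds θ0 makes the statistic 0 ≤ α and so is rejected automatically, and under H0 the rule still rejects about 100α% of the time.

import numpy as np

rng = np.random.default_rng(6)
theta0, n, alpha = 4.0, 8, 0.05

def glr_unif(x, theta0):
    # lambda = (x_(n)/theta0)^n * I(0 < x_(n) <= theta0)
    xmax = np.max(x)
    return (xmax / theta0) ** len(x) if 0 < xmax <= theta0 else 0.0

# (1) A sample whose maximum exceeds theta0 gives lambda = 0 <= alpha: automatic rejection.
print(glr_unif(np.array([1.0, 2.0, 5.5]), theta0))            # 0.0

# (2) Under H0, "reject when lambda <= alpha" still has size about alpha.
rejections = [glr_unif(rng.uniform(0.0, theta0, n), theta0) <= alpha
              for _ in range(200_000)]
print(np.mean(rejections))                                    # close to 0.05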