Lecture 4: Hypothesis Testing

Ani [email protected]

20 April 2007

1 / 69

Steps of Hypothesis Testing

Define the null hypothesis, H0

Define the alternative hypothesis, Ha, where Ha is usually of the form “not H0”

Define the type I error rate, α, usually 0.05

Calculate the test statistic

Calculate the p-value

If the p-value is less than α, reject H0

Otherwise, fail to reject H0

2 / 69

Hypothesis Testing

We will first discuss hypothesis testing as it applies to means of distributions for continuous variables

We will then discuss discrete data (specifically dichotomous variables)

3 / 69

Hypothesis test for a single mean I

Assume a population of normally distributed birth weights with a known standard deviation, σ = 1000 grams

Birth weights are obtained on a sample of 10 infants; the sample mean is calculated as 2500 grams

Question: Is the mean birth weight in this population different from 3000 grams?

Set up a two-sided test of

H0 : µ = 3000

vs. Ha : µ ≠ 3000

Let α = 0.05 denote a 5% significance level

4 / 69

Hypothesis test for a single mean II

Calculate the test statistic:

$$z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{2500 - 3000}{1000/\sqrt{10}} = -1.58$$

What does this mean? Our observed mean is 1.58 standard errors below the hypothesized mean

The test statistic is the standardized value of our data assuming the null hypothesis is true!

Question: If the true mean is 3000 grams, is our observed sample mean of 2500 “common” or is this value unlikely to occur?

5 / 69

Hypothesis test for a single mean III

Calculate the p-value:

$$\text{p-value} = P(Z < -|z_{obs}|) + P(Z > |z_{obs}|) = 2 \times 0.057 = 0.114$$

If the true mean is 3000 grams, our data or data more extreme than ours would occur in about 11 out of 100 studies (of the same size, n=10)

That is, in about 11 out of 100 studies we would expect, just by chance, to observe a sample mean at least as far from 3000 as 2500 is, if the true mean is 3000 grams

What does this say about our hypothesis?

General guideline: if p-value < α, then reject H0
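
The test statistic and p-value for the birth weight example can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `one_sample_z_test` is an assumption made for this sketch, not something from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n):
    """Two-sided z-test for a single mean when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))   # standardized distance from mu0
    p = 2 * NormalDist().cdf(-abs(z))      # P(Z < -|z|) + P(Z > |z|)
    return z, p

# Birth weight example: sample mean 2500, H0 mean 3000, sigma 1000, n = 10
z_obs, p_value = one_sample_z_test(xbar=2500, mu0=3000, sigma=1000, n=10)
print(f"z = {z_obs:.2f}, p-value = {p_value:.3f}")
```

Because the p-value exceeds α = 0.05, the guideline above leads to failing to reject H0.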

6 / 69

Hypothesis test for a single mean IV

Could also use the “critical region” or “rejection region” approach

Based on our significance level (α = 0.05) and assuming H0 is true, how “far” does our sample mean have to be from H0 : µ = 3000 in order to reject?

Critical value = zc where 2 × P(Z > |zc|) = 0.05

In our example, zc = 1.96

The rejection region is any value of our test statistic that is less than −1.96 or greater than 1.96

Decision should be the same whether using the p-value or critical / rejection region
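
As a rough sketch of the rejection-region approach, the critical value can be obtained from the standard normal quantile function and compared with the observed statistic. The variable names are illustrative assumptions.

```python
from statistics import NormalDist

alpha = 0.05
# Upper alpha/2 quantile of N(0, 1); solves 2 * P(Z > z_crit) = alpha
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

z_obs = -1.58                        # test statistic from the birth weight example
reject_h0 = abs(z_obs) > z_crit      # in the rejection region?
print(f"critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```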

7 / 69

Hypothesis test for a single mean V

An alternative approach for the two-sided hypothesis test is to calculate a 100(1−α)% confidence interval for the mean

$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\rightarrow\; 2500 \pm 1.96 \cdot \frac{1000}{\sqrt{10}} = (1880, 3120)$$

We are 95% confident that the interval (1880, 3120) contains the true population mean µ

The hypothesized mean 3000 is a plausible value of the true mean given our data

We cannot say that the true mean is different from 3000
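
The interval for the birth weight example can be checked with a short sketch. The helper name `z_confidence_interval` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lower, upper = z_confidence_interval(xbar=2500, sigma=1000, n=10)
contains_3000 = lower <= 3000 <= upper   # is the null value inside the interval?
print(f"95% CI: ({lower:.0f}, {upper:.0f}); contains 3000: {contains_3000}")
```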

8 / 69

P-values

Definition: The p-value for a hypothesis test is the null probability of obtaining a value of the test statistic as or more extreme than the observed test statistic

The rejection region is determined by α, the desired level of significance, or probability of committing a type I error

Reporting the p-value associated with a test gives an indication of how common or rare the computed value of the test statistic is, given that H0 is true

We often use zobs to denote the computed value of the test statistic

9 / 69

Determining the correct test statistic

Depends on your assumptions on σ

When σ is known, we have a standard normal test statistic

When σ is unknown and our sample size is relatively small, the test statistic has a t-distribution

The only change in the procedure is that the calculation of the p-value or rejection region uses a t-distribution instead of a normal distribution

10 / 69

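Hypothesis tests for one mean
H0 : µ = µ0, Ha : µ ≠ µ0

Population distribution | Sample size | Population variance | Test statistic

Normal | Any | σ² known | $z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

Normal | Any | σ² unknown (uses s², df = n−1) | $t_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$

Not Normal / Unknown | Large | σ² known | $z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

Not Normal / Unknown | Large | σ² unknown (uses s²) | $z_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$

Not Normal / Unknown | Small | Any | Non-parametric methods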

11 / 69

Hypothesis tests for one proportion
H0 : p = p0, Ha : p ≠ p0

Population distribution | Sample size | Test statistic

Binomial | Large | $z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

Binomial | Small | Exact methods

12 / 69

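Hypothesis tests for a difference of two means
H0 : µ1 − µ2 = µ0, Ha : µ1 − µ2 ≠ µ0

Population distribution | Sample size | Population variances | Test statistic

Normal | Any | Known | $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Normal | Any | Unknown, assume σ1² = σ2² | $t_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$, df = n1 + n2 − 2

Normal | Any | Unknown, assume σ1² ≠ σ2² | $t_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, df = ν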

13 / 69

Example: Hypothesis test for two means (two independent samples) I

The EPREDA Trial: randomized, placebo-controlled trial to determine whether dipyridamole improves the efficacy of aspirin in preventing fetal growth retardation

Pregnant women randomized to placebo (n=73), aspirin or aspirin plus dipyridamole (n=156)

Mean birth weight was statistically significantly higher in the treated than in the placebo group

2751 (SD 670) grams vs. 2526 (SD 848) grams

14 / 69

Example: Hypothesis test for two means (two independent samples) II

Test the hypothesis:

H0 : µplacebo = µtreated

vs. Ha : µplacebo ≠ µtreated

at the 5% significance level

The data are:

Treatment | n | mean | SD
Placebo | 73 | 2526 | 848
Treated | 156 | 2751 | 670

15 / 69

Example: Hypothesis test for two means (two independent samples) III

Calculate the test statistic:

$$t_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\frac{s_1^2}{n_p} + \frac{s_2^2}{n_t}}} = \frac{2526 - 2751}{\sqrt{\frac{848^2}{73} + \frac{670^2}{156}}} = -1.99$$

The observed difference in mean birth weight comparing the placebo to treated groups is approximately 2 standard errors below the hypothesized difference of 0

Our sample size is pretty large, so the test statistic will behave like a standard normal variable

16 / 69

Example: Hypothesis test for two means (two independent samples) IV

What is the p-value in this example?

p-value = 0.047

What is your decision in this case?

- Not straightforward
- There may be a difference in birth weight comparing the two groups
- Need to consider the practical implications
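
A minimal sketch of the EPREDA calculation, assuming unequal variances and the large-sample normal approximation mentioned in the slides. The function name `two_sample_stat` is hypothetical. Because the statistic is not rounded before the tail probability is computed, the p-value may differ from the slide's 0.047 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_stat(mean1, sd1, n1, mean2, sd2, n2, diff0=0.0):
    """Unpooled two-sample statistic and its standard error."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2 - diff0) / se, se

# Placebo (group 1) vs. treated (group 2)
t_obs, se_diff = two_sample_stat(2526, 848, 73, 2751, 670, 156)
p_value = 2 * NormalDist().cdf(-abs(t_obs))   # two-sided, normal approximation
print(f"t = {t_obs:.2f}, p-value = {p_value:.3f}")
```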

17 / 69

Example: Hypothesis test for two means (two independent samples) V

Can also give 95% confidence interval for the difference in the two means: (−446.13, −3.87)

Again, this is a plausible range of values for the true difference in birth weights comparing the placebo to treated groups

What is your null hypothesis? No difference!

Given this confidence interval, is “no difference” a plausible value? Almost?
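
The interval for the difference in means can be reproduced with the sketch below, which assumes the same unpooled standard error as the test statistic and the rounded multiplier 1.96.

```python
from math import sqrt

z = 1.96                                         # rounded 97.5th percentile of N(0, 1)
diff = 2526 - 2751                               # placebo mean minus treated mean
se = sqrt(848**2 / 73 + 670**2 / 156)            # unpooled standard error
lower, upper = diff - z * se, diff + z * se
excludes_zero = upper < 0 or lower > 0           # does the interval miss "no difference"?
print(f"95% CI: ({lower:.2f}, {upper:.2f}); excludes 0: {excludes_zero}")
```

The upper limit sits just below zero, which is why the slides describe "no difference" as almost plausible.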

18 / 69

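Hypothesis tests for a difference of two means
H0 : µ1 − µ2 = µ0, Ha : µ1 − µ2 ≠ µ0

Population distribution | Sample size | Population variances | Test statistic

Not Normal / Unknown | Large | Known | $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Not Normal / Unknown | Large | Unknown, assume σ1² = σ2² | $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$

Not Normal / Unknown | Large | Unknown, assume σ1² ≠ σ2² | $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Not Normal / Unknown | Small | Any | Nonparametric methods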

19 / 69

Additional Considerations: We’re not always right

Conclusion based on data (sample) | “Truth”: H0 true | “Truth”: H0 false
Reject H0 | Type I error | Correct
Fail to reject H0 | Correct | Type II error

20 / 69

Errors in hypothesis testing α

α = P(Type I error)

= probability of rejecting a true null hypothesis

= “level of significance”

Aim: to keep Type I error small by specifying a small rejection region

α is usually set before performing a test, typically at level α = 0.05

21 / 69

Errors in hypothesis testing β I

β = P(Type II error)

= P(fail to reject H0 given H0 is false)

Power = 1− β

= probability of rejecting H0 when H0 is false

Aim: to keep Type II error small and achieve large power

22 / 69

Errors in hypothesis testing β II

β depends on sample size, α, and the specified alternative value

The value of β is usually unknown since the true mean (or other parameter) is generally unknown

Before data collection, scientists should decide
- the test they will perform
- the desired Type I error rate α
- the desired β, for a specified alternative value

After specifying this information, an appropriate sample size can be determined
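
To make the link between α, β, the alternative value, and sample size concrete, here is a hedged sketch for the two-sided one-sample z-test with known σ. The alternative mean of 2500 grams and the 80% power target are assumptions chosen for illustration, and the sample size formula is the usual normal approximation rather than anything stated in the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist

N01 = NormalDist()

def power_two_sided_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power = P(reject H0) when the true mean is mu_alt."""
    z_crit = N01.inv_cdf(1 - alpha / 2)
    shift = (mu_alt - mu0) / (sigma / sqrt(n))   # mean of the z statistic under mu_alt
    return N01.cdf(-z_crit - shift) + N01.cdf(-z_crit + shift)

def sample_size_two_sided_z(mu0, mu_alt, sigma, alpha=0.05, power=0.80):
    """Approximate n from ((z_{alpha/2} + z_beta) * sigma / delta)^2."""
    z_a = N01.inv_cdf(1 - alpha / 2)
    z_b = N01.inv_cdf(power)
    return ceil(((z_a + z_b) * sigma / (mu_alt - mu0)) ** 2)

# Hypothetical alternative: true mean 2500 g vs. H0 mean 3000 g, sigma = 1000 g
power_n10 = power_two_sided_z(3000, 2500, 1000, n=10)
n_needed = sample_size_two_sided_z(3000, 2500, 1000)
power_needed = power_two_sided_z(3000, 2500, 1000, n=n_needed)
print(f"power at n=10: {power_n10:.2f}; n for 80% power: {n_needed}")
```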

23 / 69

Critical Regions I

24 / 69

Critical Regions II

25 / 69

Critical Regions III

26 / 69

Type II error

27 / 69

Dichotomous variables

Proportions

2 × 2 tables

Study Design

Hypothesis tests

28 / 69

Proportions and 2 × 2 tables

Population | Success | Failure | Total
Population 1 | x1 | n1 − x1 | n1
Population 2 | x2 | n2 − x2 | n2
Total | x1 + x2 | n − (x1 + x2) | n

Row 1 shows results of a binomial experiment with n1 trials

Row 2 shows results of a binomial experiment with n2 trials

29 / 69

How do we compare these proportions

Often, we want to compare p1, the probability of success in population 1, to p2, the probability of success in population 2

Usually:
- “Success” = Disease
- Population 1 = Treatment 1

How do we compare these proportions? It depends!

30 / 69

Study Designs

Cross-sectional

Cohort

Case-control
- Matched case-control

31 / 69

Cohort Studies

Application to Aceh Vitamin A Trial

25,939 pre-school children in 450 Indonesian villages in northern Sumatra

200,000 IU vitamin A given 1-3 months after the baseline census, and again at 6-8 months

Consider 23,682 out of 25,939 who were visited on a pre-designed schedule

32 / 69

Trial Outcome

Vit A | Alive at 12 months? No | Alive at 12 months? Yes | Total
Yes | 46 | 12,048 | 12,094
No | 74 | 11,514 | 11,588
Total | 120 | 23,562 | 23,682

Does Vitamin A reduce mortality?

- Calculate risk ratio or “relative risk”
- Relative Risk abbreviated as RR
- Could also compare difference in proportions: called “attributable risk”

33 / 69

Relative Risk Calculation

$$\text{Relative Risk} = \frac{\text{Rate with Vitamin A}}{\text{Rate without Vitamin A}} = \frac{\hat{p}_1}{\hat{p}_2} = \frac{46/12{,}094}{74/11{,}588} = \frac{0.0038}{0.0064} = 0.59$$

Vitamin A group had about 40% lower mortality!
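
The relative risk, along with the difference in proportions that the slides call “attributable risk”, can be sketched directly from the trial table. Variable names are illustrative.

```python
# Deaths and group sizes from the Aceh Vitamin A trial table
deaths_vit_a, n_vit_a = 46, 12094
deaths_control, n_control = 74, 11588

p1 = deaths_vit_a / n_vit_a          # mortality with Vitamin A
p2 = deaths_control / n_control      # mortality without Vitamin A

relative_risk = p1 / p2
risk_difference = p1 - p2            # difference in proportions
print(f"RR = {relative_risk:.2f}, risk difference = {risk_difference:.4f}")
```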

34 / 69

Confidence interval for RR

Step 1: Find the estimate of the log RR

$$\log\left(\frac{\hat{p}_1}{\hat{p}_2}\right)$$

Step 2: Estimate the variance of the log(RR) as:

$$\frac{1 - p_1}{n_1 p_1} + \frac{1 - p_2}{n_2 p_2}$$

Step 3: Find the 95% CI for log(RR):

log(RR) ± 1.96 · SD(log RR) = (lower, upper)

Step 4: Exponentiate to get 95% CI for RR:

$$(e^{\text{lower}}, e^{\text{upper}})$$

35 / 69

Confidence interval for RR from Vitamin A Trial

95% CI for log relative risk is:

$$\log(RR) \pm 1.96 \cdot SD(\log RR) = \log(0.59) \pm 1.96 \cdot \sqrt{\frac{0.9962}{46} + \frac{0.9936}{74}} = -0.53 \pm 0.37 = (-0.90, -0.16)$$

95% CI for relative risk

$$(e^{-0.90}, e^{-0.16}) = (0.41, 0.85)$$

Does this confidence interval contain 1?
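
The four steps can be sketched as a small function. The name `relative_risk_ci` is an assumption for illustration. Since the sketch works with unrounded proportions, its limits can differ from the slide's hand calculation in the second decimal place.

```python
from math import exp, log, sqrt

def relative_risk_ci(x1, n1, x2, n2, z=1.96):
    """CI for the relative risk, computed on the log scale and exponentiated."""
    p1, p2 = x1 / n1, x2 / n2
    log_rr = log(p1 / p2)                                        # Step 1
    se = sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))       # Step 2
    lower, upper = log_rr - z * se, log_rr + z * se              # Step 3
    return exp(lower), exp(upper)                                # Step 4

rr_lower, rr_upper = relative_risk_ci(46, 12094, 74, 11588)
contains_one = rr_lower <= 1 <= rr_upper
print(f"95% CI for RR: ({rr_lower:.2f}, {rr_upper:.2f}); contains 1: {contains_one}")
```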

36 / 69

What if the data were from a case-control study?

Recall: in case-control studies, individuals are selected by outcome status

Disease (mortality) status defines the population, and exposure status defines the success

p1 and p2 have a different interpretation in a case-control study than in a cohort study

Cohort:
- p1 = P(Disease | Exposure)
- p2 = P(Disease | No Exposure)

Case-Control:
- p1 = P(Exposure | Disease)
- p2 = P(Exposure | No Disease)

⇒ This is why we cannot estimate the relative risk from case-control data!

37 / 69

The Odds Ratio

The odds ratio measures association in Case-Control studies

$$\text{Odds} = \frac{P(\text{event occurs})}{P(\text{event does not occur})}$$

Odds ratio for death given Vitamin A status is the odds of death given Vitamin A divided by the odds of death given no Vitamin A

$$OR = \frac{\hat{p}_1/(1-\hat{p}_1)}{\hat{p}_2/(1-\hat{p}_2)}$$

38 / 69

Which p1 and p2 do we use?

Calculate OR both ways

Using “case-control” p1 and p2

$$OR = \frac{(46/120)/(74/120)}{(12048/23562)/(11514/23562)} = \frac{46/74}{12048/11514} = 0.59$$

Using “cohort” p1 and p2

$$OR = \frac{(46/12094)/(12048/12094)}{(74/11588)/(11514/11588)} = \frac{46/12048}{74/11514} = 0.59$$

We get the same answer either way!
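
The invariance of the odds ratio can be checked numerically. In this sketch the helper `odds` is a hypothetical name; both calculations reduce to the same cross-product ratio (46 × 11514) / (74 × 12048).

```python
from math import isclose

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

# "Case-control" view: P(Vitamin A | died) vs. P(Vitamin A | alive)
or_case_control = odds(46 / 120) / odds(12048 / 23562)

# "Cohort" view: P(died | Vitamin A) vs. P(died | no Vitamin A)
or_cohort = odds(46 / 12094) / odds(74 / 11588)

same_answer = isclose(or_case_control, or_cohort, rel_tol=1e-9)
print(f"case-control OR = {or_case_control:.2f}, cohort OR = {or_cohort:.2f}")
```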

39 / 69

Bottom Line

The relative risk cannot be estimated from a case-control study

The odds ratio can be estimated from a case-control study

OR estimates the RR when the disease is rare

The OR is invariant to cohort or case-control designs, the RR is not

40 / 69

Confidence interval for OR

Step 1: Find the estimate of the log OR

$$\log\left(\frac{\hat{p}_1/(1-\hat{p}_1)}{\hat{p}_2/(1-\hat{p}_2)}\right)$$

Step 2: Estimate the variance of the log(OR) as:

$$\frac{1}{n_1 p_1} + \frac{1}{n_1 q_1} + \frac{1}{n_2 p_2} + \frac{1}{n_2 q_2}$$

Step 3: Find the 95% CI for log(OR):

log(OR) ± 1.96 · SD(log OR) = (lower, upper)

Step 4: Exponentiate to get 95% CI for OR:

$$(e^{\text{lower}}, e^{\text{upper}})$$
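
As an illustration, the steps can be applied to the Vitamin A table. When the proportions are estimated from the data, n1·p1, n1·q1, n2·p2 and n2·q2 are just the four cell counts, so the variance is the sum of their reciprocals. The function name and argument order are assumptions for this sketch; the slides do not report this interval.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a, b: events and non-events in group 1; c, d: the same for group 2."""
    or_hat = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)     # SD of log(OR) from cell counts
    lower = exp(log(or_hat) - z * se)
    upper = exp(log(or_hat) + z * se)
    return or_hat, lower, upper

# Vitamin A: 46 deaths, 12048 alive; no Vitamin A: 74 deaths, 11514 alive
or_hat, or_lower, or_upper = odds_ratio_ci(46, 12048, 74, 11514)
print(f"OR = {or_hat:.2f}, 95% CI: ({or_lower:.2f}, {or_upper:.2f})")
```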

41 / 69

Matched-pairs case-control study design I

Samples not independent

Cases and controls matched on age, race, sex, etc.

The data are summarized in a different type of table

42 / 69

Matched-pairs case-control study design II

Results

E = exposed

Ec = not exposed

N = total number of pairs

Concordant pair: same exposure

Discordant pair: different exposure

Cases | Controls: E | Controls: Ec | Total
E | a | b | a+b
Ec | c | d | c+d
Total | a+c | b+d | N

43 / 69

Matched-pairs case-control study design III

Concordant pairs provide little information about differences

We focus on the discordant pairs

- EEc pairs (b), in which the case is exposed and the control is unexposed
- EcE pairs (c), in which the case is unexposed and the control is exposed

44 / 69

Matched-pairs case-control study design IV

Under the null hypothesis of no difference:

$$P(EE^c) = P(E^cE) = \frac{1}{2} = p$$

The number of EEc discordant pairs follows a binomial distribution
- mean = np
- variance = npq
- n = b + c (the total number of discordant pairs)

So we can test the null hypothesis, H0 : p = 1/2, using the test statistic

$$z = \frac{b - \frac{n}{2}}{\sqrt{\frac{1}{2} \cdot \frac{1}{2} \cdot n}}$$

which is approximately normally distributed

45 / 69

McNemar’s Test

Algebra shows that:

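$$z^2 = \left(\frac{b - \frac{n}{2}}{\sqrt{\frac{1}{2} \cdot \frac{1}{2} \cdot n}}\right)^2 = \frac{(b - c)^2}{b + c} \sim \chi^2_1$$

The step uses n = b + c, so that b − n/2 = (b − c)/2 and the denominator equals √(b + c)/2

This test statistic is much easier to look at, but always gives us the same result as our original z-test

Note that the χ² distribution with 1 degree of freedom is defined as the distribution of Z², where Z ∼ N(0, 1)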

46 / 69

Example: Estrogen and Endometrial Cancer I

H0 : OR = 1

Ha : OR ≠ 1

Matched pairs design

Cases | Controls: Estrogen | Controls: No estrogen | Total
Estrogen | 17 | 76 | 93
No estrogen | 10 | 111 | 121
Total | 27 | 187 | 214 pairs

47 / 69

Example: Estrogen and Endometrial Cancer II

$$OR = \frac{b}{c} = \frac{76}{10} = 7.6$$

= estimate of the relative risk of disease for exposed vs. unexposed

McNemar’s test statistic:

$$z^2 = \frac{(b - c)^2}{b + c} = \frac{(76 - 10)^2}{76 + 10} = 50.65$$

The estimated odds of endometrial cancer among estrogen users is 7.6 times the odds of cancer among those with no estrogen exposure (p<0.001).
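
A short sketch of McNemar's test for the estrogen data, also confirming that the squared z-statistic matches the simplified form. The p-value uses the identity P(χ² with 1 df > x) = P(|Z| > √x), computed here with `math.erfc`.

```python
from math import erfc, sqrt

b, c = 76, 10                  # discordant pairs from the estrogen table
n = b + c                      # total number of discordant pairs

z = (b - n / 2) / sqrt(0.5 * 0.5 * n)        # binomial z-test of H0: p = 1/2
mcnemar_stat = (b - c) ** 2 / (b + c)        # simplified chi-square form
p_value = erfc(sqrt(mcnemar_stat / 2))       # upper tail of chi-square, 1 df

print(f"z^2 = {z**2:.2f}, McNemar statistic = {mcnemar_stat:.2f}, p < 0.001: {p_value < 0.001}")
```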

48 / 69

Confidence Interval for Matched-Pairs OR

Step 1: Find the estimate of log(OR)

$$\log\left(\frac{b}{c}\right) = \log(b) - \log(c)$$

Step 2: Estimate the variance of log(OR)

$$\text{var}[\log(OR)] = \frac{1}{b} + \frac{1}{c}$$

Step 3: Find the 95% CI for log(OR)

log(OR) ± 1.96 · SD(log OR) = (lower, upper)

Step 4: Exponentiate

$$(e^{\text{lower}}, e^{\text{upper}})$$

49 / 69

Confidence Interval for Matched-Pairs OR: Estrogen and Endometrial Cancer Example

Thus, a 95% CI for the log odds ratio is:

$$\log(OR) \pm 1.96 \cdot SD(\log OR) = \log(7.6) \pm 1.96 \cdot \sqrt{\frac{1}{76} + \frac{1}{10}} = 2.03 \pm 0.66 = (1.37, 2.69)$$

The 95% CI for the odds ratio is

$$(e^{1.37}, e^{2.69}) = (3.93, 14.73)$$

Does this interval contain 1?
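
The matched-pairs interval can be sketched as follows. The function name is hypothetical. Because the sketch does not round the log-scale limits before exponentiating, the upper limit comes out slightly below the slide's 14.73.

```python
from math import exp, log, sqrt

def matched_pairs_or_ci(b, c, z=1.96):
    """OR and CI from the discordant pair counts b and c."""
    log_or = log(b) - log(c)
    se = sqrt(1 / b + 1 / c)                 # SD of log(OR)
    return b / c, exp(log_or - z * se), exp(log_or + z * se)

or_hat, or_lower, or_upper = matched_pairs_or_ci(76, 10)
contains_one = or_lower <= 1 <= or_upper
print(f"OR = {or_hat:.1f}, 95% CI: ({or_lower:.2f}, {or_upper:.2f}); contains 1: {contains_one}")
```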

50 / 69

When to match subjects

Genetic predisposition to glaucoma or lifestyle might confound the association between contact lens use and development of glaucoma

Matching on potential confounders removes those variables from the analysis: the association is automatically adjusted for any matched variables

51 / 69

When not to match subjects I

Never match for a variable in the causal pathway between the predictor and the outcome; such matching may remove the association

Don’t match on too many things
- It may be hard to find controls matched on age, gender, SES, race, BMI, and smoking for each available case
- If such matched controls were available, the data might be “overmatched”, so that few differences remain between cases and controls

52 / 69

When not to match subjects II

An alternative to matching for potential confounders is to adjust for them

- Continuous outcome: linear regression
- Binary outcome: logistic regression

When in doubt, design an unmatched study

- We can always adjust for confounders
- We can never unmatch matched data

53 / 69

All kinds of χ2 tests

Test of Goodness of fit

Test of independence

Test of homogeneity or (no) association

All of these test statistics have a χ2 distribution under the null

54 / 69

The χ2 distribution

Derived from the normal distribution

$$\chi^2_1 = \left(\frac{y - \mu}{\sigma}\right)^2 = Z^2$$

$$\chi^2_k = Z_1^2 + Z_2^2 + \cdots + Z_k^2$$

where Z1, . . . , Zk are independent standard normal random variables

k denotes the degrees of freedom

A χ² random variable with k degrees of freedom has
- mean = k
- variance = 2k

55 / 69

The χ2 test statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where

Oi = i-th observed frequency

Ei = i-th expected frequency in the i-th cell of a table

Note: This test is based on frequencies (cell counts) in a table, not proportions

56 / 69

χ2 Family of Distributions

57 / 69

χ2 Table

58 / 69

χ2 Goodness-of-Fit Test

Determine whether or not a sample of observed values of some random variable is compatible with the hypothesis that the sample was drawn from a population with a specified distributional form, e.g.

Normal

Binomial

Poisson

etc.

Here, the expected cell counts would be derived from the distributional assumption under the null hypothesis

59 / 69

Example: Handgun survey I

Survey 200 adults regarding handgun bill:

- Statement: “I agree with a ban on handguns”
- Four categories: Strongly agree, agree, disagree, strongly disagree

Can one conclude that opinions are equally distributed over four responses?

60 / 69

Example: Handgun survey II

Response (count) | 1: Strongly agree | 2: Agree | 3: Disagree | 4: Strongly disagree
Responding (Oi) | 102 | 30 | 60 | 8
Expected (Ei) | 50 | 50 | 50 | 50

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(102 - 50)^2}{50} + \frac{(30 - 50)^2}{50} + \frac{(60 - 50)^2}{50} + \frac{(8 - 50)^2}{50} = 99.36$$

61 / 69

Example: Handgun survey III

Critical value: $\chi^2_{4-1,\,0.05} = \chi^2_{3,\,0.05} = 7.81$

Since 99.36 > 7.81, we conclude that our observation was unlikely by chance alone

Based on these data, opinions do not appear to be equally distributed among the four responses
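
The handgun survey calculation can be sketched as below. The helper `chi_square_statistic` is a name chosen for illustration, and the critical value 7.81 is taken from the slide rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [102, 30, 60, 8]                                  # survey responses
expected = [sum(observed) / len(observed)] * len(observed)   # equal split under H0

stat = chi_square_statistic(observed, expected)
critical_value = 7.81          # chi-square, 3 df, alpha = 0.05 (from the slide)
reject_h0 = stat > critical_value
print(f"chi-square = {stat:.2f}, reject H0: {reject_h0}")
```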

62 / 69

χ2 Test of Independence I

Test the null hypothesis that two criteria of classification are independent

r × c contingency table (rows: Criterion 2, columns: Criterion 1)

Criterion 2 | 1 | 2 | 3 | · · · | c | Total
1 | n11 | n12 | n13 | · · · | n1c | n1·
2 | n21 | n22 | n23 | · · · | n2c | n2·
3 | n31 | n32 | n33 | · · · | n3c | n3·
⋮ | ⋮ | ⋮ | ⋮ | | ⋮ | ⋮
r | nr1 | nr2 | nr3 | · · · | nrc | nr·
Total | n·1 | n·2 | n·3 | · · · | n·c | n

63 / 69

χ2 Test of Independence II

Test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Degrees of freedom = (r-1)(c-1)

Assume the marginal totals are fixed

64 / 69

χ2 Test of Homogeneity (No association)

Test the null hypothesis that the samples are drawn from populations that are homogeneous with respect to some factor

i.e. no association between group and factor

Same test statistic as χ2 test of independence

65 / 69

Example: Treatment response I

Observed numbers

Treatment | Response: Yes | Response: No | Total
A | 37 | 13 | 50
B | 17 | 53 | 70
Total | 54 | 66 | 120

Calculate what numbers of “Yes” and “No” would be expected assuming the probability of “Yes” was the same in both groups

Condition on the total number of “Yes” and “No” responses

66 / 69

Example: Treatment response II

Expected proportion with “Yes” response = 54/120 = 0.45

Expected proportion with “No” response = 66/120 = 0.55

Observed (Expected)

Treatment | Response: Yes | Response: No | Total
A | 37 (22.5) | 13 (27.5) | 50
B | 17 (31.5) | 53 (38.5) | 70
Total | 54 | 66 | 120

67 / 69

Example: Treatment response III

Test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(37 - 22.5)^2}{22.5} + \frac{(13 - 27.5)^2}{27.5} + \frac{(17 - 31.5)^2}{31.5} + \frac{(53 - 38.5)^2}{38.5} = 29.1$$

Degrees of freedom = (r−1)(c−1) = (2−1)(2−1) = 1, where r = number of rows and c = number of columns

68 / 69

Example: Treatment response IV

We see p<0.001

Reject the null hypothesis, and conclude that the treatment groups are not homogeneous (similar) with respect to response

Response appears to be associated with treatment
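
The whole treatment response example, from expected counts to p-value, can be sketched in plain Python. Expected counts are computed as row total × column total / n, which is equivalent to applying the pooled “Yes” and “No” proportions to each group. The p-value line relies on the 1-degree-of-freedom identity P(χ² > x) = P(|Z| > √x), so it is valid only for a 2 × 2 table.

```python
from math import erfc, sqrt

observed = [[37, 13],     # Treatment A: Yes, No
            [17, 53]]     # Treatment B: Yes, No

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell under homogeneity
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = erfc(sqrt(stat / 2))    # upper tail for 1 df only

print(f"chi-square = {stat:.1f}, df = {df}, p < 0.001: {p_value < 0.001}")
```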

69 / 69
