Lecture 4: Hypothesis Testing
Steps of Hypothesis Testing
Define the null hypothesis, H0
Define the alternative hypothesis, Ha, where Ha is usually of the form “not H0”
Define the type I error rate, α, usually 0.05
Calculate the test statistic
Calculate the p-value
If the p-value is less than α, reject H0
Otherwise, fail to reject H0
2 / 69
Hypothesis Testing
We will first discuss hypothesis testing as it applies to means of distributions for continuous variables
We will then discuss discrete data (specifically dichotomous variables)
3 / 69
Hypothesis test for a single mean I
Assume a population of normally distributed birth weights with a known standard deviation, σ = 1000 grams
Birth weights are obtained on a sample of 10 infants; the sample mean is calculated as 2500 grams
Question: Is the mean birth weight in this population different from 3000 grams?
Set up a two-sided test of
H0 : µ = 3000
vs. Ha : µ ≠ 3000
Let α = 0.05 denote a 5% significance level
4 / 69
Hypothesis test for a single mean II
Calculate the test statistic:
$$z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{2500 - 3000}{1000/\sqrt{10}} = -1.58$$
What does this mean? Our observed mean is 1.58 standard errors below the hypothesized mean
The test statistic is the standardized value of our data assuming the null hypothesis is true!
Question: If the true mean is 3000 grams, is our observed sample mean of 2500 “common” or is this value unlikely to occur?
5 / 69
Hypothesis test for a single mean III
Calculate the p-value:
$$\text{p-value} = P(Z < -|z_{obs}|) + P(Z > |z_{obs}|) = 2 \times 0.057 = 0.114$$
If the true mean is 3000 grams, our data or data more extreme than ours would occur in about 11 out of 100 studies (of the same size, n=10)
In about 11 out of 100 studies, just by chance, we would observe a sample mean of 2500 or more extreme if the true mean is 3000 grams
What does this say about our hypothesis?
General guideline: if p-value < α, then reject H0
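The calculation on the last three slides can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `one_sample_z_test` is hypothetical and not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test for one mean with known sigma; returns (z, p_value)."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-sided p-value: probability in both tails beyond |z| under H0.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Birth-weight example from the slides.
z_obs, p_value = one_sample_z_test(sample_mean=2500, mu0=3000, sigma=1000, n=10)
alpha = 0.05
reject_h0 = p_value < alpha
print(f"z = {z_obs:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Because the p-value exceeds α = 0.05, the sketch fails to reject H0, matching the guideline above.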
6 / 69
Hypothesis test for a single mean IV
Could also use the “critical region” or “rejection region” approach
Based on our significance level (α = 0.05) and assuming H0 is true, how “far” does our sample mean have to be from H0 : µ = 3000 in order to reject?
Critical value = zc where 2 × P(Z > |zc|) = 0.05
In our example, zc = 1.96
The rejection region is any value of our test statistic that is less than -1.96 or greater than 1.96
Decision should be the same whether using the p-value or critical / rejection region
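As a small sketch of the rejection-region approach, the critical value can be obtained from the inverse normal CDF. The variable names are illustrative assumptions; `z_obs` is the rounded value of the test statistic from the birth-weight example.

```python
from statistics import NormalDist

alpha = 0.05
# Two-sided test: put alpha/2 in each tail of the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

z_obs = -1.58  # test statistic from the birth-weight example
in_rejection_region = abs(z_obs) > z_crit
print(f"critical value = {z_crit:.2f}, reject H0: {in_rejection_region}")
```

The statistic falls inside (-1.96, 1.96), so the decision agrees with the p-value approach.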
7 / 69
Hypothesis test for a single mean V
An alternative approach for the two-sided hypothesis test is to calculate a 100(1-α)% confidence interval for the mean
We are 95% confident that the interval (1880, 3120) contains the true population mean µ
$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{10}} \rightarrow 2500 \pm 1.96 \cdot \frac{1000}{\sqrt{10}}$$
The hypothesized mean 3000 is a plausible value of the true mean given our data
We cannot say that the true mean is different from 3000
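The interval for the birth-weight example can be reproduced with a short sketch. The helper name `z_confidence_interval` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = z_confidence_interval(sample_mean=2500, sigma=1000, n=10)
contains_null = lower < 3000 < upper
print(f"95% CI: ({lower:.0f}, {upper:.0f}), contains 3000: {contains_null}")
```

The interval contains 3000 exactly when the two-sided test at the same α fails to reject H0.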
8 / 69
P-values
Definition: The p-value for a hypothesis test is the probability, computed assuming H0 is true, of obtaining a value of the test statistic as or more extreme than the observed test statistic
The rejection region is determined by α, the desired level of significance, or probability of committing a type I error
Reporting the p-value associated with a test gives an indication of how common or rare the computed value of the test statistic is, given that H0 is true
We often use zobs to denote the computed value of the test statistic
9 / 69
Determining the correct test statistic
Depends on your assumptions on σ
When σ is known, we have a standard normal test statistic
When σ is unknown and our sample size is relatively small, the test statistic has a t-distribution
The only change in the procedure is that the calculation of the p-value or rejection region uses a t- instead of a normal distribution
10 / 69
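Hypothesis tests for one mean
H0 : µ = µ0, Ha : µ ≠ µ0
Population distribution   Sample size   Population variance   Test statistic
Normal                    Any           σ² known              $z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
Normal                    Any           σ² unknown            $t_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$, uses s², df = n − 1
Not normal / unknown      Large         σ² known              $z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
Not normal / unknown      Large         σ² unknown            $z_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$, uses s²
Not normal / unknown      Small         Any                   Non-parametric methods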
11 / 69
Hypothesis tests for one proportion
H0 : p = p0, Ha : p ≠ p0
Population distribution   Sample size   Test statistic
Binomial                  Large         $z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
Binomial                  Small         Exact methods
12 / 69
Hypothesis tests for a difference of two means
H0 : µ1 − µ2 = µ0, Ha : µ1 − µ2 ≠ µ0
Population distribution   Sample size   Population variances             Test statistic
Normal                    Any           Known                            $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
Normal                    Any           Unknown, assume σ1² = σ2²        $t_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$, df = n1 + n2 − 2
Normal                    Any           Unknown, assume σ1² ≠ σ2²        $t_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, df = ν
13 / 69
Example: Hypothesis test for two means (two independent samples) I
The EPREDA Trial: randomized, placebo-controlled trial to determine whether dipyridamole improves the efficacy of aspirin in preventing fetal growth retardation
Pregnant women randomized to placebo (n=73) or to treatment with aspirin or aspirin plus dipyridamole (n=156)
Mean birth weight was statistically significantly higher in the treated than in the placebo group
2751 (SD 670) grams vs. 2526 (SD 848) grams
14 / 69
Example: Hypothesis test for two means (two independent samples) II
Test the hypothesis:
H0 : µplacebo = µtreated
vs. Ha : µplacebo ≠ µtreated
at the 5% significance level
The data are:
Treatment   n     mean   SD
Placebo     73    2526   848
Treated     156   2751   670
15 / 69
Example: Hypothesis test for two means (two independent samples) III
Calculate the test statistic:
$$t_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\frac{s_1^2}{n_p} + \frac{s_2^2}{n_t}}} = \frac{2526 - 2751}{\sqrt{\frac{848^2}{73} + \frac{670^2}{156}}} = -1.99$$
The observed difference in mean birth weight comparing the placebo to treated groups is approximately 2 standard errors below the hypothesized difference of 0
Our sample size is fairly large, so the test statistic will behave like a standard normal variable
16 / 69
Example: Hypothesis test for two means (two independent samples) IV
What is the p-value in this example?
p-value = 0.047
What is your decision in this case?
Not straightforward
There may be a difference in birth weight comparing the two groups
Need to consider the practical implications
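The EPREDA test statistic and p-value can be checked with a short sketch. It assumes the unequal-variance standard error and the large-sample normal approximation described in the slides, and uses the SDs from the data table (848 and 670); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_statistic(mean1, sd1, n1, mean2, sd2, n2, diff0=0.0):
    """Unequal-variance statistic for H0: mu1 - mu2 = diff0; returns (stat, se)."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    statistic = ((mean1 - mean2) - diff0) / standard_error
    return statistic, standard_error


# Placebo (group 1) vs. treated (group 2).
stat, se = two_sample_statistic(2526, 848, 73, 2751, 670, 156)
# Large samples: approximate the reference distribution by N(0, 1).
p_value = 2 * NormalDist().cdf(-abs(stat))
print(f"statistic = {stat:.2f}, p-value = {p_value:.3f}")
```

The p-value lands just under 0.05; small differences from the slide's value come from rounding the statistic before looking up the tail area.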
17 / 69
Example: Hypothesis test for two means (two independent samples) V
Can also give 95% confidence interval for the difference in the two means: (-446.13, -3.87)
Again, this is a plausible range of values for the true difference in birth weights comparing the placebo to treated groups
What is your null hypothesis? No difference!
Given this confidence interval, is “no difference” a plausible value? Almost?
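A minimal sketch of the interval for the difference in means, assuming the same unequal-variance standard error and a normal critical value; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from the EPREDA example (placebo minus treated).
diff = 2526 - 2751
se = sqrt(848**2 / 73 + 670**2 / 156)

z = NormalDist().inv_cdf(0.975)  # approximately 1.96
lower, upper = diff - z * se, diff + z * se
excludes_zero = not (lower <= 0 <= upper)
print(f"95% CI: ({lower:.2f}, {upper:.2f}), excludes 0: {excludes_zero}")
```

The upper limit sits only a few grams below zero, which is why the slide describes "no difference" as almost plausible.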
18 / 69
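Hypothesis tests for a difference of two means
H0 : µ1 − µ2 = µ0, Ha : µ1 − µ2 ≠ µ0
Population distribution   Sample size   Population variances             Test statistic
Not normal / unknown      Large         Known                            $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
Not normal / unknown      Large         Unknown, assume σ1² = σ2²        $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$
Not normal / unknown      Large         Unknown, assume σ1² ≠ σ2²        $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
Not normal / unknown      Small         Any                              Nonparametric methods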
19 / 69
Additional Considerations: We’re not always right
Conclusion based on data (sample)   “Truth”: H0 true   “Truth”: H0 false
Reject H0                           Type I error       Correct
Fail to reject H0                   Correct            Type II error
20 / 69
Errors in hypothesis testing α
α = P(Type I error)
= probability of rejecting a true null hypothesis
= “level of significance”
Aim: to keep Type I error small by specifying a small rejection region
α is usually set before performing a test, typically at level α = 0.05
21 / 69
Errors in hypothesis testing β I
β = P(Type II error)
= P(fail to reject H0 given H0 is false)
Power = 1− β
= probability of rejecting H0 when H0 is false
Aim: to keep Type II error small and achieve large power
22 / 69
Errors in hypothesis testing β II
β depends on sample size, α, and the specified alternative value
The value of β is usually unknown since the true mean (or other parameter) is generally unknown
Before data collection, scientists should decide
the test they will perform
the desired Type I error rate α
the desired β, for a specified alternative value
After specifying this information, an appropriate sample size can be determined
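To make the link between α, β, the alternative value and sample size concrete, here is a sketch for the two-sided one-sample z-test with known σ. The formulas are the standard normal-theory ones, not taken from the slides, and the function names and the 80% power target are assumptions chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a two-sided z-test when the true mean is mu0 + delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma  # mean of the test statistic under Ha
    return STD_NORMAL.cdf(-z_crit + shift) + STD_NORMAL.cdf(-z_crit - shift)


def sample_size_two_sided_z(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for the desired power (ignores the negligible far tail)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / abs(delta)) ** 2)


# Birth-weight setting: sigma = 1000 g, alternative 500 g below mu0 = 3000 g.
power_n10 = power_two_sided_z(delta=-500, sigma=1000, n=10)
n_needed = sample_size_two_sided_z(delta=-500, sigma=1000)
power_at_n_needed = power_two_sided_z(delta=-500, sigma=1000, n=n_needed)
print(f"power at n=10: {power_n10:.2f}; n for 80% power: {n_needed}")
```

Under these assumptions a sample of 10 has low power against a 500 gram difference, which helps explain why the earlier test failed to reject H0.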
23 / 69
Critical Regions I
24 / 69
Critical Regions II
25 / 69
Critical Regions III
26 / 69
Type II error
27 / 69
Dichotomous variables
Proportions
2 × 2 tables
Study Design
Hypothesis tests
28 / 69
Proportions and 2 × 2 tables
Population     Success   Failure         Total
Population 1   x1        n1 − x1         n1
Population 2   x2        n2 − x2         n2
Total          x1 + x2   n − (x1 + x2)   n
Row 1 shows results of a binomial experiment with n1 trials
Row 2 shows results of a binomial experiment with n2 trials
29 / 69
How do we compare these proportions
Often, we want to compare p1, the probability of success in population 1, to p2, the probability of success in population 2
Usually:
“Success” = Disease
Population 1 = Treatment 1
How do we compare these proportions?
It depends!
30 / 69
Study Designs
Cross-sectional
Cohort
Case-control
Matched case-control
31 / 69
Cohort Studies
Application to Aceh Vitamin A Trial
25,939 pre-school children in 450 Indonesian villages in northern Sumatra
200,000 IU vitamin A given 1-3 months after the baseline census, and again at 6-8 months
Consider 23,682 out of 25,939 who were visited on a pre-designed schedule
32 / 69
Trial Outcome
            Alive at 12 months?
Vit A       No     Yes      Total
Yes         46     12,048   12,094
No          74     11,514   11,588
Total       120    23,562   23,682
Does Vitamin A reduce mortality?
Calculate risk ratio or “relative risk”
Relative Risk abbreviated as RR
Could also compare difference in proportions: called “attributable risk”
33 / 69
Relative Risk Calculation
$$\text{Relative Risk} = \frac{\text{Rate with Vitamin A}}{\text{Rate without Vitamin A}} = \frac{\hat{p}_1}{\hat{p}_2} = \frac{46/12{,}094}{74/11{,}588} = \frac{0.0038}{0.0064} = 0.59$$
Vitamin A group had 40% lower mortality!
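The relative risk can be recomputed directly from the trial table. This is a small sketch with illustrative variable names; note that the unrounded ratio is slightly closer to 0.60 than the slide's 0.59, which divides the rounded rates.

```python
# Deaths and group sizes from the Aceh Vitamin A trial table.
deaths_vit_a, n_vit_a = 46, 12094
deaths_control, n_control = 74, 11588

risk_vit_a = deaths_vit_a / n_vit_a        # estimated p1
risk_control = deaths_control / n_control  # estimated p2
relative_risk = risk_vit_a / risk_control
risk_difference = risk_vit_a - risk_control  # "attributable risk"

print(f"RR = {relative_risk:.3f}, risk difference = {risk_difference:.4f}")
```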
34 / 69
Confidence interval for RR
Step 1: Find the estimate of the log RR
$$\log\left(\frac{\hat{p}_1}{\hat{p}_2}\right)$$
Step 2: Estimate the variance of the log(RR) as:
$$\frac{1 - p_1}{n_1 p_1} + \frac{1 - p_2}{n_2 p_2}$$
Step 3: Find the 95% CI for log(RR):
$$\log(RR) \pm 1.96 \cdot SD(\log RR) = (\text{lower}, \text{upper})$$
Step 4: Exponentiate to get 95% CI for RR:
$$(e^{\text{lower}}, e^{\text{upper}})$$
35 / 69
Confidence interval for RR from Vitamin A Trial
95% CI for log relative risk is:
$$\log(RR) \pm 1.96 \cdot SD(\log RR) = \log(0.59) \pm 1.96 \cdot \sqrt{\frac{0.9962}{46} + \frac{0.9936}{74}} = -0.53 \pm 0.37 = (-0.90, -0.16)$$
95% CI for relative risk
$$(e^{-0.90}, e^{-0.16}) = (0.41, 0.85)$$
Does this confidence interval contain 1?
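The four steps can be sketched as a small function. The name `relative_risk_ci` is hypothetical, and the sample proportions are plugged in for p1 and p2 in the variance formula. Because the code keeps full precision, the limits can differ from the slide's in the second decimal place.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(x1, n1, x2, n2, confidence=0.95):
    """Log-scale confidence interval for the relative risk p1/p2."""
    p1, p2 = x1 / n1, x2 / n2
    log_rr = log(p1 / p2)                                   # Step 1
    se = sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))  # Step 2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower, upper = log_rr - z * se, log_rr + z * se         # Step 3
    return exp(lower), exp(upper)                           # Step 4


rr_lower, rr_upper = relative_risk_ci(46, 12094, 74, 11588)
contains_one = rr_lower <= 1 <= rr_upper
print(f"95% CI for RR: ({rr_lower:.2f}, {rr_upper:.2f}), contains 1: {contains_one}")
```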
36 / 69
What if the data were from a case-control study?
Recall: in case-control studies, individuals are selected by outcome status
Disease (mortality) status defines the population, and exposure status defines the success
p1 and p2 have a different interpretation in a case-control study than in a cohort study
Cohort:
p1 = P(Disease | Exposure)
p2 = P(Disease | No Exposure)
Case-Control:
p1 = P(Exposure | Disease)
p2 = P(Exposure | No Disease)
⇒ This is why we cannot estimate the relative risk from case-control data!
37 / 69
The Odds Ratio
The odds ratio measures association in Case-Control studies
$$\text{Odds} = \frac{P(\text{event occurs})}{P(\text{event does not occur})}$$
Odds ratio for death given Vitamin A status is the odds of death given Vitamin A divided by the odds of death given no Vitamin A
$$OR = \frac{\hat{p}_1/(1 - \hat{p}_1)}{\hat{p}_2/(1 - \hat{p}_2)}$$
38 / 69
Which p1 and p2 do we use?
Calculate OR both ways
Using “case-control” p1 and p2
$$OR = \frac{(46/120)/(74/120)}{(12048/23562)/(11514/23562)} = \frac{46/74}{12048/11514} = 0.59$$
Using “cohort” p1 and p2
$$OR = \frac{(46/12094)/(12048/12094)}{(74/11588)/(11514/11588)} = \frac{46/12048}{74/11514} = 0.59$$
We get the same answer either way!
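A quick sketch confirming that both conditioning directions reduce to the same cross-product ratio; the helper `odds` and the variable names are illustrative.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


# Cell counts: rows = Vitamin A yes/no, columns = died / alive.
a, b = 46, 12048
c, d = 74, 11514

# "Cohort" direction: P(death | Vitamin A) vs. P(death | no Vitamin A).
or_cohort = odds(a / (a + b)) / odds(c / (c + d))
# "Case-control" direction: P(Vitamin A | death) vs. P(Vitamin A | alive).
or_case_control = odds(a / (a + c)) / odds(b / (b + d))
# Both simplify algebraically to the cross-product ratio ad / bc.
cross_product = (a * d) / (b * c)

print(f"{or_cohort:.2f} {or_case_control:.2f} {cross_product:.2f}")
```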
39 / 69
Bottom Line
The relative risk cannot be estimated from a case-control study
The odds ratio can be estimated from a case-control study
OR estimates the RR when the disease is rare
The OR is invariant to cohort or case-control designs; the RR is not
40 / 69
Confidence interval for OR
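Step 1: Find the estimate of the log OR
$$\log\left(\frac{\hat{p}_1/(1 - \hat{p}_1)}{\hat{p}_2/(1 - \hat{p}_2)}\right)$$
Step 2: Estimate the variance of the log(OR) as:
$$\frac{1}{n_1 p_1} + \frac{1}{n_1 q_1} + \frac{1}{n_2 p_2} + \frac{1}{n_2 q_2}$$
Step 3: Find the 95% CI for log(OR):
$$\log(OR) \pm 1.96 \cdot SD(\log OR) = (\text{lower}, \text{upper})$$
Step 4: Exponentiate to get 95% CI for OR:
$$(e^{\text{lower}}, e^{\text{upper}})$$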
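As a sketch of these steps, the function below applies them to the Vitamin A table. The slides do not work this interval out, so treat the printed numbers as an illustration only. With sample proportions plugged in, the four variance terms are the reciprocals of the four cell counts; the function name is an assumption.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Log-scale CI for the odds ratio ad/bc of a 2x2 table of counts."""
    log_or = log((a * d) / (b * c))        # Step 1
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Step 2: n1*p1 = a, n1*q1 = b, ...
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower, upper = log_or - z * se, log_or + z * se  # Step 3
    return exp(lower), exp(upper)                    # Step 4


# Rows: Vitamin A yes/no; columns: died / alive.
or_lower, or_upper = odds_ratio_ci(46, 12048, 74, 11514)
print(f"95% CI for OR: ({or_lower:.2f}, {or_upper:.2f})")
```

Because death is rare in this trial, the interval is close to the one for the relative risk.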
41 / 69
Matched-pairs case-control study design I
Samples not independent
Cases and controls matched on age, race, sex, etc.
The data are summarized in a different type of table
42 / 69
Matched-pairs case-control study design II
Results
E = exposed
Ec = not exposed
N = total number of pairs
Concordant pair
Same exposure
Discordant pair
Different exposure
            Controls
Cases       E       Ec      Total
E           a       b       a+b
Ec          c       d       c+d
Total       a+c     b+d     N
43 / 69
Matched-pairs case-control study design III
Concordant pairs provide little information about differences
We focus on the discordant pairs
EEc pairs (b), in which the case is exposed and the control is unexposed
EcE pairs (c), in which the case is unexposed and the control is exposed
44 / 69
Matched-pairs case-control study design IV
Under the null hypothesis of no difference:
$$P(EE^c) = P(E^cE) = \tfrac{1}{2} = p$$
The number of EEc discordant pairs follows a binomial distribution
mean = np
variance = npq
n = b+c (the total number of discordant pairs)
So we can test the null hypothesis, H0 : p = 1/2, using the test statistic
$$z = \frac{b - \frac{n}{2}}{\sqrt{\frac{1}{2} \cdot \frac{1}{2} \cdot n}}$$
which is approximately normally distributed
45 / 69
McNemar’s Test
Algebra shows that:
$$z^2 = \left(\frac{b - \frac{n}{2}}{\sqrt{\frac{1}{2} \cdot \frac{1}{2} \cdot n}}\right)^2 = \frac{(b - c)^2}{b + c} \sim \chi^2_1$$
This test statistic is much easier to compute, and always gives us the same result as our original z-test
Note that the χ² distribution with 1 degree of freedom is defined as the distribution of Z², where Z ∼ N(0, 1)
46 / 69
Example: Estrogen and Endometrial Cancer I
H0 : OR = 1
Ha : OR ≠ 1
Matched pairs design
               Controls
Cases          Estrogen   No estrogen   Total
Estrogen       17         76            93
No estrogen    10         111           121
Total          27         187           214 pairs
47 / 69
Example: Estrogen and Endometrial Cancer II
$$OR = \frac{b}{c} = \frac{76}{10} = 7.6$$
= estimate of the relative risk of disease for exposed vs. unexposed
McNemar’s test statistic:
$$z^2 = \frac{(b - c)^2}{b + c} = \frac{(76 - 10)^2}{76 + 10} = 50.65$$
The estimated odds of endometrial cancer among estrogen users is 7.6 times the odds of cancer among those with no estrogen exposure (p<0.001).
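McNemar's statistic and its p-value for this example can be sketched as follows. The function name is hypothetical; the p-value uses the identity P(χ² with 1 df > x) = P(|Z| > √x) = erfc(√(x/2)), which holds only for one degree of freedom.

```python
from math import erfc, sqrt


def mcnemar_test(b, c):
    """McNemar's chi-square statistic (1 df) from the discordant pair counts."""
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df, written via the normal tail.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


b, c = 76, 10  # discordant pairs from the estrogen example
statistic, p_value = mcnemar_test(b, c)

# The same statistic via the binomial z-test on the discordant pairs.
n = b + c
z = (b - n / 2) / sqrt(0.5 * 0.5 * n)

matched_or = b / c
print(f"OR = {matched_or}, chi-square = {statistic:.2f}, z^2 = {z**2:.2f}")
```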
48 / 69
Confidence Interval for Matched-Pairs OR
Step 1: Find the estimate of log(OR)
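$$\log\left(\frac{b}{c}\right) = \log(b) - \log(c)$$
Step 2: Estimate the variance of log(OR)
$$\text{var}[\log(OR)] = \frac{1}{b} + \frac{1}{c}$$
Step 3: Find the 95% CI for log(OR)
$$\log(OR) \pm 1.96 \cdot SD(\log OR) = (\text{lower}, \text{upper})$$
Step 4: Exponentiate
$$(e^{\text{lower}}, e^{\text{upper}})$$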
49 / 69
Confidence Interval for Matched-Pairs OR: Estrogen and Endometrial Cancer Example
Thus, a 95% CI for the log odds ratio is:
$$\log(OR) \pm 1.96 \cdot SD(\log OR) = \log(7.6) \pm 1.96 \cdot \sqrt{\frac{1}{76} + \frac{1}{10}} = 2.03 \pm 0.66 = (1.37, 2.69)$$
The 95% CI for the odds ratio is
$$(e^{1.37}, e^{2.69}) = (3.93, 14.73)$$
Does this interval contain 1?
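A minimal sketch of the matched-pairs interval, with a hypothetical function name. The code keeps full precision on the log scale, so the upper limit can differ slightly from the slide's value, which exponentiates the rounded limit 2.69.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def matched_pairs_or_ci(b, c, confidence=0.95):
    """CI for the matched-pairs odds ratio b/c from the discordant counts."""
    log_or = log(b) - log(c)
    se = sqrt(1 / b + 1 / c)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return exp(log_or - z * se), exp(log_or + z * se)


or_lower, or_upper = matched_pairs_or_ci(76, 10)
contains_one = or_lower <= 1 <= or_upper
print(f"95% CI: ({or_lower:.2f}, {or_upper:.2f}), contains 1: {contains_one}")
```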
50 / 69
When to match subjects
Genetic predisposition to glaucoma or lifestyle might confound the association between contact lens use and development of glaucoma
Matching on potential confounders removes those variables from the analysis: the association is automatically adjusted for any matched variables
51 / 69
When not to match subjects I
Never match for a variable in the causal pathway between the predictor and the outcome; such matching may remove the association
Don’t match on too many things
It may be hard to find controls matched on age, gender, SES, race, BMI, and smoking for each available case
If such matched controls were available, the data might be “overmatched”, so that few differences remain between cases and controls
52 / 69
When not to match subjects II
An alternative to matching for potential confounders is to adjust for them
Continuous outcome: linear regression
Binary outcome: logistic regression
When in doubt, design an unmatched study
We can always adjust for confounders
We can never unmatch matched data
53 / 69
All kinds of χ2 tests
Test of Goodness of fit
Test of independence
Test of homogeneity or (no) association
All of these test statistics have a χ2 distribution under the null
54 / 69
The χ2 distribution
Derived from the normal distribution
$$\chi^2_1 = \left(\frac{y - \mu}{\sigma}\right)^2 = Z^2$$
$$\chi^2_k = Z_1^2 + Z_2^2 + \cdots + Z_k^2$$
where Z1, ..., Zk are independent standard normal random variables
k denotes the degrees of freedom
A χ² random variable with k degrees of freedom has
mean = k
variance = 2k
55 / 69
The χ2 test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where
O_i = i-th observed frequency
E_i = i-th expected frequency in the i-th cell of a table
Note: This test is based on frequencies (cell counts) in a table, not proportions
56 / 69
χ2 Family of Distributions
57 / 69
χ2 Table
58 / 69
χ2 Goodness-of-Fit Test
Determine whether or not a sample of observed values of some random variable is compatible with the hypothesis that the sample was drawn from a population with a specified distributional form, e.g.
Normal
Binomial
Poisson
etc.
Here, the expected cell counts would be derived from the distributional assumption under the null hypothesis
59 / 69
Example: Handgun survey I
Survey 200 adults regarding handgun bill:
Statement: “I agree with a ban on handguns”
Four categories: Strongly agree, agree, disagree, strongly disagree
Can one conclude that opinions are equally distributed over four responses?
60 / 69
Example: Handgun survey II
Response (count)    1 Strongly agree   2 Agree   3 Disagree   4 Strongly disagree
Responding (Oi)     102                30        60           8
Expected (Ei)       50                 50        50           50
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(102 - 50)^2}{50} + \frac{(30 - 50)^2}{50} + \frac{(60 - 50)^2}{50} + \frac{(8 - 50)^2}{50} = 99.36$$
61 / 69
Example: Handgun survey III
Critical value: $\chi^2_{4-1,\,0.05} = \chi^2_{3,\,0.05} = 7.81$
Since 99.36 > 7.81, we conclude that our observation was unlikely by chance alone
Based on these data, opinions do not appear to be equally distributed among the four responses
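The goodness-of-fit calculation for the handgun survey can be sketched as below. The helper name is illustrative, and the critical value 7.81 is copied from the slide rather than computed, since the Python standard library has no chi-square quantile function.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic from observed and expected cell counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [102, 30, 60, 8]
# H0: opinions are equally distributed over the four responses.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1
critical_value = 7.81  # chi-square, 3 df, alpha = 0.05 (value from the slide)
reject_h0 = statistic > critical_value
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```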
62 / 69
χ2 Test of Independence I
Test the null hypothesis that two criteria of classification are independent
r × c contingency table
               Criterion 1
Criterion 2    1      2      3      ···   c      Total
1              n11    n12    n13    ···   n1c    n1·
2              n21    n22    n23    ···   n2c    n2·
3              n31    n32    n33    ···   n3c    n3·
⋮              ⋮      ⋮      ⋮            ⋮      ⋮
r              nr1    nr2    nr3    ···   nrc    nr·
Total          n·1    n·2    n·3    ···   n·c    n
63 / 69
χ2 Test of Independence II
Test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Degrees of freedom = (r-1)(c-1)
Assume the marginal totals are fixed
64 / 69
χ2 Test of Homogeneity (No association)
Test the null hypothesis that the samples are drawn from populations that are homogeneous with respect to some factor
i.e. no association between group and factor
Same test statistic as χ2 test of independence
65 / 69
Example: Treatment response I
Observed numbers
             Response to Treatment
Treatment    Yes    No     Total
A            37     13     50
B            17     53     70
Total        54     66     120
Calculate what numbers of “Yes” and “No” would be expected assuming the probability of “Yes” was the same in both groups
Condition on the total number of “Yes” and “No” responses
66 / 69
Example: Treatment response II
Expected proportion with “Yes” response = 54/120 = 0.45
Expected proportion with “No” response = 66/120 = 0.55
Observed (Expected)
             Response to Treatment
Treatment    Yes          No           Total
A            37 (22.5)    13 (27.5)    50
B            17 (31.5)    53 (38.5)    70
Total        54           66           120
67 / 69
Example: Treatment response III
Test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(37 - 22.5)^2}{22.5} + \frac{(13 - 27.5)^2}{27.5} + \frac{(17 - 31.5)^2}{31.5} + \frac{(53 - 38.5)^2}{38.5} = 29.1$$
Degrees of freedom = (r-1)(c-1) = (2-1)(2-1) = 1
where r = number of rows, and c = number of columns
68 / 69
Example: Treatment response IV
We see p<0.001
Reject the null hypothesis, and conclude that the treatment groups are not homogeneous (similar) with respect to response
Response appears to be associated with treatment
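The treatment-response example can be reproduced end to end with a short sketch. Variable names are illustrative; expected counts come from the row and column totals, and the p-value uses the one-degree-of-freedom identity P(χ² > x) = erfc(√(x/2)), so that line applies only to 2 × 2 tables.

```python
from math import erfc, sqrt

observed = [[37, 13],   # Treatment A: Yes, No
            [17, 53]]   # Treatment B: Yes, No

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = erfc(sqrt(statistic / 2))  # valid only because df == 1

print(f"expected = {expected}, chi-square = {statistic:.1f}, df = {df}")
```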
69 / 69