Basic Statistics II. Significance/hypothesis tests.

Uploaded by jaedenwatchorn
Basic Statistics II
Significance/hypothesis tests
RCT comparing drug A and drug B for the treatment of hypertension
• 50 patients allocated to A
• 50 patients allocated to B
• Outcome = systolic BP at 3 months
Results
Group A
Mean = 145, sd = 9.9
Group B
Mean = 135, sd = 10.0
Null hypothesis : “μ (A) = μ (B)”
[ie. difference equals 0]
Alternative hypothesis : “μ (A) ≠ μ (B)”
[ie. difference doesn’t equal zero]
[where μ = population mean]
Statistical problem
When can we conclude that the observed difference
mean(A) – mean(B)
is large enough to suspect that
μ(A) – μ(B) is not zero?
P-value:
“probability of obtaining data at least as extreme as those observed if the null hypothesis were true”
[e.g. if there were no difference in systolic BP between the two groups]
How do we evaluate the probability?
Test Statistic
• Numerical value which can be compared with a known statistical distribution
• Expressed in terms of the observed data and the data expected if the null hypothesis were true
Test statistic
[mean(A) – mean(B)] / sd[mean(A) – mean(B)]
[where sd[mean(A) – mean(B)] is the standard error of the difference in means]
Under the null hypothesis this ratio will approximately follow a Normal distribution with mean = 0 and sd = 1
Hypertension example
sd[mean(A) – mean(B)] = sqrt(9.9²/50 + 10.0²/50) = 1.99
Test statistic = [mean(A) – mean(B)] / sd[mean(A) – mean(B)]
= [145 – 135] / 1.99 = 5.0
→ p < 0.001
Interpretation
Drug B results in lower systolic blood pressure in patients with hypertension than does Drug A
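The hypertension calculation can be reproduced from the summary statistics alone. The sketch below assumes the Normal approximation stated on the slide (with 50 patients per group a t-distribution would give a very similar answer); the function name two_sample_z is illustrative, not part of the original material.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (standard error, test statistic, two-sided p) from summary data."""
    # Standard error of the difference between two independent means
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / se
    # Two-sided p-value from the standard Normal distribution
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Drug A: mean 145, sd 9.9, n 50; Drug B: mean 135, sd 10.0, n 50
se, z, p = two_sample_z(145, 9.9, 50, 135, 10.0, 50)
print(f"se = {se:.2f}, test statistic = {z:.2f}, p = {p:.1e}")
```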
Two-sample t-test
Compares two independent groups of Normally distributed data
Significance test example I
Null hypothesis : “μ (A) = μ (B)”
[ie. difference equals 0]
Alternative hypothesis : “μ (A) ≠ μ (B)”
[ie. difference doesn’t equal zero]
Two-sided test
Null hypothesis :
“μ (A) = μ (B) or μ (A) < μ (B) ”
Alternative hypothesis :
“μ (A) > μ (B)”
One-sided test
A one-sided test is only appropriate if a difference in the opposite direction would have the same meaning or result in the same action as no difference
Paired-sample t-test
Compares two dependent groups of Normally distributed data
Paired-sample t-test
Mean daily dietary intake of 11 women measured over 10 premenstrual and 10 postmenstrual days
Dietary intake example
Premenstrual (n=11):
Mean=6753kJ, sd=1142
Postmenstrual (n=11):
Mean=5433kJ, sd=1217
Difference
Mean=1320, sd=367
Dietary intake example
Test statistic = 1320/[367/sqrt(11)]
= 11.9
p<0.001
Dietary intake example
Dietary intake during the premenstrual period was significantly greater than that during the postmenstrual period
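The paired calculation uses only the mean and sd of the within-woman differences. A minimal sketch follows; the critical value 4.587 (two-sided 0.1% point of the t-distribution on 10 degrees of freedom) is taken from standard tables and is an assumption added here, not a figure from the slides.

```python
from math import sqrt


def paired_t(mean_diff, sd_diff, n):
    """Return (t statistic, degrees of freedom) for a paired-sample t-test."""
    se = sd_diff / sqrt(n)  # standard error of the mean difference
    return mean_diff / se, n - 1


t, df = paired_t(1320, 367, 11)

# Assumed tabulated value: two-sided 0.1% critical value of t on 10 df
T_CRIT_0_001 = 4.587
significant_at_0_001 = abs(t) > T_CRIT_0_001

print(f"t = {t:.1f} on {df} df; p < 0.001: {significant_at_0_001}")
```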
The equivalent nonparametric tests
• Mann-Whitney U-test
• Wilcoxon matched pairs signed rank sum test
Nonparametric tests
• Based on the ranks of the data
• Use complicated formulae
• Hence a computer package is recommended
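To give a feel for what a package computes, the sketch below obtains the Mann-Whitney U statistic by counting, over all pairs, how often a value in one group exceeds a value in the other (this is equivalent to the rank-sum form). The two samples are hypothetical illustration data, not from the slides, and the p-value step (tables or a Normal approximation) is left to a statistics package.

```python
def mann_whitney_u(x, y):
    """Return (U for x, U for y) by pairwise comparison; ties count one half."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    # The two U statistics always sum to len(x) * len(y)
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical data for illustration only
group_a = [12, 15, 11, 18, 14]
group_b = [9, 10, 13, 8, 7]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(f"U(A) = {u_a}, U(B) = {u_b}")
```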
Significance test example II
Type I error
Significant result when null hypothesis is true
(probability conventionally set at 0.05)
Type II error
Non-significant result when null hypothesis is false
[Power = 1 – probability of Type II error]
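The meaning of the 0.05 Type I error rate can be illustrated by simulation: when both groups are drawn from the same population, roughly 5% of tests should still come out significant. This is a sketch under assumed values (population mean 140, known sd 10, 50 per group, 2000 simulated trials, fixed random seed), none of which come from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_i_error_rate(n_trials=2000, n=50, sigma=10.0, alpha=0.05, seed=1):
    """Proportion of significant z-tests when the null hypothesis is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sqrt(2 * sigma**2 / n)  # standard error with known sigma
    rejections = 0
    for _ in range(n_trials):
        # Both groups come from the same population, so the null is true
        mean_a = sum(rng.gauss(140, sigma) for _ in range(n)) / n
        mean_b = sum(rng.gauss(140, sigma) for _ in range(n)) / n
        if abs((mean_a - mean_b) / se) > z_crit:
            rejections += 1
    return rejections / n_trials


rate = type_i_error_rate()
print(f"Proportion of false positive results: {rate:.3f}")
```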
The chi-square test
Used to investigate the relationship between two qualitative variables
The analysis of cross-tabulations
The chi-square test
Compares proportions in two independent samples
Chi-square test example
In an RCT comparing infrared stimulation (IRS) with placebo on pain caused by osteoarthritis,
9/12 in IRS group ‘improved’ compared with 4/13 in placebo group
Chi-square test example
              Improve?
           Yes    No   Total
Placebo      4     9     13
IRS          9     3     12
Total       13    12     25
Placebo: 4/13 = 31% improve
IRS: 9/12 = 75% improve
Cross-tabulations
The chi-square test tests the null hypothesis of no relationship between ‘group’ and ‘improvement’ by comparing the observed frequencies with those expected if the null hypothesis were true
Cross-tabulations
Expected frequency
= (row total × column total) / grand total
Chi-square test example
              Improve?
           Yes    No   Total
Placebo      4     9     13
IRS          9     3     12
Total       13    12     25
Expected value for ‘4’ = 13 × 13 / 25
= 6.8
Expected values
              Improve?
           Yes    No   Total
Placebo    6.8   6.2     13
IRS        6.2   5.8     12
Total       13    12     25
Test statistic
= Σ (observed freq – expected freq)² / expected freq
Test statistic
= Σ (O – E)² / E
= (4 – 6.8)²/6.8 + (9 – 6.2)²/6.2
+ (9 – 6.2)²/6.2 + (3 – 5.8)²/5.8
= 4.9 (using unrounded expected values) → p = 0.027
Chi-square test example
Statistically significant difference in improvement between the IRS and placebo groups
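The expected frequencies and the test statistic for the IRS example can be checked with a few lines of code. This sketch handles a 2 × 2 table only; it relies on the fact that a chi-square variable on 1 degree of freedom is the square of a standard Normal variable, so the p-value is erfc(sqrt(statistic / 2)). The function name is illustrative.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Return (expected frequencies, chi-square statistic, p) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency = row total x column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Upper tail of the chi-square distribution on 1 degree of freedom
    p = erfc(sqrt(statistic / 2))
    return expected, statistic, p


# Rows: placebo, IRS; columns: improved yes, no
observed = [[4, 9], [9, 3]]
expected, statistic, p = chi_square_2x2(observed)
print(f"chi-square = {statistic:.1f}, p = {p:.3f}")
```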
Small samples
The chi-square test is valid if:
at least 80% of the expected frequencies exceed 5 and all the expected frequencies exceed 1
Small samples
If this criterion is not satisfied then combine or delete rows and columns to give bigger expected values
Small samples
Alternatively:
Use Fisher’s Exact Test
[calculates the probability of the observed table of frequencies, or more extreme tables, under the null hypothesis]
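A minimal sketch of Fisher's Exact Test for a 2 × 2 table, applied to the IRS data. With the margins fixed, each possible table has a hypergeometric probability; here "more extreme" is taken to mean any table whose probability is no larger than that of the observed table, which is one common two-sided convention and an assumption of this sketch.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table with fixed margins."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def ways(x):
        # Number of ways to obtain x in the top-left cell given the margins
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = ways(a)
    # Integer counts avoid floating-point ties when comparing probabilities
    extreme = sum(
        ways(x) for x in range(lowest, highest + 1) if ways(x) <= observed
    )
    return extreme / comb(n, col1)


# Rows: placebo, IRS; columns: improved yes, no
p_fisher = fisher_exact_2x2([[4, 9], [9, 3]])
print(f"Fisher's exact p = {p_fisher:.3f}")
```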
Yates’ Correction
Improves the approximation of the discrete distribution of the test statistic by the continuous chi-square distribution
Chi-square test with Yates’ correction
Subtract ½ from the absolute value of each O – E difference
Σ (|O – E| – ½)² / E
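Applying Yates' correction to the IRS table gives a smaller test statistic than the uncorrected test. As before, this is a sketch for a 2 × 2 table only, with the p-value on 1 degree of freedom obtained through the Normal distribution; the function name is illustrative.

```python
from math import erfc, sqrt


def yates_chi_square_2x2(table):
    """Return (Yates-corrected chi-square statistic, p) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            # Subtract one half from the absolute difference before squaring
            statistic += (abs(observed - expected) - 0.5) ** 2 / expected
    p = erfc(sqrt(statistic / 2))  # chi-square upper tail on 1 df
    return statistic, p


yates_stat, yates_p = yates_chi_square_2x2([[4, 9], [9, 3]])
print(f"corrected chi-square = {yates_stat:.2f}, p = {yates_p:.3f}")
```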
Significance test example III
McNemar’s test
Compares proportions in two matched samples
McNemar’s test example
                       Severe cold at age 14
                        Yes     No    Total
Severe cold    Yes      212    144      356
at age 12      No       256    707      963
               Total    468    851     1319
McNemar’s test example
Null hypothesis =
the proportions saying ‘yes’ on the 1st and 2nd occasions are the same
i.e. the frequencies for ‘yes, no’ and ‘no, yes’ are equal
McNemar’s test
• Test statistic based on observed and expected ‘discordant’ frequencies
• Similar to that for simple chi-square test
McNemar’s test example
Test statistic = (144 – 256)² / (144 + 256) = 31.4
=> p < 0.001
Significant difference in the proportion with severe colds between the two ages
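The quoted test statistic can be recovered from the two discordant cells (144 and 256). The sketch below uses the uncorrected form (b – c)² / (b + c), which reproduces the slide's value of 31.4, and refers it to chi-square on 1 degree of freedom; the function name is illustrative.

```python
from math import erfc, sqrt


def mcnemar(b, c):
    """Return (statistic, p) from the two discordant frequencies b and c."""
    # Under the null hypothesis each discordant cell is expected to be (b + c) / 2
    statistic = (b - c) ** 2 / (b + c)
    p = erfc(sqrt(statistic / 2))  # chi-square upper tail on 1 df
    return statistic, p


# 144: cold at age 12 only; 256: cold at age 14 only
mcnemar_stat, mcnemar_p = mcnemar(144, 256)
print(f"test statistic = {mcnemar_stat:.1f}, p < 0.001: {mcnemar_p < 0.001}")
```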
Significance test example IV
Comparison of means
2 groups: two-sample t-test
3 or more groups: ANOVA
One-way analysis of variance
Example:
Assessing the effect of treatment on the stress levels of a cohort of 60 subjects.
3 age groups: 15-25, 26-45, 46-65
Stress measured on a scale of 0-100
Stress levels
Group           Mean (SD)
15-25 (n=20)    52.8 (11.2)
26-45 (n=20)    33.4 (15.0)
46-65 (n=20)    35.6 (11.7)
Graph of stress levels
[Figure: stress level (vertical axis, 0 to 80) by age group (horizontal axis)]
ANOVA
                 Sum of squares   Df   Mean square   F      Sig
Between groups   4513.6            2   2256.8        13.8   <0.001
Within groups    9294.8           57   163.1
Total            13808.4          59
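The ANOVA table can be approximately rebuilt from the group summary statistics. In this sketch the between-groups sum of squares matches the table exactly, while the within-groups value differs slightly because the published SDs are rounded. The p-value uses a closed form for the F distribution that holds only when the numerator has 2 degrees of freedom (three groups); the function names are illustrative.

```python
def one_way_anova_from_summary(groups):
    """groups: list of (n, mean, sd). Returns sums of squares, df and F."""
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd**2 for n, _, sd in groups)
    df_between, df_within = k - 1, total_n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


def f_upper_tail_2df(f, df_within):
    """P(F > f) for an F distribution with 2 and df_within degrees of freedom."""
    return (1 + 2 * f / df_within) ** (-df_within / 2)


# (n, mean, sd) for age groups 15-25, 26-45, 46-65
stress = [(20, 52.8, 11.2), (20, 33.4, 15.0), (20, 35.6, 11.7)]
ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova_from_summary(stress)
anova_p = f_upper_tail_2df(f_ratio, df_w)
print(f"SS between = {ss_b:.1f}, SS within = {ss_w:.1f}, F = {f_ratio:.1f}")
```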
Interpretation
Significant difference between the three age groups with respect to stress levels
But what about the specific (pairwise) differences?
Multiple comparisons
• Comparing each pair of means in turn gives a high probability of finding a significant result by chance
• A multiple comparison method (e.g. Scheffé, Duncan, Newman-Keuls) makes appropriate adjustment
Scheffé’s test
Comparison
15-25 vs. 26-45 p<0.001
15-25 vs. 46-65 p<0.001
26-45 vs. 46-65 p=0.86
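The Scheffé p-values can be approximated from the group means and the within-groups mean square (163.1 on 57 degrees of freedom) in the ANOVA table. The sketch divides the pairwise F-type statistic by (number of groups – 1) and refers it to the F distribution; the closed-form tail used here is valid only for three groups, so other values of k are rejected. The function name is illustrative.

```python
def scheffe_p(mean_i, n_i, mean_j, n_j, ms_within, df_within, k=3):
    """Scheffe-adjusted p-value for one pairwise comparison (three groups only)."""
    if k != 3:
        raise ValueError("closed-form tail below assumes exactly 3 groups")
    # Squared difference relative to its variance, divided by (k - 1)
    f_s = (mean_i - mean_j) ** 2 / (ms_within * (1 / n_i + 1 / n_j)) / (k - 1)
    # Upper tail of F with 2 and df_within degrees of freedom
    return (1 + 2 * f_s / df_within) ** (-df_within / 2)


ms_within, df_within = 163.1, 57
p_young_vs_middle = scheffe_p(52.8, 20, 33.4, 20, ms_within, df_within)
p_young_vs_old = scheffe_p(52.8, 20, 35.6, 20, ms_within, df_within)
p_middle_vs_old = scheffe_p(33.4, 20, 35.6, 20, ms_within, df_within)
print(f"26-45 vs. 46-65: p = {p_middle_vs_old:.2f}")
```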
Comparison of medians
2 groups: Mann-Whitney
3 or more groups: Kruskal-Wallis
Kruskal-Wallis
Example:
Stress levels
Overall comparison of 3 groups:
p<0.001
Multiple comparisons
• There are no nonparametric equivalents to the multiple comparison tests such as Scheffé’s
• Need to apply Bonferroni’s correction to multiple Mann-Whitney U-tests
Bonferroni’s correction
For k comparisons between means:
multiply each p-value by k (to a maximum of 1)
Mann-Whitney U-test
Comparison
15-25 vs. 26-45 p<0.001
15-25 vs. 46-65 p<0.001
26-45 vs. 46-65 p=0.68
Need to multiply each p-value by 3
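Bonferroni's correction is simple enough to apply by hand, but a short sketch makes the capping at 1 explicit. The two results reported as p<0.001 are entered as 0.001, an upper bound assumed here because the exact values are not given.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


# 15-25 vs. 26-45, 15-25 vs. 46-65, 26-45 vs. 46-65
unadjusted = [0.001, 0.001, 0.68]
adjusted = bonferroni(unadjusted)
for before, after in zip(unadjusted, adjusted):
    print(f"p = {before} becomes {after:.3f}")
```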
Significance test example V